Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a new project I am working on that involves fetching a webpage, (using PHP and cURL) parsing the HTML and javascript out of it and then handling the data in the results.

Basically I hit a brick wall when the site uses javascript to fetch its data by AJAX. In this case, the initial data will not appear in the fetched page unless the javascript is run in a browser.

Are there any PHP libraries for this? (I suspect not, but I could be wrong.)

I would really rather build this as a server-based solution, otherwise I am forced to have to build an application for this and using mozilla and/or IE runtime libraries - which kind of defeats the purpose.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
390 views
Welcome To Ask or Share your Answers For Others

1 Answer

You will need:

  • one JavaScript interpreter
  • one DOM Level 2 Core and HTML implementation
  • 500g of non-standard but commonly-used DOM extensions
  • a pinch of DOM Level 2 Style (which might mean also a CSS interpreter and layout engine)
  • yoghurt pots, round-ended scissors and sticky-back plastic

Once you have assembled your components (remember to get a grown-up to help you with the sandboxing), you'll find what you have is essentially indistinguishable from a web browser.

JAVA is not part of the shell build on the server. V8/SquirrelFish is C++ code I would need to convert to PHP.

Porting a JS engine to PHP would be a huge task, and the resulting performance likely horrible. You can't even really get away with a nearly-solution on JavaScript any more, since so many pages are using hideously complex libraries like jQuery to do everything, which will require in-depth JS support.

I don't think you're going to be able to do this purely in PHP. You'll have to hook up Java/Rhino/HTMLUnit or a proper web browser like Mozilla. If your hosting environment doesn't give you the flexibility you need to compile and deploy that sort of thing, you'd have to move to a better hosting setup with a shell (preferably VPS).

If you can avoid this unpleasantness some other way, by special-casing known pages' AJAX access, do that.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...