Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm developing a tool that needs to download a web page from a third-party server, execute it as a browser would, and then parse the HTML. My difficulty is that the tool needs to parse the HTML after all the JavaScript has executed and the DOM has been modified. I'm trying to use PhantomJS for this. It works on small snippets (a tiny HTML document with external JavaScript that adds some nodes to the DOM), but when I do the same with a real site (http://www.dba.dk/), I don't get the final HTML with all the modifications made by the JS code.

I really need help on this as I have been stuck with it for more than a week.

My PhantomJS code is simple:

// No state stored yet: validate the arguments and start loading the page.
if (phantom.state.length === 0) {
    if (phantom.args.length === 0) {
        console.log('Usage: test.js <some URL>');
        phantom.exit();
    } else {
        var address = phantom.args[0];
        // Remember the start time; a non-empty state selects the other branch.
        phantom.state = Date.now().toString();
        phantom.viewportSize = { width: 1280, height: 800 };
        phantom.open(address);
    }
} else {
    // State is set: the load was started earlier, so inspect its result.
    var elapsed = Date.now() - new Date().setTime(phantom.state);
    if (phantom.loadStatus === 'success') {
        if (!first_time) {
            var first_time = true;
            if (!document.addEventListener) {
                console.log('Not SUPPORTED!');
            }
            // Save a screenshot, then dump the current DOM as markup.
            phantom.render('result.png');
            var markup = document.documentElement.innerHTML;
            console.log(markup);
            phantom.exit();
        }
    } else {
        console.log('FAIL to load the address');
        phantom.exit();
    }
}

The HTML dumped to the console doesn't contain the dynamically generated content.
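One general way to approach "dump the HTML only after the scripts have finished" is to poll the serialized DOM and wait until it stops changing. The sketch below is only an illustration under assumptions, not the fix that resolved this question: it assumes a PhantomJS build that ships the `webpage` module (not the legacy `phantom.open` API used above), and the names `makeStabilityChecker` and `dumpWhenStable`, the threshold of 3 unchanged checks, and the exit codes are all hypothetical.

```javascript
// Returns a checker that reports true once the markup it is fed has stayed
// identical for `requiredStableChecks` consecutive calls after the first.
function makeStabilityChecker(requiredStableChecks) {
    var previous = null;
    var stableCount = 0;
    return function isStable(markup) {
        if (markup === previous) {
            stableCount += 1;
        } else {
            // The DOM changed: remember the new markup and start over.
            previous = markup;
            stableCount = 0;
        }
        return stableCount >= requiredStableChecks;
    };
}

// Hypothetical wiring for PhantomJS builds that provide the `webpage` module.
// Defined but not called here; intervalMs and maxWaitMs are chosen by the caller.
function dumpWhenStable(address, intervalMs, maxWaitMs) {
    var page = require('webpage').create();
    page.viewportSize = { width: 1280, height: 800 };
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
            phantom.exit(1);
            return;
        }
        var isStable = makeStabilityChecker(3); // assumed threshold
        var started = Date.now();
        var timer = setInterval(function () {
            var markup = page.content; // serialized DOM at this moment
            var timedOut = Date.now() - started > maxWaitMs;
            if (isStable(markup) || timedOut) {
                clearInterval(timer);
                console.log(markup);
                // Assumed convention: 0 when the DOM settled, 2 on timeout.
                phantom.exit(timedOut ? 2 : 0);
            }
        }, intervalMs);
    });
}
```

A script could then call something like `dumpWhenStable('http://www.dba.dk/', 500, 15000)`, where both timing values are guesses to tune. A page that keeps mutating (tickers, rotating ads) may never look stable, which is why the sketch also gives up after `maxWaitMs`.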


1 Answer

The problem was the Flash plugin: the pages were detecting its absence. Once the plugin was loaded correctly, the problem went away.
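To check whether a page can see Flash, one option is to run the same kind of test many sites use: look up the Flash MIME type on `navigator`. The sketch below is a rough diagnostic under assumptions. It is not known how the site in the question actually detected Flash, the helper names `hasFlash` and `reportFlash` are hypothetical, and the PhantomJS part assumes a build with the `webpage` module and `page.evaluate`.

```javascript
// Returns true when a navigator-like object advertises an enabled Flash plugin.
function hasFlash(nav) {
    var mimeTypes = (nav && nav.mimeTypes) || {};
    var flash = mimeTypes['application/x-shockwave-flash'];
    return Boolean(flash && flash.enabledPlugin);
}

// Hypothetical PhantomJS usage; defined but not called here.
// page.evaluate runs its function inside the page, where outer helpers are
// not visible, so the check is repeated inline instead of calling hasFlash.
function reportFlash(page) {
    var found = page.evaluate(function () {
        var flash = navigator.mimeTypes['application/x-shockwave-flash'];
        return Boolean(flash && flash.enabledPlugin);
    });
    console.log('Flash visible to the page: ' + found);
}
```

If this reports `false` for a page that depends on Flash, the page may be taking a fallback path and never building the content being looked for.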

