Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I've come across many tutorials explaining how to scrape public websites that don't require authentication/login, using node.js.

Can somebody explain how to scrape sites that require login using node.js?


1 Answer

Use Mikeal's Request library. You need to enable cookie support like this:

var request = require('request').defaults({jar: true});

First, create an account on that site manually. Then send the username and password as form parameters in a POST request to the site's login URL. The server responds with a session cookie, which Request stores in its cookie jar and sends back on later requests, so you can access the pages that require you to be logged in.
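The flow described above (POST the credentials, keep the cookie, send it back on later requests) can be sketched without the Request package, using only Node's built-in modules and a hand-rolled cookie jar. This is a rough illustration, not what Request does internally: the `/login` and `/account` paths, the `username` and `password` field names, and the helper names `createJar`, `send` and `loginAndFetch` are all assumptions made for the example, and a real site will most likely need `https` instead of `http`.

```javascript
// Sketch only: paths, form field names and helper names are hypothetical.
const http = require('http');
const querystring = require('querystring');

// Minimal cookie jar: remembers name=value pairs from Set-Cookie headers.
function createJar() {
  const cookies = {};
  return {
    // Store cookies from a response's Set-Cookie header array (may be undefined).
    store(setCookieHeaders) {
      for (const line of setCookieHeaders || []) {
        const pair = line.split(';')[0]; // drop attributes such as Path or HttpOnly
        const idx = pair.indexOf('=');
        if (idx > 0) {
          cookies[pair.slice(0, idx).trim()] = pair.slice(idx + 1).trim();
        }
      }
    },
    // Build the value of the Cookie request header.
    header() {
      return Object.entries(cookies).map(([k, v]) => `${k}=${v}`).join('; ');
    },
  };
}

// Send one request, record any cookies in the jar, resolve with status and body.
function send(jar, options, body) {
  return new Promise((resolve, reject) => {
    const headers = Object.assign({}, options.headers);
    const cookie = jar.header();
    if (cookie) headers.Cookie = cookie;
    const req = http.request(Object.assign({}, options, { headers }), (res) => {
      jar.store(res.headers['set-cookie']);
      let data = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { data += chunk; });
      res.on('end', () => resolve({ status: res.statusCode, body: data }));
    });
    req.on('error', reject);
    if (body) req.write(body);
    req.end();
  });
}

// Log in with a form POST, then fetch a protected page using the same jar.
async function loginAndFetch(host, port, username, password) {
  const jar = createJar();
  const form = querystring.stringify({ username, password });
  await send(jar, {
    host, port, method: 'POST', path: '/login',
    headers: {
      'Content-Type': 'application/x-www-form-urlencoded',
      'Content-Length': Buffer.byteLength(form),
    },
  }, form);
  return send(jar, { host, port, method: 'GET', path: '/account' });
}
```

Calling `loginAndFetch('example.com', 80, 'me', 'secret')` would return a promise for the protected page's status and body, assuming the site accepts a plain form login at those paths. With Request and `jar: true`, the cookie bookkeeping done here by `createJar` is handled by the library.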

Note: this approach doesn't work if the login page is protected by something like reCAPTCHA.

