I've come across many tutorials explaining how to scrape public websites that don't require authentication/login, using node.js.
Can somebody explain how to scrape sites that require login using node.js?
Use Mikeal's Request library. You need to enable cookie support like this:
var request = require('request').defaults({jar: true});
First, create an account on that site manually. Then send the username and password as form parameters in a POST request to the site's login URL. The server responds with a session cookie, which Request stores in its cookie jar and sends along with every later request, so you can then access the pages that require you to be logged in.
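The login-then-fetch flow described above could look roughly like the sketch below. It is only an illustration: the function name `loginAndFetch`, the option names, and the `username`/`password` form field names are all assumptions, and the real field names and login URL have to be read from the target site's login form. The `request` argument is expected to be an instance created with `require('request').defaults({jar: true})`.

```javascript
// Sketch only: option names and form field names are hypothetical.
// Pass in a Request instance created with cookies enabled, e.g.
//   var request = require('request').defaults({jar: true});
function loginAndFetch(request, options, callback) {
  // Step 1: POST the credentials as form fields, like the browser's login form does.
  request.post({
    url: options.loginUrl,
    form: { username: options.username, password: options.password }
  }, function (err, res) {
    if (err) return callback(err);
    if (res.statusCode >= 400) {
      return callback(new Error('Login failed with status ' + res.statusCode));
    }
    // Step 2: the cookie jar now holds whatever cookie the server set,
    // so this GET is sent with it automatically.
    request.get(options.protectedUrl, function (err, res, body) {
      if (err) return callback(err);
      callback(null, body);
    });
  });
}

// Example call (placeholder URLs and credentials):
// loginAndFetch(request, {
//   loginUrl: 'https://example.com/login',
//   protectedUrl: 'https://example.com/account',
//   username: 'me',
//   password: 'secret'
// }, function (err, html) {
//   if (err) throw err;
//   console.log(html.length);
// });
```

Passing the Request instance in as an argument keeps the cookie jar in one place, so every call made through that instance shares the same logged-in session. Many sites answer a successful login with a redirect (status 302), which is why only statuses of 400 and above are treated as failures here.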
Note: this approach doesn't work if the login page is protected by something like reCAPTCHA.