Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I need to write a Python web scraper to collect data from this website: https://online.portalberni.ca/WebApps/PIP/Pages/Search.aspx?templateName=permit%20reporting

As you can see, it does not appear to be possible to type text into the date fields manually, which is what I would normally do when scripting pages like this. The script will run daily on a headless Ubuntu server, and it needs to select a date range covering the 7 days leading up to the day it runs. Normally that would be easy by entering text, but I don't think that is an option here. Any idea how to do this with a JavaScript element like this?

Question from: https://stackoverflow.com/questions/65660950/writing-a-webscraper-for-a-page-with-javascript-elements

1 Answer

This got me to the next page (where there is another form to do something similar):

from requests import Session
from bs4 import BeautifulSoup as Bs

s = Session()  # Persists cookies across requests

# The "action" URL of the form, taken from the HTML (here it happens to be the same as the page URL, which is not always the case)
form_url = "https://online.portalberni.ca/WebApps/PIP/Pages/Search.aspx?templateName=permit%20reporting"

# Gets the HTML of the form
r = s.get(form_url)
html = Bs(r.text, "lxml")
form = html.find("form")

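# Finds hidden inputs in the form that are necessary for a successful POST.
# Inputs without a "value" attribute default to "" instead of raising KeyError.
hidden = form.find_all("input", {"type": "hidden"})
data = {i["name"]: i.get("value", "") for i in hidden if i.get("name")}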

# There is JavaScript code that changes the form data before submission (onsubmit in the
# form). I found the values below by using the developer tools in Chrome to see what the
# POST data actually was, not by analyzing the JavaScript.
data["ctl00$FeaturedContent$ToolkitScriptManager1"] = "ctl00$FeaturedContent$updpnl_search|ctl00$FeaturedContent$btn_ViewReport"
data["__EVENTTARGET"] = ""
data["__EVENTARGUMENT"] = ""
data["__ASYNCPOST"] = "true"
data["ctl00$FeaturedContent$btn_ViewReport"] = "Search"

# Change to your date range
data["ctl00$FeaturedContent$txt_FromDate"] = "01/01/2021"
data["ctl00$FeaturedContent$txt_ToDate"] = "01/10/2021"

# Submits the form
headers = {
    "Content-type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Referer": "https://online.portalberni.ca/WebApps/PIP/Pages/Search.aspx?templateName=permit%20reporting",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.84 Safari/537.36",
}
post_response = s.post(form_url, data=data, headers=headers)
post_response.raise_for_status()  # Fail early if the server rejects the POST

# The page with the results you're looking for
results_url = "https://online.portalberni.ca/WebApps/PIP/Pages/PropBasedReportSelection.aspx?templateName=permit%20reporting"
r = s.get(results_url)
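
To get the rolling 7-day window the question asks for, something like this could replace the hard-coded dates. It's only a sketch: `last_week_range` is a name I made up, and I'm assuming the form wants MM/DD/YYYY (the example dates above are consistent with that, but I haven't confirmed it) and that the window should end the day before the script runs.

```python
from datetime import date, timedelta


def last_week_range(today=None, fmt="%m/%d/%Y"):
    """Return (from_date, to_date) strings for the 7 days before `today`.

    Assumptions (not confirmed against the site): dates are MM/DD/YYYY,
    and the window ends the day before the script runs.
    """
    today = today or date.today()
    start = today - timedelta(days=7)  # seven days before the run date
    end = today - timedelta(days=1)    # the day before the run date
    return start.strftime(fmt), end.strftime(fmt)


from_date, to_date = last_week_range()

# Field names are the ones used in the POST data above
date_fields = {
    "ctl00$FeaturedContent$txt_FromDate": from_date,
    "ctl00$FeaturedContent$txt_ToDate": to_date,
}
```

With that in place, `data.update(date_fields)` would take the place of the two hard-coded date assignments. If the window should include the run date itself, shift both offsets by one day.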

It might be possible to skip this form and submit only the form on the second page, but I didn't try. This should at least get you on the right track.
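
If the second page's form needs the same treatment, the hidden-field step can be pulled into a helper. Here's a rough sketch that uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages. `HiddenFieldCollector` and `collect_hidden_fields` are hypothetical names, and the sample HTML is made up for illustration rather than taken from the site.

```python
from html.parser import HTMLParser


class HiddenFieldCollector(HTMLParser):
    """Collects name/value pairs from every <input type="hidden"> it sees."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attr.get("name"):
            # A missing value attribute becomes an empty string
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_fields(html_text):
    """Return a dict of hidden input fields found anywhere in the HTML."""
    collector = HiddenFieldCollector()
    collector.feed(html_text)
    return collector.fields


# Made-up sample markup, only to show the shape of the result
sample = """
<form action="Search.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTTARGET" />
  <input type="text" name="visible" value="x" />
</form>
"""
fields = collect_hidden_fields(sample)
```

Note that this version looks at the whole document rather than a single form, so on a page with several forms you would want to narrow the HTML down first.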

