How do I pull multiple values from an HTML page using Python?

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I'm doing some data analysis, for my own knowledge, on NHL spread/betting odds information. I'm able to pull some information, but not the entire data set. I want to pull the list of games and the associated odds into a pandas DataFrame, but I have not been able to write the proper loop over the HTML tags. I've tried the findAll option and the XPath route, and I'm not successful with either.

from bs4 import BeautifulSoup
import requests

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'

# Fetch the page with the requests library
page_response = requests.get(page_link, timeout=5)

# Parse the HTML
page_content = BeautifulSoup(page_response.content, "html.parser")

# Take the first <div class="datarow"> and get its text
name_box = page_content.find('div', attrs={'class': 'datarow'})
name = name_box.text.strip()

print(name)

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:15:54+0000

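This script loops over every datarow div, pulls out each item individually, and collects the values into a pandas DataFrame. The code in the question used find, which returns only the first matching element; find_all returns all of them.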

from bs4 import BeautifulSoup
import requests
import pandas as pd

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'

# Fetch the page with the requests library
page_response = requests.get(page_link, timeout=5)

# Parse the HTML
page_content = BeautifulSoup(page_response.content, "html.parser")

# Find every <div class="datarow">; each one holds a single game
tables = page_content.find_all('div', class_='datarow')

# Collect one list of values per game
rows = []

# Iterate through each datarow and pull out each home/away separately
for table in tables:
    # Get time and date
    time_and_date_tag = table.find_all('div', attrs={"class": "time"})[0].contents
    date = time_and_date_tag[1]
    time = time_and_date_tag[-1]
    # Get teams
    teams_tag = table.find_all('div', attrs={"class": "datacell teams"})[0].contents[-1].contents
    home_team = teams_tag[1].text
    away_team = teams_tag[-1].text
    # Get opening
    opening_tag = table.find_all('div', attrs={"class": "child-open"})[0].contents
    home_open_value = opening_tag[1]
    away_open_value = opening_tag[-1]
    # Get current
    current_tag = table.find_all('div', attrs={"class": "child-current"})[0].contents
    home_current_value = current_tag[1]
    away_current_value = current_tag[-1]
    # Create list
    rows.append([time, date, home_team, away_team,
                 home_open_value, away_open_value,
                 home_current_value, away_current_value])

columns = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open',
           'home_current', 'away_current']

print(pd.DataFrame(rows, columns=columns))
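
The script above depends on the live page and on fixed positions inside .contents, so it may break if the site changes its markup. As a rough offline sketch of the same find_all loop, the example below parses a small hard-coded snippet. The markup in SAMPLE_HTML is hypothetical: only the class names (datarow, time, teams) are taken from the code above, while the tags, nesting, and values are invented, and the texts helper is an illustrative name, not part of BeautifulSoup. It reads text through stripped_strings rather than indexing .contents, and assumes, as the script above does, that the home team is listed first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the class names (datarow, time, teams) come from
# the script above; the tags, nesting, and values are invented for illustration.
SAMPLE_HTML = """
<div class="datarow">
  <div class="time"><span>01/31</span><span>7:00 PM</span></div>
  <div class="datacell teams"><span>Boston</span><span>Toronto</span></div>
</div>
<div class="datarow">
  <div class="time"><span>01/31</span><span>9:30 PM</span></div>
  <div class="datacell teams"><span>Calgary</span><span>Seattle</span></div>
</div>
"""


def texts(parent, css_class):
    """Return the non-empty text fragments of the first matching div, or []."""
    div = parent.find("div", class_=css_class)
    return list(div.stripped_strings) if div is not None else []


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns only the first match; find_all() returns every match.
first_only = soup.find("div", class_="datarow")
all_rows = soup.find_all("div", class_="datarow")

games = []
for row in all_rows:
    when = texts(row, "time")
    teams = texts(row, "teams")  # matches class="datacell teams" as well
    if not when or not teams:
        continue  # skip rows that lack the expected cells
    games.append({
        "date": when[0],
        "time": when[-1],
        "home_team": teams[0],   # assumption: home team is listed first
        "away_team": teams[-1],
    })

print(len(all_rows))
print(games)
```

A list of dicts like games can be passed straight to pd.DataFrame(games), which uses the dict keys as column names.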
