Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm doing some data analysis on NHL spread/betting-odds information for my own learning. I can pull some of the information, but not the entire data set. I want to pull the list of games and their associated odds into a pandas DataFrame, but I haven't been able to write the proper loop around the HTML tags. I've tried both the findAll option and the XPath route, and neither has worked.

from bs4 import BeautifulSoup
import requests

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'

# Fetch the page with the requests library
page_response = requests.get(page_link, timeout=5)

# Parse the returned HTML with BeautifulSoup
page_content = BeautifulSoup(page_response.content, "html.parser")


# Grab the first <div class="datarow"> and get its text
name_box = page_content.find('div', attrs={'class': 'datarow'})
name = name_box.text.strip()

print(name)
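find() stops at the first matching element, which is why the script above prints only one game. findAll is the older Beautiful Soup name for find_all, which returns every match. A minimal offline sketch, using a made-up HTML snippet (the markup and team names are hypothetical, not taken from thespread.com), shows the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real betting chart
html = """
<div class="datarow"><div class="teams">Boston at Toronto</div></div>
<div class="datarow"><div class="teams">Calgary at Vancouver</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element (or None if nothing matches)
first = soup.find("div", class_="datarow")
print(first.get_text(strip=True))

# find_all() returns every match, so loop over it to reach each game
all_rows = soup.find_all("div", class_="datarow")
names = [row.get_text(strip=True) for row in all_rows]
print(names)

# select() with a CSS selector finds the same elements as find_all() here
css_rows = soup.select("div.datarow")
print(len(css_rows))
```

Once find_all() (or select()) hands back the full list, the per-game extraction becomes a plain for loop over it.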


1 Answer

This script loops over every datarow div, pulls out each field individually, collects one list of values per game, and then builds a pandas DataFrame from those lists.

from bs4 import BeautifulSoup
import requests
import pandas as pd

page_link = 'https://www.thespread.com/nhl-hockey-public-betting-chart'

# Fetch the page with the requests library and stop early on an HTTP error
page_response = requests.get(page_link, timeout=5)
page_response.raise_for_status()

# Parse the returned HTML with BeautifulSoup
page_content = BeautifulSoup(page_response.content, "html.parser")

# find_all() returns every <div class="datarow">; each is treated as one game
tables = page_content.find_all('div', class_='datarow')

# Collect one list of values per game
rows = []

# Iterate through each datarow and pull out each home/away separately
for table in tables:
    # Get time and date. .contents may also include whitespace-only text
    # nodes, so the [1] and [-1] positions depend on the page's exact markup.
    time_and_date_tag = table.find('div', class_='time').contents
    date = time_and_date_tag[1]
    time = time_and_date_tag[-1]
    # Get teams
    teams_tag = table.find('div', class_='datacell teams').contents[-1].contents
    home_team = teams_tag[1].text
    away_team = teams_tag[-1].text
    # Get opening values
    opening_tag = table.find('div', class_='child-open').contents
    home_open_value = opening_tag[1]
    away_open_value = opening_tag[-1]
    # Get current values
    current_tag = table.find('div', class_='child-current').contents
    home_current_value = current_tag[1]
    away_current_value = current_tag[-1]
    # Store this game's values as one row
    rows.append([time, date, home_team, away_team,
                 home_open_value, away_open_value,
                 home_current_value, away_current_value])

columns = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open',
           'home_current', 'away_current']

print(pd.DataFrame(rows, columns=columns))
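The positional .contents[1] and .contents[-1] lookups in the script above depend on exactly where whitespace and child tags fall in the live page, and some of the values they return may be Tag objects rather than plain strings. A somewhat sturdier variant is sketched below under stated assumptions: the sample HTML is hypothetical (it reuses the class names from the answer, but the nesting and values are invented), and it keeps the answer's assumption that the first text in a cell is the home side and the last is the away side. It uses stripped_strings, which skips whitespace-only text, so every cell yields plain strings.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup reusing the answer's class names; the real page may differ.
HTML = """
<div class="datarow">
  <div class="time"><span>Oct 10</span><br/><span>7:00 PM</span></div>
  <div class="datacell teams"><div>Boston</div><div>Toronto</div></div>
  <div class="child-open"><div>+1.5</div><div>-1.5</div></div>
  <div class="child-current"><div>+1.5</div><div>-1.5</div></div>
</div>
<div class="datarow">
  <div class="time"><span>Oct 10</span><br/><span>10:00 PM</span></div>
  <div class="datacell teams"><div>Calgary</div><div>Vancouver</div></div>
  <div class="child-open"><div>-1.5</div><div>+1.5</div></div>
  <div class="child-current"><div>-1.5</div><div>+1.5</div></div>
</div>
"""

COLUMNS = ['time', 'date', 'home_team', 'away_team',
           'home_open', 'away_open', 'home_current', 'away_current']


def cell_texts(datarow, css_class):
    """Return the non-blank text pieces of the first <div> carrying css_class."""
    cell = datarow.find("div", class_=css_class)
    # stripped_strings skips whitespace-only text nodes, so no index guessing
    return list(cell.stripped_strings) if cell is not None else []


def first_last(texts):
    """Return (first, last) text pieces, or (None, None) for an empty cell."""
    return (texts[0], texts[-1]) if texts else (None, None)


def parse_chart(html):
    """Build one DataFrame row per <div class="datarow">."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for datarow in soup.find_all("div", class_="datarow"):
        date, time = first_last(cell_texts(datarow, "time"))
        # Assumption carried over from the answer: first = home, last = away
        home_team, away_team = first_last(cell_texts(datarow, "teams"))
        home_open, away_open = first_last(cell_texts(datarow, "child-open"))
        home_current, away_current = first_last(cell_texts(datarow, "child-current"))
        rows.append([time, date, home_team, away_team,
                     home_open, away_open, home_current, away_current])
    return pd.DataFrame(rows, columns=COLUMNS)


df = parse_chart(HTML)
print(df)
```

To try it on the live chart, pass page_response.content from the requests.get call to parse_chart. Whether the first and last pieces of text in each cell really are the home and away values depends on the site's actual markup, so check a few rows by hand.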
