python - Unable to scrape this movie website using BeautifulSoup

asked Jan 31, 2022 in Technique by 深蓝 (71.8m points)

I am trying to scrape a movie website: http://www.21cineplex.com/nowplaying

I have uploaded a screenshot of the HTML body with this question. I am having difficulty grabbing the movie title and the description, which is inside a <p> tag. For some reason, the description is not in the response returned by requests. Also, when I use BeautifulSoup to find the ul by its class name, nothing is found. Does anyone know why? I am using Python 3. This is my code so far:

    import requests
    import bs4

    r = requests.get('http://www.21cineplex.com/nowplaying')
    print(r.text)  # no description here
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    print(soup.find('ul', class_='w462'))  # why is this empty?

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:13:32+0000

The server checks for the Referer header. If the request has no Referer, it returns the main page instead of the listing. It does not check the value of the header, though, so any text works, even an empty string.

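    import requests
    import bs4

    headers = {
        # The value can be any URL, random text, or even an empty string:
        #'Referer': 'http://google.com',
        #'Referer': 'http://www.21cineplex.com',
        #'Referer': 'hello world!',
        'Referer': '',
    }

    response = requests.get('http://www.21cineplex.com/nowplaying', headers=headers)
    # Name the parser explicitly so bs4 does not have to guess one
    soup = bs4.BeautifulSoup(response.text, 'html.parser')

    # Option 1: find_all() with tag name and class
    for x in soup.find_all('ul', class_='w462'):
        print(x.text)

    # Option 2: the same elements via a CSS selector
    for x in soup.select('ul.w462'):
        print(x.text)

    # First link text (presumably the title) and first <p> (the description)
    for x in soup.select('ul.w462'):
        print(x.select('a')[0].text)
        print(x.select('p')[0].text)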
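
To confirm that the Referer header is what makes the difference, you could fetch the page once without it and once with it, then compare. Below is a minimal sketch using only the standard library. The helper names `build_request`, `has_listing`, `fetch` and `compare` are made up for illustration, and the `w462` check is a crude substring test that assumes the listing still uses that class name. Note that urllib sends a different User-Agent than requests, so the server may not respond identically.

```python
import urllib.request

URL = 'http://www.21cineplex.com/nowplaying'


def build_request(url, referer=None):
    """Return a Request, adding a Referer header only if one is given."""
    headers = {}
    if referer is not None:
        # An empty string is still stored as a header on the Request.
        headers['Referer'] = referer
    return urllib.request.Request(url, headers=headers)


def has_listing(html, marker='w462'):
    """Crude check: does the HTML mention the listing's class name?"""
    return marker in html


def fetch(request, timeout=10):
    """Send the request and return the decoded body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')


def compare(url=URL):
    """Print whether the listing marker appears without and with a Referer."""
    for referer in (None, ''):
        html = fetch(build_request(url, referer))
        print('Referer=%r -> listing found: %s' % (referer, has_listing(html)))
```

Calling `compare()` makes two requests. If the answer's explanation holds, only the request that carries a Referer should report the listing as found.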
