Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I am trying to scrape a movie website: http://www.21cineplex.com/nowplaying

I have uploaded a screenshot of the page's HTML body as an image in this question (link to screenshot here). I am having difficulty grabbing the movie title and the description, which is inside the <p> tag. For some strange reason, the description is not in the response that requests returns. Also, when I use soup to find the ul by its class name, nothing is found. Does anyone know why? I am using Python 3. This is my code so far:

    import requests
    import bs4

    r = requests.get('http://www.21cineplex.com/nowplaying')
    print(r.text)  # the description is not in here
    soup = bs4.BeautifulSoup(r.text, 'html.parser')
    print(soup.find('ul', class_='w462'))  # why does this print None?
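`soup.find()` returns `None` rather than raising an error when nothing matches, which is why the last line looks empty. A minimal sketch, assuming two made-up HTML snippets (one standing in for a page without the movie list, one for the now-playing list), shows the difference and a guard you might add before reading `.text`. The helper name `find_movie_list` and the sample markup are hypothetical; the real page's markup is not reproduced here.

```python
import bs4

# Hypothetical stand-ins for the two kinds of page; not the site's real HTML.
MAIN_PAGE = "<html><body><div id='home'>Welcome</div></body></html>"
NOW_PLAYING = (
    "<html><body><ul class='w462'>"
    "<li><a href='#'>Movie A</a><p>Description A</p></li>"
    "</ul></body></html>"
)


def find_movie_list(html):
    """Return the first <ul class="w462"> element, or None if the page has none."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return soup.find("ul", class_="w462")


for name, html in [("main page", MAIN_PAGE), ("now playing", NOW_PLAYING)]:
    ul = find_movie_list(html)
    if ul is None:
        # find() found no match: the downloaded page lacks this element.
        print(name, "-> no ul.w462 in this HTML")
    else:
        print(name, "->", ul.get_text(" ", strip=True))
```

Checking for `None` first makes it obvious when the HTML you downloaded is not the page you saw in the browser.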

1 Answer

This server checks the Referer header. If the request has no Referer, it sends back the main page instead of the one you asked for. It doesn't check the header's value, though, so even an empty string works.
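Before running the scraping code below, you can confirm that an empty `Referer` is actually attached to the request rather than dropped. This sketch builds the request with `requests.Request(...).prepare()` and `Session.prepare_request()` without sending it, so it makes no network call. It only inspects the outgoing headers; whether the server accepts an empty value is the answer's observation and is not tested here.

```python
import requests

URL = "http://www.21cineplex.com/nowplaying"

# Build (but do not send) a request so the outgoing headers can be inspected.
prepared = requests.Request("GET", URL, headers={"Referer": ""}).prepare()
print(repr(prepared.headers.get("Referer")))

# A header set once on a Session is merged into every request made through it.
session = requests.Session()
session.headers.update({"Referer": ""})
prepared_via_session = session.prepare_request(requests.Request("GET", URL))
print(repr(prepared_via_session.headers.get("Referer")))
```

Using a `Session` is handy if you fetch several pages from the same site, since the header does not have to be repeated on each call.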

    import requests
    import bs4

    headers = {
        # 'Referer' can be any URL, random text, or even an empty string:
        # 'Referer': 'http://google.com',
        # 'Referer': 'http://www.21cineplex.com',
        # 'Referer': 'hello world!',
        'Referer': '',
    }

    s = requests.get('http://www.21cineplex.com/nowplaying', headers=headers)
    soup = bs4.BeautifulSoup(s.text, 'html.parser')

    # Option 1: find_all() with a class filter
    for x in soup.find_all('ul', class_='w462'):
        print(x.text)

    # Option 2: the equivalent CSS selector
    for x in soup.select('ul.w462'):
        print(x.text)

    # Title (first <a>) and description (first <p>) of each movie
    for x in soup.select('ul.w462'):
        print(x.select('a')[0].text)
        print(x.select('p')[0].text)
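If any block lacks an `<a>` or a `<p>`, `x.select('a')[0]` raises `IndexError`. A slightly more defensive sketch collects (title, description) pairs and substitutes an empty string for missing parts. The function name `extract_movies` and the sample markup are hypothetical; the markup only mimics the structure the answer's code assumes (title in the first `<a>`, description in the first `<p>` inside `ul.w462`), so the real page may differ.

```python
import bs4


def extract_movies(html):
    """Return (title, description) pairs from every <ul class="w462"> block.

    Assumes, as the answer's code does, that the title is the first <a> and the
    description the first <p> in each block; missing parts become "".
    """
    soup = bs4.BeautifulSoup(html, "html.parser")
    movies = []
    for block in soup.select("ul.w462"):
        link = block.select_one("a")  # None if the block has no <a>
        para = block.select_one("p")  # None if the block has no <p>
        title = link.get_text(strip=True) if link else ""
        description = para.get_text(strip=True) if para else ""
        movies.append((title, description))
    return movies


# Hypothetical markup, for illustration only.
SAMPLE = """
<ul class="w462"><li><a href="#">Movie A</a><p> Description A </p></li></ul>
<ul class="w462"><li><a href="#">Movie B</a></li></ul>
"""

for title, description in extract_movies(SAMPLE):
    print(title, "|", description)
```

With the live page you would pass `s.text` from the request above instead of `SAMPLE`.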
