Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


I'm scraping some information from mobile_comparison_website, but its content appears to be loaded dynamically. I tried scraping the dynamic content with Selenium, but that doesn't give me the expected output either.

from bs4 import BeautifulSoup as bs
from selenium import webdriver
path = r'C:\Users\Goku\Downloads\Compressed\chromedriver'

driver = webdriver.Chrome(path)

driver.get('https://versus.com/en')

res = driver.execute_script("return document.documentElement.outerHTML")

soup = bs(res, 'lxml')
box = soup.find('div', {'class': 'CarouList__carouList___2WspW CarouList__isLandingPage___rPe4J'})

print(box)

For example, I want to scrape all the images inside that div, along with their names.


1 Answer

You can find the data in the HTML source itself, inside a <script> tag. Find that text, trim the string down to valid JSON, then use json.loads() to read it in. From there you can look around the resulting structure and pull out what you want. The image URLs are in there:

import requests
from bs4 import BeautifulSoup as soup
import json

my_url = 'https://versus.com/en'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

# opening up connection, grabbing the page
response = requests.get(my_url, headers=headers)

#html parsing
page_soup = soup(response.text, "html.parser")

# the page state is embedded in a <script> tag as: window.__data={...}
scripts = page_soup.find_all('script')
for script in scripts:
    if 'window.__data=' in script.text:
        # keep only the text after the assignment, which is the JSON payload
        jsonStr = script.text
        jsonStr = jsonStr.split('window.__data=')[-1]

        jsonData = json.loads(jsonStr)

# image paths in the data are relative, so prepend the image host
root_url = 'https://versus.dadi.network'
phones = jsonData['landing']['trendings']['phone']['list']
for each in phones:
    popImage = root_url + each['popImage']
    rivalImage = root_url + each['rivalImage']

    print('%s\n%s' % (popImage, rivalImage))

Output:

https://versus.dadi.network/samsung-galaxy-a9-2018/front/front-1539337417084.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/samsung-galaxy-s10-plus/front/front-1550699605210.variety.jpg
https://versus.dadi.network/apple-iphone-xs-max/front/front-1536781345067.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/huawei-p30-lite/front/front-1555000229505.variety.jpg
https://versus.dadi.network/xiaomi-redmi-note-7/front/front-1550507767671.variety.jpg
https://versus.dadi.network/xiaomi-mi-8-lite/front/front-1537824165879.variety.jpg
https://versus.dadi.network/samsung-galaxy-s8/front/front-1490950798404.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/huawei-p20-lite/front/front-1521538430205.variety.jpg
https://versus.dadi.network/huawei-p-smart-2019/front/front-1547733931933.variety.jpg
https://versus.dadi.network/samsung-galaxy-a50/front/front-1551183669492.variety.jpg
https://versus.dadi.network/samsung-galaxy-a30/front/front-1551187893794.variety.jpg
https://versus.dadi.network/samsung-galaxy-m20/front/front-1550059143173.variety.jpg
https://versus.dadi.network/samsung-galaxy-a30/front/front-1551187893794.variety.jpg
https://versus.dadi.network/oneplus-6t/front/front-1540985964061.variety.jpg
https://versus.dadi.network/google-pixel-3/front/front-1539114763774.variety.jpg
https://versus.dadi.network/samsung-galaxy-a40/front/front-1555086727000.variety.jpg
https://versus.dadi.network/huawei-p20-lite/front/front-1521538430205.variety.jpg
