Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

I'm writing a class-based crawler in Python and got stuck halfway. I can't figure out how to pass the newly produced links [generated by the app_crawler class] to the "App" class so that I can do the rest of the work there. If anyone could point me in the right direction by showing how to make it run, I would be very grateful. Thanks in advance. By the way, the script does run, but it only processes a single link.

from lxml import html
import requests

class app_crawler:

    starturl = "https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8"

    def crawler(self):
        self.get_app(self.starturl)


    def get_app(self, link):
        page = requests.get(link)
        tree = html.fromstring(page.text)
        links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')
        for link in links:
            return link # I wish to make this link penetrate through the App class but can't get any idea


class App(app_crawler):

    def __init__(self, link):
        self.links = [link]

    def process_links(self):
        for link in self.links:
            self.get_item(link)

    def get_item(self, url):
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]        
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)

if __name__ == '__main__':

    parse = App(app_crawler.starturl)
    parse.crawler()
    parse.process_links()

I've written another version that works fine, but I wanted to give the crawler above a different structure. Here is the link to the working one: "https://www.dropbox.com/s/galjorcdynueequ/Working%20one.txt?dl=0"


1 Answer

There are several issues with your code:

  • App inherits from app_crawler yet you provide an app_crawler instance to App.__init__.

  • App.__init__ calls app_crawler.__init__ instead of super().__init__().

  • Not only does app_crawler.get_app not return anything, it also creates a brand new App object.

This results in your code passing an app_crawler object to requests.get instead of a url string.
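The points above can be illustrated with a minimal sketch that needs no network access. The class names Crawler and AppScraper, the find_links method, the visited attribute, and the example.com URLs are all hypothetical and are not taken from the question; the sketch only shows a subclass initializing its parent through super().__init__() and working with plain URL strings instead of crawler objects.

```python
class Crawler:
    """Hypothetical base class: owns the start URL and the discovered links."""

    def __init__(self, start_url):
        self.start_url = start_url
        self.links = []

    def find_links(self, hrefs):
        # Stand-in for the XPath step: keep every href that was found.
        # A `return` inside this loop would stop after the first href.
        for href in hrefs:
            self.links.append(href)
        return self.links


class AppScraper(Crawler):
    """Hypothetical subclass: inherits the crawler state, adds per-link work."""

    def __init__(self, start_url):
        # Let the base class set up start_url and links.
        super().__init__(start_url)
        self.visited = []

    def process_links(self):
        for link in self.links:
            # `link` is a plain URL string, the type requests.get expects.
            self.visited.append(link)
        return self.visited


scraper = AppScraper("https://example.com/start")
scraper.find_links(["https://example.com/a", "https://example.com/b"])
print(scraper.process_links())
# prints ['https://example.com/a', 'https://example.com/b']
```

Because AppScraper is itself a Crawler, it never needs to receive a separate crawler instance; the only value handed to its constructor is the start URL.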

You have too much encapsulation in your code.

Consider the following code, which is shorter and cleaner than your non-working version and does not pass objects around needlessly:

from lxml import html
import requests

class App:
    def __init__(self, starturl):
        self.starturl = starturl
        self.links = []

    def get_links(self):
        page = requests.get(self.starturl)
        tree = html.fromstring(page.text)
        self.links = tree.xpath('//div[@class="lockup-info"]//*/a[@class="name"]/@href')

    def process_links(self):
        for link in self.links:
            self.get_docs(link)

    def get_docs(self, url):
        page = requests.get(url)
        tree = html.fromstring(page.text)
        name = tree.xpath('//h1[@itemprop="name"]/text()')[0]
        developer = tree.xpath('//div[@class="left"]/h2/text()')[0]
        price = tree.xpath('//div[@itemprop="price"]/text()')[0]
        print(name, developer, price)

if __name__ == '__main__':
    parse = App("https://itunes.apple.com/us/app/candy-crush-saga/id553834731?mt=8")
    parse.get_links()
    parse.process_links()

This outputs:

Cookie Jam By Jam City, Inc. Free
Zombie Tsunami By Mobigame Free
Flow Free By Big Duck Games LLC Free
Bejeweled Blitz By PopCap Free
Juice Jam By Jam City, Inc. Free
Candy Crush Soda Saga By King Free
Bubble Witch 3 Saga By King Free
Candy Crush Jelly Saga By King Free
Farm Heroes Saga By King Free
Pet Rescue Saga By King Free
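One caveat about the code above: each xpath(...) call returns a list, and indexing it with [0] raises IndexError whenever a page lacks the matching element. A small helper can guard against that. The sketch below is one possible approach, not part of the original answer; the function name first and its default parameter are hypothetical.

```python
def first(matches, default=None):
    """Return the first item of a list of matches, or `default` if it is empty."""
    if not matches:
        return default
    value = matches[0]
    # Text results are strings, so trim surrounding whitespace when possible.
    return value.strip() if isinstance(value, str) else value


print(first([" Cookie Jam "]))   # prints Cookie Jam
print(first([], "N/A"))          # prints N/A
```

Inside get_docs it could be used as name = first(tree.xpath('//h1[@itemprop="name"]/text()'), "N/A"), so that one missing field does not abort the whole loop.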
