Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I need to store intermediate data, so in the spider's parse method I create a variable that holds it:

text_from_pdf = pdf_to_text(response.body)

Now I need to access this variable in pipeline.py. How can I do that? I tried this:

def open_spider(self, spider):
    self.file = open('items.txt', 'w')

def close_spider(self, spider):
    self.file.close()

def process_item(self, item, spider):
    if spider.text_from_pdf:
        line = json.dumps(spider.text_from_pdf) + "\n"
        self.file.write(line)
        return item

But it doesn't work; it fails with an AttributeError.

question from:https://stackoverflow.com/questions/65884897/get-variables-from-spider-in-pipelines-py


1 Answer

Add the data to the item.

If the final item should not include it, you can use the pipeline to remove the data from the item.
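A minimal sketch of that idea, under some assumptions: the item is a plain dict, the field is named text_from_pdf after the variable in the question, and the "url" field, the parse_sketch function, and the PdfTextPipeline class name are hypothetical, used here only for illustration. The parse_sketch function stands in for the spider's parse method so the example runs without Scrapy installed.

```python
import json


def parse_sketch(url, text_from_pdf):
    """Stand-in for the spider's parse() method (hypothetical helper).

    In the real spider this would look roughly like:
        yield {"url": response.url,
               "text_from_pdf": pdf_to_text(response.body)}
    """
    # Carry the intermediate data on the item instead of in a local variable.
    yield {"url": url, "text_from_pdf": text_from_pdf}


class PdfTextPipeline:
    """Writes the intermediate text to a file, then strips it from the item."""

    def __init__(self, path="items.txt"):
        # The path argument is an assumption added to make the sketch testable.
        self.path = path

    def open_spider(self, spider):
        self.file = open(self.path, "w", encoding="utf-8")

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # pop() reads the field and removes it, so it is absent from the final item.
        text = item.pop("text_from_pdf", None)
        if text:
            self.file.write(json.dumps(text) + "\n")
        # Return the item unconditionally so later pipelines still receive it.
        return item
```

The likely reason the original attempt raises an AttributeError is that text_from_pdf is a local variable inside parse, so it never becomes an attribute of the spider object. Passing it on the item avoids shared state between the spider and the pipeline.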

