Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
I collect a URL from a Python command and then insert it into start_urls

from flask import Flask, jsonify, request
import scrapy
import subprocess

class ClassSpider(scrapy.Spider):
    name        = 'mySpider'
    #start_urls = []
    #pages      = 0
    news        = []

    def __init__(self, url, nbrPage):
        self.pages      = nbrPage
        self.start_urls = []
        self.start_urls.append(url)

    def parse(self, response):
        ...

    def run(self):
        subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.start_urls}', '-a', f'nbrPage={self.pages}'])
        return self.news

app = Flask(__name__)
data = []

@app.route('/', methods=['POST'])
def getNews():
    mySpiderClass = ClassSpider(request.json['url'], 2)
    return jsonify({'data': mySpiderClass.run()})

if __name__ == "__main__":
    app.run(debug=True)

I got this error:

    scrapy.exceptions.NotSupported: Unsupported URL scheme '': no handler available for that scheme

When I add a print('my urls List: ' + str(self.start_urls)), it prints a list of URLs like this: my urls List: ['www.googole.com']

Any help, please?


1 Answer

I guess this happens because you first append url to self.start_urls, and then you call ClassSpider's run method with the whole list self.start_urls. That appends the list into another list, so you end up with a nested list instead of a list of strings.
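The nesting matters because run() interpolates the entire list into the -a argument. A quick sketch of what that f-string actually produces:

```python
# Reproduce the argument string that run() builds when the whole
# start_urls list is interpolated instead of a single URL string.
start_urls = ['www.googole.com']
arg = f'url={start_urls}'
print(arg)  # url=['www.googole.com'] -- not a usable URL
```

Scrapy then tries to treat the string "['www.googole.com']" as a URL, which has no scheme at all.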
To avoid this you should maybe change your __init__ method like this:

    def __init__(self, url, nbrPage):
        self.pages      = nbrPage
        self.url        = url
        self.start_urls = []
        self.start_urls.append(url)

And then pass self.url instead of self.start_urls in run:

    def run(self):
        subprocess.check_output(['scrapy', 'crawl', 'mySpider', '-a', f'url={self.url}', '-a', f'nbrPage={self.pages}'])
        return self.news
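Separately, the printed list shows a URL without a scheme ('www.googole.com'), which by itself would trigger the same "Unsupported URL scheme ''" error. A minimal sketch of normalizing the url argument before appending it — ensure_scheme is a hypothetical helper, not part of the code above:

```python
from urllib.parse import urlparse

def ensure_scheme(url: str, default: str = 'http') -> str:
    """Prepend a scheme when the URL has none, so Scrapy can pick a handler."""
    if not urlparse(url).scheme:
        return f'{default}://{url}'
    return url

print(ensure_scheme('www.googole.com'))      # http://www.googole.com
print(ensure_scheme('https://example.com'))  # unchanged
```

You could call this in __init__ before self.start_urls.append(url) so every URL reaching Scrapy has an explicit scheme.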
