web crawler - How to give URL to scrapy for crawling?

Question


posted Oct 17, 2021 in Technique by 深蓝 (71.8m points)

I want to use Scrapy to crawl web pages. Is there a way to pass the start URL from the terminal?

The documentation says that either the name of the spider or a URL can be given, but when I give the URL it throws an error:

// The name of my spider is example, but I am giving a URL instead of the spider name (it works fine if I give the spider name).

scrapy crawl example.com

ERROR:

File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.14.1-py2.7.egg/scrapy/spidermanager.py", line 43, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: example.com'

How can I make Scrapy run my spider on the URL given in the terminal?

1 Reply

深蓝 · Answer 1 · 2021-10-17T00:10:11+0000

I'm not really sure about the command-line option. However, you could write your spider like this:

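from scrapy.spider import BaseSpider  # import path in Scrapy 0.14-era releases

class MySpider(BaseSpider):
    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # 'start_url' arrives as a keyword argument from: -a start_url=...
        self.start_urls = [kwargs.get('start_url')]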

And start it like: scrapy crawl my_spider -a start_url="http://some_url"
