I'm using subprocess to run several scrapy spiders in parallel like this:
from subprocess import Popen
import time

processes = [Popen(['scrapy', 'crawl', 'myspider', '-a', 'custom_argument={}'.format(argument)])
             for argument in custom_arguments]

while processes:
    for p in processes[:]:
        if p.poll() is not None:
            processes.remove(p)
    time.sleep(0.1)  # avoid a busy-wait while the spiders run
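(For context, the launch-and-reap pattern above can be exercised without Scrapy at all; this is a minimal, self-contained sketch that substitutes short-lived `sys.executable` subprocesses for the `scrapy crawl` commands:)

```python
import sys
import time
from subprocess import Popen

# Stand-ins for the scrapy commands: three short-lived Python processes.
custom_arguments = ['a', 'b', 'c']
processes = [Popen([sys.executable, '-c', 'import sys; sys.exit(0)'])
             for _ in custom_arguments]

# Reap each process as it finishes; poll() returns None while still running.
while processes:
    for p in processes[:]:
        if p.poll() is not None:
            processes.remove(p)
    time.sleep(0.05)  # brief sleep instead of a busy-wait
```

When the loop exits, every child process has terminated and been removed from the list.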
To pass custom arguments into each spider via subprocess, my spider starts like this:
import scrapy

class myspider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, custom_argument=None, *args, **kwargs):
        super(myspider, self).__init__(*args, **kwargs)
        ...

    def start_requests(self):
        ...
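(As I understand it, Scrapy forwards each `-a key=value` pair from the command line to the spider's `__init__` as a keyword argument. A plain-Python mimic of that flow, with a hypothetical `FakeSpider` class standing in for `scrapy.Spider`:)

```python
class FakeSpider:
    """Mimics how a spider receives -a arguments as __init__ kwargs."""
    def __init__(self, custom_argument=None, **kwargs):
        self.custom_argument = custom_argument
        # scrapy.Spider also stores any extra kwargs as attributes.
        self.__dict__.update(kwargs)

# 'scrapy crawl myspider -a custom_argument=42' becomes, roughly:
spider = FakeSpider(custom_argument='42')
print(spider.custom_argument)  # → 42
```

Note that arguments arrive as strings, since they come from the command line.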
This seems to work fine, except that the settings I chose in settings.py get overridden:
2021-01-06 16:57:16 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_ENABLED': 'True', 'AUTOTHROTTLE_START_DELAY': '0.5', 'BOT_NAME': 'openrent', 'COOKIES_ENABLED': False, 'NEWSPIDER_MODULE': 'openrent.spiders', 'SPIDER_MODULES': ['openrent.spiders'], 'USER_AGENT': 'Safari/537.36'}
How do I stop my original settings from being overridden like this?