Cannot restart a Scrapy spider to scrape a website on a regular schedule
My code:
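import os
import time

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def start_parce():
    process = CrawlerProcess(get_project_settings())
    process.crawl(SpiderCardSpider)  # SpiderCardSpider: the project's spider class
    settings = process.settings
    settings.set("FEEDS", {
        "items.json": {"format": "json"},
    })
    process.settings = settings
    process.start()  # blocks until the crawl finishes; raises ReactorNotRestartable when called again
    process.stop()


def start_parce_time():
    while True:
        start_parce()
        time.sleep(60)
        os.remove('results.json')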
I get the following error:
2023-11-24 18:24:03 [scrapy.core.engine] INFO: Spider opened
2023-11-24 18:24:03 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-11-24 18:24:03 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
Traceback (most recent call last):
  File "D:\fa-treid\card_parc\card_parc\spiders\spider_card.py", line 59, in <module>
    start_parce_time()
  File "D:\fa-treid\card_parc\card_parc\spiders\spider_card.py", line 52, in start_parce_time
    start_parce()
  File "D:\fa-treid\card_parc\card_parc\spiders\spider_card.py", line 44, in start_parce
    process.start()
  File "D:\fa-treid\venv\Lib\site-packages\scrapy\crawler.py", line 427, in start
    reactor.run(installSignalHandlers=False) # blocking call
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\fa-treid\venv\Lib\site-packages\twisted\internet\asyncioreactor.py", line 254, in run
    self.startRunning(installSignalHandlers=installSignalHandlers)
  File "D:\fa-treid\venv\Lib\site-packages\twisted\internet\base.py", line 1299, in startRunning
    ReactorBase.startRunning(cast(ReactorBase, self))
  File "D:\fa-treid\venv\Lib\site-packages\twisted\internet\base.py", line 843, in startRunning
    raise error.ReactorNotRestartable()
twisted.internet.error.ReactorNotRestartable
Process finished with exit code 1
Answers (1):
Answer by: Danil Pet
I solved the problem by calling the command you would normally run in the terminal:
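import subprocess


def run_script_per():
    # Change into the project directory, then run the spider, overwriting results.json
    command = 'cd .. && cd .. && cd card_parc && scrapy crawl spider_card -O results.json'
    subprocess.run(command, shell=True)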
That said, I don't think this is a good solution.
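For context: Twisted's reactor cannot be started a second time within the same Python process, which is what ReactorNotRestartable reports, so launching each crawl in a fresh child process does sidestep the error. Below is a minimal sketch of a somewhat tidier variant of the same idea. It passes the command as an argument list with cwd instead of a shell string with chained cd calls, and puts the 60-second loop around it. The names build_crawl_command and run_periodically, and the max_runs, runner and sleep parameters, are hypothetical helpers introduced only for illustration; the sketch also assumes the scrapy executable is on PATH (for example, the virtual environment is activated).

```python
import subprocess
import time


def build_crawl_command(spider_name, output_file):
    """Return the argument list for `scrapy crawl <spider> -O <file>`."""
    return ["scrapy", "crawl", spider_name, "-O", output_file]


def run_periodically(project_dir, spider_name="spider_card",
                     output_file="results.json", interval=60,
                     max_runs=None, runner=subprocess.run, sleep=time.sleep):
    """Run the spider in a fresh child process every `interval` seconds.

    Each run gets its own process, so the Twisted reactor is never
    restarted. `max_runs`, `runner` and `sleep` are hypothetical knobs
    that exist only to make the loop easy to bound and to try out.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # cwd replaces the chain of `cd` commands; no shell is involved.
        runner(build_crawl_command(spider_name, output_file),
               cwd=project_dir, check=False)
        runs += 1
        # Wait only if another run is going to follow.
        if max_runs is None or runs < max_runs:
            sleep(interval)
    return runs
```

Calling run_periodically with the path to the directory that contains scrapy.cfg would then crawl once a minute until interrupted. Since -O overwrites the output file on every run, no separate os.remove call is needed.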