Scrapy Celery

Question (0 votes, 1 answer)

How can I run the spiders in my task sequentially (one after another), with a 2-minute interval between them?

signals.py

from celery import group
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import ParseCategoryUrl  # assuming the model lives in models.py
from .tasks import run_spider_to_pars_ads_list  # assuming the task lives in tasks.py


@receiver(post_save, sender=ParseCategoryUrl)
def start_parse_from_category_url(sender, instance, created, **kwargs):
    """
    Signal receiver that queues the run_spider_to_pars_ads_list task
    whenever a ParseCategoryUrl instance with a URL is saved.
    """
    if instance.url:
        group(
            run_spider_to_pars_ads_list.s(url=instance.url, user=instance.user.id),
        ).apply_async()

tasks.py

from typing import Optional

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor

process = CrawlerRunner(get_project_settings())  # join() below returning a Deferred implies a CrawlerRunner


@app.task(name="run_spider_to_pars_ads_list")
def run_spider_to_pars_ads_list(
    url: str, user: int, pages: Optional[int] = None
) -> None:
    process.crawl(ListitemsSpider, url=url, user=user, pages=pages)
    d = process.join()
    d.addBoth(lambda _: reactor.stop())
    reactor.run()
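As an aside, Twisted's reactor cannot be restarted within a single process, so the task above will raise ReactorNotRestartable the second time a worker runs it. Below is a minimal reactor-safe sketch, not part of the original post: the _crawl helper is hypothetical, settings are assumed loadable via get_project_settings, and with Celery's default prefork pool you may need billiard.Process instead of multiprocessing.Process:

from multiprocessing import Process
from typing import Optional

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def _crawl(url: str, user: int, pages: Optional[int]) -> None:
    # CrawlerProcess owns its own reactor and blocks until the crawl ends.
    crawler = CrawlerProcess(get_project_settings())
    crawler.crawl(ListitemsSpider, url=url, user=user, pages=pages)
    crawler.start()


@app.task(name="run_spider_to_pars_ads_list")
def run_spider_to_pars_ads_list(
    url: str, user: int, pages: Optional[int] = None
) -> None:
    # Each crawl gets a fresh child process, and with it a fresh reactor.
    p = Process(target=_crawl, args=(url, user, pages))
    p.start()
    p.join()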
Tags: python, scrapy, celery, python-asyncio, django-celery
1 Answer (0 votes)

Use chain from Celery. Don't forget to add the delay between the spider runs: you need a separate task that does nothing but sleep for the required amount of time:

from celery import chain
import time


@app.task(name="delay_task")
def delay_task():
    # Adjust the delay as needed; 120 seconds = 2 minutes.
    time.sleep(120)


# In the signal receiver, replace group() with a chain. Use .si()
# (immutable signatures) so each task ignores the previous task's
# return value; with .s(), delay_task would receive the first
# task's None result as an argument and fail.
if instance.url:
    chain(
        run_spider_to_pars_ads_list.si(url=instance.url, user=instance.user.id),
        delay_task.si(),
        run_spider_to_pars_ads_list.si(url=instance.url, user=instance.user.id),
    ).apply_async()
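Blocking a worker with time.sleep() ties up that worker slot for the full two minutes. A lighter sketch of the same chain, assuming only Celery's standard Signature.set(countdown=...) option, drops the sleep task and lets the broker delay the second run instead:

from celery import chain

if instance.url:
    chain(
        run_spider_to_pars_ads_list.si(url=instance.url, user=instance.user.id),
        # countdown=120 delays execution for two minutes after this
        # signature is queued, i.e. right after the first task finishes.
        run_spider_to_pars_ads_list.si(
            url=instance.url, user=instance.user.id
        ).set(countdown=120),
    ).apply_async()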