How to get the number of requests in the queue in Python Scrapy?

Question

In the code below,

```
len(self.crawler.engine.slot.scheduler)
```
always returns 0,

and

self.crawler.engine.slot.scheduler.stats._stats['scheduler/enqueued']

returns values in ascending order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

I expected the queue size to start high and decrease as URLs are crawled: high before crawling and low after.

In addition, uncommenting this code shows a similar increasing trend in the queue size.

if next_page is not None:
    next_page = response.urljoin(next_page)
    yield scrapy.Request(next_page, callback=self.parse)

Note: I have set the following in the settings:

CONCURRENT_REQUESTS = 1

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes_spider"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
        "https://quotes.toscrape.com/page/2/",
        "https://quotes.toscrape.com/page/3/",
        "https://quotes.toscrape.com/page/4/",
        "https://quotes.toscrape.com/page/5/",
        "https://quotes.toscrape.com/page/6/",
        "https://quotes.toscrape.com/page/7/",
        "https://quotes.toscrape.com/page/8/",
        "https://quotes.toscrape.com/page/9/",
        "https://quotes.toscrape.com/page/10/",
    ]

    def parse(self, response):
        
        # Cumulative number of requests enqueued so far, then the current scheduler queue length
        print(f"\n before {self.crawler.engine.slot.scheduler.stats._stats['scheduler/enqueued']} \n\n")
        print(f"\n before2 {len(self.crawler.engine.slot.scheduler)}")  # don't know why it always returns zero
                
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }
        # Follow the pagination link once per page, after all quotes have been yielded
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
        print(f"\n After {self.crawler.engine.slot.scheduler.stats._stats['scheduler/enqueued']} \n\n")
        print(f"\n after2 {len(self.crawler.engine.slot.scheduler)}")  # don't know why it always returns zero

This is the original question (I could not comment there because of my low reputation): How to get the number of requests in scrapy in scrapy?
The Scrapy code is copied from: https://docs.scrapy.org/en/latest/intro/tutorial.html

Answer 1

self.crawler.engine.slot.scheduler.stats._stats['scheduler/enqueued']
returns values in ascending order: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

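This is the expected behavior for a stat that counts the total number of requests scheduled so far. In general, stats are running totals and do not reflect the current state of something.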
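Because each stat is a running total, one rough way to estimate how many requests are currently waiting is to subtract the number dequeued from the number enqueued. The following is only a minimal sketch under assumptions: it assumes the scheduler also records a `scheduler/dequeued` counter next to `scheduler/enqueued`, and that the stats object offers a `get_value(key, default)` method. `FakeStats` and `pending_requests` are hypothetical names introduced here for illustration.

```python
class FakeStats:
    """Hypothetical stand-in for a stats collector; only get_value is modeled."""

    def __init__(self, values):
        self._values = dict(values)

    def get_value(self, key, default=None):
        return self._values.get(key, default)


def pending_requests(stats):
    """Estimate queued requests as enqueued minus dequeued (both running totals)."""
    # Assumed stat keys; check them against the stats dump of your own crawl.
    enqueued = stats.get_value("scheduler/enqueued", 0)
    dequeued = stats.get_value("scheduler/dequeued", 0)
    return enqueued - dequeued


stats = FakeStats({"scheduler/enqueued": 10, "scheduler/dequeued": 7})
print(pending_requests(stats))  # 3
```

Inside a spider callback, the same helper could presumably be called as `pending_requests(self.crawler.stats)`, which also avoids reaching into the private `_stats` attribute.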

len(self.crawler.engine.slot.scheduler)

always returns 0

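This means that the scheduler queue is empty at those points, which makes sense for a spider that downloads pages faster than new requests are scheduled.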
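The effect can be illustrated with a toy model. This is not Scrapy's actual engine loop, only a simplified sketch that assumes start requests are fed to the scheduler one at a time and each one is handed to the downloader before the next is queued. `ToyScheduler` and `crawl` are hypothetical names.

```python
from collections import deque


class ToyScheduler:
    """Hypothetical model of a scheduler: a queue plus a cumulative counter."""

    def __init__(self):
        self.queue = deque()
        self.enqueued_total = 0  # only ever grows, like a stat

    def enqueue(self, url):
        self.queue.append(url)
        self.enqueued_total += 1

    def next_request(self):
        return self.queue.popleft()

    def __len__(self):
        return len(self.queue)


def crawl(urls):
    """Queue each URL, dequeue it right away, then record what a callback would see."""
    scheduler = ToyScheduler()
    observations = []
    for url in urls:
        scheduler.enqueue(url)    # the request is scheduled...
        scheduler.next_request()  # ...and immediately handed to the downloader
        # This is the point where a parse callback would run.
        observations.append((scheduler.enqueued_total, len(scheduler)))
    return observations


print(crawl([f"page/{i}" for i in range(1, 4)]))
# [(1, 0), (2, 0), (3, 0)]
```

The first number in each pair keeps growing while the second stays at zero, which matches the pattern reported in the question.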
