In bookspider.py I have:
from typing import Iterable

import scrapy
from scrapy.http import Request


class BookSpider(scrapy.Spider):
    name = None

    def start_requests(self) -> Iterable[Request]:
        yield scrapy.Request("https://books.toscrape.com/")

    def parse(self, response):
        books = response.css("article.product_pod")
        for book in books:
            yield {
                "name": self.name,
                "title": book.css("h3 a::text").get().strip(),
            }
In test_bookspider.py I have:
import json
import os

from pytest_twisted import inlineCallbacks
from scrapy.crawler import CrawlerRunner
from twisted.internet import defer

from bookspider import BookSpider


@inlineCallbacks
def test_bookspider():
    runner = CrawlerRunner(
        settings={
            "REQUEST_FINGERPRINTER_IMPLEMENTATION": "2.7",
            "FEEDS": {"books.json": {"format": "json"}},
            "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
            # "TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor",
        }
    )
    yield runner.crawl(BookSpider, name="books")
    with open("books.json", "r") as f:
        books = json.load(f)
    assert len(books) >= 1
    assert books[0]["name"] == "books"
    assert books[0]["title"] == "A Light in the ..."
    os.remove("books.json")
    defer.returnValue(None)
With
"TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
uncommented, I get the following error:
Exception: The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
With
"TWISTED_REACTOR": "twisted.internet.selectreactor.SelectReactor"
uncommented instead, my test passes.
Can anyone explain this behavior and, more broadly, how to test a CrawlerRunner or CrawlerProcess with pytest?
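If you use pytest-twisted, you need to tell it to install the appropriate reactor by passing --reactor=asyncio to the pytest command; otherwise it installs the default reactor. The reactor is installed before your test runs, so when Scrapy then sees a TWISTED_REACTOR setting that names a different reactor, it raises the mismatch error shown in the question. See https://github.com/pytest-dev/pytest-twisted#using-the-plugin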
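
To make the mismatch concrete, here is a minimal, stdlib-only sketch of the kind of check that produces this error. It is not Scrapy's actual code: the `SelectReactor` class below is a hypothetical stand-in for the real reactor, and `dotted_path` and `verify_reactor` are names I made up for illustration. The only thing taken from the question is the wording of the error message.

```python
class SelectReactor:
    """Hypothetical stand-in for whichever reactor was installed first."""


def dotted_path(obj) -> str:
    """Return the 'module.ClassName' path of an object's class."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def verify_reactor(installed, requested_path: str) -> None:
    """Raise if the installed reactor is not the one the settings ask for.

    Assumption: this mirrors the idea behind the error in the question,
    not the exact implementation inside Scrapy.
    """
    installed_path = dotted_path(installed)
    if installed_path != requested_path:
        raise RuntimeError(
            f"The installed reactor ({installed_path}) does not match "
            f"the requested one ({requested_path})"
        )


# A reactor can only be installed once per process, so whoever installs
# it first (here: the test plugin) decides which one is in use.
installed = SelectReactor()

# Requesting the same reactor passes silently.
verify_reactor(installed, dotted_path(installed))

# Requesting a different one fails, as in the question.
try:
    verify_reactor(
        installed,
        "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    )
except RuntimeError as exc:
    print(exc)
```

So there are two ways to make the setting and the installed reactor agree: run the tests as `pytest --reactor=asyncio test_bookspider.py` and keep the asyncio TWISTED_REACTOR setting, or keep the default reactor and the SelectReactor setting, which is why your second variant passes. If you don't want to type the flag every time, putting it in pytest's `addopts` configuration option should have the same effect.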