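I am working through the example at https://www.accordbox.com/blog/how-crawl-infinite-scrolling-pages-using-python/ and I am confused by the yield statements in its Scrapy solution code. There are three places where a Request is yielded. Sometimes the Request is only generated, sometimes it is generated and executed, and sometimes it is only executed. Could you explain the difference between them?
Thanks!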
import logging
from urllib import parse

from scrapy import Request

# The methods below belong to the spider class (its definition is not shown here).

def parse_list_page(self, response):
    next_link = response.xpath(
        "//a[@class='page-link next-page']/@href").extract_first()
    if next_link:
        url = response.url
        next_link = url[:url.find('?')] + next_link
        ################################
        # Generate and Execute Request
        ################################
        yield Request(
            url=next_link,
            callback=self.parse_list_page
        )

    for req in self.extract_product(response):
        ################################
        # Just Execute Request
        ################################
        yield req

def extract_product(self, response):
    links = response.xpath("//div[@class='col-lg-8']//div[@class='card']/a/@href").extract()
    for url in links:
        result = parse.urlparse(response.url)
        base_url = parse.urlunparse(
            (result.scheme, result.netloc, "", "", "", "")
        )
        url = parse.urljoin(base_url, url)
        ################################
        # Just Generate Request
        ################################
        yield Request(
            url=url,
            callback=self.parse_product_page
        )

def parse_product_page(self, response):
    logging.info("processing " + response.url)
    yield None
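
To narrow down the question, here is a minimal sketch of the same three yields with Scrapy stripped away. It is not from the tutorial. `Request` is a hypothetical namedtuple stand-in for `scrapy.Request`, the callbacks are plain strings, and `collect` is a hypothetical helper that plays the role of whatever consumes the generator (in Scrapy, the engine, which iterates over what a callback yields and schedules the requests). In this sketch none of the yields executes anything: each one hands a Request object to the caller, and `yield req` only passes along an object that the helper generator already built.

```python
from collections import namedtuple

# Hypothetical stand-in for scrapy.Request so the sketch runs without Scrapy.
Request = namedtuple("Request", ["url", "callback"])


def extract_product(links):
    # Builds one Request per link and yields it to whoever iterates this generator.
    for url in links:
        yield Request(url=url, callback="parse_product_page")


def parse_list_page(next_link, links):
    if next_link:
        # Builds a Request and yields it directly to the caller.
        yield Request(url=next_link, callback="parse_list_page")
    # Re-yields the Requests built by the helper generator;
    # `yield from extract_product(links)` would be equivalent.
    for req in extract_product(links):
        yield req


def collect(next_link, links):
    # Hypothetical consumer: drains the generator into a list.
    return list(parse_list_page(next_link, links))


requests = collect("/page/2", ["/p/1", "/p/2"])
for r in requests:
    print(r.url, "->", r.callback)
```

If this reading is right, all three yields end up in the same place (the consumer of `parse_list_page`), and the comments in the tutorial describe where the Request object is constructed rather than three different kinds of execution.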