Using pika, I fetch URLs from RabbitMQ and try to create new requests for a Scrapy spider. When I start my spider with scrapy crawl spider, the spider does not close (because of raise DontCloseSpider()), but it also never creates any requests. My custom extension:
import pika
from scrapy import signals
from scrapy.http import Request
from scrapy.exceptions import DontCloseSpider


class AddRequestExample:

    def __init__(self, stats):
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        s = cls(crawler)
        crawler.signals.connect(s.spider_idle, signal=signals.spider_idle)
        return s

    def spider_idle(self, spider):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        try:
            url = channel.basic_get(queue='hello')[2]
            url = url.decode()
            crawler.engine.crawl(Request(url), self)
        except Exception:
            pass
        raise DontCloseSpider()
My spider:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "spider"

    def parse(self, response):
        yield {
            'url': response.url
        }
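For a self-contained test, the queue can be seeded with a plain pika producer. A minimal sketch, assuming a default RabbitMQ on localhost; the queue name 'hello' comes from the code above, while the URL is an arbitrary placeholder:

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
channel = connection.channel()
channel.queue_declare(queue='hello')  # idempotent: creates the queue if it does not exist
channel.basic_publish(exchange='', routing_key='hello', body='http://quotes.toscrape.com')
connection.close()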
It seems you are trying to copy the approach from this answer. In that case you need to define a callback for the request. Since you process the spider_idle signal from an extension (not from the spider itself), the callback should be the spider.parse method:
def spider_idle(self, spider):
    ....
    try:
        url = channel.basic_get(queue='hello')[2]
        url = url.decode()
        spider.crawler.engine.crawl(Request(url=url, callback=spider.parse), spider)
    except Exception:
        ....
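Putting the pieces together, the extension could look like the sketch below. This is only one way to wire it up: it assumes the older two-argument engine.crawl(request, spider) signature (recent Scrapy versions take only the request), and it drops the unused stats argument from the original:

import pika
from scrapy import signals
from scrapy.http import Request
from scrapy.exceptions import DontCloseSpider


class AddRequestExample:

    @classmethod
    def from_crawler(cls, crawler):
        s = cls()
        crawler.signals.connect(s.spider_idle, signal=signals.spider_idle)
        return s

    def spider_idle(self, spider):
        connection = pika.BlockingConnection(pika.ConnectionParameters(host='localhost'))
        channel = connection.channel()
        try:
            # basic_get returns (method, properties, body); body is None when
            # the queue is empty, so .decode() raises and we fall through.
            body = channel.basic_get(queue='hello')[2]
            url = body.decode()
            # Schedule the request on the running engine, with the spider's
            # parse method as the callback.
            spider.crawler.engine.crawl(Request(url=url, callback=spider.parse), spider)
        except Exception:
            pass
        finally:
            connection.close()
        # Keep the spider alive so the next idle event polls the queue again.
        raise DontCloseSpider()

Each time the spider goes idle it pulls one URL from the queue and schedules it; if the queue is empty, the exception handler swallows the error and the spider simply stays idle until the next check. The extension still has to be enabled through the EXTENSIONS setting for from_crawler to run.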