Getting correct output from multiple parse defs in a Scrapy class


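I extract data from one page, then iterate over the URLs on that page and fetch more information from each linked page. But the output is wrong (see the screenshot). The items from the second 'def' end up further down in the output, and their order does not match the items from the first 'def'! The code structure is below. Thanks!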

***

def parse(self, response):
    rows = ***

    for row in rows:
        item = Items()
        item['number'] = ***
        item['name'] = ***
        ***
        yield item

        urls = ***

        for url in urls.extract():
            yield Request(urlparse.urljoin(response.url, url), callback=self.parse_player)

def parse_player(self, response):
    item = Items()
    item['mainposition'] = ***
    item['altposition'] = ***
    yield item

The result is in this screenshot: https://snag.gy/tCaDm3.jpg

web-scraping scrapy
1 answer

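I think you should collect the name and the other fields on the first page, but not yield the item there. Instead, pass the partially filled item to the next page through the request's meta, and yield the complete item only in the second callback. Scrapy handles requests asynchronously, so items yielded separately from two callbacks cannot be matched up by their position in the output. Like this: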

def parse(self, response):
    rows = ***
    for row in rows:
        item = Items()
        item['number'] = ***
        item['name'] = ***
        # don't yield item here!

        urls = ***
        for url in urls.extract():
            yield Request(response.urljoin(url), self.parse_player, meta={'item': item})

def parse_player(self, response):
    item = response.meta['item']
    item['mainposition'] = ***
    item['altposition'] = ***
    yield item
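
To see why carrying the item in meta keeps each row intact even when the detail pages come back in a different order, here is a minimal standard-library sketch. It does not use Scrapy: `FakeRequest`, `FakeResponse`, `crawl`, and the `LISTING` / `DETAILS` data are all hypothetical stand-ins made up for illustration, and the last-in-first-out scheduling is only an assumption used to mimic out-of-order responses.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a request: URL, callback, and a meta dict."""
    url: str
    callback: Callable
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response; it carries the request's meta."""
    url: str
    meta: dict


# Made-up data standing in for the listing page and the player pages.
LISTING = [("7", "Alice", "/player/7"), ("9", "Bob", "/player/9")]
DETAILS = {"/player/7": ("FW", "MF"), "/player/9": ("GK", "DF")}


def parse(response):
    for number, name, url in LISTING:
        # Build a partial item, but do not yield it yet.
        item = {"number": number, "name": name}
        # Attach the partial item to the follow-up request.
        yield FakeRequest(url, parse_player, meta={"item": item})


def parse_player(response):
    # Pick up the partial item that travelled with the request.
    item = response.meta["item"]
    item["mainposition"], item["altposition"] = DETAILS[response.url]
    yield item  # the item is complete only now


def crawl(start_url):
    """Run callbacks, taking pending requests last-in-first-out so that
    detail pages are handled in a different order than they were requested."""
    pending = [FakeRequest(start_url, parse)]
    items = []
    while pending:
        request = pending.pop()
        response = FakeResponse(request.url, request.meta)
        for result in request.callback(response):
            if isinstance(result, FakeRequest):
                pending.append(result)
            else:
                items.append(result)
    return items


items = crawl("/listing")
for item in items:
    print(item)
```

The rows come out in a different order than on the listing page, but every row holds all four fields for the same player, because the fields are merged per request rather than by output position. If the output order matters, sort the exported items afterwards, for example by `number`.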