Using a yield statement to return output when a search query finds no results with Scrapy (Python)


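I am following a tutorial that uses the Scrapy library to scrape multiple pages from a website. The tutorial uses yield statements with CSS selectors and XPath selectors to pull information out of the page's HTML and CSS structure. I decided to use an if statement to check whether the search query found any results, and an else statement to specify what should happen when it found none. The problem appears in the else branch: there the code extracts the company name, and for the Location and Sales fields I want a custom output string that says 'Not Found'.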

When I run the script, I get the following error:

File "C:\Users\....\hoover-scraper\scraper.py", line 28

'Location': 'Not Found'
         ^

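I assume this is not the correct way to use a yield statement, and that this is why I get the SyntaxError. So I would like to know whether there is any way to output the string 'Not found' for the Sales and Location fields when the query comes back empty.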

This is the relevant part of the code:

def parse(self, response):
    NAME_SELECTOR ="td a::text"
    LOCATION_SELECTOR ='.//tr/td/text()' #using xpath to grab information for Location and Sales
    SALES_SELECTOR = './/tr/td/text()' 

    if response.css(NAME_SELECTOR).extract_first(): #Checks to see if the company name field has data if not prints 'No results found'
        yield {

            'Company Name': response.css(NAME_SELECTOR).extract_first(),
            'Location' : response.xpath(LOCATION_SELECTOR)[0].extract(), #Location comes first inside the td tags thus the [0]
            'Sales' : response.xpath(SALES_SELECTOR)[1].extract(),
        }

    else:
        yield {
            'Company Name': response.css("dd.value.term::text").extract_first() #identifies company name which data was not found
            'Location': 'Not Found'
            'Sales': 'Not Found'
        }
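To check whether the SyntaxError really comes from yield, one option is to ask Python's own parser. The sketch below feeds the standard-library function ast.parse two versions of the else branch from the question: the original, and a copy with a comma after each dictionary entry. Both still use yield. The wrapper name is_valid_python is a hypothetical helper introduced only for this illustration. Because ast.parse only parses the source and never runs it, the undefined name response does no harm here.

```python
import ast

# The else branch from the question, unchanged: no commas between entries
broken = """
def parse(self, response):
    yield {
        'Company Name': response.css("dd.value.term::text").extract_first()
        'Location': 'Not Found'
        'Sales': 'Not Found'
    }
"""

# Same code, still using yield, but with a comma after every entry
fixed = """
def parse(self, response):
    yield {
        'Company Name': response.css("dd.value.term::text").extract_first(),
        'Location': 'Not Found',
        'Sales': 'Not Found',
    }
"""

def is_valid_python(source):
    """Return True if `source` parses without a SyntaxError (it is not executed)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True

print(is_valid_python(broken))
print(is_valid_python(fixed))
```

The first call reports False and the second True. The only difference between the two versions is the commas, so yield itself is not what triggers the error.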
1 Answer

yield is only used in generator functions. Do you just want to return the value from the method? If so, replace yield with return in both places.
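The difference between the two keywords shows up in what the call hands back. The minimal sketch below uses two hypothetical functions, item_with_yield and item_with_return, that build the same 'Not Found' dictionary. Any function whose body contains yield becomes a generator function, so calling it returns a generator, and the dictionary only appears when the generator is iterated. A function that uses return hands back the dictionary directly. Scrapy's documentation states that a callback may return an item, a Request, or an iterable of these. If that holds for your Scrapy version, either form should work inside parse.

```python
import inspect

def item_with_yield(name):
    # The body contains `yield`, so calling this returns a generator object
    yield {'Company Name': name, 'Location': 'Not Found', 'Sales': 'Not Found'}

def item_with_return(name):
    # A plain function: calling this returns the dict itself
    return {'Company Name': name, 'Location': 'Not Found', 'Sales': 'Not Found'}

gen = item_with_yield('ACME')
print(inspect.isgenerator(gen))           # the call produced a generator
print(next(gen)['Location'])              # iterating it produces the dict
print(item_with_return('ACME')['Sales'])  # the dict is available immediately
```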

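If you need the value later in the same method, assign the dictionary to a variable instead. Note that the SyntaxError itself is caused by the missing commas between the entries of the dictionary in the else branch; the version below adds them. For example: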

    if response.css(NAME_SELECTOR).extract_first(): #Checks to see if the company name field has data if not prints 'No results found'
        result = {

            'Company Name': response.css(NAME_SELECTOR).extract_first(),
            'Location' : response.xpath(LOCATION_SELECTOR)[0].extract(), #Location comes first inside the td tags thus the [0]
            'Sales' : response.xpath(SALES_SELECTOR)[1].extract(),
        }

    else:
        result = {
            'Company Name': response.css("dd.value.term::text").extract_first(), #identifies company name which data was not found
            'Location': 'Not Found',
            'Sales': 'Not Found'
        }
    # do something with result
    ...
    # or just:
    return result
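In the question's code, indexing with response.xpath(...)[0] or [1] raises an IndexError when the selector matches fewer cells than expected, because a Scrapy SelectorList behaves like a Python list. The sketch below isolates the fallback logic in plain Python so that every missing field becomes 'Not Found'. NOT_FOUND, pick and build_item are hypothetical helpers, and the sample values are invented. The sketch assumes that, inside parse, you would pass in the text lists from response.css(NAME_SELECTOR).getall() and response.xpath(LOCATION_SELECTOR).getall(). For a single value, Scrapy selectors also accept .get(default='Not Found') and extract_first(default='Not Found').

```python
NOT_FOUND = 'Not Found'

def pick(values, index, default=NOT_FOUND):
    """Return values[index] if that position exists, otherwise `default`."""
    return values[index] if 0 <= index < len(values) else default

def build_item(company_names, cells, fallback_name=None):
    """Build one result dict, filling any missing field with 'Not Found'.

    company_names: texts matched by the company-name selector (may be empty)
    cells: texts from the <td> cells, Location first and Sales second
    fallback_name: name to report when the search returned no results
    """
    if company_names:
        return {
            'Company Name': company_names[0],
            'Location': pick(cells, 0),
            'Sales': pick(cells, 1),
        }
    return {
        'Company Name': fallback_name,
        'Location': NOT_FOUND,
        'Sales': NOT_FOUND,
    }

print(build_item(['ACME Corp'], ['Austin, TX', '$5M']))
print(build_item([], [], fallback_name='Nonexistent Inc'))
```

Keeping this logic in a small pure function means the item can be built identically whether parse uses yield or return, and it can be checked without running a crawl.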