Scrapy response returns a None value


I want to scrape the location coordinates from this page, under the element with id="hotel_address".

import scrapy
from scrapy.crawler import CrawlerProcess

class CrawlerSpider(scrapy.Spider):
    name = 'crawler'
    headers = {'User-Agent':
        'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36'}

    start_urls = ['https://www.booking.com/hotel/it/heart-of-san-lorenzo-roma12345678910111213141516171819202122.it.html?group_adults=1;no_rooms=1;']

    def start_requests(self):
        for link in self.start_urls:
            yield scrapy.Request(url=link, headers=self.headers, callback=self.parse)

    def parse(self, response):
        print('===========================================================')
        coordinate = response.xpath('//*[@id="hotel_address"]/@data-atlas-latlng/text()').get()
        print(coordinate)
        print('===========================================================')

process = CrawlerProcess()
process.crawl(CrawlerSpider)
process.start()

But it returns None. What is my mistake?

python web-scraping scrapy
1 Answer

The reason is that you used text(). The data-atlas-latlng attribute holds the coordinates as its value, not as a child text node, and attribute nodes have no text() children, so the expression matches nothing. To fix this, select the attribute directly with @data-atlas-latlng at the end of your XPath expression.

Here is the updated code:

import scrapy
from scrapy.crawler import CrawlerProcess

class CrawlerSpider(scrapy.Spider):
    name = 'crawler'
    headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Mobile Safari/537.36'} 

    start_urls = ['https://www.booking.com/hotel/it/heart-of-san-lorenzo-roma12345678910111213141516171819202122.it.html?group_adults=1;no_rooms=1;']

    def start_requests(self):
        for link in self.start_urls:
            yield scrapy.Request(url=link, headers=self.headers, callback=self.parse)

    def parse(self, response):
        print('===========================================================')
        # Extract the coordinates from the data-atlas-latlng attribute
        coordinate = response.xpath('//*[@id="hotel_address"]/@data-atlas-latlng').get()
        print(f"Coordinates: {coordinate}")  # Prints the coordinates
        print('===========================================================')

process = CrawlerProcess()
process.crawl(CrawlerSpider)
process.start()
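Once extracted, the attribute value is a single comma-separated string. Splitting it into numeric latitude and longitude (the sample value below is hypothetical, shaped like the attribute's contents) might look like:

```python
# Hypothetical value as extracted from the data-atlas-latlng attribute
coordinate = "41.9009,12.5204"

# Split on the comma and convert each half to a float
lat, lng = (float(part) for part in coordinate.split(","))
print(lat, lng)  # 41.9009 12.5204
```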
