I downloaded the source code of a page from Indeed and I'm trying to extract all the job titles from it. I'm using this XPath:
response.xpath('//*[@class=" row result"]//*[@class="jobtitle"]//text()').extract()
The problem is that I don't get one string per title. Instead I get this result:
[u'\n ',
u'Data',
u' ',
u'Scientist',
u' Experto SQL con conocimiento en R',
u'\n ',
u'\n ',
u'Data',
u' Analytic con Python',
u'\n ',
u'\n ',
u'Data',
u' Analytic con R',
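This makes it hard to map the titles to the rest of the data. What I want is to select and process the jobs one at a time, similar to extract_first()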
response.xpath('//*[@class=" row result"]').extract_first()
but for any given index, while still being able to keep processing the data. I tried this:
from scrapy.http import TextResponse

current_job = response.xpath('//*[@class=" row result"]').extract_first()
current_job = TextResponse(url='', body=current_job, encoding='utf-8')
But it only works for the first result, and it doesn't look like a Pythonic approach to me.
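First I select only the <a> elements (without text() and extract()). Then I loop over them with for, apply text() and extract() to each <a> separately, and join() the fragments into a single string containing the title.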
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    start_urls = ['https://www.indeed.cl/trabajo?q=Data%20scientist&l=']

    def parse(self, response):
        print('url:', response.url)
        # select the <a> elements themselves, not their text nodes
        results = response.xpath('//h2[@class="jobtitle"]/a')
        print('number:', len(results))
        for item in results:
            # collect every text fragment inside this <a> and glue them together
            title = ''.join(item.xpath('.//text()').extract())
            print('title:', title)


# --- it runs without a Scrapy project ---
from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',
})
c.crawl(MySpider)
c.start()
Result:
number: 10
title: Data Scientist
title: CONSULTOR DATA SCIENCE SANTIAGO DE CHILE
title: Líder Análisis de Datos MCoE Minerals Americas
title: Ingeniero Inteligencia Mercado, BI
title: Ingeniero Inteligencia de Mercado, Business Intelligence
title: Data Scientist
title: Data Scientist
title: Data Scientist (Machine Learning)
title: Data Scientist / Ml Scientist
title: Young Professional - Spanish LatAm
Give this a try. You'll need to adapt my script slightly to fit your project, but it should solve the problem you described above.
import requests
from scrapy import Selector

res = requests.get("https://www.indeed.cl/trabajo?q=Data%20scientist")
# build the selector from the HTML text of the requests response
sel = Selector(text=res.text)
for item in sel.css("h2.jobtitle a"):
    title = ' '.join(item.css("::text").extract())
    print(title)
Output:
Data Scientist
CONSULTOR DATA SCIENCE SANTIAGO DE CHILE
Líder Análisis de Datos MCoE Minerals Americas
Ingeniero Inteligencia Mercado, BI
Ingeniero Inteligencia de Mercado, Business Intelligence
Data Scientist
Data Scientist
Young Professional - Spanish LatAm
Data Scientist (Machine Learning)
Data Scientist / Ml Scientist
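The two answers glue the text fragments differently: ''.join keeps only the spaces already present in the text nodes, while ' '.join adds another space between every pair, which produces doubled spaces when whitespace-only nodes such as ' ' appear (as in the question's output). A small stdlib-only sketch using fragments copied from the question; clean_title is a hypothetical helper name, not part of either answer.

```python
# Fragments copied from the first title in the question's output
fragments = ['Data', ' ', 'Scientist', ' Experto SQL con conocimiento en R']

no_sep = ''.join(fragments)      # keeps only the spaces present in the nodes
space_sep = ' '.join(fragments)  # inserts an extra space between every fragment


def clean_title(parts):
    """Join text fragments and collapse all whitespace runs into single spaces."""
    return ' '.join(''.join(parts).split())


print(repr(no_sep))
print(repr(space_sep))
print(repr(clean_title(fragments)))
# Leading/trailing '\n ' nodes, as seen in the question, are dropped as well
print(repr(clean_title(['\n ', 'Data', ' Analytic con Python', '\n '])))
```

Normalizing with split() makes the result independent of which separator is used, which is convenient if the page's whitespace changes.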