I'm trying to send a POST request to this form (http://www.umdata.org/SearchChurches.aspx) using Scrapy. Am I sending the payload (form data) correctly?
import scrapy
from scrapy.http import FormRequest


class QuotesSpider(scrapy.Spider):
    name = 'greatriver'
    login_url = 'http://www.umdata.org/SearchChurches.aspx'
    start_urls = [login_url]

    def parse(self, response):
        login_url = 'http://www.umdata.org/SearchChurches.aspx'
        return FormRequest.from_response(
            response,
            url=login_url,
            clickdata={'name': 'ctl00$ContentPlaceHolder1$btnSearch'},
            formdata={'ctl00$ContentPlaceHolder1$ddlState': 'AK'},
            callback=self.parse_after_search,
        )

    def parse_after_search(self, response):
        print(response.xpath('//*[@id="ContentPlaceHolder1_gvResults"]/tbody/tr[3]/td[1]/a'))
The result when I run this script:
2023-04-20 20:00:04 [scrapy.core.engine] INFO: Spider opened
2023-04-20 20:00:04 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2023-04-20 20:00:04 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
2023-04-20 20:00:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://www.umdata.org/SearchChurches.aspx> (referer: None)
2023-04-20 20:00:06 [scrapy.core.engine] DEBUG: Crawled (200) <POST http://www.umdata.org/SearchChurches.aspx> (referer: http://www.umdata.org/SearchChurches.aspx)
[]
2023-04-20 20:00:06 [scrapy.core.engine] INFO: Closing spider (finished)
2023-04-20 20:00:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
I expected my response.xpath to return a string.
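Your request is fine. You get an empty list because the selector is wrong: the HTML that Scrapy receives often has no tbody element, even when the browser's inspector shows one, so an XPath that goes through tbody matches nothing.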
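To illustrate the tbody point in isolation, here is a minimal sketch that uses only the standard library. The markup is a made-up stand-in for the results table (only the table id comes from the question), and `xml.etree.ElementTree` stands in for Scrapy's selectors, so treat it as a demonstration of the idea rather than of Scrapy itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as a server might send it: the rows sit directly
# under <table>, with no <tbody> wrapper. Only the id is from the question.
RAW_HTML = """
<html><body>
<table id="ContentPlaceHolder1_gvResults">
  <tr><th>Church Name</th></tr>
  <tr><td><a href="#">EXAMPLE CHURCH</a></td></tr>
</table>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)

# Path copied from a browser inspector: it requires a <tbody> step.
with_tbody = root.findall(
    ".//table[@id='ContentPlaceHolder1_gvResults']/tbody/tr"
)

# Path that accepts rows at any depth below the table.
without_tbody = root.findall(
    ".//table[@id='ContentPlaceHolder1_gvResults']//tr"
)

print(len(with_tbody))     # no rows matched
print(len(without_tbody))  # both rows matched
```

The same reasoning applies to the XPath passed to `response.xpath`: dropping the tbody step, or using `//tr`, avoids depending on an element the browser may have added on its own.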
import scrapy
from scrapy.http import FormRequest


class QuotesSpider(scrapy.Spider):
    name = 'greatriver'
    login_url = 'http://www.umdata.org/SearchChurches.aspx'
    start_urls = [login_url]

    def parse(self, response):
        login_url = 'http://www.umdata.org/SearchChurches.aspx'
        # Submit the search form with the state set to AK.
        return FormRequest.from_response(
            response,
            url=login_url,
            clickdata={'name': 'ctl00$ContentPlaceHolder1$btnSearch'},
            formdata={'ctl00$ContentPlaceHolder1$ddlState': 'AK'},
            callback=self.parse_after_search,
        )

    def parse_after_search(self, response):
        # No tbody in the path; only rows without a style attribute are selected.
        for line in response.xpath('//table[@id="ContentPlaceHolder1_gvResults"]//tr[not(@style)]'):
            yield {
                'Church Name': line.xpath('./td[1]//text()').get(),
                'Conference': line.xpath('./td[2]/text()').get(),
                'District': line.xpath('./td[3]/text()').get(),
                'Location': line.xpath('./td[4]/text()').get(),
            }
Output:
...
...
{'Church Name': 'FIRST UNITED METHODIST CHURCH OF ANCHORAGE', 'Conference': 'ALASKA', 'District': 'ALASKA', 'Location': 'ANCHORAGE, AK'}
{'Church Name': 'FIRST UNITED METHODIST CHURCH OF FAIRBANKS', 'Conference': 'ALASKA', 'District': 'ALASKA', 'Location': 'FAIRBANKS, AK'}
...
...
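On the original question of whether the payload is sent correctly: `FormRequest.from_response` pre-fills the request with the fields already present in the page's form and then applies the `formdata` overrides. The sketch below imitates that idea with the standard library only. It is a simplified picture, not Scrapy's implementation; the form markup and the `__VIEWSTATE` / `__EVENTVALIDATION` values are invented for illustration, on the assumption that the page is a typical ASP.NET WebForms page. `FormFieldCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of hidden <input> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        if attrs.get("type") == "hidden" and "name" in attrs:
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Invented form markup; the two control names are the ones used in the spider.
PAGE = """
<form method="post" action="./SearchChurches.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <select name="ctl00$ContentPlaceHolder1$ddlState">
    <option value="AK">AK</option>
  </select>
  <input type="submit" name="ctl00$ContentPlaceHolder1$btnSearch" value="Search" />
</form>
"""

collector = FormFieldCollector()
collector.feed(PAGE)

# Start from the hidden fields, then apply the overrides from the spider.
payload = dict(collector.fields)
payload["ctl00$ContentPlaceHolder1$ddlState"] = "AK"       # formdata override
payload["ctl00$ContentPlaceHolder1$btnSearch"] = "Search"  # clicked button

body = urlencode(payload)
print(body)
```

If the hidden fields were missing from the POST body, a WebForms page would typically reject or ignore the submission, which is why building the request from the response is preferable to assembling the form data by hand.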