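I've been at this for several days. I'm trying to scrape this site: "https://careers.ispor.org/jobseeker/search/results/". I've covered everything else, from the script that extracts the information on each job page to the script that counts how many pages there are. However, to get the individual link to each job posting I need to loop through the results pages, each of which holds 25 listings, and that's where the problem is.
Here is my code so far. At this point I can successfully get the data from the first page: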
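from bs4 import BeautifulSoup
from zenrows import ZenRowsClient

client = ZenRowsClient("the_api_key_i_got_from_the_trial_account")
url = "https://careers.ispor.org/jobseeker/search/results"
# ZenRows options: render JavaScript and route the request through premium proxies
params = {"js_render": "true", "premium_proxy": "true"}
response = client.get(url, params=params)
# parse the first results page (25 listings)
soup = BeautifulSoup(response.content, "html.parser")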
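Since then I've been trying to replicate the AJAX request I saw in the browser's inspector when clicking the "Next page" button. What I get back only appears to move to the next page: the job postings in the returned response are still the ones from page 1.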
data = {
    'page': '2',
    'pos_flt': '0',
    'location_autocomplete': 'true',
    'radius': '320',
    'ajaxRequest': '1',
    'user_latlong': 'lat=33.874698638916$long=10.102299690247',
    "js_render": "true",
    "premium_proxy": "true"
}
cookies = {
    'AWSALB': "e6+c5w9IR/N4+ERov3onMB85zlZbl+mughxR4zfjLRLMoq9SJwBHTesVwdSAoTLuK88spU0tbqTVZ8jI7NGHLxMo/7Q+DefZBboxMZDGRMLBY60+HRQaBnKOYDhJ",
    'AWSALBCORS': "e6+c5w9IR/N4+ERov3onMB85zlZbl+mughxR4zfjLRLMoq9SJwBHTesVwdSAoTLuK88spU0tbqTVZ8jI7NGHLxMo/7Q+DefZBboxMZDGRMLBY60+HRQaBnKOYDhJ",
    'JTSUBREF': "careers.ispor.org",
    'datadome': "pn970laC_lalBETD5NWHB~pVKYYLrP2fg9_1JlfW1POc~Ny5Usr37BfuNP1UiAl3kCxoOA7z0Pvlwo69rK5WBre5T9znj0U3p55vC_mMGn1w56eqcSU1eWpla3DYLyJb"
}
headers = {
    'authority': 'careers.ispor.org',
    'method': 'GET',
    'path': '/c/@search_results/controller/includes/search_jobs.cfm?page=2&pos_flt=0&location_autocomplete=true&radius=320&ajaxRequest=1&user_latlong=lat%3D33.874698638916%24long%3D10.102299690247',
    'scheme': 'https'
}
response = client.get(url, params=params, headers=headers, cookies=cookies, data=data)
soup2 = BeautifulSoup(response.content, 'html.parser')
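One detail worth checking in the attempt above: `authority`, `method`, `path` and `scheme` as shown in DevTools are HTTP/2 pseudo-headers, not ordinary request headers. The page number for this request travels in the query string of the `search_jobs.cfm` endpoint shown in `path`, whereas the call above still targets `/jobseeker/search/results` and puts `page` into a request body via `data=`. The cookies also include a `datadome` token, which suggests the site sits behind DataDome bot protection. Below is a minimal sketch that rebuilds that endpoint URL for any page. It assumes that the query parameters captured in DevTools are all the endpoint needs and that only `page` changes between requests; `build_page_url` and `SEARCH_ENDPOINT` are hypothetical names, and this has not been verified against the site.

```python
from urllib.parse import urlencode

# Endpoint taken from the 'path' value copied out of DevTools in the question.
SEARCH_ENDPOINT = (
    "https://careers.ispor.org/c/@search_results/controller/includes/search_jobs.cfm"
)


def build_page_url(page, user_latlong="lat=33.874698638916$long=10.102299690247"):
    """Return the AJAX search URL for one results page (hypothetical helper).

    Assumption: the other query parameters can stay fixed at the values
    captured in DevTools; only 'page' changes between requests.
    """
    query = {
        "page": str(page),
        "pos_flt": "0",
        "location_autocomplete": "true",
        "radius": "320",
        "ajaxRequest": "1",
        "user_latlong": user_latlong,
    }
    # urlencode percent-encodes '=' and '$' inside user_latlong (%3D, %24),
    # which reproduces the encoded path seen in DevTools.
    return SEARCH_ENDPOINT + "?" + urlencode(query)
```

The resulting URL would then be passed as the target, with the ZenRows options kept in `params` instead of being mixed into the site's own parameters, e.g. `client.get(build_page_url(2), params={"js_render": "true", "premium_proxy": "true"})`.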
Here you need to use the paid proxy, with parameters along these lines:
'render_js': 'false',
'residential': 'true'
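Use BeautifulSoup to collect the hrefs of all 25 links, then scrape the job posting behind each one. I tested this with a few examples, and the hrefs it returns look like the ones below. You need to prepend the base URL to each of them and then scrape each result in turn.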
Hope this helps.
/job/postdoctoral-fellow-us-infectious-disease-epidemiology/75118162/
/job/assistant-professor-center-for-value-based-care-research/75118214/
/job/oncology-clinical-pharmacist/74728623/
/job/director-institutional-outreach-and-collaboration/73729688/
/job/director-cler-site-visit-operations/73729629/
/job/staff-associate-ii/75118241/
/job/dircorporate-medical-records/74866586/
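The two steps described in the answer (collect the `/job/...` hrefs, then turn them into absolute URLs) can be sketched as follows. To keep the example dependency-free, it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser`. With BeautifulSoup, the equivalent collection step would be `[a["href"] for a in soup.select('a[href^="/job/"]')]`. The sketch assumes two things: job detail links are `<a>` tags whose `href` starts with `/job/`, as in the examples listed here, and the base URL is `https://careers.ispor.org`. `JobLinkParser` and `extract_job_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://careers.ispor.org"  # site root, assumed from the question's URL


class JobLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at /job/... pages."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: job detail links look like /job/<slug>/<id>/
        if href and href.startswith("/job/"):
            self.hrefs.append(href)


def extract_job_links(html, base=BASE_URL):
    """Return absolute job URLs in page order, without duplicates."""
    parser = JobLinkParser()
    parser.feed(html)
    seen = set()
    links = []
    for href in parser.hrefs:
        url = urljoin(base, href)  # prepend the base URL to the relative href
        if url not in seen:  # a listing may link the same job more than once
            seen.add(url)
            links.append(url)
    return links
```

Calling `extract_job_links(response.text)` on each results page should then yield up to 25 absolute URLs. Each of those can be passed to `client.get(...)` for the per-job scrape.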