BeautifulSoup: href link is undefined

Question

I want to scrape a website, but every <a> tag I reach has the link "job/undefined". I use a POST request to get the data from the page.

This code sends the POST request with postData:

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
  'search': 'search',
  'facets[camp_type]':'day_camp',
  'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
# Chain multiple classes with dots; a space would mean "descendant element" instead
a = soup1.select('div.MuiGrid-root.MuiGrid-grid-xs-12')
b = soup1.select('span[class="MuiTypography-root MuiTypography-h2"]')
print('soup:', b)

Sample output:

<span class="MuiTypography-root MuiTypography-h2" style="cursor:pointer">
    <a href="job/undefined" style="color:#413E52;text-decoration:none">
    Network and Security engineer
    </a>
</span>

Answer 1

Edit

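Part of the content is served dynamically, so you have to get the job hashid from the API and then either build the link yourself or use the data from the JSON response: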

import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}
url = 'https://api.trustme.work/api/job_offers?include=technologies%2Cjob%2Ccompany%2Ccontract_type%2Clevel'
jobs = requests.get(url, headers=headers).json()['included']['jobs']

['https://www.trustme.work/job/' + v['hashid'] for k,v in jobs.items()]
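The link-building step can be tried offline, without calling the API. This is a minimal sketch, assuming the response has the shape used above (an `included` object holding a `jobs` mapping whose values carry a `hashid`). The sample payload and the `build_job_links` helper are made up for illustration, and entries without a hashid are simply skipped:

```python
BASE_URL = "https://www.trustme.work/job/"


def build_job_links(payload):
    """Return one absolute job URL for every job entry that has a hashid."""
    jobs = payload.get("included", {}).get("jobs", {})
    return [BASE_URL + job["hashid"] for job in jobs.values() if job.get("hashid")]


# Hypothetical payload that only mimics the assumed response shape
sample = {
    "included": {
        "jobs": {
            "1": {"hashid": "abc123", "title": "Network and Security engineer"},
            "2": {"title": "entry without a hashid"},
        }
    }
}

links = build_job_links(sample)
print(links)
```

With a live response, the same helper could be called on `requests.get(url, headers=headers).json()`.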

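To get the link from each job posting, change your CSS selector to target the elements more specifically. Also prefer static identifiers or the HTML structure over generated class names: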

.select('h2 a')

To get a list of all the links, use a list comprehension:

['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]

Example

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"}

postData = {
 'search': 'search',
 'facets[camp_type]':'day_camp',
 'open[choices-made-content]': 'true'}

url = 'https://www.trustme.work/en'
html_1 = requests.post(url, headers=headers, data=postData)

soup1 = BeautifulSoup(html_1.text, 'lxml')
['https://www.trustme.work' + a.get('href') for a in soup1.select('h2 a')]
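Note that the href in the question's sample output is relative without a leading slash ("job/undefined"), so plain string concatenation with 'https://www.trustme.work' would give 'https://www.trustme.workjob/undefined'. The following sketch, run against the sample markup from the question rather than the live site, uses `urllib.parse.urljoin` to resolve the link and flags hrefs that still carry the "undefined" placeholder. The `placeholder` key and the use of `html.parser` instead of `lxml` are choices made for this illustration only:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.trustme.work/"

# Sample markup copied from the question's output
html = """
<span class="MuiTypography-root MuiTypography-h2" style="cursor:pointer">
    <a href="job/undefined" style="color:#413E52;text-decoration:none">
    Network and Security engineer
    </a>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

results = []
for a in soup.select("a[href]"):
    href = a["href"]
    results.append({
        "title": a.get_text(strip=True),
        # urljoin resolves the relative href against the base URL
        "url": urljoin(BASE_URL, href),
        # True when the server-rendered HTML has no real job id yet
        "placeholder": href.endswith("/undefined"),
    })

print(results)
```

Links flagged as placeholders cannot be repaired from the HTML alone, since the real id is filled in dynamically.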
