Scraping Indeed with Beautiful Soup

Question

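I'm trying to scrape data from the Indeed website, and I want to start with a transform function that finds every div containing this section and returns the count:

My code keeps returning zero, and I don't quite understand why.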

import requests
from bs4 import BeautifulSoup

def extract(page):
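    # The header name is 'User-Agent' (hyphen, not underscore)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:130.0) Gecko/20100101 Firefox/130.0'}
    url = f'https://www.indeed.com/jobs?q=data+analyst&start={page}'
    # Pass headers by keyword; the second positional argument of requests.get is params
    r = requests.get(url, headers=headers)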
    soup = BeautifulSoup(r.content, 'html.parser')
    return soup

def transform(soup):
    divs = soup.find_all('div', class_ = 'cardOutline tapItem dd-privacy-allow')
    return len(divs)

# Fetch the first results page (start=0) and keep the parsed soup
c = extract(0)

# Print how many divs match the class string in the fetched page
print(transform(c))

Thanks!

Answer 1

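The Indeed website uses Cloudflare's bot protection. You can verify this by inspecting the cookies in your browser. Because of that, retrieving the content with the requests library is not feasible: the response you get is not the job listing page, so find_all has nothing to match and returns an empty list. Consider an alternative solution.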
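To confirm this from the script side rather than the browser, it can help to look at the response itself before parsing it. The following is a minimal sketch, not a definitive detector: the helper names looks_like_cloudflare_block and check_url are made up for illustration, and the signals it checks (a Server header containing "cloudflare", a CF-RAY header, a 403 or 503 status, a "Just a moment" challenge title) are assumptions about what a blocked response commonly looks like, so treat the result as a hint only.

```python
def looks_like_cloudflare_block(status_code, headers, body_text):
    """Heuristic guess at whether a response is a Cloudflare block or challenge.

    Hypothetical helper: the signals below are assumptions, not an official API.
    """
    # Normalise header names so a plain dict works as well as requests' headers
    lowered = {k.lower(): str(v).lower() for k, v in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    blocked_status = status_code in (403, 503)
    challenge_text = "just a moment" in body_text.lower()
    return served_by_cloudflare and (blocked_status or challenge_text)


def check_url(url, user_agent):
    """Fetch a URL with requests and report the status plus the heuristic result."""
    import requests  # imported here so the helper above has no dependencies

    r = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    return r.status_code, looks_like_cloudflare_block(r.status_code, r.headers, r.text)
```

Calling check_url with the URL and User-Agent string from the question returns the status code and a boolean. If the boolean is True, or the status is not 200, the zero from transform comes from the response itself and not from the class selector.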
