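I scraped the nhatot.com website, but I get no usable content back. I thought my computer had been blocked, but that is not the case: I can open the site (nhatot.com) normally. Here is my Python code: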
import requests as requests_a  # assumption: requests_a is an alias for the requests library
from bs4 import BeautifulSoup

page = 1
header = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36"}
# Query parameters appended to every request URL
payload = {
    'render_js': 'true'
}
# Only the 'http' key is set, so requests to https:// URLs do not go through this proxy
proxies = {
    'http': 'http://eiQqeQQ5:[email protected]:63362/'
    # 'https': 'https://eiQqeQQ5:[email protected]:63362/',
}
base_url = "https://www.nhatot.com"
list_house_url = []
for i in range(1, 10):
    url = "https://www.nhatot.com/mua-ban-bat-dong-san?page=" + str(i)
    print(url)
    request = requests_a.get(url, headers=header, proxies=proxies, params=payload, verify=False)
    soup = BeautifulSoup(request.content, 'html.parser')
    soup1 = BeautifulSoup(soup.prettify(), 'html.parser')
    # Listing links are expected to be <a> tags carrying this class
    page_content = soup1.find_all('a', class_='AdItem_adItem__gDDQT')
    print(page_content)
    for a in page_content:
        list_house_url.append(base_url + a['href'])
print(list_house_url)
Here is the result:
https://www.nhatot.com/mua-ban-bat-dong-san?page=1
[]
https://www.nhatot.com/mua-ban-bat-dong-san?page=2
[]
I tried using a proxy, but I still get the same empty result.
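
An empty list from `find_all` only says that no `<a>` tag with exactly that class was present in the HTML that came back. It does not say why. The following is a minimal sketch for narrowing that down, not a fix. The names `AdLinkParser` and `diagnose` and the sample HTML are hypothetical. Matching on the prefix `AdItem_adItem` instead of the full class name is an assumption, made in case the `__gDDQT` suffix is generated and changes. The sketch uses the standard library's `html.parser` so that it runs without extra packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AdLinkParser(HTMLParser):
    """Collect href values of <a> tags that have a class starting with a prefix."""

    def __init__(self, class_prefix):
        super().__init__()
        self.class_prefix = class_prefix
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Attribute values can be None (valueless attributes), hence the "or"
        classes = (attr_map.get("class") or "").split()
        href = attr_map.get("href")
        if href and any(c.startswith(self.class_prefix) for c in classes):
            self.hrefs.append(href)


def diagnose(html, base_url="https://www.nhatot.com", class_prefix="AdItem_adItem"):
    """Return a small report describing what the fetched HTML contains."""
    parser = AdLinkParser(class_prefix)
    parser.feed(html)
    return {
        "html_length": len(html),
        # False here means the class prefix is absent from the raw response
        "prefix_in_raw_html": class_prefix in html,
        "links": [urljoin(base_url, h) for h in parser.hrefs],
    }


# Hypothetical sample markup, standing in for a real response body
sample = (
    '<div><a class="AdItem_adItem__xyz other" href="/mua-ban/1.htm">A</a>'
    '<a href="/x">B</a></div>'
)
report = diagnose(sample)
print(report)
```

Inside the loop above, this could be called as `diagnose(request.text)` next to a `print(request.status_code)`. If `prefix_in_raw_html` is false for the real response, the listing markup is simply not in the HTML that was downloaded, and the next thing to look at is the response body itself.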