How to fetch multiple pages of a website and scrape specific data with Python and BeautifulSoup when only the domain link is known

Question  Votes: 0  Answers: 1

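I am new to web scraping with Python and want to write code that scrapes data from a website. The site has no pagination and the page links are dynamic, so all the data sits on inner pages. The link I am working with is in the code below. The information I am trying to collect is the company name, city, address, and phone number. Here is my code.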

I have tried many questions from Stack Overflow, but none of them matched my requirements.

from bs4 import BeautifulSoup
import requests

# Fetch a single business detail page.
source = requests.get('http://businessdirectory.pk/Default.aspx?action=Business&pid=762390').text

soup = BeautifulSoup(source, 'lxml')  # the 'lxml' parser needs the lxml package installed
# All fields of interest sit inside the div with class "businessDetails".
ParentDiv = soup.find('div', class_='businessDetails')
CompanyName = ParentDiv.find('p', class_='title').text
CityName = ParentDiv.find('p', class_='cityName').text
CityAddress = ParentDiv.find('p', class_='address').text
PhoneNumber = ParentDiv.find('p', class_='phone').text
MobileNo = ParentDiv.find('p', class_='mobNo').text
print(CompanyName)
print(CityName)
print(CityAddress)
print(PhoneNumber)

All I want is to supply just the domain link and have the script fetch every inner page and look for the same data there.
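
One possible way to approach the "start from the domain only" requirement is to collect every link on a page that points back to the same host and looks like a business detail page, then scrape each of those URLs. The sketch below is not taken from the answer that follows. It uses only the standard library (`html.parser` and `urllib.parse`) instead of BeautifulSoup so that it runs without a network connection, and the names `LinkCollector` and `business_links` as well as the sample markup are hypothetical. It assumes that detail pages carry `action=Business` and a `pid` parameter in the query string, as the URL in the question does.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, parse_qs


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def business_links(html, base_url):
    """Return sorted absolute same-host URLs that look like business detail pages."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    found = set()
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parts = urlparse(absolute)
        query = parse_qs(parts.query)
        # Assumption: detail pages have action=Business and a pid parameter.
        if (parts.netloc == host
                and query.get("action") == ["Business"]
                and "pid" in query):
            found.add(absolute)
    return sorted(found)


# Hypothetical listing markup, used only to exercise the function offline.
sample_html = """
<a href="Default.aspx?action=Business&amp;pid=762390">Business A</a>
<a href="/Default.aspx?action=Business&amp;pid=762391">Business B</a>
<a href="http://example.com/Default.aspx?action=Business&amp;pid=1">External</a>
<a href="Default.aspx?action=Category&amp;id=5">Category</a>
"""
links = business_links(sample_html, "http://businessdirectory.pk/")
print(links)
```

Each URL returned this way could then be passed to the same `businessDetails` parsing code shown in the question. Whether the real site exposes such links on its pages is not confirmed here.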

python web-scraping beautifulsoup
1 answer
0 votes

Try the following code. I hope it helps.

from bs4 import BeautifulSoup
import requests

# One list per field; index i in every list refers to the same business.
company_name = []
City_Name = []
City_Address = []
Phone_Number = []
Maxpage = 12  # number of pages to request (page=0 through page=11)

for page_num in range(Maxpage):
    page = "http://businessdirectory.pk/Default.aspx?action=Business&pid=762390&page={}".format(page_num)
    pageTree = requests.get(page, timeout=10)
    soup = BeautifulSoup(pageTree.text, 'html.parser')
    ParentDiv = soup.find('div', class_='businessDetails')
    if ParentDiv is None:
        # Skip pages that do not contain a business details block.
        continue

    CompanyName = ParentDiv.find('p', class_='title').text
    CityName = ParentDiv.find('p', class_='cityName').text
    CityAddress = ParentDiv.find('p', class_='address').text
    PhoneNumber = ParentDiv.find('p', class_='phone').text

    company_name.append(CompanyName)
    City_Name.append(CityName)
    City_Address.append(CityAddress)
    Phone_Number.append(PhoneNumber)


print(company_name)
print(City_Name)
print(City_Address)
print(Phone_Number)

The output will look like this.

['Ab Traders', 'Al Faisal Machinery Store', 'Ameen Pipe Store', 'Aslam Air Compressor', 'Best Engineering Works', 'China Center', 'Empyrean Group', 'General Industrial Corporation', 'Habib Mill Store', 'Humayun Traders', 'Islam Air Corporation', 'Khalid Hussain Workshop 3']
['Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Lahore', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad', 'Faisalabad']
['Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', '8-E 1, Jagawar Chowk, Near Allah Hu Chowk, Johar Town', 'Sadiq Market, Railway Road', 'Railway Road', 'Sadiq Market, Railway Road', 'Railway Road', 'General Bus Stand']
['0412639166', '0412646985-2606985', '0412618759', '0412600387', '0412632037', '0412600504-2634502', '0336-9954475', '0412636174-2637446', '0412617274', '0412635348-2617469', '0412618242', '0418781513']
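
The four lists above are parallel: the entry at index i in each list describes the same business. If one row per business is more convenient, the lists can be combined with `zip` and written out as CSV. This is a minimal sketch and not part of the original answer. It uses only the first two entries of the output above, and the column names and the in-memory `io.StringIO` buffer (in place of a file on disk) are choices made for illustration.

```python
import csv
import io

# First two entries of each list from the output above.
company_name = ['Ab Traders', 'Al Faisal Machinery Store']
City_Name = ['Faisalabad', 'Faisalabad']
City_Address = ['Sadiq Market, Railway Road', 'Railway Road']
Phone_Number = ['0412639166', '0412646985-2606985']

# zip pairs up the i-th element of every list into one row per business.
rows = list(zip(company_name, City_Name, City_Address, Phone_Number))

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['company', 'city', 'address', 'phone'])  # illustrative column names
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Fields that contain a comma, such as the first address, are quoted automatically by `csv.writer`, so the columns stay aligned when the file is read back.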