Structuring web-scraped data with BS4

Question (votes: 0, answers: 1)

The current output is:

    FulfordRoad
    WaterLane
    york
    york
    YO104PA
    YO306PQ

The desired output is like this:

    line1        city  postCode
    FulfordRoad  york  YO104PA
    WaterLane    york  YO306PQ

Code:

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd

    list1 = []
    response3 = requests.get("https://stores.aldi.co.uk/yorkshire-amp-humber/york")
    soup3 = BeautifulSoup(response3.text, "html.parser")
    try:
        for a1 in soup3.find_all('span', attrs={'class':'Address-field Address-line1'}):
            line1 = a1.get_text()
            print(line1)
        for a2 in soup3.find_all('span', attrs={'class':'Address-field Address-city'}):
            line2 = a2.get_text()
            print(line2)
        for a3 in soup3.find_all('span', attrs={'class':'Address-field Address-postalCode'}):
            line3 = a3.get_text()
            print(line3)
    except:
        pass
    data = pd.DataFrame(list1)
I would really appreciate any help with this problem. Thanks, s

Following your approach, I would do it like this (a single for loop over each address container):

    # soup3 and pd come from the code in the question
    list1 = []
    for addr in soup3.find_all("div", class_="Address"):
        # Look up each field inside the same container so the values stay paired
        line1 = addr.find("span", class_="Address-line1").get_text()
        city = addr.find("span", class_="Address-city").get_text()
        postcode = addr.find("span", class_="Address-postalCode").get_text()
        list1.append([line1, city, postcode])

    df = pd.DataFrame(list1, columns=["line1", "city", "postcode"])
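The loop above can be tried without requesting the live site. This is a minimal sketch, assuming markup shaped like the store page: the `SAMPLE_HTML` string and the `field_text` helper are made up for illustration, and only the class names come from the code above. The helper returns `None` when a span is missing, so one incomplete address does not raise `AttributeError`.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup; only the class names are taken from the answer.
SAMPLE_HTML = """
<div class="Address">
  <span class="Address-field Address-line1">Fulford Road</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO10 4PA</span>
</div>
<div class="Address">
  <span class="Address-field Address-line1">Water Lane</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO30 6PQ</span>
</div>
<div class="Address">
  <span class="Address-field Address-line1">No Postcode Street</span>
  <span class="Address-field Address-city">York</span>
</div>
"""


def field_text(container, css_class):
    """Return the stripped text of the first matching span, or None if absent."""
    tag = container.find("span", class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

rows = []
for addr in soup.find_all("div", class_="Address"):
    # One row per address container keeps line1, city and postcode together.
    rows.append([
        field_text(addr, "Address-line1"),
        field_text(addr, "Address-city"),
        field_text(addr, "Address-postalCode"),
    ])

df_sample = pd.DataFrame(rows, columns=["line1", "city", "postcode"])
print(df_sample)
```

The third, deliberately incomplete address ends up as a row with a missing postcode instead of stopping the loop.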

Another variant:

    from collections import defaultdict

    # Collect each column in its own list; soup3 and pd come from the question
    data = defaultdict(list)
    for addr in soup3.find_all("div", class_="Address"):
        data["line1"].append(addr.find("span", class_="Address-line1").get_text())
        data["city"].append(addr.find("span", class_="Address-city").get_text())
        data["postcode"].append(addr.find("span", class_="Address-postalCode").get_text())

    df = pd.DataFrame(data)
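If you would rather stay close to the three separate `find_all` calls from the question, the three result lists can be paired up with `zip`. This is only a sketch, assuming every address has all three spans in the same order; if one span is missing, the columns silently shift, which is why looping over each `Address` container is the safer choice. The sample HTML and the `texts` helper are hypothetical.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical sample markup; only the class names are taken from the question.
SAMPLE_HTML = """
<div class="Address">
  <span class="Address-field Address-line1">Fulford Road</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO10 4PA</span>
</div>
<div class="Address">
  <span class="Address-field Address-line1">Water Lane</span>
  <span class="Address-field Address-city">York</span>
  <span class="Address-field Address-postalCode">YO30 6PQ</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def texts(css_class):
    """Return the text of every span carrying the given class, in page order."""
    return [tag.get_text(strip=True) for tag in soup.find_all("span", class_=css_class)]


# zip pairs the n-th line1 with the n-th city and the n-th postcode.
df_zip = pd.DataFrame(
    list(zip(texts("Address-line1"), texts("Address-city"), texts("Address-postalCode"))),
    columns=["line1", "city", "postcode"],
)
print(df_zip)
```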
python pandas web-scraping beautifulsoup
1 answer (1 vote)
Output:

    print(df)

              line1  city  postcode
    0  Fulford Road  York  YO10 4PA
    1    Water Lane  York  YO30 6PQ

