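I think I'm close, but after a few hours without progress I'm asking here. I want to scrape the span values and assign them to variables or a list for further processing.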
import requests
from bs4 import BeautifulSoup
import pandas as pd
r=requests.get('https://www.finn.no/car/used/search.html?exterior_colour=6&exterior_colour=13&exterior_colour=14&location=22046&model=1.8078.2000555&q=Long+range&stored-id=75430241')
soup = BeautifulSoup(r.content, 'html.parser')
finn=soup.find("div", class_="mb-8 flex justify-between whitespace-nowrap font-bold")
for i in finn:
    print(i.text)
OUTPUT:
2024
10 200 km
517 129 kr
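So I get the values I want from the first search result, but I'm struggling to assign them to variables or a list. I also want to do the same for the rest of the search results, but if I use "find_all" and print the text of each match, the values run together: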
OUTPUT:
202410 200 km517 129 kr
202336 800 km499 999 kr
202237 500 km429 900 kr
202215 000 km460 000 kr
202331 700 km449 000 kr
202234 000 km427 129 kr
202247 000 km456 129 kr
202319 185 km497 129 kr
202247 500 km429 000 kr
202182 000 km397 129 kr
202419 200 km482 129 kr
20246 100 km537 129 kr
202224 000 km469 000 kr
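You're very close. Use find_all() to retrieve every div element whose class matches mb-8 flex justify-between whitespace-nowrap font-bold, then collect the text of the span elements inside each one, like this: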
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.finn.no/car/used/search.html?exterior_colour=6&exterior_colour=13&exterior_colour=14&location=22046&model=1.8078.2000555&q=Long+range&stored-id=75430241'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
listings = soup.find_all("div", class_="mb-8 flex justify-between whitespace-nowrap font-bold")
car_data = []
for listing in listings:
    # Each listing div holds one span per value (year, mileage, price)
    spans = listing.find_all('span')
    if spans:
        car_info = [span.text.strip() for span in spans]
        car_data.append(car_info)
df = pd.DataFrame(car_data, columns=["Year", "Mileage", "Price"])
print(df)
This gives:
    Year    Mileage       Price
0   2024  10 200 km  517 129 kr
1   2023  36 800 km  499 999 kr
2   2022  37 500 km  429 900 kr
3   2022  15 000 km  460 000 kr
4   2023  31 700 km  449 000 kr
5   2022  34 000 km  427 129 kr
6   2022  47 000 km  456 129 kr
7   2023  19 185 km  497 129 kr
8   2022  47 500 km  429 000 kr
9   2021  82 000 km  397 129 kr
10  2024  19 200 km  482 129 kr
11  2024   6 100 km  537 129 kr
12  2022  24 000 km  469 000 kr
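Since you mentioned further processing: the values above are still strings such as "10 200 km", so you will probably want numbers. Here is a minimal sketch of one way to do that. The helper name to_int and the hard-coded sample rows are my own assumptions for illustration; in your script you would use the real car_data list instead. It strips every non-digit character, which should also cope with non-breaking spaces if the page uses them.

```python
import re

def to_int(text):
    """Drop every non-digit character and convert what is left to an int.

    Hypothetical helper, not part of BeautifulSoup or pandas.
    Returns None when the text contains no digits at all.
    """
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None

# Sample rows copied from the output above; replace with the scraped car_data.
car_data = [
    ["2024", "10 200 km", "517 129 kr"],
    ["2023", "36 800 km", "499 999 kr"],
]

# One plain list per column, ready for sorting, filtering, arithmetic, etc.
years = [to_int(row[0]) for row in car_data]
mileages_km = [to_int(row[1]) for row in car_data]
prices_kr = [to_int(row[2]) for row in car_data]

print(years, mileages_km, prices_kr)
# [2024, 2023] [10200, 36800] [517129, 499999]
```

If you prefer to stay in pandas, the same helper can be applied to each column of df instead of building separate lists.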