How can I scrape data from a list with an arbitrary number of rows using Python Selenium?

Question   Votes: 0   Answers: 1

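I'm trying to build a bot that identifies NFT loan listings on Blur that meet certain criteria, such as a total loan value at or below 80% of the floor price, or an APY above 100%. I've figured out the basics of using Selenium to launch Chrome and navigate to the part of the site that shows a collection's loans, but I'm struggling to actually extract the data from the loans table. What I'd like to do is pull the table of loan listings into an array of arrays or an array of dictionaries, where each array/dictionary holds the name, status, borrow amount, LTV, and APY for one listing.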
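To make the criteria concrete, here is a minimal sketch of the filtering step once the rows are available as dictionaries. It assumes the LTV column already expresses the loan value as a percentage of the floor price, and that percentages arrive as strings like "75%". The field names, the helper names `parse_percent` and `matches_criteria`, and the sample rows are all hypothetical.

```python
def parse_percent(text):
    """Turn a string such as '82.5%' or '1,200 %' into a float (82.5, 1200.0)."""
    return float(text.replace("%", "").replace(",", "").strip())


def matches_criteria(loan, max_ltv=80.0, min_apy=100.0):
    """True when LTV is at or below max_ltv OR APY is above min_apy."""
    ltv = parse_percent(loan["ltv"])
    apy = parse_percent(loan["apy"])
    return ltv <= max_ltv or apy > min_apy


# Hypothetical rows, invented purely for illustration (not real Blur data)
loans = [
    {"name": "Beanz #1", "status": "Active", "borrow_amount": "0.10", "ltv": "75%", "apy": "40%"},
    {"name": "Beanz #2", "status": "Active", "borrow_amount": "0.16", "ltv": "95%", "apy": "120%"},
    {"name": "Beanz #3", "status": "Active", "borrow_amount": "0.15", "ltv": "90%", "apy": "60%"},
]

selected = [loan for loan in loans if matches_criteria(loan)]
print([loan["name"] for loan in selected])  # ['Beanz #1', 'Beanz #2']
```

Keeping the thresholds as parameters makes it easy to tighten or loosen the rules without touching the scraping code.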

What I have so far:

import selenium
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

path = "/Users/#########/Desktop/chromedriver-mac-arm64/chromedriver"

# Create an instance of ChromeOptions
options = Options()
options.add_experimental_option("detach", True)
options.add_argument("disable-infobars")

# Specify the path to the ChromeDriver
service = Service(path)

# Initialize the WebDriver with the service and options
driver = webdriver.Chrome(service=service, options=options)

# Open the Blur Beanz collection and navigate to the active loans page
driver.maximize_window()
driver.get("https://blur.io/eth/collection/beanzofficial/loans")
time.sleep(3)
loan_button = driver.find_element(By.XPATH, "/html/body/div/div/main/div/div[3]/div/div[2]/div[1]/div[1]/nav/button[2]")
loan_button.click()


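Honestly, I'm very new to Selenium, so I've been relying on my own intuition and ChatGPT to work this out. My best guess so far is the snippet below, which tries to extract the APY of every loan. It doesn't work, and I'm sure some of my intuition is wrong.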

elements = driver.find_elements(By.CSS_SELECTOR, 'Text-sc-m23s7f-0 hAGCAO')


# Initialize an empty list to store the percentage values
percentages = []

# Iterate through each element and extract its text (which contains the percentage)
for element in elements:
    percentage = element.text
    percentages.append(percentage)

# Print the extracted percentage values
print(percentages)

time.sleep(10)
# Close the WebDriver
driver.quit()
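One likely reason the snippet above returns an empty list: `By.CSS_SELECTOR` expects a CSS selector, but `'Text-sc-m23s7f-0 hAGCAO'` is the raw value of a `class` attribute. Read as CSS, it means "an element with tag name `hAGCAO` inside an element with tag name `Text-sc-m23s7f-0`", which matches nothing. Each class name needs a leading dot. A minimal sketch of the conversion follows; the helper name `classes_to_css_selector` is made up for illustration.

```python
def classes_to_css_selector(class_attr):
    """Convert a class attribute value into a CSS selector matching all its classes."""
    return "".join("." + name for name in class_attr.split())


selector = classes_to_css_selector("Text-sc-m23s7f-0 hAGCAO")
print(selector)  # .Text-sc-m23s7f-0.hAGCAO

# With a live driver this would be used as:
#   elements = driver.find_elements(By.CSS_SELECTOR, selector)
```

Class names of this shape look auto-generated, so they may change when the site is redeployed; treat any selector built from them as fragile.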

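I also feel it's a bit convoluted to have to extract the table one column at a time rather than one row at a time. I'm not sure whether there's a simpler way to do this; if there is, that would be great. If not, that's fine too!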
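On extracting row by row: a common pattern is to locate each row element first and then look up the cells inside that row, so every row becomes one list or dictionary. The sketch below assumes the table renders one element per loan with one child element per column; `row_selector` and `cell_selector` are placeholders that would have to be read from the page in the browser's dev tools, and the field names and helper names are hypothetical.

```python
FIELDS = ["name", "status", "borrow_amount", "ltv", "apy"]


def rows_to_dicts(rows, fields=FIELDS):
    """Pair each row's cell texts with field names; skip rows of the wrong width."""
    return [dict(zip(fields, cells)) for cells in rows if len(cells) == len(fields)]


def scrape_rows(driver, row_selector, cell_selector, timeout=10):
    """Wait until rows are present, then return each row's cell texts as a list."""
    # Imported lazily so rows_to_dicts can be tried without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    rows = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, row_selector))
    )
    return [
        [cell.text for cell in row.find_elements(By.CSS_SELECTOR, cell_selector)]
        for row in rows
    ]


# Offline demonstration with invented cell texts (no browser needed)
sample = [["Beanz #1", "Active", "0.10", "75%", "40%"], ["header only"]]
loans = rows_to_dicts(sample)
print(loans)
```

With a live driver the two pieces would be combined as `rows_to_dicts(scrape_rows(driver, row_selector, cell_selector))`. The explicit wait also replaces the fixed `time.sleep` calls, since it returns as soon as at least one row is present.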

python html selenium-webdriver web-scraping
1 Answer
0 votes

To get the table data, I would parse the <script> element that stores the data in JSON format:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://blur.io/"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = json.loads(soup.select_one("#__NEXT_DATA__").text)

table = data["props"]["pageProps"]["sections"]["tableCollections"]

for tab, rows in zip(table["tabs"], table["data"]):
    df = pd.DataFrame(rows)
    del df["iconUrl"]

    values = df.pop("values").apply(pd.Series)
    values.columns = table["headings"][1:]

    df = pd.concat([df, values], axis=1)

    print(tab)
    print(df)
    print()

Prints:

Trending
                                                collectionUrl                                         title Floor Price 1D Change 7D Change 1D Volume 7D Volume  Owners  Supply
0                        https://blur.io/collection/lilpudgys                                     LilPudgys        0.67    -12.89    -22.77    149.34    855.78    8269   21663
1                       https://blur.io/collection/degods-eth                                        DeGods        0.77     -8.62    -32.22     49.83    640.14    2006    5679
2                           https://blur.io/collection/clonex                                        CloneX        0.40     -0.12     -4.52     42.39    219.07    9488   19600
3                    https://blur.io/collection/mypethooligan                               My Pet Hooligan        0.34     -4.39    -12.42      9.83     68.65    3528    8888
4              https://blur.io/collection/cryptochasers-robot                           CryptoChasers Robot        0.88      0.00      0.00      0.88      0.88     472     499
5                  https://blur.io/collection/proof-moonbirds                                     Moonbirds        0.38    -10.93    -19.20     17.23     97.76    5797   10000
6            https://blur.io/collection/bored-ape-kennel-club                            BoredApeKennelClub        0.29    -12.28    -20.81     33.17    162.73    5076    9602
7            https://blur.io/collection/influenceth-asteroids                           Influence Asteroids        0.02     55.83    -18.33      1.93      2.77    1936    9666
8    https://blur.io/collection/lacma-cactoid-labs-rotf-vol-1  LACMA Remembrance of Things Future Volume #1        0.30    -24.05    -24.53      0.20      1.15     118     500
9                    https://blur.io/collection/beanzofficial                                         Beanz        0.17    -19.08    -20.61     30.37    153.57    7677   19950
10  https://blur.io/collection/murakami-flowers-2022-official                              Murakami.Flowers        0.15     -0.07      5.57      1.29     11.11    5365   10153
11                   https://blur.io/collection/habbo-avatars                                 Habbo Avatars        0.12     -0.69     -4.17      1.49      8.76    4410   11599
12                      https://blur.io/collection/deadfellaz                                    DeadFellaz        0.07     -0.14     -4.27      0.56      1.98    6413   10000
13                    https://blur.io/collection/trainersgen1              Pixelmon Trainers - Generation 1        0.08    -13.26    -17.59      0.25     11.89    1254    7000
14                https://blur.io/collection/str8fire-og-pass                              STR8FIRE OG PASS        0.05    -14.11     39.15      0.17      3.48     548    1200

Top
                                       collectionUrl                    title Floor Price 1D Change 7D Change 1D Volume 7D Volume  Owners  Supply
0           https://blur.io/collection/pudgypenguins            PudgyPenguins        7.29    -22.60    -28.50    894.49   2565.28    5182    8888
1       https://blur.io/collection/boredapeyachtclub        BoredApeYachtClub        8.89    -12.17    -24.25    680.12   3546.91    5460    9998
2                  https://blur.io/collection/milady                   Milady        4.73     -0.46     15.29    279.38   3113.31    5135    9978
3   https://blur.io/collection/mutant-ape-yacht-club       MutantApeYachtClub        1.47    -13.48    -20.11    219.61   2203.20   11658   19495
4          https://blur.io/collection/remilio-babies  Redacted Remilio Babies        1.28     -9.81      4.56    209.64   1303.70    4610    9998
5               https://blur.io/collection/lilpudgys                LilPudgys        0.67    -12.89    -22.77    149.34    855.78    8269   21663
6                   https://blur.io/collection/azuki                    Azuki        3.20    -10.64    -15.13    128.56   1994.88    4255   10000
7              https://blur.io/collection/degods-eth                   DeGods        0.77     -8.62    -32.22     49.09    640.54    2005    5679
8           https://blur.io/collection/kanpai-pandas            Kanpai Pandas        0.79    -15.17    -16.16     48.26    330.87    3191    7976
9                  https://blur.io/collection/clonex                   CloneX        0.40     -0.12     -4.52     42.39    219.07    9488   19600
10           https://blur.io/collection/pixelmongen1                 Pixelmon        0.44    -10.41     -9.65     37.48    191.28    2599   12566
11  https://blur.io/collection/bored-ape-kennel-club       BoredApeKennelClub        0.29    -12.28    -20.81     33.17    162.73    5076    9602
12                https://blur.io/collection/persona                  Persona        0.22    -11.29    -13.96     30.89    200.20    3222    8875
13            https://blur.io/collection/sappy-seals              Sappy Seals        0.42    -13.36    -18.76     30.79    135.66    1618    9997
14          https://blur.io/collection/beanzofficial                    Beanz        0.17    -19.08    -20.61     30.37    153.57    7677   19950
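Note that the code above reads the home page (`https://blur.io/`) and so returns collection statistics, not loan listings. Whether the loans page embeds its rows in the same `__NEXT_DATA__` script is not confirmed here and would need to be checked in the page source. To see what that script element looks like without any network access, here is a small sketch that uses only the standard library in place of BeautifulSoup. The HTML string and the `rows` key are invented for illustration, as are the names `NextDataParser` and `extract_next_data`.

```python
import json
from html.parser import HTMLParser


class NextDataParser(HTMLParser):
    """Collects the text inside <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.chunks.append(data)


def extract_next_data(html):
    """Return the JSON payload of the __NEXT_DATA__ script as Python objects."""
    parser = NextDataParser()
    parser.feed(html)
    parser.close()
    return json.loads("".join(parser.chunks))


# Made-up page, far smaller than a real one
html = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"rows": [{"title": "Beanz", "values": ["0.17"]}]}}}'
    "</script></body></html>"
)
data = extract_next_data(html)
print(data["props"]["pageProps"]["rows"][0]["title"])  # Beanz
```

If the script element is missing, `json.loads` receives an empty string and raises `json.JSONDecodeError`, which is a useful signal that the page loads its data some other way.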