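I'm trying to get the data from every element in the HTML that has the following data-testid, as shown below:
<div data-testid="sl.explore.card-description"></div>
I wrote the following code, but it doesn't work: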
desc_list = []
desc = driver.find_elements(By.XPATH, "//div[@data-testid='sl.explore.card-description']")
for i, cp in enumerate(desc):
    splitted = desc[i].text.split('\n');
    data_desc = str(splitted[0:1])
    desc_list.append(data_desc);
df_desc = pd.DataFrame(desc_list)
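The main problem in your code is probably the way it accesses the elements and extracts their text. Check your XPath to make sure it selects the right elements, and make sure those elements have text content that can be extracted.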
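One detail in the question's loop that may explain odd-looking results is `str(splitted[0:1])`: slicing returns a list, and `str()` of a list keeps the brackets and quotes. Here is a minimal sketch of the difference, using a made-up string in place of a real element's text (the sample text is an assumption, not taken from the page):

```python
# Hypothetical text of one matched element (made up for illustration).
text = "Cozy cabin\nSleeps 4"
splitted = text.split("\n")

# Slicing returns a list, and str() of a list keeps the brackets and quotes.
as_slice = str(splitted[0:1])

# Indexing returns the first line itself.
as_index = splitted[0]

print(as_slice)  # ['Cozy cabin']
print(as_index)  # Cozy cabin
```

If the DataFrame rows look like `['...']` instead of plain text, indexing with `splitted[0]` is likely what you want.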
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

# Assumes a Selenium WebDriver instance has already been initialized as 'driver'
desc_list = []
# find_elements returns a list of every matching element (empty if none match)
desc = driver.find_elements(By.XPATH, "//div[@data-testid='sl.explore.card-description']")
for cp in desc:
    # Skip elements that have no text content
    if cp.text:
        # Split the text content on newline characters
        splitted = cp.text.split('\n')
        # Extract the desired data (for example, the first line)
        data_desc = splitted[0]
        desc_list.append(data_desc)
# Create the DataFrame once, after the loop, rather than rebuilding it on every iteration
df_desc = pd.DataFrame(desc_list, columns=['Description'])
Make sure your XPath actually selects the elements you want and that they have extractable text content.
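One way to check both points is to count how many elements matched and how many of them actually expose text. The helper below is only a sketch: `summarize_matches` is a hypothetical name, and the stand-in objects exist so the example runs without a browser. With Selenium you would pass it the list returned by `driver.find_elements(...)`.

```python
from types import SimpleNamespace


def summarize_matches(elements):
    """Count matched elements and how many have non-empty text."""
    with_text = [el for el in elements if el.text.strip()]
    return {"matched": len(elements), "with_text": len(with_text)}


# Stand-in objects with a .text attribute (assumption: they mimic WebElement.text).
fake_elements = [
    SimpleNamespace(text="Cozy cabin\nSleeps 4"),
    SimpleNamespace(text=""),
]
summary = summarize_matches(fake_elements)
print(summary)  # {'matched': 2, 'with_text': 1}
```

If `matched` is 0, the XPath (or the timing of the page load) is the likely culprit. If `matched` is positive but `with_text` is 0, the elements exist but have no visible text to extract.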