How do I get text from a specific tag?

Question

I have the following source, and I want to get text from a specific attribute of a tag.

image

I can reach the image tag with the following XPath:

//*[name()='g' and contains(@entityid, '61042482270050282')]/*[name()='image']

But I don't know how to use Selenium with Python to get the text underlined in red in the image:

down_state_16x16

Answer 1

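To get text from a specific tag, you can use Selenium's get_attribute method to read the xlink:href attribute of the image tag, which contains the image URL. In your case, however, you want to extract text from the image itself. Selenium cannot do that directly, because it interacts with web elements, not with the contents of an image.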
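If the text you are after is simply the file name inside the xlink:href value, no image processing is needed. The following is a minimal sketch under that assumption; the helper name image_name_from_href and the example URL are hypothetical, and the real attribute value on your page may look different.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_name_from_href(href):
    """Return the file name without its extension from an image URL or path.

    Hypothetical helper: assumes the href ends in a file name such as
    'down_state_16x16.png'. Returns an empty string for a missing href.
    """
    if not href:
        return ""
    # Drop any scheme, host, query string, or fragment and keep only the path.
    path = urlparse(href).path
    # stem is the last path component without its final suffix.
    return PurePosixPath(path).stem
```

With Selenium, you would pass in the value returned by get_attribute('xlink:href') on the image element.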

To extract text from an image, you would need to download the image and then run it through an optical character recognition (OCR) library such as pytesseract.

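However, based on the context you provided, the image appears to be a "down" state indicator, and the text you want to extract is "MINOR". Since this text is not part of the image but is stored in the aria-label attribute of the g tag, you can extract it with the following code: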

from selenium import webdriver
from selenium.webdriver.common.by import By

# Initialize the driver
driver = webdriver.Firefox()

# Navigate to the webpage
driver.get('http://your-webpage-url.com')

# Find the g tag using the XPath expression
# (find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH)
g_tag = driver.find_element(By.XPATH, "//*[name()='g' and contains(@entityid, '61042482270050282')]")

# Extract the aria-label attribute value
aria_label = g_tag.get_attribute('aria-label')

# Extract the text after the comma
text = aria_label.split(', ')[1]

# Print the extracted text
print(text)
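The split(', ')[1] step above raises an error if the attribute is missing or contains no comma. A slightly more defensive variant might look like the sketch below. The helper name text_after_first_comma is hypothetical, and the sample label in the usage note is an assumed format, not taken from your page. Note that it returns everything after the first comma, which differs from split(', ')[1] when the label contains more than one comma.

```python
def text_after_first_comma(label):
    """Return the text following the first comma in label, or None.

    Hypothetical helper: returns None when label is missing (get_attribute
    can return None) or when it contains no comma at all.
    """
    if not label:
        return None
    # partition splits on the first comma only; sep is '' if none was found.
    _head, sep, tail = label.partition(",")
    return tail.strip() if sep else None
```

For example, text_after_first_comma("Node 1, MINOR") yields "MINOR", while a label without a comma yields None instead of raising IndexError.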
