How to locate an element with Selenium and XPath

Question  Votes: 0  Answers: 4

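I am trying to scrape some information from a website. When I try to get an element by XPath, I get the error "Unable to locate element", even though the path I provide is copied directly from the browser's inspect tool. I tried a few things but nothing worked, so I decided to try a simpler path (TEST), and that still does not work. Is it possible that the website does not show all of its HTML code when inspected?

Here is the code, with the website and the XPaths I tried.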

URL_TRADER = 'https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly'

TEST = 'html/body/div[@id="app"]/div[@class="logged-out free"]/div[@class="client-components-app-app__wrapper undefined undefined"]'#/div/div[1]/div/div[2]/div/section/main/table/tbody/tr[3]/td[3]/div/div/div/div[1]/span'

X_PATH = '//*[@id="app"]/div/div/div[2]/div/div[1]/div/div[2]/div/section/main/table/tbody/tr[1]/td[3]/div/div/div/div[1]/span'

The main function is:

def trader_table():

  # Loading Chrome and getting to the website
  driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver')
  driver.get(URL_TRADER)
  driver.implicitly_wait(10)
  text = driver.find_element_by_xpath(X_PATH).get_attribute('innerHTML')

  return text
python-3.x selenium xpath web-scraping webdriverwait
4 Answers
1
vote

I added a wait condition and used a combination of CSS selectors, but I think this is equivalent to your XPath.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly'
driver = webdriver.Chrome()
driver.get(url)
data =  WebDriverWait(driver,10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".client-components-experts-infoTable-expertTable__table .client-components-experts-infoTable-expertTable__dataRow td:nth-child(3)"))).get_attribute('innerHTML')
print(data)
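
For context, an explicit wait such as WebDriverWait is conceptually a polling loop: it re-checks a condition until the condition succeeds or a timeout expires. The sketch below is a simplified, Selenium-free stand-in for that idea, not Selenium's actual implementation. The names `wait_until` and `fake_lookup` are hypothetical and exist only for this illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper for illustration only; not part of Selenium.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Simulated page: the "element" only becomes available on the third poll,
# much like content that JavaScript renders after the initial page load.
attempts = {"count": 0}


def fake_lookup():
    attempts["count"] += 1
    return "$14.00" if attempts["count"] >= 3 else None


value = wait_until(fake_lookup, timeout=2.0, poll_interval=0.01)
print(value, "after", attempts["count"], "polls")
```

A single lookup made immediately after the page loads corresponds to the first poll here, which is why it can fail on a page that fills in its table later.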

1
vote

You have provided all the details needed to construct an answer, but you did not explicitly mention which element you are trying to get.

However, the XPath commented out in TEST hints that you are after the price targets and want to extract the text within those elements. Because the elements are rendered by JavaScript, you need to use WebDriverWait with visibility_of_all_elements_located(). You can use the following solution:

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_argument('disable-infobars')
    options.add_argument('--disable-extensions')
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly")
    print([element.get_attribute('innerHTML') for element in WebDriverWait(driver, 10).until(
        EC.visibility_of_all_elements_located(
            (By.XPATH, "//div[@class='client-components-experts-infoTable-expertTable__isBuy']//span")))])

  • Console output:

    ['$14.00', '$110.00', '$237.00', '$36.00', '$150.00', '$71.00', '$188.00', '$91.00', '$101.00', '$110.00']
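
The console output above is a list of strings, so any arithmetic on the price targets needs a conversion step first. Here is a minimal sketch of one way to do that with the standard library. `parse_price` is a hypothetical helper, and it assumes every value is a plain dollar amount, optionally with thousands separators.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a string such as '$1,110.00' to a Decimal.

    Hypothetical helper; assumes a leading '$' and optional commas.
    """
    cleaned = text.strip().lstrip("$").replace(",", "")
    return Decimal(cleaned)


# The strings printed by the answer's code.
prices = ['$14.00', '$110.00', '$237.00', '$36.00', '$150.00',
          '$71.00', '$188.00', '$91.00', '$101.00', '$110.00']

values = [parse_price(p) for p in prices]
print(max(values), min(values))
```

Decimal is used instead of float so that currency amounts are not affected by binary rounding.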

1
vote

I think you are looking for the price.

from selenium import webdriver
URL_TRADER = 'https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly'

TEST = 'html/body/div[@id="app"]/div[@class="logged-out free"]/div[@class="client-components-app-app__wrapper undefined undefined"]'#/div/div[1]/div/div[2]/div/section/main/table/tbody/tr[3]/td[3]/div/div/div/div[1]/span'

X_PATH = "//div[@class='client-components-experts-infoTable-expertTable__isBuy']/div/span"

def trader_table():
    driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver')
    driver.get(URL_TRADER)
    driver.implicitly_wait(10)
    text = driver.find_element_by_xpath(X_PATH).get_attribute('innerHTML')
    print(text)
    return text

Edit: for all rows

from selenium import webdriver

URL_TRADER = 'https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly'

X_PATH = "//div[@class='client-components-experts-infoTable-expertTable__isBuy']/div/span"


def trader_table():
    driver = webdriver.Chrome(executable_path='/usr/local/bin/chromedriver')
    driver.get(URL_TRADER)
    driver.implicitly_wait(10)
    # find_elements (plural) returns every matching span, one per table row
    list_ele = driver.find_elements_by_xpath(X_PATH)
    price_list = []
    for ele in list_ele:
        print(ele.text)
        price_list.append(ele.text)

    return price_list


# Named "prices" rather than "list" to avoid shadowing the built-in type
prices = trader_table()
print(prices)
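
To see why a short, class-based relative XPath matches every row while a copied absolute path targets one exact position, the expression can be tried offline on a small fragment. The sketch below uses the standard library's ElementTree, which supports only a subset of XPath and needs `.//` where Selenium takes `//`. The sample markup is invented for illustration and is only an assumption about how the real page is structured.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the live page's structure may differ.
SAMPLE = """
<table>
  <tr>
    <td>Stock A</td>
    <td><div class="client-components-experts-infoTable-expertTable__isBuy"><div><span>$14.00</span></div></div></td>
  </tr>
  <tr>
    <td>Stock B</td>
    <td><div class="client-components-experts-infoTable-expertTable__isBuy"><div><span>$110.00</span></div></div></td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# Same shape as the answer's X_PATH, written in ElementTree's relative form.
XPATH = ".//div[@class='client-components-experts-infoTable-expertTable__isBuy']/div/span"

prices = [span.text for span in root.findall(XPATH)]
print(prices)
```

Because the expression anchors on the class attribute rather than on the chain of ancestors, it keeps matching even if wrapper elements are added or removed higher up in the document.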

0
votes

from selenium import webdriver
import time

driver = webdriver.Chrome("your webdriver location")
driver.get("https://www.tipranks.com/analysts/joseph-foresi?benchmark=none&period=yearly")
time.sleep(10)
y = driver.find_element_by_id('app').get_attribute('innerHTML')
print(y)

This prints the complete inner HTML.
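
Once the inner HTML has been dumped as a string, one way to check whether the markup seen in the inspect tool is really present is to collect the class names it contains. This is a rough sketch using the standard library's html.parser. `ClassCollector` and `classes_in` are hypothetical names, and the `dumped` string is a made-up stand-in for the real output of `get_attribute('innerHTML')`.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every class name that appears on any start tag."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names found in an HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Made-up stand-in for the dumped inner HTML of the #app element.
dumped = (
    '<div class="logged-out free">'
    '<div class="client-components-experts-infoTable-expertTable__isBuy">'
    '<div><span>$14.00</span></div></div></div>'
)

found = classes_in(dumped)
wanted = "client-components-experts-infoTable-expertTable__isBuy"
print(wanted in found)
```

If the wanted class is missing from the dump, the element was probably not rendered yet at the moment the HTML was captured, which points to a timing problem rather than a wrong XPath.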
