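The purpose of this code is to scrape a multi-page data table from a specific URL. However, it does not work for anything beyond the first row.
Here is the code: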
from selenium import webdriver

class DataEngine:
    def __init__(self):
        self.url = 'https://www.investing.com/economic-calendar/house-price-index-147'
        self.driver = webdriver.PhantomJS(r"D:\Projects\Tutorial\Driver\phantomjs-2.1.1-windows\bin\phantomjs.exe")

    def title(self):
        self.driver.get(self.url)
        title = self.driver.find_elements_by_xpath('//*[@id="leftColumn"]/h1')
        for title in title:
            print(title.text)

    def table(self):
        self.driver.get(self.url)
        while True:
            table = self.driver.find_elements_by_xpath('//*[@id="historicEvent_372690"]')
            for table in table:
                print(table.text)
To make sure the code scrapes every row on the page, change the XPath from
//*[@id="historicEvent_372690"]
to
//*[contains(@id,"historicEvent_")]
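The XPath you are currently using matches only the first row, because it targets one specific id. The XPath I shared uses the contains() function to find all elements whose id contains historicEvent_.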
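For illustration, here is a minimal sketch of how the updated XPath might be used. The names ROW_XPATH and scrape_rows are hypothetical and not part of the original code. The sketch assumes the driver object exposes get() and the generic find_elements(by, value) call, where the string "xpath" is the value of Selenium's By.XPATH. It also drops the while True loop, on the assumption that one pass over the loaded page is enough.

```python
# Hypothetical helper: ROW_XPATH and scrape_rows are illustrative names,
# not part of the original question or of Selenium itself.
ROW_XPATH = '//*[contains(@id, "historicEvent_")]'


def scrape_rows(driver, url):
    """Load the page once and return the text of every matching row."""
    driver.get(url)
    # "xpath" is the string behind Selenium's By.XPATH; passing it to the
    # generic find_elements(by, value) call avoids the version-specific
    # find_elements_by_xpath helper.
    rows = driver.find_elements("xpath", ROW_XPATH)
    # One pass over the matches; no `while True`, so the function returns.
    return [row.text for row in rows]
```

With the class from the question, this could be called as scrape_rows(self.driver, self.url) and the returned list printed row by row. Rows that the site only loads after clicking through to further pages would still need separate handling.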