Selenium can't find the element I want on Forex Factory

Question (votes: 0, answers: 1)

I can't seem to get the high-impact news from Forex Factory. I need someone to check my code; I think I have coded something wrong.

Here is my code:

from selenium import webdriver
import chromedriver_autoinstaller
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Automatically install the correct version of ChromeDriver
chromedriver_autoinstaller.install()

# Set up Chrome options
chrome_options = Options()
chrome_options.add_argument("--headless")  # Runs Chrome in headless mode.
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Provide the path to the ChromeDriver executable
chrome_driver_path = r'C:\Users\smart\Documents\chromedriver.exe'  # Ensure this is the correct path to chromedriver.exe

# Set up the Chrome driver service
service = Service(chrome_driver_path)

# Initialize the WebDriver
driver = webdriver.Chrome(service=service, options=chrome_options)

# Set an implicit wait of 10 seconds
driver.implicitly_wait(10)

# Open the Forex Factory calendar page for today
driver.get('https://www.forexfactory.com/calendar?day=jun9.2024')

try:
    # Find all <tr> elements representing high impact news
    high_impact_news = driver.find_elements(By.XPATH, "//tr[.//span[@title='High Impact Expected' and contains(@class, 'icon--ff-impact-red')]]")

    # Process each high impact news element
    for element in high_impact_news:
        # Extract information from the element as needed
        # Example: Get the event title
        event_title = element.find_element(By.CLASS_NAME, "calendar__event-title").text
        print("Event Title:", event_title)
        # Example: Get the event date
        event_date = element.find_element(By.CLASS_NAME, "date").text
        print("Event Date:", event_date)
        # Add more attributes or actions as required

   
    print(f"Found {len(high_impact_news)} high impact news events")

    if high_impact_news:
        print("High impact news found")

    if not high_impact_news:
        print("No high impact news found")
finally:
    # Close the browser
    driver.quit()

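I ran it through ChatGPT and added some of its suggestions, and I also went through a lot of YouTube tutorials. I want the code to print that 1 high-impact news event was found, along with the time shown for that event on the page. I haven't coded the time part yet because I'm still struggling to get the news itself. This is the element I copied from Forex Factory; I think the high-impact icon sits inside a script on the page.

So when the red icon appears (high-impact news), I want it to print that there is high-impact news.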

selenium-webdriver web-scraping selenium-chromedriver forex
1 answer
0 votes

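This can be a bit more complicated if there are multiple high-impact events on the same day. See June 18, 2024, for example. In that case the date and/or time applies to several rows, and you have to work a bit harder to get that data.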
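The idea of sharing a date or time across several rows can be sketched without a browser. This is only an illustration: the `rows` data and the `fill_spanned_cells` helper are hypothetical, and `None` stands in for a cell that is missing because an earlier row already carries the value.

```python
# Hypothetical rows as (date, time, title); None marks a cell that is
# absent because an earlier row's value also applies to this row.
rows = [
    ("Tue Jun 18", "5:30am", "Cash Rate"),
    (None, None, "RBA Rate Statement"),
    (None, "1:30pm", "Core Retail Sales m/m"),
    (None, None, "Retail Sales m/m"),
]


def fill_spanned_cells(rows):
    """Carry the last seen date and time forward into rows that lack them."""
    filled = []
    last_date = None
    last_time = None
    for date, time_text, title in rows:
        if date is not None:
            last_date = date
        if time_text is not None:
            last_time = time_text
        filled.append((last_date, last_time, title))
    return filled


for date, time_text, title in fill_spanned_cells(rows):
    print(title, date, time_text)
```

The Selenium code below gets the same effect with XPath `preceding-sibling` lookups instead of carrying values forward in a loop.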

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")

driver = webdriver.Remote(
  "http://127.0.0.1:4444/wd/hub",
  options=options
)

driver.get('https://www.forexfactory.com/calendar?day=jun18.2024')

try:
    high_impact_news = driver.find_elements(By.XPATH, "//tr[.//span[contains(@class, 'icon--ff-impact-red')]]")

    for row in high_impact_news:
        print("==================================================")
        date = row.find_element(By.XPATH, "preceding-sibling::tr[.//td[@rowspan]][1]//span[@class='date']").text
        title = row.find_element(By.XPATH, ".//span[@class='calendar__event-title']").text

        try:
            # Time in same row.
            time = row.find_element(By.XPATH, ".//td[@class='calendar__cell calendar__time']//span")
        except Exception:
            # Time in spanning row.
            time = row.find_element(By.XPATH, "preceding-sibling::tr[.//td[@class='calendar__cell calendar__time']][1]//span[not(contains(@class, 'icon'))]")

        # Remove line break from date.
        date = date.replace("\n", " ")

        print(title)
        print(date)
        print(time.text)

    print(f"Found {len(high_impact_news)} high impact news events")

    if high_impact_news:
        print("High impact news found")
    else:
        print("No high impact news found")
finally:
    driver.quit()

🚨 I'm using a remote Selenium driver, but you can revert to your own setup that launches the browser locally.

Output:
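For switching between the two setups, a minimal sketch follows. It assumes Selenium 4.6 or newer, where `webdriver.Chrome(options=...)` can locate a matching driver on its own, and a recent Chrome that accepts `--headless=new`. The helper names `build_chrome_args` and `make_driver` are hypothetical, not part of Selenium.

```python
def build_chrome_args(headless=True):
    """Return the Chrome command-line arguments used in this thread."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        # Assumption: a recent Chrome that supports the new headless mode.
        args.insert(0, "--headless=new")
    return args


def make_driver(remote_url=None, headless=True):
    """Create a remote driver if remote_url is given, else a local Chrome."""
    # Imported here so build_chrome_args works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_args(headless):
        options.add_argument(arg)

    if remote_url:
        return webdriver.Remote(command_executor=remote_url, options=options)
    return webdriver.Chrome(options=options)
```

Calling `make_driver()` would launch a local headless Chrome, while `make_driver("http://127.0.0.1:4444/wd/hub")` would connect to the remote endpoint used in the answer.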

==================================================
Cash Rate
Tue Jun 18
5:30am
==================================================
RBA Rate Statement
Tue Jun 18
5:30am
==================================================
Core Retail Sales m/m
Tue Jun 18
1:30pm
==================================================
Retail Sales m/m
Tue Jun 18
1:30pm
Found 4 high impact news events
High impact news found

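It's also worth noting that if you want to extract this data from the weekly or monthly view, you will need to scroll down to make sure all the content is visible; otherwise you will only get the data above the fold.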
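One common way to do that scrolling is to keep jumping to the bottom of the page until its height stops growing. The sketch below assumes the calendar loads more rows as you scroll, which I have not verified for every view. `scroll_to_bottom` is a hypothetical helper, and the `pause` and `max_rounds` values are arbitrary.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing; return the scroll count."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    rounds = 0
    while rounds < max_rounds:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        rounds += 1
        # Give the page a moment to load any additional rows.
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height
    return rounds
```

You would call `scroll_to_bottom(driver)` after `driver.get(...)` and before `driver.find_elements(...)`, so the row lookup sees everything that has been loaded.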
