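I am trying to scrape conference websites from this list of meetings: https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/meetings/archive/#2022
Reaching each website requires Selenium to click "Details" first. Unfortunately, my code does not work.
Here is what I have tried: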
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# Setup Chrome WebDriver
options = webdriver.ChromeOptions()
options.add_argument('--headless') # Run Chrome in headless mode
driver = webdriver.Chrome(options=options)
# Navigate to the webpage
driver.get("https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/en/meetings/archive/#2022")
# Wait for the element to be clickable
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="archive_meetings_list"]/li/div/div/details/summary')))
# Click the "Details" summary element
details_button = driver.find_element(By.XPATH, '//*[@id="archive_meetings_list"]/li/div/div/details/summary')
details_button.click()
However, I get a TimeoutException because Selenium cannot find the element specified by the XPath.
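The list of conference websites is populated on the page by an API call:
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/meetings/meetings?year=2022
If you want the data behind each "Details" section, you can query that endpoint directly: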
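import requests

url_base = 'https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/meetings/meetings?year='
conference_info = []
# Reuse one session so the connection is shared across requests
sess = requests.Session()
# range(2015, 2025) covers the years 2015 through 2024
for year in range(2015, 2025):
    res = sess.get(f'{url_base}{year}')
    res.raise_for_status()  # stop early on an HTTP error
    # Each response is a JSON list of meeting records for that year
    conf_list = res.json()
    conference_info.extend(conf_list)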
The resulting data looks like this:
[
  {
    "address": "",
    "keywords": "Observacional, Astronomia, Astronomy, Escuela, Astronomy School",
    "start": "2015-01-11",
    "web2": "",
    "web1": "http://www.astroscu.unam.mx/cursos/esaobela/",
    "title": "Escuela de Astronomía Observacional para Estudiantes Latinoamericanos.",
    "meetingNumber": "4414",
    "ingested": "Jun 24, 2014 06:26:00 AM",
    "phone": "",
    "contact": "Jose H. Peña",
    "modified": "Jun 24, 2014 06:26:00 AM",
    "location": "Tonantzintla, Puebla. México.",
    "end": "2015-01-30",
    "fax": "",
    "bibCode": "",
    "email": __removed__
  },
  {
    "address": "Centre for Theoretical Atomic, Molecular and Optical Physics (CTAMOP) School of Mathematics and Physics Queens University Belfast David Bates Bldg, Room 01.016 7 College Park Belfast BT7 1NN, UK",
    "keywords": "quantitative spectroscopy, emission lines, Active Nuclei, Starburst galaxies, H II regions, star formation",
    "start": "2015-01-12",
    "web2": "",
    "web1": "http://cloud9.pa.uky.edu/~gary/cloudy/CloudySummerSchool/",
    "title": "Cloudy workshop 2015 Jan 12-16 at Queen's University Belfast",
    "meetingNumber": "4440",
    "ingested": "Aug 03, 2014 10:13:00 AM",
    "phone": "",
    "contact": "Gary Ferland",
    "modified": "Aug 03, 2014 10:13:00 AM",
    "location": "Queen's University Belfast, NI, UK",
    "end": "2015-01-16",
    "fax": "",
    "bibCode": "",
    "email": __removed__
  },
  ...
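
Since the goal is the conference websites themselves, the records can be reduced to just their URLs. The following is a minimal sketch, assuming the site links live in the "web1" and "web2" fields as they do in the sample above. The helper name extract_websites is hypothetical, and the two sample records are trimmed copies of the entries shown; in practice you would pass in the list returned by the API.

```python
def extract_websites(conferences):
    """Collect the non-empty website URLs from a list of meeting records."""
    websites = []
    for conf in conferences:
        # Assumption: "web1" and "web2" are the only URL fields,
        # as seen in the sample records.
        for key in ("web1", "web2"):
            url = (conf.get(key) or "").strip()
            # Skip empty values and URLs already collected
            if url and url not in websites:
                websites.append(url)
    return websites


# Two trimmed records copied from the sample output
sample = [
    {
        "title": "Escuela de Astronomía Observacional para Estudiantes Latinoamericanos.",
        "web1": "http://www.astroscu.unam.mx/cursos/esaobela/",
        "web2": "",
    },
    {
        "title": "Cloudy workshop 2015 Jan 12-16 at Queen's University Belfast",
        "web1": "http://cloud9.pa.uky.edu/~gary/cloudy/CloudySummerSchool/",
        "web2": "",
    },
]

for url in extract_websites(sample):
    print(url)
```

Using `.get()` with a fallback keeps the helper from failing on records where a field is missing or null, which may or may not occur in the real data.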