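I'm trying to scrape event links and contact information from the RaceRoster website (https://raceroster.com/search?q=5k&t=upcoming) using Python, requests, Pandas, and BeautifulSoup. The goal is to extract the event name, event URL, contact name, and email address for each event and save the data to an Excel file, so that we can reach out to these events' organizers for business development.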
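However, the script always reports that no event links are found on the search results page, even though the links are visible when I inspect the HTML in the browser. Here is the relevant HTML for an event link on the search results page: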
<a href="https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k"
target="_blank"
rel="noopener noreferrer"
class="search-results__card-event-name">
13th Annual Delaware Tech Chocolate Run 5k
</a>
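The selector itself can be checked in isolation by running it against the inspected markup directly. A minimal sketch, assuming BeautifulSoup 4 is installed; `SNIPPET` and `extract_event_links` are hypothetical names. If this finds the link but the same `select()` call finds nothing in `response.content`, the markup is missing from what the server returns, rather than the selector being wrong.

```python
from bs4 import BeautifulSoup

# The event-link markup as seen in the browser's inspector (copied from above)
SNIPPET = """
<a href="https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k"
   target="_blank"
   rel="noopener noreferrer"
   class="search-results__card-event-name">
  13th Annual Delaware Tech Chocolate Run 5k
</a>
"""


def extract_event_links(html: str) -> list[tuple[str, str]]:
    """Return (name, url) pairs for every event-name anchor in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a["href"])  # strip the surrounding whitespace/newlines
        for a in soup.select("a.search-results__card-event-name")
    ]


if __name__ == "__main__":
    for name, url in extract_event_links(SNIPPET):
        print(name, url)
```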
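Steps taken:
Selected the event links with soup.select("a.search-results__card-event-name").
Inspected the response content of the requests.get() call with soup.prettify(). The HTML seems to be missing the event links that are visible in the browser, which suggests the content may be loaded dynamically via JavaScript.
Tried scraping the data with BeautifulSoup, but always got: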
Found 0 events on the page.
Scraped 0 events.
No contacts were scraped.
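The `prettify()` inspection above can be made explicit by counting matching anchors in the raw server HTML instead of reading it by eye. A sketch under stated assumptions: it swaps in the standard library's `urllib.request` and `html.parser` for requests/BeautifulSoup so it runs without extra installs, and `AnchorClassCounter`, `count_event_anchors`, and `fetch_raw_html` are hypothetical names. A count of 0 while the browser shows links is consistent with the links being injected by JavaScript after page load.

```python
import urllib.request
from html.parser import HTMLParser

TARGET_CLASS = "search-results__card-event-name"


class AnchorClassCounter(HTMLParser):
    """Counts <a> tags whose class attribute contains a given class name."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None (e.g. <a class>)
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_event_anchors(html: str, class_name: str = TARGET_CLASS) -> int:
    parser = AnchorClassCounter(class_name)
    parser.feed(html)
    parser.close()
    return parser.count


def fetch_raw_html(url: str) -> str:
    # Returns only what the server sends; no JavaScript is executed here
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    raw = fetch_raw_html("https://raceroster.com/search?q=5k&t=upcoming")
    n = count_event_anchors(raw)
    print(f"{n} event anchors in the raw HTML")
    if n == 0:
        print("Links are likely rendered client-side; check the browser's Network tab for a JSON request.")
```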
What I need help with:
Current script:
import requests
from bs4 import BeautifulSoup
import pandas as pd


def scrape_event_contacts(base_url, search_url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    event_contacts = []

    # Fetch the main search page
    print(f"Scraping page: {search_url}")
    response = requests.get(search_url, headers=headers)
    if response.status_code != 200:
        print(f"Failed to fetch page: {search_url}, status code: {response.status_code}")
        return event_contacts

    soup = BeautifulSoup(response.content, "html.parser")

    # Select event links
    event_links = soup.select("a.search-results__card-event-name")
    print(f"Found {len(event_links)} events on the page.")

    for link in event_links:
        event_url = link['href']
        event_name = link.text.strip()  # Extract Event Name
        try:
            print(f"Scraping event: {event_url}")
            event_response = requests.get(event_url, headers=headers)
            if event_response.status_code != 200:
                print(f"Failed to fetch event page: {event_url}, status code: {event_response.status_code}")
                continue

            event_soup = BeautifulSoup(event_response.content, "html.parser")

            # Extract contact name and email
            contact_name = event_soup.find("dd", class_="event-details__contact-list-definition")
            email = event_soup.find("a", href=lambda href: href and "mailto:" in href)

            contact_name_text = contact_name.text.strip() if contact_name else "N/A"
            email_address = email['href'].split("mailto:")[1].split("?")[0] if email else "N/A"

            if contact_name or email:
                print(f"Found contact: {contact_name_text}, email: {email_address}")
                event_contacts.append({
                    "Event Name": event_name,
                    "Event URL": event_url,
                    "Event Contact": contact_name_text,
                    "Email": email_address
                })
            else:
                print(f"No contact information found for {event_url}")
        except Exception as e:
            print(f"Error scraping event {event_url}: {e}")

    print(f"Scraped {len(event_contacts)} events.")
    return event_contacts


def save_to_spreadsheet(data, output_file):
    if not data:
        print("No data to save.")
        return
    df = pd.DataFrame(data)
    df.to_excel(output_file, index=False)
    print(f"Data saved to {output_file}")


if __name__ == "__main__":
    base_url = "https://raceroster.com"
    search_url = "https://raceroster.com/search?q=5k&t=upcoming"
    output_file = "/Users/my_name/Documents/event_contacts.xlsx"

    contact_data = scrape_event_contacts(base_url, search_url)
    if contact_data:
        save_to_spreadsheet(contact_data, output_file)
    else:
        print("No contacts were scraped.")
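Separately from the search-page problem, the per-event email extraction relies on `split("mailto:")[1].split("?")[0]`, which leaves percent-encoded characters (such as `%40`) in place and only matches a lowercase `mailto:`. A small standard-library helper that could replace that line; `email_from_mailto` is a hypothetical name, and whether RaceRoster event pages actually expose contact details in their static HTML has not been verified here.

```python
from typing import Optional
from urllib.parse import unquote


def email_from_mailto(href: Optional[str]) -> Optional[str]:
    """Return the address part of a mailto: link, or None if href isn't one."""
    if not href or not href.lower().startswith("mailto:"):
        return None
    address = href[len("mailto:"):].split("?", 1)[0]  # drop ?subject=... and similar
    address = unquote(address).strip()               # decode %40 and other escapes
    return address or None                            # bare "mailto:" -> None
```

Inside the loop, the existing line could then become `email_address = (email_from_mailto(email["href"]) if email else None) or "N/A"`.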
Expected result:
Use the API endpoint to fetch the data for upcoming events.
Here's how:
import requests
from tabulate import tabulate

url = 'https://search.raceroster.com/search?q=5k&t=upcoming'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
}

events = requests.get(url, headers=headers).json()['data']

table = [
    [event["name"], event["url"]] for event in events
]
print(tabulate(table, headers=["Name", "URL"]))
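To feed the API results back into the original goal, the JSON can be mapped to the same row dictionaries that the question's `save_to_spreadsheet()` already accepts. A minimal sketch, assuming (as in the code above) that the response has a `data` list whose items carry `name` and `url`; it uses the standard library's `urllib.request` in place of requests, and `events_to_rows` and `fetch_upcoming` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://search.raceroster.com/search"  # endpoint used in the code above
HEADERS = {
    # Same User-Agent the code above sends
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}


def events_to_rows(payload: dict) -> list[dict]:
    """Map the API's JSON into the row format used by save_to_spreadsheet()."""
    # Assumes payload["data"] is a list of objects with "name" and "url" keys;
    # any other fields are ignored.
    return [
        {"Event Name": event.get("name", "N/A"), "Event URL": event.get("url", "N/A")}
        for event in payload.get("data", [])
    ]


def fetch_upcoming(query: str) -> dict:
    """GET the search API for upcoming events matching `query` and decode the JSON."""
    params = urllib.parse.urlencode({"q": query, "t": "upcoming"})
    req = urllib.request.Request(f"{API_URL}?{params}", headers=HEADERS)
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    rows = events_to_rows(fetch_upcoming("5k"))
    print(f"Found {len(rows)} events via the API.")
    for row in rows:
        print(row["Event Name"], row["Event URL"])
```

Each `Event URL` could then go through the per-event contact lookup from the original script before calling `save_to_spreadsheet(rows, output_file)`. Whether the API paginates its results is not known here, so only the first response is handled.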
This should print:
Name                                         URL
-------------------------------------------  ------------------------------------------------------------------------------------------
Credit Union Cherry Blossom                  https://raceroster.com/events/2025/72646/credit-union-cherry-blossom
Big Cork Wine Run 5k                         https://raceroster.com/events/2025/98998/big-cork-wine-run-5k
3rd Annual #OptOutside Black Friday Fun Run  https://raceroster.com/events/2025/98146/3rd-annual-number-optoutside-black-friday-fun-run
Ryan's Race 5K walk Run                      https://raceroster.com/events/2025/97852/ryans-race-5k-walk-run
13th Annual Delaware Tech Chocolate Run 5k   https://raceroster.com/events/2025/98542/13th-annual-delaware-tech-chocolate-run-5k
Builders Dash 5k                             https://raceroster.com/events/2025/99146/builders-dash-5k
The Ivy Scholarship 5k                       https://raceroster.com/events/2025/96874/the-ivy-scholarship-5k
39th Firecracker 5k Run Walk                 https://raceroster.com/events/2025/96907/39th-firecracker-5k-run-walk
24th Annual John D Kelly Logan House 5k      https://raceroster.com/events/2025/97364/24th-annual-john-d-kelly-logan-house-5k
2nd Annual Scott Trot 5K                     https://raceroster.com/events/2025/96904/2nd-annual-scott-trot-5k