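I get this error when I run my code from inside VS Code. As I understand it, it is complaining about an & symbol that does not exist in my code, or is it something else?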
& C:/Users/iwanh/AppData/Local/Programs/Python/Python310/python.exe "c:/Users/iwanh/AppData/Local/Programs/Giannis' Programs/bbc_news.py"
  File "<stdin>", line 1
    & C:/Users/iwanh/AppData/Local/Programs/Python/Python310/python.exe "c:/Users/iwanh/AppData/Local/Programs/Giannis' Programs/bbc_news.py"
    ^
SyntaxError: invalid syntax
But if I run the .py file from cmd with
python.exe bbc_news.py
it runs as expected. I am currently running Python 3.10.8, and VS Code uses the same version.
For reference, this is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd


def scrape_bbc_news(url):
    try:
        # Send a GET request to the URL
        response = requests.get(url)
        response.raise_for_status()  # Raise an exception for bad response status

        # Parse the HTML content
        soup = BeautifulSoup(response.content, 'html.parser')

        # Initialize lists to store headlines, descriptions, and links
        headlines = []
        descriptions = []
        links = []

        # Find all containers that hold both headline, description, and link
        article_containers = []

        # First type of container
        containers_type1 = soup.find_all('div', class_='sc-f98732b0-3 ephYtw')
        article_containers.extend(containers_type1)

        for container in article_containers:
            # Check if the article contains a "live" icon
            live_icon = container.find('svg', class_='sc-3387039d-0 hgdstu sc-1097f7fe-0 jmthjj')
            if live_icon:
                continue  # Skip this article if it contains a live icon

            # Extract headline
            headline = container.find('h2', class_='sc-4fedabc7-3 dsoipF').text.strip()
            headlines.append(headline)

            # Extract description
            description = container.find('p', class_='sc-f98732b0-0 iQbkqW').text.strip()
            descriptions.append(description)

            # Extract link; test the same tag that the href is read from
            link_tag = container.find('a', class_='sc-2e6baa30-0 gILusN')
            link = link_tag['href'] if link_tag else ''
            links.append(link)

        # Check if lengths of headlines, descriptions, and links match
        if len(headlines) != len(descriptions) or len(headlines) != len(links):
            raise ValueError("Number of headlines, descriptions, and links do not match")

        # Create a DataFrame from the extracted data
        df = pd.DataFrame({
            'headline': headlines,
            'description': descriptions,
            'link': links
        })
        return df
    except requests.exceptions.RequestException as e:
        print(f"Error fetching data: {e}")
        return None
    except ValueError as ve:
        print(f"ValueError: {ve}")
        return None


# Example usage:
url = "https://www.bbc.com/news"
df = scrape_bbc_news(url)
if df is not None:
    print("Headlines, descriptions, and links scraped successfully:")
    print(df.head())
    df.to_csv('bbc_news_headlines.csv', index=False, encoding='utf-8')
    print("Data saved to 'bbc_news_headlines.csv' successfully.")
else:
    print("Failed to scrape data.")
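The error message you are seeing comes from Python, not from PowerShell: the File "<stdin>" line shows that the command was read by an interactive Python interpreter. The command itself is meant to be executed in PowerShell, where the leading & is the call operator and has nothing to do with your code.
If you enter the command manually, do so in a terminal window and make sure the current prompt starts with PS (PowerShell). If the prompt is >>> instead, a Python session is still running in that terminal; leave it with exit() first.
Running the code with the play button (top right) should open a PowerShell session to run the command.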
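
To see why Python rejects that line, here is a minimal sketch that feeds the same command text to Python's parser. The helper name is_valid_python is made up for this illustration, and ast.parse is used only as a stand-in for what the interactive interpreter does when the line lands in it; it does not reproduce VS Code's behavior.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python code (hypothetical helper)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The PowerShell command from the question, kept as plain text.
vscode_command = (
    "& C:/Users/iwanh/AppData/Local/Programs/Python/Python310/python.exe "
    "\"c:/Users/iwanh/AppData/Local/Programs/Giannis' Programs/bbc_news.py\""
)

# A line starting with & is not a valid Python statement, so parsing fails.
print(is_valid_python(vscode_command))    # False
# An ordinary Python statement parses fine.
print(is_valid_python("print('hello')"))  # True
```

So the script itself is not at fault: the same text is a valid command in PowerShell and a syntax error in Python, which is why it matters which program is reading the terminal input.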