I am getting this error: InvalidSchema("No connection adapters were found for {!r}".format(url))
when I try to run this code:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# None shows full cell contents; -1 is deprecated since pandas 1.0
pd.set_option('display.max_colwidth', None)
url_file = 'https://github.com/MarissaFosse/ryersoncapstone/raw/master/DailyNewsArticles.xlsx'
tstar_articles = pd.read_excel(url_file, "TorontoStar Articles", header=0)

# Map each article URL to its extracted text
url_to_sents = {}
for url in tstar_articles:
    url = tstar_articles['URL']
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find(class_='c-article-body__content')
    results_text = [tag.get_text().strip() for tag in results]
    # Keep single-line fragments that contain a period
    sentence_list = [sentence for sentence in results_text if not '\n' in sentence]
    sentence_list = [sentence for sentence in sentence_list if '.' in sentence]
    article = ' '.join(sentence_list)
    url_to_sents[url] = article
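I am trying to use requests.get() to read URLs from an Excel file I created. I suspect this is caused by invisible characters, but I don't know how to check for them.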
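One way to test the invisible-character suspicion is to print a value with repr(), which shows whitespace and zero-width characters as escape sequences, and to list any control, format, or separator characters it contains. The following is a minimal sketch using only the standard library. The helper name find_hidden_chars and the sample string are hypothetical and do not come from the code above.

```python
import unicodedata


def find_hidden_chars(text):
    """Return (index, codepoint, name) for each control, format or separator character."""
    hidden = []
    for i, ch in enumerate(text):
        # Cc = control, Cf = format (e.g. zero-width space), Z* = space/line separators
        cat = unicodedata.category(ch)
        if cat in ("Cc", "Cf") or cat.startswith("Z"):
            hidden.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hidden


# Made-up URL with a leading zero-width space and a trailing newline
sample = "\u200bhttps://example.com/article\n"

print(repr(sample))               # escapes make the hidden characters visible
print(type(sample))               # requests.get() expects a single str here
print(find_hidden_chars(sample))  # positions and names of suspicious characters
```

Printing type(url) just before the requests.get(url) call is also worth doing, since it shows whether the value being passed is a single string at all.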
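Change:
for url in tstar_articles:
    url = tstar_articles['URL']
    page = requests.get(url)
to:
for url in tstar_articles['URL']:
    page = requests.get(url)
Iterating over a DataFrame yields its column labels, and the first version then overwrites url with the entire 'URL' column (a Series), so requests.get() receives a Series rather than a single URL string. Looping over tstar_articles['URL'] passes one URL at a time.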
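The difference between the two loops can be seen on a small stand-in table without making any network requests. This is only a sketch, assuming pandas is installed. The DataFrame below is made up and merely mimics the 'URL' column of the spreadsheet.

```python
import pandas as pd

# Hypothetical stand-in for the "TorontoStar Articles" sheet
df = pd.DataFrame({
    "Title": ["First", "Second"],
    "URL": ["https://example.com/a", "https://example.com/b"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
column_labels = [item for item in df]

# df["URL"] is the whole column (a Series), not a single URL string.
whole_column = df["URL"]

# Iterating over the column yields one URL string at a time.
urls = [url for url in df["URL"]]

print(column_labels)
print(type(whole_column).__name__)
print(urls)
```

Each element of urls is a plain str, which is the kind of value requests.get() can match against its "http://" and "https://" connection adapters.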