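I have a list of tweet URLs in an Excel file. Is it possible to extract the text of those tweets (from the URLs) in Python and then save it in the Excel sheet?
I saw someone use the code below, but it only works for one URL.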
from lxml import html
import requests
page = requests.get('https://twitter.com/realDonaldTrump/status/1237448419284783105')
tree = html.fromstring(page.content)
tree.xpath('//div[contains(@class, "permalink-tweet-container")]//p[contains(@class, "tweet-text")]//text()')
The Excel file contains two columns: Author and URL. The file ('twitter.xlsx') looks like this:
Author URL
realDon.. https://twitter.com/realDon..
. .
. .
. .
I tried this code:
import pandas as pd
from lxml import html
import requests
input_data = pd.read_excel('twitter.xlsx')
input_data1 = input_data[['URL']]
tweets = []
for url in input_data1.values:
    x = requests.get(url)
    tree = html.fromstring(x.content)
    i = tree.xpath('//div[contains(@class, "permalink-tweet container")]//p[contains(@class, "tweet-text")]//text()')
    tweets.append(i)
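Error: InvalidSchema: No connection adapters were found for "['https://twitter.com/realDonaldTrump/status/1237448419284783105']"

Short answer: yes.
Long answer: yes, it is possible. I suggest you do some reading on this topic.
The openpyxl library is your friend for working with Excel files; see its documentation. requests is a great library for accessing websites; see its documentation as well.
Here is pseudocode that models the program logic: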
input_data = read(excel_file)
tweets = []
for url in input_data:
    x = get(url)
    tweets.append(x)
for tweet in tweets:
    write(tweet, excel_file)
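The pseudocode above could be turned into Python roughly as follows. Treat this as a sketch under assumptions, not a definitive solution: the file name, the URL column, and the XPath come from the question, while the function names, the "Text" column, and the output file name are made up for illustration. It also assumes the page still serves the markup that the XPath targets, which may no longer be true for Twitter; in that case the extracted text is simply empty. The InvalidSchema error in the question most likely arises because iterating over input_data[['URL']].values yields one-element arrays rather than strings, so this sketch iterates over the column itself.

```python
# Sketch only: function names, the "Text" column and the output path are hypothetical.

TWEET_XPATH = (
    '//div[contains(@class, "permalink-tweet-container")]'
    '//p[contains(@class, "tweet-text")]//text()'
)


def fetch_page(url):
    """Download one page and return its raw bytes."""
    import requests  # third-party; imported here so collect_texts has no hard dependency

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


def extract_tweet_text(page_content):
    """Join the text nodes matched by the question's XPath ('' if nothing matches)."""
    from lxml import html  # third-party

    tree = html.fromstring(page_content)
    return "".join(tree.xpath(TWEET_XPATH)).strip()


def collect_texts(urls, fetch=fetch_page, extract=extract_tweet_text):
    """Return one extracted text per URL, in the same order as the input."""
    texts = []
    for url in urls:
        texts.append(extract(fetch(url)))
    return texts


def add_texts_to_workbook(excel_path="twitter.xlsx",
                          output_path="twitter_with_text.xlsx"):
    """Read the URL column, fetch each tweet, and save a copy with a Text column."""
    import pandas as pd  # third-party; uses openpyxl for .xlsx files

    data = pd.read_excel(excel_path)
    # data["URL"] iterates over plain strings; data[["URL"]].values would
    # iterate over one-element arrays, which requests cannot use as a URL.
    data["Text"] = collect_texts(data["URL"])
    data.to_excel(output_path, index=False)
```

Calling add_texts_to_workbook() would then read twitter.xlsx and write the result to a separate file, leaving the original workbook untouched. The fetch and extract steps are passed in as parameters so the loop can be tried with stand-in functions before making real requests.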