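I am trying to scrape tweets from a user-defined list of Twitter profiles. After reading previous posts, I understand that the Twitter JSON has an extended tweet section. I have added tweet_mode='extended' to my api.user_timeline call and changed .text to .full_text.
However, I am still getting truncated tweets. I know that retweets have a full_text attribute, but I am pulling the whole timeline rather than separating tweets from retweets.
Is there a way to query tweets universally and retrieve the full_text version? My code is below.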
    # `api` is assumed to be an already-authenticated tweepy.API instance
    screen_name_list = ['@x']
    for name in screen_name_list:
        user = api.get_user(name)
        # initialize a list to hold all the tweepy Tweets
        alltweets = []
        # make initial request for most recent tweets (200 is the maximum allowed count)
        new_tweets = api.user_timeline(screen_name=name, count=200, tweet_mode='extended', include_rts=True)
        # save most recent tweets
        alltweets.extend(new_tweets)
        # save the id of the oldest tweet less one
        oldest = alltweets[-1].id - 1
        # keep grabbing tweets until there are no tweets left to grab
        while len(new_tweets) > 0:
            print('getting tweets before %s' % oldest)
            # all subsequent requests use the max_id param to prevent duplicates
            new_tweets = api.user_timeline(screen_name=name, count=200, max_id=oldest, tweet_mode='extended')
            # save most recent tweets
            alltweets.extend(new_tweets)
            # update the id of the oldest tweet less one
            oldest = alltweets[-1].id - 1
            print('...%s tweets downloaded so far' % len(alltweets))
        # transform the tweepy tweets into a 2D array that will populate the csv
        # (.encode('utf-8') is for the Python 2 csv module; drop it on Python 3)
        outtweets = [[tweet.id_str, tweet.created_at, tweet.full_text.encode('utf-8')] for tweet in alltweets]
        tweet_time = [row[1] for row in outtweets]
        tweet_list = [row[2] for row in outtweets]
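If you replace
    tweet.full_text
with
    tweet.retweeted_status.full_text if tweet.full_text.startswith("RT @") else tweet.full_text
you will get the full text of retweets, though without the leading "RT", so you may also want to add another column to the CSV to flag retweets, for example:
    [1 if tweet.full_text.startswith("RT @") else 0 for tweet in alltweets]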
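
A variation on the same idea, offered as a sketch rather than the definitive fix: instead of testing the text prefix, check whether the tweet object actually carries a retweeted_status attribute, which should only be present on native retweets. A tweet typed by hand that merely starts with "RT @" would otherwise be treated as a retweet. The helper names full_text_of and is_retweet are hypothetical, and the SimpleNamespace objects are stand-ins that mimic only the attributes used here, not real tweepy objects.

```python
from types import SimpleNamespace


def full_text_of(tweet):
    """Return the untruncated text of a tweet fetched with tweet_mode='extended'."""
    # Native retweets carry the original tweet in `retweeted_status`; the outer
    # full_text is the "RT @user: ..." form, which can still be cut short.
    original = getattr(tweet, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return tweet.full_text


def is_retweet(tweet):
    """Return 1 for a native retweet and 0 otherwise (for a CSV flag column)."""
    return 1 if getattr(tweet, "retweeted_status", None) is not None else 0


# Stand-in objects for illustration only (assumed shape, not real API output).
plain = SimpleNamespace(id_str="1", full_text="RT @ is just text I typed myself")
retweet = SimpleNamespace(
    id_str="2",
    full_text="RT @someone: the complete original te...",
    retweeted_status=SimpleNamespace(full_text="the complete original text"),
)

# One row per tweet: id, full text, retweet flag.
rows = [[t.id_str, full_text_of(t), is_retweet(t)] for t in (plain, retweet)]
```

In the question's code, the same helpers could be dropped into the outtweets list comprehension in place of tweet.full_text, with is_retweet(tweet) appended as the extra column.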