Downloading tweets, but the hashtags are missing

Question (votes: 0, answers: 1)

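I'm trying to scrape all of one user's tweets, but the downloaded data is missing its hashtags. For example, this tweet should have 5 hashtags, but the data I downloaded looks like this: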

b'RT @gcosma1: Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #\xe2\x80\xa6'


Does anyone know why this happens? It has bothered me for a long time and I can't find a solution. Here is my code:

import tweepy
import csv
import json

consumer_key = 'XXX'
consumer_secret = 'XXX'
access_token = 'XXX'
access_token_secret = 'XXX'

def get_all_tweets(screen_name):
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    all_the_tweets = []
    new_tweets = api.user_timeline(screen_name=screen_name, count=200)
    all_the_tweets.extend(new_tweets)
    oldest_tweet = all_the_tweets[-1].id - 1

    t_no = 201
    while len(all_the_tweets) != t_no:
        new_tweets = api.user_timeline(screen_name=screen_name,count=200, max_id=oldest_tweet, tweet_mode="extended")
        t_no = len(all_the_tweets)
        all_the_tweets.extend(new_tweets)
        oldest_tweet = all_the_tweets[-1].id - 1
        print ('...%s tweets have been downloaded so far' % len(all_the_tweets))

    # transforming the tweets into a 2D array that will be used to populate the csv
    outtweets = [[tweet.id_str, tweet.created_at,
    tweet.text.encode('utf8')] for tweet in all_the_tweets]
    # writing to the csv file

    with open(screen_name + '_tweets.csv', 'w', encoding='utf8') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'created_at', 'text'])
        writer.writerows(outtweets)

if __name__ == '__main__':
    get_all_tweets(input("Enter the twitter handle of the person whose tweets you want to download:- "))

python twitter tweepy
1 Answer

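This seems to happen only with retweets. The text of the original tweet appears to contain all the hashtags. If you look at the original tweet, you'll see that it is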

Fantastic opportunity! PhD Studentship: Energy Prediction in Buildings using Artificial Intelligence\nthe_url #DeepLearning #MachineLearning #AI #DataScience #PhD the_url2'

So you can do something like this:

new_tweets = api.user_timeline(screen_name='gcosma1', count=200, tweet_mode="extended")

tweet_text = []
for tweet in new_tweets:
    # Check whether it is a retweet. If so, use the original tweet's text
    if hasattr(tweet, 'retweeted_status'):
        tweet_text.append(tweet.retweeted_status.full_text)
    else:
        tweet_text.append(tweet.full_text)

print(tweet_text)
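
Applied to the whole download loop from the question, the same idea might look like the sketch below. It assumes `api` is an authenticated `tweepy.API` object built as in the question; the helper names `full_text_of`, `fetch_timeline` and `write_csv` are made up for illustration and are not part of tweepy. Every request passes `tweet_mode="extended"` (the first request in the question does not), and the text is written as a `str` rather than encoded bytes, so the CSV should not contain `b'...'` literals.

```python
import csv


def full_text_of(status):
    """Return the untruncated text of a status fetched with tweet_mode="extended"."""
    # A retweet carries the original tweet in `retweeted_status`; that
    # object's full_text is used instead of the "RT @user: ..." wrapper text.
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return original.full_text
    return status.full_text


def fetch_timeline(api, screen_name, page_size=200):
    """Page backwards through a timeline until a request returns no tweets."""
    statuses = []
    max_id = None
    while True:
        kwargs = {"screen_name": screen_name, "count": page_size,
                  "tweet_mode": "extended"}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        if not page:
            break
        statuses.extend(page)
        # Ask only for tweets older than the oldest one seen so far.
        max_id = page[-1].id - 1
    return statuses


def write_csv(statuses, path):
    """Write id, creation time and full text of each status to a CSV file."""
    with open(path, "w", encoding="utf8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "created_at", "text"])
        for status in statuses:
            writer.writerow([status.id_str, status.created_at,
                             full_text_of(status)])
```

With these helpers, `write_csv(fetch_timeline(api, 'gcosma1'), 'gcosma1_tweets.csv')` would replace the body of `get_all_tweets`. Stopping on an empty page also avoids the `t_no = 201` bookkeeping used in the question.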