Cleaning tweets.csv in pandas: "TypeError: expected string or bytes-like object"


I have a large pandas DataFrame that looks like this (which is why it looks so messy):


<bound method NDFrame.head of          0  \
0       -1   
1       -1   
2       -1   
3       -1   
4       -1   
...     ..   
1599994  1   
1599995  1   
1599996  1   
1599997  1   
1599998  1   

       @switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer.  You shoulda got David Carr of Third Day to do it. ;D  
0        is upset that he can't update his Facebook by ...                                                                   
1        @Kenichan I dived many times for the ball. Man...                                                                   
2          my whole body feels itchy and like its on fire                                                                    
3        @nationwideclass no, it's not behaving at all....                                                                   
4                            @Kwesidei not the whole crew                                                                    
...                                                    ...                                                                   
1599994  Just woke up. Having no school is the best fee...                                                                   
1599995  TheWDB.com - Very cool to hear old Walt interv...                                                                   
1599996  Are you ready for your MoJo Makeover? Ask me f...                                                                   
1599997  Happy 38th Birthday to my boo of alll time!!! ...                                                                   
1599998  happy #charitytuesday @theNSPCC @SparksCharity...                                                                   

The first column holds the sentiment and the second column holds the tweets.

To clean the tweets, I copied the following function:

def cleanTxt(text):
 text = re.sub('@[A-Za-z0–9]+', '', text) #Removing @mentions
 text = re.sub('#', '', text) # Removing '#' hash tag
 text = re.sub('RT[\s]+', '', text) # Removing RT
 text = re.sub('https?:\/\/\S+', '', text) # Removing hyperlink

 return text

When I try to use it:

str(df.iloc[: 1])
df = df.iloc[: 1].apply(cleanTxt)
print(df.head)

I get the following error:


TypeError: expected string or bytes-like object

How can I fix this?

python python-3.x pandas dataframe twitter
1 Answer

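This can happen when re.sub receives a value that is not a string. You should double-check whether your strings contain any special or escape characters. I would simply print a few of the tweets and check them for problems.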
