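I have a pandas DataFrame that looks like this (it was printed with df.head rather than df.head(), which is why the output is messy):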
<bound method NDFrame.head of 0 \
0 -1
1 -1
2 -1
3 -1
4 -1
... ..
1599994 1
1599995 1
1599996 1
1599997 1
1599998 1
@switchfoot http://twitpic.com/2y1zl - Awww, that's a bummer. You shoulda got David Carr of Third Day to do it. ;D
0 is upset that he can't update his Facebook by ...
1 @Kenichan I dived many times for the ball. Man...
2 my whole body feels itchy and like its on fire
3 @nationwideclass no, it's not behaving at all....
4 @Kwesidei not the whole crew
... ...
1599994 Just woke up. Having no school is the best fee...
1599995 TheWDB.com - Very cool to hear old Walt interv...
1599996 Are you ready for your MoJo Makeover? Ask me f...
1599997 Happy 38th Birthday to my boo of alll time!!! ...
1599998 happy #charitytuesday @theNSPCC @SparksCharity...
The first column holds the sentiment and the second column holds the tweets.
To clean the tweets, I copied the following function:
import re

def cleanTxt(text):
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # Remove @mentions
    text = re.sub(r'#', '', text)  # Remove the '#' hashtag symbol
    text = re.sub(r'RT\s+', '', text)  # Remove the RT marker
    text = re.sub(r'https?://\S+', '', text)  # Remove hyperlinks
    return text
When I try to use it:
str(df.iloc[: 1])
df = df.iloc[: 1].apply(cleanTxt)
print(df.head)
I get the following error:
TypeError: expected string or bytes-like object
How can I fix this?
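This happens when re.sub receives a value that is not a string. In your code, df.iloc[: 1] selects the first row, not the second column (that would be df.iloc[:, 1]), so DataFrame.apply passes cleanTxt a whole Series per column instead of one tweet at a time. It is also worth checking that the tweet column contains no non-string values such as NaN. I would print a few tweets and check them.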
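As a rough sketch of that check and one possible fix: the snippet below selects the second column with iloc[:, 1], lists any entries that are not strings, and then applies the cleaning function element by element. The three-row frame is a made-up stand-in for the real data, and clean_txt is just a renamed copy of the function from the question. Replacing missing values with an empty string is an assumption about what you want to do with them.

```python
import re

import pandas as pd


def clean_txt(text):
    """Strip @mentions, '#' symbols, RT markers and links from one tweet."""
    text = re.sub(r'@[A-Za-z0-9]+', '', text)  # remove @mentions
    text = re.sub(r'#', '', text)              # remove the '#' symbol
    text = re.sub(r'RT\s+', '', text)          # remove the RT marker
    text = re.sub(r'https?://\S+', '', text)   # remove hyperlinks
    return text


# Hypothetical sample data: column 0 is the sentiment, column 1 the tweet.
df = pd.DataFrame({
    0: [-1, -1, 1],
    1: ["@Kenichan I dived many times for the ball.",
        "RT check http://twitpic.com/2y1zl #fun",
        None],
})

# iloc[:, 1] means "all rows, second column"; iloc[: 1] is only the first row.
tweets = df.iloc[:, 1]

# Show the entries that would make re.sub raise a TypeError.
non_strings = tweets[~tweets.map(lambda v: isinstance(v, str))]
print(non_strings)

# Assumption: missing tweets become empty strings before cleaning.
df[1] = tweets.fillna('').astype(str).apply(clean_txt)
print(df.head())
```

Calling apply on a Series passes each element to the function, which is what cleanTxt expects. Note the parentheses on df.head(), which print the rows rather than the bound method.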