This is my Python code for retrieving data from Twitter. When I try to store the data in gannie.txt, I get the following error.
  File "D:\software\Anaconda\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 5-6: character maps to <undefined>
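Any help with this would be appreciated. I am new to this kind of text mining, and I am trying to build a sentiment analysis project using natural language processing.
Here is my code: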
import re

outF = open("gannie.txt", "a")
for tweet in tweets:
    # print(tweet.text)
    Tweet = tweet.text
    # Convert www.* or https?://* to URL
    Tweet = re.sub(r'((www\.[^\s]+)|(https?://[^\s]+))', 'URL', Tweet)
    # Convert @username to TWITTER_USER
    Tweet = re.sub(r'@[^\s]+', 'TWITTER_USER', Tweet)
    # Remove additional white spaces
    Tweet = re.sub(r'[\s]+', ' ', Tweet)
    # Replace #word with word (handling hashtags)
    Tweet = re.sub(r'#([^\s]+)', r'\1', Tweet)
    # Trim surrounding quote characters
    Tweet = Tweet.strip('\'"')
    # Delete happy and sad face emoticons from the tweet
    a = ':)'
    b = ':('
    Tweet = Tweet.replace(a, '')
    Tweet = Tweet.replace(b, '')
    # Delete the Twitter @username tag and skip retweets
    tag = 'TWITTER_USER'
    rt = 'RT'
    url = 'URL'
    Tweet = Tweet.replace(tag, '')
    tweetCount += 1
    if rt in Tweet:
        continue
    Tweet = Tweet.replace(url, '')
    print(Tweet)
    outF.write(Tweet)
    outF.write("\n")
outF.close()
I got the answer: I only had to add encoding="utf-8" to the line that opens the file.
Before: outF = open("gannie.txt", "a")
After: outF = open("gannie.txt", "a", encoding="utf-8")
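
For anyone wondering why this works: when open() is called without an encoding, Python falls back to the locale's preferred encoding, which on many Windows setups is cp1252, and that codec has no mapping for emoji and many other characters that show up in tweets. Below is a minimal sketch of the difference. The sample string and the helper names can_encode and append_line are made up for illustration and are not part of the original code.

```python
import os
import tempfile

# Hypothetical sample text; the emoji stands in for characters found in real tweets.
sample = "Great game today \U0001F600"


def can_encode(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def append_line(path, text):
    """Append one line to path, always as UTF-8 regardless of the OS default."""
    with open(path, "a", encoding="utf-8") as out_file:
        out_file.write(text + "\n")


# cp1252 cannot represent the emoji, while UTF-8 can.
print(can_encode(sample, "cp1252"))
print(can_encode(sample, "utf-8"))

# Write to a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "gannie.txt")
append_line(path, sample)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as in_file:
    print(in_file.read() == sample + "\n")
```

Using a with block also closes the file automatically, so a separate outF.close() call is not needed. Remember to pass encoding="utf-8" again when reading the file back later.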