So I'm working on a project with the Twitter API, collecting tweets for different keywords at a specific latitude and longitude. I scraped the tweet data, which for each keyword is a list of dictionaries with the following fields:
dict_keys(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities', 'extended_entities', 'metadata', 'source', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo', 'coordinates', 'place', 'contributors', 'is_quote_status', 'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'possibly_sensitive', 'lang'])
I now want to extract the text of the tweets for each keyword (x1_tweets, x2_tweets, and x3_tweets) into a JSON file.
To do this I defined a function:

def save_to_json(obj, filename):
    with open(filename, 'w') as fp:
        json.dump(obj, fp, indent=4, sort_keys=True)
where obj is the list of dictionaries and filename is the name I want to save the file under. When I try the function on an example, save_to_json(x1_tweets, 'doors'),
it returns a file with everything in it. How should I use the function so that it returns a file containing only the tweets' text? Any help would be appreciated! Thanks in advance! This is what the JSON file looks like:
[
    {
        "contributors": null,
        "coordinates": null,
        "created_at": "Mon May 18 02:08:53 +0000 2020",
        "entities": {
            "hashtags": [],
            "media": [
                {
                    "display_url": "pic.twitter.com/ig7H0jIHOq",
                    "expanded_url": "https://twitter.com/CMag051/status/1262303473682022400/photo/1",
                    "id": 1262203448080007168,
                    "id_str": "1262203448080007168",
                    "indices": [
                        98,
                        121
                    ],
                    "media_url": "http://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "sizes": {
                        "large": {
                            "h": 2048,
                            "resize": "fit",
                            "w": 1536
                        },
                        "medium": {
                            "h": 1200,
                            "resize": "fit",
                            "w": 900
                        },
                        "small": {
                            "h": 680,
                            "resize": "fit",
                            "w": 510
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://twitter.com/ig7H0jIHOq"
                }
            ],
            "symbols": [],
            "urls": [],
            "user_mentions": []
        },
        "extended_entities": {
            "media": [
                {
                    "display_url": "pic.twitter.com/ig7H0jvHOq",
                    "expanded_url": "https://twitter.com/CMag051/status/1262253473682022400/photo/1",
                    "id": 1262203448080007168,
                    "id_str": "1262203448080007168",
                    "indices": [
                        98,
                        121
                    ],
                    "media_url": "http://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "media_url_https": "https://pbs.twimg.com/media/EYQ_WT0VAAA6hTK.jpg",
                    "sizes": {
                        "large": {
                            "h": 2048,
                            "resize": "fit",
                            "w": 1536
                        },
                        "medium": {
                            "h": 1200,
                            "resize": "fit",
                            "w": 900
                        },
                        "small": {
                            "h": 680,
                            "resize": "fit",
                            "w": 510
                        },
                        "thumb": {
                            "h": 150,
                            "resize": "crop",
                            "w": 150
                        }
                    },
                    "type": "photo",
                    "url": "https://twitter.com/ig7H0iIHOq"
                }
            ]
        },
        "favorite_count": 1,
        "favorited": false,
        "geo": null,
        "id": 1262203473682022400,
        "id_str": "1262203473682022400",
        "in_reply_to_screen_name": null,
        "in_reply_to_status_id": null,
        "in_reply_to_status_id_str": null,
        "in_reply_to_user_id": null,
        "in_reply_to_user_id_str": null,
        "is_quote_status": false,
        "lang": "en",
        "metadata": {
            "iso_language_code": "en",
            "result_type": "recent"
        },
        "place": null,
        "possibly_sensitive": false,
        "retweet_count": 0,
        "retweeted": false,
        "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
        "text": "Beautiful evening. \n\nSitting on patio, eating some apple , and listening to the birds chirp. https://twitter.com/ig7H0jIHOq",
        "truncated": false,
        "user": {
            "contributors_enabled": false,
            "created_at": "Wed Apr 01 03:32:05 +0000 2009",
            "default_profile": false,
            "default_profile_image": false,
            "description": "Photographer | Music & Sports Enthusiast.",
            "entities": {
                "description": {
                    "urls": []
                }
            },
            "favourites_count": 19189,
            "follow_request_sent": false,
            "followers_count": 547,
            "following": false,
            "friends_count": 2432,
            "geo_enabled": false,
            "has_extended_profile": true,
            "id": 28041855,
            "id_str": "28041855",
            "is_translation_enabled": false,
            "is_translator": false,
            "lang": null,
            "listed_count": 0,
            "location": "Phoenix, AZ",
            "name": "Chris",
            "notifications": false,
            "profile_background_color": "000000",
            "profile_background_image_url": "http://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_image_url_https": "https://abs.twimg.com/images/themes/theme1/bg.png",
            "profile_background_tile": false,
            "profile_banner_url": "https://pbs.twimg.com/profile_banners/28041855/1586840506",
            "profile_image_url": "http://pbs.twimg.com/profile_images/1262196071817605121/WBvC3h5P_normal.jpg",
            "profile_image_url_https": "https://pbs.twimg.com/profile_images/1262196071817605121/WBvC3h5P_normal.jpg",
            "profile_link_color": "ABB8C2",
            "profile_sidebar_border_color": "000000",
            "profile_sidebar_fill_color": "000000",
            "profile_text_color": "000000",
            "profile_use_background_image": false,
            "protected": false,
            "screen_name": "CMag051",
            "statuses_count": 11285,
            "time_zone": null,
            "translator_type": "none",
            "url": null,
            "utc_offset": null,
            "verified": false
        }
    }
]
The first thing you should do is change your code to:

def save_to_json(obj, filename):
    with open(filename, 'a') as fp:
        json.dump(obj, fp, indent=4, sort_keys=True)

You need to change the mode in which the file is opened, for the following reason.

w:
    Opens the file in write-only mode. The pointer is placed at the beginning of the file, so any existing file with the same name is overwritten. If no file with that name exists, a new one is created.
a:
    Opens the file for appending new information to it. The pointer is placed at the end of the file. If no file with that name exists, a new one is created.
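The difference can be seen with a small self-contained sketch (the file name and values are made up for the demo). Note, though, that two JSON documents concatenated in one file are no longer a single valid JSON document, so append mode is mainly useful when you write the file once per run or use line-delimited records:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.json")

# 'w' truncates on every open: only the second dump survives.
with open(path, "w") as fp:
    json.dump({"first": 1}, fp)
with open(path, "w") as fp:
    json.dump({"second": 2}, fp)
with open(path) as fp:
    print(fp.read())  # {"second": 2}

# 'a' appends: the new dump lands after the existing content.
with open(path, "a") as fp:
    json.dump({"third": 3}, fp)
with open(path) as fp:
    print(fp.read())  # {"second": 2}{"third": 3}
```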
Also, sort_keys makes no sense here, because you are only passing a string, not a dict. Likewise, indent=4 has no effect on strings.
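A quick check shows that both options are no-ops for a bare string, while they do take effect for a dict:

```python
import json

# A scalar string has no keys to sort and no nesting to indent,
# so both options change nothing.
print(json.dumps("hello", indent=4, sort_keys=True))  # "hello"

# A dict gets its keys sorted and its entries indented.
print(json.dumps({"b": 2, "a": 1}, indent=4, sort_keys=True))
```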
If you need the tweets indexed, you can use the following code:

tweets = {}
for i, tweet in enumerate(x1_tweets):
    tweets[i] = tweet['text']
save_to_json(tweets, 'bat.json')

The code above builds a dict keyed by tweet index and writes it to the file once all the tweets have been processed.
And if you only need the text of the tweets without an index, you can use string aggregation, or use a list and append to collect all the texts from the tweets and write them to the output file.
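A sketch of that list-based variant (the sample tweets and file name here are made up; with real data you would pass x1_tweets):

```python
import json

def save_texts_to_json(tweets, filename):
    # Collect only the 'text' field of every tweet into a list,
    # then write the list out as one valid JSON document.
    texts = [tweet["text"] for tweet in tweets]
    with open(filename, "w") as fp:
        json.dump(texts, fp, indent=4)

# Hypothetical sample standing in for x1_tweets:
sample = [
    {"text": "Beautiful evening.", "id": 1},
    {"text": "Second tweet.", "id": 2},
]
save_texts_to_json(sample, "texts.json")
```

Because the whole list is dumped in a single call with mode 'w', the resulting file is itself valid JSON and can be read back with json.load.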