Python BeautifulSoup find_all_next can't find a string

Question (votes: 1, answers: 2)

I'm trying to get the username from an Instagram page. I'm supposed to use part of the data I get after data = soup.find_all('script')[3]. It looks like this:

(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. \u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety #depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......

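There is a "username" part (near the end of the blockquote). I assumed it was a string, but I can't get at it. So if it isn't a string, what is it? A class? Which method should I use to retrieve the username from "username":"devils..tea."? Thanks in advance for any help.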

....
req = requests.get(url)
soup = BeautifulSoup(req.text, "lxml")
data = soup.find_all('script')[3]
username = data.find_all_next(string="username")
print(username)
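Why the call above returns an empty list: with string="username", BeautifulSoup only matches text nodes whose entire value is exactly username, and the whole body of that <script> tag is a single text node holding JavaScript. Because that JavaScript assigns a JSON object to window._sharedData, one option is to decode it with the standard json module. This is a minimal sketch, assuming the script text starts with window._sharedData = as in the quote above; it uses a trimmed, hypothetical sample instead of a live page, and the helper name shared_data_from_script is made up for illustration. With the question's soup, the text would presumably come from something like soup.find_all('script')[3].string.

```python
import json

# Hypothetical, trimmed sample of the script text quoted in the question.
SCRIPT_TEXT = (
    'window._sharedData = {"entry_data":{"PostPage":[{"graphql":{"shortcode_media":'
    '{"owner":{"id":"10173498181","username":"devils..tea.","full_name":"depressed"}}}}]}};'
)


def shared_data_from_script(text):
    """Decode the JSON object assigned to window._sharedData (assumed layout)."""
    prefix = "window._sharedData = "
    start = text.index(prefix) + len(prefix)
    # raw_decode parses one JSON value and ignores whatever follows it (e.g. the ";").
    obj, _end = json.JSONDecoder().raw_decode(text[start:])
    return obj


shared = shared_data_from_script(SCRIPT_TEXT)
# Path follows the structure shown in the question: PostPage -> graphql -> shortcode_media -> owner
owner = shared["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]["owner"]
print(owner["username"])  # devils..tea.
```

Going through the JSON avoids depending on where commas fall, which matters here because the caption text itself contains commas.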
Tags: html, python-3.x, web-scraping, beautifulsoup, instagram
2 Answers

1 vote

Alternatively, for those of us who don't like regular expressions (nudge, nudge @QHarr :D), you could try this:

data = soup.find_all('script')[3].string  # the script text quoted in the question
data_list = data.split(",")
for i in data_list:
    if 'username' in i:
        print(i)

Output:

"username":"devils..tea."
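The loop prints the whole "username":"devils..tea." chunk. A small follow-up sketch, assuming the same comma-split approach, returns only the value by decoding the quoted part with json.loads; the function name find_username and the sample string are hypothetical. Like the answer, it assumes the username itself contains no comma.

```python
import json


def find_username(text):
    """Return the first "username" value found by splitting text on commas, or None."""
    for chunk in text.split(","):
        if '"username":' in chunk:
            # Keep what follows the key and decode it as a JSON string literal.
            _key, value = chunk.split('"username":', 1)
            return json.loads(value)
    return None


sample = '"id":"10173498181","username":"devils..tea.","blocked_by_viewer":false'
print(find_username(sample))  # devils..tea.
```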

2 votes

You can use a regular expression:

import re
data = '''
(script type="text/javascript">window._sharedData = {"config":{"csrf_token":"hIuZDxW17bTXz5EDLY25ftqivOOrLEeZ","viewer":null,"viewerId":null},"supports_es6":false,"country_code":"RU","language_code":"en","locale":"en_US","entry_data":{"PostPage":[{"graphql":{"shortcode_media":{"__typename":"GraphImage","id":"1968747493659350883","shortcode":"BtSZWokAZdj","dimensions":{"height":640,"width":640},"gating_info":null,"media_preview":"ACoq5miitSxxIGTHPXPGcd8ZFAGXRXSSWypFsAAZ/lzjpn/Csm5sjAu7Ib8MUAUaKU0lABVq0lMUqsPUA/Q8VVpynBB9CKAOtuOFB9CD+uP5Gq19HuiOPTP5Ul1exhdgy7kdF7fU/wCGatJiRPqv5ZFIZybnP4UynOpUlT1HFNpiClDFeRSUUATLcSJ904+lPF5MvR2H41WooAc7lzuY5J702iigD//Z","display_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","display_resources":[{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":640,"config_height":640},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":750,"config_height":750},{"src":"https://instagram.fhel3-1.fna.fbcdn.net/vp/68311f4b09669fd75609e9fcabbf1ae0/5D0517DE/t51.2885-15/e35/49907137_294327238101721_6745007497573009307_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","config_width":1080,"config_height":1080}],"accessibility_caption":"Image may contain: one or more people and 
closeup","is_video":false,"should_log_client_event":false,"tracking_token":"eyJ2ZXJzaW9uIjo1LCJwYXlsb2FkIjp7ImlzX2FuYWx5dGljc190cmFja2VkIjp0cnVlLCJ1dWlkIjoiN2Q1Yjg2NmY5OGIwNDVhNWIxMmRhNjEwZTA3NDY1MmYxOTY4NzQ3NDkzNjU5MzUwODgzIn0sInNpZ25hdHVyZSI6IiJ9","edge_media_to_tagged_user":{"edges":[]},"edge_media_to_caption":{"edges":[{"node":{"text":"\u2022\nScars show your story. \nYour pain. \nYour hate.\nYour sadness and despair. \nThey make you who you are, and one of a kind with every different mark. \nSome stay, some go.\nSome brighter, some lighter.\nSome bigger, some smaller.\nSome deeper, some one the surface. \nBut they are really all the same, you see?\nThey are all scars, just telling different points of our life, our story. \nOur souvenir throughout our whole life, that shows us how much we've grown. \nHow much we have overcome. How strong we've become.\nHow brave and courageous we've become from the hardest and darkest times of our life. \u2022\n\u2022\n\u2022\n\u2022\n#poem #cuts #selfharm #tatoo #dark #pain #sad #lonely #anxiety 
#depressed"}}]},"caption_is_edited":true,"has_ranked_comments":false,"edge_media_to_comment":{"count":1,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"comments_disabled":false,"taken_at_timestamp":1548913011,"edge_media_preview_like":{"count":17,"edges":[]},"edge_media_to_sponsor_user":{"edges":[]},"location":null,"viewer_has_liked":false,"viewer_has_saved":false,"viewer_has_saved_to_collection":false,"viewer_in_photo_of_you":false,"viewer_can_reshare":true,"owner":{"id":"10173498181","is_verified":false,"profile_pic_url":"https://instagram.fhel3-1.fna.fbcdn.net/vp/9a17134e8d0a36efec53f1da5cac1f38/5D14BC0F/t51.2885-19/s150x150/47690762_475199173011446_4764198224049209344_n.jpg?_nc_ht=instagram.fhel3-1.fna.fbcdn.net","username":"devils..tea.","blocked_by_viewer":false,"followed_by_viewer":false,"full_name":"depressed\ud83e\udd40","has_blocked_viewer":false,"is_private":false,"is_unpublished":false,"requested_by_viewer":false}......
'''

r = re.compile(r'username":"(.*)(?=","blocked)')
print(r.findall(data))
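The lookahead ties this pattern to "blocked_by_viewer" being the key right after username, and the greedy .* would run to the last ","blocked on the same line if there were several. A hedged alternative, not taken from the answer, stops at the closing quote with [^"]*, so it depends only on the username value containing no double quote. The short sample string below is a hypothetical excerpt of the question's data.

```python
import re

# Hypothetical excerpt of the "owner" object from the question's data.
sample = '"owner":{"id":"10173498181","username":"devils..tea.","blocked_by_viewer":false}'

# [^"]* stops at the closing quote, so whichever key comes next no longer matters.
pattern = re.compile(r'"username":"([^"]*)"')
print(pattern.findall(sample))  # ['devils..tea.']
```

findall returns every match, so on a page that lists several users you would still need to pick the one you want.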