I am using Python 3.7.1 and I want to web-scrape the iPhone user reviews (i.e., customer reviews) on the Amazon website (link below).
When I try the code below, it gives me the following error:
Code:
# -*- coding: utf-8 -*-
#import the library used to query a website
import urllib.request
from bs4 import BeautifulSoup
#specify the url
scrap_link = "https://www.amazon.in/Apple-iPhone-Silver-64GB-Storage/dp/B0711T2L8K/ref=sr_1_1?s=electronics&ie=UTF8&qid=1548335262&sr=1-1&keywords=iphone+X"
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
#Query the website and return the html to the variable 'page'
page = urllib.request.urlopen(scrap_link)
#page = urllib.request.urlopen(wiki)
print(page)
#Parse the html in the 'page' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(page, 'html.parser')
print(soup.prettify())
Error:
File "C:\Users\bsrivastava\AppData\Local\Continuum\anaconda3\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
HTTPError: Service Unavailable
Note: when I try to scrape the wiki link (shown in the code), it works fine.
So why do I get this error with the Amazon link, and how can I overcome it?
Also, once I get this customer review data, I need to store it in a structured format as shown below. How do I do that? (I am completely new to NLP, so I need some guidance.)
Structure:
a. Reviewer’s Name
b. Date of review
c. Color
d. Size
e. Verified Purchase (True or False)
f. Rating
g. Review Title
h. Review Description
NLP? Are you sure?
import requests
from bs4 import BeautifulSoup

scrap_link = "https://www.amazon.in/Apple-iPhone-Silver-64GB-Storage/dp/B0711T2L8K/ref=sr_1_1?s=electronics&ie=UTF8&qid=1548335262&sr=1-1&keywords=iphone+X"
# Amazon commonly answers "Service Unavailable" to clients that send no
# browser-like User-Agent, which is why the plain urllib request failed.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
req = requests.get(scrap_link, headers=headers)
soup = BeautifulSoup(req.content, 'html.parser')
container = soup.find_all('div', attrs={'class': 'a-section review aok-relative'})
data = []
for x in container:
    ReviewersName = x.find('span', attrs={'class': 'a-profile-name'}).text
    data.append({'ReviewersName': ReviewersName})
print(data)
#later save the dictionary to csv
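The same loop can be extended to the other fields in the structure above and then written to CSV with the standard library's `csv.DictWriter`. A minimal sketch follows; it parses a static HTML snippet so it is self-contained, and every class name in it (`review-date`, `a-icon-alt`, `review-title`, `review-text`) is an assumption about Amazon's markup at the time — inspect the live page and adjust the selectors before relying on them.

```python
import csv
from bs4 import BeautifulSoup

# A static review snippet shaped like Amazon's review blocks; the class
# names are assumptions, not a guaranteed part of Amazon's page.
html = """
<div class="a-section review aok-relative">
  <span class="a-profile-name">Alice</span>
  <span class="review-date">Reviewed in India on 1 January 2019</span>
  <span class="a-icon-alt">4.0 out of 5 stars</span>
  <span class="review-title">Great phone</span>
  <span class="review-text">Battery life is excellent.</span>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
data = []
for x in soup.find_all('div', attrs={'class': 'a-section review aok-relative'}):
    data.append({
        'ReviewersName': x.find('span', attrs={'class': 'a-profile-name'}).text,
        'DateOfReview': x.find('span', attrs={'class': 'review-date'}).text,
        'Rating': x.find('span', attrs={'class': 'a-icon-alt'}).text,
        'ReviewTitle': x.find('span', attrs={'class': 'review-title'}).text,
        'ReviewDescription': x.find('span', attrs={'class': 'review-text'}).text,
    })

# Write the list of dicts to CSV: one column per key, one row per review.
with open('reviews.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
```

The remaining fields (Color, Size, Verified Purchase) live in similar elements inside each review block and can be added to the dict the same way once you have confirmed their class names in the browser's inspector.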