lst.append(float(n.text[-3:])) ValueError: could not convert string to float: ''


I'm trying to write a Prime Video scraper, but I get the error below and can't figure out how to fix it (could not convert string to float):

[14500:11368:0328/150755.021:ERROR:device_event_log_impl.cc(214)] 
[15:07:55.019] USB: usb_device_handle_win.cc:1056 Failed to read descriptor 
from node connection: A device attached to the system is not functioning. 
(0x1F)
c:/Users/SAM/Amazon Prime Video Selenium Scraper/main.py:53: 
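GuessedAtParserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.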

Thanks in advance.

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# CSS Variables
titleClass = "h1"
titleName = "_2IIDsE _3I-nQy"
ratingClass = "span"
ratingName = "Gpyvwj _1pG1w4 _1g4OLh _1tadIP _3YQFvK"
synopsisClass = "div"
synopsisName = "_1W5VSv"


storeFrontURL = "https://www.amazon.com/gp/video/storefront"
vidDownloadURL = "/gp/video/detail/"

videoLinks = []
titles = []
ratings = []
synopsis = []


def scrapeText(lst, classType, className):
    findClass = soup.find_all(classType, class_=className)
    if len(findClass) == 0:
        lst.append(None)
    else:
        for n in findClass:
            if className == ratingName:
                lst.append(float(n.text[-3:]))
            else:
                lst.append(n.text)

# Initialize the browser to be controlled by Python


driver = webdriver.Chrome(
    executable_path="C:/Users/SAM/Downloads/chromedriver_win32/chromedriver.exe")
driver.get(storeFrontURL)

elems = driver.find_elements_by_xpath("//a[@href]")
for elem in elems:
    if vidDownloadURL in elem.get_attribute("href"):
        videoLinks.append(elem.get_attribute("href"))

videoLinks = list(dict.fromkeys(videoLinks))

for i in range(0, len(videoLinks)):
    driver.get(videoLinks[i])
    content = driver.page_source
    soup = BeautifulSoup(content)

    scrapeText(titles, titleClass, titleName)
    scrapeText(ratings, ratingClass, ratingName)
    scrapeText(synopsis, synopsisClass, synopsisName)

data = {'Titles': titles, 'Rating': ratings, 'Synopsis': synopsis}
df = pd.DataFrame(data)
df.to_csv('PrimeVid.csv', index=False, encoding='utf-8')


def wordcloud(dataframe, filename):

    if len(dataframe) > 1:
        text = ' '.join(dataframe.Synopsis)
        wordcloud = WordCloud().generate(text)

        plt.imshow(wordcloud, interpolation='bilinear')
        plt.axis("off")

        plt.savefig(filename + ".png")


dfBelow6 = df.loc[(df['Rating'] < 6)]
df6to7 = df.loc[(df['Rating'] >= 6) & (df['Rating'] < 8)]
dfAbove8 = df.loc[(df['Rating'] >= 8)]

wordcloud(dfBelow6, "below6")
wordcloud(df6to7, "6to7")
wordcloud(dfAbove8, "above8")
python web-scraping
1 Answer

TheCodex.me? Learning from Avinash Jain? Me too.
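
For reference, the ValueError in the title is raised when a matched rating element has empty text: `''[-3:]` is still `''`, and `float('')` fails. Below is a minimal sketch of a more defensive conversion, assuming the rating appears as a number at the end of the element text. `parse_rating` is a hypothetical helper name and is not part of the original script.

```python
import re

# Hypothetical helper (not in the original script): matches a number such as
# "7.8" or "9" at the very end of the text, allowing trailing whitespace.
_RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s*$")


def parse_rating(text):
    """Return the number at the end of `text` as a float, or None if absent."""
    match = _RATING_RE.search(text.strip())
    if match is None:
        # Empty or non-numeric text: signal "no rating" instead of raising.
        return None
    return float(match.group(1))


print(parse_rating("IMDb 7.8"))  # 7.8
print(parse_rating(""))          # None
```

In `scrapeText`, `lst.append(parse_rating(n.text))` could then stand in for `lst.append(float(n.text[-3:]))`, so a missing rating is stored as `None` rather than stopping the scrape. Separately, passing the parser name explicitly, as in `BeautifulSoup(content, "html.parser")`, should silence the GuessedAtParserWarning shown in the output.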
