Selenium and Beautiful Soup fail to scrape the comment numbers in a forum


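I'm new to web scraping and Python. For my project, I'm trying to scrape the number (#) of each comment. The site already shows a number next to every comment.

I tried finding the elements by class name, but my list looks like this: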

[<span class="_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0">#1</span>, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, <span class="_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0">#25</span>]

Here is the code I'm working with:

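from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumed; the post does not show how the driver is created

URL = "https://lihkg.com/thread/3717611/page/1"
num_list = []
driver.get(URL)
sleep(2)  # crude wait for the page to render

html = BeautifulSoup(driver.page_source, 'html.parser')
result_list = html.find_all('div', {'class': '_36ZEkSvpdj_igmog0nluzh'})
for result in result_list:
    # find() returns None for every div that has no matching span
    num_list.append(result.find("span", {'class': "_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0"}))

print(num_list)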
python selenium-webdriver web-scraping beautifulsoup
1 Answer

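A small change to your code is enough: save the result of find() in a variable and call getText on it. The if condition skips the divs where find() returned None, which avoids the NoneType error: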

html = BeautifulSoup(driver.page_source, 'html.parser')
result_list = html.find_all('div', {'class': '_36ZEkSvpdj_igmog0nluzh'})
for result in result_list:
    element=result.find("span",{'class': "_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0"})
    if element:
        num_list.append(element.getText(strip=True))
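
To try the same pattern without a browser, here is a minimal sketch that runs on a static HTML string instead of driver.page_source. The markup, the class names "post" and "num", and the helper name extract_numbers are all made up for illustration; they are not the real LIHKG classes. It uses get_text, which as far as I know is the same method as the getText used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source.
# The class names "post" and "num" are invented for this sketch.
SAMPLE_HTML = """
<div class="post"><span class="num">#1</span></div>
<div class="post"><p>no number here</p></div>
<div class="post"><span class="num"> #25 </span></div>
"""


def extract_numbers(page_source):
    """Return the text of every numbered span, skipping divs without one."""
    soup = BeautifulSoup(page_source, "html.parser")
    numbers = []
    for post in soup.find_all("div", {"class": "post"}):
        span = post.find("span", {"class": "num"})
        # find() returns None when the div has no matching span,
        # so guard before calling get_text on it.
        if span is not None:
            numbers.append(span.get_text(strip=True))
    return numbers


numbers = extract_numbers(SAMPLE_HTML)
print(numbers)  # ['#1', '#25']
```

With the real page you would pass driver.page_source to the function and swap in the class names from the question.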