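I'm new to web scraping and Python. For my project, I'm trying to scrape the # of each comment. The site already shows a number next to every comment.
I tried finding the element by class name, but my list looks like this: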
[<span class="_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0">#1</span>, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, <span class="_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0">#25</span>]
Here is the code I'm working with:
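from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Chrome()  # assumption: the original does not show how driver is created

URL = "https://lihkg.com/thread/3717611/page/1"
num_list = []
driver.get(URL)
sleep(2)  # give the page time to render
html = BeautifulSoup(driver.page_source, 'html.parser')
result_list = html.find_all('div', {'class': '_36ZEkSvpdj_igmog0nluzh'})
for result in result_list:
    # find() returns None when this div contains no matching span
    num_list.append(result.find("span", {'class': "_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0"}))
print(num_list)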
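A small change to your code is enough: store the result of find() in a variable and call getText on it. The if condition skips the cases where find() returned None, so you never call getText on a NoneType: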
html = BeautifulSoup(driver.page_source, 'html.parser')
result_list = html.find_all('div', {'class': '_36ZEkSvpdj_igmog0nluzh'})
for result in result_list:
    element = result.find("span", {'class': "_3SqN3KZ8m8vCsD9FNcxcki _208tAU6LsyjP5LKTdcPXD0"})
    if element:
        num_list.append(element.getText(strip=True))
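To try the same check without launching a browser, here is a minimal sketch that runs on a small inline HTML string. The markup, the class names `comment` and `num hl`, and the helper `extract_numbers` are all made up for illustration; only the find-then-check-for-None pattern comes from the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page: three containers,
# of which only the first and the last hold a number span.
SAMPLE_HTML = """
<div class="comment"><span class="num hl">#1</span><p>first</p></div>
<div class="comment"><p>no number here</p></div>
<div class="comment"><span class="num hl">#25</span><p>last</p></div>
"""


def extract_numbers(page_source):
    """Return the text of each number span, skipping containers without one."""
    soup = BeautifulSoup(page_source, "html.parser")
    numbers = []
    for container in soup.find_all("div", {"class": "comment"}):
        span = container.find("span", {"class": "num hl"})
        # find() returns None when nothing matches, so check before reading text
        if span is not None:
            numbers.append(span.get_text(strip=True))
    return numbers


numbers = extract_numbers(SAMPLE_HTML)
print(numbers)  # ['#1', '#25']
```

The values come back as strings such as '#1'. If you need integers, something like `int(n.lstrip('#'))` on each item should do it.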