Extracting text from a 'p' inside a 'div'

Question

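What I want to do is simple: go to https://www.reddit.com/new/ and extract only the titles of the first 3 posts. I tried to extract the title of the first post before moving on to the other 2, but I keep running into problems. I would really appreciate any help.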

import urllib
from bs4 import BeautifulSoup
import requests


quote_page = 'https://www.reddit.com/r/new/'
page = urllib.urlopen(quote_page)
soup = BeautifulSoup(requests.get(quote_page).text, 'html.parser')
title_box = soup.find('div', {'class':'top-matter'})

title = title_box.text.strip()
print(title)

Error output:

Traceback (most recent call last):
  File "/home/ad044/Desktop/sidebar stuff/123.py", line 13, in <module>
    title = title_box.text.strip()
AttributeError: 'NoneType' object has no attribute 'text'
[Finished in 1.8s with exit code 1]
[shell_cmd: python -u "/home/ad044/Desktop/sidebar stuff/123.py"]
[dir: /home/ad044/Desktop/sidebar stuff]
[path: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin]

Answer 1

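The page builds its content with JavaScript, so the HTML returned by a plain request does not contain the element you are looking for, and find() returns None. You need something like Selenium, which drives a real browser and lets the elements you are interested in render. You can then slice the resulting list to keep the first 3 items.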

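from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.reddit.com/new/'
driver = webdriver.Chrome()
try:
    driver.get(url)
    # Wait up to 10 seconds for the title elements to render, then keep the first 3.
    # '.kCqBrs' looks like a generated class name, so it may change over time.
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".kCqBrs"))
    )[:3]
    for item in data:
        print(item.text)
finally:
    # Close the browser even if the wait times out.
    driver.quit()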
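
Once you have rendered HTML (for example from driver.page_source), you could also hand it back to BeautifulSoup. The following is only a minimal sketch: the helper name first_texts, the sample markup, and the selector "div.post p.title" are made up for illustration and are not Reddit's real structure. It uses select(), which returns an empty list when nothing matches, so a missing element does not raise the AttributeError from the question.

```python
from bs4 import BeautifulSoup


def first_texts(html, css_selector, limit=3):
    """Return the stripped text of the first `limit` elements matching `css_selector`."""
    soup = BeautifulSoup(html, "html.parser")
    # select() returns a (possibly empty) list, whereas find() returns None
    # when nothing matches, which is what caused the AttributeError above.
    return [el.get_text(strip=True) for el in soup.select(css_selector)[:limit]]


# Hypothetical markup standing in for a rendered page such as driver.page_source.
sample_html = """
<div class="post"><p class="title">First post</p></div>
<div class="post"><p class="title">Second post</p></div>
<div class="post"><p class="title">Third post</p></div>
<div class="post"><p class="title">Fourth post</p></div>
"""

print(first_texts(sample_html, "div.post p.title"))
print(first_texts(sample_html, "div.top-matter"))  # no match: empty list, no exception
```

If you prefer to keep find(), check the result for None before reading .text.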
