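So basically I'm using Selenium to scrape the comments on a YouTube video, and I need each author's name together with their comment. I can get and print the element that contains all of the comments, but not the individual comments. This is what I have: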
import time

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is assumed to be an already-created WebDriver instance (its setup is not shown here)
wait = WebDriverWait(driver, 5)
driver.get("https://www.youtube.com/watch?v=vMtr0dE0jRo")

# Scroll to the bottom of the page to load comments
driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
# You might need to adjust the range and sleep time depending on the number of comments
for _ in range(5):  # Adjust the range according to the number of comments
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    time.sleep(1)  # Adjust sleep time if necessary

for item in range(3):
    wait.until(EC.visibility_of_all_elements_located((By.TAG_NAME, "body")))
    time.sleep(2)

print("=====START CRAWLING DATA=====")
data = {}
comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#sections #contents")))
print(len(comments))
for comment in comments:
    # Author Name
    author_element = comment.find_element(By.CSS_SELECTOR, "#header-author h3")
    author_name = author_element.text.strip()  # Remove leading/trailing whitespace

    # Comment Text
    comment_text_element = comment.find_element(By.ID, "content-text")
    comment_text = comment_text_element.text.strip()

    # Print author name and comment text
    print("Author:", author_name)
    print("Comment:", comment_text)
    print()
print("Done")
But it raises:
NoSuchElementException Traceback (most recent call last)
Cell In[19], line 49
28 # children of element
29 # Function to find all children of an element recursively
30 # def find_all_children(element):
(...)
45 # find_all_children(child_element)
46 # find_all_children(comment)
47 for comment in comments:
48 # Author Name
---> 49 author_element = comment.find_element(By.CSS_SELECTOR, "#header-author h3")
50 author_name = author_element.text.strip() # Remove leading/trailing whitespace
52 # Comment Text
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py:417, in WebElement.find_element(self, by, value)
414 by = By.CSS_SELECTOR
415 value = f'[name="{value}"]'
--> 417 return self._execute(Command.FIND_CHILD_ELEMENT, {"using": by, "value": value})["value"]
File ~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\selenium\webdriver\remote\webelement.py:395, in WebElement._execute(self, command, params)
393 params = {}
394 params["id"] = self._id
--> 395 return self._parent.execute(command, params)
...
(No symbol) [0x00007FF6388510C2]
(No symbol) [0x00007FF638841914]
BaseThreadInitThunk [0x00007FFDE4801FD7+23]
RtlUserThreadStart [0x00007FFDE541D7D0+32]
So I'd like the program to print each author's name and comment, like this:
Author: @Ian21344 Comment: Nice!
Author: @Daved Comment: Looks good
This line:
comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#sections #contents")))
should be changed to:
comments = wait.until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#contents #comment")))
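The original selector was not specific enough. "#sections #contents" matches container elements rather than individual comments, and the traceback shows that at least one matched element has no "#header-author h3" descendant, which is why find_element raises NoSuchElementException. "#contents #comment" is meant to match each comment node, so the author and text lookups inside the loop are scoped to a single comment.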
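For illustration, here is a minimal sketch of the loop with the corrected selector. The helper names `extract_comment` and `scrape_comments` are hypothetical, and the selectors are the ones from this thread, so they may stop matching if YouTube changes its markup. The sketch also swaps `find_element` for `find_elements`, which returns an empty list instead of raising, so a node without an author or text is skipped rather than aborting the run.

```python
CSS = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def extract_comment(comment):
    """Return (author, text) for one comment element, or None if a part is missing."""
    # find_elements returns [] when nothing matches, so no exception is raised here
    authors = comment.find_elements(CSS, "#header-author h3")
    texts = comment.find_elements(CSS, "#content-text")
    if not authors or not texts:
        return None
    return authors[0].text.strip(), texts[0].text.strip()


def scrape_comments(driver, timeout=10):
    """Collect (author, text) pairs from a page that `driver` has already loaded and scrolled."""
    # Imported here so extract_comment can be exercised without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Selector taken from the answer above: one element per comment
    comments = WebDriverWait(driver, timeout).until(
        EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#contents #comment"))
    )
    rows = (extract_comment(comment) for comment in comments)
    return [row for row in rows if row is not None]
```

With a driver that has already opened the video page and scrolled far enough for comments to load, something like `for author, text in scrape_comments(driver): print("Author:", author, "Comment:", text)` should print one line per comment.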