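To study Benford's law, I am trying to collect the like counts of recommended Instagram Reels. The plan is simply to open a reel, read its like count, swipe to the next reel, and repeat until I have enough data.
I am trying to do this in Python with Selenium WebDriver: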
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import pyautogui
import time
driver = webdriver.Chrome()
driver.get("https://www.instagram.com/") # open instagram
time.sleep(2)
driver.find_element(By.XPATH, '//button[text()="Allow all cookies"]').click() # allow cookies
time.sleep(3)
user_field = driver.find_element(By.NAME, "username") # enter username
user_field.send_keys("my_username")
user_field.send_keys(Keys.ENTER)
password_field = driver.find_element(By.NAME, "password") # enter password
password_field.send_keys("my_password")
password_field.send_keys(Keys.ENTER)
time.sleep(5)
driver.get('https://www.instagram.com/reels') # go to reels
for i in range(1, 5): # swipe through 4 reels, then just wait
    time.sleep(5)
    # get how many likes the reel has (this doesn't seem to update)
    div_element = driver.find_element(By.CSS_SELECTOR, '.html-div.xe8uvvx.xdj266r.x11i5rnm.x1mh8g0r.xexx8yu.x4uap5.x18d9i69.xkhd6sd.x6s0dn4.x1ypdohk.x78zum5.xdt5ytf.xieb3on')
    # get likes by XPath (doesn't work, crashes the program)
    # div_element = driver.find_element(By.XPATH, '//*[@id="mount_0_0_vZ"]/div/div/div[2]/div/div/div[1]/div[1]/div[2]/section/main/div[2]/div[9]/div/div[2]/div[1]/div/div/div/span/span')
    # get all the buttons (works, but there is a lot of other unnecessary data)
    # div_element = driver.find_element(By.XPATH, '//div[@role="button"]')
    like_text = div_element.text
    print(like_text) # print out the likes
    time.sleep(3)
    pyautogui.press('down') # swipe to next reel
time.sleep(5000)
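In the code, I am trying to access the span element highlighted in the picture:
This span block appears to be identical for every other reel, apart from the like count.
But when I run the code, the like count never updates, so the output is: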
165K
165K
165K
165K
I have tried to access this element in various ways (XPATH, CSS_SELECTOR, NAME, ID...); some of them crash and others return nothing. Any ideas what to do?
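The problem you are facing is probably caused by the way Instagram loads content dynamically. The initial HTML you see may not contain the updated like counts for subsequent reels.
Use explicit waits. Instead of relying on fixed time.sleep delays, use Selenium's WebDriverWait with expected conditions to wait for the like count element to update. Depending on your needs, you can use expected_conditions.presence_of_element_located or expected_conditions.text_to_be_present_in_element.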
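A presence check can succeed as soon as the first reel's element exists, so a condition that waits for the text to differ from the previous reading may be closer to what is needed here. The following is a minimal sketch, not a tested Instagram scraper: `text_differs_from` and `wait_for_new_like_text` are hypothetical helper names, and it assumes the locator resolves to the reel currently on screen. If earlier reels stay in the DOM, `find_element` keeps returning the first match and the wait will time out, in which case the locator itself has to be narrowed to the visible reel.

```python
from typing import Callable, Tuple

# A Selenium locator, e.g. (By.CSS_SELECTOR, "span.some-class").
Locator = Tuple[str, str]


def text_differs_from(locator: Locator, previous_text: str) -> Callable:
    """Build a wait condition that becomes truthy once the located
    element's text is non-empty and different from previous_text."""

    def _condition(driver):
        text = driver.find_element(*locator).text.strip()
        # Returning False tells WebDriverWait to keep polling.
        return text if text and text != previous_text else False

    return _condition


def wait_for_new_like_text(driver, locator: Locator, previous_text: str,
                           timeout: float = 10.0) -> str:
    """Block until the like text changes, then return the new text.

    Raises selenium's TimeoutException if nothing changes in time.
    """
    # Imported lazily so the condition above can be exercised without Selenium.
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        text_differs_from(locator, previous_text)
    )
```

In the loop, this would be called right after pressing the down key, passing the previously printed `like_text` as `previous_text`. Two consecutive reels can legitimately show the same rounded count, so a timeout does not always mean the page failed to update.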
Demo pseudocode:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# ... Your Existing Code ...
wait = WebDriverWait(driver, 10) # Set a wait time of 10 seconds
like_element = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '.html-div.xe8uvvx...')))
like_text = like_element.text
print(like_text)
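Since the stated goal is a Benford's law study, the scraped strings still need to be turned into numbers and tallied by leading digit. Here is a rough sketch of that step. The function names are hypothetical, and the "K"/"M"/"B" suffixes and the comma as thousands separator are assumptions based on the "165K" output shown in the question; other locales may format counts differently.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed abbreviations; adjust if the page uses other suffixes.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_like_count(text: str) -> Optional[int]:
    """Convert strings such as '165K', '1.2M' or '2,431' to an int.

    Returns None when the text does not look like a number.
    """
    cleaned = text.strip().upper().replace(",", "")
    if not cleaned:
        return None
    multiplier = 1
    if cleaned[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[cleaned[-1]]
        cleaned = cleaned[:-1]
    try:
        return round(float(cleaned) * multiplier)
    except (ValueError, OverflowError):
        return None


def leading_digit_counts(like_texts: Iterable[str]) -> Counter:
    """Tally the first digit of every parseable, positive like count."""
    counts: Counter = Counter()
    for text in like_texts:
        value = parse_like_count(text)
        if value is not None and value > 0:
            counts[int(str(value)[0])] += 1
    return counts
```

Rounded values such as "165K" lose their lower digits but keep the leading digit, which is all a first-digit Benford tally needs.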