I am training myself in web scraping. As an exercise, I challenged myself to get the list of everyone who liked an Instagram post. My problem is that I only get the first 11 usernames of likers: I cannot find the right way to automate the scrolling while collecting the likes.
Here is my process in a Jupyter Notebook (it does not work as a plain script):
from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome()
driver.get('https://www.instagram.com/p/BuE82VfHRa6/')
# open the "Liked by" dialog by clicking the likes link
userid_element = driver.find_elements_by_xpath('//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/div/a')[0].click()

# collect the usernames currently rendered in the dialog
elems = driver.find_elements_by_xpath("//*[@id]/div/a")
users = []
for elem in elems:
    users.append(elem.get_attribute('title'))

print(users)
Do you have any ideas?
Thanks a lot!
I guess the Instagram page only keeps around 17 liked-user elements loaded at a time, so you have to collect them in a loop while scrolling:
from selenium import webdriver
import time

driver = webdriver.Chrome()
driver.get('https://www.instagram.com/p/BuE82VfHRa6/')

# open the "Liked by" dialog
userid_element = driver.find_elements_by_xpath('//*[@id="react-root"]/section/main/div/div/article/div[2]/section[2]/div/div/a')[0].click()
time.sleep(2)

# Here you can see the user list you want.
# You have to scroll down so that Instagram loads more data from its server.
# Loop until the height (padding-top) of the user list view stops changing.
users = []
height = driver.find_element_by_xpath("/html/body/div[3]/div/div[2]/div/div").value_of_css_property("padding-top")
match = False
while match == False:
    lastHeight = height

    # step 1: grab the liker links currently in the DOM
    elements = driver.find_elements_by_xpath("//*[@id]/div/a")

    # step 2: collect any usernames we have not seen yet
    for element in elements:
        if element.get_attribute('title') not in users:
            users.append(element.get_attribute('title'))

    # step 3: scroll the last element into view to trigger loading of the next batch
    driver.execute_script("return arguments[0].scrollIntoView();", elements[-1])
    time.sleep(1)

    # step 4: stop when the list height no longer changes
    height = driver.find_element_by_xpath("/html/body/div[3]/div/div[2]/div/div").value_of_css_property("padding-top")
    if lastHeight == height:
        match = True

print(users)
print(len(users))
driver.quit()
I tested it on a post with close to 100 likes, and it works.
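As a side note (my own variation, not part of the answer above): the fixed time.sleep(2) after opening the dialog can be replaced by one of Selenium's explicit waits, which poll until a condition holds instead of sleeping a fixed amount. A minimal sketch, reusing the //div[@role='dialog'] locator that the last answer below relies on:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# wait up to 10 seconds for the "Liked by" dialog to appear instead of sleeping blindly
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.XPATH, "//div[@role='dialog']"))
)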
Please try the following code and let me know whether it works for you.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.instagram.com/p/BuE82VfHRa6/')

# anchor tags holding the liker usernames (the class names are generated by Instagram)
elems = driver.find_elements_by_xpath("//a[@class='FPmhX notranslate TlrDj']")
users = []
for elem in elems:
    users.append(elem.get_attribute('title'))
    print('Title : ' + elem.get_attribute('title'))

print(users)
Output:
Title : kyliejenner
Title : saturdayshade28
Title : worldmeetzboy
Title : mrokon
Title : addieisaac
Title : addieisaac
Title : amber_doerksen
Title : amber_doerksen
Title : addieisaac
Title : zayn6117
Title : amber_doerksen
Title : amber_doerksen
Title : worldmeetzboy
Title : worldmeetzboy
Title : razvanpopic1301
Title : johanna.trmn
Title : johanna.trmn
Title : johanna.trmn
Title : americ.av
Title : gabriellcostta1.0
Title : gabriellcostta1.0
Title : gabriellcostta1.0
Title : worldmeetzboy
Title : enactusepi
Title : enactusepi
[u'kyliejenner', u'saturdayshade28', u'worldmeetzboy', u'mrokon', u'addieisaac', u'addieisaac', u'amber_doerksen', u'amber_doerksen', u'addieisaac', u'zayn6117', u'amber_doerksen', u'amber_doerksen', u'worldmeetzboy', u'worldmeetzboy', u'razvanpopic1301', u'johanna.trmn', u'johanna.trmn', u'johanna.trmn', u'americ.av', u'gabriellcostta1.0', u'gabriellcostta1.0', u'gabriellcostta1.0', u'worldmeetzboy', u'enactusepi', u'enactusepi']
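A caveat on this approach (my own remark, not from the answer): class names such as FPmhX notranslate TlrDj are auto-generated by Instagram's front-end build and tend to change over time, so the XPath above can silently stop matching. A rough, hypothetical alternative is to anchor on the title attribute instead, under the assumption that the liker links are the ones exposing it:

# hypothetical selector based on the title attribute rather than generated class names
elems = driver.find_elements_by_xpath("//a[@title]")
users = [elem.get_attribute('title') for elem in elems]
print(users)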
I could not get the code from the answer above to work as posted, so I adapted it as shown below; with this version I now get around 500 likers per post.
import time

def get_post_likers(shortcode):
    # ch is my own helper module; ch.initialize() returns a configured Chrome webdriver
    chrome = ch.initialize()
    chrome.get('https://www.instagram.com/p/' + shortcode + '/')
    chrome.execute_script("window.scrollTo(0, 1080)")

    # open the "liked_by" dialog for this post
    url = "/p/" + shortcode + "/liked_by/"
    time.sleep(2)
    like_link = chrome.find_element_by_xpath('//a[@href="' + url + '"]')
    like_link.click()
    time.sleep(2)

    users = []
    pb = chrome.find_element_by_xpath("//div[@role = 'dialog']/div[2]/div[1]/div[1]").value_of_css_property("padding-bottom")
    match = False
    while match == False:
        lastHeight = pb

        # step 1: grab the liker links currently rendered in the dialog
        elements = chrome.find_elements_by_xpath("//*[@id]/div/a")

        # step 2: collect any usernames we have not seen yet
        for element in elements:
            if element.get_attribute('title') not in users:
                users.append(element.get_attribute('title'))

        # step 3: scroll the last element into view to load the next batch
        chrome.execute_script("return arguments[0].scrollIntoView();", elements[-1])
        time.sleep(1)

        # step 4: stop when the dialog's padding-bottom stops changing or we have enough users
        pb = chrome.find_element_by_xpath("//div[@role = 'dialog']/div[2]/div[1]/div[1]").value_of_css_property("padding-bottom")
        if lastHeight == pb or len(users) >= 1500:
            match = True

    return users
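For completeness, a hypothetical usage sketch, assuming ch.initialize() returns a working Chrome driver; the shortcode is the one from the question:

# collect the likers of the post from the question and show a sample
likers = get_post_likers('BuE82VfHRa6')
print(len(likers))
print(likers[:10])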