Why does scraping the follower count from Instagram fail?

Question (votes: 0, answers: 4)

I'm trying to scrape the follower counts for an array of usernames. I'm using BeautifulSoup.

The code I'm using is as follows:

import requests
from bs4 import BeautifulSoup

def instagram_followers(username):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    response = requests.get(f'https://www.instagram.com/{username}/')
    soup = BeautifulSoup(response.text, 'html.parser')
    info = soup.find('meta', property='og:description')
    if info:
        followers = info['content'].split(" ")[0]
        return followers
    else:
        return -1

The function always returns -1.

python web-scraping beautifulsoup python-requests instagram
4 answers
0 votes

Instagram's web content is largely driven by JavaScript, so text on this page cannot reliably be extracted with standard tools such as requests and BeautifulSoup. It is better to use Selenium for this.

The corresponding Selenium code is given below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Replace with the username of the profile whose followers you want to count
username = "instagram"
url = f"https://www.instagram.com/{username}/"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

try:
    driver.get(url)
    # Let find_element retry for up to 10 seconds while the page renders
    driver.implicitly_wait(10)
    meta_description = driver.find_element(By.XPATH, "//meta[@name='description']")
    content = meta_description.get_attribute("content")
    # content starts with e.g. "52K Followers, ..."; take the first word directly,
    # because splitting on "," first would cut counts such as "1,234" in half
    followers = content.split(" ")[0]

    print(f"{username} has {followers} followers.")
finally:
    driver.quit()
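Whether requests alone is enough can be checked directly: if response.text contains no og:description tag (for instance because Instagram served a login page or a bare JavaScript shell), the question's function returns -1. Below is a minimal sketch of that check using only the standard library's html.parser instead of BeautifulSoup; MetaTagCollector and find_meta_content are hypothetical names, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta> tag contents keyed by their property or name attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key and "content" in attributes:
            # Keep only the first tag seen for each key
            self.meta.setdefault(key, attributes["content"])


def find_meta_content(html, key):
    """Return the content of the meta tag identified by `key`, or None if absent."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.meta.get(key)


sample = '<html><head><meta property="og:description" content="52K Followers, 10 Following, 300 Posts"></head></html>'
print(find_meta_content(sample, "og:description"))  # → 52K Followers, 10 Following, 300 Posts
```

Running find_meta_content(response.text, "og:description") on the real response and getting None would be consistent with the -1 described in the question.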

0 votes

The code in your question works fine as it is, so the problem cannot be reproduced without further information.

Check the following:

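  • response.status_code as a first indicator: you may be scraping too aggressively, and the server will act against it.
  • Actually pass your headers to the request; in your code they are defined but never used.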
import requests
from bs4 import BeautifulSoup

def instagram_followers(username):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
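    # Pass the headers so the custom User-Agent is actually sent
    response = requests.get(f'https://www.instagram.com/{username}/', headers=headers)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        info = soup.find('meta', property='og:description')
        if info:
            # content looks like "52K Followers, ..."; the first word is the count
            followers = info['content'].split(" ")[0]
            return followers
        else:
            return -1
    else:
        # status_code is an int, so format it instead of concatenating it to a str
        print(f'Something went wrong with your request: {response.status_code}')
        return -1

print(instagram_followers('thestackoverflow'))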

Output:

52K
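If the status code points at throttling, a common reaction to HTTP 429 (Too Many Requests) is to wait and retry with growing delays. Below is a minimal sketch, assuming Instagram signals throttling with 429 (it may instead redirect to a login page or use other codes); get_with_backoff is a hypothetical helper, and fetch can be something like lambda u: requests.get(u, headers=headers, timeout=10).

```python
import time


def get_with_backoff(fetch, url, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff while the server answers 429.

    `fetch` is any callable returning an object with a `status_code` attribute.
    `sleep` is injectable so the waiting can be replaced in tests.
    """
    for attempt in range(max_retries):
        response = fetch(url)
        if response.status_code != 429:
            return response
        # Wait base_delay, then 2x, 4x, ... before trying again
        sleep(base_delay * (2 ** attempt))
    # One last attempt after the final wait; return whatever the server says
    return fetch(url)
```

Doubling the delay on each retry backs off quickly if the server keeps refusing, instead of hammering it at a fixed rate.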

0 votes

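You can scrape Instagram follower counts with Python and the Selenium library. Selenium drives a real browser, so it loads content dynamically and lets you access follower counts rendered by JavaScript.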

Setup steps:

Step 1: Install the necessary libraries. You need Selenium and a WebDriver for your browser (Chrome in this example). To install them, run the following commands:

pip install selenium
pip install webdriver-manager

The following example uses Selenium to scrape the follower count from an Instagram user's profile page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def instagram_followers(username):
    # Setting up Chrome options for headless browsing (no UI)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run in headless mode (no browser window)

    # Create WebDriver instance (automatically installs and sets up ChromeDriver)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

    try:
        # Navigate to Instagram user's page
        driver.get(f'https://www.instagram.com/{username}/')

        # Wait for the page to load (implicitly wait for 5 seconds)
        driver.implicitly_wait(5)

        # Extract follower count from meta tag (it will be in the og:description content)
        meta_description = driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content")

        # The meta description contains the follower count in the format "X followers, Y following"
        followers = meta_description.split(" ")[0]
        return followers

    except Exception as e:
        print(f"Error: {e}")
        return -1  # If the data can't be fetched, return -1

    finally:
        # Clean up by quitting the driver (close the browser session)
        driver.quit()

# Example usage:
username = "instagram"  # Replace with the username you want
followers = instagram_followers(username)
print(f"Followers of {username}: {followers}")

Code explanation:

Selenium WebDriver:

webdriver.Chrome(): opens a Chrome browser instance.

driver.get(f'https://www.instagram.com/{username}/'): opens the Instagram profile page for the given username.

driver.implicitly_wait(5): makes element lookups such as find_element keep retrying for up to 5 seconds, which gives dynamically loaded content time to appear.
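Conceptually, an implicit wait turns a one-shot lookup into a retry loop with a deadline. The pure-Python sketch below illustrates only that idea; it is not how Selenium implements waits internally, and wait_for is a hypothetical helper. In real Selenium code, WebDriverWait with expected_conditions is the usual explicit alternative.

```python
import time


def wait_for(find, timeout=5.0, poll_interval=0.5, clock=time.monotonic, sleep=time.sleep):
    """Call find() until it returns something other than None, or give up after `timeout` seconds.

    `clock` and `sleep` are injectable so the timing can be simulated in tests.
    """
    deadline = clock() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"element not found within {timeout} seconds")
        # Not there yet: pause briefly, then look again
        sleep(poll_interval)
```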

Extracting the follower count:

The follower count is stored in the meta tag whose property is og:description.

We read that tag's content attribute and split it to get the follower count.
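The split yields strings such as "52K", not numbers. Below is a minimal sketch that turns the start of the description into an integer, assuming an English-language description beginning like "52K Followers", "1.2M Followers" or "1,234 Followers" (other locales are not handled); parse_follower_count is a hypothetical name. Because Instagram abbreviates large counts, the result for "52K" is only as precise as the abbreviation.

```python
import re

# Multipliers for the abbreviated counts shown on profile pages
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_follower_count(description):
    """Return the leading follower count of an og:description string as an int, or None."""
    match = re.match(r"\s*(\d[\d.,]*)\s*([KMB]?)\s+Followers", description, re.IGNORECASE)
    if not match:
        return None
    number, suffix = match.groups()
    # Drop thousands separators such as the comma in "1,234"
    value = float(number.replace(",", ""))
    return int(round(value * _SUFFIXES.get(suffix.upper(), 1)))


print(parse_follower_count("52K Followers, 10 Following, 300 Posts"))  # → 52000
```

Matching the whole "number + suffix + Followers" pattern also avoids the trap of splitting on "," first, which would cut "1,234" into "1".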

Running in headless mode:

This code runs the browser in "headless" mode, meaning no browser window opens. This is useful for automation.

Important notes:

Instagram restrictions: if Instagram detects scraping, it will throttle or block your requests, so handle errors and rate limiting.

WebDriver installation: webdriver_manager downloads the appropriate chromedriver version for your OS.

This approach uses Selenium to fetch dynamic content and should be sufficient for scraping follower counts from Instagram profiles.
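Since the question asks for counts across an array of usernames, the rate-limiting note matters: firing requests back to back is the quickest way to get blocked. Below is a minimal sketch that spaces the calls out and records failures instead of aborting the run. collect_follower_counts is a hypothetical helper, and the 5-second default delay is an arbitrary assumption, not a documented Instagram limit.

```python
import time


def collect_follower_counts(usernames, fetch_count, delay_seconds=5.0, sleep=time.sleep):
    """Fetch a follower count per username, pausing between requests.

    `fetch_count` is any function taking a username and returning a count,
    e.g. one of the instagram_followers functions in this thread.
    A username whose lookup raises an exception is recorded as None.
    """
    results = {}
    for index, username in enumerate(usernames):
        if index > 0:
            # Space out requests to lower the chance of being rate limited
            sleep(delay_seconds)
        try:
            results[username] = fetch_count(username)
        except Exception as exc:  # keep going if one profile fails
            print(f"{username}: {exc}")
            results[username] = None
    return results
```

For example, collect_follower_counts(["instagram", "thestackoverflow"], instagram_followers) would run the Selenium function above once per profile, with a pause in between.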


0 votes

Use the API instead:

import requests

def get_follower_count(username):
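    # This is an internal endpoint used by instagram.com, not a documented
    # public API, so it may change or start requiring login without notice
    headers = {
        'x-ig-app-id': '936619743392459'
    }

    url = f'https://www.instagram.com/api/v1/users/web_profile_info/?username={username}'
    response = requests.get(url, headers=headers, timeout=10)
    # Fail loudly on 4xx/5xx instead of with a confusing JSON decoding error
    response.raise_for_status()

    return response.json()['data']['user']['edge_followed_by']['count']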


print(get_follower_count('thestackoverflow'))
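If the username doesn't exist, or Instagram returns an error payload, the chained indexing above raises KeyError or TypeError. Below is a small sketch that returns None instead. It assumes the same data → user → edge_followed_by → count structure the answer relies on; extract_follower_count is a hypothetical name.

```python
def extract_follower_count(payload):
    """Return data.user.edge_followed_by.count from a web_profile_info payload, or None.

    Any missing level (unknown user, error response) yields None instead of an exception.
    """
    try:
        return payload["data"]["user"]["edge_followed_by"]["count"]
    except (KeyError, TypeError):
        # KeyError: a key is missing; TypeError: a level is None or not a dict
        return None
```

It would replace the last line of get_follower_count, e.g. return extract_follower_count(response.json()).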
© www.soinside.com 2019 - 2024. All rights reserved.