I am trying to scrape the follower count for an array of usernames. I am using BeautifulSoup.
The code I am using is below:
import requests
from bs4 import BeautifulSoup
def instagram_followers(username):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    response = requests.get(f'https://www.instagram.com/{username}/')
    soup = BeautifulSoup(response.text, 'html.parser')
    info = soup.find('meta', property='og:description')
    if info:
        followers = info['content'].split(" ")[0]
        return followers
    else:
        return -1
The function always returns -1.
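Instagram's web content is largely driven by JavaScript, so standard tools such as requests and BeautifulSoup generally cannot extract the text from this page. Selenium is a better fit for this purpose.
The corresponding Selenium code is below: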
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Replace username with your Instagram account username
username = "instagram"
url = f"https://www.instagram.com/{username}/"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
try:
    driver.get(url)
    driver.implicitly_wait(10)
    meta_description = driver.find_element(By.XPATH, "//meta[@name='description']")
    content = meta_description.get_attribute("content")
    follower_data = content.split(",")[0]
    followers = follower_data.split(" ")[0]
    print(f"{username} has {followers} followers.")
finally:
    driver.quit()
The code works fine for the focus of your question, so the issue is not reproducible without additional information.
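Check the following:
- response.status_code as a first indicator: you may be scraping too aggressively, and the server will react to that.
- headers: they are defined but not used in your code.
import requests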
from bs4 import BeautifulSoup
def instagram_followers(username):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
    }
    # Pass the headers so the request looks like it comes from a browser
    response = requests.get(f'https://www.instagram.com/{username}/', headers=headers)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        info = soup.find('meta', property='og:description')
        if info:
            followers = info['content'].split(" ")[0]
            return followers
        else:
            return -1
    else:
        # status_code is an int, so format it instead of concatenating it to a str
        print(f'Something went wrong with your request: {response.status_code}')
        return -1
print(instagram_followers('thestackoverflow'))
Output:
52K
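You can scrape the follower count from Instagram using Python and the Selenium library. Selenium drives a real browser, so dynamically loaded content is available and you can read the follower count rendered by JavaScript.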
Setup steps:
Step 1: Install the necessary libraries. You need Selenium and a WebDriver for your browser (Chrome in this example). To install them, run the following commands:
pip install selenium
pip install webdriver-manager
The following Python code is an example of using Selenium to scrape the follower count from an Instagram user's profile page:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
def instagram_followers(username):
    # Setting up Chrome options for headless browsing (no UI)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run in headless mode (no browser window)
    # Create WebDriver instance (automatically installs and sets up ChromeDriver)
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        # Navigate to Instagram user's page
        driver.get(f'https://www.instagram.com/{username}/')
        # Set an implicit wait: element lookups are retried for up to 5 seconds
        driver.implicitly_wait(5)
        # Extract follower count from meta tag (it will be in the og:description content)
        meta_description = driver.find_element(By.XPATH, "//meta[@property='og:description']").get_attribute("content")
        # The meta description contains the follower count in the format "X followers, Y following"
        followers = meta_description.split(" ")[0]
        return followers
    except Exception as e:
        print(f"Error: {e}")
        return -1  # If the data can't be fetched, return -1
    finally:
        # Clean up by quitting the driver (close the browser session)
        driver.quit()
# Example usage:
username = "instagram"  # Replace with the username you want
followers = instagram_followers(username)
print(f"Followers of {username}: {followers}")
Code explanation:
Selenium WebDriver:
webdriver.Chrome(): Opens a Chrome browser instance.
driver.get(f'https://www.instagram.com/{username}/'): Opens the Instagram profile page for the username passed in.
driver.implicitly_wait(5): Sets an implicit wait, so element lookups keep retrying for up to 5 seconds while the page loads dynamically.
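Extracting the follower count:
The follower count is exposed in the meta tag whose property attribute is og:description.
From that tag we read the content attribute and split it to get the follower count.
Running in headless mode:
This code runs the browser in "headless" mode, meaning no browser window opens. This is useful for automation.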
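The split described above returns a string such as "52K" rather than a number. If you need an integer, for example to sort an array of usernames by follower count, a helper along these lines may work. This is only a sketch: parse_follower_count is a hypothetical name, and it assumes the "X followers, Y following" layout mentioned in the code comment plus K/M/B suffixes, which Instagram may change at any time.

```python
import re

# Multipliers for the abbreviated suffixes seen in counts such as "52K" (assumed set).
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_follower_count(description):
    """Return the leading count of a description as an int, or None.

    Hypothetical helper; assumes text like "52K Followers, 10 Following".
    """
    # Leading digits (with optional separators), then an optional K/M/B suffix.
    match = re.match(r"\s*([\d.,]+)\s*([KMB])?\b", description, re.IGNORECASE)
    if not match:
        return None
    number, suffix = match.groups()
    try:
        value = float(number.replace(",", ""))
    except ValueError:
        # Something like "." or "1.2.3" that is not a valid number
        return None
    if suffix:
        value *= _SUFFIXES[suffix.upper()]
    return int(round(value))


print(parse_follower_count("52K Followers, 10 Following"))  # 52000
```

Note that abbreviated values are rounded by Instagram, so "52K" only gives you an approximate count.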
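Important notes:
Instagram limits: if Instagram detects scraping, it will throttle or block your requests. Handle errors and rate limiting.
WebDriver installation: webdriver_manager downloads the appropriate ChromeDriver version for your OS.
This approach uses Selenium to load the dynamic content and should be sufficient for scraping the follower count from an Instagram profile.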
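One way to handle errors and rate limiting is to retry a failed lookup with increasing delays. The sketch below is an assumption on my part, not something Instagram documents: fetch_with_backoff, retries, and base_delay are hypothetical names, and it treats a return value of -1 as failure, the convention used by the instagram_followers function above.

```python
import time


def fetch_with_backoff(fetch, username, retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch(username), retrying with exponentially growing delays.

    fetch is any callable that returns -1 on failure. The sleep parameter
    exists only so the delay can be replaced in tests.
    """
    for attempt in range(retries):
        result = fetch(username)
        if result != -1:
            return result
        # Wait 2s, 4s, 8s, ... between attempts, but not after the last one
        if attempt < retries - 1:
            sleep(base_delay * 2 ** attempt)
    return -1
```

You would call it as fetch_with_backoff(instagram_followers, "instagram"). When looping over many usernames, adding a pause between users as well is probably wise.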
Use the API instead:
import requests
def get_follower_count(username):
    headers = {
        'x-ig-app-id': '936619743392459'
    }
    url = f'https://www.instagram.com/api/v1/users/web_profile_info/?username={username}'
    response = requests.get(url, headers=headers)
    return response.json()['data']['user']['edge_followed_by']['count']
print(get_follower_count('thestackoverflow'))
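The chained lookup above raises KeyError or TypeError if any level of the response is missing. I have not verified what this endpoint returns when a request is refused or a profile does not exist, so the following is only a defensive sketch: extract_follower_count is a hypothetical helper that walks the same key path and returns None instead of raising.

```python
def extract_follower_count(payload):
    """Return payload['data']['user']['edge_followed_by']['count'], or None.

    payload is the already-decoded JSON (a dict), e.g. response.json().
    """
    node = payload
    for key in ("data", "user", "edge_followed_by", "count"):
        # Stop as soon as a level is missing or is not a JSON object
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


# Example with a hand-written payload shaped like the path used above
sample = {"data": {"user": {"edge_followed_by": {"count": 52000}}}}
print(extract_follower_count(sample))  # 52000
```

Inside get_follower_count you could then return extract_follower_count(response.json()) after checking response.status_code.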