How to get the request URL from the headers in Python?

Question

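I am trying to download an APK file from apkmirror using selenium, requests, beautifulsoup and curl. I get stuck when I reach the download page: the URL redirects, and a download.php request appears in the Network tab of the browser devtools that holds the direct link to the file.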

I tried using curl with the -L flag to download the file from the following URL:

curl -L "https://www.apkmirror.com/apk/google-inc/youtube/youtube-19-16-39-release/youtube-19-16-39-android-apk-download/download/?key=15b5cb3061082b309a0c30f1d2e410704e909596&forcebaseapk=true"

curl fetches an HTML page and does not follow the redirect to the download.php URL that shows up in devtools.
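For reference, `curl -L` follows only HTTP 3xx redirects, which the server announces in the `Location` response header. If the page instead moves on via JavaScript or an HTML meta refresh (an assumption; the question does not show which), curl simply saves the HTML it received. One way to tell the two apart is to send a request that does not follow redirects and read the status code and `Location` header. The sketch below does this with Python's standard `http.client`; `get_redirect_location` and the local demo server are hypothetical, included only so the example runs offline.

```python
import http.client
import http.server
import threading
from typing import Optional, Tuple
from urllib.parse import urlsplit


def get_redirect_location(url: str) -> Tuple[int, Optional[str]]:
    """Send a single GET without following redirects; return (status, Location header)."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection if parts.scheme == "https" else http.client.HTTPConnection
    )
    conn = conn_cls(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        # http.client never follows redirects, so a 3xx status and its Location stay visible
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Hypothetical local server that answers every GET with a 302, standing in for a real site
class RedirectHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", "/final.apk")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


server = http.server.HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
status, location = get_redirect_location(f"http://127.0.0.1:{server.server_port}/download/")
server.shutdown()
server.server_close()
print(status, location)  # 302 /final.apk
```

With the requests library the equivalent is `requests.get(url, allow_redirects=False)` followed by `response.headers.get("Location")`; when redirects are followed, `response.url` holds the final URL and `response.history` lists the intermediate responses. If the status is 200 and there is no `Location` header, the redirect is happening inside the HTML, which is why the answer below parses the page instead.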

Answer 1

This can be done with the Python requests library. Downloading and saving an app from apkmirror takes four steps:

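1. Fetch the APK page that contains the download button and extract the download link (inspect the download button to see the link).
2. Fetch the download link (as if clicking the download button) to obtain the id and key values needed to download the actual APK file.
3. Make a request using the parameters extracted in the previous step.
4. Save the response to a file.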
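Steps 2 and 3 boil down to reading two form values from the download page and building a `download.php` query string. A minimal sketch of just that part, using the standard-library `html.parser` instead of the BeautifulSoup/lxml combination in the code below (a swapped-in technique, and it matches the input names `id` and `key` exactly rather than with `contains()`). `InputValueParser`, `build_download_url` and the sample markup are hypothetical; the real page is larger and its structure may change.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

DOWNLOAD_PHP = "https://www.apkmirror.com/wp-content/themes/APKMirror/download.php"


class InputValueParser(HTMLParser):
    """Collect a name -> value mapping for every <input> element in the page."""

    def __init__(self):
        super().__init__()
        self.values = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr = dict(attrs)
            name = attr.get("name")
            if name is not None:
                # A missing or valueless "value" attribute becomes an empty string
                self.values[name] = attr.get("value") or ""


def build_download_url(page_html: str) -> str:
    """Extract the id/key form values (step 2) and build the download.php URL (step 3)."""
    parser = InputValueParser()
    parser.feed(page_html)
    missing = [field for field in ("id", "key") if field not in parser.values]
    if missing:
        raise ValueError(f"download page has no input named {missing}")
    params = {"id": parser.values["id"], "key": parser.values["key"], "forcebaseapk": "true"}
    return f"{DOWNLOAD_PHP}?{urlencode(params)}"


# Hypothetical markup with made-up values, standing in for the real download page
SAMPLE_HTML = """
<form>
  <input type="hidden" name="id" value="12345">
  <input type="hidden" name="key" value="abc">
</form>
"""
download_url = build_download_url(SAMPLE_HTML)
print(download_url)
# https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id=12345&key=abc&forcebaseapk=true
```

Raising a `ValueError` when a field is missing gives a clearer message than the bare `IndexError` that `xpath(...)[0]` produces when the page layout changes.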

import requests
from bs4 import BeautifulSoup
from lxml import etree

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'accept-language': 'en-GB,en;q=0.9',
    'cache-control': 'no-cache',
    'pragma': 'no-cache',
    'priority': 'u=0, i',
    'sec-ch-ua': '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-dest': 'document',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-site': 'none',
    'sec-fetch-user': '?1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36',
}

apk_url = "https://www.apkmirror.com/apk/google-inc/youtube/youtube-19-16-39-release/youtube-19-16-39-android-apk-download/"

# Fetch the main APK page and extract the download link
print(f"Fetching APK URL: {apk_url}")
main_page_response = requests.get(
    apk_url,
    headers=headers,
)
main_page_response.raise_for_status()  # stop on an error page instead of failing in xpath below

main_page_soup = BeautifulSoup(main_page_response.content, "html.parser")
main_page_dom = etree.HTML(str(main_page_soup))
link = main_page_dom.xpath("//a[contains(@class,'downloadButton')]/@href")[0]
download_link = f"https://www.apkmirror.com{link}"

# Fetching download link to get the additional parameters required to download the actual app
print(f"Fetching Download Link: {download_link}")
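download_page_response = requests.get(
    download_link,
    headers=headers,
)
download_page_response.raise_for_status()  # stop on an error page instead of failing in xpath below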

download_page_soup = BeautifulSoup(download_page_response.content, "html.parser")
download_page_dom = etree.HTML(str(download_page_soup))
id_value = download_page_dom.xpath("//input[contains(@name,'id')]/@value")[0]
key_value = download_page_dom.xpath("//input[contains(@name,'key')]/@value")[0]

params = {
    'id': id_value,
    'key': key_value,
    'forcebaseapk': 'true',
}
headers["referer"] = download_link

# Making another request with the extracted parameters and headers to download the file
final_url = f"https://www.apkmirror.com/wp-content/themes/APKMirror/download.php?id={id_value}&key={key_value}&forcebaseapk=true"
print(f"Making final request to download the file: {final_url}")
print("Please wait; this can take a while depending on the APK file size and your internet speed")
apk_response = requests.get(
    'https://www.apkmirror.com/wp-content/themes/APKMirror/download.php',
    params=params,
    headers=headers,
)

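# Stop if the server returned an error status instead of the file
apk_response.raise_for_status()

# Write the response body to an APK file (replace youtube_final.apk with the path you want)
print("Saving response to an APK file")
with open('youtube_final.apk', 'wb') as f:
    f.write(apk_response.content)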

print("File saved successfully")
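The code above holds the whole APK in memory via `apk_response.content` and saves whatever came back, so a blocked or changed request would silently produce an HTML file named `.apk`, the same symptom as the curl attempt in the question. A minimal sketch, assuming the response body is available as a binary file-like object: write it to disk in chunks, then check the first bytes for the ZIP signature, since an APK is a ZIP archive. `save_stream` and `looks_like_apk` are hypothetical helper names, and the demo uses in-memory bytes instead of a real download.

```python
import io
import os
import tempfile
from typing import BinaryIO

# APK files are ZIP archives, so a real download starts with the ZIP local-file signature
ZIP_MAGIC = b"PK\x03\x04"


def save_stream(source: BinaryIO, path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy a binary stream to `path` chunk by chunk; return the number of bytes written."""
    written = 0
    with open(path, "wb") as out:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written


def looks_like_apk(path: str) -> bool:
    """Return True if the file starts with the ZIP signature (False for, e.g., an HTML page)."""
    with open(path, "rb") as f:
        return f.read(len(ZIP_MAGIC)) == ZIP_MAGIC


# Offline demo: in-memory stand-ins for an APK body and an HTML error page
with tempfile.TemporaryDirectory() as tmp:
    apk_path = os.path.join(tmp, "fake.apk")
    html_path = os.path.join(tmp, "page.apk")
    apk_bytes = save_stream(io.BytesIO(ZIP_MAGIC + b"rest-of-archive"), apk_path)
    save_stream(io.BytesIO(b"<!DOCTYPE html><html></html>"), html_path)
    apk_ok, html_ok = looks_like_apk(apk_path), looks_like_apk(html_path)

print(apk_bytes, apk_ok, html_ok)  # 19 True False
```

With requests, the streaming half corresponds to passing `stream=True` to `requests.get` and writing the chunks from `response.iter_content(chunk_size=...)`, which keeps memory use flat for large APKs.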
