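I want to install the latest xgboost nightly build. The documentation says the latest builds are listed here: https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/
You take the name of the latest build and pass it to pip like this (example):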
!pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/master/xgboost-2.0.0.dev0%2B15ca12a77ebbaf76515291064c24d8c2268400fd-py3-none-manylinux2014_x86_64.whl
Is there a way to specify "the latest nightly build" without having to copy the commit hash?
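Judging from the page source, the list of nightly builds is populated dynamically with JavaScript. To scrape content that is filled in by JavaScript, you can use Selenium or any other headless-browser approach that executes the JavaScript to produce the full DOM.
First, install Selenium and a compatible WebDriver, such as the Chrome WebDriver.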
pip install selenium
Download the version of ChromeDriver that matches your Chrome installation and make it accessible on your system's PATH.
The code below then tries to find the latest .whl file built specifically for manylinux2014_x86_64 and installs it.
import subprocess
import sys
import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Initialize a headless Chrome WebDriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")
driver = webdriver.Chrome(options=options)

try:
    # Open the URL
    driver.get("https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/list.html?prefix=master/")

    # Wait for the list container to be present
    try:
        element_present = EC.presence_of_element_located((By.ID, 'listing'))
        WebDriverWait(driver, 10).until(element_present)
    except TimeoutException:
        print("Timed out waiting for page to load")
    time.sleep(5)  # Further delay to give the JavaScript time to populate the list

    # Extract the link targets while the browser is still open
    # (href can be None for anchors without a target)
    hrefs = [a.get_attribute('href') for a in driver.find_elements(By.TAG_NAME, 'a')]
finally:
    # Shut down the browser and the driver process, even if something failed
    driver.quit()

# Walk the links in reverse and keep the first one that matches our criteria.
# This is the most recent build only if the page lists builds oldest-first.
latest_build_url = ""
for href in reversed(hrefs):
    if href and "py3-none-manylinux2014_x86_64.whl" in href:
        latest_build_url = href
        break

# Install the build with the pip that belongs to the running interpreter
if latest_build_url:
    subprocess.run([sys.executable, "-m", "pip", "install", latest_build_url], check=True)
else:
    print("Latest build not found.")
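The link-selection step can be pulled out into a small pure function, which makes it possible to check the filtering logic without starting a browser. This is only a sketch: the function name `pick_last_wheel` and its `platform_tag` parameter are made up for illustration, and like the loop above it simply returns the last matching link in page order.

```python
from typing import Iterable, Optional


def pick_last_wheel(
    hrefs: Iterable[Optional[str]],
    platform_tag: str = "manylinux2014_x86_64",
) -> Optional[str]:
    """Return the last href that ends with '<platform_tag>.whl', or None."""
    suffix = f"{platform_tag}.whl"
    match = None
    for href in hrefs:
        # Anchors without a target yield None, so skip those
        if href and href.endswith(suffix):
            match = href
    return match
```

With Selenium, it would be called as `pick_last_wheel(a.get_attribute('href') for a in links)`.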
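Note: the WebDriver waits up to 10 seconds for the list container to appear, and the script then sleeps for another 5 seconds. Adjust both values as needed.
This approach should give you the list of URLs after JavaScript has populated it. Keep in mind that it depends on the current structure of the page, which may change.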
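If the bucket allows anonymous listing (an assumption, although a JavaScript listing page usually gets its data this way), a browser-free alternative is to request the S3 XML listing directly and choose the wheel by its `LastModified` timestamp rather than by its position on the page. The sketch below is not part of the Selenium approach above. `BUCKET_URL` is inferred from the wheel URL in the question, and `fetch_listing` and `latest_wheel_url` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: path-style bucket URL inferred from the wheel link in the question
BUCKET_URL = "https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds"
# XML namespace used by S3 list responses
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def fetch_listing(prefix: str = "master/") -> bytes:
    """Download one page of the bucket listing (assumes public listing is allowed)."""
    query = urlencode({"list-type": "2", "prefix": prefix})
    with urlopen(f"{BUCKET_URL}?{query}") as response:
        return response.read()


def latest_wheel_url(
    listing_xml: Union[str, bytes],
    platform_tag: str = "manylinux2014_x86_64",
) -> Optional[str]:
    """Return the URL of the most recently modified matching wheel, or None."""
    root = ET.fromstring(listing_xml)
    candidates = []
    for item in root.iter(f"{S3_NS}Contents"):
        key = item.findtext(f"{S3_NS}Key")
        stamp = item.findtext(f"{S3_NS}LastModified")
        if key and stamp and key.endswith(f"{platform_tag}.whl"):
            candidates.append((stamp, key))
    if not candidates:
        return None
    # Timestamps share one fixed-width UTC format, so comparing them as
    # strings orders them chronologically
    _, newest_key = max(candidates)
    # Percent-encode the key ("+" becomes "%2B") but keep the path separators
    return f"{BUCKET_URL}/{quote(newest_key, safe='/')}"
```

A call such as `latest_wheel_url(fetch_listing())` would then produce a URL that can be passed to `pip install`. A single list response contains at most 1000 keys, so a prefix with more objects than that would need the continuation token handled as well.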