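I am trying to capture the "src" attribute of every script element, but it never returns URLs like "/cdn/script.js"; it only returns full URLs like "site.com/cdn/script.js". How can I get the relative ones?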
def GetScriptArray():
    ScriptElements = Driver.find_elements(By.TAG_NAME, 'script')
    for x, Script in enumerate(ScriptElements, start=1):
        ScriptSource = Script.get_attribute("src")
        ScriptSourceAlt = Script.get_attribute("data-original-src")
        if ScriptSource:
            if ScriptSource.startswith("http"):
                ScriptArray.append(ScriptSource)
            elif ScriptSource.startswith("//"):
                print("SPECIAL 1 : " + ScriptSource)
            elif ScriptSource.startswith("/"):
                print("SPECIAL 2 : " + ScriptSource)
        else:
            print("SCRIPT NUM " + str(x) + " HAS NO SRC")
The script above produces the following output (I am testing against hugedomains.com/domain_profile.cfm?d=myecommercewebsite.com):
DevTools listening on ws://127.0.0.1:60068/devtools/browser/a7437c3c-2acf-484f-9ec8-92c7fb9acca4
SCRIPT NUM 4 HAS NO SRC
SCRIPT NUM 5 HAS NO SRC
SCRIPT NUM 6 HAS NO SRC
SCRIPT NUM 7 HAS NO SRC
SCRIPT NUM 8 HAS NO SRC
SCRIPT NUM 9 HAS NO SRC
SCRIPT NUM 16 HAS NO SRC
SCRIPT NUM 17 HAS NO SRC
SCRIPT NUM 18 HAS NO SRC
[array containing only full, uncut URLs (I can't share it because links starting with https:// can't be posted)]
No URLs like "/cdn/script.js" are returned, only full URLs.
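My assumption is that you are not letting the page finish loading. I changed the approach: I switched the locator from the <script> tag to the CSS selector script[src], so that only tags with a src attribute are pulled, and added a wait. It works fine for me.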
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.hugedomains.com/domain_profile.cfm?d=myecommercewebsite.com'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "header")))
tags = driver.find_elements(By.CSS_SELECTOR, "script[src]")
for tag in tags:
    print(tag.get_attribute("src"))
This printed:
https://www.gstatic.com/recaptcha/releases/pPK749sccDmVW_9DSeTMVvh2/recaptcha__en.js
https://cdn-cookieyes.com/client_data/e71bc53f1cb88666d160c1e2/script.js
https://cdn-cookieyes.com/client_data/e71bc53f1cb88666d160c1e2/banner.js
https://www.google.com/recaptcha/enterprise.js?render=6LdRB9UiAAAAABaf3jRLyU_gwaGIp-3OvR51myRx
https://static.hugedomains.com/js/hdv3-js/jquery.min.js
https://static.hugedomains.com/js/hdv3-js/script.js?aa=2022-10-32
https://static.hugedomains.com/js/hdv3-js/common.js
https://static.hugedomains.com/js/hdv3-js/hd-js.js?a=20220124b
https://www.hugedomains.com/rjs/hdv3-rjs/hd-js.cfm?aa=2022-10-32
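
One detail that may explain why only full URLs show up, independent of page load timing: Selenium's get_attribute("src") first tries the element's src property, which the browser reports as a fully resolved absolute URL. A value written as "/cdn/script.js" in the markup therefore comes back with the scheme and host attached. In Selenium 4, get_dom_attribute("src") returns the value as declared in the HTML instead. The sketch below is only an illustration under stated assumptions, not a tested fix for this site: PAGE_URL, the sample paths, and the helper names resolve_src, classify_src and collect_raw_srcs are hypothetical, and whether the page actually contains relative src values is not something the output above can confirm.

```python
from urllib.parse import urljoin

# Hypothetical page URL, used only to illustrate how resolution works.
PAGE_URL = "https://site.com/products/page.html"


def resolve_src(page_url, raw_src):
    """Resolve a raw src value against the page URL, as a browser would."""
    return urljoin(page_url, raw_src)


def classify_src(raw_src):
    """Label a raw src value by how it is written in the markup."""
    if not raw_src:
        return "none"
    if raw_src.startswith(("http://", "https://")):
        return "absolute"
    if raw_src.startswith("//"):
        return "protocol-relative"
    if raw_src.startswith("/"):
        return "root-relative"
    return "relative"


def collect_raw_srcs(script_elements):
    """Return src values exactly as declared in the HTML.

    Assumes each element is a Selenium 4 WebElement (or any object that
    offers get_dom_attribute), which reads the attribute from the markup
    rather than the resolved property.
    """
    raw = []
    for element in script_elements:
        value = element.get_dom_attribute("src")
        if value:
            raw.append(value)
    return raw


if __name__ == "__main__":
    # Sample values are made up for the demonstration.
    for sample in ["/cdn/script.js", "//cdn.example.com/a.js", "js/app.js"]:
        print(classify_src(sample), "->", resolve_src(PAGE_URL, sample))
```

With a live driver, this could be called as collect_raw_srcs(driver.find_elements(By.CSS_SELECTOR, "script[src]")), after which classify_src can sort the results the same way the startswith checks in the question do.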