Selenium only captures the full URL of a script's src


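I am trying to capture the "src" value of every script element, but it never returns URLs like "/cdn/script.js", only full URLs like "site.com/cdn/script.js". How can I get it to return the relative URLs as well?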

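from selenium.webdriver.common.by import By

ScriptArray = []  # assumed module-level list; collects the absolute script URLs

def GetScriptArray():
  # Driver is assumed to be an already-created WebDriver with the page loaded
  ScriptElements = Driver.find_elements(By.TAG_NAME, 'script')
  for x, Script in enumerate(ScriptElements, start=1):
    ScriptSource = Script.get_attribute("src")
    ScriptSourceAlt = Script.get_attribute("data-original-src")  # read but not used below
    if ScriptSource:
      if ScriptSource.startswith("http"):
        ScriptArray.append(ScriptSource)
      elif ScriptSource.startswith("//"):  # protocol-relative URL
        print("SPECIAL 1 : " + ScriptSource)
      elif ScriptSource.startswith("/"):  # root-relative URL
        print("SPECIAL 2 : " + ScriptSource)
    else:
      print("SCRIPT NUM " + str(x) + " HAS NO SRC")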

The script above outputs the following (I am testing against hugedomains.com/domain_profile.cfm?d=myecommercewebsite.com):

DevTools listening on ws://127.0.0.1:60068/devtools/browser/a7437c3c-2acf-484f-9ec8-92c7fb9acca4
SCRIPT NUM 4 HAS NO SRC
SCRIPT NUM 5 HAS NO SRC
SCRIPT NUM 6 HAS NO SRC
SCRIPT NUM 7 HAS NO SRC
SCRIPT NUM 8 HAS NO SRC
SCRIPT NUM 9 HAS NO SRC
SCRIPT NUM 16 HAS NO SRC
SCRIPT NUM 17 HAS NO SRC
SCRIPT NUM 18 HAS NO SRC

[an array containing no shortened URLs (I can't share it because you can't post https:// links)]

No URLs like "/cdn/script.js" are returned, only full URLs.

python selenium-webdriver selenium-chromedriver
1 Answer

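My assumption is that you are not letting the page finish loading. I changed the approach: I switched the locator from <script> to script[src] so that it only pulls tags that have a src attribute, added a wait, and it works fine for me.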

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

url = 'https://www.hugedomains.com/domain_profile.cfm?d=myecommercewebsite.com'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.ID, "header")))
tags = driver.find_elements(By.CSS_SELECTOR, "script[src]")
for tag in tags:
    print(tag.get_attribute("src"))

This printed:

https://www.gstatic.com/recaptcha/releases/pPK749sccDmVW_9DSeTMVvh2/recaptcha__en.js
https://cdn-cookieyes.com/client_data/e71bc53f1cb88666d160c1e2/script.js
https://cdn-cookieyes.com/client_data/e71bc53f1cb88666d160c1e2/banner.js
https://www.google.com/recaptcha/enterprise.js?render=6LdRB9UiAAAAABaf3jRLyU_gwaGIp-3OvR51myRx
https://static.hugedomains.com/js/hdv3-js/jquery.min.js
https://static.hugedomains.com/js/hdv3-js/script.js?aa=2022-10-32
https://static.hugedomains.com/js/hdv3-js/common.js
https://static.hugedomains.com/js/hdv3-js/hd-js.js?a=20220124b  
https://www.hugedomains.com/rjs/hdv3-rjs/hd-js.cfm?aa=2022-10-32
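
As a side note that is not part of the original answer: in Selenium's Python bindings, get_attribute("src") returns the URL after the browser has resolved it, which is probably why a value such as "/cdn/script.js" never shows up. Selenium 4 also provides get_dom_attribute("src"), which returns the value as it is written in the HTML. The sketch below puts the two side by side. The names classify_src and collect_script_srcs and the label strings are hypothetical, chosen only for illustration. The locator is passed as the plain string "css selector" (the value behind By.CSS_SELECTOR), so the function only assumes an object that behaves like a WebDriver.

```python
def classify_src(raw):
    """Label a raw src value by its form (the labels are illustrative)."""
    if raw.startswith(("http://", "https://")):
        return "absolute"
    if raw.startswith("//"):
        return "protocol-relative"
    if raw.startswith("/"):
        return "root-relative"
    return "relative"


def collect_script_srcs(driver):
    """Return (raw, kind, resolved) for each <script src=...> on the current page."""
    results = []
    # "css selector" is the string constant behind By.CSS_SELECTOR.
    for tag in driver.find_elements("css selector", "script[src]"):
        raw = tag.get_dom_attribute("src")   # value as written in the HTML
        resolved = tag.get_attribute("src")  # URL as resolved by the browser
        results.append((raw, classify_src(raw), resolved))
    return results
```

With a real session, this would be called as collect_script_srcs(driver) after driver.get(url) and the wait shown above. Whether any relative values appear depends on how the page itself writes its script tags.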