Python Selenium error: Message: session not created: cannot process extension #1 from unknown error: cannot read manifest

Question (votes: 0, answers: 1)

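I'm running into a problem with a web scraping task I'm working on. The page I'm trying to scrape uses reCAPTCHA v3, which is a major obstacle for my scraper. reCAPTCHA v3 is a widely used tool for preventing automated access to websites; it runs in the background without any user interaction, which makes it particularly hard to get past.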

This is the project structure (shown as an image in the original post).

import os
import urllib.request
import zipfile
from pathlib import Path

from selenium import webdriver


def start_driver():
    try:
        # Download the AntiCaptcha plugin and unpack it into ./plugin
        url = 'https://antcpt.com/anticaptcha-plugin.zip'
        filehandle, _ = urllib.request.urlretrieve(url)
        with zipfile.ZipFile(filehandle, "r") as f:
            f.extractall("plugin")

        # Write the API key into the plugin's config file
        api_key = "MY API KEY"
        file = Path('./plugin/js/config_ac_api_key.js')
        file.write_text(file.read_text().replace("antiCapthaPredefinedApiKey = ''", "antiCapthaPredefinedApiKey = '{}'".format(api_key)))

        # Re-zip the patched plugin so Chrome can load it as an extension
        zip_file = zipfile.ZipFile('./plugin.zip', 'w', zipfile.ZIP_DEFLATED)
        for root, dirs, files in os.walk("./plugin"):
            for file in files:
                path = os.path.join(root, file)
                zip_file.write(path, arcname=path.replace("./plugin/", ""))
        zip_file.close()

        # Start Chrome with the extension attached
        chrome_options = webdriver.ChromeOptions()
        chrome_options.add_extension('./plugin.zip')
        chrome_options.add_argument("--start-maximized")
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        driver = webdriver.Chrome(options=chrome_options)
        return driver
    except Exception as e:
        print("Error al iniciar el controlador:", e)
        return None


if __name__ == '__main__':
    driver = start_driver()

But I get this error:

Message: session not created: cannot process extension #1 from unknown error: cannot read manifest
python python-3.x selenium-webdriver
1 Answer

0 votes

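The code that creates plugin.zip compresses the plugin folder itself into the archive, so manifest.json is not at the top level where Chrome looks for it. All you need is an archive structure like this:

Before: plugin.zip -> plugin -> [all files]

After: plugin.zip -> [all files]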
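
One way to get the "after" layout is sketched below. It is not necessarily what the original poster ended up using. It assumes the likely cause is that `path.replace("./plugin/", "")` fails to strip the folder prefix when `os.path.join` produces backslashes (as on Windows), so it uses `os.path.relpath` instead. The helper names `zip_extension_dir` and `has_root_manifest` are hypothetical and only for illustration.

```python
import os
import zipfile


def zip_extension_dir(src_dir, zip_path):
    """Zip the contents of src_dir so its files sit at the archive root.

    Hypothetical helper; zip_path should live outside src_dir.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                # relpath drops the src_dir prefix regardless of the OS path
                # separator; zip entry names use forward slashes.
                arcname = os.path.relpath(path, src_dir).replace(os.sep, "/")
                zf.write(path, arcname=arcname)


def has_root_manifest(zip_path):
    """Return True if manifest.json is at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return "manifest.json" in zf.namelist()
```

In the question's code this would replace the manual `os.walk` loop: call `zip_extension_dir("./plugin", "./plugin.zip")`, optionally check `has_root_manifest("./plugin.zip")`, and then pass the archive to `chrome_options.add_extension('./plugin.zip')` as before.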
