Error when setting up a proxy with Selenium Wire/Grid in Docker

Question (votes: 0, answers: 1)

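I'm trying to set up my small crawler as a Docker project. I'm using Selenium Wire so that I can run several requests at once. Now I want to add a proxy, but I'm running into several problems.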

Here is my code:

requirements.txt

selenium==4.0.0
selenium-wire==5.1.0
blinker==1.7.0
setuptools==74.0.0
requests
fake_useragent==1.5.1
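The blinker pin matters here: as far as I know, selenium-wire 5.1.0 imports blinker._saferef, a module that blinker 1.8 removed, so an unpinned blinker breaks "from seleniumwire import webdriver" at import time. A minimal sketch of a startup check that the container really got the pinned versions; find_pin_mismatches is a hypothetical helper, and the pins are copied from the requirements.txt above:

```python
from importlib import metadata

# Pins copied from requirements.txt above (requests is unpinned there).
PINS = {
    "selenium": "4.0.0",
    "selenium-wire": "5.1.0",
    "blinker": "1.7.0",
    "setuptools": "74.0.0",
    "fake-useragent": "1.5.1",
}


def find_pin_mismatches(pins, get_version=metadata.version):
    """Return {package: installed_version_or_None} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_pin_mismatches(PINS).items():
        print(f"{name}: expected {PINS[name]}, found {installed}")
```

Running this at the top of the crawler (or as a separate RUN step in the Dockerfile) makes a dependency drift show up as a clear message instead of an import error deep inside selenium-wire.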

My Docker Compose file:

version: '3'
services:
  chrome:
    image: selenium/node-chrome:4.10.0
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
      - SE_NODE_MAX_SESSIONS=10
      - SE_NODE_OVERRIDE_MAX_SESSIONS=true
    networks:
      - selenium-network

  selenium-hub:
    image: selenium/hub:4.10.0
    container_name: selenium-hub
    ports:
      - "4444:4444"
    networks:
      - selenium-network
    
  python-app:
    build:
      context: .
      dockerfile: Dockerfile.future
    depends_on:
      - selenium-hub
    networks:
      - selenium-network

networks:
  selenium-network:
    driver: bridge
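Note that depends_on only orders container startup; it does not wait until the Grid can accept sessions, so the first webdriver.Remote calls from python-app may fail while the hub or the Chrome node is still registering. A minimal sketch of a readiness wait, assuming the Selenium Grid 4 status endpoint at http://selenium-hub:4444/status, which returns JSON containing a value.ready flag; wait_for_grid and fetch_status are hypothetical helpers:

```python
import json
import time
import urllib.request

# Hub service name and port taken from the docker-compose file above.
GRID_STATUS_URL = "http://selenium-hub:4444/status"


def fetch_status(url):
    """Fetch and decode the Grid status JSON."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def wait_for_grid(url=GRID_STATUS_URL, timeout=60.0, interval=1.0,
                  fetch=fetch_status, sleep=time.sleep, clock=time.monotonic):
    """Poll the status endpoint until value.ready is true; return False on timeout."""
    deadline = clock() + timeout
    while True:
        try:
            if fetch(url).get("value", {}).get("ready"):
                return True
        except (OSError, ValueError):
            pass  # hub not reachable yet, or the body was not valid JSON
        if clock() >= deadline:
            return False
        sleep(interval)
```

The crawler could call "if not wait_for_grid(): raise SystemExit('Selenium Grid is not ready')" before creating the ThreadPoolExecutor. The fetch, sleep and clock parameters only exist so the loop can be tested without a running Grid.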

Dockerfile.future

FROM python

WORKDIR /

COPY requirements.txt .
COPY test_with_futures.py .

RUN pip install -r requirements.txt

CMD ["python", "test_with_futures.py"]

And my Python code, test_with_futures.py:

import concurrent.futures
import time

from seleniumwire import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from fake_useragent import UserAgent  # not used yet

print("Docker started")

# PROXY SETTINGS
# Placeholder values: the real user, password and host were obfuscated in the original post.
PROXY = "http://USER:PASSWORD@PROXY_HOST:20000"


def call_function():
    driver = None  # defined up front so `finally` works even if Remote() fails
    try:
        # Necessary because otherwise it wants to use Firefox.
        desired_capabilities = DesiredCapabilities.CHROME.copy()

        chrome_options = Options()
        chrome_options.add_argument("--no-sandbox")
        chrome_options.add_argument("--disable-dev-shm-usage")
        # If I don't use this, the proxy is not used and everything happens with my own IP.
        chrome_options.add_argument(f"--proxy-server={PROXY}")

        # Set the proxy in the Selenium Wire options
        seleniumwire_options = {
            "auto_config": False,
            "proxy": {
                "http": PROXY,
                "https": PROXY,
            },
        }

        driver = webdriver.Remote(
            command_executor="http://selenium-hub:4444/wd/hub",
            options=chrome_options,
            seleniumwire_options=seleniumwire_options,
            desired_capabilities=desired_capabilities,
        )

        print("----------------- CURRENT PROXY ----------------------------")
        print(driver.proxy)
        print("------------------------------------------------------------")

        driver.get("https://ip.smartproxy.com/json")
        wait = WebDriverWait(driver, 50)  # wait up to 50 seconds
        pre_element = wait.until(EC.presence_of_element_located((By.XPATH, "/html/body/pre")))
        res = pre_element.text
        print("----------------- DRIVER -----------------------------------")
        print(res)
        print("------------------------------------------------------------")
    except Exception as e:
        print(e)
    finally:
        if driver is not None:
            driver.quit()
    return "", ""  # not used at the moment; returned on failure too, so unpacking never fails


with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures = []
    for i in range(5):
        print(f"Website index {i}")
        futures.append(executor.submit(call_function))
        time.sleep(1)

for future in futures:
    ipv4_value, ipv6_value = future.result()  # unpack the tuple
    print(f"IPv4: {ipv4_value}, IPv6: {ipv6_value}")
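When several threads share the Grid, each task should release its browser session even if creating the driver fails, and should always return something the caller can unpack. A minimal sketch of that pattern with Selenium factored out so it runs anywhere; make_driver is a hypothetical stand-in for the webdriver.Remote(...) call, fetch_ip for the driver.get(...) and WebDriverWait part, and run_one/run_many are illustrative names:

```python
import concurrent.futures


def run_one(make_driver, fetch_ip):
    """Run one scrape; always quit the driver and always return an (ipv4, ipv6) tuple."""
    driver = None  # bound before `try`, so `finally` never sees an unbound name
    try:
        driver = make_driver()
        return fetch_ip(driver)
    except Exception as exc:
        print(f"task failed: {exc}")
        return "", ""  # same shape as the success case
    finally:
        if driver is not None:
            driver.quit()  # frees the Grid session slot


def run_many(make_driver, fetch_ip, count=5, max_workers=10):
    """Submit `count` tasks and return their results in submission order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(run_one, make_driver, fetch_ip) for _ in range(count)]
        return [future.result() for future in futures]
```

Because run_one catches the exception and still returns a tuple, future.result() never raises and "ipv4_value, ipv6_value = ..." always succeeds; failed tasks simply show up as empty strings.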


Can someone help me please?

This is the error message I get:

2024-10-07 11:41:07 Message: 
2024-10-07 11:41:07 Stacktrace:
2024-10-07 11:41:07 #0 0x55c1df1544e3 <unknown>
2024-10-07 11:41:07 #1 0x55c1dee83c76 <unknown>
2024-10-07 11:41:07 #2 0x55c1deebfc96 <unknown>
2024-10-07 11:41:07 #3 0x55c1deebfdc1 <unknown>
2024-10-07 11:41:07 #4 0x55c1deef97f4 <unknown>
2024-10-07 11:41:07 #5 0x55c1deedf03d <unknown>
2024-10-07 11:41:07 #6 0x55c1deef730e <unknown>
2024-10-07 11:41:07 #7 0x55c1deedede3 <unknown>
2024-10-07 11:41:07 #8 0x55c1deeb42dd <unknown>
2024-10-07 11:41:07 #9 0x55c1deeb534e <unknown>
2024-10-07 11:41:07 #10 0x55c1df1143e4 <unknown>
2024-10-07 11:41:07 #11 0x55c1df1183d7 <unknown>
2024-10-07 11:41:07 #12 0x55c1df122b20 <unknown>
2024-10-07 11:41:07 #13 0x55c1df119023 <unknown>
2024-10-07 11:41:07 #14 0x55c1df0e71aa <unknown>
2024-10-07 11:41:07 #15 0x55c1df13d6b8 <unknown>
2024-10-07 11:41:07 #16 0x55c1df13d847 <unknown>
2024-10-07 11:41:07 #17 0x55c1df14d243 <unknown>
2024-10-07 11:41:07 #18 0x7f358377d609 start_thread
Tags: python docker selenium-webdriver proxy web-crawler
1 answer (votes: 0)

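I solved the problem. Username/password authentication for the proxy did not work with Selenium Wire in this setup, so I switched the proxy's authentication method to IP whitelisting, and it worked.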
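With IP whitelisting the proxy URL carries no credentials, so the same scheme://host:port string can go both into Chrome's --proxy-server flag (which does not accept a user:password part) and into the Selenium Wire proxy option. A minimal sketch, assuming the provider has whitelisted the public IP of the Docker host; build_proxy_settings is a hypothetical helper, proxy.example is a placeholder host, and only port 20000 comes from the question:

```python
from urllib.parse import urlsplit


def build_proxy_settings(proxy_url):
    """Return (chrome_arguments, seleniumwire_options) for an IP-whitelisted proxy.

    Raises ValueError if the URL still contains a user name or password,
    because Chrome's --proxy-server flag cannot use them.
    """
    parts = urlsplit(proxy_url)
    if parts.username or parts.password:
        raise ValueError("remove credentials; authenticate via IP whitelisting instead")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://host:port, got {proxy_url!r}")

    server = f"{parts.scheme}://{parts.hostname}:{parts.port}"
    chrome_arguments = [
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--proxy-server={server}",
    ]
    seleniumwire_options = {
        "auto_config": False,
        "proxy": {"http": server, "https": server},
    }
    return chrome_arguments, seleniumwire_options


chrome_arguments, seleniumwire_options = build_proxy_settings("http://proxy.example:20000")
```

Each string in chrome_arguments would be passed to chrome_options.add_argument(...), and seleniumwire_options to webdriver.Remote(...), as in the question's code. Failing fast on a URL that still contains credentials avoids the silent fallback where the browser cannot authenticate to the proxy.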

© www.soinside.com 2019 - 2024. All rights reserved.