Playwright works locally but fails after deployment to Google Cloud Run

Question

I developed a Python script that uses Playwright for web scraping and tested it successfully on my local machine.

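TL;DR: the Python script logs in to a website, navigates to different pages, clicks a few links, and then downloads a CSV file. The file will eventually be pushed to a Google BigQuery table, but for now it is simply returned to the browser for testing purposes.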

I created a Dockerfile and pushed the container image to Google Artifact Registry. Finally, I deployed the service to Google Cloud Run.

Here is main.py:

from flask import Flask, request, abort, Response
from playwright.async_api import async_playwright
import asyncio
import pandas as pd
import io

app = Flask(__name__)

# Declare the username and password directly in the code
USERNAME = ""
PASSWORD = ""
DASHBOARD = ""

@app.route('/')
def home():
    return 'Flask is running! Visit /csvdata?url=<your-url> to capture and view the CSV data.'

@app.route('/csvdata')
def capture_and_display_csv():
    # Get the URL from the query string
    url = request.args.get('url')
    
    if not url:
        return abort(400, description="URL parameter is required")
    
    # Run the function to get the CSV data
    csv_data = asyncio.run(get_csv_data(url))
    
    # Convert DataFrame to HTML for display (or just return as plain text)
    csv_html = csv_data.to_html()  # Or use csv_data.to_string() for plain text
    
    return Response(csv_html, mimetype='text/html')

async def get_csv_data(url):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        page.set_default_timeout(600000)
        
        await page.goto(url, timeout=600000)
        
        # Enter the username and password
        await page.fill('#username', USERNAME)
        await page.fill('#password input', PASSWORD)
        
        # Click the login button
        await page.click('button[type="button"]')        
        await page.wait_for_selector('div.randomDOM', timeout=600000)
        await page.goto(DASHBOARD, timeout=600000)

        await page.click('a.step1')
        await page.click('a.step2')
        await page.click('a.step3')
        await page.click('a.download_link')
        
        # Wait for the download to complete
        download = await page.wait_for_event('download', timeout=600000)
        
        # Save the file content to a variable
        csv_content = await download.path()
        
        # Read the CSV content into a pandas DataFrame
        df = pd.read_csv(csv_content)
        await browser.close()
        return df

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080)

Here is requirements.txt:

Flask==2.3.2
playwright==1.39.0
pandas==2.0.3
numpy==1.25.2

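As the title says, the problem is that once deployed on Google Cloud Run, the script times out. It does so in a very odd way: the first attempt appears to time out, then a second attempt is made and also times out. The logs show nothing actually failing; in fact, they indicate that the run completed successfully, with a 200 response to the document request.

I have raised the Cloud Run resources to 8 GiB of memory, 8 CPUs, and a 3600-second request timeout (the maximum), and, as the code above shows, explicitly set the page timeout and every wait call to 10 minutes. None of this made any difference. I hope someone knows how to get this working or has any ideas.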

Answer 1

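The solution here was to use Google Cloud Run Jobs and redeploy the application as a standalone script rather than as an app that serves GET requests. This has two benefits:

No need for Flask
The script can run for longer and more reliably, so there are fewer timeout problems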
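
The answer does not include code, so the following is only a minimal sketch of what the standalone version might look like. It assumes the credentials and URLs are passed as environment variables; the names LOGIN_URL, DASHBOARD_URL, SCRAPER_USERNAME and SCRAPER_PASSWORD are hypothetical, and the selectors are the placeholders from the question. It also replaces the question's wait_for_event('download') call with Playwright's expect_download(), which starts listening before the click that triggers the download. That is a change I am suggesting, not something the answer confirms.

```python
import asyncio
import os


def load_settings(env=os.environ):
    """Read the scraper settings from environment variables.

    The variable names are hypothetical; use whatever the job is configured with.
    """
    required = ["LOGIN_URL", "DASHBOARD_URL", "SCRAPER_USERNAME", "SCRAPER_PASSWORD"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


async def download_csv(settings):
    # Imported here so the settings helper can be used without Playwright installed.
    import pandas as pd
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(settings["LOGIN_URL"])

            # Log in (selectors are the placeholders from the question).
            await page.fill("#username", settings["SCRAPER_USERNAME"])
            await page.fill("#password input", settings["SCRAPER_PASSWORD"])
            await page.click('button[type="button"]')
            await page.wait_for_selector("div.randomDOM")

            await page.goto(settings["DASHBOARD_URL"])
            for selector in ("a.step1", "a.step2", "a.step3"):
                await page.click(selector)

            # Start listening for the download before the click that triggers it.
            async with page.expect_download() as download_info:
                await page.click("a.download_link")
            download = await download_info.value

            # Read the file before the browser closes and the temp file is removed.
            return pd.read_csv(await download.path())
        finally:
            await browser.close()


def main():
    # No Flask route: the job runs once, prints the CSV, and exits.
    settings = load_settings()
    df = asyncio.run(download_csv(settings))
    print(df.to_csv(index=False))
```

To use this as the container's entry point, call main() under an `if __name__ == "__main__":` guard. An unhandled exception then makes the process exit with a non-zero status, which Cloud Run Jobs treats as a failed task.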
