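I want to drive an automated browser from Jupyter notebook cells instead of running my steps from a .py script. This works well with the browser automation library Selenium.
It does not work with the library Playwright. In fact, it does not work at all: I tried every line of code from their manual, and nothing runs in a Jupyter notebook. As soon as I copy and paste the same code into a .py file and execute it, everything works on my machine. The examples I am talking about can be found here: https://playwright.dev/python/docs/intro
I really don't understand why I can't get it to work in a Jupyter notebook, especially since it works fine in pretty much every .py file.
Edit: apparently it works on Mac, but I am on Windows.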
The code below works on macOS and Linux.
As explained in https://github.com/microsoft/playwright-python/issues/480, Jupyter Notebook runs an asyncio event loop, so you should use the async API.
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = False)
page = await browser.new_page()
await page.goto("http://whatsmyuseragent.org/")
# await page.screenshot(path="example.png")
# await browser.close()
# await playwright.stop()
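The cell above leaves the browser open and keeps the cleanup lines commented out. As a minimal sketch, the same steps could be wrapped in one coroutine so that the browser is always closed. The name `fetch_title` is hypothetical, and running headless is an assumption made here so the sketch also works without a display.

```python
async def fetch_title(url: str) -> str:
    """Open `url` in headless Chromium and return the page title."""
    # Imported inside the function so that defining it does not require
    # Playwright to be importable yet (an illustrative choice).
    from playwright.async_api import async_playwright

    # The context manager starts Playwright and stops it on exit,
    # even if one of the awaited calls raises.
    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch(headless=True)
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            await browser.close()


# In a notebook cell:  title = await fetch_title("http://whatsmyuseragent.org/")
# In a .py script:     import asyncio
#                      title = asyncio.run(fetch_title("http://whatsmyuseragent.org/"))
```

The notebook form uses a bare `await` because the kernel already runs an event loop; the script form has to start one with `asyncio.run`.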
If you use the sync API, it throws an error like this:
from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
'''
Error: It looks like you are using Playwright Sync API inside the asyncio loop.
Please use the Async API instead.
'''
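Since Colab notebooks are hosted Jupyter notebooks, I suggest the following solution for running Playwright in a hosted Jupyter instance.
I have only tested this in my Google Colab notebook, not in a locally hosted Jupyter instance.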
If you have trouble running Playwright in a Jupyter notebook on Windows, try disabling the event loop policy that the kernel sets. That means commenting out this line in the kernel's startup code:
asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())
After the change, the surrounding block looks like this:
if sys.platform.startswith("win") and sys.version_info >= (3, 8):
    import asyncio
    try:
        from asyncio import WindowsProactorEventLoopPolicy, WindowsSelectorEventLoopPolicy
    except ImportError:
        pass
        # not affected
    else:
        if type(asyncio.get_event_loop_policy()) is WindowsProactorEventLoopPolicy:
            # WindowsProactorEventLoopPolicy is not compatible with tornado 6
            # fallback to the pre-3.8 default of Selector
            # asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())
            pass
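The reason this policy matters is that Playwright starts its browser driver as a subprocess, and on Windows asyncio's selector-based event loop cannot spawn subprocesses while the proactor loop can. A minimal sketch for checking which loop a kernel is actually running follows. Both helper names are hypothetical, and the string match on the class name is only a heuristic.

```python
import asyncio
import sys


async def running_loop_name() -> str:
    """Return the class name of the event loop executing this coroutine."""
    return type(asyncio.get_running_loop()).__name__


def may_lack_subprocess_support(loop_name: str, platform: str = sys.platform) -> bool:
    """Heuristic: on Windows, selector-based loops cannot spawn subprocesses."""
    return platform.startswith("win") and "Selector" in loop_name


# In a notebook cell:  name = await running_loop_name()
# In a .py script:     name = asyncio.run(running_loop_name())
```

If `may_lack_subprocess_support(name)` is True in the notebook but False in a .py script on the same machine, that would be consistent with the behaviour described in the question.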
If you still cannot use the async API in a Jupyter notebook, you can try creating a virtual environment for Playwright:
In the terminal:
# create a virtual environment for playwright
python3 -m venv ~/playwright_new
source ~/playwright_new/bin/activate
pip install playwright ipykernel requests
playwright install
Then create a kernel link for the Jupyter notebook:
source ~/playwright_new/bin/activate
# create kernel link for jupyter notebook
python -m ipykernel install --user --name playwright_new --display-name "playwright_new"
# on macOS
ls /Users/xxx/Library/Jupyter/kernels/
tree /Users/xxx/Library/Jupyter/kernels/playwright_new
/Users/xxx/Library/Jupyter/kernels/playwright_new
├── kernel.json
├── logo-32x32.png
└── logo-64x64.png
# or on Linux
tree /root/.local/share/jupyter/kernels
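To confirm that the new kernel really points at the virtual environment, the `kernel.json` file listed above can be read directly: its "argv" entry holds the command Jupyter uses to start the kernel. A short sketch, with the hypothetical helper name `kernel_python`:

```python
import json
from pathlib import Path


def kernel_python(kernel_dir: Path) -> str:
    """Return the interpreter path recorded in a kernel spec directory."""
    spec = json.loads((kernel_dir / "kernel.json").read_text(encoding="utf-8"))
    # "argv" is the command Jupyter uses to start the kernel; its first
    # element is the Python executable.
    return spec["argv"][0]
```

For example, on macOS `kernel_python(Path.home() / "Library/Jupyter/kernels/playwright_new")` should return a path inside `~/playwright_new/bin`, and `sys.executable` inside a notebook using that kernel should report the same interpreter.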
Then select the playwright_new kernel in the notebook and run the Python code again.
from playwright.async_api import async_playwright
playwright = await async_playwright().start()
browser = await playwright.chromium.launch(headless = False)
page = await browser.new_page()
await page.goto("http://whatsmyuseragent.org/")
Troubleshooting
If running the Python code throws an error like this:
Error: Executable doesn't exist at /Users/xxxx/Library/Caches/ms-playwright/chromium-1000/chrome-mac/Chromium.app/Contents/MacOS/Chromium
╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ playwright install ║
║ ║
║ <3 Playwright Team ║
╚═════════════════════════════════════════════════════════════════════════╝
In the terminal:
# playwright is already installed, so download its browsers
playwright install
cd /Users/xxxx/Library/Caches/ms-playwright
ls
chromium-978106/ ffmpeg-1007/ firefox-1319/ webkit-1616/
# but the folder ms-playwright/chromium-1000 does not exist
# copy the existing chromium folder under the new name `chromium-1000`
cp -r chromium-978106 chromium-1000
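Before copying folders, it may help to check from inside the notebook which browser revisions are actually in the cache. The sketch below only lists the sub-folders of a directory; the helper name `installed_browsers` is hypothetical, and the cache location has to be supplied by the caller.

```python
from pathlib import Path


def installed_browsers(cache_dir: Path) -> list[str]:
    """Return the sorted names of the browser folders inside `cache_dir`."""
    if not cache_dir.is_dir():
        return []
    return sorted(entry.name for entry in cache_dir.iterdir() if entry.is_dir())
```

On macOS, `installed_browsers(Path.home() / "Library/Caches/ms-playwright")` would list folders such as the ones shown by `ls` above. A revision mismatch like this one often suggests that the `playwright` command on the PATH belongs to a different installation than the package the kernel imports. If that is the case, running `python -m playwright install` with the kernel's own interpreter may be a cleaner fix than copying the folder, though I have not verified that for this setup.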