Unable to launch Firefox from Selenium in Python on an AWS machine

Question

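I am trying to use Selenium from Python to scrape some dynamic pages that rely on JavaScript. However, after following the Selenium instructions on the PyPI page (http://pypi.python.org/pypi/selenium), I cannot launch Firefox. Firefox is installed on an AWS Ubuntu 12.04 machine. The error message I get is: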

In [1]: from selenium import webdriver

In [2]: br = webdriver.Firefox()
---------------------------------------------------------------------------
WebDriverException                        Traceback (most recent call last)
/home/ubuntu/<ipython-input-2-d6a5d754ea44> in <module>()
----> 1 br = webdriver.Firefox()

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/webdriver.pyc in __init__(self, firefox_profile, firefox_binary, timeout)
     49         RemoteWebDriver.__init__(self,
     50             command_executor=ExtensionConnection("127.0.0.1", self.profile,
---> 51             self.binary, timeout),
     52             desired_capabilities=DesiredCapabilities.FIREFOX)
     53

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/extension_connection.pyc in __init__(self, host, firefox_profile, firefox_binary, timeout)
     45         self.profile.add_extension()
     46
---> 47         self.binary.launch_browser(self.profile)
     48         _URL = "http://%s:%d/hub" % (HOST, PORT)
     49         RemoteConnection.__init__(

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in launch_browser(self, profile)
     42
     43         self._start_from_profile_path(self.profile.path)
---> 44         self._wait_until_connectable()
     45
     46     def kill(self):

/usr/local/lib/python2.7/dist-packages/selenium/webdriver/firefox/firefox_binary.pyc in _wait_until_connectable(self)
     79                 raise WebDriverException("The browser appears to have exited "
     80                       "before we could connect. The output was: %s" %
---> 81                       self._get_firefox_output())
     82             if count == 30:
     83                 self.kill()

WebDriverException: Message: 'The browser appears to have exited before we could connect. The output was: Error: no display specified\n'

I searched the web and found that other people have hit this problem too (https://groups.google.com/forum/?fromgroups=#!topic/selenium-users/21sJrOJULZY). But I do not understand the solution, if it is one.

Can anyone help me? Thanks!

Answer 1

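The problem is that Firefox needs a display, and a typical AWS server has none, hence the "no display specified" error. In my example I use pyvirtualdisplay to simulate a display. The solution is: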

from pyvirtualdisplay import Display
from selenium import webdriver

display = Display(visible=False, size=(1024, 768))
display.start()

driver = webdriver.Firefox()
driver.get("http://www.somewebsite.com/")

# ... your scraping code goes here ...

#driver.close() # Close the current window.
driver.quit() # Quit the driver and close every associated window.
display.stop()

Note that pyvirtualdisplay requires one of the following backends to be installed: Xvfb, Xephyr, or Xvnc.

This should solve your problem.
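Before launching the browser, it can help to confirm the two conditions described above: whether a display is configured, and whether one of the backends is installed. The following is only a sketch. The helper names `diagnose_display` and `fetch_title` are made up for illustration, and `fetch_title` assumes that selenium, pyvirtualdisplay, Firefox, and a backend such as Xvfb are all installed.

```python
import os
import shutil

# Backends that pyvirtualdisplay can drive, as listed in the answer above.
BACKENDS = ("Xvfb", "Xephyr", "Xvnc")


def diagnose_display(env=None):
    """Report whether DISPLAY is set and which backends are on PATH."""
    env = os.environ if env is None else env
    return {
        "display_set": bool(env.get("DISPLAY")),
        "backends": [name for name in BACKENDS if shutil.which(name)],
    }


def fetch_title(url):
    """Open url in Firefox on a virtual display and return the page title."""
    # Imported here so diagnose_display() works without these packages.
    from pyvirtualdisplay import Display
    from selenium import webdriver

    display = Display(visible=False, size=(1024, 768))
    display.start()
    try:
        driver = webdriver.Firefox()
        try:
            driver.get(url)
            return driver.title
        finally:
            driver.quit()  # Always close the browser, even on errors.
    finally:
        display.stop()  # Always tear down the virtual display.


print(diagnose_display())
```

If `display_set` is false and `backends` is empty, the "no display specified" error is expected; installing one backend and wrapping the browser session as in `fetch_title` is one way to address it. The nested `try`/`finally` blocks ensure the browser and the virtual display are cleaned up in that order.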

Answer 2

I ran into the same problem. I was using Firefox 47 with Selenium 2.53, so I downgraded Firefox to 45, and that worked.

1) First, remove Firefox 47:

sudo apt-get purge firefox

2) Check the available versions:

apt-cache show firefox | grep Version

This lists the available Firefox versions, for example:

Version: 47.0+build3-0ubuntu0.16.04.1

Version: 45.0.2+build1-0ubuntu1

3) Install the specific version you want:

sudo apt-get install firefox=45.0.2+build1-0ubuntu1

4) Next, hold the package so that it is not upgraded to a newer version:

sudo apt-mark hold firefox

5) If you want to upgrade later:

sudo apt-mark unhold firefox

sudo apt-get upgrade

Hope this helps.
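Steps 2 and 3 above can be scripted: parse the `Version:` lines and pick the newest version at or below the major release you want to pin. This is a minimal sketch; the function name `pick_pinned_version` is hypothetical, and the sample text is simply the output quoted in step 2.

```python
import re

# Sample output of: apt-cache show firefox | grep Version
SAMPLE_OUTPUT = """\
Version: 47.0+build3-0ubuntu0.16.04.1
Version: 45.0.2+build1-0ubuntu1
"""


def pick_pinned_version(apt_output, max_major):
    """Return the newest listed version with major <= max_major, or None."""
    candidates = []
    for line in apt_output.splitlines():
        match = re.match(r"\s*Version:\s*((\d+)\.(\d+)(?:\.(\d+))?\S*)", line)
        if not match:
            continue
        full, major, minor, patch = match.groups()
        key = (int(major), int(minor), int(patch or 0))
        if key[0] <= max_major:
            candidates.append((key, full))
    if not candidates:
        return None
    # Tuples compare by numeric key first, so max() picks the newest one.
    return max(candidates)[1]


version = pick_pinned_version(SAMPLE_OUTPUT, max_major=45)
print("sudo apt-get install firefox=%s" % version)
```

The script only prints the install command rather than running it, so the chosen version can be reviewed before anything is changed on the system.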

Answer 3

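This is already in the comments on the OP's question, but I am listing it as an answer. You can have Selenium run in the background without opening an actual browser window.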

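For example, if you use Chrome, set the following option:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# add_argument("--headless") replaces set_headless(), which was deprecated
# and is absent from newer Selenium releases.
chrome_options = Options()
chrome_options.add_argument("--headless")

Then pass your settings as an argument when you create the web driver:

browser = webdriver.Chrome(options=chrome_options)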
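Since the original question is about Firefox, the same idea can be sketched for Firefox as well. This assumes a Selenium release in which `webdriver.Firefox` accepts `options=`, a Firefox build recent enough to support headless mode, and a matching geckodriver; the helper names are made up for illustration.

```python
def headless_firefox_args():
    """Command-line arguments that ask Firefox to run without a window."""
    return ["-headless"]


def make_headless_firefox():
    """Start Firefox headlessly; needs selenium, Firefox, and geckodriver."""
    # Imported here so the module loads even if selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for arg in headless_firefox_args():
        options.add_argument(arg)
    return webdriver.Firefox(options=options)
```

A caller would use `browser = make_headless_firefox()` and call `browser.quit()` when done. In headless mode no X display is needed, so this may be an alternative to a virtual display on a server.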

Answer 4

For Debian 10 and Ubuntu 18.04, here is a complete working example:

Download the Chrome driver into ~/Downloads:

$ wget https://chromedriver.storage.googleapis.com/80.0.3987.16/chromedriver_linux64.zip

Unzip it with:
```
unzip chromedriver_linux64.zip
```
Move the file to an executable directory (one that is already on your PATH):
```
$ sudo mv chromedriver /usr/local/bin
```

Then run this code in a Jupyter notebook or in a script:

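from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

# Run Chrome without opening a window. add_argument("--headless") replaces
# set_headless(), which was deprecated and is absent from newer Selenium releases.
chrome_options = Options()
chrome_options.add_argument("--headless")

browser = Chrome(options=chrome_options)
browser.get('http://www.linkedin.com/')
print(browser.page_source)
browser.quit()  # Shut down the browser and the driver process.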

This prints the entire HTML source of the page.
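The steps above can be wrapped in a small helper that first checks whether the driver was really moved onto the PATH, and that always shuts the browser down. This is a sketch under assumptions: `find_driver` and `fetch_page_source` are hypothetical names, and a Selenium release in which `Chrome` accepts `options=` is assumed.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver binary if it is on PATH, else None."""
    return shutil.which(name)


def fetch_page_source(url):
    """Load url in headless Chrome and return its HTML source."""
    if find_driver() is None:
        raise RuntimeError("chromedriver not found on PATH")

    # Imported here so find_driver() works without selenium installed.
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    browser = Chrome(options=options)
    try:
        browser.get(url)
        return browser.page_source
    finally:
        browser.quit()  # Runs even if loading the page fails.
```

Calling `fetch_page_source('http://www.linkedin.com/')` would then return the same HTML that the script above prints. The early PATH check turns a missing driver into a clear error message instead of a failure inside Selenium.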
