I can't parse a page and get the links with Nokogiri

Question

I can't get the list of links from https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/ by parsing it with Nokogiri.

What am I doing wrong?

links = Nokogiri::HTML('https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/')

or

links = Nokogiri::XML('https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/')

--->

#(Document:0x3fcdda1b988c {
  name = "document",
  children = [
    #(DTD:0x3fcdda1b5b24 { name = "html" }),
    #(Element:0x3fcdda1b46fc {
      name = "html",
      children = [
        #(Element:0x3fcdda1b0804 {
          name = "body",
          children = [
            #(Element:0x3fcdda1ac920 {
              name = "p",
              children = [ #(Text "https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/")]
              })]
          })]
      })]
  })

puts links.to_html

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/</p></body></html>
=> nil

Answer 1

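This can't work, because the entire page is built with JavaScript. The body of the document contains only a script tag. Open the page source or look at the raw response, rather than only the rendered DOM in the web inspector / developer tools.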

view-source:https://chromedriver.storage.googleapis.com/index.html?path=79.0.3945.36/

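Nokogiri is only an HTML parser, not a browser, so it cannot run JavaScript. You could use a headless browser such as PhantomJS, but you probably just want to look for an API that provides the data you need. Web scraping is usually the wrong answer to any problem.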

Answer 2

I found a more interesting solution, for example:

require 'open-uri'

# page.source is the rendered HTML, e.g. from a Capybara session
link = Nokogiri::HTML(page.source).at('a:contains("mac")')['href']
chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
File.open('filename.zip', 'wb') { |f| f << URI.parse(chromedriver_storage + link).read }

contains("mac") can be changed to contains("linux") or contains("win"); it doesn't matter, choose whichever operating system you need.

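The second solution is to parse the page chromedriver.chromium.org and get information about all the versions. If the version on the site is newer than mine, I substitute the new version number and download it:

require 'open-uri'

chromedriver_storage = 'https://chromedriver.storage.googleapis.com/'
chromedriver = '79.0.3945.36/' # obtained with Capybara, trimmed to just the version
zip = 'chromedriver_mac64.zip'
link = chromedriver_storage + chromedriver + zip
File.open('filename.zip', 'wb') { |f| f << URI.parse(link).read }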
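
The "is the site's version newer than mine" check can be sketched with Ruby's built-in Gem::Version. The helper names, the platform parameter, and the published version number are illustrative assumptions; the download call is left commented out because it needs network access.

```ruby
require 'rubygems' # usually loaded already; provides Gem::Version

STORAGE = 'https://chromedriver.storage.googleapis.com/'

# Hypothetical helper: true when the published version is newer than the installed one.
def newer?(published, installed)
  Gem::Version.new(published) > Gem::Version.new(installed)
end

# Hypothetical helper: build the download link from its three parts.
def driver_url(version, platform = 'mac64')
  "#{STORAGE}#{version}/chromedriver_#{platform}.zip"
end

installed = '79.0.3945.36'
published = '80.0.3987.16' # illustrative value; in practice read from the versions page

if newer?(published, installed)
  puts "update available: #{driver_url(published)}"
  # require 'open-uri'
  # File.binwrite('filename.zip', URI.parse(driver_url(published)).read)
end
```

Comparing with Gem::Version rather than plain strings matters because segments are compared numerically, so a segment of 100 sorts after 36.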

It turns out that a parser running in headless mode can be put into a crontab job to keep up with the current browser version.
