How to scrape Google?

Question

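I want to scrape Google. I have already scraped craigslist successfully with this approach, but for some reason it does not work on Google (and yes, of course I changed the class names and so on). This is what I want to scrape:

I want to scrape the website descriptions: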

from selenium import webdriver

path = r"C:\Users\Skid\Desktop\chromedriver.exe"

driver = webdriver.Chrome(path)

driver.get("https://www.google.com/#q=python+webscape+google")

posts = driver.find_elements_by_class_name("r")
for post in posts:
    print(post.text)

Answer 1

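Solved: add a pause before scraping (import time, then call time.sleep(2)), which gives the results time to render before the elements are looked up.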
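A minimal sketch of where that pause goes, assuming a Selenium 4 style driver whose find_elements accepts a locator strategy and a value. The function name scrape_texts_after_delay and its parameters are hypothetical, and the class name "r" from the question may no longer match Google's markup.

```python
import time


def scrape_texts_after_delay(driver, url, class_name, delay_seconds=2.0):
    """Load a page, pause so it can finish rendering, then collect element text.

    `driver` is assumed to be a Selenium WebDriver (or any object with the
    same `get` and `find_elements` methods).
    """
    driver.get(url)

    # Fixed pause, as suggested in the answer above.
    time.sleep(delay_seconds)

    # "class name" is the string value of Selenium's By.CLASS_NAME strategy.
    elements = driver.find_elements("class name", class_name)
    return [element.text for element in elements]


# Usage with a real browser (requires selenium and a matching chromedriver):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   for text in scrape_texts_after_delay(driver, "https://www.google.com/#q=python", "r"):
#       print(text)
#   driver.quit()
```

A fixed sleep is the simplest option but wastes time when the page loads quickly and can still fail when it loads slowly. Selenium's WebDriverWait is the usual alternative when the element to wait for is known.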

Answer 2

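You can use the BeautifulSoup web scraping library to scrape the website descriptions from Google search results.

It is worth learning what CSS selectors are and what the drawbacks of relying on them are.

The code can also be checked in an online IDE.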

import json

import requests
from bs4 import BeautifulSoup  # the "lxml" parser used below also requires the lxml package

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
# These URL parameters are taken from an actual Google search URL
# and written out in a more readable form.
params = {
  "q": "python web scrape google",            # query
  "gl": "us",                                 # country to search from
  "hl": "en",                                 # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
soup = BeautifulSoup(html.text, "lxml")

website_description_data = []

for result in soup.select(".tF2Cxc"):
  link = result.select_one(".yuRUbf a")
  snippet = result.select_one(".lEBKkf")

  # skip results that lack a link or a description block
  if link is None or snippet is None:
    continue

  website_description_data.append({
    "website_name": link["href"],   # the result's URL
    "description": snippet.text
  })

# print once, after every result has been collected
print(json.dumps(website_description_data, indent=2))

Example output

[
  {
    "website_name": "https://practicaldatascience.co.uk/data-science/how-to-scrape-google-search-results-using-python",
    "description": "Mar 13, 2021 \u2014 First, we're using urllib.parse.quote_plus() to URL encode our search query. This will add + characters where spaces sit and ensure that the\u00a0..."
  },
  {
    "website_name": "https://stackoverflow.com/questions/38619478/google-search-web-scraping-with-python",
    "description": "You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query> this will return the top\u00a0..."
  }
  # ...
]
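The selector logic can be tried offline, without sending a request to Google. The sketch below runs the same select and select_one calls against a small hand-written HTML string. The markup and the example.com URLs are made up for illustration, extract_results is a hypothetical helper name, and the class names are the ones used in the answer, which Google may change at any time. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure the selectors expect.
# The second result deliberately has no description block.
SAMPLE_HTML = """
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/a">Result A</a></div>
  <div class="lEBKkf">Description A</div>
</div>
<div class="tF2Cxc">
  <div class="yuRUbf"><a href="https://example.com/b">Result B</a></div>
</div>
"""


def extract_results(html):
    """Return a list of {"website_name", "description"} dicts from result HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for result in soup.select(".tF2Cxc"):
        link = result.select_one(".yuRUbf a")
        snippet = result.select_one(".lEBKkf")
        # select_one returns None when nothing matches, so skip partial results
        # instead of raising on a missing element.
        if link is None or snippet is None:
            continue
        results.append({
            "website_name": link["href"],
            "description": snippet.get_text(strip=True),
        })
    return results


extracted = extract_results(SAMPLE_HTML)
print(extracted)
```

If the real page ever yields an empty list, the first thing to check is whether the class names still exist in the returned HTML, since that is the main drawback of selecting by generated class names.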
