How to scrape Google?

Votes: 0, Answers: 2

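So I want to scrape Google. I've successfully scraped Craigslist with this method, but for some reason I can't scrape Google (yes, of course I changed the class names and so on). What I want to scrape is the website descriptions: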


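from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to the local chromedriver executable
path = r"C:\Users\Skid\Desktop\chromedriver.exe"

# Selenium 4 takes the driver path via a Service object
driver = webdriver.Chrome(service=Service(path))

driver.get("https://www.google.com/#q=python+webscape+google")

# Find every element with the class "r" and print its text
posts = driver.find_elements(By.CLASS_NAME, "r")
for post in posts:
    print(post.text)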

Tags: python, python-3.x, google-chrome, web-scraping
2 Answers

0 votes

Solved by adding a delay before scraping: import time, then call time.sleep(2) before looking for the elements.
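A fixed time.sleep(2) works when the page happens to finish loading within two seconds, but it always waits the full time and still fails on a slower load. A sturdier pattern is to re-check for the elements until they appear, with a timeout. The sketch below is a hypothetical, stdlib-only helper (wait_until is not part of Selenium or the original answer); the fake fake_find_results function stands in for a call such as driver.find_elements(By.CLASS_NAME, "r").

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Hypothetical helper: raises TimeoutError if nothing truthy is returned
    within `timeout` seconds, re-checking every `interval` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # e.g. a non-empty list of found elements
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Fake "page" that only has results on the third check (illustration only)
calls = {"n": 0}


def fake_find_results():
    calls["n"] += 1
    return ["result A", "result B"] if calls["n"] >= 3 else []


posts = wait_until(fake_find_results, timeout=5, interval=0.01)
print(posts)  # -> ['result A', 'result B']
```

Selenium ships its own version of this idea: WebDriverWait(driver, 10).until(lambda d: d.find_elements(By.CLASS_NAME, "r")) keeps calling the lambda with the driver until it returns a non-empty list, and raises TimeoutException otherwise (WebDriverWait is imported from selenium.webdriver.support.ui).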


0 votes

You can scrape the website descriptions from Google Search results with the BeautifulSoup web scraping library.
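One difference from the question's Selenium code is worth spelling out. The question loaded https://www.google.com/#q=python+webscape+google, where the search terms sit in the URL fragment (after #). Browsers do not send the fragment to the server, so on that page the results were presumably filled in by JavaScript after the initial load, which would explain why the delay helped. This answer instead requests https://www.google.com/search with an ordinary query string. The stdlib sketch below (not part of the original answer) contrasts the two URLs; requests encodes a params dict in essentially the same way as urlencode.

```python
from urllib.parse import urlencode, urlsplit

# Query string built from a params dict, as the answer passes to requests.get()
params = {"q": "python web scrape google", "gl": "us", "hl": "en"}
search_url = "https://www.google.com/search?" + urlencode(params)
print(search_url)
# -> https://www.google.com/search?q=python+web+scrape+google&gl=us&hl=en

# The question's URL puts the search terms in the fragment, not the query
question_url = "https://www.google.com/#q=python+webscape+google"
parts = urlsplit(question_url)
print(repr(parts.query))     # -> ''
print(repr(parts.fragment))  # -> 'q=python+webscape+google'
```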

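Learn more about what CSS selectors are, and about the drawbacks of using CSS selectors. The main drawback here is that Google's obfuscated class names, such as tF2Cxc, change over time, and a selector that no longer matches simply returns nothing.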


from bs4 import BeautifulSoup
import requests
import json

# Send a desktop browser User-Agent; the default python-requests one may get a different page
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36",
}

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls

# These URL params are taken from the actual Google search URL
# and rewritten in a more readable form
params = {
    "q": "python web scrape google",  # query
    "gl": "us",                       # country to search from
    "hl": "en",                       # language
}

html = requests.get("https://www.google.com/search", headers=headers, params=params, timeout=30)
html.raise_for_status()  # stop early on an HTTP error status
soup = BeautifulSoup(html.text, "lxml")  # the "lxml" parser needs the lxml package installed

website_description_data = []

# One container per organic result; Google's class names change over time
for result in soup.select(".tF2Cxc"):
    link = result.select_one(".yuRUbf a")
    description = result.select_one(".lEBKkf")
    if link is None or description is None:
        continue  # skip results whose markup doesn't match the selectors

    website_description_data.append({
        "website_name": link["href"],
        "description": description.text,
    })

# Print once, after the loop, so the list is output a single time
print(json.dumps(website_description_data, indent=2))

Example output:

[
  {
    "website_name": "https://practicaldatascience.co.uk/data-science/how-to-scrape-google-search-results-using-python",
    "description": "Mar 13, 2021 \u2014 First, we're using urllib.parse.quote_plus() to URL encode our search query. This will add + characters where spaces sit and ensure that the\u00a0..."
  },
  {
    "website_name": "https://stackoverflow.com/questions/38619478/google-search-web-scraping-with-python",
    "description": "You can always directly scrape Google results. To do this, you can use the URL https://google.com/search?q=<Query> this will return the top\u00a0..."
  }
  # ...
]
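The \u2014 and \u00a0 sequences in the output are not part of Google's text: json.dumps escapes every non-ASCII character by default (ensure_ascii=True), so the em dash and the non-breaking space appear as escape codes. Passing ensure_ascii=False keeps them as real characters. A small sketch using a shortened description string (an illustrative stand-in, not real scraped data):

```python
import json

# Shortened stand-in for one scraped description (illustrative only)
item = {"description": "Mar 13, 2021 \u2014 First, we're using urllib.parse.quote_plus()"}

escaped = json.dumps(item)                       # default: ensure_ascii=True
readable = json.dumps(item, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # the em dash is written as the six characters \u2014
print(readable)  # the em dash is written as the actual character
```

Both strings decode back to the same data with json.loads; only the on-screen form differs.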
© www.soinside.com 2019 - 2024. All rights reserved.