Scraping a table from a website with Python

Question

I have tried several methods that work for other websites but not for this URL.

https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25 (the date in the URL, e.g. 2022-07-25, should be a date in the future)
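Because the page expects an upcoming date, a script that runs on different days has to rebuild the URL each time. A minimal sketch, assuming the `/date/YYYY-MM-DD` suffix seen in the example URL is the only part that changes; `BASE_URL`, `hourly_url` and `upcoming_urls` are hypothetical names, not part of any site API:

```python
from datetime import date, timedelta

# Assumption: copied from the question's example URL; only the date suffix varies.
BASE_URL = "https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/"


def hourly_url(day: date) -> str:
    """Build the hourly-forecast URL for one calendar day (ISO format YYYY-MM-DD)."""
    return BASE_URL + day.isoformat()


def upcoming_urls(days_ahead: int, today: date) -> list[str]:
    """URLs for tomorrow through `days_ahead` days after `today`."""
    return [hourly_url(today + timedelta(days=i)) for i in range(1, days_ahead + 1)]


if __name__ == "__main__":
    # Print the URLs for the next three days relative to the current date
    for url in upcoming_urls(3, date.today()):
        print(url)
```

Passing `today` explicitly, rather than calling `date.today()` inside the helper, keeps the function deterministic and easy to check.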

I tried:

import requests
import lxml.html as lh
import pandas as pd
url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25'
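# Fetch the page's HTML as sent by the server
page = requests.get(url)
doc = lh.fromstring(page.content)
# Select every table row (<tr>) in the parsed document
tr_elements = doc.xpath('//tr')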

But tr_elements is empty. The same code works for url = 'https://www.wunderground.com/dashboard/pws/ISANSA11/table/2021-11-30/2021-11-30/daily' and url = 'http://pokemondb.net/pokedex/all', but not for url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25'.
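An empty tr_elements list usually means the HTML that requests received contains no `<tr>` tags at all, rather than that the XPath is wrong. One way to confirm this is to count the tags in the raw response. A minimal sketch using only the standard library's `html.parser`; `TagCounter` and `count_tags` are hypothetical helpers, and the two sample strings are made-up stand-ins for `page.text`:

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags named `tag` (HTMLParser reports tag names in lowercase)."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.count += 1


def count_tags(html: str, tag: str = "tr") -> int:
    """Return how many <tag> elements appear in the HTML string."""
    counter = TagCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical inputs: a server-rendered table vs. an empty JavaScript shell
static_html = "<table><tr><td>12:00</td></tr><tr><td>13:00</td></tr></table>"
js_shell_html = '<div id="app-root"></div><script src="bundle.js"></script>'
print(count_tags(static_html))    # 2
print(count_tags(js_shell_html))  # 0
```

With the question's code, `count_tags(page.text)` returning 0 would point to the rows being added later by JavaScript rather than to a problem with the XPath.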

I also tried:

import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-20'
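page = requests.get(url)
# Parse the HTML returned by the server
soup = BeautifulSoup(page.text, 'lxml')
# Look up the forecast table by the id seen in Chrome DevTools
table1 = soup.find('table', id='hourly-forecast-table')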

But the table is not found. The same approach works for url = 'https://www.worldometers.info/coronavirus/' with table1 = soup.find('table', id='main_table_countries_today').

In Chrome, I use "Ctrl + U" and "Ctrl + Shift + I" to look at the HTML. For url = 'https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25', I can see id='hourly-forecast-table' with "Ctrl + Shift + I" but not with "Ctrl + U", and it does not appear in the soup variable in my code either. For url = 'https://www.worldometers.info/coronavirus/', id='main_table_countries_today' is visible with "Ctrl + U" as well. I suspect this site does something different.
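The difference observed here has a standard explanation: "Ctrl + U" shows the HTML exactly as the server sent it, which is also what `requests.get()` receives, while "Ctrl + Shift + I" shows the live DOM after the page's JavaScript has run. An id that appears only in DevTools belongs to an element built client-side, so neither lxml nor BeautifulSoup can find it in `page.text`. A minimal sketch that checks the raw HTML for an id using the standard library; `IdFinder` and `find_id` are hypothetical helpers, and the sample strings are invented:

```python
from html.parser import HTMLParser
from typing import Optional


class IdFinder(HTMLParser):
    """Remember the tag name of the first element whose id attribute matches."""

    def __init__(self, element_id: str):
        super().__init__()
        self.element_id = element_id
        self.found_tag: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if self.found_tag is None and dict(attrs).get("id") == self.element_id:
            self.found_tag = tag


def find_id(html: str, element_id: str) -> Optional[str]:
    """Return the tag name carrying id=element_id in `html`, or None if absent."""
    finder = IdFinder(element_id)
    finder.feed(html)
    finder.close()
    return finder.found_tag


# Invented examples: what the server sends vs. what DevTools shows after JavaScript
server_html = '<html><body><div id="app-root"></div><script src="bundle.js"></script></body></html>'
rendered_html = '<div id="app-root"><table id="hourly-forecast-table"><tr><td>1</td></tr></table></div>'
print(find_id(server_html, "hourly-forecast-table"))    # None
print(find_id(rendered_html, "hourly-forecast-table"))  # table
```

If the "Ctrl + U" observation above holds, `find_id(page.text, 'hourly-forecast-table')` would be expected to return None, which means a tool that executes JavaScript is needed to see the table.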

Many thanks,

Answer 1

Have you tried using Selenium together with Beautiful Soup? Install Selenium and ChromeDriver; you can then use Selenium's send_keys function to replicate the keystrokes you used, such as "Ctrl+U".
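What makes Selenium useful here is that it drives a real browser, so the page's JavaScript runs and `driver.page_source` returns the rendered DOM, roughly what "Ctrl + Shift + I" shows, rather than the raw source that "Ctrl + U" shows (which, per the question, lacks the table). A minimal sketch under these assumptions: Selenium 4 with a recent Chrome, the table id `hourly-forecast-table` taken from the question, and cells in `<td>`/`<th>` elements (not verified against the live site). The parsing uses the standard library to stay self-contained; `BeautifulSoup(html, 'lxml').find('table', id=...)` on `page_source` would work just as well. All function names are hypothetical:

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect cell text, row by row, from the <table> whose id is `table_id`.

    Nested tables inside the target table are not handled.
    """

    def __init__(self, table_id: str):
        super().__init__()
        self.table_id = table_id
        self.in_table = False
        self.rows: list[list[str]] = []
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self._row = []
        elif self.in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self.in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. several <span>s) and collapse whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table":
            self.in_table = False

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table(html: str, table_id: str) -> list[list[str]]:
    """Return the rows of the table with the given id ([] if it is absent)."""
    parser = TableRowParser(table_id)
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rendered_html(url: str, wait_for_id: str, timeout: float = 20.0) -> str:
    """Load `url` in headless Chrome and return the DOM after JavaScript ran."""
    # Imported here so parse_table() works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until the page's scripts have inserted the element
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, wait_for_id))
        )
        return driver.page_source
    finally:
        driver.quit()


def scrape_table(url: str, table_id: str) -> list[list[str]]:
    """Render the page with Selenium, then extract the table's rows."""
    return parse_table(fetch_rendered_html(url, table_id), table_id)
```

Usage would look like `rows = scrape_table('https://www.wunderground.com/hourly/es/barcelona/IBARCE215/date/2022-07-25', 'hourly-forecast-table')`; if the first row turns out to be a header, `pd.DataFrame(rows[1:], columns=rows[0])` gives a DataFrame. Waiting on the table id, instead of sleeping a fixed time, returns as soon as the scripts have built the table.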
