How do I iterate over table rows in Python?

Question

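How do I loop over the rows of an HTML table in Python? For context, I am working with this site: https://schools.texastribune.org/districts/. What I want to do is follow each link in the table body (?) and extract the total number of students.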

What I have so far:

import pandas as pd
import requests
from bs4 import BeautifulSoup

response = requests.get("https://schools.texastribune.org/districts/")
soup = BeautifulSoup(response.text)

data = []
for a in soup.find_all('a', {'class': 'table table-striped'}):
    response = requests.get(a.get('href'))
    asoup = BeautifulSoup(response.text)
    data.append({
        'url': a.get('href'),
        'title': a.h2.get_text(strip=True),
        'content': asoup.article.get_text(strip=True)
    })

pd.DataFrame(data)

This is my first time web scraping anything.

Answer 1

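Don't pass class_="td" when looking for <td> elements; they have no class at all.

There are no <ul> elements in the table, so view = match.find('ul',class_="tr") finds nothing. You need to find the <a> element in each cell, read its href, and then load that page to get the total number of students.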

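import requests
from bs4 import BeautifulSoup

base = "https://schools.texastribune.org"
soup = BeautifulSoup(requests.get(base + "/districts/").text, "lxml")

d = {}
for match in soup.find_all("td"):
    link = match.find("a")
    if link and link.get("href"):
        # Read the attribute with .get("href"); link.href looks up a child tag named href
        school_page = requests.get(base + link.get("href"))
        # Pass the response body (.text), not the Response object
        school_soup = BeautifulSoup(school_page.text, "lxml")
        # Class names and label are as posted; they depend on the page's markup
        total_div = school_soup.find("div", class_="metric", string="Total students")
        if total_div:
            amount = total_div.find("p", class_="metric-value")
            if amount:
                d[link.text] = amount.text

print(d)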
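
To try the row loop without hitting the site, here is a minimal offline sketch. It assumes the table table-striped class from the question belongs to the <table> itself rather than to the links, and that the hrefs are site-relative. SAMPLE_HTML, the district names in it, and the district_links helper are all made up for illustration; the real page's markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the districts table; the real page may differ.
SAMPLE_HTML = """
<table class="table table-striped">
  <thead><tr><th>District</th><th>County</th></tr></thead>
  <tbody>
    <tr><td><a href="/districts/example-isd/">Example ISD</a></td><td>Example County</td></tr>
    <tr><td><a href="/districts/sample-isd/">Sample ISD</a></td><td>Sample County</td></tr>
    <tr><td>No link here</td><td>Nowhere County</td></tr>
  </tbody>
</table>
"""

BASE_URL = "https://schools.texastribune.org/districts/"


def district_links(html, base_url):
    """Return {link text: absolute URL} for the first link in each body row."""
    soup = BeautifulSoup(html, "html.parser")
    links = {}
    # The selector visits only rows inside the table body, so the header row is skipped.
    for row in soup.select("table.table-striped tbody tr"):
        link = row.find("a", href=True)  # first anchor in the row that has an href
        if link is None:
            continue  # row without a usable link
        # urljoin resolves a site-relative href against the page URL.
        links[link.get_text(strip=True)] = urljoin(base_url, link["href"])
    return links


if __name__ == "__main__":
    for name, url in district_links(SAMPLE_HTML, BASE_URL).items():
        print(name, url)
```

Each URL returned this way could then be fetched with requests.get and parsed for the student total, as in the loop above. Using urljoin instead of string concatenation is a design choice here: it copes with both relative and absolute hrefs.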
