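I'm using BeautifulSoup and have been able to parse other parts of the document, but I can't get it to pick up this text. What am I doing wrong? This is driving me crazy. Help!
I'm trying to extract the text "2 of 22 open" from the page's HTML, where it is displayed as "2 of 22 open".
My code: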
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import ssl
import sqlite3
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
ctype = list()
url = 'https://www.pcc.edu/schedule/spring/wld/wld101/?crn=20273&type=in-person'
html = urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
# Return course title
h3 = soup.find_all('h3')
coursetitle = str(h3[0])
lenctitle = len(coursetitle)
ctitle = coursetitle[4:(lenctitle-5)]
print('coursetitle:',ctitle)
tbodys = soup.find_all('tbody')
for tbody in tbodys:
    rows = tbody.find_all('tr')
    for row in rows:
        CRNs = row.find('th')
        for CRN in CRNs:
            # Return coursetype.
            tds = row.select('td')
            coursetype = str(tds[0])
            lencoursetype = len(coursetype)
            ctype = coursetype[4:(lencoursetype-5)]
            print('coursetype:',ctype)
            # Return location.
            location = tds[1]
            locationtxt = location.get_text()
            print('location:',locationtxt)
            # Return course days/times.
            coursedays = tds[2]
            coursedaystxt = coursedays.get_text()
            splitcdays = str(coursedaystxt)
            print('daytime:',splitcdays)
            # Return class size (this is the cell that comes back without the expected text).
            classsize = tds[4]
            print('classsize:',classsize)
            csize = str(classsize)
            print('csize:',csize)
            classtxt = classsize.get_text()
            print('classtxt:',classtxt)
            print('\n\n\n')
What I get when I run it:
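That data is not part of the HTML the server returns. The page loads it afterwards through a separate POST request, which you can see in your browser's Dev Tools under the Network tab. BeautifulSoup only parses the HTML you downloaded, so the text is never there for it to find.
Here is one way of getting it: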
import requests
import json
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
payload = {
    'term': 202402,
    'crn': 20273
}
r = requests.post('https://www.pcc.edu/schedule/capacity/', data=payload, headers=headers)
desired_result = r.json()['20273']['seat']
print(desired_result)
Result in the terminal:
[2, 22]
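If you want the same wording the page shows, the two numbers can be formatted back into a string. This is a minimal sketch: `format_seats` is a hypothetical helper name, and it assumes the first number is the open seats and the second the total, which is inferred from the "2 of 22 open" text in the question rather than from any documentation of the endpoint.

```python
def format_seats(seat):
    """Turn a two-item list such as [2, 22] into text like '2 of 22 open'.

    Assumption: the list is ordered [open seats, total seats].
    """
    open_seats, total_seats = seat
    return f"{open_seats} of {total_seats} open"


print(format_seats([2, 22]))
```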
See the Requests documentation for more details.
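Since the question's code already uses urllib, here is a rough standard-library variant of the same POST, in case you would rather not add Requests. Treat it as a sketch under assumptions: `parse_seats` and `fetch_seats` are hypothetical helper names, the shortened User-Agent value is a guess at what the server will accept, and the response shape (`{"<crn>": {"seat": [...]}}`) is taken only from the snippet above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the answer above.
CAPACITY_URL = 'https://www.pcc.edu/schedule/capacity/'


def parse_seats(data, crn):
    """Return the 'seat' entry for a CRN from the decoded JSON, or None if it is missing."""
    entry = data.get(str(crn))
    if not isinstance(entry, dict):
        return None
    return entry.get('seat')


def fetch_seats(term, crn, timeout=10):
    """POST the term and CRN as form data and return the seat data for that CRN."""
    body = urlencode({'term': term, 'crn': crn}).encode('ascii')
    # Passing data= makes urllib send a POST request.
    # The User-Agent value here is an assumption; the answer uses a full browser string.
    request = Request(CAPACITY_URL, data=body, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(request, timeout=timeout) as response:
        return parse_seats(json.load(response), crn)
```

You would call it as `fetch_seats(202402, 20273)`. Keeping the JSON lookup in its own function means a missing CRN gives you `None` instead of a `KeyError`.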