from bs4 import BeautifulSoup as soup
import pandas as pd
import requests
url = 'http://ips.alliance-pipeline.com/Ips/MainPage.aspx?siteCd=ALLUSA-IPS&contentSysCd=USA-OP-AVAIL-BY-DAY&tvPath=55/112/56'
response = requests.get(url)
html = soup(response.content, 'html.parser')
loc = html.find_all('td', class_='ig162a1706')
tsq = html.find_all('td', class_='ig162a170e')
for r in html.find_all('td', class_='ig162a1706'):
    loc = r.text
    print(loc)
for a in html.find_all('td', class_='ig162a170e'):
    tsq = a.text
    print(tsq)
Output:
ALLIANCE/ANR
ALLIANCE/ROSHOLT
AUX SABLE
BANTRY
BORDER USA
GUARDIAN
HANKINSON
HORIZON
LYLE
MIDWESTERN GAS TRANSMISSION
MILNOR
NATURAL GAS PIPELINE COMPANY OF AMERICA
NICOR/MORRIS
PEOPLES/ELWOOD
TIOGA
VECTOR PIPELINE
729,192
2,600
245,000
141,021
1,402,129
2,158
9,030
0
8,000
0
350
114,618
236,385
34,426
111,235
152,612
Problem:
print(loc)
Output:
'VECTOR PIPELINE'
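Hi everyone. Basically, whenever I run print(loc) outside the for loop, it prints only 'VECTOR PIPELINE', and I'm confused about why that happens. Shouldn't it print everything from the loop, not just 'VECTOR PIPELINE', even when I run the command outside the loop? I'm not sure what I'm doing wrong. The output is a string.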
You can use the built-in zip() function to tie each name to its TSQ value.
For example:
import requests
from bs4 import BeautifulSoup
url = 'http://ips.alliance-pipeline.com/Ips/MainPage.aspx?siteCd=ALLUSA-IPS&contentSysCd=USA-OP-AVAIL-BY-DAY&tvPath=55/112/56'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
col1 = soup.select('td.ig162a1706')
col2 = soup.select('td.ig162a170e')
all_data = {}
for name, tsq in zip(col1, col2):
    all_data[name.text] = tsq.text
# pretty print to screen:
from pprint import pprint
pprint(all_data)
Prints:
{'ALLIANCE/ANR': '719,055',
 'ALLIANCE/ROSHOLT': '2,600',
 'AUX SABLE': '245,000',
 'BANTRY': '141,021',
 'BORDER USA': '1,402,129',
 'GUARDIAN': '0',
 'HANKINSON': '9,030',
 'HORIZON': '0',
 'LYLE': '8,000',
 'MIDWESTERN GAS TRANSMISSION': '0',
 'MILNOR': '350',
 'NATURAL GAS PIPELINE COMPANY OF AMERICA': '114,618',
 'NICOR/MORRIS': '236,949',
 'PEOPLES/ELWOOD': '34,426',
 'TIOGA': '111,235',
 'VECTOR PIPELINE': '173,570'}
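Note that the values in all_data are strings with thousands separators, so they can't be summed or compared numerically as they are. Here is a minimal sketch of one way to convert them, assuming every value is a plain comma-grouped integer like those in the output above. The names sample, to_int and numeric are made up for illustration:

```python
# A few entries copied from the printed dictionary; in practice you would
# pass all_data itself. `sample`, `to_int` and `numeric` are illustrative names.
sample = {'TIOGA': '111,235', 'HORIZON': '0', 'BORDER USA': '1,402,129'}

def to_int(text):
    """Drop surrounding whitespace and thousands separators, then parse."""
    return int(text.strip().replace(',', ''))

# Build a new dictionary with integer values, leaving the original untouched.
numeric = {name: to_int(tsq) for name, tsq in sample.items()}
print(numeric['BORDER USA'])
```

This builds a separate dictionary, so the original strings in all_data remain available for display.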
You can then print the information for a single location, for example:
# print information only for 'TIOGA'
print(all_data['TIOGA'])
Prints:
111,235
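As for why print(loc) outside the loop shows only 'VECTOR PIPELINE': each pass of the for loop reassigns loc to the current cell's text, so once the loop ends the variable holds only the last value. To keep every value, append to a list instead. Here is a small sketch with a hard-coded list standing in for the scraped cells (names and locs are illustrative, not from the original code):

```python
# Hypothetical stand-in for the text of the scraped <td> cells.
names = ['ALLIANCE/ANR', 'TIOGA', 'VECTOR PIPELINE']

# Reassigning inside the loop: each pass overwrites the previous value,
# so only the final item survives after the loop.
for name in names:
    loc = name
print(loc)

# Appending inside the loop: every value is kept.
locs = []
for name in names:
    locs.append(name)
print(locs)
```

With the real page, the same idea would be to append r.text to a list inside the loop rather than assigning it to loc.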