Crawling a specific href with Beautiful Soup in Python

Question

I am trying to learn BeautifulSoup.

On the website, the same kind of a href link returns different kinds of values.

For example, the output of my code is:

0001545654

6798

HI

0001459640

TX

0001269765

CA

0001456527

CA

0001001379

GA

I only want to get the numbers.

Link for the number: <a href="cgi-bin/browse-edgar?action=getcompany&CIK=0001545654&owner=exclude&count=40&hidefilings=0">0001545654</a>

Link for the state: <a href="cgi-bin/browse-edgar?action=getcompany&State=HI&owner=exclude&count=40&hidefilings=0">HI</a>

I want to get only the CIK!

Is there any way to get only the CIK (0001545654)?

from urllib.request import urlopen

from bs4 import BeautifulSoup

url = 'https://www.sec.gov/cgi-bin/browse-edgar?company=a&owner=exclude&action=getcompany'
page = BeautifulSoup(urlopen(url), 'html.parser')

# Collect every link in the results table (CIK links and state links alike)
CIK = page.find('table', 'tableFile2').find_all('a')

#print(CIK)
# Print the visible text of each link
for i in CIK:
    print(i.get_text())

Answer 1

The simplest solution is probably to filter your results so that only the links whose text is a valid integer remain.

CIK = [i for i in CIK if str(i.get_text()).isnumeric()]
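One caveat with a text-only filter: the sample output in the question also contains `6798`, which is numeric but does not look like the zero-padded ten-digit values. The sketch below works on plain strings and assumes, based only on that sample output, that a CIK is always displayed as exactly ten digits. The names `link_texts` and `looks_like_cik` are hypothetical.

```python
# Link texts copied from the sample output in the question.
link_texts = ["0001545654", "6798", "HI", "0001459640", "TX"]

# The plain numeric test keeps "6798" as well.
numeric_only = [t for t in link_texts if t.isnumeric()]


def looks_like_cik(text):
    # Assumption: a CIK is shown as exactly ten digits, as in the sample output.
    return len(text) == 10 and text.isdigit()


cik_only = [t for t in link_texts if looks_like_cik(t)]

print(numeric_only)  # ['0001545654', '6798', '0001459640']
print(cik_only)      # ['0001545654', '0001459640']
```

If the page ever shows shorter CIKs, the length check would need to be relaxed.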

Alternatively, you can refine your BeautifulSoup parsing so that it takes only the first link of each row.

# Skip rows that contain no link (for example a header row) to avoid an IndexError
CIK = [e.find('a') for e in page.find('table', 'tableFile2').find_all('tr') if e.find('a')]
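Since the question tells the two kinds of links apart by their href, another option is to select on the href itself rather than on the link text or position. This is a minimal sketch, assuming the links look like the two examples in the question. The inline `html` sample and the helper name `cik_from_href` are made up for illustration; in the real script, `page` would come from the question's code.

```python
from urllib.parse import parse_qs, urlsplit

from bs4 import BeautifulSoup

# Hypothetical sample that mimics the two kinds of links shown in the question.
html = """
<table class="tableFile2">
  <tr><th>CIK</th><th>State</th></tr>
  <tr>
    <td><a href="cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001545654&amp;owner=exclude">0001545654</a></td>
    <td><a href="cgi-bin/browse-edgar?action=getcompany&amp;State=HI&amp;owner=exclude">HI</a></td>
  </tr>
  <tr>
    <td><a href="cgi-bin/browse-edgar?action=getcompany&amp;CIK=0001459640&amp;owner=exclude">0001459640</a></td>
    <td><a href="cgi-bin/browse-edgar?action=getcompany&amp;State=TX&amp;owner=exclude">TX</a></td>
  </tr>
</table>
"""


def cik_from_href(href):
    """Return the value of the CIK query parameter, or None if there is none."""
    params = parse_qs(urlsplit(href).query)
    values = params.get("CIK")
    return values[0] if values else None


page = BeautifulSoup(html, "html.parser")
table = page.find("table", "tableFile2")

ciks = []
# href=True limits the search to <a> tags that actually have an href attribute.
for link in table.find_all("a", href=True):
    cik = cik_from_href(link["href"])
    if cik is not None:
        ciks.append(cik)

print(ciks)  # ['0001545654', '0001459640']
```

Reading the value from the query string does not depend on the column order of the table or on what the link text looks like.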
