Questions tagged beautifulsoup

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

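Why does my Selenium script only scrape data for 7 matches on F O R E B E T?

I am working on a web scraping project that uses Selenium to scrape football match data from a sports prediction website (let's use Examples to stand for F O R E B E T). However, my script only retrieves data for......

python selenium-webdriver web-scraping beautifulsoup

1 answer, 0 votes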

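Beautiful Soup and large HTML

I am trying to scrape some large Wikipedia pages like this one. Unfortunately, BeautifulSoup cannot handle such large content and truncates the page.

python html web-scraping beautifulsoup large-files

2 answers, 0 votes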

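How to scrape links from HTML using BeautifulSoup

I need to download several links from an HTML page. I don't need all of them, only a few from certain sections of this page. For example, at http://www.nytimes.com/roomfordebate/2014/09/24/prote...

python web-scraping beautifulsoup

2 answers, 0 votes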
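For the link-scraping question above, one common approach is to locate the section's container first and then collect only the links inside it. The sketch below is a minimal illustration under stated assumptions: the markup and the `debaters` id are hypothetical stand-ins, not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two sections, only one of which we care about.
html = """
<div id="nav"><a href="/home">Home</a></div>
<div id="debaters">
  <a href="/opinion/one">First opinion</a>
  <a href="/opinion/two">Second opinion</a>
  <a>No href here</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to one container ("debaters" is an assumed id).
section = soup.find(id="debaters")

# href=True keeps only <a> tags that actually carry an href attribute.
links = [a["href"] for a in section.find_all("a", href=True)]
print(links)  # ['/opinion/one', '/opinion/two']
```

Searching from `section` rather than from `soup` is what limits the result to that part of the page; on a real page the container would have to be identified by inspecting its markup.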

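Using the text of a nested element as a selector in BeautifulSoup

I am working with the following HTML structure:

    <p><strong>ID:</strong>547</p>
    <p><strong>Class:</strong>foobar</p>
    <p><strong>Procedures:</strong>lorem ipsum.</p>
    <p>dolor sit amet.</p>
    ...
    <p><strong>Description:</strong>curabitur at orci posuere.</p>
    <p>massa nec fringilla.</p>
    ...

I'm not very confident with BeautifulSoup, and I'm not sure how to handle the fact that the identifiers for a given section (id, class, procedures, and description) are nested inside the first paragraph that contains that section's content. I'm trying to get something along these lines:

    {
        'id': 547,
        'class': 'foobar',
        'procedures': 'lorem ipsum. dolor sit amet.',
        'description': 'curabitur at orci posuere. massa nec fringilla.'
    }

Answer:

You can use the element.next_sibling reference to get the text that follows the <strong> tag. For <p> tags that have no <strong> tag, you have to append to the last processed key. Use the Element.find_all() method to select all <p> tags, loop over them, and update a dictionary:

    mapping = {}
    key = None
    for item in soup.find_all('p'):
        if item.strong:
            # A new section: the label is in <strong>, the value follows it.
            key = item.strong.get_text(strip=True).rstrip(':')
            value = item.strong.next_sibling.strip()
        else:
            # A continuation paragraph: append to the last processed key.
            value = mapping[key] + ' ' + item.get_text(strip=True)
        mapping[key] = value

Demo:

    >>> from bs4 import BeautifulSoup
    >>> soup = BeautifulSoup('''\
    ... <p><strong>ID:</strong>547</p>
    ... <p><strong>Class:</strong>foobar</p>
    ... <p><strong>Procedures:</strong>lorem ipsum.</p>
    ... <p>dolor sit amet.</p>
    ... ...
    ... <p><strong>Description:</strong>curabitur at orci posuere.</p>
    ... <p>massa nec fringilla.</p>
    ... ''', 'html.parser')
    >>> mapping = {}
    >>> key = None
    >>> for item in soup.find_all('p'):
    ...     if item.strong:
    ...         key = item.strong.get_text(strip=True).rstrip(':')
    ...         value = item.strong.next_sibling.strip()
    ...     else:
    ...         value = mapping[key] + ' ' + item.get_text(strip=True)
    ...     mapping[key] = value
    ...
    >>> from pprint import pprint
    >>> pprint(mapping)
    {'Class': 'foobar',
     'Description': 'curabitur at orci posuere. massa nec fringilla.',
     'ID': '547',
     'Procedures': 'lorem ipsum. dolor sit amet.'}

This does not convert the ID to an integer. If you really want to convert strings that represent integers, you can wrap the conversion in try: value = int(value) and except ValueError: pass.

python web-scraping beautifulsoup

1 answer, 0 votes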
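The integer conversion mentioned at the end of the answer above could be sketched as follows. This is an illustration, not part of the original answer: the helper name `to_int_if_possible`, the sample markup, and the lower-casing of keys (to match the dictionary the asker wanted) are all assumptions.

```python
from bs4 import BeautifulSoup

# Assumed markup following the <p><strong>Label:</strong>value</p> pattern.
html = """
<p><strong>ID:</strong>547</p>
<p><strong>Class:</strong>foobar</p>
<p><strong>Procedures:</strong>lorem ipsum.</p>
<p>dolor sit amet.</p>
"""


def to_int_if_possible(value):
    """Return int(value) if the string is an integer, else the string as-is."""
    try:
        return int(value)
    except ValueError:
        return value


soup = BeautifulSoup(html, "html.parser")
mapping = {}
key = None
for item in soup.find_all("p"):
    if item.strong:
        # Lower-casing the key is an addition to match the desired output.
        key = item.strong.get_text(strip=True).rstrip(":").lower()
        value = item.strong.next_sibling.strip()
    else:
        value = mapping[key] + " " + item.get_text(strip=True)
    mapping[key] = value

# Convert only after all paragraphs are merged, so string
# concatenation in the loop never meets an int.
mapping = {k: to_int_if_possible(v) for k, v in mapping.items()}
print(mapping)
# {'id': 547, 'class': 'foobar', 'procedures': 'lorem ipsum. dolor sit amet.'}
```

Converting in a final pass rather than inside the loop avoids a `TypeError` when a continuation paragraph is appended to a value that has already become an integer.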

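BeautifulSoup web scraping: isolating a specific tag with random HTML classes

New to web scraping here. I have successfully scraped a website, but I've run into a problem. Within the article class there is usually only one 'p' tag, but sometimes, at random......

python eclipse web-scraping beautifulsoup

2 answers, 0 votes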

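Trying to run beautifulsoup4; need help or a suggestion for another way to parse website data

I installed python requests correctly (just by dragging the folder into the Lib folder), but when I went to install BeautifulSoup I kept getting the following error. I tried doing a pip install, which I thought was

parsing web-scraping beautifulsoup request

1 answer, 0 votes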

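Is there a way to choose which cell/column a DataFrame is written to in a CSV, as you can when scraping data into an Excel file?

I have experience scraping web page data into Excel files, and I know that when copying data into an Excel spreadsheet you can choose which column to copy it to. (startcol = 10 as shown...

python excel pandas csv beautifulsoup

1 answer, 0 votes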

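Python - Saving a requests or BeautifulSoup object locally

I have some code that is quite long, so it takes a long time to run. I would simply like to save either the requests object (in this case 'name') or the BeautifulSoup object (in this case 'soup') locally...

python file web-scraping beautifulsoup

2 answers, 0 votes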
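For the question above about saving results locally, one simple option is to cache the raw HTML text on disk and re-parse it on later runs, instead of saving the response or soup object itself. The sketch below assumes that approach; the helper names `save_html` and `load_soup`, the file name, and the literal HTML (standing in for `response.text`) are hypothetical.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup


def save_html(html_text, path):
    """Write the raw HTML text to disk so later runs can skip the download."""
    Path(path).write_text(html_text, encoding="utf-8")


def load_soup(path):
    """Re-create a BeautifulSoup object from the cached HTML file."""
    return BeautifulSoup(Path(path).read_text(encoding="utf-8"), "html.parser")


# Stand-in for `response.text` from a real requests.get(...) call.
html_text = "<html><body><h1>Cached page</h1></body></html>"

# Hypothetical cache location in the system temp directory.
cache_file = Path(tempfile.gettempdir()) / "cached_page.html"

save_html(html_text, cache_file)
soup = load_soup(cache_file)
print(soup.h1.get_text())  # Cached page
```

Re-parsing from text costs a little time on each run, but the cached file stays readable and does not depend on the library versions that produced it.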

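Beautiful Soup extraction [closed]

I have a few simple BeautifulSoup questions (1-3 go together, 4-6 go together). Suppose my HTML is structured like this:

    <meta property="tall"/>
    <meta property="wide" content="spiral"/>
    <meta name="red"/>
    <meta name="tall"/>

1. How can I find all instances of property?
2. How can I then extract "tall" and "wide"?
3. How can I extract property?
4. How can I find all instances of "tall"?
5. How can I extract name and property?
6. How can I then extract "tall"?

What I can easily do is extract all instances of meta:

    soup1.find_all("meta")

After that, however, I have to visit every element of the resulting list before I can get at things like property and name. I would rather skip that step and get all instances of property and name directly, if possible.

Finally, if I use requests.get to fetch a URL from a website where you must click a button at the bottom to load more content, and I want that extra content, how do I make that happen?

Answer 1:

Beautiful Soup is all about extracting data, but here is something to get you started. Here test.html is what you posted. The try/except blocks are there so that if a lookup fails, nothing is printed instead of an error.

    from bs4 import BeautifulSoup

    soup = BeautifulSoup(open(r'd:\test.html', 'r'), 'html.parser')
    # print(soup.prettify())
    items = soup.find_all("meta")

    try:
        print("#How can I find all of the instances of property?")
        for all_prop in items:
            # Raises KeyError at the first tag without a property attribute.
            if all_prop['property']:
                print(all_prop)
    except KeyError:
        print("")

    try:
        print("#How can I then extract tall and wide?")
        for properties in items:
            print(properties['property'])
    except KeyError:
        print("")

    try:
        print("#all of the instances of tall")
        print(soup.find_all('meta', attrs={'property': 'tall'}))
        print(soup.find_all('meta', attrs={'name': 'tall'}))
        print("")
    except KeyError:
        print("")

    try:
        print("#How can I then extract tall?")
        for just_tall in items:
            # .get() returns None instead of raising when the attribute is absent.
            if just_tall.get('property') == 'tall':
                print(just_tall.get('property'))
            if just_tall.get('name') == 'tall':
                print(just_tall.get('name'))
    except KeyError:
        print("")

Output (the order of attributes within a printed tag can vary with the Beautiful Soup and Python version):

    #How can I find all of the instances of property?
    <meta property="tall"/>
    <meta content="spiral" property="wide"/>

    #How can I then extract tall and wide?
    tall
    wide

    #all of the instances of tall
    [<meta property="tall"/>]
    [<meta name="tall"/>]

    #How can I then extract tall?
    tall
    tall

The rest is just playing around, but the above should get you started. Some of the questions are still unclear, so I have given a few examples above to help. Tutorials and more examples are in the documentation.

Answer 2:

I'm not an expert with BeautifulSoup, but I gave it a try and this is what I came up with; hopefully it is enough to get you started. Note that there may be more elegant solutions.

Boilerplate:

    from bs4 import BeautifulSoup
    import re

    a = """<meta property="tall"/>
    <meta property="wide" content="spiral"/>
    <meta name="red"/>
    <meta name="tall"/>"""
    soup = BeautifulSoup(a, 'html.parser')

Questions:

I.

    p = soup.find_all('meta', attrs={"property": re.compile('.*')})
    # >> [<meta property="tall"/>, <meta content="spiral" property="wide"/>]

II.

    ex = [tag['property'] for tag in p]
    # >> ['tall', 'wide']

III. I'm not sure what you mean; perhaps this is already covered?

IV.

    alltall = soup.find_all('meta', attrs={'name': 'tall'})
    alltall += soup.find_all('meta', attrs={'property': 'tall'})
    # >> [<meta name="tall"/>, <meta property="tall"/>]

V./VI. I spent some time searching but did not find an elegant way to do this. Maybe I overlooked something.

python web-scraping beautifulsoup

2 answers, 0 votes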
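As a possible alternative to the regex and try/except approaches in the answers above, Beautiful Soup can filter on the presence of an attribute directly. The sketch below is one way to do it, not the answerers' method: it relies on `attrs={"property": True}` (a value of `True` matches any tag that has the attribute) and on CSS attribute selectors via `select()`. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """<meta property="tall"/>
<meta property="wide" content="spiral"/>
<meta name="red"/>
<meta name="tall"/>"""

soup = BeautifulSoup(html, "html.parser")

# A value of True matches any <meta> that has a property attribute at all.
with_property = soup.find_all("meta", attrs={"property": True})
property_values = [tag["property"] for tag in with_property]

# CSS attribute selectors: [name] tests presence, [name="tall"] tests the value.
with_name = soup.select("meta[name]")
name_values = [tag["name"] for tag in with_name]

# A selector list matches either condition, in document order.
all_tall = soup.select('meta[property="tall"], meta[name="tall"]')

print(property_values)  # ['tall', 'wide']
print(name_values)      # ['red', 'tall']
print(len(all_tall))    # 2
```

Both forms skip the manual loop over every `meta` tag that the asker wanted to avoid, because the filtering happens inside the search call.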

Changing the inner text of a tag using BeautifulSoup in Python

I want to change the inner text of a tag in HTML that I obtained with BeautifulSoup. Example:

    <a href="index.html" id="websiteName">Foo</a>

becomes:

    <a href="index.html" id="websiteName">Bar</a>

I have managed to get the tag by its id:

    HTMLDocument.find(id='websiteName')

But I cannot change the tag's inner text:

    print(HTMLDocument.find(id='websiteName'))

    a = HTMLDocument.find(id='websiteName')
    a = a.replaceWith('<a href="index.html" id="websiteName">Bar</a>')
    # I have tried using this as well:
    # a = a.replaceWith('Bar')

    print(a)

Output:

    <a href="index.html" id="websiteName">Foo</a>
    <a href="index.html" id="websiteName">Foo</a>

Answer:

Try changing the string element:

    HTMLDocument.find(id='websiteName').string.replace_with('Bar')

A complete example:

    from bs4 import BeautifulSoup

    html = """
    <a href="index.html" id="websiteName">Foo</a>
    """
    soup = BeautifulSoup(html, 'lxml')

    result = soup.find(id='websiteName')
    print(result)
    # >>> <a href="index.html" id="websiteName">Foo</a>

    # Replace the tag's text node in place; the tag itself stays in the tree.
    result.string.replace_with('Bar')
    print(result)
    # >>> <a href="index.html" id="websiteName">Bar</a>

python beautifulsoup

1 answer, 0 votes
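A likely reason the asker's attempt printed `Foo` twice is that `replace_with()` returns the element that was removed, so reassigning `a` to its return value keeps the old tag. Besides `string.replace_with()`, assigning to the tag's `.string` attribute has the same effect. The sketch below shows that variant; it uses `html.parser` rather than `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

html = '<a href="index.html" id="websiteName">Foo</a>'
soup = BeautifulSoup(html, "html.parser")

link = soup.find(id="websiteName")

# Assigning to .string replaces the tag's contents with a single text node.
link.string = "Bar"

print(link)  # <a href="index.html" id="websiteName">Bar</a>
```

Because `link` is the same object that lives in the tree, the change is also visible through `soup` without any reassignment.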

How to fill in a JavaScript form with Python?

I want to use Python to fill in this form. I tried using Mechanize, but this is a Microsoft form that uses JavaScript, with no form tag and no GET/POST URL. Maybe BeautifulSoup/Selenium can do this, ...

selenium web-scraping beautifulsoup scrapy mechanize

1 answer, 0 votes

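Scraping data from the IMDb Top 250 movies

Please, I need someone to help me. I don't understand why I am only scraping 25 movies instead of 250. My code: import pandas as pd import requests from bs4 import BeautifulSoup headers = {'User-Agent': 'M...

python web-scraping beautifulsoup web-crawler

1 answer, 0 votes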

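Scraping Google result statistics with Python [closed]

I want to get the estimated number of results for a keyword from Google. I am using Python 3.3 and trying to accomplish this with BeautifulSoup and urllib.request. Here is my simple code so far ...

python web-scraping beautifulsoup urllib2

1 answer, 0 votes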

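How do I fix my code, which returns an empty list?

I am scraping an e-commerce website and it returns an empty list. This is the code I wrote. import requests from bs4 import BeautifulSoup baseurl = 'https://www.thewhiskyexchange.com/' headers = {'

python web-scraping beautifulsoup python-requests

1 answer, 0 votes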

Scraping Amazon Wholefoods results in 200 None

Running my scraping code results in the following error: https://www.amazon.com:443 "GET /s?k=Chicken&i=wholefoods&disableAutoscoping=true HTTP/1.1" 200 None urls = { 'whole fo...

python flask web-scraping beautifulsoup

1 answer, 0 votes

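Why does my list inconsistently print with a different length each time I run my code, even though I change nothing?

I am currently trying to scrape this website with BeautifulSoup in PyCharm, to sort all articles from most to least upvoted: https://news.ycombinator.com/news I have successfully parsed...

python beautifulsoup

1 answer, 0 votes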

Web scraping SEC filings

I am web scraping 10-Q documents from SEC EDGAR. Here is the URL: https://www.sec.gov/Archives/edgar/data/1652044/000165204419000032/goog10-qq32019.htm If we inspect it, you will find that I

python html web-scraping beautifulsoup

2 answers, 0 votes

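Python: trying to send an href by email

The code below pulls headlines from ESPN/college-football. I can grab the article titles and links. I can print both, but I also want to send them by email. I can get...

python web-scraping beautifulsoup

1 answer, 0 votes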

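Python: trying to send article content by email

The code below pulls headlines from ESPN/college-football. I go into the article itself and extract the p content, which prints nicely to the console, but I also want to...

python web-scraping beautifulsoup

1 answer, 0 votes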

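How can I scrape a web page that reveals information by scrolling rather than by index?

I am learning web scraping and I am trying to get data from a page that reveals information on scroll. What can I do in this case? Is there a function that makes the whole page load? I am using se...

python selenium-webdriver web-scraping beautifulsoup selenium-chromedriver

1 answer, 0 votes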
