Beautiful Soup is a Python package for parsing HTML/XML. The latest version of the package is version 4, which is imported as bs4.
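Scraping multiple pages with Beautiful Soup - only the last page shows up
My friends and I do a FIFA draft every year, and I've been looking for a quick way to update player information. What I want is to scrape some information about players from several different pages. These pages are: ...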
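The question is truncated, so its exact bug is unknown. A frequent cause of "only the last page shows up" is creating the results list (or opening the output file in "w" mode) inside the page loop, so each page overwrites the previous one. A minimal sketch, assuming the pages have already been downloaded as HTML strings and that player names sit in elements with a hypothetical `player-name` class:

```python
from bs4 import BeautifulSoup


def scrape_pages(pages):
    """Collect player names from every page into one list.

    `pages` is an iterable of HTML strings (in practice, response.text for
    each URL). The class "player-name" is a hypothetical placeholder.
    """
    players = []  # created ONCE, before the loop, so results accumulate
    for html in pages:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all(class_="player-name"):
            players.append(tag.get_text(strip=True))
    # returned after the loop has seen every page, not from inside it
    return players


pages = [
    '<div class="player-name">Player A</div>',
    '<div class="player-name">Player B</div><div class="player-name">Player C</div>',
]
print(scrape_pages(pages))
```

The same rule applies when writing to a file: open it once before the loop (or use append mode) rather than reopening it in "w" mode for every page.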
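Scraping multiple URLs with Beautiful Soup in Python 3
The code below works fine, but I need to scrape multiple URLs and I don't know how... It would also be nice to read the URLs from a CSV file, if possible... Basically I'm trying to redirect...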
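For the multiple-URL part, one approach is to read the URLs from a CSV file with the standard csv module and loop over them. A minimal sketch, assuming the CSV has a header row with a column named `url` (a hypothetical name) and that `fetch` is whatever function you already use to download one page; an in-memory file stands in for the real CSV so the example runs offline:

```python
import csv
import io


def read_urls(csv_file, column="url"):
    """Return the non-empty values of one column from a CSV with a header row.

    The column name "url" is an assumption; change it to match your file.
    """
    urls = []
    for row in csv.DictReader(csv_file):
        url = (row.get(column) or "").strip()
        if url:  # skip blank cells
            urls.append(url)
    return urls


def scrape_all(urls, fetch):
    """Call fetch(url) once per URL and keep every result, keyed by URL."""
    results = {}
    for url in urls:
        results[url] = fetch(url)
    return results


# In-memory stand-in for open("urls.csv", newline="", encoding="utf-8")
csv_file = io.StringIO("url\nhttps://example.com/a\nhttps://example.com/b\n")
urls = read_urls(csv_file)

# str.upper is a dummy "fetch" so this runs offline; with the requests
# package installed you might pass
#     lambda u: requests.get(u, timeout=10).text
print(scrape_all(urls, fetch=str.upper))
```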
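Problems web scraping with Selenium and BeautifulSoup
I'm building a price comparison website for my university project. I'm trying to print the items and prices from this site https://www.lotuss.com.my/en/category/fresh-product?sort=relevance:DESC but...
How to scrape a web page and access <script> with Python and bs4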
I'm trying to scrape data from "https://www.deadstock.ca/products/adidas-futurepacer-grey-one". I want to be able to read the variant data shown below ...</desc> <question vote="0"> <p>I'm trying to scrape data from "<a href="https://www.deadstock.ca/products/adidas-futurepacer-grey-one" rel="nofollow noreferrer">https://www.deadstock.ca/products/adidas-futurepacer-grey-one</a>".</p> <p>I want to be able to read the variant data shown below:</p> <pre><code><script>
window.ShopifyAnalytics = window.ShopifyAnalytics || {};
window.ShopifyAnalytics.meta = window.ShopifyAnalytics.meta || {};
window.ShopifyAnalytics.meta.currency = 'CAD';
var meta = {"product":{"id":223724142613,"vendor":"Adidas","type":"Footwear - QS","variants":[{"id":3063231774741,"price":26000,"name":"adidas Futurepacer \/ Grey One - 8","public_title":"8","sku":"AQ0907-Grey One-8"}, {"id":3063231807509,"price":26000,"name":"adidas Futurepacer \/ Grey One - 8.5","public_title":"8.5","sku":"AQ0907-Grey One-8.5"}, {"id":3063231840277,"price":26000,"name":"adidas Futurepacer \/ Grey One - 9","public_title":"9","sku":"AQ0907-Grey One-9"}, {"id":3063231873045,"price":26000,"name":"adidas Futurepacer \/ Grey One - 9.5","public_title":"9.5","sku":"AQ0907-Grey One-9.5"}, {"id":3063231905813,"price":26000,"name":"adidas Futurepacer \/ Grey One - 10","public_title":"10","sku":"AQ0907-Grey One-10"}, {"id":3063231938581,"price":26000,"name":"adidas Futurepacer \/ Grey One - 10.5","public_title":"10.5","sku":"AQ0907-Grey One-10.5"}, {"id":3063231971349,"price":26000,"name":"adidas Futurepacer \/ Grey One - 11","public_title":"11","sku":"AQ0907-Grey One-11"}, {"id":3063232004117,"price":26000,"name":"adidas Futurepacer \/ Grey One - 12","public_title":"12","sku":"AQ0907-Grey One-12"}, {"id":3063232036885,"price":26000,"name":"adidas Futurepacer \/ Grey One - 13","public_title":"13","sku":"AQ0907-Grey One-13"}]},"page": {"pageType":"product","resourceType":"product","resourceId":223724142613}};
for (var attr in meta) {
  window.ShopifyAnalytics.meta[attr] = meta[attr];
}
</script>
</code></pre>
<p>I don't think I'm targeting it correctly. I want the code to print every "id": number. Here is my code so far. I'm still quite new to bs4, so any help would be appreciated. Thanks</p> <pre><code>import bs4 as bs
import urllib.request
import lxml

link = urllib.request.urlopen('https://www.deadstock.ca/products/adidas-futurepacer-grey-one').read()
soup = bs.BeautifulSoup(link, 'lxml')
for variants in soup.find_all('script'):
    print(variants)
</code></pre> <p>I tried something similar based on @Andrej Kesely's answer:</p> <pre><code>for id in data['variants']:
    size = id['variants']['option1']
    variantid = id['variants']['id']
    print(size)
    print(variantid)
</code></pre> <p>It comes back with a KeyError. I just want it to display all the IDs.</p> </question> <answer tick="true" vote="0"> <p>For this site, you need to target <code>id="ProductJson-product-template"</code>. That script contains JSON with all the variants:</p> <pre><code>import bs4 as bs
import urllib.request
import json

link = urllib.request.urlopen('https://www.deadstock.ca/products/adidas-futurepacer-grey-one').read()
soup = bs.BeautifulSoup(link, 'lxml')
variant = soup.find('script', id='ProductJson-product-template')
data = json.loads(variant.text)
print(json.dumps(data, indent=4, sort_keys=True))
</code></pre> <p>Prints:</p> <pre><code>{ "available": true, "compare_at_price": null, "compare_at_price_max": 0, "compare_at_price_min": 0, "compare_at_price_varies": false, "content": "<p>adidas digs into its vault to redesign the 1984 Micropacer, a computerized shoe that was designed to track running statistics. Today, the adidas Futurepacer takes cues from the Micropacer, and the more recent NMD collection, to deliver a futuristic lifestyle runner. Featuring a premium leather upper, complete with a modernized lace cover, the adidas Futurepacer sits on a Boost midsole for lightweight cushioning and responsiveness. 
NMD inspired bumpers are found at the heel and forefoot for added style and stability.</p>\n<ul>\n<li>Premium leather upper</li>\n<li>Premium leather lace cover</li>\n<li>Subtle adidas branding</li>\n<li>Boost midsole</li>\n<li>adidas midsole plugs</li>\n<li>Grey One</li>\n</ul>\n<p>Product Code: AQ0907</p>", "created_at": "2018-03-05T14:23:57-08:00", "description": "<p>adidas digs into its vault to redesign the 1984 Micropacer, a computerized shoe that was designed to track running statistics. Today, the adidas Futurepacer takes cues from the Micropacer, and the more recent NMD collection, to deliver a futuristic lifestyle runner. Featuring a premium leather upper, complete with a modernized lace cover, the adidas Futurepacer sits on a Boost midsole for lightweight cushioning and responsiveness. NMD inspired bumpers are found at the heel and forefoot for added style and stability.</p>\n<ul>\n<li>Premium leather upper</li>\n<li>Premium leather lace cover</li>\n<li>Subtle adidas branding</li>\n<li>Boost midsole</li>\n<li>adidas midsole plugs</li>\n<li>Grey One</li>\n</ul>\n<p>Product Code: AQ0907</p>", "featured_image": "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one.jpg?v=1530313621", "handle": "adidas-futurepacer-grey-one", "id": 223724142613, "images": [ "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one.jpg?v=1530313621", "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one_1.jpg?v=1531247436", "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one_2.jpg?v=1531247441", "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one_3.jpg?v=1531247444", "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one_4.jpg?v=1531247447", "//cdn.shopify.com/s/files/1/0616/3517/products/aq0907_adidas_futurepacer_grey_one_5.jpg?v=1531247449" ], "options": [ "US Size" ], "price": 26000, "price_max": 26000, 
"price_min": 26000, "price_varies": false, "published_at": "2018-07-14T12:00:00-07:00", "tags": [ "07142018", "cf-type-footwear-qs", "cf-us-size-10", "cf-us-size-10-5", "cf-us-size-11", "cf-us-size-12", "cf-us-size-13", "cf-us-size-8", "cf-us-size-8-5", "cf-us-size-9", "cf-us-size-9-5", "cf-vendor-adidas", "free_shipping", "limit-quantity", "plsmerch" ], "title": "adidas Futurepacer / Grey One", "type": "Footwear - QS", "variants": [ { "available": true, "barcode": "193050061142", "compare_at_price": null, "featured_image": null, "id": 3063231774741, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 3, "name": "adidas Futurepacer / Grey One - 8", "option1": "8", "option2": null, "option3": null, "options": [ "8" ], "price": 26000, "public_title": "8", "requires_shipping": true, "sku": "AQ0907-Grey One-8", "taxable": true, "title": "8", "weight": 0 }, { "available": true, "barcode": "193050061159", "compare_at_price": null, "featured_image": null, "id": 3063231807509, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 2, "name": "adidas Futurepacer / Grey One - 8.5", "option1": "8.5", "option2": null, "option3": null, "options": [ "8.5" ], "price": 26000, "public_title": "8.5", "requires_shipping": true, "sku": "AQ0907-Grey One-8.5", "taxable": true, "title": "8.5", "weight": 0 }, { "available": true, "barcode": "193050061166", "compare_at_price": null, "featured_image": null, "id": 3063231840277, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 6, "name": "adidas Futurepacer / Grey One - 9", "option1": "9", "option2": null, "option3": null, "options": [ "9" ], "price": 26000, "public_title": "9", "requires_shipping": true, "sku": "AQ0907-Grey One-9", "taxable": true, "title": "9", "weight": 0 }, { "available": true, "barcode": "193050061173", "compare_at_price": null, "featured_image": null, "id": 3063231873045, "inventory_management": "shopify", 
"inventory_policy": "deny", "inventory_quantity": 5, "name": "adidas Futurepacer / Grey One - 9.5", "option1": "9.5", "option2": null, "option3": null, "options": [ "9.5" ], "price": 26000, "public_title": "9.5", "requires_shipping": true, "sku": "AQ0907-Grey One-9.5", "taxable": true, "title": "9.5", "weight": 0 }, { "available": true, "barcode": "193050061180", "compare_at_price": null, "featured_image": null, "id": 3063231905813, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 6, "name": "adidas Futurepacer / Grey One - 10", "option1": "10", "option2": null, "option3": null, "options": [ "10" ], "price": 26000, "public_title": "10", "requires_shipping": true, "sku": "AQ0907-Grey One-10", "taxable": true, "title": "10", "weight": 0 }, { "available": true, "barcode": "193050061197", "compare_at_price": null, "featured_image": null, "id": 3063231938581, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 6, "name": "adidas Futurepacer / Grey One - 10.5", "option1": "10.5", "option2": null, "option3": null, "options": [ "10.5" ], "price": 26000, "public_title": "10.5", "requires_shipping": true, "sku": "AQ0907-Grey One-10.5", "taxable": true, "title": "10.5", "weight": 0 }, { "available": true, "barcode": "193050061203", "compare_at_price": null, "featured_image": null, "id": 3063231971349, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 1, "name": "adidas Futurepacer / Grey One - 11", "option1": "11", "option2": null, "option3": null, "options": [ "11" ], "price": 26000, "public_title": "11", "requires_shipping": true, "sku": "AQ0907-Grey One-11", "taxable": true, "title": "11", "weight": 0 }, { "available": true, "barcode": "193050061210", "compare_at_price": null, "featured_image": null, "id": 3063232004117, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 4, "name": "adidas Futurepacer / Grey One - 12", "option1": "12", 
"option2": null, "option3": null, "options": [ "12" ], "price": 26000, "public_title": "12", "requires_shipping": true, "sku": "AQ0907-Grey One-12", "taxable": true, "title": "12", "weight": 0 }, { "available": true, "barcode": "193050061227", "compare_at_price": null, "featured_image": null, "id": 3063232036885, "inventory_management": "shopify", "inventory_policy": "deny", "inventory_quantity": 1, "name": "adidas Futurepacer / Grey One - 13", "option1": "13", "option2": null, "option3": null, "options": [ "13" ], "price": 26000, "public_title": "13", "requires_shipping": true, "sku": "AQ0907-Grey One-13", "taxable": true, "title": "13", "weight": 0 } ], "vendor": "Adidas" } </code></pre> </answer> </body></html>
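The KeyError in the asker's follow-up comes from indexing twice: when looping over `data['variants']`, each loop variable already is one variant dict, so `id['variants']` does not exist. A minimal sketch of the corrected loop, run against a trimmed, hand-written stand-in for the page (only the `ProductJson-product-template` id and the `variants`/`id`/`option1` keys come from the answer above; the surrounding HTML is an assumption):

```python
import json

from bs4 import BeautifulSoup

# Trimmed stand-in for the real page; the live script holds the full product JSON.
html = """
<script id="ProductJson-product-template">
{"id": 223724142613,
 "variants": [
   {"id": 3063231774741, "option1": "8"},
   {"id": 3063231807509, "option1": "8.5"}
 ]}
</script>
"""

soup = BeautifulSoup(html, "html.parser")
script = soup.find("script", id="ProductJson-product-template")
data = json.loads(script.string)

# Each item of data["variants"] is already a single variant dict, so index
# it directly; variant["variants"] would raise KeyError.
for variant in data["variants"]:
    print(variant["option1"], variant["id"])
```

Against the real page, replace `html` with the downloaded HTML (as in the answer's `urlopen(...).read()`).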
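I'm trying to scrape this website: https://www.realtor.com/realestateandhomes-search/28083. Here is the code I've written so far: class Client(QWebPage): def __init__(self, url): self.a...
I'm new to Beautiful Soup and not sure how to get the "Settle" column for each state (New South Wales, Victoria, Queensland, South Australia) from this site: https://www.asxenergy.com.au/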
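I have a list of items like this (the number of items may vary):
<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>
<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>
...
How can I get all of this data with Beautiful Soup and put it into one list, so that I get a result like this: [{'title': h3, 'url': url, 'title': title}, {'title': h3, 'url': url, 'title': title}, ...]? Thank you
You can iterate over the HTML content like this (assuming your data is stored in html_data):
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html_data, 'lxml')  # lxml wraps the fragment in <html><body>
    # Skip the whitespace strings between tags so each group is exactly 3 elements
    tags = [el for el in soup.body.contents if el.name is not None]
    my_list = []
    for i in range(0, len(tags), 3):
        my_list.append({'title1': tags[i].get_text(),
                        'url': tags[i + 1]['href'],
                        'title2': tags[i + 2].get_text()})
Of course, this only works if your data is all at the same level and not nested in any way. If it isn't, you should post a valid chunk of your test data along with its structure.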
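A more robust alternative to counting positions, sketched here as an assumption rather than the original answer's method, is to start from each `<h3>` and look for the `<a>` and `<span class="t">` siblings that follow it. It does not depend on the whitespace between tags or on the parser adding a `<body>`. The keys `title`, `url` and `text` are illustrative names:

```python
from bs4 import BeautifulSoup

html_data = """
<h3>My title</h3>
<a href="http://myurl.com">http://myurl.com</a>
<span class="t">text</span>
<h3>Other title</h3>
<a href="http://example.org">http://example.org</a>
<span class="t">more text</span>
"""

soup = BeautifulSoup(html_data, "html.parser")
items = []
for h3 in soup.find_all("h3"):
    # first <a> and <span class="t"> that come after this heading
    link = h3.find_next_sibling("a")
    span = h3.find_next_sibling("span", class_="t")
    items.append({
        "title": h3.get_text(strip=True),
        "url": link["href"] if link else None,
        "text": span.get_text(strip=True) if span else None,
    })
print(items)
```

One caveat: if a group is missing its link or span, `find_next_sibling` picks up the one belonging to the next group.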
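I want to get all the displayed text on an HTML page up to the point where a certain tag is reached. For example, I want all the displayed text on the page until the element with the ID "end_con...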
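One way to do this with Beautiful Soup is to walk `soup.descendants` in document order, collecting text nodes and stopping at the first element with the target id. Since the id in the question is truncated, `stop-here` is a hypothetical placeholder. "Displayed" is only approximated by skipping `<script>`/`<style>` contents, comments and the doctype; text hidden by CSS is not detected:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, Doctype, Tag


def text_until(html, stop_id):
    """Return the text that appears before the element with id=stop_id.

    Skips <script>/<style> contents, comments and the doctype.
    Text hidden with CSS is NOT detected.
    """
    soup = BeautifulSoup(html, "html.parser")
    parts = []
    # .descendants yields tags and strings in document order
    for node in soup.descendants:
        if isinstance(node, Tag):
            if node.get("id") == stop_id:
                break  # stop element reached: ignore it and everything after
            continue
        if isinstance(node, (Comment, Doctype)):
            continue
        if node.parent.name in ("script", "style"):
            continue
        text = node.strip()
        if text:
            parts.append(text)
    return " ".join(parts)


page = ('<html><body><p>Hello</p><script>var x = 1;</script>'
        '<div>World<!-- note --></div><div id="stop-here">Footer</div>'
        '<p>After</p></body></html>')
print(text_until(page, "stop-here"))
```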
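Here is the script: from bs4 import BeautifulSoup as bs4 import requests import json from lxml import html from pprint import pprint import re def get_data(): url = 'https://sports.bovada.lv//
I'm trying to scrape data from a website by changing the number of kilometres in the URL. The problem is that each car has its own maximum mileage, in this example let's say 900 km. Since I don't know the maximum, I just look...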
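Because the question is cut off, the following is only one way to handle an unknown maximum: keep increasing the kilometre value and stop at the first page that returns no data. `fetch_rows` is a hypothetical function that builds the URL for a given value, downloads it and returns the parsed rows; the step size and the safety cap are assumptions:

```python
def collect_until_empty(fetch_rows, start=0, step=100, limit=100_000):
    """Request increasing kilometre values until a page returns no rows.

    fetch_rows(km) is hypothetical: build the URL for km, download it and
    return the parsed rows (an empty list when the site has no data).
    `limit` is a safety cap so the loop cannot run forever.
    """
    results = []
    km = start
    while km <= limit:
        rows = fetch_rows(km)
        if not rows:  # no data: assume we have passed this car's maximum
            break
        results.extend(rows)
        km += step
    return results


def fake_site(km):
    """Offline stand-in for a car whose data stops after 900 km."""
    return [f"row@{km}"] if km <= 900 else []


print(len(collect_until_empty(fake_site)))
```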
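I'm trying to scrape the following pages of a website. There are 20 pages in total, and I want to use the first page's URL to scrape the rest. Code: b=[] url = "https://abcde.com/cate6-%E7%BE%8E%E5%A6%9D%E...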
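When page 2 and later differ from page 1 only by a query parameter, the page URLs can be generated from the first URL with urllib.parse instead of string concatenation. A sketch assuming the parameter is called `page` (check the site's own "next" link for the real name); the question's URL is truncated, so example.com stands in for it:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(first_url, pages, param="page"):
    """Yield first_url with its page parameter set to 1..pages.

    The parameter name "page" is an assumption; other query parameters
    already in the URL are kept.
    """
    parts = urlsplit(first_url)
    query = dict(parse_qsl(parts.query))
    for n in range(1, pages + 1):
        query[param] = str(n)
        yield urlunsplit(parts._replace(query=urlencode(query)))


for url in page_urls("https://example.com/category?sort=new", 3):
    print(url)
```

For the 20 pages in the question this would be `page_urls(first_url, 20)`, fetching each URL inside the loop and appending the parsed results to a list (like `b`) created before it.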
With the code below I only get 1 row of data: url = 'http://investmentmoats.com/DividendScreener/DividendScreener.php' res = requests.get(url) soup = BeautifulSoup(res.content,'lxml') t...
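The question is cut off, but one common reason for getting a single row is calling `find()`, which returns only the first match, where `find_all()` was needed. A sketch on a small hand-written table (the table id and its data are made up):

```python
from bs4 import BeautifulSoup

html = """
<table id="screener">
  <tr><th>Stock</th><th>Yield</th></tr>
  <tr><td>AAA</td><td>4.1%</td></tr>
  <tr><td>BBB</td><td>5.3%</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="screener")  # hypothetical id

# find("tr") stops at the first match, so only one row comes back
first_only = table.find("tr")

# find_all("tr") returns every row; [1:] skips the header row
rows = []
for tr in table.find_all("tr")[1:]:
    rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
print(rows)
```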
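I managed to scrape the first page of a website, but when I try to scrape multiple pages the code runs, yet the results are completely wrong. Code: import requests from bs4 import BeautifulSoup from ...
Take this page as an example: https://quizlet.com/229413256/chapter-6-configuring-networking-flash-cards/ How would one scrape the text answers from the back of the flashcards? They are hidden...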
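Is there a way to extract CSS from a web page using BeautifulSoup?
I'm working on a project that requires me to look at web pages, but to do more with the HTML I need to see each page in full rather than as a bunch of lines mixed with images. Is there a way to parse the CSS...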
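BeautifulSoup does not parse or apply CSS, but it can locate the three places CSS usually lives in a page: `<style>` blocks, `<link rel="stylesheet">` references and inline `style` attributes. A minimal sketch on a hand-written page; external stylesheets appear only as URLs and would have to be downloaded separately (for example with requests):

```python
from bs4 import BeautifulSoup

html = """
<html><head>
<link rel="stylesheet" href="/static/site.css">
<style>body { color: #333; }</style>
</head><body><p style="margin: 0">Hi</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. CSS embedded in <style> blocks
embedded = [style.string or "" for style in soup.find_all("style")]

# 2. External stylesheets: only their URLs are in the HTML
external = []
for link in soup.find_all("link"):
    rel = link.get("rel") or []
    if isinstance(rel, str):  # rel is normally parsed as a list
        rel = rel.split()
    if "stylesheet" in rel and link.get("href"):
        external.append(link["href"])

# 3. Inline style="..." attributes on any tag
inline = [tag["style"] for tag in soup.find_all(style=True)]

print(embedded, external, inline)
```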
Using BeautifulSoup in Python, how do I split the data up to the next <h3> tag?
The question is a bit confusing, but I hope the example below clears things up. I'm trying to scrape data from one (and only one) <div> tag on the page.
Problem 1: all the data in that div is lumped together and separated only by <h3> tags.
Problem 2: there is a variable number of <p> tags after each <h3> tag, and the h3 tag can be Title1 or Title2.
How can I parse that div element and split all the data into some kind of array/dict structure that contains just the h3 tag and all the p tags up to the next h3 tag? The image says it all.
My code so far (it works and scrapes the data):
    links = soup.find('div', class_='DIV I WANT')
    for p in links.find_all(['p', 'h3']):
        print(p.text.strip())
Edit: added the full HTML. Yes, it really is written like this. The data has obviously been changed slightly:
<div class="table"> <p> List of actors. </p> <p> date 25.09.2017 </p> <h3> Actor </h3> <p> Office <br> Address 1 8, 100 City 15 <br> Address 2 250, 200 City 15 </p> <p> 08h00-12h30 13h15-16h45<br>08h00-12h30 13h00-15h00 </p> <p> <a href="http://www.example.com" target="_blank" title="Actors" class="fonticon">www.example.com<span data-icon="l"></span></a> </p> <p> [email protected] <br> </p> <p> 012/123 45 67 <br> </p> <p> telefax 123/123 45 67 </p> <h3> Actress </h3> <p> Personal address <br> Address 7, 20 City 2 <br> Address 5, 30 City 2 </p> <p> 8h15-12h30 13h30-16h30(lu-ma-je)16h45 me<br>8h15-12h30 </p> <p> <a href="http://www.example.com" target="_blank" title="Actress" class="fonticon">www.example.com<span data-icon="l"></span></a> </p> <p> [email protected] <br> </p> <p> 023/999 99 99 <br> 023/999 99 88 phone1 <br> 023/999 99 77 phone2 <br> </p> <p> telefax 001/333 44 55<br>telefax 001/000 00 10 ppts </p> <h3> Actor </h3>
Use select to pick all the h3 tags, then iterate over .next_siblings and append each element's text until you reach the next h3 tag:
    data = []
    titles = soup.select('.table h3')
    for title in titles:
        # every <h3> is kept; add a check on title.get_text() here to filter by heading
        item = {"title": title.get_text(strip=True), "description": ""}
        for sibling in title.next_siblings:
            # stop when you reach the next <h3>
            if sibling.name == "h3":
                break
            # whitespace strings between tags have no .name; skip them
            if sibling.name is not None:
                item["description"] += sibling.get_text()
        data.append(item)
    print(data)
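If a list of paragraphs per heading is more useful than one concatenated description string, the same idea can be written as a single pass over the div's direct `<h3>`/`<p>` children, starting a new group at each `<h3>`. A sketch on a shortened version of the question's HTML; paragraphs before the first heading are skipped, which is a design choice you may want to change:

```python
from bs4 import BeautifulSoup

html = """
<div class="table">
  <p>List of actors.</p>
  <h3>Actor</h3><p>Office</p><p>08h00-12h30</p>
  <h3>Actress</h3><p>Personal address</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="table")

sections = []
current = None
for tag in div.find_all(["h3", "p"], recursive=False):
    if tag.name == "h3":
        # a new heading starts a new group
        current = {"title": tag.get_text(strip=True), "paragraphs": []}
        sections.append(current)
    elif current is not None:
        # " " separator keeps <br>-separated lines apart
        current["paragraphs"].append(tag.get_text(" ", strip=True))
print(sections)
```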
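Python 3.5 not cooperating with BeautifulSoup and variables
I'm trying to write a program that checks whether a YouTube video is in the Music category. I've written some code, but it's almost as if Python is "lying" to me. Here is the code ...
I want to scrape information from this page. Specifically, I want to scrape the table that appears when you click "View All" under "TOP 10 HOLDINGS" (you have to scroll down the page a bit...
I'm trying to organize the chat logs of my conversations with people. I want to be able to break them down by name, time and text. Because the conversations I'm having don't flow smoothly...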
I'm building a scraper that collects all the comments from a URL's pages and saves the text to txt files (1 comment = 1 txt). Now I run into a problem when the comment text contains emoji...
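The question is truncated, so the exact error isn't shown. If it is a UnicodeEncodeError when writing the comment files, a likely cause is that `open()` without an `encoding` argument uses the platform's default encoding, which on many Windows setups cannot represent emoji. A minimal sketch that writes one comment per file with an explicit UTF-8 encoding (the file naming scheme and folder are hypothetical):

```python
import os
import tempfile


def save_comments(comments, folder):
    """Write each comment to its own numbered .txt file (1 comment = 1 file).

    encoding="utf-8" is passed explicitly; without it open() falls back to
    the platform default encoding (e.g. cp1252 on many Windows setups),
    which cannot encode emoji and raises UnicodeEncodeError.
    """
    paths = []
    for i, text in enumerate(comments, start=1):
        path = os.path.join(folder, f"comment_{i}.txt")  # hypothetical naming scheme
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as folder:
    paths = save_comments(["Great video 👍", "plain text"], folder)
    # read back with the same encoding to confirm the emoji survived
    with open(paths[0], encoding="utf-8") as f:
        roundtrip = f.read()

# compare instead of printing the emoji, which some consoles cannot display
print(roundtrip == "Great video 👍")
```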