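Questions tagged cheerio

A fast, flexible, and lean implementation of core jQuery designed specifically for the server. https://github.com/cheeriojs/cheerio

Scraping nested XML with cheerio

I am trying to scrape some PubMed data with cheerio. The following script works fine, but it produces incorrectly ordered output when some XML tags are missing. var request = require('request'), ...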

node.js xml web-scraping cheerio pubmed

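1 answer, 0 votes

Node.js scraping options

I am trying to build a scraping engine in Node for my currency exchange chart. Currently I am using request + cheerio, but since some bank websites don't use ids/classes in their HTML, my code some...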

html node.js web-scraping cheerio

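1 answer, 0 votes

Using chained selector methods with a cloned jQuery/cheerio element

I am using cheerio in JavaScript, which uses jQuery syntax to select DOM elements. I cloned a DOM element like this: var clone = $('.a').clone() Now I want to access that clone's data...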

javascript jquery cheerio

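1 answer, 0 votes

Automatically start a Node.js script at a given time

I am using a Node.js/Express.js script to scrape data from a website. The data I need is generated daily, so I need my script to start automatically at a given time every day. Is there...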
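
One way to approach the daily start described above, sketched with nothing but Node's built-in timers. The helper names `msUntilNext` and `scheduleDaily`, and the `runScraper` task in the usage comment, are hypothetical and not taken from the question. The sketch assumes the Node process stays alive between runs; if it does not, an operating-system scheduler such as cron is the usual alternative.

```javascript
// Milliseconds from `now` until the next local-time occurrence of hour:minute.
function msUntilNext(hour, minute, now = new Date()) {
  const next = new Date(now);
  next.setHours(hour, minute, 0, 0);
  if (next <= now) {
    // The time has already passed today, so aim for tomorrow.
    next.setDate(next.getDate() + 1);
  }
  return next.getTime() - now.getTime();
}

// Run `task` every day at hour:minute, re-arming the timer after each run.
function scheduleDaily(hour, minute, task) {
  return setTimeout(async () => {
    try {
      await task();
    } catch (err) {
      console.error('Scheduled task failed:', err);
    }
    scheduleDaily(hour, minute, task);
  }, msUntilNext(hour, minute));
}

// Hypothetical usage: scheduleDaily(6, 30, runScraper);
```

Recomputing the delay after every run, instead of using a fixed 24-hour interval, keeps the start time from drifting when a run takes a while.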

node.js express web-scraping cheerio

2 answers, 0 votes

Getting unexpected/non-existent elements/tags when scraping with cheerio in Node.js

I am scraping and parsing the content of a web page (https://www.mydealz.de/new). The structure is as follows.

```html
<div class="threadGrid-title">
  <strong><a href="">title</a></strong>
  <span class=" overflow--fade">
    <span class="overflow--wrap-off flex boxAlign-ai--all-bl">
      <span class="vAlign--all-tt">
        <span class="threadItemCard-price text--b thread-price size--all-l size--fromW3-xl space--mr-0">**price**</span>
      </span>
      <span class="mute--text text--lineThrough space--ml-1 size--all-l size--fromW3-xl ">**old price**</span>
      <span class="text--color-charcoal space--ml-1 size--all-l size--fromW3-xl">**Discount%**</span>
    </span>
  </span>
</div>
```

I have been able to get the title, but for the other elements it gives me strange and unexpected results. The code is below. I am using cheerio.

```javascript
const axios = require('axios');
const cheerio = require('cheerio');

const baseUrl = 'https://www.mydealz.de/new'; // the page named in the question

async function checkDeals() {
  try {
    const response = await axios.get(baseUrl);
    const $ = cheerio.load(response.data);
    const deals = [];

    // Iterate over each deal
    $('.thread--type-list').each((index, element) => {
      const threadTitleElement = $(element).find('.threadGrid-title');

      // Print all child elements of .threadGrid-title
      console.log('Child elements of .threadGrid-title:');
      threadTitleElement.each((i, el) => {
        $(el).find('*').each((j, child) => {
          console.log($.html(child)); // Print HTML of each nested element
        });
      });

      // Attempt to extract deeply nested price and discount
      const priceElement = $(element).find('.threadItemCard-price');
      const discountElement = $(element).find('.text--color-charcoal');

      // Extracting the price and discount with more detailed text processing
      const price = priceElement.map((i, el) => $(el).text().trim()).get().join(' ');
      const discount = discountElement.map((i, el) => $(el).text().trim()).get().join(' ');

      console.log('Price:', price);
      console.log('Discount:', discount);
    });
  } catch (error) {
    console.error('Error fetching the deals:', error);
  }
}
```

Output:

```
Child elements of .threadGrid-title:
<strong class="thread-title "><a class="cept-tt thread-link linkPlain thread-title--list js-thread-title" title="(Amazon Prime) Victorinox Universalschäler" href="https://www.mydealz.de/deals/amazon-prime-victorinox-universalschaler-2393896" data-t="threadLink" data-t-click="">(Amazon Prime) Victorinox Universalschäler</a></strong>
<a class="cept-tt thread-link linkPlain thread-title--list js-thread-title" title="(Amazon Prime) Victorinox Universalschäler" href="https://www.mydealz.de/deals/amazon-prime-victorinox-universalschaler-2393896" data-t="threadLink" data-t-click="">(Amazon Prime) Victorinox Universalschäler</a>
<span class="overflow--fade"><div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{&quot;name&quot;:&quot;ThreadPriceListing&quot;,&quot;props&quot;:{&quot;threadId&quot;:2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div></div><span class="thread-divider"></span><div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{&quot;name&quot;:&quot;MerchantLabelThreadListing&quot;,&quot;props&quot;:{&quot;threadId&quot;:2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div></div></span>
<div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{&quot;name&quot;:&quot;ThreadPriceListing&quot;,&quot;props&quot;:{&quot;threadId&quot;:2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div></div>
<div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3"><img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3"></div>
<img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="28px" class="hide--toW3">
<img src="/assets/img/skeletons/item-type-F.svg" width="60px" height="26px" class="hide--fromW3">
<span class="thread-divider"></span>
<div aria-busy="true" class="js-vue2" data-handler="vue2" data-vue2="{&quot;name&quot;:&quot;MerchantLabelThreadListing&quot;,&quot;props&quot;:{&quot;threadId&quot;:2393896}}"><div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div></div>
<div class="bRad--a-m space--h-1 bRad--circle skeleton bg--color-greyPanel flex"><img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px"></div>
<img src="/assets/img/skeletons/item-type-F.svg" width="40px" height="20px">
Price:
Discount:
```

I want to get the price, the old price, and the discount.

Answer:

The classic Axios/Cheerio mistake is assuming that it runs JS, that the server will always serve you the same thing it serves your browser, or that what you see in the dev tools console reflects what Axios fetches and what Cheerio will parse. In fact, all you get with Axios is the view-source: version of the page, and that only if you are lucky and the server doesn't block you or send a different HTML document. No JS is executed, and single-page apps that load their data with JS won't be hydrated.

Sometimes, though, the data you want is available in the initial load, just not necessarily where you expect it. In this case, the data is conveniently stored as JSON blobs in [data-vue2] attributes, which JS presumably hydrates into HTML elements after the page loads:

```javascript
const axios = require("axios"); // ^1.6.8
const cheerio = require("cheerio"); // ^1.0.0-rc.12

const url = "<Your URL>";

axios.get(url)
  .then(({data: html}) => {
    const $ = cheerio.load(html);
    const data = [...$("[data-vue2]")]
      .map(e => $(e).data("vue2"))
      .filter(e => e.name === "ThreadMainListItemNormalizer")
      .map(e => e.props.thread);
    console.log(JSON.stringify(data, null, 2));
  })
  .catch(err => console.error(err));
```

javascript node.js web-scraping cheerio

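1 answer, 0 votes

Unable to scrape artist data from Beatport using Cheerio in a Next.js 14 server action

I am trying to scrape artist data from Beatport using Cheerio in a Next.js 14 server action. The goal is to search for an artist, click the first artist card in the results, and then extract the artist...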

javascript web-scraping next.js cheerio

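1 answer, 0 votes

TypeError: $.extract is not a function

Many of Cheerio's functions don't seem to be usable. I have been able to use filter() and find(), but not extract, even when following the tutorial exactly. npm install cheerio import * as ch...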

cheerio

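1 answer, 0 votes

Flattening a nested array of headings and multiple children in Cheerio

Using Cheerio, I need to iterate at multiple levels to access elements. How can I use nested iteration to access the elements and return an array of objects? Currently, with my code, because of the nested...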

javascript cheerio

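1 answer, 0 votes

How to access elements using nested loops?

Using Cheerio, I need to iterate at multiple levels to access elements. How can I use nested iteration to access the elements and return an array of objects? Currently, with my code, because of the nested...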

javascript cheerio

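1 answer, 0 votes

Cheerio (node.js) returns an error when reading HTML: TypeError: Cannot call method 'utf8Slice' of null

I am completely new to JS and completely stuck with Node's Cheerio. I would be very grateful if someone could help me. The code I am working on is here: https://github.com/zafartahirov/bitstar...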

javascript node.js npm cheerio

1 answer, 0 votes

Uncaught TypeError: $(...).text is not a function

I have a very simple HTML page, and I am trying to run a simple CSS selector on it with Cheerio. const $ = cheerio.load(html); console.log($(`body > div > div.-layout-h > div.task-tests--label`...

javascript html node.js cheerio

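3 answers, 0 votes

How to access the value of an attribute inside an object using Cheerio

I am trying to push the value of an attribute ("data-code") into an array using cheerio. However, I keep getting the error "allAs[i].attr is not a function". Here is what I have so far: const...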

node.js web-scraping cheerio

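2 answers, 0 votes

How to get node source code location info using Cheerio?

I am trying to get the source code location of a node. There is the parse5 (used by Cheerio) sourceCodeLocationInfo option. But with this test code: const cheerio = require('cheerio'); const $ = cheer...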

javascript node.js cheerio

2 answers, 0 votes

With cheerio, unable to get an attribute using .attr

Why do I have to use link.attribs.href instead of the standard .attr('href') method? ... res.on('end', () => { const $ = cheerio.load(content); const link = $('.more-link').get(0); ...

node.js cheerio

2 answers, 0 votes

How to get a value from a div tag using cheerio in Node.js

I am building a web scraper in Node.js with cheerio and request, but I can't get a particular value out of a div. I want to get the "idproduit" value (223) from this div:

```html
<div class="vignette_footer js-vignette_footer" idproduit="223">
```

To do that, I am doing:

```javascript
$('.vignettes_produit li').each(function(i, element) {
  var jsObject = { id: id++, idProduit: null };
  jsObject.idProduit = $(this).find('.vignette_footer').attr('idproduit');
});
```

But it gives me undefined as the result. Does anyone know what I should do?

Answer:

Changing $('.vignettes_produit li') to $('li[class="vignettes_produit"]') should solve your problem. The original selector matches li elements nested inside an element with the class vignettes_produit; if the class sits on the li itself, as in the markup below, it matches nothing. For example:

```javascript
let cheerio = require('cheerio')

let $ = cheerio.load('<ul><li class="vignettes_produit"><div class="vignette_footer js-vignette_footer" idproduit="223">1</div></li><li class="vignettes_produit"><div class="vignette_footer js-vignette_footer" idproduit="345">2</div></li><li class="vignettes_produit"><div class="vignette_footer js-vignette_footer" idproduit="456">3</div></li></ul>')

$('li[class="vignettes_produit"]').each(function(i, element) {
  console.log($(this).find('.vignette_footer').attr('idproduit'))
})
```

javascript node.js web-scraping cheerio

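1 answer, 0 votes

Using a JavaScript library such as Cheerio without Node.js [duplicate]

So currently I am developing an HTML page that displays various content from the web, which I plan to obtain using a web scraper. I have seen a variety of scrapers...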

javascript libraries cheerio

2 answers, 0 votes

cheerio: find text in a script tag

Question (3 votes):

I want to extract the JS code inside a script tag.

This is the script tag:

```html
<script>
$(document).ready(function(){
    $("#div1").click(function(){
        $("#divcontent").load("ajax.content.php?p=0&cat=1");
    });
    $("#div2").click(function(){
        $("#divcontent").load("ajax.content.php?p=1&cat=1");
    });
});
</script>
```

I have an array of ids, for example ['div1', 'div2'], and I need to extract the URL inside each handler. So if I call a function:

```javascript
getUrlOf('div1');
```

it should return ajax.content.php?p=0&cat=1

Answer (6 votes):

If you are using a newer version of cheerio (1.0.0-rc.2), you need to use .html() instead of .text()

```javascript
const cheerio = require('cheerio');
const $ = cheerio.load('<script>script one</script> <script> script two</script>');

// For the first script tag
console.log($('script').html());

// For all script tags
console.log($('script').map((idx, el) => $(el).html()).toArray());
```

https://github.com/cheeriojs/cheerio/issues/1050

Accepted answer (4 votes):

With Cheerio, it is easy to get the text of a script tag:

```javascript
const cheerio = require('cheerio');
const $ = cheerio.load("the HTML of the webpage you are scraping");

// If there's only one <script>
console.log($('script').text());

// If there are multiple scripts; elem is a raw node, so wrap it with $() first
$('script').each((idx, elem) => console.log($(elem).text()));
```

From here on, you are really just asking "how do I parse a generic block of JavaScript and extract a list of links". I agree with Patrick's comment above: you probably shouldn't. Can you craft a regex that finds each link in the script and infers the page it links to? Yes. But most likely, if anything on this page changes, your script will break immediately: the page's author might switch to inline <a> tags, refactor the code, use live events, and so on.

Be aware that relying on the exact contents of this script tag will make your application very brittle, even more so than ordinary page scraping.

Here is an example of a loose but working regex:

```javascript
let html = "incoming html";
// The g flag lets exec() advance through the string instead of matching the first hit forever
let regex = /\$\("(#.+?)"\)\.click(?:.|\n)+?\.load\("(.+?)"/g;
let match;
while (match = regex.exec(html)) {
  console.log(match[1] + ': ' + match[2]);
}
```

If you are new to regular expressions: this expression contains two capture groups, in parentheses (the first is the div id, the second is the link text), and a non-capturing group in the middle that exists only to make sure the regex continues across line breaks. I call it "loose" because the match it looks for has this shape:

- $("***").click***ignored chars***.load("***"

So depending on how much JavaScript there is and how similar it looks, you may have to tighten it up to avoid false positives.
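
The getUrlOf('div1') call the asker describes can be sketched on top of the same loose pattern. The helper `buildUrlMap` is a hypothetical name introduced here, and the sketch assumes the script text keeps exactly the $("#id").click(... .load("url") shape shown in the question; it inherits all the brittleness the accepted answer warns about.

```javascript
// Hypothetical helper: map each element id to the URL passed to .load() in its click handler.
function buildUrlMap(scriptText) {
  // Same loose pattern as in the answer, except that the "#" sits outside the first capture group.
  const regex = /\$\("#(.+?)"\)\.click(?:.|\n)+?\.load\("(.+?)"/g;
  const urls = new Map();
  for (const match of scriptText.matchAll(regex)) {
    urls.set(match[1], match[2]);
  }
  return urls;
}

// Script text copied from the question; in practice it would come from $('script').html().
const scriptText = `
$(document).ready(function(){
    $("#div1").click(function(){
        $("#divcontent").load("ajax.content.php?p=0&cat=1");
    });
    $("#div2").click(function(){
        $("#divcontent").load("ajax.content.php?p=1&cat=1");
    });
});`;

const urlMap = buildUrlMap(scriptText);
const getUrlOf = (id) => urlMap.get(id);

console.log(getUrlOf('div1')); // ajax.content.php?p=0&cat=1
console.log(getUrlOf('div2')); // ajax.content.php?p=1&cat=1
```

Building the map once and looking ids up afterwards avoids re-scanning the script for every id; an id that is not present simply yields undefined.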

javascript node.js cheerio

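0 answers, 0 votes

How to get the childNodes of a div in cheerio?

I want to get the first childNode of a div using cheerio. I can get it with JavaScript DOM manipulation, but I cannot get it with cheerio. I have tried it in dev tools and got the expected...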

javascript node.js web-scraping cheerio

4 answers, 0 votes

Get all child nodes of a div using cheerio? [duplicate]

Text1 Text2 Text3 Text4 block Text5 Last Text5 So I...

javascript jquery node.js cheerio

1 answer, 0 votes

How to add a line break before and after a div in Cheerio

I am trying to modify every <div> element in some HTML by wrapping it in new lines using Cheerio.

Input:

```html
<html>
<body>
<div>
<div><div>Hi</div></div>
<div>hello</div>
<div>ur awesome</div>
</div>
</body>
</html>
```

Expected output:

```html
<html>
<body>
<div>
<div>
<div>Hi</div>
</div>
<div>hello</div>
<div>ur awesome</div>
</div>
</body>
</html>
```

The output I get:

```html
<html><head></head><body>
<div>
<div><div>Hi</div></div>
<div>hello</div>
<div>ur awesome</div>
</div>
</body></html>
```

Problem: it does not add the new lines to the nested <div> elements. I tried using the $(this) context, but without success.

Code:

```javascript
const c = require('cheerio')
const $ = c.load(html); // html holds the input shown above
$('div').each((i,e) => {
  let mod = `\n ${ c.html($(e)) } \n`
  $(e).replaceWith(mod)
})
console.log($.html())
```

Answer:

You can do it like this:

```javascript
$('div').before("\n").after("\n")
```

javascript jquery node.js cheerio

1 answer, 0 votes
