Consider the following HTML:
<div aria-roledescription="carousel" data-disliderguid="slider772" class="di-slider slider772-slider gmus-1800x760-slider">
<div class="swiper-container">
<div class="swiper-wrapper">
<div
class="di-slide swiper-slide"
data-guid="slide2221"
data-screen="desktop"
data-title="995_2024_All_Hummer_Evergreen_2024_DWC"
data-id="2221"
data-filtervalue=""
data-swiper-autoplay="3000">
<div class="di-slider-disclaimer">
<button class="di-slider-disclaimer-toggle" aria-expanded="false">
<span class="inactive-label">Important Information</span>
<span class="active-label">Hide Information</span>
</button>
<div class="di-slider-disclaimer-container">
<div class="di-slider-disclaimer-contents">
Preproduction and simulated models shown throughout. Actual production model may vary. HUMMER EV is available from a GMC EV dealer.
</div>
</div>
</div>
<a class="di-slider-link"
aria-hidden="true"
href="/new-vehicles/?_dFR%5Byear%5D%5B0%5D=2024&_dFR%5Bmake%5D%5B0%5D=GMC&_dFR%5Bmodel%5D%5B0%5D=HUMMER+EV&_dFR%5Bmodel%5D%5B1%5D=HUMMER+EV+SUV&_dFR%5Bmodel%5D%5B2%5D=HUMMER+EV+Pickup"
title=""
tabindex="-1"
>
<picture class="slide-image">
<source media="(max-width: 767px)" srcset="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_600x400.jpg">
<source media="(min-width: 768px)"
srcset="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg">
<img src="https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg" alt="GMC HUMMER EV PICKUP AND SUV"
style=""
width="1800" height="760">
</picture>
</a>
</div>
<div
class="di-slide swiper-slide"
data-guid="slide950"
data-screen="desktop"
data-title="Generic"
data-id="950"
data-filtervalue=""
>
<picture class="slide-image">
<source media="(max-width: 767px)" srcset="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach_mobile.jpg">
<source media="(min-width: 768px)"
srcset="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach-1800x760.jpg">
<img src="https://di-uploads-development.dealerinspire.com/robertsonsgmc-winback0123/uploads/2023/03/Group-of-2023-GMC-Terrain-SUVs-parked-on-beach-1800x760.jpg" alt="Group of 2023 GMC Terrain SUVs parked on beach"
style="visibility:hidden"
width="1800" height="760">
</picture>
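I'm trying to use Cheerio, via ScrapeNinja, to return the src of every image that is a descendant of the div with class di-slider, shown on the first line of the HTML snippet. All of the images are HTML picture elements, and all sit inside divs with similar classes. However, the only links I want returned are the img src values.
When I try to run the following code in their sandbox, https://scrapeninja.net/cheerio-sandbox/basic, I get the error "Error: Expected name, found ://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg" on line 19.
This is the code that produces the error: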
// define function which accepts body and cheerio as args
function extract(input, cheerio) {
  // return object with extracted values
  let $ = cheerio.load(input);
  var listItems = $(".di-slider");
  listItems.each(function(idx, picture) {
    let image = $(picture).find('img').attr('src');
    return {
      source: $(image)
    };
  });
}
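Admittedly, I'm not the best at JS, I haven't used jQuery in years, and this is my first attempt at using Cheerio or ScrapeNinja.
I've looked at the documentation, https://pixeljets.com/blog/cheerio-sandbox-cheatsheet/#iterate-over-children-and-return-them-as-an-array-of-objects, and I built my function based on "How to get image url through cheerio?".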
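A few issues:
.forEach/.each do not collect return values. Anything you return from the callback is not gathered into a result. .map, on the other hand, allocates an array from all of the values returned by the callback, which makes it the best function for this job. You could also push each item onto an array variable, but that is exactly the pattern map is designed to abstract away.
extract() doesn't return anything. The only return statement is inside the .each callback, so the function itself yields undefined.
$(image) evaluates to $("https://gtmassets.dealerinspire.com/9061-995_2024_All_Hummer_Evergreen_2024_DWC_1800x760.jpg"), which passes the URL to Cheerio as a selector. A URL is not a valid selector, hence the "Expected name" error. Remove the $() here.
Working code: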
function extract(input, cheerio) {
  const $ = cheerio.load(input);
  return [...$(".di-slider")].map(e => ({
    source: $(e).find("img").attr("src")
  }));
}
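Note that .attr("src") reads only the first element in the matched set, so the code above yields one entry per .di-slider element rather than one per image. If the goal is one entry for every img inside the slider, a minimal sketch might select the img elements directly. It assumes the same sandbox signature as in the question; the name extractAllSources is hypothetical and is used here only to keep it apart from extract().

```javascript
// Hypothetical variant of extract(): one result per <img> instead of one per slider.
// Assumes the sandbox passes the page body and the cheerio module, as in the question.
function extractAllSources(input, cheerio) {
  const $ = cheerio.load(input);
  // Select every <img> nested anywhere under .di-slider,
  // then map each element to an object holding its src attribute.
  return [...$(".di-slider img")].map(e => ({
    source: $(e).attr("src")
  }));
}
```

If the sandbox only calls a function named extract, the body can be pasted into extract() unchanged.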