Scraping a list of items with Puppeteer

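I'm trying to fetch my school timetable from Visma Inschool. It is hosted locally at the school, so I have permission to do this.

I'm trying to build an info screen, but I'm completely lost when it comes to fetching the timetable.

With this code: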

async function getInfo() {
  // identify element
  const f = await page.$("[class='active Timetable-TimetableDays_day']")
  // obtain text
  const text = await (await f.getProperty('textContent')).jsonValue()
  console.log("The day is: " + text)
}

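I get the whole active day as a single string, but I'd like it split into separate pieces so that I can pick out the next item on the schedule along with the teacher's name and the classroom.

I'm not expecting anything other than the output I already get, because I don't know which DOM elements to search for.

This is the site's HTML: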

<div data-v-36c1dfe8="" role="gridcell" class="active Timetable-TimetableDays_day"><h3 data-v-36c1dfe8="" class="sr-only"> 7 Avtaler, torsdag 14 desember </h3><div data-v-36c1dfe8="" class="Timetable-Items"><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 40px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 08:20 og slutter 09:05 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-0-135299254" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 08:20 og slutter 09:05" subjectcode="IKM1002" tttype="LESSON" starttimeanddateunix="1702538400" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="" teachername="Teacher" entityid="135299254" block="" teachinggroupid="12459877" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 08:20 - 09:05 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Teknologiforståelse </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Undervisningstime </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 140px; left: 0%;"><div data-v-36c1dfe8="" 
data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 09:10 og slutter 09:55 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-1-135299255" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 09:10 og slutter 09:55" subjectcode="IKM1002" tttype="LESSON" starttimeanddateunix="1702541400" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="" teachername="Teacher" entityid="135299255" block="" teachinggroupid="12459877" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 09:10 - 09:55 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Teknologiforståelse </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Undervisningstime </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 260px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Matematikk 1P-Y IM på rom E208. Undervisningstime. Starter 14. 
desember 2023 klokken 10:10 og slutter 10:55 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-2-135353441" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Matematikk 1P-Y IM på rom E208. Undervisningstime. Starter 14. desember 2023 klokken 10:10 og slutter 10:55" subjectcode="MAT1121" tttype="LESSON" starttimeanddateunix="1702545000" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="" teachername="Teacher" entityid="135353441" block="" teachinggroupid="12459866" data-v-45669d56="" style="border-left: 6px solid rgb(57, 57, 96); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 10:10 - 10:55 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E208 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Matematikk 1P-Y IM </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Undervisningstime </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 360px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Yrkesfaglig fordypning vg1 på rom E109. Vikar. Starter 14. desember 2023 klokken 11:00 og slutter 11:45 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-3-135317780" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Yrkesfaglig fordypning vg1 på rom E109. Vikar. Starter 14. 
desember 2023 klokken 11:00 og slutter 11:45" tttype="SUBSTITUTION" originaltype="LESSON" starttimeanddateunix="1702548000" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="" teachername="Teacher" entityid="135317780" block="" teachinggroupid="12459773" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 11:00 - 11:45 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Yrkesfaglig fordypning vg1 </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Vikar </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 510px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Yrkesfaglig fordypning vg1 på rom E109. Vikar. Starter 14. desember 2023 klokken 12:15 og slutter 13:00 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-4-135317781" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Yrkesfaglig fordypning vg1 på rom E109. Vikar. Starter 14. 
desember 2023 klokken 12:15 og slutter 13:00" tttype="SUBSTITUTION" originaltype="LESSON" starttimeanddateunix="1702552500" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="" teachername="Teacher" entityid="135317781" block="" teachinggroupid="12459773" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 12:15 - 13:00 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Yrkesfaglig fordypning vg1 </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Vikar </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 610px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Yrkesfaglig fordypning vg1 på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 13:05 og slutter 13:50 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-5-135321756" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Yrkesfaglig fordypning vg1 på rom E109. Undervisningstime. Starter 14. 
desember 2023 klokken 13:05 og slutter 13:50" subjectcode="YFF4106" tttype="LESSON" starttimeanddateunix="1702555500" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="[object Object]" teachername="Teacher" entityid="135321756" block="" teachinggroupid="12459773" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 13:05 - 13:50 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Yrkesfaglig fordypning vg1 </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Undervisningstime </small><!----></div></div></div><!----></div><div data-v-45669d56="" data-v-36c1dfe8="" class="popup-container" style="top: 710px; left: 0%;"><div data-v-36c1dfe8="" data-v-45669d56=""><h4 data-v-36c1dfe8="" data-v-45669d56="" class="sr-only"> Yrkesfaglig fordypning vg1 på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 13:55 og slutter 14:40 </h4><div data-v-33d0fd5d="" data-v-36c1dfe8="" tabindex="0" aria-haspopup="dialog" test-id="vs-Timetable-item-4-6-135321757" class="Timetable-TimetableItem Timetable-TimetableItem-m" role="button" aria-label="Se detaljer om Yrkesfaglig fordypning vg1 på rom E109. Undervisningstime. Starter 14. 
desember 2023 klokken 13:55 og slutter 14:40" subjectcode="YFF4106" tttype="LESSON" starttimeanddateunix="1702558500" readonlyitems="[object Object],[object Object],[object Object],[object Object],[object Object]" actionitems="[object Object]" teachername="Teacher" entityid="135321757" block="" teachinggroupid="12459773" data-v-45669d56="" style="border-left: 6px solid rgb(83, 237, 155); height: 90px; width: 100%;"><div data-v-33d0fd5d="" aria-hidden="true" class="Timetable-TimetableItem-wrapper"><div data-v-33d0fd5d="" class="Timetable-TimetableItem-header"><small data-v-33d0fd5d="" class="Timetable-TimetableItem-hours"><i data-v-33d0fd5d="" class="far fa-clock"></i> 13:55 - 14:40 </small><small data-v-33d0fd5d="" class="Timetable-TimetableItem-location"><i data-v-33d0fd5d="" class="far fa-map-marker"></i> E109 </small><!----></div><p data-v-33d0fd5d="" class="Timetable-TimetableItem-subject-name"> Yrkesfaglig fordypning vg1 </p><!----><small data-v-33d0fd5d="" class="Timetable-TimetableItem-type"> Undervisningstime </small><!----></div></div></div><!----></div></div><div data-v-3b0485ec="" data-v-36c1dfe8="" class="Timetable-TimetableNowLine" style="top: 546.633px;"></div></div>
javascript html class dom puppeteer
1 Answer

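Rather than selecting the top-level element that wraps everything, select the inner item elements and map over them to extract data from each one, drilling down into leaf nodes and attributes as needed: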

const fs = require("node:fs/promises");
const puppeteer = require("puppeteer"); // ^21.6.0

let browser;
(async () => {
  browser = await puppeteer.launch({headless: "new"});
  const [page] = await browser.pages();
  const html = await fs.readFile("test.html", {
    encoding: "utf-8",
  });
  await page.setContent(html); // replace with goto
  const data = await page.$$eval(
    ".Timetable-TimetableItem",
    els =>
      els.map(el => {
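        // Class selectors are case-sensitive in standards mode, so match
        // the markup's "Timetable-" prefix exactly.
        const text = s =>
          el
            .querySelector(`.Timetable-TimetableItem-${s}`)
            .textContent.trim();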
        return {
          header: el.previousElementSibling.textContent.trim(),
          label: el.getAttribute("aria-label"),
          subjectCode: el.getAttribute("subjectcode"),
          tttype: el.getAttribute("tttype"),
          entityId: el.getAttribute("entityid"),
          teachingGroupId: el.getAttribute("teachinggroupid"),
          startTimeAndDateUnix: el.getAttribute(
            "starttimeanddateunix"
          ),
          teacher: el.getAttribute("teachername"),
          hours: text("hours"),
          location: text("location"),
          name: text("subject-name"),
          type: text("type"),
        };
      })
  );
  console.log(data);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

Output:

[
  {
    header: 'Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 08:20 og slutter 09:05',
    label: 'Se detaljer om Teknologiforståelse på rom E109. Undervisningstime. Starter 14. desember 2023 klokken 08:20 og slutter 09:05',
    subjectCode: 'IKM1002',
    tttype: 'LESSON',
    entityId: '135299254',
    teachingGroupId: '12459877',
    startTimeAndDateUnix: '1702538400',
    teacher: 'Teacher',
    hours: '08:20 - 09:05',
    location: 'E109',
    name: 'Teknologiforståelse',
    type: 'Undervisningstime'
  },
// ...
]
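
Since the question asks for the next item on the schedule, here is a minimal sketch of how the scraped array might be post-processed. The helper names `splitHours` and `nextItem` are made up for illustration, and the code assumes `startTimeAndDateUnix` is a Unix timestamp in seconds, which the sample values suggest (1702538400 corresponds to 08:20 Norwegian time on 14 December 2023).

```javascript
// Sketch only: helper names are illustrative, not part of Puppeteer.

// Split a string like "08:20 - 09:05" into its start and end times.
function splitHours(hours) {
  const [start, end] = hours.split("-").map(s => s.trim());
  return {start, end};
}

// Return the first item that starts after `nowMs`, or null if none remain.
// Assumption: startTimeAndDateUnix is in seconds, so convert to milliseconds.
function nextItem(items, nowMs = Date.now()) {
  const upcoming = items
    .map(item => ({
      ...item,
      startMs: Number(item.startTimeAndDateUnix) * 1000,
    }))
    .filter(item => item.startMs > nowMs)
    .sort((a, b) => a.startMs - b.startMs);
  return upcoming[0] ?? null;
}
```

Calling `nextItem(data)` on the array from the script should then give an object whose `name`, `teacher`, and `location` fields can go straight onto the info screen, while `splitHours(item.hours)` separates the start and end times if they need to be shown individually.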

Before selecting, wait for the items with something like

await page.waitForSelector(".Timetable-TimetableItem-hours");

to make sure the elements are present.
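
On the live page, the wait and the extraction could be combined roughly as follows. This is a sketch under assumptions: `scrapeWhenReady` is a hypothetical helper name, the 10-second default timeout is arbitrary, and `page` is taken to be a Puppeteer `Page` that has already navigated to the timetable. Only two fields are extracted to keep it short.

```javascript
// Sketch: wait for the timetable to render, then extract a few fields.
// `scrapeWhenReady` and the default timeout are illustrative choices.
async function scrapeWhenReady(page, timeout = 10_000) {
  const selector = ".Timetable-TimetableItem";
  // Rejects with a timeout error if no item appears within `timeout` ms.
  await page.waitForSelector(selector, {timeout});
  return page.$$eval(selector, els =>
    els.map(el => ({
      teacher: el.getAttribute("teachername"),
      // Optional chaining yields null instead of throwing when an item
      // has no location element.
      location:
        el
          .querySelector(".Timetable-TimetableItem-location")
          ?.textContent.trim() ?? null,
    }))
  );
}
```

The optional chaining is a design choice: one item without a location then produces `null` for that field rather than aborting the whole scrape.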
