How to scroll and screenshot every tweet with Puppeteer in Node


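I'm using the code below to scroll down a Twitter timeline page and take a screenshot of each tweet. However, the screenshots are cut off at random places, with no consistency.

What I want is to open a Twitter page, scroll down, and capture (screenshot) as many tweets as possible. How can I fix this code?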

import puppeteer, { BoundingBox, ElementHandle, Page } from 'puppeteer';


export async function getTweets(url: string = "https://x.com/dell") {
    const processed: string[] = [];
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto(url);
    await page.setViewport({
        width: 1080,
        height: 720,
    });

    await new Promise(r => setTimeout(r, 1000));
    await page.waitForSelector('article');

    await scrollAndExtract(page, processed);
    let index = 0;
    while (index < 5) {
        await autoScroll(page);
        await scrollAndExtract(page, processed);
        index++;
    }
    await browser.close();
}

async function scrollAndExtract(page: Page, processed: string[]) {
    const articleArray = await page.$$('article');
    console.log(articleArray.length);
    for (let index = 0; index < articleArray.length; index++) {
        const element = articleArray[index];
        console.log(element);
        element.scrollIntoView();
        const id = await page.evaluate(el => el.getAttribute('aria-labelledby'), element);
        const guid = id!.split(' ')[0];
        if (processed.includes(guid)) {
            continue;
        }
        processed.push(guid);
        const imagePath = `./report/${index}-${guid}.jpg`;
        console.log(`screen shot ${guid}`);
        await screenshot(page, element, imagePath);
    }
}

async function autoScroll(page: Page) {
    console.log("Scroll down!");
    await page.keyboard.press("PageDown");
    await page.keyboard.press("PageDown");
}

async function screenshot(page: Page, example: ElementHandle<HTMLElement>, path: string) {
    const bounding_box = await example?.boundingBox() as BoundingBox;
    const port = page.viewport();
    const width = Math.min(bounding_box?.width, port!.width);
    const height = Math.min(bounding_box?.height, port!.height);
    await page?.screenshot({
        path: path,
        clip: {
            x: bounding_box?.x + 10,
            y: bounding_box?.y,
            width: Math.min(bounding_box?.width, width),
            height: Math.min(bounding_box?.height, height),
        },
    });
}
node.js puppeteer
1 Answer

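I'm not sure whether you knew about this or have tried it before, but Puppeteer has this built in: calling `screenshot()` on an element handle captures just that element.

I had to use it in a commissioned project.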

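```js
// Fragment: runs inside an async function. `page` is an open Puppeteer Page
// and `asin` is a string defined elsewhere in the project (not shown here).
// AttachmentBuilder is presumably the discord.js class of that name.
const sharp = require("sharp");
const { AttachmentBuilder } = require("discord.js");

console.log("Taking screenshots...");
// Look up every panel in parallel; page.$ resolves to null when nothing matches.
const panelsToSS = await Promise.all([
  page.$("#element1"),
  page.$("#element2"),
  page.$("#element3"),
  page.$("#element4"),
  page.$("#element5"),
  page.$("#element6"),
  page.$("#element7"),
  page.$("#element8"),
]);

const files = [];
for (let i = 0; i < panelsToSS.length; i++) {
  if (panelsToSS[i]) {
    // ElementHandle.screenshot() captures only this element.
    const ssBuffer = await panelsToSS[i].screenshot();
    // Buffer.from() replaces the deprecated Buffer() call; sharp re-encodes as PNG.
    const pngBuffer = await sharp(Buffer.from(ssBuffer)).png().toBuffer();
    const attachment = new AttachmentBuilder(pngBuffer, {
      name: `${asin}${i}.png`,
    });

    files.push(attachment);
    // files.push(ssBuffer);
    console.log(`Screenshot ${i} taken.`);
  } else {
    console.log(`Screenshot ${i} not found.`);
  }
}
```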
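
Applied to the question's tweet loop, the same idea might look like the sketch below. It is only an illustration under stated assumptions, not a tested fix for Twitter's page: `captureNewTweets`, `captureTimeline`, `TweetHandle`, and `TimelinePage` are hypothetical names, and the two interfaces are loose stand-ins for the few Puppeteer `ElementHandle` and `Page` members used, so the logic can be exercised without a browser. The `./report` directory and the `aria-labelledby` key come from the question; the one-second pause is a guess. Each tweet is captured with the element's own `screenshot()`, which scrolls the element into view if needed, instead of a hand-computed `clip`.

```typescript
// Loose structural stand-ins for the Puppeteer members used below
// (assumption: real ElementHandle / Page objects fit these shapes,
// possibly after a cast).
interface TweetHandle {
  evaluate(fn: (el: any) => any): Promise<any>;
  screenshot(options: any): Promise<unknown>;
}

interface TimelinePage {
  $$(selector: string): Promise<TweetHandle[]>;
  keyboard: { press(key: any): Promise<void> };
}

// Hypothetical helper: screenshot every <article> not captured yet, keyed by
// the first id in its aria-labelledby attribute (the key the question uses).
async function captureNewTweets(
  page: TimelinePage,
  seen: Set<string>,
  outDir: string,
): Promise<string[]> {
  const saved: string[] = [];
  for (const tweet of await page.$$('article')) {
    try {
      const label: string | null = await tweet.evaluate(
        el => el.getAttribute('aria-labelledby'),
      );
      const id = label?.split(' ')[0];
      if (!id || seen.has(id)) continue; // no usable key, or already captured
      const path = `${outDir}/${id}.png`;
      // The element's own screenshot() scrolls it into view if needed,
      // so no manual clip arithmetic is done here.
      await tweet.screenshot({ path });
      seen.add(id);
      saved.push(path);
    } catch {
      // The handle may have been detached while the timeline re-rendered;
      // skip it and pick the tweet up on a later pass if it reappears.
      continue;
    }
  }
  return saved;
}

// Hypothetical driver: capture, press PageDown, pause, repeat.
async function captureTimeline(
  page: TimelinePage,
  rounds: number,
  outDir = './report',
  pauseMs = 1000, // crude wait for newly loaded tweets; tune as needed
): Promise<string[]> {
  const seen = new Set<string>();
  const saved: string[] = [];
  for (let round = 0; round < rounds; round++) {
    saved.push(...(await captureNewTweets(page, seen, outDir)));
    await page.keyboard.press('PageDown');
    await new Promise(resolve => setTimeout(resolve, pauseMs));
  }
  return saved;
}
```

In the question's `getTweets`, something like `await captureTimeline(page, 5)` after `page.waitForSelector('article')` would take the place of the `while` loop. A tweet taller than the viewport may still need extra handling, which this sketch does not attempt.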