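I'm using the code below to scroll down a Twitter timeline page and take a screenshot of each tweet. However, the screenshots are cut off in random places, with no consistency.
What I want is to open a Twitter page, scroll down, and capture (screenshot) as many tweets as possible. How can I fix this code?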
```typescript
import puppeteer, { BoundingBox, ElementHandle, Page } from 'puppeteer';

export async function getTweets(url: string = "https://x.com/dell") {
  const processed: string[] = [];
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto(url);
  await page.setViewport({
    width: 1080,
    height: 720,
  });
  await new Promise(r => setTimeout(r, 1000));
  await page.waitForSelector('article');
  await scrollAndExtract(page, processed);
  let index = 0;
  while (index < 5) {
    await autoScroll(page);
    await scrollAndExtract(page, processed);
    index++;
  }
  await browser.close();
}

async function scrollAndExtract(page: Page, processed: string[]) {
  const articleArray = await page.$$('article');
  console.log(articleArray.length);
  for (let index = 0; index < articleArray.length; index++) {
    const element = articleArray[index];
    console.log(element);
    element.scrollIntoView();
    const id = await page.evaluate(el => el.getAttribute('aria-labelledby'), element);
    const guid = id!.split(' ')[0];
    if (processed.includes(guid)) {
      continue;
    }
    processed.push(guid);
    const imagePath = `./report/${index}-${guid}.jpg`;
    console.log(`screen shot ${guid}`);
    await screenshot(page, element, imagePath);
  }
}

async function autoScroll(page: Page) {
  console.log("Scroll down!");
  await page.keyboard.press("PageDown");
  await page.keyboard.press("PageDown");
}

async function screenshot(page: Page, example: ElementHandle<HTMLElement>, path: string) {
  const bounding_box = await example?.boundingBox() as BoundingBox;
  const port = page.viewport();
  const width = Math.min(bounding_box?.width, port!.width);
  const height = Math.min(bounding_box?.height, port!.height);
  await page?.screenshot({
    path: path,
    clip: {
      x: bounding_box?.x + 10,
      y: bounding_box?.y,
      width: Math.min(bounding_box?.width, width),
      height: Math.min(bounding_box?.height, height),
    },
  });
}
```
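A likely reason the crops drift: `ElementHandle.boundingBox()` returns coordinates relative to the current viewport, while `page.screenshot({ clip })` appears to interpret `clip` in page (document) coordinates. As far as I can tell from Puppeteer's source, its own `ElementHandle.screenshot()` adds the current scroll offset to the box before delegating to `page.screenshot()`. Once the page has scrolled, a viewport-relative box therefore points at the wrong part of the page. Two other details in the code make it worse: `element.scrollIntoView()` is not awaited, so the box can be measured before scrolling finishes, and clamping the height to the 720 px viewport cuts off taller tweets. The sketch below shows only the coordinate conversion, under that assumption; `Box` and `toPageClip` are hypothetical names, and in Puppeteer the offsets could be read with something like `page.evaluate(() => [window.scrollX, window.scrollY])`.

```typescript
// Same shape as Puppeteer's BoundingBox (viewport-relative when it comes from boundingBox()).
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: converts a viewport-relative box into page coordinates
// by adding the current scroll offset. Assumption: page.screenshot({ clip })
// expects page coordinates, as Puppeteer's ElementHandle.screenshot() appears to.
// Width and height are left untouched, so tall elements are not clamped to the viewport.
function toPageClip(box: Box, scrollX: number, scrollY: number): Box {
  return {
    x: box.x + scrollX,
    y: box.y + scrollY,
    width: box.width,
    height: box.height,
  };
}
```

For example, a tweet whose box starts 20 px below the top of the viewport after scrolling 1500 px down would need a clip starting at y = 1520, not y = 20.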
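I'm not sure whether you know about this or have tried it already, but Puppeteer has this built in: `ElementHandle.screenshot()` captures a single element, scrolling it into view first if needed, so you don't have to compute a clip rectangle yourself.
I had to use it in a client project: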
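```javascript
// `page` (a Puppeteer Page) and `asin` come from the surrounding project code.
import sharp from "sharp";
import { AttachmentBuilder } from "discord.js";

console.log("Taking screenshots...");
const panelsToSS = await Promise.all([
  page.$("#element1"),
  page.$("#element2"),
  page.$("#element3"),
  page.$("#element4"),
  page.$("#element5"),
  page.$("#element6"),
  page.$("#element7"),
  page.$("#element8"),
]);
const files = [];
for (let i = 0; i < panelsToSS.length; i++) {
  const panel = panelsToSS[i];
  if (panel) {
    // Element screenshot: Puppeteer scrolls the element into view and crops to it.
    const ssBuffer = await panel.screenshot();
    // Buffer(...) without `new` is deprecated; Buffer.from() wraps the returned bytes.
    const pngBuffer = await sharp(Buffer.from(ssBuffer)).png().toBuffer();
    const attachment = new AttachmentBuilder(pngBuffer, {
      name: `${asin}${i}.png`,
    });
    files.push(attachment);
    // files.push(ssBuffer);
    console.log(`Screenshot ${i} taken.`);
  } else {
    console.log(`Screenshot ${i} not found.`);
  }
}
```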
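Applied to the tweet loop from the question, each `<article>` can be captured with `await article.screenshot({ path })` instead of `page.screenshot({ clip })`. Because X's timeline appears to remove off-screen articles as you scroll, the articles need to be re-queried after each scroll and deduplicated. The sketch below covers only that bookkeeping, under the question's own assumption that the first token of `aria-labelledby` identifies a tweet; `tweetKey` and `SeenTweets` are hypothetical names, and the Puppeteer wiring is shown as a comment because it needs a live browser.

```typescript
// Null-safe version of the question's `id!.split(' ')[0]`: returns the first
// whitespace-separated token of aria-labelledby, or null if there is none.
// Assumption (from the question): this token is stable for a given tweet.
function tweetKey(ariaLabelledBy: string | null): string | null {
  if (ariaLabelledBy === null) return null;
  const first = ariaLabelledBy.trim().split(/\s+/)[0];
  return first === "" ? null : first;
}

// Tracks tweets already captured across scroll passes.
// A Set gives constant-time lookups instead of Array.includes.
class SeenTweets {
  #keys = new Set<string>();

  // Returns true (and records the key) only the first time a key is seen;
  // null keys are always rejected.
  claim(key: string | null): boolean {
    if (key === null || this.#keys.has(key)) return false;
    this.#keys.add(key);
    return true;
  }

  get count(): number {
    return this.#keys.size;
  }
}

// Hypothetical wiring inside the scroll loop (not executed here; needs Puppeteer):
//   const seen = new SeenTweets();
//   for (const article of await page.$$("article")) {
//     const key = tweetKey(await article.evaluate(el => el.getAttribute("aria-labelledby")));
//     if (!seen.claim(key)) continue;
//     await article.screenshot({ path: `./report/${key}.png` });
//   }
```

Rejecting null keys avoids the crash that `id!` causes when an `<article>` has no `aria-labelledby` attribute.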