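Using puppeteer and nodejs to intercept specific requests during a website reload, I noticed that my script misses some requests only when request interception is enabled, even though they are listed in my DevTools network tab.
This made me wonder whether this is common, expected behavior, since interception adds latency to the requests.
With request interception disabled, loading and reloading the given website with the attached script yields 22 requests each time (the same number I see when recording the network in Chrome's DevTools). Enabling request interception results in 24 requests during the initial load and fewer than 16 requests when reloading. Am I missing something?
This is the script I used for testing: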
/* package.json
{
  "dependencies": {
    "puppeteer": "latest"
  }
}
*/
const puppeteer = require(`puppeteer`);
let requestCount = [0,0,0,0];
let i = 0;
const url = "https://picsum.photos/";
const interceptEnabled = true;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  if(interceptEnabled){
    await page.setRequestInterception(true);
  }
  await page.on("request", request => {
    if(interceptEnabled){
      request.continue();
    }
    requestCount[i]++;
    // console.log(`url[${requestCount[i]}]: ${request.url()}`);
  });
  await page.goto(url,{waitUntil: "networkidle0"});
  console.log(`Total requests load: ${requestCount[i]}`);
  i++;
  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`);
  i++;
  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`);
  i++;
  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`);
  console.log(requestCount);
  await browser.close();
})();
I think improving the demo to show the actual URLs makes the behavior clearer.
const puppeteer = require("puppeteer"); // ^22.6.0
const url = "https://picsum.photos/";
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const requests = [];
  await page.setRequestInterception(true);
  page.on("request", request => {
    requests.at(-1).push(request.url().slice(0, 100));
    request.continue();
  });
  for (let i = 0; i < 4; i++) {
    requests.push([]);
    await page.goto(url, {waitUntil: "networkidle0"});
  }
  const uniqueFirstRequests = [];
  for (const url of requests[0]) {
    for (const batch of requests.slice(1)) {
      if (!batch.includes(url)) {
        uniqueFirstRequests.push(url);
        break;
      }
    }
  }
  console.log(requests);
  console.log(requests.map(e => e.length));
  console.log(uniqueFirstRequests);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
If you run this, the final array printed contains the resources that were requested only in the first load:
[
  'https://fonts.googleapis.com/css?family=Roboto:600|Work+Sans:600|Open+Sans:300,400',
  'https://picsum.photos/assets/css/style.css',
  'https://www.googletagmanager.com/gtag/js?id=G-T978ZC858K',
  'https://picsum.photos/assets/images/fastly.svg',
  'https://fonts.gstatic.com/s/worksans/v19/QGY_z_wNahGAdqQ43RhVcIgYT2Xz5u32K5fQBi8Jpg.woff2',
  'https://fonts.gstatic.com/s/opensans/v40/memvYaGs126MiZpBA-UvWbX2vVnXBbObj2OVTS-muw.woff2',
  'https://fonts.gstatic.com/s/worksans/v19/QGY_z_wNahGAdqQ43RhVcIgYT2Xz5u32K5fQBi8Jpg.woff2',
  'https://fonts.gstatic.com/s/opensans/v40/memvYaGs126MiZpBA-UvWbX2vVnXBbObj2OVTS-muw.woff2',
  'https://www.google-analytics.com/g/collect?v=2&tid=G-T978ZC858K&gtm=45je4430v897008144za200&_p=17126',
  'https://fastly.picsum.photos/id/1035/536/354.jpg?hmac=N7LdfGCyj7EjI-_m2RvtgMrZ-SKgYmtwPBf_dd7ZDf8',
  'https://picsum.photos/assets/images/favicon/favicon-32x32.png'
]
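These are mostly fonts, stylesheets, analytics, SVG logos and icons that the browser has already cached, so they don't pass through the request handler on subsequent loads. One of the requests is a randomly chosen picsum photo, which is unlikely to be shown again on subsequent loads.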
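One way to check the caching explanation might be to repeat the experiment with the browser cache turned off and see whether the first-load-only list shrinks. Below is a minimal sketch of that idea, not a verified fix. `collectBatches` and `firstLoadOnly` are hypothetical helper names introduced here; `page.setCacheEnabled` is Puppeteer's existing toggle for the cache, but whether it makes the counts match on this particular site is an assumption I haven't confirmed.

```javascript
// Hypothetical helper: record the URLs requested on each of `loads`
// navigations to `url`. `page` is assumed to be a Puppeteer Page.
async function collectBatches(page, url, loads, cacheEnabled) {
  const batches = [];
  // Passing false asks the browser to bypass its cache for this page.
  await page.setCacheEnabled(cacheEnabled);
  await page.setRequestInterception(true);
  page.on("request", request => {
    batches.at(-1).push(request.url());
    request.continue();
  });
  for (let i = 0; i < loads; i++) {
    batches.push([]); // start a fresh batch for this navigation
    await page.goto(url, {waitUntil: "networkidle0"});
  }
  return batches;
}

// Hypothetical helper: URLs seen in the first batch that never show up
// in any later batch, with duplicates removed.
function firstLoadOnly(batches) {
  const [first = [], ...rest] = batches;
  return [...new Set(first)].filter(u => rest.every(b => !b.includes(u)));
}
```

With a launched browser, comparing `firstLoadOnly(await collectBatches(page, url, 4, true))` against the same call with `false` (each on a fresh page) should show whether caching accounts for the difference. If it does, I'd expect the cache-disabled list to contain little more than the random photo URL. Note that `firstLoadOnly` is stricter than the loop in the demo above: it keeps a URL only if it is absent from every later load, and it drops the duplicate font entries.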