Puppeteer request interception affects the total number of requests


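I am using Puppeteer with Node.js to intercept specific requests while a website reloads. I noticed that my script misses some requests only when request interception is enabled, even though those requests are listed in the DevTools Network tab.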

This makes me wonder whether this is common behavior that should be expected, since interception adds latency to each request.

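With request interception disabled, loading and reloading a given website with the script below produces 22 requests each time (the same number I see when recording the network in Chrome's DevTools). With request interception enabled, the initial load produces 24 requests and each reload produces fewer than 16. Am I missing something?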

Here is the script I used for testing:


/* package.json
{
  "dependencies": {
    "puppeteer": "latest"
  }
}
*/
const puppeteer = require(`puppeteer`);

let requestCount = [0,0,0,0];  
let i = 0;
const url = "https://picsum.photos/";
const interceptEnabled = true;

(async () => {

  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  if(interceptEnabled){
    await page.setRequestInterception(true);
  }

  page.on("request", request => {
    if(interceptEnabled){
      request.continue();
    }    
    requestCount[i]++;
    // console.log(`url[${requestCount[i]}]: ${request.url()}`);    
  });

  await page.goto(url,{waitUntil: "networkidle0"});
  console.log(`Total requests load: ${requestCount[i]}`);
  i++;
  
  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`);
  i++;

  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`);
  i++;

  await page.reload({waitUntil: "networkidle0"});
  console.log(`Total requests after reload: ${requestCount[i]}`); 

  console.log(requestCount);

  await browser.close();
})();

node.js puppeteer httprequest
1 Answer

I think improving the demo so that it shows the actual URLs makes the behavior clearer.

const puppeteer = require("puppeteer"); // ^22.6.0

const url = "https://picsum.photos/";

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const requests = [];
  await page.setRequestInterception(true);
  page.on("request", request => {
    requests.at(-1).push(request.url().slice(0, 100))
    request.continue();
  });

  for (let i = 0; i < 4; i++) {
    requests.push([]);
    await page.goto(url, {waitUntil: "networkidle0"});
  }

  // URLs from the first load that are missing from at least one later load.
  const uniqueFirstRequests = [];

  for (const url of requests[0]) {
    for (const batch of requests.slice(1)) {
      if (!batch.includes(url)) {
        uniqueFirstRequests.push(url);
        break;
      }
    }
  }

  console.log(requests);
  console.log(requests.map(e => e.length));
  console.log(uniqueFirstRequests);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

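If you run this, the final array printed contains the resources that were requested on the first load but were missing from at least one later load: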

[
  'https://fonts.googleapis.com/css?family=Roboto:600|Work+Sans:600|Open+Sans:300,400',
  'https://picsum.photos/assets/css/style.css',
  'https://www.googletagmanager.com/gtag/js?id=G-T978ZC858K',
  'https://picsum.photos/assets/images/fastly.svg',
  'https://fonts.gstatic.com/s/worksans/v19/QGY_z_wNahGAdqQ43RhVcIgYT2Xz5u32K5fQBi8Jpg.woff2',
  'https://fonts.gstatic.com/s/opensans/v40/memvYaGs126MiZpBA-UvWbX2vVnXBbObj2OVTS-muw.woff2',
  'https://fonts.gstatic.com/s/worksans/v19/QGY_z_wNahGAdqQ43RhVcIgYT2Xz5u32K5fQBi8Jpg.woff2',
  'https://fonts.gstatic.com/s/opensans/v40/memvYaGs126MiZpBA-UvWbX2vVnXBbObj2OVTS-muw.woff2',
  'https://www.google-analytics.com/g/collect?v=2&tid=G-T978ZC858K&gtm=45je4430v897008144za200&_p=17126',
  'https://fastly.picsum.photos/id/1035/536/354.jpg?hmac=N7LdfGCyj7EjI-_m2RvtgMrZ-SKgYmtwPBf_dd7ZDf8',
  'https://picsum.photos/assets/images/favicon/favicon-32x32.png'
]
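The printed list repeats two font URLs because the first batch itself contains them twice. As a minimal sketch, the comparison could be deduplicated with a `Set`. The helper name `missingFromLaterLoads` and the sample data are made up for illustration and are not part of the original script.

```javascript
// Hypothetical helper: return each URL from the first batch once, if it is
// missing from at least one of the later batches.
function missingFromLaterLoads(batches) {
  const [first = [], ...later] = batches;
  const laterSets = later.map(batch => new Set(batch));
  const result = [];
  // Iterating a Set visits each distinct URL once, in first-seen order.
  for (const url of new Set(first)) {
    if (laterSets.some(set => !set.has(url))) {
      result.push(url);
    }
  }
  return result;
}

// Made-up sample data, not real output from the site.
const sample = [
  ["/", "/style.css", "/font.woff2", "/font.woff2"],
  ["/", "/style.css"],
  ["/"],
];
console.log(missingFromLaterLoads(sample));
```

In the script above, this would be called as `missingFromLaterLoads(requests)` in place of the nested loops.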

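The URLs in the printed list are mostly fonts, stylesheets, analytics scripts, an SVG logo and a favicon. The browser has cached them, so they never reach the request handler on subsequent loads. One of the requests is a randomly chosen picsum photo, which is unlikely to be served again on a later load.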
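One way to check the caching explanation is to repeat the loads with the browser cache turned off and compare the per-load counts. The following is only a sketch under that assumption. The function name `countRequestsPerLoad` and its options are hypothetical, and `page` is expected to be a Puppeteer `Page` created elsewhere. `page.setCacheEnabled()` is the Puppeteer call that toggles the cache.

```javascript
// Sketch: count intercepted requests for each load, with the cache on or off.
// `page` is passed in, so this function does not launch a browser itself.
async function countRequestsPerLoad(page, url, {loads = 4, cache = true} = {}) {
  const counts = [];
  await page.setCacheEnabled(cache);
  await page.setRequestInterception(true);
  page.on("request", request => {
    // Attribute the request to the load that is currently in progress.
    counts[counts.length - 1]++;
    request.continue();
  });
  for (let i = 0; i < loads; i++) {
    counts.push(0);
    await page.goto(url, {waitUntil: "networkidle0"});
  }
  return counts;
}
```

A call such as `await countRequestsPerLoad(page, "https://picsum.photos/", {cache: false})` on a fresh page returns one count per load. If caching accounts for the gap, those counts should sit much closer together than with `cache: true`, though that result has not been verified here.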
