How do I get my entire history using chrome.history.search?

Question

I'm building an extension that reads the Chrome history and analyzes the links for keywords.

I'm using the chrome.history.search method to retrieve the browser history, like this:

chrome.history.search({
        'text': '',
        'maxResults': 500,
    }, function(historyItems){
    });

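At that point I store the retrieved URLs in an array and start reading through them.

But I'm not getting everything. The number of URLs retrieved varies from run to run. I tried experimenting with the parameters of the search method, but I couldn't influence the number of links returned.

Can anyone help me understand this?

Edit: when I say I'm not getting everything, I mean that the history pulled through the extension is far more limited than the browser history I can see.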

Answer 1

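Interestingly, the Chromium bug that Justaman mentions in his answer reveals that passing maxResults: 0 actually returns all history items. So if you really want the complete history, you can do this: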

chrome.history.search({ text: "", startTime: 0, maxResults: 0 }, 
    items => console.log(items));

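I haven't tried it, since I expect that loading my tens (?) of thousands of history items into memory would crash Chrome. But I did try it with startTime set to a few days ago, and it returned 645 items.

If you happen to be using the handy chrome-promise library, here's a version of Antony's answer that uses promises rather than callbacks to loop the API call until the desired number of history items has been found: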

import ChromePromise from 'chrome-promise';

const chromep = new ChromePromise();

function loop(fn)
{
    return fn().then(val => (val === true && loop(fn)) || val);
}

function getHistory(requestedCount)
{
    var history = [],
        ids = {};

    return loop(() => {
        var endTime = history.length &&
                history[history.length - 1].lastVisitTime || Date.now();

        return chromep.history.search({
            text: "",
            startTime: 0,
            endTime: endTime,
            maxResults: 1000
        })
            .then(historyItems => {
                var initialHistoryLength = history.length;

                historyItems.forEach(item => {
                    var id = item.id;

                    // history.search will often return duplicate items
                    if (!ids[id] && history.length < requestedCount) {
                        history.push(item);
                        ids[id] = true;
                    }
                });

                // only loop if we found some new items in the last call
                // and we haven't reached the limit yet
                if (history.length > initialHistoryLength && 
                        history.length < requestedCount) {
                    return true;
                } else {
                    return history;
                }
            });
    });
}

You can use this function like so:

getHistory(2000).then(items => console.log(items));

Answer 2

Here's some code I wrote to try to retrieve all history items using search. Give it a try and see if it helps:

var nextEndTimeToUse = 0;

var allItems = [];
var itemIdToIndex = {};

function getMoreHistory(callback) {
  var params = {text:"", maxResults:500};
  params.startTime = 0;
  if (nextEndTimeToUse > 0)
    params.endTime = nextEndTimeToUse;

  chrome.history.search(params, function(items) {
    items = items || [];  // guard: treat a missing result array as an empty page
    var newCount = 0;
    for (var i = 0; i < items.length; i++) {
      var item = items[i];
      if (item.id in itemIdToIndex)
        continue;
      newCount += 1;
      allItems.push(item);
      itemIdToIndex[item.id] = allItems.length - 1;
    }
    if (items.length > 0) {
      nextEndTimeToUse = items[items.length-1].lastVisitTime;
    }
    callback(newCount);
  });
}

function go() {
  getMoreHistory(function(cnt) { 
    console.log("got " + cnt);
    if (cnt > 0)
      go();
  });
}
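
The same paging idea can also be written with async/await. The sketch below is an illustration under stated assumptions, not a drop-in replacement: collectHistory and its search parameter are hypothetical names, and search is assumed to be a function that takes a query object and returns a promise of items with id and lastVisitTime fields (for example the promise form of chrome.history.search, or a promise wrapper around it). Injecting it means the loop can be exercised against fake data outside the browser.

```javascript
// Sketch: page backwards through history by moving endTime to the oldest
// lastVisitTime seen so far. `search` is an injected, hypothetical parameter.
async function collectHistory(search, pageSize = 500) {
  const seen = new Set();   // ids already collected (pages can overlap at endTime)
  const all = [];
  let endTime;              // undefined on the first call: no upper bound is sent

  while (true) {
    const query = { text: "", startTime: 0, maxResults: pageSize };
    if (endTime !== undefined) query.endTime = endTime;

    const page = (await search(query)) || [];
    let added = 0;
    for (const item of page) {
      if (seen.has(item.id)) continue;   // skip duplicates from earlier pages
      seen.add(item.id);
      all.push(item);
      added += 1;
      // Track the oldest timestamp explicitly instead of trusting result order.
      if (endTime === undefined || item.lastVisitTime < endTime) {
        endTime = item.lastVisitTime;
      }
    }
    // Stop when a page brings nothing new; otherwise the loop would never end.
    if (added === 0) return all;
  }
}
```

Inside an extension it could be called as collectHistory(q => chrome.history.search(q)).then(items => console.log(items.length)).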

Answer 3

https://bugs.chromium.org/p/chromium/issues/detail?id=73812

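You need to specify a startTime. If it is omitted, chrome.history.search only covers the last 24 hours.

  var days = 30;  // example value (an assumption): how many days back to search
  var millisecondsBack = 1000 * 60 * 60 * 24 * days;

  var startTime = (new Date).getTime() - millisecondsBack;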
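
To tie this back to the search call, here is a minimal sketch of building the full query object. The helper name buildHistoryQuery and its now parameter are hypothetical, added only so the arithmetic can be checked outside the browser; text, startTime, and maxResults are the fields already used in the question.

```javascript
// Hypothetical helper (not part of the chrome.history API): builds a query
// object for chrome.history.search that covers the last `days` days.
const MS_PER_DAY = 1000 * 60 * 60 * 24;

function buildHistoryQuery(days, now = Date.now()) {
  return {
    text: "",                            // empty text matches every history item
    startTime: now - days * MS_PER_DAY,  // milliseconds since the epoch
    maxResults: 500,                     // same page size as in the question
  };
}

// In an extension with the "history" permission you would then call:
//   chrome.history.search(buildHistoryQuery(30), items => console.log(items.length));
```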

Answer 4

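I'll preface this answer by saying that this may be the worst API I have ever used.

All of these SO answers give you HistoryItems, not VisitItems. A HistoryItem gives you the URL, the last visit time, and the visit count. If you want the complete history, you need another method to get the remaining visits.

Step 1 - Get the URLs via HistoryItem

All at once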

const items = await chrome.history.search({text: '', startTime: 0, maxResults: 0})

Or in pages (e.g. 1000 items per page)

const seenIds = new Set<string>()
let endTime = Number.MAX_SAFE_INTEGER
const items: HistoryItem[] = []
while (true) {
    const page = await chrome.history.search({text: '', maxResults: 1000, startTime: 0, endTime})

    let added = 0
    for (const historyItem of page) {
      if (!historyItem.lastVisitTime) {
        // I have never seen this undefined, but ruling out for Typescript...
        console.warn('no last visit time', historyItem)
        continue
      }

      // I do not trust that the order will always be consistent,
      // so check every item for earliest time
      if (historyItem.lastVisitTime < endTime) {
        endTime = historyItem.lastVisitTime
      }

      // WORKAROUND API returns duplicate items
      if (!seenIds.has(historyItem.id)) {
        seenIds.add(historyItem.id)
        items.push(historyItem)
        added++
      }
    }

    // Stop on an empty page, or when a page holds only items already seen
    // (otherwise the same endTime could be requested forever)
    if (added === 0) {
      break
    }
}

URLs are unique within the history.search results.

Step 2 - Get the VisitItems for each URL

const allVisits:VisitItem[] = []
for (const item of items) {
    if (item.url) {
      const visits = await chrome.history.getVisits({url: item.url})
      allVisits.push(...visits)
    } else {
      console.warn(item)
    }
}

Sort and group the VisitItem[] to suit your needs. You can cross-reference VisitItems and HistoryItems to get the page title and other fields.
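
As a rough illustration of that cross-referencing, the sketch below sorts the visits newest first and attaches the URL and title from the matching HistoryItem. It relies on a VisitItem's id being the id of its HistoryItem. joinVisits is a hypothetical helper name, and the objects only need the id, url, title, visitId, and visitTime fields.

```javascript
// Hypothetical helper: enrich each VisitItem with data from its HistoryItem
// and return the visits sorted newest first.
function joinVisits(historyItems, visitItems) {
  // VisitItem.id refers to the HistoryItem, so it works as a join key.
  const byId = new Map(historyItems.map(h => [h.id, h]));

  return visitItems
    .map(v => {
      const h = byId.get(v.id) || {};   // tolerate visits with no matching item
      return {
        visitId: v.visitId,
        visitTime: v.visitTime,
        url: h.url,
        title: h.title,
      };
    })
    .sort((a, b) => b.visitTime - a.visitTime);
}
```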

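The HistoryItem visit count does not match the number of VisitItems returned. Good luck figuring that one out.

You cannot look up a VisitItem or a HistoryItem by id. 🤣

LOL 🤯 Finally, the VisitItem id field is not unique: every VisitItem for a given URL carries the same id, namely the id of the HistoryItem for that URL. VisitItems have their own visitId.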
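
Since only visitId tells visits apart, one way to inspect the visit count mismatch is to count the distinct visitId values per HistoryItem id and compare that with visitCount. This is a minimal sketch: findCountMismatches is a hypothetical name, and it assumes the two arrays come from step 1 and step 2.

```javascript
// Hypothetical helper: list HistoryItems whose visitCount differs from the
// number of distinct visits actually returned for them.
function findCountMismatches(historyItems, visitItems) {
  // Group distinct visitIds under the HistoryItem id they belong to.
  const visitIdsByItem = new Map();
  for (const v of visitItems) {
    if (!visitIdsByItem.has(v.id)) visitIdsByItem.set(v.id, new Set());
    visitIdsByItem.get(v.id).add(v.visitId);
  }

  return historyItems
    .map(h => ({
      id: h.id,
      url: h.url,
      visitCount: h.visitCount,
      returned: (visitIdsByItem.get(h.id) || new Set()).size,
    }))
    .filter(r => r.visitCount !== r.returned);
}
```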
