Unable to paginate through all Bing API results

Question

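I am currently using the Bing Web Search API v7 to query Bing for search results. According to the API documentation, the parameters count and offset are used to paginate through the results, the total number of which is given by the totalEstimatedMatches value in the results themselves.

As the documentation puts it:

totalEstimatedMatches: The estimated number of webpages that are relevant to the query. Use this number along with the count and offset query parameters to page the results.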
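As I read it, paging under that scheme amounts to stepping offset forward by count until totalEstimatedMatches is reached. Below is a minimal sketch of the request parameters this would produce. The helper name page_params and its limit argument are made up for illustration, and no request is actually sent:

```python
def page_params(query, count, total_estimated_matches, limit=None):
    """Yield the query parameters for each page under the documented scheme.

    page_params and limit are illustrative names, not part of the Bing API;
    q, count and offset are the query parameters discussed in the question.
    """
    # Optionally cap how far we page, so the example stays small.
    stop = total_estimated_matches if limit is None else min(limit, total_estimated_matches)
    # offset is the number of results to skip, so it advances by count each page.
    for offset in range(0, stop, count):
        yield {"q": query, "count": count, "offset": offset}


# Parameters for the first three pages of a query with an estimated 330,000 matches.
pages = list(page_params("feed:mp3", count=50, total_estimated_matches=330_000, limit=150))
for params in pages:
    print(params)
```

Without the limit, the same loop would issue one request per 50 results all the way up to the estimate, which is the part that does not work in practice.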

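This seems to work up to a point, after which the API keeps returning the exact same results over and over, regardless of the values of count and offset.

In my specific case, totalEstimatedMatches is 330,000. With a count of 50 (i.e. 50 results per request), the results start repeating at around offset 700, i.e. 3,500 results into the estimated 330,000.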

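I have noticed similar behavior when using the Bing front end once the page number gets high enough, e.g.

https://www.bing.com/search?q=feed%3amp3&first=1&FORM=PERE - initial search, an estimated 51,000 results
https://www.bing.com/search?q=feed%3amp3&first=1000&FORM=PERE - first=1000, should give results 1000 to 1010, but returns the same results as the URL below
https://www.bing.com/search?q=feed%3amp3&first=2000&FORM=PERE - first=2000, should give results 2000 to 2010, but returns the same results as the URL above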

Am I using the API incorrectly, or is this just some kind of limitation or bug where totalEstimatedMatches is simply way off?

Answer 1

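totalEstimatedMatches gives the total number of matches for the query across the web, including duplicate results and near-duplicate content.

To optimize indexing, search engines restrict the results they serve to the top N webpages. That is what you are seeing. This behavior is consistent across search engines, since users typically change the query, select a webpage, or abandon the search within 2-3 result pages.

In short, this is not a bug or an incorrect implementation; it is an index optimization that limits how many results you can retrieve. If you really need more results, you can run related searches and append the webpages you have not seen yet.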
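The suggestion of running related searches and appending the unique webpages could be sketched roughly as follows. This is only an illustration: collect_unique_urls and the search callable are hypothetical names, and canned data stands in for real API responses:

```python
def collect_unique_urls(queries, search):
    """Run each query through `search` and keep every URL only once.

    `search` is a hypothetical callable (query -> list of URLs) standing in
    for whatever code actually calls the API; it is not a Bing function.
    """
    seen = set()
    unique = []
    for query in queries:
        for url in search(query):
            if url not in seen:  # skip pages already returned by an earlier query
                seen.add(url)
                unique.append(url)
    return unique


# Canned results in place of real API calls, for illustration only.
fake_results = {
    "feed:mp3": ["https://a.example/1", "https://b.example/2"],
    "feed:mp3 podcast": ["https://b.example/2", "https://c.example/3"],
}
urls = collect_unique_urls(fake_results, lambda q: fake_results[q])
print(urls)
```

Keeping a list alongside the set preserves the order in which pages were first seen, which roughly preserves ranking within each query.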

Answer 2

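Technically this is not a direct answer to the question. Hopefully it is still helpful to offer a way to paginate efficiently through Bing's API without relying on the "totalEstimatedMatches" return value, which, as the other answer explains, can behave quite unpredictably. Here is some Python: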

class ApiWorker(object):
    def __init__(self, q):
        self.q = q
        self.offset = 0
        self.result_hashes = set()
        self.finished = False

    def calc_next_offset(self, resp_urls):
        before_adding = len(self.result_hashes)
        # abuse of set operations: hashes already in the set are silently ignored
        self.result_hashes.update(hash(i) for i in resp_urls)
        after_adding = len(self.result_hashes)
        if after_adding == before_adding:
            # nothing new was added: the page was all duplicates, or it came back empty
            self.finished = True
        else:
            self.offset += len(resp_urls)

    def page_through_results(self, *args, **kwargs):
        while not self.finished:
            new_resp_urls = ...<call_logic>...
            self.calc_next_offset(new_resp_urls) 
            ...<save logic>...
        print(f'All unique results for q={self.q} have been obtained.')

The code above stops paginating as soon as it receives a response that consists entirely of duplicates.
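To see the stopping rule in action without calling the API, here is a small self-contained sketch. fake_search is entirely hypothetical: it only imitates the behavior described in the question by returning the same last page once the offset passes a cutoff. It also stores URLs directly instead of their hashes, which is a simplification of the class above:

```python
def fake_search(offset, count, cutoff=120):
    """Stand-in for the API call: past `cutoff`, the last page simply repeats.

    Hypothetical helper for illustration; it does not contact Bing.
    """
    start = min(offset, cutoff - count)
    return [f"https://example.com/{i}" for i in range(start, start + count)]


def page_until_repeat(count=50):
    """Page forward until a response contains no URL that has not been seen."""
    seen = set()
    offset = 0
    while True:
        urls = fake_search(offset, count)
        new_urls = [u for u in urls if u not in seen]
        if not new_urls:
            # A page made up entirely of duplicates: stop paging.
            break
        seen.update(new_urls)
        offset += len(urls)
    return seen, offset


seen, final_offset = page_until_repeat()
print(len(seen), final_offset)
```

Note that the loop always costs one extra request, since the only way to detect the end is to fetch a page that adds nothing new.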
