Unable to get multiple JSON responses from a URL

Question

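I am passing id values to an API URL to get JSON responses, but only the first request succeeds; every other request fails with a 500 error. I collect the ids in a list and pass each id as a parameter to the API URL inside a while loop to extract the data.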

## Get the ids into a variable ##
df_filter=spark.sql("""select distinct ID from filter_view""")

rdd = df_filter.rdd
listOfRows = rdd.collect()
counter = 0
##total_results = []
while counter < len(listOfRows):
    url += '?ids=' + listOfRows[counter].ID 
    response = requests.get(url,headers=headers)
       
    if response.status_code == 200:
        json_response = response.json()
        ##total_results.append(json_response)
        df2 = pd.json_normalize(json_response, record_path=['entities'])
        display(df2)
        
    else:
        print("Error: HTTP status code " + str(response.status_code))
    counter +=1

I only get output for one ID; all the others end with a 500 error.

Desired output:

ID---ItemID--Details
1    100      text
1    101      text
2    200      text
2    300      text
3    400      sometext
3    500      sometext

Output I get:

ID---ItemID--Details
1    100     text    
1    101     text
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500
Error: HTTP status code 500

Answer 1

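The first iteration produces a valid URL, baseURL/?ids=1, but because the URL is built with concatenation and assignment (url += ...), the second iteration produces baseURL/?ids=1?ids=2 when what you want is baseURL/?ids=2. Leave url unchanged and build the request URL from it on every iteration.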

while counter < len(listOfRows):
    # Build a fresh URL from the unchanged base instead of appending to url
    response = requests.get(f'{url}?ids={listOfRows[counter].ID}', headers=headers)
    counter += 1
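To make the difference concrete, here is a minimal sketch that only manipulates strings and makes no HTTP requests. The base URL and the ids are made up for illustration; they are not from the question.

```python
# Hypothetical base URL and ids, used only to illustrate the string handling.
base_url = "https://api.example.com/items"
ids = ["1", "2", "3"]

# Pattern from the question: url is mutated, so every iteration
# appends to whatever the previous iteration left behind.
url = base_url
accumulated = []
for id_ in ids:
    url += "?ids=" + id_
    accumulated.append(url)

# Corrected pattern: the base URL is never modified and a new
# string is built for each id.
fresh = [f"{base_url}?ids={id_}" for id_ in ids]

print(accumulated[1])  # https://api.example.com/items?ids=1?ids=2
print(fresh[1])        # https://api.example.com/items?ids=2
```

Only the first accumulated URL is well formed, which is consistent with the single successful response followed by 500 errors.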

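Does the API support fetching multiple resources in a single request? A plural query parameter such as ids often takes either a comma-separated list of resource IDs (?ids=1,2,3) or an array (?ids[]=1&ids[]=2&ids[]=3 or ?ids=1&ids=2&ids=3). If it does, making one such request is more efficient and more polite to the API provider.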

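# One request for all ids; str() guards against non-string IDs,
# because str.join() only accepts strings
response = requests.get(
    url + '?ids=' + ','.join([str(row.ID) for row in listOfRows]),
    headers=headers
)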

You may need to change your code to parse the new response.
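Which of these formats the API accepts is not known from the question, so the following is only a sketch of how each query string could be produced with the standard library's urllib.parse.urlencode. The ids are placeholder values.

```python
from urllib.parse import urlencode

ids = ["1", "2", "3"]  # hypothetical ids

# Comma-separated list; safe="," keeps the commas from being percent-encoded.
comma_style = urlencode({"ids": ",".join(ids)}, safe=",")

# Repeated key; doseq=True expands the list into one key=value pair per element.
repeated_style = urlencode({"ids": ids}, doseq=True)

# Bracketed array; safe="[]" keeps the brackets from being percent-encoded.
bracket_style = urlencode({"ids[]": ids}, doseq=True, safe="[]")

print(comma_style)     # ids=1,2,3
print(repeated_style)  # ids=1&ids=2&ids=3
print(bracket_style)   # ids[]=1&ids[]=2&ids[]=3
```

The resulting string goes after the ? in the request URL. requests.get also accepts a params argument that builds the query string for you; a list value there produces the repeated-key form.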

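If fetching several ids in one GET is not supported, at least convert the loop to a for loop. There is then no need to keep track of counter or to test counter < len(listOfRows), which improves readability.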

import pandas as pd
import requests

# url and headers are defined elsewhere, as in the question
df_filter = spark.sql("""select distinct ID from filter_view""")

rdd = df_filter.rdd
listOfRows = rdd.collect()
for row in listOfRows:
    # url itself is never modified, so each request gets a clean query string
    response = requests.get(f'{url}?ids={row.ID}', headers=headers)

    if response.status_code == 200:
        json_response = response.json()
        df2 = pd.json_normalize(json_response, record_path=['entities'])
        display(df2)
    else:
        print("Error: HTTP status code " + str(response.status_code))
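The desired output in the question is a single table covering all ids, while the loop above displays one DataFrame per response. One possible way to gather everything first is sketched below. The helper collect_entities and the stand-in fake_fetch are hypothetical names, and the payload shape (an "entities" list whose items carry ItemID and Details) is an assumption based on the question's record_path and sample output.

```python
from typing import Callable, Dict, Iterable, List


def collect_entities(ids: Iterable[str], fetch: Callable[[str], Dict]) -> List[Dict]:
    """Call fetch(id) for every id and gather all 'entities' rows in one list."""
    rows = []
    for id_ in ids:
        payload = fetch(id_)
        for entity in payload.get("entities", []):
            # Tag each row with the id it was requested for.
            rows.append({"ID": id_, **entity})
    return rows


def fake_fetch(id_: str) -> Dict:
    """Stand-in for the real HTTP call; returns a made-up payload."""
    return {"entities": [{"ItemID": int(id_) * 100, "Details": "text"}]}


rows = collect_entities(["1", "2"], fake_fetch)
print(rows)
```

In the real loop, fetch would wrap requests.get(...).json(), and the collected rows could then be passed to pd.DataFrame(rows) once at the end.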

Update, based on comments:

I have more than 5000 ids that need to be passed one by one. How can they be passed in chunks of 20 each?

Build URLs of the form ...?ids=1&ids=2&ids=3... with no more than 20 IDs per URL.

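from itertools import islice
from typing import Iterable

def chunker(it: Iterable, chunksize: int):
    """Yield successive lists of at most chunksize items from it."""
    iterator = iter(it)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk

# str() guards against non-string IDs, because str.join() only accepts strings
for id_chunk in chunker([str(row.ID) for row in listOfRows], 20):
    response = requests.get(
        f'{url}?ids=' + '&ids='.join(id_chunk),
        headers=headers
    )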

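chunker() splits an iterable into lists of length <= chunksize. First reduce listOfRows to just the IDs, then split the IDs into lists of at most 20, then build the URL for each list and make the request. Thanks to Kafran for chunker().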
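As a quick check of the chunking behaviour, the sketch below runs chunker() on 45 made-up ids and builds the URLs without sending any requests. The base URL is hypothetical, and the walrus operator requires Python 3.8 or later.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunker(it: Iterable[str], chunksize: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunksize items from it."""
    iterator = iter(it)
    while chunk := list(islice(iterator, chunksize)):
        yield chunk


base_url = "https://api.example.com/items"  # hypothetical base URL
ids = [str(n) for n in range(1, 46)]        # 45 made-up ids: "1" .. "45"

chunk_sizes = [len(chunk) for chunk in chunker(ids, 20)]
urls = [f"{base_url}?ids=" + "&ids=".join(chunk) for chunk in chunker(ids, 20)]

print(chunk_sizes)  # [20, 20, 5]
print(urls[2])      # https://api.example.com/items?ids=41&ids=42&ids=43&ids=44&ids=45
```

The last chunk is simply shorter than the others, so no padding or special-casing is needed when the number of ids is not a multiple of 20.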
