I created a generator to paginate through an API:
import time

def page_helper(req, timeout=5, page=1, **kwargs):
    print(f"Page {page}", end="\r")
    try:
        response = req(params={**kwargs, "page": page})
        response = response.json()
    except Exception as e:
        status = response.status_code
        if status == 429:  # status_code is an int, not a string
            print(f"Rate limited. Waiting {timeout} seconds.")
            time.sleep(timeout)
            yield from page_helper(req, page=page, **kwargs)
        else:
            raise e
    else:
        if len(response) == kwargs["limit"]:
            yield from page_helper(req, page=page + 1, **kwargs)
        yield response
I then use the generator in a place like this:
batches = page_helper(<some_request>)

# get inserts and updates per batch
for i, batch in enumerate(batches):
    print(f"Batch {i + 1}", end="\r")
    insert_batch = []
    update_batch = []
    # ... process batch
I want it to fetch one page per batch and process that batch before fetching the next one. Fetching the batches works perfectly, but the generator keeps fetching pages without any processing happening in between.

I tried inspecting the generator by calling next on it, expecting it to return a single batch. Instead, it immediately ran the full iteration:
next(batches) # --> Performs full iteration
next(batches)
next(batches)
next(batches)
Is there something wrong with my generator function?
The generator doesn't work as intended because it yields from itself recursively before it yields anything: the first call to next() has to recurse all the way down to the last page before the first value is produced, so every page is fetched up front (and the pages then come back in reverse order).
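You can see the effect with a toy generator that mimics the same structure (toy_pages is hypothetical, no HTTP involved):

def toy_pages(page=1, last=3):
    print(f"fetching page {page}")
    if page < last:
        yield from toy_pages(page + 1, last)  # recursion runs before any yield
    yield page

g = toy_pages()
next(g)  # prints "fetching page 1" through "fetching page 3", then returns 3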
The simple fix is to yield an item before yielding from the recursion:
def page_helper(req, timeout=5, page=1, **kwargs):
    print(f"Page {page}", end="\r")
    try:
        response = req(params={**kwargs, "page": page})
        response = response.json()
    except Exception as e:
        status = response.status_code
        if status == 429:
            print(f"Rate limited. Waiting {timeout} seconds.")
            time.sleep(timeout)
            yield from page_helper(req, page=page, **kwargs)
        else:
            raise e
    else:
        yield response  # --> moved before yielding from itself
        if len(response) == kwargs["limit"]:
            yield from page_helper(req, page=page + 1, **kwargs)
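With the yield moved up, the generator is lazy again: each next() fetches exactly one page, and pages arrive in order. One caveat: the yield from chain still grows one generator frame per page, so very deep pagination can run into Python's recursion limit. If that is a concern, the same logic can be written as a plain loop. Here is a minimal sketch, assuming (as in your code) that req returns a requests-style Response, that a rate-limited call raises inside the try block, and that a "limit" kwarg is always passed; page_helper_iter is just an illustrative name:

import time

def page_helper_iter(req, timeout=5, page=1, **kwargs):
    while True:
        print(f"Page {page}", end="\r")
        response = None
        try:
            response = req(params={**kwargs, "page": page})
            data = response.json()
        except Exception:
            # Retry the same page on a 429; re-raise anything else.
            if response is not None and response.status_code == 429:
                print(f"Rate limited. Waiting {timeout} seconds.")
                time.sleep(timeout)
                continue
            raise
        yield data                        # hand one page to the caller
        if len(data) != kwargs["limit"]:  # a short page is the last page
            break
        page += 1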