Previously, I used a SQL join to fetch this data, and 10,000 rows took only a fraction of a second.
select technicians.name,
       profession.expertise,
       responsible.name,
       building.locker
from technicians
join profession on technicians.profession = profession.id
join responsible on technicians.responsible = responsible.id
join building on technicians.locker = building.id
After migrating to the cloud, I can no longer access that database directly and have to use REST. Unfortunately, converting this to Python REST requests causes a 20-second pause for just 50 rows — so 10,000 rows would take hours!? That's because I have to iterate over all the results and request the linked data for each relation, right?
Pseudo-Python:
result_list = requests.get("https://sample.com/api/v1/technicians").json()
for technician in result_list:
    profession = requests.get(f"https://sample.com/api/v1/technicians/{technician['id']}/profession")
    responsible = requests.get(f"https://sample.com/api/v1/technicians/{technician['id']}/responsible")
    building = requests.get(f"https://sample.com/api/v1/technicians/{technician['id']}/building")
Question: am I doing something wrong with the way I make these requests?
After migrating to a REST-based solution, you are facing a performance problem: fetching the related data for each technician now takes far longer than it did with SQL joins. Here are several ways to optimize your REST API usage:
First, check whether the API supports querying related entities in a single request. Many REST APIs provide an `include` or `expand` parameter, which can drastically reduce the number of API calls.
For example:
result_list = requests.get("https://sample.com/api/v1/technicians?include=profession,responsible,building")
This returns the technicians together with their `profession`, `responsible`, and `building` data in a single request, avoiding the extra calls per relationship.
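If the API does support `include`, the related records typically arrive embedded or side-loaded, and you re-join them in memory. The response shape below is purely hypothetical (the `included` key and the id-keyed maps are assumptions — check what your API actually returns); a sketch of that local join:

```python
# Hypothetical compound response; real APIs vary in how they
# side-load related records.
response = {
    "technicians": [
        {"id": 1, "name": "Ada", "profession": 10, "responsible": 20, "building": 30},
    ],
    "included": {
        "profession": {10: {"expertise": "electrical"}},
        "responsible": {20: {"name": "Grace"}},
        "building": {30: {"locker": "B-12"}},
    },
}

def join_included(response):
    """Re-join side-loaded records into flat rows, like the SQL join did."""
    inc = response["included"]
    rows = []
    for t in response["technicians"]:
        rows.append({
            "name": t["name"],
            "expertise": inc["profession"][t["profession"]]["expertise"],
            "responsible_name": inc["responsible"][t["responsible"]]["name"],
            "locker": inc["building"][t["building"]]["locker"],
        })
    return rows
```

The lookup per row is a dict access, so the in-memory join stays fast even for 10,000 rows.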
If the related data cannot be batched into one request, you can improve performance by issuing the requests concurrently rather than sequentially. Python's `concurrent.futures` or `asyncio` can help here.
Here is an example using `concurrent.futures`:
import requests
from concurrent.futures import ThreadPoolExecutor

def get_related_data(technician):
    technician_id = technician['id']
    profession = requests.get(f"https://sample.com/api/v1/technicians/{technician_id}/profession").json()
    responsible = requests.get(f"https://sample.com/api/v1/technicians/{technician_id}/responsible").json()
    building = requests.get(f"https://sample.com/api/v1/technicians/{technician_id}/building").json()
    return {'technician': technician, 'profession': profession, 'responsible': responsible, 'building': building}

# Fetch the list of technicians
result_list = requests.get("https://sample.com/api/v1/technicians").json()

# Fetch related data concurrently
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(get_related_data, result_list))

# `results` now contains all technicians and their related data
This approach reduces the wait time by fetching data in parallel rather than one request at a time.
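For the `asyncio` route, a minimal sketch using `asyncio.to_thread` to overlap the blocking calls. The `fetch` parameter is a stand-in so the structure is clear; in real use you would pass `lambda url: requests.get(url).json()`:

```python
import asyncio

async def gather_related(technician_ids, fetch):
    """Fetch the three relations for every technician concurrently.

    `fetch(url)` is any blocking function returning parsed JSON;
    asyncio.to_thread runs it off the event loop so calls overlap.
    """
    async def one(tid):
        base = f"https://sample.com/api/v1/technicians/{tid}"
        profession, responsible, building = await asyncio.gather(
            asyncio.to_thread(fetch, base + "/profession"),
            asyncio.to_thread(fetch, base + "/responsible"),
            asyncio.to_thread(fetch, base + "/building"),
        )
        return {'id': tid, 'profession': profession,
                'responsible': responsible, 'building': building}

    return await asyncio.gather(*(one(t) for t in technician_ids))
```

With this structure all 3 × N requests are in flight at once, bounded only by the thread pool, instead of 3 × N sequential round trips.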
If some entities (such as `profession` or `responsible`) are shared across many technicians or change infrequently, caching this data locally can help eliminate redundant API calls.
Here is a simple cache implementation:
import requests

cache = {}

def fetch_with_cache(url):
    if url not in cache:
        cache[url] = requests.get(url).json()
    return cache[url]

result_list = requests.get("https://sample.com/api/v1/technicians").json()
for technician in result_list:
    profession = fetch_with_cache(f"https://sample.com/api/v1/technicians/{technician['id']}/profession")
    responsible = fetch_with_cache(f"https://sample.com/api/v1/technicians/{technician['id']}/responsible")
    building = fetch_with_cache(f"https://sample.com/api/v1/technicians/{technician['id']}/building")
This reduces the number of requests when the same resource is fetched repeatedly.
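As a variant of the hand-rolled dict cache, the standard library's `functools.lru_cache` provides the same memoization with an optional size limit. The stub body below stands in for the HTTP call so the caching behaviour is visible; in real use it would be `requests.get(url).json()`:

```python
from functools import lru_cache

# Records each URL that actually triggers a "fetch", so cache hits
# are observable; only a stand-in for the real HTTP call.
calls = []

@lru_cache(maxsize=None)
def fetch_cached(url):
    calls.append(url)            # real code: return requests.get(url).json()
    return {"url": url}
```

Repeated calls with the same URL hit the cache and never reach the network; `fetch_cached.cache_clear()` resets it when the data may have changed.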
Finally, if you are fetching more data than you need, check whether the API supports partial responses that let you request only the specific fields you require. This reduces payload size and improves response time.
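As a sketch, assuming the API honours a `fields` query parameter for partial responses (the parameter name varies per API; `select` and `$select` are also common), the request URL could be built like this:

```python
from urllib.parse import urlencode

# Combine expansion and field selection in one request.
# `fields` here is an assumed parameter name; check the API docs.
params = {
    "include": "profession,responsible,building",
    "fields": "id,name,locker",
}
# safe="," keeps the comma-separated lists readable in the URL
url = "https://sample.com/api/v1/technicians?" + urlencode(params, safe=",")
# requests.get(url).json() would then return only the listed fields.
```

Combined with `include`, this can get the whole workload back down to a single, small request — close to what the original SQL join delivered.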