I'm building an API with FastAPI, Supabase, and SQLAlchemy. A simple select on a one-row table through the ORM takes 2-3 seconds, which is unacceptable.
I tried these two functions to query the users table:
import time
from sqlalchemy import select, text
from app.models import User
from app.core.db_client import SessionLocal
from app.core.core_logging import logger

# Read users
@profile_decorator
def read_users_sa():
    # TAKES 2-3 SECONDS TO EXECUTE !!!!!!!!!!
    session = SessionLocal()
    users = session.query(User).all()
    session.close()
    return users

@profile_decorator
def read_users_raw_sql():
    # TAKES 0.36 SECONDS
    try:
        # Create session with minimal configuration
        session = SessionLocal()
        try:
            query = text("SELECT * FROM users")
            result = session.execute(query)
            users = result.fetchall()
            return users
        finally:
            session.close()
    except Exception as e:
        logger.error(f"SQLAlchemy query error: {e}")
        return None
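The @profile_decorator referenced above isn't shown in the question; a minimal stand-in (an assumption, not the asker's actual decorator) that would produce the timing numbers quoted in the comments could look like:

```python
import time
from functools import wraps

def profile_decorator(func):
    # Hypothetical timing decorator: prints how long each call took.
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            print(f"{func.__name__} took {time.perf_counter() - start:.3f}s")
    return wrapper
```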
Here is my User SQLAlchemy model:
from .config.base import Base
from .config.shared import *  # shared imports that all the models use

class User(Base):
    """
    User model for storing user related details
    """
    __tablename__ = "users"

    # Required fields
    user_id = Column(UUID(as_uuid=True), primary_key=True, nullable=False)  # Foreign key added on dashboard
    team_id = Column(UUID(as_uuid=True), nullable=True)  # Future: references teams(team_id)
    first_name = Column(String(64), nullable=False)
    last_name = Column(String(64), nullable=False)
    marketing_opt_in = Column(Boolean, default=False)
    tracking_opt_out = Column(Boolean, default=False)

    # Optional fields
    birth_date = Column(Date, nullable=True)
    timezone = Column(String(4), nullable=True)
    address = Column(JSON, nullable=True)
    billing_address = Column(JSON, nullable=True)
    preferences = Column(JSON, nullable=True)
    payment_method = Column(JSON, nullable=True)

    # Timestamps
    created_at = Column(TIMESTAMP(timezone=True), nullable=False, server_default=text('now()'))
    updated_at = Column(TIMESTAMP(timezone=True), nullable=False, server_default=text('now()'))

    # Constraints
    UniqueConstraint(user_id, name="unique_user_id")
And here is my database client file:
from app.core.config import settings
from supabase import create_client, Client
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
url = settings.SUPABASE_URL
key = settings.SUPABASE_API_KEY
# Supabase client
supabase: Client = create_client(url, key)
# SQLAlchemy engine
engine = create_engine(settings.POSTGRES_POOLER_URI)
# SQLAlchemy session
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
read_users_raw_sql() executes in ~0.3 seconds, while read_users_sa() takes 2-3 seconds.

The connection string is:

postgresql://postgres.<instance_name>:<pwd>@aws-0-us-east-1.pooler.supabase.com:6543/postgres
I also tried the select() construct; in summary, none of this helped me solve the problem. It would be great if someone could explain this.
EDIT
I swapped the execution order of the two functions, running the raw SQL first and the ORM query second. The results reversed.
It seems the first query executed after startup takes 2-3 seconds, while subsequent queries are about 10x faster.
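If the slow first call is dominated by establishing the first pooled connection (TCP, TLS, and auth against the remote pooler), one workaround is to warm the pool once at application startup so that cost is paid before the first request. This is a sketch under that assumption, demonstrated with in-memory SQLite so it runs standalone; in the app the engine would come from app.core.db_client and the call would go in a FastAPI startup/lifespan hook:

```python
from sqlalchemy import create_engine, text

def warm_up_pool(engine, n=1):
    # Check out n connections and run a trivial query on each, so the
    # connection-establishment cost is paid before the first real query.
    conns = [engine.connect() for _ in range(n)]
    try:
        for conn in conns:
            conn.execute(text("SELECT 1"))
    finally:
        for conn in conns:
            conn.close()  # returns the connection to the pool

# Stand-in engine for the sketch; in the real app this would be the
# Postgres engine created in db_client.
engine = create_engine("sqlite://")
warm_up_pool(engine, n=2)
```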
I wrote this script to try this on my local machine, but it's just local app server --> local database server, so the numbers are small and may not scale to what you're dealing with. Also, I'm not a networking expert, but depending on how you start up there may be latency from DNS resolution, especially if different servers are dynamically assigned behind the same domain name.

This is a bit awkward to test, because the SQLAlchemy pool only allocates connections on demand, usually one connection per thread or one per task. So it's easy to think you are making a new connection when you actually aren't, or are just reusing the same connection from the pool, but I believe this test script avoids that. Obviously, don't open a bunch of connections in the same main thread like I do here for anything other than testing.
import os
from datetime import datetime, timedelta
from pprint import pprint
from time import sleep

from sqlalchemy import create_engine
from sqlalchemy.sql import text


def get_engine(env):
    PG_URI = f"postgresql+psycopg://{env['DB_USER']}:{env['DB_PASSWORD']}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    return create_engine(
        PG_URI,
        echo=False,
        echo_pool='debug',
        pool_size=5,
        pool_timeout=5,
        pool_pre_ping=True,
        connect_args={
            "connect_timeout": 5,
            # The Postgres setting is statement_timeout (underscore).
            "options": f"-c statement_timeout={5 * 1000}",
        },
    )


def test_engine():
    engine = get_engine(os.environ)
    connections = []
    connection_times = []
    try:
        for _ in range(5):
            b4 = datetime.now()
            conn = engine.connect()
            print(conn.execute(text('SELECT 1')).fetchone())
            connections.append(conn)
            connection_times.append(datetime.now() - b4)
        pprint(connection_times)
        pprint(connections)
        print(f'max={max(connection_times)}')
        print(f'min={min(connection_times)}')
        print(f'avg={sum(connection_times, timedelta(seconds=0)) / len(connection_times)}')
        print(f'unique={len(set(connections))}')
    finally:
        for conn in connections:
            conn.close()
        engine.dispose()


def main():
    test_engine()
    sleep(1)
    test_engine()


if __name__ == '__main__':
    main()
Note that the
echo_pool='debug'
flag produces a lot of output, but the script also prints some numbers. It seems the first connection always takes longer, but this dataset and these millisecond figures are so small that I don't know how reliable that is.
-- the first call
[datetime.timedelta(microseconds=15121),
datetime.timedelta(microseconds=7246),
datetime.timedelta(microseconds=9117),
datetime.timedelta(microseconds=6799),
datetime.timedelta(microseconds=7614)]
[<sqlalchemy.engine.base.Connection object at 0x7f58b18aced0>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1b3b90>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1b3b50>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1b3c50>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1b3bd0>]
max=0:00:00.015121
min=0:00:00.006799
avg=0:00:00.009179
unique=5
-- the second call
[datetime.timedelta(microseconds=11134),
datetime.timedelta(microseconds=6415),
datetime.timedelta(microseconds=8002),
datetime.timedelta(microseconds=6963),
datetime.timedelta(microseconds=8338)]
[<sqlalchemy.engine.base.Connection object at 0x7f58b18aced0>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1c3250>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1c3590>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1c3b50>,
<sqlalchemy.engine.base.Connection object at 0x7f58af1c3ed0>]
max=0:00:00.011134
min=0:00:00.006415
avg=0:00:00.008170
unique=5
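As a lighter-weight alternative to wading through the echo_pool='debug' output, the pool's state can be inspected directly: Pool.status() returns a one-line summary and QueuePool.checkedout() the number of connections currently checked out. A sketch, shown with SQLite (and an explicit QueuePool) so it runs standalone:

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

engine = create_engine("sqlite://", poolclass=QueuePool)

conn = engine.connect()
conn.execute(text("SELECT 1"))
# One-line summary of pool size / overflow / checked-out connections.
print(engine.pool.status())
print("checked out:", engine.pool.checkedout())

conn.close()  # the connection goes back to the pool
print("checked out:", engine.pool.checkedout())
```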