如果是独立的持久连接,我没有问题,连接可持续几个小时。
如果我使用 psycopg(3) 连接池,我会发出请求,第一个请求没问题,我的池大小保持在 5,在某个时候池大小会减小,当客户端发出新请求时,我会收到池超时。
然后我尝试:启动池,不请求任何内容,只需等待。一段时间后(大约 1 小时)我查看 postgresql (pg_stat_activity),我有 5 个空闲(=池大小)连接。然后我从客户端发出请求,所有连接同时消失(我可以从 pg_stat_activity 中看到)和池超时,情况陷入困境。
我也尝试将 max_timeout 减少到 900 但仍然是同样的问题。
def init_pool(self, min_cnx=5):
cnx_str = f"host={DB_HOST} port={DB_PORT} dbname={DB_NAME} user={DB_USERNAME} password={DB_USERPWD}"
self.pool = ConnectionPool(conninfo=cnx_str, min_size=min_cnx, open=True, check=ConnectionPool.check_connection)
def query(self, q, dbv=None, debug=False) -> list:
print("pool size: ", len(self.pool._pool))
print("pool stats before: ", self.pool.get_stats())
with self.pool.connection() as cnx:
if cnx.closed:
self.pool.check()
raise ConnectionError("ERROR: PostgreSQL cnx from pool is closed.")
cnx.autocommit = True
cnx.row_factory = self.row_factory
with psycopg.ClientCursor(cnx) as rdc:
rdc.execute(q, dbv) if dbv else rdc.execute(q)
if debug and rdc._query:
print(rdc._query.query)
if rdc.description:
data = rdc.fetchall()
else:
data = []
print("pool stats after query: ", self.pool.get_stats())
print("pool stats after: ", self.pool.get_stats())
return data
和日志:
[pid: 236344|app: 0|req: 26/26] () {56 vars in 1083 bytes} [Mon Feb 5 11:41:56 2024] POST /v1/user => generated 933 bytes in
109 msecs (HTTP/1.1 200) 8 headers in 749 bytes (1 switches on core 0)
pool size: 3
pool stats before: {'connections_num': 5, 'requests_num': 3, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms
': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query: {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usa
ge_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
pool stats after: {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms'
: 49, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}
[pid: 236344|app: 0|req: 28/28] () {56 vars in 1087 bytes} [Mon Feb 5 11:41:58 2024] POST /v1/iobjs => generated 4788 bytes i
n 29 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 236344|app: 0|req: 29/29] () {54 vars in 816 bytes} [Mon Feb 5 11:42:05 2024] OPTIONS /v1/user/quit => generated 0 byte
s in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size: 0
pool stats before: {'connections_num': 5, 'requests_num': 6, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms
': 62, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 0}
Traceback (most recent call last):
File "/var/srvr/log.py", line 68, in process
self.db.query(
File "/var/srvr/pg3p.py", line 71, in query
with self.pool.connection() as cnx:
File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 170, in connection
conn = self.getconn(timeout=timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 204, in getconn
raise PoolTimeout(
psycopg_pool.PoolTimeout: couldn't get a connection after 30.00 sec
pool size: 0
pool stats before: {'connections_num': 5, 'requests_num': 7, 'requests_queued': 2, 'connections_ms': 268, 'requests_wait_ms': 30035, 'usage
_ms': 62, 'requests_errors': 1, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 1}
编辑:
我切换回 1 个持久连接,非常稳定(几天)。然后按照评论中的建议,我回到了 min_size=10 和 max_size=20 的池。
行为没有变化:池正在丢失连接,而没有尝试启动新连接来替换丢失的连接(也尝试了 20 和 50 分钟/最大,没有区别)
pool stats after: {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 39/39] () {56 vars in 1087 bytes}[Thu Feb 8 11:02:17 2024] POST /v1/iobjs => generated 30081 bytes in 10 msecs (HTTP/1.1 200) 6 headers in 303 bytes (1 switches on core 0)
pool size: 5
pool stats before: {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
pool stats after query: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'u sage_ms': 67, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 40/40] () {56 vars in 1087 bytes} [Thu Feb 8 11:02:17 2024] POST /v1/iobjs => generated 4788 bytes i n 5 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
[pid: 282017|app: 0|req: 41/41] () {54 vars in 808 bytes} [Thu Feb 8 11:02:26 2024] OPTIONS /v1/iobjs => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
[pid: 282017|app: 0|req: 42/42] () {54 vars in 814 bytes} [Thu Feb 8 11:02:26 2024] OPTIONS /v1/settings => generated 0 bytes in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)
pool size: 3
pool stats before: {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}
pool stats after query: {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'u sage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
pool stats after: {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 73, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 43/43] () {56 vars in 1087 bytes} [Thu Feb 8 11:02:26 2024] POST /v1/iobjs => generated 4788 bytes i n 6 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)
Traceback (most recent call last): File "main.py", line 326, in v1_settings
row = db.query_row(
^^^^^^^^^^^^^
并且 postgresql 日志(debug3)没有显示任何特别的内容(AFAIU):
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:17.079 CET [281970] DEBUG: server process (PID 282007) exited with exit code 0
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.188 CET [282009] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.189 CET [281970] DEBUG: server process (PID 282006) exited with exit code 0
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG: proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: exit(0)
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG: proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.193 CET [281970] DEBUG: server process (PID 282009) exited with exit code 0
2024-02-08 11:02:26.194 CET [281970] DEBUG: server process (PID 282011) exited with exit code 0
2024-02-08 11:02:33.979 CET [281970] DEBUG: postmaster received pmsignal signal
2024-02-08 11:02:33.983 CET [282844] DEBUG: InitPostgres
2024-02-08 11:02:33.985 CET [282844] DEBUG: autovacuum: processing database "template1"
query_row() 与 query() 相同:
def query_row(self, q, dbv=None, debug=False):
with self.pool.connection() as cnx:
cnx.autocommit = True
cnx.row_factory = self.row_factory
with psycopg.ClientCursor(cnx) as c:
c.execute(q, dbv) if dbv else c.execute(q)
if debug and c._query:
print(c._query.query)
if c.rowcount == 1:
return c.fetchone()
else:
return None
来自 postgresql 的连接日志给出了提示:在所有连接消失之前出现 SSL 错误:
2024-02-09 10:38:05.627 CET [297036] LOG: checkpoint starting: time
2024-02-09 10:38:05.739 CET [297036] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0. 103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
2024-02-09 10:46:46.488 CET [297518] user@db LOG: SSL error: decryption failed or bad record mac
2024-02-09 10:46:46.489 CET [297518] user@db LOG: could not receive data from client: Connection reset by peer
2024-02-09 10:46:46.489 CET [297518] user@db LOG: disconnection: session time: 0:10:00.165 user=user database=db host=127.0.0.1 port=56266
2024-02-09 12:18:21.905 CET [297528] user@db LOG: disconnection: session time: 1:41:35.416 user=user database=db host=127.0.0.1 port=56356
2024-02-09 12:18:21.909 CET [297519] user@db LOG: disconnection: session time: 1:41:35.587 user=user database=db host=127.0.0.1 port=56276
2024-02-09 12:18:24.998 CET [297520] user@db LOG: disconnect103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
2024-02-09 10:46:46.488 CET [297518] user@db LOG: SSL error: decryption failed or bad record macion: session time: 1:41:38.674 user=user database=db host=127.0.0.1 port=56290
2024-02-09 12:18:25.033 CET [297521] user@db LOG: disconnection: session time: 1:41:38.672 user=user database=db host=127.0.0.1 port=56296
2024-02-09 12:18:33.726 CET [297522] user@db LOG: disconnection: session time: 1:41:47.360 user=user database=db host=127.0.0.1 port=56302
2024-02-09 12:18:33.739 CET [297523] user@db LOG: disconnection: session time: 1:41:47.372 user=user database=db host=127.0.0.1 port=56308
2024-02-09 12:18:33.745 CET [297524] user@db LOG: disconnection: session time: 1:41:47.316 user=user database=db host=127.0.0.1 port=56310
2024-02-09 12:18:33.760 CET [297525] user@db LOG: disconnection: session time: 1:41:47.325 user=user database=db host=127.0.0.1 port=56320
2024-02-09 12:18:35.793 CET [297526] user@db LOG: disconnection: session time: 1:41:49.353 user=user database=db host=127.0.0.1 port=56336
2024-02-09 12:18:35.847 CET [297527] user@db LOG: disconnection: session time: 1:41:49.365 user=user database=db host=127.0.0.1 port=56340
此 SSL 错误的原因来自 uwsgi 默认行为:如果您有 2 个工作线程,则在 init 时创建一个上下文(初始化变量,其中包括池和连接),然后工作线程共享内存。这导致了两个工作人员之间共享连接的经典情况。
在某个时间点,它会导致 SSL 不一致,从而导致大规模关闭。
解决方案是使用
lazy-apps=true
指示 uwsgi 创建具有自己上下文的每个工作线程。