psycopg3 池:所有连接在长时间空闲后立即丢失

问题描述 投票:0回答:1

如果是独立的持久连接,我没有问题,连接可持续几个小时。
如果我使用 psycopg(3) 连接池,我会发出请求,第一个请求没问题,我的池大小保持在 5,在某个时候池大小会减小,当客户端发出新请求时,我会收到池超时。
然后我尝试:启动池,不请求任何内容,只需等待。一段时间后(大约 1 小时)我查看 postgresql (pg_stat_activity),我有 5 个空闲(=池大小)连接。然后我从客户端发出请求,所有连接同时消失(我可以从 pg_stat_activity 中看到)和池超时,情况陷入困境。

我也尝试将 max_timeout 减少到 900 但仍然是同样的问题。

def init_pool(self, min_cnx=5):
    cnx_str = f"host={DB_HOST} port={DB_PORT} dbname={DB_NAME} user={DB_USERNAME} password={DB_USERPWD}"
    self.pool = ConnectionPool(conninfo=cnx_str, min_size=min_cnx, open=True, check=ConnectionPool.check_connection)

def query(self, q, dbv=None, debug=False) -> list:

    print("pool size: ", len(self.pool._pool))
    print("pool stats before: ", self.pool.get_stats())

    with self.pool.connection() as cnx:

        if cnx.closed:
            self.pool.check()
            raise ConnectionError("ERROR: PostgreSQL cnx from pool is closed.")

        cnx.autocommit = True
        cnx.row_factory = self.row_factory

        with psycopg.ClientCursor(cnx) as rdc:
            rdc.execute(q, dbv) if dbv else rdc.execute(q)

            if debug and rdc._query:
                print(rdc._query.query)

            if rdc.description:
                data = rdc.fetchall()
            else:
                data = []

        print("pool stats after query: ", self.pool.get_stats())
    print("pool stats after: ", self.pool.get_stats())
    return data

和日志:

[pid: 236344|app: 0|req: 26/26] () {56 vars in 1083 bytes} [Mon Feb  5 11:41:56 2024] POST /v1/user => generated 933 bytes in 
109 msecs (HTTP/1.1 200) 8 headers in 749 bytes (1 switches on core 0)                                                                                                                                         
pool size:  3                                                                                                                               
pool stats before:  {'connections_num': 5, 'requests_num': 3, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms
': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 3, 'requests_waiting': 0}                                            
pool stats after query:  {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usa
ge_ms': 34, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}                                       
pool stats after:  {'connections_num': 5, 'requests_num': 4, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms'
: 49, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 2, 'requests_waiting': 0}                                             
[pid: 236344|app: 0|req: 28/28] () {56 vars in 1087 bytes} [Mon Feb  5 11:41:58 2024] POST /v1/iobjs => generated 4788 bytes i
n 29 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)                                                                     
[pid: 236344|app: 0|req: 29/29] () {54 vars in 816 bytes} [Mon Feb  5 11:42:05 2024] OPTIONS /v1/user/quit => generated 0 byte
s in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)                                                                                                                                                              
pool size:  0   
pool stats before:  {'connections_num': 5, 'requests_num': 6, 'requests_queued': 1, 'connections_ms': 268, 'requests_wait_ms': 34, 'usage_ms
': 62, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 0}                                            
Traceback (most recent call last):                                                          
  File "/var/srvr/log.py", line 68, in process
    self.db.query(
  File "/var/srvr/pg3p.py", line 71, in query
    with self.pool.connection() as cnx:
  File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
    return next(self.gen)
           ^^^^^^^^^^^^^^
  File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 170, in connection
    conn = self.getconn(timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/srvr/lib/python3.12/site-packages/psycopg_pool/pool.py", line 204, in getconn
    raise PoolTimeout(
psycopg_pool.PoolTimeout: couldn't get a connection after 30.00 sec  
pool size:  0  
pool stats before:  {'connections_num': 5, 'requests_num': 7, 'requests_queued': 2, 'connections_ms': 268, 'requests_wait_ms': 30035, 'usage
_ms': 62, 'requests_errors': 1, 'pool_min': 5, 'pool_max': 5, 'pool_size': 5, 'pool_available': 0, 'requests_waiting': 1}

编辑:
我切换回 1 个持久连接,非常稳定(几天)。然后按照评论中的建议,我回到了 min_size=10 和 max_size=20 的池。
行为没有变化:池正在丢失连接,而没有尝试启动新连接来替换丢失的连接(也尝试了 20 和 50 分钟/最大,没有区别)

pool stats after:  {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}
[pid: 282017|app: 0|req: 39/39]  () {56 vars in 1087 bytes}[Thu Feb  8 11:02:17 2024] POST /v1/iobjs => generated 30081 bytes  in 10 msecs (HTTP/1.1 200) 6 headers in 303 bytes (1 switches on core 0)
pool size:  5                                                         
pool stats before:  {'connections_num': 11, 'requests_num': 34, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ ms': 67, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 5, 'requests_waiting': 0}                          
pool stats after query:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'u sage_ms': 67, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}       
pool stats after:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}       
[pid: 282017|app: 0|req: 40/40]  () {56 vars in 1087 bytes} [Thu Feb  8 11:02:17 2024] POST /v1/iobjs => generated 4788 bytes i n 5 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)   
[pid: 282017|app: 0|req: 41/41]  () {54 vars in 808 bytes} [Thu Feb  8 11:02:26 2024] OPTIONS /v1/iobjs => generated 0 bytes in  0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)   
[pid: 282017|app: 0|req: 42/42]  () {54 vars in 814 bytes} [Thu Feb  8 11:02:26 2024] OPTIONS /v1/settings => generated 0 bytes  in 0 msecs (HTTP/1.1 200) 6 headers in 307 bytes (0 switches on core 0)                                                                  
pool size:  3                                                          
pool stats before:  {'connections_num': 11, 'requests_num': 35, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_ ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 3, 'requests_waiting': 0}       
pool stats after query:  {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'u sage_ms': 70, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}       
pool stats after:  {'connections_num': 11, 'requests_num': 36, 'requests_queued': 1, 'connections_ms': 641, 'requests_wait_ms': 37, 'usage_m s': 73, 'connections_lost': 1, 'pool_min': 10, 'pool_max': 20, 'pool_size': 11, 'pool_available': 2, 'requests_waiting': 0}       
[pid: 282017|app: 0|req: 43/43]  () {56 vars in 1087 bytes} [Thu Feb  8 11:02:26 2024] POST /v1/iobjs => generated 4788 bytes i n 6 msecs (HTTP/1.1 200) 6 headers in 302 bytes (1 switches on core 0)   
Traceback (most recent call last):                                     File "main.py", line 326, in v1_settings                              

    row = db.query_row(                                                                                                                     
          ^^^^^^^^^^^^^

并且 postgresql 日志(debug3)没有显示任何特别的内容(AFAIU):

2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make                                                
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  exit(0)                                                                          
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make                            
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make                                
2024-02-08 11:02:17.075 CET [282007] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make                                               
2024-02-08 11:02:17.079 CET [281970] DEBUG:  server process (PID 282007) exited with exit code 0                                            
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make                             
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make                                 
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make                                                
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  exit(0)                                                                          
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make                            
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make                                
2024-02-08 11:02:26.183 CET [282006] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make                                               
2024-02-08 11:02:26.188 CET [282009] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make                             
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  exit(0)
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.189 CET [282009] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.189 CET [281970] DEBUG:  server process (PID 282006) exited with exit code 0
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  shmem_exit(0): 4 before_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  shmem_exit(0): 6 on_shmem_exit callbacks to make
2024-02-08 11:02:26.191 CET [282011] pguser@maindb DEBUG:  proc_exit(0): 2 callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  exit(0)
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  shmem_exit(-1): 0 before_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  shmem_exit(-1): 0 on_shmem_exit callbacks to make
2024-02-08 11:02:26.192 CET [282011] pguser@maindb DEBUG:  proc_exit(-1): 0 callbacks to make
2024-02-08 11:02:26.193 CET [281970] DEBUG:  server process (PID 282009) exited with exit code 0
2024-02-08 11:02:26.194 CET [281970] DEBUG:  server process (PID 282011) exited with exit code 0
2024-02-08 11:02:33.979 CET [281970] DEBUG:  postmaster received pmsignal signal
2024-02-08 11:02:33.983 CET [282844] DEBUG:  InitPostgres
2024-02-08 11:02:33.985 CET [282844] DEBUG:  autovacuum: processing database "template1"

query_row() 与 query() 相同:

def query_row(self, q, dbv=None, debug=False):

    with self.pool.connection() as cnx:
        cnx.autocommit = True
        cnx.row_factory = self.row_factory

        with psycopg.ClientCursor(cnx) as c:
            c.execute(q, dbv) if dbv else c.execute(q)

            if debug and c._query:
                print(c._query.query)

            if c.rowcount == 1:
                return c.fetchone()
            else:
                return None
python postgresql psycopg2 psycopg3
1个回答
0
投票

来自 postgresql 的连接日志给出了提示:在所有连接消失之前出现 SSL 错误:

2024-02-09 10:38:05.627 CET [297036] LOG:  checkpoint starting: time                                                                        
2024-02-09 10:38:05.739 CET [297036] LOG:  checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0. 103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
2024-02-09 10:46:46.488 CET [297518] user@db LOG:  SSL error: decryption failed or bad record mac
2024-02-09 10:46:46.489 CET [297518] user@db LOG:  could not receive data from client: Connection reset by peer
2024-02-09 10:46:46.489 CET [297518] user@db LOG:  disconnection: session time: 0:10:00.165 user=user database=db host=127.0.0.1 port=56266
2024-02-09 12:18:21.905 CET [297528] user@db LOG:  disconnection: session time: 1:41:35.416 user=user database=db host=127.0.0.1 port=56356
2024-02-09 12:18:21.909 CET [297519] user@db LOG:  disconnection: session time: 1:41:35.587 user=user database=db host=127.0.0.1 port=56276
2024-02-09 12:18:24.998 CET [297520] user@db LOG:  disconnect103 s, sync=0.002 s, total=0.112 s; sync files=2, longest=0.001 s, average=0.001 s; distance=1 kB, estimate=1 kB; lsn=0/1C4BB70, redo lsn=0/1C4BB38
2024-02-09 10:46:46.488 CET [297518] user@db LOG:  SSL error: decryption failed or bad record macion: session time: 1:41:38.674 user=user database=db host=127.0.0.1 port=56290
2024-02-09 12:18:25.033 CET [297521] user@db LOG:  disconnection: session time: 1:41:38.672 user=user database=db host=127.0.0.1 port=56296
2024-02-09 12:18:33.726 CET [297522] user@db LOG:  disconnection: session time: 1:41:47.360 user=user database=db host=127.0.0.1 port=56302
2024-02-09 12:18:33.739 CET [297523] user@db LOG:  disconnection: session time: 1:41:47.372 user=user database=db host=127.0.0.1 port=56308
2024-02-09 12:18:33.745 CET [297524] user@db LOG:  disconnection: session time: 1:41:47.316 user=user database=db host=127.0.0.1 port=56310
2024-02-09 12:18:33.760 CET [297525] user@db LOG:  disconnection: session time: 1:41:47.325 user=user database=db host=127.0.0.1 port=56320
2024-02-09 12:18:35.793 CET [297526] user@db LOG:  disconnection: session time: 1:41:49.353 user=user database=db host=127.0.0.1 port=56336
2024-02-09 12:18:35.847 CET [297527] user@db LOG:  disconnection: session time: 1:41:49.365 user=user database=db host=127.0.0.1 port=56340

此 SSL 错误的原因来自 uwsgi 默认行为:如果您有 2 个工作线程,则在 init 时创建一个上下文(初始化变量,其中包括池和连接),然后工作线程共享内存。这导致了两个工作人员之间共享连接的经典情况。
在某个时间点,它会导致 SSL 不一致,从而导致大规模关闭。
解决方案是使用

lazy-apps=true
指示 uwsgi 创建具有自己上下文的每个工作线程。

© www.soinside.com 2019 - 2024. All rights reserved.