How can querying a 3 GB Postgres table use more memory than the table itself?

Question

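I'm building a data pipeline in Python and using psycopg2 to pull a 3 GB table from our RDS instance. I'm currently testing this locally, and every time I run it, my system monitor reports that the Python process is using 27 GB of memory, which should be impossible because I only have 8 GB of RAM.

I'm not sure whether this is related to the fact that the pipeline fetches the table and then uses asyncio to send POST requests to the destination asynchronously.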

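import os
from typing import Optional, Sequence

import pandas as pd
import psycopg2
from psycopg2 import sql


def query_database(
    query,  # a SQL string or a psycopg2.sql.Composable
    query_args: Optional[Sequence] = None,
    dbname: str = os.environ.get('DBNAME'),
    user: str = os.environ.get('DBUSER'),
    password: str = os.environ.get('DBPASSWORD'),
    host: str = os.environ.get('LOCAL_BIND_ADDRESS'),
    # Fall back to the default PostgreSQL port if the variable is unset
    port: int = int(os.environ.get('LOCAL_BIND_PORT', 5432))
):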
    with psycopg2.connect(
            dbname=dbname,
            user=user,
            password=password,
            host=host,
            port=port
        ) as conn:
        with conn.cursor() as cur:
            # Execute a query
            args = [query, query_args] if query_args else [query]
            cur.execute(*args)
            # Check if query returned rows and get them
            if cur.description:
                records = cur.fetchall()
                columns = [desc[0] for desc in cur.description]
                return pd.DataFrame(data=records, columns=columns)


query = sql.SQL("""SELECT * from table""")
df = query_database(query)

Answer 1

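27 GB may well be correct. What you are seeing is virtual memory (swap space). When physical memory fills up, the operating system moves pages out of RAM and stores them on disk, which keeps the system from crashing.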

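The high memory usage is probably because the database stores the data in a compact (and in some cases compressed) format, whereas you are now holding it in Python objects, which make no attempt to save space. In addition, psycopg2's default client-side cursor transfers the entire result set to the client when the query executes, which on large queries like yours is often described as a memory leak. There may also be other sources of memory growth in your program, such as unnecessary copies of the data (for example, the list returned by fetchall() and the DataFrame built from it exist at the same time).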
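To make the object overhead concrete, here is a minimal sketch that compares the size of one row held as Python objects with a rough estimate of its raw payload. The column types (integer, double precision, short text) and the sample values are assumptions for illustration, and the raw estimate ignores PostgreSQL's per-row header and alignment padding.

```python
import sys

# One row as a DB-API driver would hand it back: a tuple of Python objects.
# (Hypothetical values; the real table's columns are not known.)
row = (12345, 3.14, "abc")

# Size of the tuple container plus every object it references.
python_bytes = sys.getsizeof(row) + sum(sys.getsizeof(v) for v in row)

# Rough raw payload: 4-byte integer, 8-byte double, 3 characters + 1-byte length header.
raw_bytes = 4 + 8 + 4

print(f"as Python objects: {python_bytes} bytes, raw payload: ~{raw_bytes} bytes")
```

The exact numbers depend on the Python build, but the per-object overhead is typically several times the raw payload, and it is paid for every value in every row.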

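If possible, I would suggest reading less data by using a more selective SQL query. Alternatively, you can try reading the table in small chunks. You can do that by using

fetchmany(x)

instead of

fetchall()

.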
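A chunked read along those lines could look like the sketch below. The helper name `iter_chunks`, the `demo` table, and the chunk size are hypothetical, and an in-memory sqlite3 database stands in for the RDS instance so the example runs without a server. The loop relies only on the DB-API `fetchmany()` call, which psycopg2 cursors also provide.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_chunks(cur, chunk_size: int = 10_000) -> Iterator[List[Tuple]]:
    """Yield lists of at most chunk_size rows from an already-executed cursor."""
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:  # an empty list means the result set is exhausted
            break
        yield rows


# Stand-in database (assumption: replaces the real Postgres connection).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (id INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(25)],
)

cur = conn.cursor()
cur.execute("SELECT id, payload FROM demo ORDER BY id")

chunk_sizes = []
total_rows = 0
for chunk in iter_chunks(cur, chunk_size=10):
    # Process each chunk here (e.g. send it on) instead of keeping all rows.
    chunk_sizes.append(len(chunk))
    total_rows += len(chunk)

conn.close()
print(chunk_sizes, total_rows)
```

With psycopg2, chunking helps most when combined with a server-side (named) cursor, created with `conn.cursor(name="some_name")`. With the default cursor the full result set has already been transferred to the client by the time `execute()` returns, so `fetchmany()` only limits how many rows are converted to Python objects at once.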
