Is there a better way to combine two DataFrames from two SQL queries into a single DataFrame?


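Currently I am querying two separate tables in a SQL database: I get the "data" from my_table and the "units" from the config table. The units and the data values correspond 1:1, i.e. every column of data values has a matching unit.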

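Because query_units returns a single column of information, I transpose the resulting DataFrame with .T. I then concat the transposed DataFrame with my main data, which gives me the result I want: one row of "units" followed by the remaining rows of data (see the example at the bottom).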

Ultimately I am trying to determine whether there is a more efficient way to handle this setup.

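Running two separate queries and concatenating the results (with some transposing) works, but I would like to know whether this can be handled with a single SQL query that gives me a single DataFrame.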

import pandas as pd
import pyodbc

# connect to database
conn = pyodbc.connect('<connection info here>')

# query to get all data from 'my_table' between these two timestamps
query_data = "SELECT * FROM my_table WHERE timestamp BETWEEN '2024/10/24 11:00:00' AND '2024/10/24 12:00:00'"
# query to get units from the 'config' table for the items in 'my_table'
query_units = "SELECT units FROM config WHERE t_name = 'my_table'"

# get a DataFrame containing the requested data, indexed by timestamp
df = pd.read_sql_query(query_data, conn, index_col='timestamp')
# and a DataFrame of the units for each data item; this returns a single column,
# so label its rows with the data column names and transpose it into a row
eu = pd.read_sql_query(query_units, conn)
eu.index = df.columns
eu = eu.T

# combine (concat) the units row and the rest of the data, aligned on 'df.columns'
# (the name of each data value)
df = pd.concat(objs=(eu, df))

Here is an example of the result of the above (note: this is correct)

       data1    data2    data3    ...    dataN
units  volts    amps     temp     ...    volts  (this is the row that gets inserted)
11:00  10       5        69       ...    9
11:30  11       5        70       ...    10
12:00  12       6        72       ...    9
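
The transpose-and-concat step described above can be tried without a database. This is a minimal sketch, assuming two small hand-made frames stand in for the two query results; the values are taken from the sample table and the frames themselves are hypothetical:

```python
import pandas as pd

# Stand-in for the data query result, indexed by timestamp (hypothetical values).
df = pd.DataFrame(
    {"data1": [10, 11, 12], "data2": [5, 5, 6], "data3": [69, 70, 72]},
    index=pd.Index(["11:00", "11:30", "12:00"], name="timestamp"),
)

# Stand-in for the units query result: one 'units' column, one row per data column.
eu = pd.DataFrame({"units": ["volts", "amps", "temp"]})

# Label each unit with its data column, then transpose the column into a row.
eu.index = df.columns
units_row = eu.T

# Stack the units row on top of the data rows.
combined = pd.concat([units_row, df])
print(combined)
```

Because each column now holds both text and numbers, pandas stores the combined columns with object dtype.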
python sql pandas
1 Answer
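import pandas as pd
import pyodbc

class DataRetriever:
    def __init__(self, connection_info):
        self.conn = pyodbc.connect(connection_info)

    def get_data_with_units(self, start_time, end_time):
        # One SQL query joins the data with the units of measurement.
        # The join condition is a constant, so every data row is paired with
        # every 'config' row for my_table (one copy of the data row per unit).
        # The '?' placeholders let the driver bind the timestamps safely.
        query = '''
            SELECT mt.*, cfg.units
            FROM my_table mt
            JOIN config cfg ON cfg.t_name = 'my_table'
            WHERE mt.timestamp BETWEEN ? AND ?
        '''

        # Execute the query and create a DataFrame.
        df = pd.read_sql_query(query, self.conn, params=[start_time, end_time])
        if df.empty:
            return df

        # Take the units from the copies of the first timestamp. This assumes
        # timestamps are unique and that the 'config' rows come back in the
        # same order as the data columns.
        first = df['timestamp'].iloc[0]
        units = df.loc[df['timestamp'] == first, 'units'].tolist()

        # Collapse the repeated data rows and index them by timestamp.
        data = df.drop(columns='units').drop_duplicates().set_index('timestamp')

        # Build the units row and put it on top of the data rows.
        units_row = pd.DataFrame([units], columns=data.columns, index=['units'])
        return pd.concat([units_row, data])

if __name__ == "__main__":
    connection_info = "<connection info here>"
    retriever = DataRetriever(connection_info)
    start_time = '2024/10/24 11:00:00'
    end_time = '2024/10/24 12:00:00'
    data_with_units = retriever.get_data_with_units(start_time, end_time)
    print(data_with_units)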
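
The shape of the joined result is worth checking before reshaping it, because a join on a constant condition such as cfg.t_name = 'my_table' pairs every data row with every matching config row. The following is a small sketch using an in-memory SQLite database in place of the real one; the table contents and the two data columns are made up for illustration:

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the real database (hypothetical schema and values).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (timestamp TEXT, data1 REAL, data2 REAL);
    CREATE TABLE config (t_name TEXT, units TEXT);
    INSERT INTO my_table VALUES ('2024/10/24 11:00:00', 10, 5);
    INSERT INTO my_table VALUES ('2024/10/24 11:30:00', 11, 5);
    INSERT INTO config VALUES ('my_table', 'volts');
    INSERT INTO config VALUES ('my_table', 'amps');
    """
)

query = """
    SELECT mt.*, cfg.units
    FROM my_table mt
    JOIN config cfg ON cfg.t_name = 'my_table'
    WHERE mt.timestamp BETWEEN ? AND ?
"""
joined = pd.read_sql_query(
    query, conn, params=("2024/10/24 11:00:00", "2024/10/24 12:00:00")
)

# Two data rows times two config rows gives four joined rows.
print(joined)

# Dropping the units column and the repeated rows recovers the original data.
data = joined.drop(columns="units").drop_duplicates().set_index("timestamp")
print(data)
```

Each data row appears once per unit, so the units column cannot be turned into a header row directly; the repeated rows have to be collapsed first.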
