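I am currently querying two separate tables in a SQL database: my_table, from which I get the "data", and the config table, from which I get the "units". The units correspond 1:1 to the values in the data, i.e. every column of data values has a corresponding unit.
Because query_units returns its information as a single column, I end up transposing the resulting DataFrame with .T. I then concat that transposed DataFrame with my main data, which gives me the result I want: a single "units" row followed by the rest of the data rows (see the example at the bottom).
Ultimately I am trying to determine whether there is a more efficient way to handle this setup.
Running two separate queries and concatenating the results (with some transposing) works, but I would like to know whether there is a way to handle this with a single SQL query that gives me a single DataFrame.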
import pandas as pd
import pyodbc

# connect to database
conn = pyodbc.connect('<connection info here>')

# query to get all data from 'my_table' between these two timestamps
query_data = "SELECT * FROM my_table WHERE timestamp BETWEEN '2024/10/24 11:00:00' AND '2024/10/24 12:00:00'"
# query to get units from the 'config' table for the items in 'my_table'
query_units = "SELECT units FROM config WHERE t_name='my_table'"

# get a DataFrame containing the requested data, indexed by timestamp
df = pd.read_sql_query(query_data, conn, index_col='timestamp')

# and a DataFrame of the units for each data item, transposed because this returns a
# single column and I want it as a row (labelled with the data column names)
eu = pd.read_sql_query(query_units, conn).T
eu.columns = df.columns

# combine (concat) the units row and the rest of the data, joined along 'df.columns'
# (the name of each data value)
df = pd.concat(objs=(eu, df))
Here is an example of the result described above (note: this is correct):
       data1  data2  data3  ...  dataN
units  volts  amps   temp   ...  volts   (this is the row that gets inserted)
11:00  10     5      69     ...  9
11:30  11     5      70     ...  10
12:00  12     6      72     ...  9
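The transpose-and-concat step can be checked without a database. The following is a minimal sketch using the sample values from the table above; the hard-coded frames are stand-ins for the results of query_data and query_units, and it assumes the units come back in the same order as the data columns.

```python
import pandas as pd

# Stand-in for the result of query_data (values taken from the example table).
df = pd.DataFrame(
    {"data1": [10, 11, 12], "data2": [5, 5, 6], "data3": [69, 70, 72]},
    index=pd.Index(["11:00", "11:30", "12:00"], name="timestamp"),
)

# Stand-in for the result of query_units: one column, one row per data column.
# Assumption: the rows are in the same order as the data columns.
eu = pd.DataFrame({"units": ["volts", "amps", "temp"]})

# Transpose the single column into a single row labelled 'units',
# then relabel its columns so they line up with the data columns.
units_row = eu.T
units_row.columns = df.columns

# Stack the units row on top of the data rows.
result = pd.concat([units_row, df])
print(result)
```

Because the units row holds strings, every column of the combined frame ends up with object dtype, which is worth keeping in mind if the numeric values are needed for calculations later.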
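import pandas as pd
import pyodbc


class DataRetriever:
    def __init__(self, connection_info):
        self.conn = pyodbc.connect(connection_info)

    def get_data_with_units(self, start_time, end_time):
        # We use one SQL query with JOIN to combine data and units of measurement.
        # The JOIN pairs every data row with every units row of 'my_table',
        # so each data row comes back once per unit.
        query = '''
            SELECT mt.*, cfg.units
            FROM my_table mt
            JOIN config cfg ON cfg.t_name = 'my_table'
            WHERE mt.timestamp BETWEEN ? AND ?
        '''
        # Execute the query with bound parameters and create a DataFrame.
        df = pd.read_sql_query(query, self.conn, params=[start_time, end_time])

        # Collect the units from the rows of the first timestamp. This assumes the
        # timestamps are unique and the config rows come back in data-column order.
        first = df['timestamp'] == df['timestamp'].iloc[0]
        unit_values = df.loc[first, 'units'].tolist()

        # Drop the repeated data rows and index the data by timestamp.
        data = df.drop(columns='units').drop_duplicates().set_index('timestamp')

        # Build the units row and use pd.concat to put it on top of the data.
        units = pd.DataFrame([unit_values], columns=data.columns, index=['units'])
        return pd.concat([units, data])

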
if __name__ == "__main__":
    connection_info = "<connection info here>"
    retriever = DataRetriever(connection_info)
    start_time = '2024/10/24 11:00:00'
    end_time = '2024/10/24 12:00:00'
    data_with_units = retriever.get_data_with_units(start_time, end_time)
    print(data_with_units)
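
To try the single-query idea without an ODBC source, here is a minimal sketch against an in-memory SQLite database. The table contents are made up from the example in the question, and the ORDER BY on cfg.rowid is an SQLite-specific assumption used only to make the unit order deterministic; SQL does not guarantee row order without an ORDER BY, so a real config table would need its own ordering column.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the real data source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_table (timestamp TEXT, data1 INTEGER, data2 INTEGER, data3 INTEGER);
    CREATE TABLE config (t_name TEXT, units TEXT);
    INSERT INTO my_table VALUES
        ('2024/10/24 11:00:00', 10, 5, 69),
        ('2024/10/24 11:30:00', 11, 5, 70),
        ('2024/10/24 12:00:00', 12, 6, 72);
    INSERT INTO config VALUES
        ('my_table', 'volts'), ('my_table', 'amps'), ('my_table', 'temp');
    """
)

# Single query: every data row is paired with every units row for 'my_table'.
query = """
    SELECT mt.*, cfg.units
    FROM my_table mt
    JOIN config cfg ON cfg.t_name = 'my_table'
    WHERE mt.timestamp BETWEEN ? AND ?
    ORDER BY mt.timestamp, cfg.rowid
"""
joined = pd.read_sql_query(
    query, conn, params=("2024/10/24 11:00:00", "2024/10/24 12:00:00")
)

# The units repeat for every timestamp, so read them from the first one only.
first = joined["timestamp"] == joined["timestamp"].iloc[0]
unit_values = joined.loc[first, "units"].tolist()

# Collapse the repeated data rows back to one row per timestamp.
data = joined.drop(columns="units").drop_duplicates().set_index("timestamp")

# Build the units row and stack it on top of the data.
units = pd.DataFrame([unit_values], columns=data.columns, index=["units"])
result = pd.concat([units, data])
print(result)
```

The join returns one row per combination of data row and unit, so the amount of data transferred grows with the number of columns. For wide tables, two small queries may well be cheaper than one joined query.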