Looking for Python help converting data from wide to long (?)
My data looks like this:
channelId,utc,scet,val1,val2
A-0001,2024-061T22:00:05.02064,0.03,3,
A-0002,2024-061T22:00:06.02064,0.07,2,
A-0001,2024-061T22:00:11.02064,0.02,2,
A-0002,2024-061T22:00:12.02064,0.05,7,
A-0001,2024-061T22:01:12.365611,0.01,1.5,
A-0002,2024-061T22:01:14.365611,.07,16
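I want to produce a table with the following columns:
Time,A-0001_val1,A-0001_val2,A-0002_val1,A-0002_val2....
Because not all values share the same timestamp, I want to reduce the interval to 1 minute.
So far I have this: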
import pandas as pd
# Read the input table into a DataFrame
df = pd.read_csv('~/Desktop/test_file_1.csv')
# Convert timestamp columns to datetime format with explicit format specification
df['utc'] = pd.to_datetime(df['utc'], format='%Y-%jT%H:%M:%S.%f')
# Round timestamps to the nearest minute
df['utc'] = df['utc'].dt.round('min')
# Pivot the DataFrame
df_pivot = df.pivot_table(index=['utc'], columns='channelId', values=['val1', 'val2'])
df_reset = df_pivot.reset_index()
df_reset['utc'] = pd.to_datetime(df_reset['utc'])
df_reset.set_index('utc', inplace=True)
# Resample the DataFrame to get values for every minute
df_resampled = df_reset.resample('min').last().ffill()
# Flatten multi-level column index
df_resampled.columns = [f'{col[1]}_{col[0]}' for col in df_resampled.columns.values]
# Reset index
df_resampled.reset_index(inplace=True)
# Rename columns
df_resampled.rename(columns={'utc': 'Time'}, inplace=True)
df_final = df_resampled[['Time', *sorted(df_resampled.columns[1:])]]
# Write the output table to a CSV file
df_final.to_csv('output_table_3.csv', index=False)
My output looks like this:
Time, A-0001_val1, A-0001_val2, A-0002_val1, A-0002_val2
2024-03-01 22:00:00,0.02,2,0.05,7
2024-03-01 22:00:01,0.01,1.5,0.06,16
I think it's fine, but I'm curious whether anyone has a better approach.
You're already very close to solving this:
The original data is
channelId;utc;scet;val1;val2
A-0001;2024-061T22:00:05.02064;0.03;3;5
A-0002;2024-061T22:00:06.02064;0.07;2;4
A-0001;2024-061T22:00:11.02064;0.02;2;3
A-0002;2024-061T22:00:12.02064;0.05;7;3
A-0001;2024-061T22:01:12.365611;0.01;1.5;2
A-0002;2024-061T22:01:14.365611;.07;16;3
The code should be
import pandas as pd
df = pd.read_csv(r"C:\Users\s-degossondevarennes\OneDrive - Pricer AB\testfile.csv", sep = ";")
df['utc'] = pd.to_datetime(df['utc'], format='%Y-%jT%H:%M:%S.%f').dt.round('min')
df_pivot = df.pivot_table(index='utc', columns='channelId', values=['val1', 'val2'], aggfunc='mean')
df_pivot.columns = ['{}_{}'.format(col[1], col[0]) for col in df_pivot.columns]
df_pivot.reset_index(inplace=True)
df_pivot.rename(columns={'utc': 'Time'}, inplace=True)
print(df_pivot)
This gives
                 Time  A-0001_val1  A-0002_val1  A-0001_val2  A-0002_val2
0 2024-03-01 22:00:00          2.5          4.5          4.0          3.5
1 2024-03-01 22:01:00          1.5         16.0          2.0          3.0
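
The pivot above groups the columns by value (val1 first, then val2), while the question asks for them grouped by channel (A-0001_val1, A-0001_val2, A-0002_val1, ...). A minimal sketch of one way to get that order is shown below. It assumes the same semicolon-separated sample data, read from an in-memory string so it runs without a file. The helper name pivot_by_channel and its floor flag are hypothetical and only for illustration. The flag switches between rounding to the nearest minute and truncating to the start of the minute, which may be closer to what "reduce the interval to 1 minute" means, depending on the use case.

```python
import io

import pandas as pd

# Sample rows copied from the semicolon-separated data above
RAW = """channelId;utc;scet;val1;val2
A-0001;2024-061T22:00:05.02064;0.03;3;5
A-0002;2024-061T22:00:06.02064;0.07;2;4
A-0001;2024-061T22:00:11.02064;0.02;2;3
A-0002;2024-061T22:00:12.02064;0.05;7;3
A-0001;2024-061T22:01:12.365611;0.01;1.5;2
A-0002;2024-061T22:01:14.365611;.07;16;3
"""


def pivot_by_channel(df, floor=False):
    """Hypothetical helper: one row per minute, columns ordered channel-first."""
    ts = pd.to_datetime(df["utc"], format="%Y-%jT%H:%M:%S.%f")
    # round() goes to the nearest minute; floor() truncates to the start of the minute
    minute = ts.dt.floor("min") if floor else ts.dt.round("min")
    wide = df.assign(Time=minute).pivot_table(
        index="Time", columns="channelId", values=["val1", "val2"], aggfunc="mean"
    )
    # Columns are (value, channel) tuples; flatten them to "channel_value" names
    wide.columns = [f"{channel}_{value}" for value, channel in wide.columns]
    # Sorting the flat names puts each channel's columns next to each other
    wide = wide[sorted(wide.columns)]
    return wide.reset_index()


df = pd.read_csv(io.StringIO(RAW), sep=";")
result = pivot_by_channel(df)
print(result)
```

With this sample, round and floor land in the same minute buckets. They would differ for a timestamp such as 22:00:45, which round moves to 22:01 and floor keeps at 22:00.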