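I have two large DataFrames, cl and cb, that describe a trading limit order book over time. cl contains the levels (think prices) and cb contains the sizes (think orders).
I would like to combine them into a single DataFrame in which every price (value) that appears in cl becomes a column, and the associated size from cb is the value in that column for the row at a given time of day.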
cl
2023-08-14 06:30:01 4470.75 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5
2023-08-14 06:30:02 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75 4476.0 4476.25 4476.5 4476.75
2023-08-14 06:30:03 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75 4476.0 4476.25 4476.5 4476.75 4477.0 4477.25
2023-08-14 06:30:04 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75 4476.0
2023-08-14 06:30:05 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75 4476.0 4476.25
2023-08-14 06:30:06 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75
2023-08-14 06:30:07 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75 4475.0 4475.25 4475.5 4475.75
2023-08-14 06:30:08 4470.0 4470.25 4470.5 4470.75 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75
2023-08-14 06:30:09 4469.5 4469.75 4470.0 4470.25 4470.5 4470.75 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25
2023-08-14 06:30:10 4470.0 4470.25 4470.5 4470.75 4471.0 4471.25 4471.5 4471.75 4472.0 4472.25 4472.5 4472.75 4473.0 4473.25 4473.5 4473.75 4474.0 4474.25 4474.5 4474.75
cb
2023-08-14 06:30:01 38.0 45.0 105.0 53.0 49.0 42.0 68.0 49.0 32.0 26.0 -21.0 -33.0 -33.0 -60.0 -49.0 -47.0 -48.0 -72.0 -76.0 -70.0
2023-08-14 06:30:02 69.0 64.0 55.0 59.0 53.0 59.0 41.0 46.0 51.0 26.0 -41.0 -48.0 -66.0 -61.0 -67.0 -44.0 -78.0 -72.0 -54.0 -61.0
2023-08-14 06:30:03 54.0 54.0 54.0 56.0 54.0 50.0 43.0 41.0 52.0 40.0 -1.0 -41.0 -56.0 -41.0 -73.0 -44.0 -47.0 -47.0 -58.0 -76.0
2023-08-14 06:30:04 100.0 43.0 53.0 67.0 59.0 41.0 41.0 40.0 42.0 23.0 -25.0 -34.0 -54.0 -57.0 -61.0 -67.0 -49.0 -55.0 -40.0 -93.0
2023-08-14 06:30:05 43.0 53.0 69.0 50.0 42.0 43.0 43.0 41.0 31.0 6.0 -36.0 -45.0 -58.0 -62.0 -60.0 -48.0 -56.0 -41.0 -94.0 -45.0
2023-08-14 06:30:06 70.0 101.0 44.0 53.0 72.0 51.0 43.0 43.0 41.0 42.0 -13.0 -41.0 -41.0 -56.0 -59.0 -61.0 -59.0 -45.0 -56.0 -41.0
2023-08-14 06:30:07 66.0 101.0 42.0 54.0 48.0 51.0 45.0 42.0 30.0 30.0 -15.0 -39.0 -53.0 -61.0 -57.0 -60.0 -57.0 -41.0 -53.0 -42.0
2023-08-14 06:30:08 67.0 46.0 48.0 36.0 67.0 99.0 39.0 50.0 36.0 46.0 -8.0 -39.0 -43.0 -50.0 -47.0 -51.0 -49.0 -58.0 -53.0 -79.0
2023-08-14 06:30:09 94.0 54.0 59.0 45.0 46.0 30.0 45.0 95.0 27.0 30.0 -26.0 -44.0 -42.0 -53.0 -56.0 -50.0 -44.0 -47.0 -46.0 -55.0
I want something like this (not the actual expected output for my example, but an illustration of the desired output):
df
4469.5 4469.75 4470 4470.25 4470.5 4470.75 4471 4471.25 4471.5 4471.75 4472 4472.25 4472.5 4472.75 4473 4473.25 4473.5 4473.75 4474 4474.25 4474.5 4474.75 4475 4475.25 4475.5 4475.75 4476 4476.25 4476.5 4476.75 4477 4477.25
0 0 0 0 0 0 0 38 45 105 53 49 42 68 49 32 26 -21 -33 -33 -60 -49 -47 -48 -72 -76 -70 0 0 0 0 0
0 0 0 0 0 0 69 64 55 59 53 59 41 46 51 26 -41 -48 -66 -61 -67 -44 -78 -72 -54 -61 0 0 0 0 0 0
0 0 0 0 54 54 54 56 54 50 43 41 52 40 -1 -41 -56 -41 -73 -44 -47 -47 -58 -76 0 0 0 0 0 0 0 0
0 0 0 0 0 0 100 43 53 67 59 41 41 40 42 23 -25 -34 -54 -57 -61 -67 -49 -55 -40 -93 0 0 0 0 0 0
0 0 0 0 0 0 0 0 43 53 69 50 42 43 43 41 31 6 -36 -45 -58 -62 -60 -48 -56 -41 -94 -45 0 0 0 0
0 0 0 0 0 0 0 0 0 0 70 101 44 53 72 51 43 43 41 42 -13 -41 -41 -56 -59 -61 -59 -45 -56 -41 0 0
0 0 0 0 0 0 0 0 0 0 0 0 66 101 42 54 48 51 45 42 30 30 -15 -39 -53 -61 -57 -60 -57 -41 -53 -42
0 0 0 0 0 0 0 0 0 0 67 46 48 36 67 99 39 50 36 46 -8 -39 -43 -50 -47 -51 -49 -58 -53 -79 0 0
0 0 0 0 0 0 0 0 94 54 59 45 46 30 45 95 27 30 -26 -44 -42 -53 -56 -50 -44 -47 -46 -55 0 0 0 0
0 0 0 0 0 0 0 61 46 50 36 51 95 35 42 31 26 -17 -37 -56 -46 -46 -44 -45 -52 -56 -60 0 0 0 0 0
0 0 0 0 0 0 30 45 99 43 48 39 48 30 25 35 -30 -50 -47 -47 -54 -54 -60 -61 -41 -60 0 0 0 0 0 0
0 0 0 0 0 43 42 29 48 99 32 39 39 44 19 -10 -36 -44 -56 -48 -49 -56 -55 -60 -62 0 0 0 0 0 0 0
0 0 0 0 61 46 50 32 37 90 33 43 42 1 -35 -54 -54 -61 -57 -49 -51 -56 -57 -68 0 0 0 0 0 0 0 0
0 0 0 47 35 41 110 43 45 49 33 31 29 -19 -57 -59 -52 -51 -58 -57 -62 -76 -54 0 0 0 0 0 0 0 0 0
0 0 48 35 41 110 34 44 49 32 32 12 -9 -44 -60 -51 -52 -58 -57 -62 -75 -54 0 0 0 0 0 0 0 0 0 0
0 49 34 40 109 34 47 49 34 39 23 -12 -42 -56 -52 -52 -60 -58 -63 -74 -54 0 0 0 0 0 0 0 0 0 0 0
46 33 37 111 32 42 50 34 46 28 -15 -24 -54 -50 -49 -58 -58 -62 -75 -54 0 0 0 0 0 0 0 0 0 0 0 0
0 48 35 40 111 39 49 56 41 55 28 -21 -39 -59 -51 -54 -61 -58 -63 -76 -54 0 0 0 0 0 0 0 0 0 0 0
48 46 42 116 37 46 53 38 59 31 -20 -44 -61 -61 -54 -61 -61 -63 -76 -55 0 0 0 0 0 0 0 0 0 0 0 0
0 0 46 51 116 35 47 53 42 65 38 30 -28 -59 -63 -56 -62 -58 -63 -77 -56 -68 0 0 0 0 0 0 0 0 0 0
Solution: unpivot + merge + pivot
Code
import pandas as pd

# One column per book level: '1' .. '20'
levels = [str(i) for i in range(1, 21)]

def read_book(path):
    # Each line is "date time v1 ... v20"; split() tolerates repeated spaces
    with open(path, 'r', encoding="utf-8") as file:
        rows = [line.split() for line in file if line.strip()]
    df = pd.DataFrame(rows, columns=['date', 'time'] + levels)
    df[levels] = df[levels].astype(float)  # parse the text values as numbers
    return df

df_cl = read_book('cl.txt')
df_cb = read_book('cb.txt')

# Unpivot: one row per (date, time, level)
df_cl_unpivot = pd.melt(df_cl, id_vars=['date', 'time'], value_vars=levels)
df_cb_unpivot = pd.melt(df_cb, id_vars=['date', 'time'], value_vars=levels)

# Merge: pair each price with the size at the same timestamp and level
df_cl_cb_join = pd.merge(df_cl_unpivot, df_cb_unpivot, on=['date', 'time', 'variable'])
df_final = df_cl_cb_join.drop('variable', axis=1)
df_final = df_final.rename(columns={'value_x': 'column', 'value_y': 'value'})

# Pivot: prices become columns; prices not quoted at a timestamp are filled with 0
df_final = df_final.pivot(index=['date', 'time'], columns='column', values='value').fillna(0)
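The code above reads the two tables from text files. If cl and cb are already DataFrames in memory, as the question describes, the same unpivot + merge + pivot idea might be sketched as below. This is only an illustration under stated assumptions: both frames are indexed by timestamp and use the same column labels for the book levels, and the helper names to_long and combine_book are hypothetical, not part of pandas. The toy data is the first three levels of the 06:30:01 and 06:30:06 rows from the question.

```python
import pandas as pd

def to_long(df, value_name):
    # Unpivot: one row per (time, level). Assumes the index holds the timestamps.
    out = df.rename_axis("time").reset_index()
    return out.melt(id_vars="time", var_name="level", value_name=value_name)

def combine_book(cl, cb):
    # Merge: the inner join keeps only timestamps and levels present in both frames.
    joined = to_long(cl, "price").merge(to_long(cb, "size"), on=["time", "level"])
    # Pivot: every distinct price becomes a column; unquoted prices become 0.
    return joined.pivot(index="time", columns="price", values="size").fillna(0)

# Toy data: first three levels of two rows taken from the question.
idx = pd.to_datetime(["2023-08-14 06:30:01", "2023-08-14 06:30:06"])
cl = pd.DataFrame([[4470.75, 4471.0, 4471.25], [4471.0, 4471.25, 4471.5]], index=idx)
cb = pd.DataFrame([[38.0, 45.0, 105.0], [70.0, 101.0, 44.0]], index=idx)

result = combine_book(cl, cb)
print(result)
```

For this toy input the result has the columns 4470.75, 4471.0, 4471.25 and 4471.5; the 06:30:01 row holds 38, 45, 105, 0 and the 06:30:06 row holds 0, 70, 101, 44. Because the merge is an inner join, a timestamp that exists in cl but not in cb (such as 06:30:10 in the sample) is dropped rather than filled.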