DataFrame with multiple shifts

Question

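I want to repeatedly shift selected columns of a DataFrame, using the number of shifts given for each column in the array nShiftsPerCol. How can I generate an output DataFrame DFO that contains only the columns with a nonzero shift count, with each of those columns shifted that many times? Note that the first shift is zero, i.e. no shift. The shift number should be appended to the column name.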

import pandas as pd 
import numpy as np 

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
print(df)
nCols = df.shape[0]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times

Original DataFrame:

   A  B  C
0  1  2  3
1  2  3  4
2  3  4  5
3  4  5  6
4  5  6  7

Desired output:

   A_0  A_1  A_2  C_0   C_1
0  1    2    3    3     4
1  2    3    4    4     5
2  3    4    5    5     6
3  4    5    NA   6     7
4  5    NA   NA   7     NA
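The desired output corresponds to shifting each column *up* by k rows, which in pandas is `Series.shift` with a negative period: values move toward the top and the vacated bottom rows become NaN. A minimal illustration on column A, assuming the example DataFrame from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})

# shift(-k) moves values up by k rows; the last k rows become NaN
a_1 = df['A'].shift(-1)  # 2.0, 3.0, 4.0, 5.0, NaN
a_2 = df['A'].shift(-2)  # 3.0, 4.0, 5.0, NaN, NaN
print(a_1.tolist())
print(a_2.tolist())
```

The NaN introduced by the shift is also why the shifted columns become float rather than int.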

Answer 1

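First, note that the number of columns is df.shape[1] (df.shape[0] is the number of rows). Then create a Series and filter out the zero values:

# the number of columns is shape[1], not shape[0]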
nCols = df.shape[1]
nShiftsPerCol = np.zeros(nCols)
nShiftsPerCol[0]=3 # shift column A 3 times
nShiftsPerCol[2]=2 # shift column C 2 times

print (nShiftsPerCol)

s = pd.Series(nShiftsPerCol, df.columns).astype(int)
s = s[s!=0]
print (s)
A    3
C    2
dtype: int32

Then loop over it and create the new columns:

for i, x in s.items():
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)

print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN
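Because the loop writes into df, the result still contains the original A, B and C columns, whereas the question asks for an output DataFrame DFO with only the shifted columns. A minimal sketch, assuming the same example data, that collects the shifted columns in a dict and builds DFO from it in a single call (which also avoids inserting columns into df one at a time):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
nShiftsPerCol = np.array([3, 0, 2])  # one entry per column of df

# keep only the columns with a nonzero shift count
s = pd.Series(nShiftsPerCol, index=df.columns).astype(int)
s = s[s != 0]

# build every shifted column as '<name>_<shift>', then assemble DFO at once
shifted = {f'{col}_{k}': df[col].shift(-k)
           for col, n in s.items()
           for k in range(n)}
DFO = pd.DataFrame(shifted)
print(DFO)
```

The dict preserves insertion order, so the columns come out as A_0, A_1, A_2, C_0, C_1.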

Another solution is to store the column names and shift counts in a list of tuples:

L = list(zip(df.columns, nShiftsPerCol.astype(int)))
L = [x for x in L if x[1] != 0]
print (L)
[('A', 3), ('C', 2)]

for i, x in L:
    for y in range(x):
        df['{}_{}'.format(i, y)] = df[i].shift(-y)

print (df)
   A  B  C  A_0  A_1  A_2  C_0  C_1
0  1  2  3    1  2.0  3.0    3  4.0
1  2  3  4    2  3.0  4.0    4  5.0
2  3  4  5    3  4.0  5.0    5  6.0
3  4  5  6    4  5.0  NaN    6  7.0
4  5  6  7    5  NaN  NaN    7  NaN

Answer 2

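You can also build the result directly with itertools.chain and a dict comprehension; this produces a new DataFrame containing only the shifted columns: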

import pandas as pd
from itertools import chain

nShiftsPerCol = [3, 0, 2]
# helper: shift column x of df up by num rows
def col_maker(df, x, num):
    return df[x].shift(-num)
# generate (column, shift) pairs from nShiftsPerCol, skipping zero counts
new_cols = chain.from_iterable([(df.columns[idx], i) for i in range(v)]
                               for idx, v in enumerate(nShiftsPerCol) if v != 0)
# new_cols yields:
# ('A', 0), ('A', 1), ('A', 2), ('C', 0), ('C', 1)
df_desired = pd.DataFrame({col + "_" + str(num): col_maker(df, col, num)
                           for col, num in new_cols})
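Both answers produce float columns (2.0, 3.0, ...) wherever a shift introduces NaN, because NumPy integer arrays cannot hold missing values. If the output should stay integer with NA markers, as in the desired output, one option is pandas' nullable Int64 dtype. A minimal sketch, assuming the example data and shift counts from the question (the name shifts is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [2, 3, 4, 5, 6], 'C': [3, 4, 5, 6, 7]})
shifts = {'A': 3, 'C': 2}  # column -> number of shifts (zero-count columns omitted)

# convert to the nullable Int64 dtype before shifting, so the vacated rows
# are filled with <NA> and the remaining values stay integers
DFO = pd.DataFrame({f'{col}_{k}': df[col].astype('Int64').shift(-k)
                    for col, n in shifts.items()
                    for k in range(n)})
print(DFO)
print(DFO.dtypes)
```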
