I also want to group by a column and then randomly shuffle blocks of n consecutive rows.
import pandas as pd

df = pd.DataFrame({'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26]})
    grouper_col   b
0             1   1
1             1   2
2             1   3
3             1   4
4             1   5
5             1   6
6             2  21
7             2  22
8             2  23
9             2  24
10            2  25
11            2  26
Then, within each group, I want to shuffle blocks of, say, two consecutive rows, for example:
    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  22
8             2  25
9             2  26
10            2  23
11            2  24
Here each pair of consecutive rows in a group is randomly shuffled with the other pairs of consecutive rows in the same group.
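To make the block structure concrete, here is a minimal sketch that labels each row with the index of the n-row block it falls into within its group. The helper name `label_blocks` is illustrative and not part of the original question; it assumes blocks are aligned to the start of each group.

```python
import pandas as pd

def label_blocks(df, group_col, n):
    """Return, for each row, the index of its n-row block within its group."""
    # cumcount() numbers the rows 0, 1, 2, ... inside each group, in their current order
    position = df.groupby(group_col).cumcount()
    # integer division maps positions 0..n-1 to block 0, n..2n-1 to block 1, and so on
    return position // n

df = pd.DataFrame({
    'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26],
})
blocks = label_blocks(df, 'grouper_col', n=2)
print(blocks.tolist())  # [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
```

Shuffling "two consecutive rows" then amounts to reordering these block labels within each group while keeping the rows inside a block together.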
Here is one way to approach this:
import numpy as np

# find the size of each group
sizes = df.groupby('grouper_col').b.size()
# iterate over the (group key, group size) pairs of that Series
for g, v in sizes.items():
    # a two-row window can start no later than position size - 2,
    # so start positions are drawn from [0, size - 1)
    v -= 1
    # v is now size - 1, so this only touches groups with more than 5 rows
    if v > 4:
        random_s = np.array([0, 0])
        # redraw while the two windows would overlap (start positions less than 2 apart)
        while abs(random_s[0] - random_s[1]) <= 1:
            random_s = np.random.randint(0, v, 2)
        # expand each start position into a two-row window (e.g. [0, 2] to [[0, 1], [2, 3]])
        replace_ix = random_s[:, None] + np.array([0, 1])
        # take a writable copy of the group's values and swap the two windows
        to_replace = df.loc[df.grouper_col.eq(g), 'b'].to_numpy(copy=True)
        repl_1 = to_replace[replace_ix[0]]
        repl_2 = to_replace[replace_ix[1]]
        to_replace[replace_ix[0]] = repl_2
        to_replace[replace_ix[1]] = repl_1
        df.loc[df.grouper_col.eq(g), 'b'] = to_replace
print(df)
    grouper_col   b
0             1   5
1             1   6
2             1   3
3             1   4
4             1   1
5             1   2
6             2  21
7             2  25
8             2  26
9             2  24
10            2  22
11            2  23
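Note that the loop above swaps a single pair of two-row windows per group, and the windows need not start at even positions (in the second group, rows 7-8 were exchanged with rows 10-11). If the goal is instead to permute every aligned block of n rows within each group, as in the example in the question, a sketch along the following lines may work. The function name `shuffle_blocks` and its parameters are hypothetical. It assumes the DataFrame index is unique, that only the value column is reordered, and that a trailing partial block is treated as one shorter block.

```python
import numpy as np
import pandas as pd

def shuffle_blocks(df, group_col, value_col, n, seed=None):
    """Permute aligned blocks of n consecutive rows of value_col within each group."""
    rng = np.random.default_rng(seed)
    out = df.copy()  # leave the caller's DataFrame untouched
    # .groups maps each group key to the index labels of its rows
    for _, idx in df.groupby(group_col).groups.items():
        values = df.loc[idx, value_col].to_numpy()
        # cut the group's values every n rows; the last block may be shorter
        blocks = np.split(values, np.arange(n, len(values), n))
        # draw a random order for the blocks and stitch them back together
        order = rng.permutation(len(blocks))
        out.loc[idx, value_col] = np.concatenate([blocks[i] for i in order])
    return out

df = pd.DataFrame({
    'grouper_col': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    'b': [1, 2, 3, 4, 5, 6, 21, 22, 23, 24, 25, 26],
})
result = shuffle_blocks(df, 'grouper_col', 'b', n=2, seed=0)
print(result)
```

Unlike the swap-based loop, this reorders all blocks at once, so a block can stay in place by chance. Passing a `seed` makes the result reproducible.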