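I need to draw a random sample from one column of an R data frame, grouping by three other columns. This is similar to an existing discussion that uses Python,
but I don't know how to replicate that Python code in R.
Oops, I had not yet posted what I tried. I used the data.table package: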
library(data.table)
sample_df <- df[, .SD[sample(x = .N, size = 50)], by = id]
However, I'm not sure how to sample one column while using the other 3 columns as the grouping keys.
Added masked sample data:
df:
col1 col2 col3 col4
A1 ABC 1234 H
A1 ABC 1234 O2
A1 ABC 1234 N
B1 DEF 7787J C
B1 DEF 7787J CA
C1 HIJ 8989 CL
Target df:
col1 col2 col3 col4
A1 ABC 1234 H or O2 or N
A1 ABC 1234 H or O2 or N
B1 DEF 7787J C
B1 DEF 7787J CA
C1 HIJ 8989 CL
Base R solution:
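sample_df <- do.call("rbind", lapply(split(df, df$Position), function(x) {
  # sample(x) on a data frame permutes its columns, so shuffle the row indices instead
  if (nrow(x) > 1) x[sample(nrow(x)), , drop = FALSE] else x
}))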
Data:
df <- structure(list(Name = structure(c(4L, 1L, 2L, 6L, 3L, 5L, 4L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 6L, 3L, 5L, 2L, 6L, 3L, 5L),
.Label = c("Bob", "Dave", "Fred", "Jim", "Ray", "Steve"),
class = "factor"), Date = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("2019-10-19", "2019-10-20", "2019-10-21", "2019-10-22"),
class = "factor"), Load = c(900L, 900L, 900L, 850L, 850L, 850L, 789L, 789L, 789L, 960L,
960L, 909L, 909L, 909L, 991L, 991L, 991L, 720L, 717L, 717L, 717L),
Position = structure(c(2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L),
.Label = c("Defense", "Forward"), class = "factor")), row.names = c(NA, -21L), class = "data.frame")
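
For the data in the question itself, the same split/lapply idea can be keyed on all three columns at once. The following is only a minimal sketch: `sample_by_group` is a hypothetical helper (not part of base R or data.table), the data frame is rebuilt by hand from the masked sample in the question, and the cap of 2 rows per group is an assumption read off the target table, where the three-row group shrinks to two rows and the smaller groups are kept whole.

```r
# Sample data rebuilt by hand from the masked example in the question.
df <- data.frame(
  col1 = c("A1", "A1", "A1", "B1", "B1", "C1"),
  col2 = c("ABC", "ABC", "ABC", "DEF", "DEF", "HIJ"),
  col3 = c("1234", "1234", "1234", "7787J", "7787J", "8989"),
  col4 = c("H", "O2", "N", "C", "CA", "CL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an illustration, not an existing function):
# keep at most `n` randomly chosen rows from each group.
sample_by_group <- function(data, group_cols, n) {
  # One key per combination of the grouping columns;
  # drop = TRUE discards combinations that never occur in the data.
  key <- interaction(data[group_cols], drop = TRUE)
  pieces <- lapply(split(data, key), function(g) {
    # Sample row indices without replacement, never asking for more rows than exist.
    idx <- sample.int(nrow(g), size = min(n, nrow(g)))
    g[idx, , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(42)  # only to make the random draw reproducible
sample_df <- sample_by_group(df, c("col1", "col2", "col3"), n = 2)  # n = 2 is an assumption
print(sample_df)
```

Capping the size with `min(n, nrow(g))` is what keeps small groups from raising an error, which is also the likely problem with `size = 50` in the data.table attempt. Under the same assumption, the data.table form would probably be `dt[, .SD[sample(.N, min(.N, 2L))], by = .(col1, col2, col3)]`, with `dt` being the data converted via `as.data.table(df)`.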