Select a random sample of one column, grouped by 3 other columns of an R data frame [on hold]

Question

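I need to select a random sample of one column of an R data frame, grouping by three other columns. This is similar to what is discussed here:

and I don't know how to replicate that Python code in R.

Sorry, I had not posted what I have tried so far. I used the data.table package.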

library(data.table)
sample_df <- df[, .SD[sample(x = .N, size = 50)], by = id]

However, I'm not sure how to sample one column while using the other 3 columns as the grouping keys.

Added masked sample data.

df:

col1    col2    col3    col4
A1       ABC    1234     H
A1       ABC    1234    O2
A1       ABC    1234    N
B1       DEF    7787J   C
B1       DEF    7787J   CA
C1       HIJ    8989    CL

Target df:

 col1   col2    col3    col4
 A1     ABC     1234    H or O2 or N
 A1     ABC     1234    H or O2 or N
 B1     DEF     7787J   C
 B1     DEF     7787J   CA
 C1     HIJ     8989    CL
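
For reference, the data.table attempt above could probably be extended to three grouping columns by passing all of them to `by`. The following is only a sketch under stated assumptions: the data frame is rebuilt by hand from the masked sample, and `n_per_group` is a hypothetical cap of 2 inferred from the target table (it is not stated in the question).

```r
library(data.table)

# Masked sample data from the question, rebuilt as a data.table
df <- data.table(
  col1 = c("A1", "A1", "A1", "B1", "B1", "C1"),
  col2 = c("ABC", "ABC", "ABC", "DEF", "DEF", "HIJ"),
  col3 = c("1234", "1234", "1234", "7787J", "7787J", "8989"),
  col4 = c("H", "O2", "N", "C", "CA", "CL")
)

n_per_group <- 2  # hypothetical cap, inferred from the target table

set.seed(42)  # only to make the draw reproducible

# Within each (col1, col2, col3) group, draw up to n_per_group values of col4.
# min(.N, n_per_group) avoids an error when a group has fewer rows than the cap.
sample_df <- df[, .(col4 = col4[sample.int(.N, min(.N, n_per_group))]),
                by = .(col1, col2, col3)]
```

Capping the sample size with `min()` matters here because a fixed `size = 50`, as in the original attempt, fails for any group with fewer than 50 rows when sampling without replacement.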

Answer 1

Base R solution:

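# Shuffle the rows within each Position group.
# Note: sample(x) on a data frame permutes its columns, so rows are indexed explicitly.
sample_df <- do.call("rbind", lapply(split(df, df$Position), function(x) {
  if (nrow(x) > 1) x[sample(nrow(x)), ] else x
}))

Data: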

df <- structure(list(Name = structure(c(4L, 1L, 2L, 6L, 3L, 5L, 4L, 1L, 2L, 3L, 5L, 4L, 1L, 2L, 6L, 3L, 5L, 2L, 6L, 3L, 5L), 
                                              .Label = c("Bob",  "Dave", "Fred", "Jim", "Ray", "Steve"),
                                              class = "factor"), Date = structure(c(1L,  1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
                                                                                    3L, 3L, 3L,  4L, 4L, 4L, 4L), .Label = c("2019-10-19", "2019-10-20", "2019-10-21",  "2019-10-22"), 
                                                                                  class = "factor"), Load = c(900L, 900L, 900L,  850L, 850L, 850L, 789L, 789L, 789L, 960L, 
                                                                                                              960L, 909L, 909L, 909L,  991L, 991L, 991L, 720L, 717L, 717L, 717L), 
                             Position = structure(c(2L,  2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,  1L, 2L, 1L, 1L), 
                                                  .Label = c("Defense", "Forward"), class = "factor")), row.names = c(NA,  -21L), class = "data.frame")
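
The answer above groups by a single column, `Position`. For the question's data, the same split/lapply/rbind pattern might be adapted to three grouping columns by building one combined key with `interaction()`. This is a minimal sketch, not the answerer's code: the data frame is rebuilt from the masked sample in the question, and `n_per_group` is a hypothetical cap of 2 inferred from the target table.

```r
# Masked sample data from the question
df <- data.frame(
  col1 = c("A1", "A1", "A1", "B1", "B1", "C1"),
  col2 = c("ABC", "ABC", "ABC", "DEF", "DEF", "HIJ"),
  col3 = c("1234", "1234", "1234", "7787J", "7787J", "8989"),
  col4 = c("H", "O2", "N", "C", "CA", "CL"),
  stringsAsFactors = FALSE
)

n_per_group <- 2  # hypothetical cap, inferred from the target table

set.seed(42)  # only to make the draw reproducible

# interaction() combines the three columns into one grouping factor;
# drop = TRUE discards combinations that never occur in the data.
key <- interaction(df$col1, df$col2, df$col3, drop = TRUE)

# For each group, keep up to n_per_group randomly chosen rows
sample_df <- do.call("rbind", lapply(split(df, key), function(x) {
  x[sample.int(nrow(x), min(nrow(x), n_per_group)), , drop = FALSE]
}))
rownames(sample_df) <- NULL  # drop the group-prefixed row names added by rbind
```

Rows are selected with `sample.int(nrow(x), ...)` so that a single-row group is returned unchanged and no separate `nrow(x) > 1` check is needed.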
