Repeat empty rows between existing rows, and mutate the empty rows based on a condition


Consider the data frame below:

ID <- rep(c(1, 2,3,4,5))
Sex <- rep(c("Female", "Female","Male", "Female","Male"))
Age <- rep(c(NA,55,70,18,19))
size <-  rep(c(3, 4,6,2,7))
level <- rep(c("student","student","classmates", "parents","classmates"))
data <- data.frame(ID,size,Sex,Age,level)
data
#>   ID size    Sex Age      level
#> 1  1    3 Female  NA    student
#> 2  2    4 Female  55    student
#> 3  3    6   Male  70 classmates
#> 4  4    2 Female  18    parents
#> 5  5    7   Male  19 classmates

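First, I want to add empty rows after each ID according to the size column: for example, ID == 1 should get 3 empty rows because size = 3, and so on. So I would end up with something like this: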

ID <- rep(c(1,1,1,1, 2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,5,5,5,5,5,5,5,5))
Sex <- rep(c("Female",NA,NA,NA,"Female",NA,NA,NA,NA,"Male",NA,NA,NA,NA,NA,NA, "Female",NA,NA,"Male",NA,NA,NA,NA,NA,NA,NA))
Age <- rep(c(NA,NA,NA,NA,55,NA,NA,NA,NA,70,NA,NA,NA,NA,NA,NA, 18,NA,NA,19,NA,NA,NA,NA,NA,NA,NA))
size <- rep(c(3,NA,NA,NA,4,NA,NA,NA,NA,6,NA,NA,NA,NA,NA,NA, 2,NA,NA,7,NA,NA,NA,NA,NA,NA,NA))
level <- rep(c("student",NA,NA,NA,"student",NA,NA,NA,NA,"classmates",NA,NA,NA,NA,NA,NA,"parents",NA,NA,"classmates",NA,NA,NA,NA,NA,NA,NA))
data2 <- data.frame(ID,size,Sex,Age,level)
data2
#>    ID size    Sex Age      level
#> 1   1    3 Female  NA    student
#> 2   1   NA   <NA>  NA       <NA>
#> 3   1   NA   <NA>  NA       <NA>
#> 4   1   NA   <NA>  NA       <NA>
#> 5   2    4 Female  55    student
#> 6   2   NA   <NA>  NA       <NA>
#> 7   2   NA   <NA>  NA       <NA>
#> 8   2   NA   <NA>  NA       <NA>
#> 9   2   NA   <NA>  NA       <NA>
#> 10  3    6   Male  70 classmates
#> 11  3   NA   <NA>  NA       <NA>
#> 12  3   NA   <NA>  NA       <NA>
#> 13  3   NA   <NA>  NA       <NA>
#> 14  3   NA   <NA>  NA       <NA>
#> 15  3   NA   <NA>  NA       <NA>
#> 16  3   NA   <NA>  NA       <NA>
#> 17  4    2 Female  18    parents
#> 18  4   NA   <NA>  NA       <NA>
#> 19  4   NA   <NA>  NA       <NA>
#> 20  5    7   Male  19 classmates
#> 21  5   NA   <NA>  NA       <NA>
#> 22  5   NA   <NA>  NA       <NA>
#> 23  5   NA   <NA>  NA       <NA>
#> 24  5   NA   <NA>  NA       <NA>
#> 25  5   NA   <NA>  NA       <NA>
#> 26  5   NA   <NA>  NA       <NA>
#> 27  5   NA   <NA>  NA       <NA>
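The target data2 above does not have to be typed by hand. A minimal base-R sketch, assuming the data frame `data` from the question (`rows` and `is_copy` are illustrative names): repeat each row index size + 1 times with rep(), then blank out every row except the first one of each ID.

```r
# Rebuild the example data from the question
data <- data.frame(
  ID    = 1:5,
  size  = c(3, 4, 6, 2, 7),
  Sex   = c("Female", "Female", "Male", "Female", "Male"),
  Age   = c(NA, 55, 70, 18, 19),
  level = c("student", "student", "classmates", "parents", "classmates")
)

# Each original row appears once, followed by `size` copies
rows  <- rep(seq_len(nrow(data)), times = data$size + 1)
data2 <- data[rows, ]
rownames(data2) <- NULL

# duplicated() is FALSE only for the first occurrence of each ID,
# so every later row is one of the added rows: set it to NA
is_copy <- duplicated(data2$ID)
data2[is_copy, c("size", "Sex", "Age", "level")] <- NA

data2
```

This keeps ID filled on every row, as in the desired output, and produces the same 27 rows.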

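Second, in data2, for each ID whose level == classmates, I want to fill the "Age" column with random values from a distribution (e.g. a uniform distribution with a = 2, b = 3, or a normal distribution with mean 2 and variance 3). More generally, I want to fill the "Age" column with random values from a different age range depending on level: for level = student the range might be 0 to 18, for parents 30 to 60, and so on. If anything I haven't specified is unclear, feel free to ask and I'll try to explain it better. Thanks in advance!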
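One possible reading of this second requirement, sketched in base R: look up each ID's level from the original `data`, then fill only the added empty rows of Age with draws from a per-level distribution. The choices are assumptions for illustration: uniform ages in [0, 18] for student and [30, 60] for parents (the ranges suggested above), and for classmates the example normal distribution with mean 2 and variance 3. Note that rnorm() takes a standard deviation, so variance 3 becomes sd = sqrt(3). `draw_age()` is a hypothetical helper, not part of any package.

```r
set.seed(42)  # make the random draws reproducible

data <- data.frame(
  ID = 1:5, size = c(3, 4, 6, 2, 7),
  Sex = c("Female", "Female", "Male", "Female", "Male"),
  Age = c(NA, 55, 70, 18, 19),
  level = c("student", "student", "classmates", "parents", "classmates")
)

# Build data2: original row followed by `size` empty rows per ID
data2 <- data[rep(seq_len(nrow(data)), data$size + 1), ]
rownames(data2) <- NULL
is_copy <- duplicated(data2$ID)
data2[is_copy, c("size", "Sex", "Age", "level")] <- NA

# Hypothetical helper: n random ages for one level
draw_age <- function(level, n) {
  switch(level,
    student    = runif(n, min = 0, max = 18),
    parents    = runif(n, min = 30, max = 60),
    classmates = rnorm(n, mean = 2, sd = sqrt(3)),  # variance 3
    rep(NA_real_, n)                                # any other level
  )
}

# The empty rows have NA in `level`, so recover it from the original data
row_level <- data$level[match(data2$ID, data$ID)]

# Fill only the added rows, one level at a time
for (lv in unique(row_level)) {
  hit <- is_copy & row_level == lv
  data2$Age[hit] <- draw_age(lv, sum(hit))
}

data2
```

Wrap the draws in round() if whole-number ages are wanted; a normal draw can also fall below 0, which may need truncating for real ages.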

r dataframe dplyr data.table mutate
Answer 1 (2 votes)

Would something like this work?

library(dplyr)

data %>%
  mutate(new_size = size + 1) %>%
  tidyr::uncount(new_size) %>%
  # n() draws per branch: one independent value per row.
  # Students use 0:18 as in the question (sample(18, ...) would draw from 1:18).
  mutate(new_age = ifelse(level == 'student',
                   sample(0:18, n(), replace = TRUE),
                   ifelse(level == 'classmates',
                   sample(12:24, n(), replace = TRUE),
                   ifelse(level == 'parents',
                   sample(30:60, n(), replace = TRUE), NA))))

This returns the following (new_age is random, so your values will differ):

#   ID size    Sex Age      level new_age
#1   1    3 Female  NA    student      17
#2   1    3 Female  NA    student       8
#3   1    3 Female  NA    student      12
#4   1    3 Female  NA    student       2
#5   2    4 Female  55    student      18
#6   2    4 Female  55    student       5
#7   2    4 Female  55    student       9
#8   2    4 Female  55    student       1
#9   2    4 Female  55    student       1
#10  3    6   Male  70 classmates      23
#11  3    6   Male  70 classmates      17
#12  3    6   Male  70 classmates      24
#13  3    6   Male  70 classmates      24
#14  3    6   Male  70 classmates      20
#15  3    6   Male  70 classmates      17
#16  3    6   Male  70 classmates      17
#17  4    2 Female  18    parents      57
#18  4    2 Female  18    parents      50
#19  4    2 Female  18    parents      54
#20  5    7   Male  19 classmates      13
#21  5    7   Male  19 classmates      20
#22  5    7   Male  19 classmates      12
#23  5    7   Male  19 classmates      13
#24  5    7   Male  19 classmates      16
#25  5    7   Male  19 classmates      23
#26  5    7   Male  19 classmates      17
#27  5    7   Male  19 classmates      24
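A point worth knowing about the nested ifelse() approach: ifelse() evaluates every branch over the whole column and recycles a shorter vector position by position. A branch that draws only sum(level == ...) values therefore reuses those draws in a repeating pattern instead of giving each row its own; drawing n() values per branch gives one independent draw per row. A tiny base-R illustration with made-up vectors:

```r
test <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
yes  <- c(10, 20)  # shorter than `test`, so ifelse() recycles it to length 6

res <- ifelse(test, yes, 0)
# Positions 1 and 3 both receive yes[1], positions 4 and 6 both receive yes[2]
print(res)
```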

Answer 2 (1 vote)

For your first request, this should work:

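library(dplyr)
library(tidyr)

data %>%
  # size + 1 rows per ID: the original row plus `size` empty rows
  uncount(size + 1, .remove = FALSE) %>%
  group_by(ID) %>%
  mutate(idx = row_number(),
         # replace() keeps each column's type, including factors
         size  = replace(size, idx != 1, NA),
         Sex   = replace(Sex, idx != 1, NA),
         Age   = replace(Age, idx != 1, NA),
         level = replace(level, idx != 1, NA)) %>%
  ungroup() %>%
  select(-idx)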
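A caveat on blanking out columns with ifelse(), as in ifelse(idx != 1, NA, Sex): ifelse() builds its result from the logical test, so it drops attributes such as factor levels. Before R 4.0.0, data.frame() turned strings into factors by default, and that call would then return integer codes instead of labels. replace() keeps the column's type. A minimal comparison with a made-up factor:

```r
sex  <- factor(c("Female", "Male", "Female"))
keep <- c(TRUE, FALSE, TRUE)

lost <- ifelse(keep, sex, NA)    # numeric codes, factor levels dropped
kept <- replace(sex, !keep, NA)  # still a factor with its labels

print(lost)
print(kept)
```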

Answer 3 (1 vote)

I think this does what you describe in the question:

library(dplyr)
library(tidyr)

data %>%
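  uncount(size + 1, .remove = FALSE) %>%
  # grouped by level, so n() is the number of rows with that level
  group_by(level) %>% 
  mutate(new_age = case_when(
    level == "student" ~ sample(0:18, n(), replace = TRUE),
    level == "classmates" ~ sample(12:18, n(), replace = TRUE),
    level == "parents" ~ sample(30:60, n(), replace = TRUE)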
  )) %>%
  ungroup()

Output (new_age is random, so your values will differ):

# # A tibble: 27 x 6
#       ID  size Sex      Age level      new_age
#    <dbl> <dbl> <fct>  <dbl> <fct>        <int>
#  1     1     3 Female    NA student          7
#  2     1     3 Female    NA student          6
#  3     1     3 Female    NA student         14
#  4     1     3 Female    NA student          0
#  5     2     4 Female    55 student          1
#  6     2     4 Female    55 student         12
#  7     2     4 Female    55 student         16
#  8     2     4 Female    55 student         15
#  9     2     4 Female    55 student         12
# 10     3     6 Male      70 classmates      18
# 11     3     6 Male      70 classmates      18
# 12     3     6 Male      70 classmates      14
# 13     3     6 Male      70 classmates      14
# 14     3     6 Male      70 classmates      14
# 15     3     6 Male      70 classmates      17
# 16     3     6 Male      70 classmates      16
# 17     4     2 Female    18 parents         34
# 18     4     2 Female    18 parents         51
# 19     4     2 Female    18 parents         34
# 20     5     7 Male      19 classmates      15
# 21     5     7 Male      19 classmates      16
# 22     5     7 Male      19 classmates      14
# 23     5     7 Male      19 classmates      17
# 24     5     7 Male      19 classmates      17
# 25     5     7 Male      19 classmates      14
# 26     5     7 Male      19 classmates      12
# 27     5     7 Male      19 classmates      15
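As a design alternative to the case_when() answer above, the per-level ranges can live in a named list, so adding a level means adding one list entry rather than another branch. A base-R sketch using the same ranges as that answer (0:18, 12:18, 30:60); `age_ranges` and `out` are illustrative names.

```r
set.seed(1)

data <- data.frame(
  ID = 1:5, size = c(3, 4, 6, 2, 7),
  Sex = c("Female", "Female", "Male", "Female", "Male"),
  Age = c(NA, 55, 70, 18, 19),
  level = c("student", "student", "classmates", "parents", "classmates")
)

# Candidate ages per level (same ranges as the case_when() answer)
age_ranges <- list(
  student    = 0:18,
  classmates = 12:18,
  parents    = 30:60
)

# Repeat each row size + 1 times, like uncount(size + 1)
out <- data[rep(seq_len(nrow(data)), data$size + 1), ]
rownames(out) <- NULL
out$new_age <- NA_integer_

# One independent draw per row, filled level by level
for (lv in names(age_ranges)) {
  hit <- out$level %in% lv
  out$new_age[hit] <- sample(age_ranges[[lv]], sum(hit), replace = TRUE)
}

out
```

Rows whose level has no entry in `age_ranges` simply keep new_age = NA, mirroring case_when()'s default.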
© www.soinside.com 2019 - 2025. All rights reserved.