Consider the following data frame.
ID <- rep(c(1, 2,3,4,5))
Sex <- rep(c("Female", "Female","Male", "Female","Male"))
Age <- rep(c(NA,55,70,18,19))
size <- rep(c(3, 4,6,2,7))
level <- rep(c("student","student","classmates", "parents","classmates"))
data <- data.frame(ID,size,Sex,Age,level)
data
#>   ID size    Sex Age      level
#> 1  1    3 Female  NA    student
#> 2  2    4 Female  55    student
#> 3  3    6   Male  70 classmates
#> 4  4    2 Female  18    parents
#> 5  5    7   Male  19 classmates
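First, I want to add empty rows for each ID according to the size column: for example, ID == 1 gets 3 empty rows because size = 3, and so on. So I end up with something like this.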
ID <- rep(c(1,1,1,1, 2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,5,5,5,5,5,5,5,5))
Sex <- rep(c("Female",NA,NA,NA,"Female",NA,NA,NA,NA,"Male",NA,NA,NA,NA,NA,NA, "Female",NA,NA,"Male",NA,NA,NA,NA,NA,NA,NA))
Age <- rep(c(NA,NA,NA,NA,55,NA,NA,NA,NA,70,NA,NA,NA,NA,NA,NA, 18,NA,NA,19,NA,NA,NA,NA,NA,NA,NA))
size <- rep(c(3,NA,NA,NA,4,NA,NA,NA,NA,6,NA,NA,NA,NA,NA,NA, 2,NA,NA,7,NA,NA,NA,NA,NA,NA,NA))
level <- rep(c("student",NA,NA,NA,"student",NA,NA,NA,NA,"classmates",NA,NA,NA,NA,NA,NA,"parents",NA,NA,"classmates",NA,NA,NA,NA,NA,NA,NA))
data2 <- data.frame(ID,size,Sex,Age,level)
data2
#>    ID size    Sex Age      level
#> 1   1    3 Female  NA    student
#> 2   1   NA   <NA>  NA       <NA>
#> 3   1   NA   <NA>  NA       <NA>
#> 4   1   NA   <NA>  NA       <NA>
#> 5   2    4 Female  55    student
#> 6   2   NA   <NA>  NA       <NA>
#> 7   2   NA   <NA>  NA       <NA>
#> 8   2   NA   <NA>  NA       <NA>
#> 9   2   NA   <NA>  NA       <NA>
#> 10  3    6   Male  70 classmates
#> 11  3   NA   <NA>  NA       <NA>
#> 12  3   NA   <NA>  NA       <NA>
#> 13  3   NA   <NA>  NA       <NA>
#> 14  3   NA   <NA>  NA       <NA>
#> 15  3   NA   <NA>  NA       <NA>
#> 16  3   NA   <NA>  NA       <NA>
#> 17  4    2 Female  18    parents
#> 18  4   NA   <NA>  NA       <NA>
#> 19  4   NA   <NA>  NA       <NA>
#> 20  5    7   Male  19 classmates
#> 21  5   NA   <NA>  NA       <NA>
#> 22  5   NA   <NA>  NA       <NA>
#> 23  5   NA   <NA>  NA       <NA>
#> 24  5   NA   <NA>  NA       <NA>
#> 25  5   NA   <NA>  NA       <NA>
#> 26  5   NA   <NA>  NA       <NA>
#> 27  5   NA   <NA>  NA       <NA>
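Second, in data2, for each ID where level == "classmates", I want to fill the Age column randomly from a distribution (for example uniform with a = 2, b = 3, or normal with mean 2 and variance 3). More generally, I would like to fill Age with random values whose range depends on level: for level == "student" the range could be 0 to 18, for "parents" it could be 30 to 60, and so on. If my question is not clear enough, feel free to ask and I will try to explain it better. Thanks in advance!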
Would something like this work?
library(dplyr)

data %>%
  # size + 1 copies of each row: the original plus `size` extra rows
  mutate(new_size = size + 1) %>%
  tidyr::uncount(new_size) %>%
  # sample(18, ...) draws from 1:18; ifelse() picks the draws for matching rows
  mutate(new_age = ifelse(level == 'student',
                          sample(18, sum(level == 'student'), replace = TRUE),
                   ifelse(level == 'classmates',
                          sample(12:24, sum(level == 'classmates'), replace = TRUE),
                   ifelse(level == 'parents',
                          sample(30:60, sum(level == 'parents'), replace = TRUE), NA))))
This returns:
#   ID size    Sex Age      level new_age
#1   1    3 Female  NA    student      17
#2   1    3 Female  NA    student       8
#3   1    3 Female  NA    student      12
#4   1    3 Female  NA    student       2
#5   2    4 Female  55    student      18
#6   2    4 Female  55    student       5
#7   2    4 Female  55    student       9
#8   2    4 Female  55    student       1
#9   2    4 Female  55    student       1
#10  3    6   Male  70 classmates      23
#11  3    6   Male  70 classmates      17
#12  3    6   Male  70 classmates      24
#13  3    6   Male  70 classmates      24
#14  3    6   Male  70 classmates      20
#15  3    6   Male  70 classmates      17
#16  3    6   Male  70 classmates      17
#17  4    2 Female  18    parents      57
#18  4    2 Female  18    parents      50
#19  4    2 Female  18    parents      54
#20  5    7   Male  19 classmates      13
#21  5    7   Male  19 classmates      20
#22  5    7   Male  19 classmates      12
#23  5    7   Male  19 classmates      13
#24  5    7   Male  19 classmates      16
#25  5    7   Male  19 classmates      23
#26  5    7   Male  19 classmates      17
#27  5    7   Male  19 classmates      24
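As a possible alternative to the nested ifelse() calls above, the ranges can be kept in a lookup list and one value drawn per row. This is only a sketch in base R: the name age_ranges is made up for illustration, the ranges are copied from the code above, and the seed is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so that repeated runs give the same draws

# One level per expanded row, matching the 27-row result above.
level <- rep(c("student", "student", "classmates", "parents", "classmates"),
             times = c(3, 4, 6, 2, 7) + 1)

# Hypothetical lookup list: the integer ages allowed for each level.
age_ranges <- list(
  student    = 1:18,
  classmates = 12:24,
  parents    = 30:60
)

# Draw one age per row from the range that matches that row's level.
# Indexing with sample.int() avoids the sample(x, 1) surprise when x has length 1.
new_age <- vapply(
  level,
  function(lv) {
    ages <- age_ranges[[lv]]
    ages[sample.int(length(ages), 1)]
  },
  integer(1),
  USE.NAMES = FALSE
)

new_age
```

Adding a new level then only needs a new entry in the list rather than another nested ifelse().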
For your first request, this should work.
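library(dplyr)
library(tidyr)

data %>%
  # size + 1 copies per ID: the original row plus `size` blank rows
  uncount(size + 1, .remove = FALSE) %>%
  group_by(ID) %>%
  mutate(idx = row_number(),
         # blank out every column except ID on the repeated rows;
         # as.character() guards against factor columns (the default before R 4.0)
         size  = ifelse(idx != 1, NA, size),
         Sex   = ifelse(idx != 1, NA, as.character(Sex)),
         Age   = ifelse(idx != 1, NA, Age),
         level = ifelse(idx != 1, NA, as.character(level))) %>%
  ungroup() %>%
  select(-idx)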
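If you would rather not depend on tidyr, the same expansion can be sketched in base R by repeating row indices. This assumes the goal is size + 1 rows per ID, as in data2 from the question; the names row_idx, expanded and is_copy are just illustrative.

```r
# Rebuild the example data from the question.
data <- data.frame(
  ID    = c(1, 2, 3, 4, 5),
  size  = c(3, 4, 6, 2, 7),
  Sex   = c("Female", "Female", "Male", "Female", "Male"),
  Age   = c(NA, 55, 70, 18, 19),
  level = c("student", "student", "classmates", "parents", "classmates"),
  stringsAsFactors = FALSE
)

# Repeat each row index size + 1 times: the original row plus `size` blank rows.
row_idx  <- rep(seq_len(nrow(data)), times = data$size + 1)
expanded <- data[row_idx, ]
rownames(expanded) <- NULL

# TRUE for every repeated copy, FALSE for the first row of each ID.
is_copy <- duplicated(row_idx)

# Blank out everything except ID on the repeated copies.
expanded[is_copy, setdiff(names(expanded), "ID")] <- NA

expanded
```

duplicated() marks every occurrence of a row index after the first, which is exactly the set of rows that should be empty.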
I think this does what you describe in your question.
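library(dplyr)
library(tidyr)

data %>%
  # repeat each row size + 1 times and keep the size column
  uncount(size + 1, .remove = FALSE) %>%
  group_by(level) %>%
  # within each level, draw n() ages from that level's range
  mutate(new_age = case_when(
    level == "student"    ~ sample(0:18, n(), replace = TRUE),
    level == "classmates" ~ sample(12:18, n(), replace = TRUE),
    level == "parents"    ~ sample(30:60, n(), replace = TRUE)
  )) %>%
  ungroup()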
Output:
# # A tibble: 27 x 6
#       ID  size Sex      Age level      new_age
#    <dbl> <dbl> <fct>  <dbl> <fct>        <int>
#  1     1     3 Female    NA student          7
#  2     1     3 Female    NA student          6
#  3     1     3 Female    NA student         14
#  4     1     3 Female    NA student          0
#  5     2     4 Female    55 student          1
#  6     2     4 Female    55 student         12
#  7     2     4 Female    55 student         16
#  8     2     4 Female    55 student         15
#  9     2     4 Female    55 student         12
# 10     3     6 Male      70 classmates      18
# 11     3     6 Male      70 classmates      18
# 12     3     6 Male      70 classmates      14
# 13     3     6 Male      70 classmates      14
# 14     3     6 Male      70 classmates      14
# 15     3     6 Male      70 classmates      17
# 16     3     6 Male      70 classmates      16
# 17     4     2 Female    18 parents         34
# 18     4     2 Female    18 parents         51
# 19     4     2 Female    18 parents         34
# 20     5     7 Male      19 classmates      15
# 21     5     7 Male      19 classmates      16
# 22     5     7 Male      19 classmates      14
# 23     5     7 Male      19 classmates      17
# 24     5     7 Male      19 classmates      17
# 25     5     7 Male      19 classmates      14
# 26     5     7 Male      19 classmates      12
# 27     5     7 Male      19 classmates      15
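The question also mentions drawing from a uniform or normal distribution rather than sampling whole numbers. A rough base R sketch of that idea follows. The bounds table is hypothetical: the student and parents ranges come from the question, the classmates range is borrowed from the code above, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducible draws

# One level per expanded row, matching the 27-row result above.
level <- rep(c("student", "student", "classmates", "parents", "classmates"),
             times = c(3, 4, 6, 2, 7) + 1)

# Hypothetical bounds per level (student/parents from the question,
# classmates from the 12:18 range used above).
bounds <- data.frame(
  level = c("student", "classmates", "parents"),
  lo    = c(0, 12, 30),
  hi    = c(18, 18, 60)
)

# Match each row to its bounds, then draw from a continuous uniform distribution.
# runif() is vectorised over min and max, so no grouping is needed.
i <- match(level, bounds$level)
age_unif <- runif(length(level), min = bounds$lo[i], max = bounds$hi[i])

# Normal alternative from the question (mean 2, variance 3).
# rnorm() takes a standard deviation, so the variance becomes sd = sqrt(3).
age_norm <- rnorm(length(level), mean = 2, sd = sqrt(3))
```

Note that runif() returns non-integer values and rnorm() is unbounded, so if whole or strictly positive ages are needed you would probably want to round or truncate the draws.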