Reshape a data frame by id and variable type

Question  Votes: 3  Answers: 3

I am having trouble rearranging the following data frame:

dat1 <- data.frame(
   id = rep(1, 4),
   var = paste0(rep(c("firstName",  "secondName"), each= 2), c(rep(1:2, 2))),
   value = c(1:4)
 )
dat2 <- data.frame(
   id = rep(2, 3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[1:3], c(rep(1:2, 2))[1:3]),
   value = c(5:7)
 )
dat = rbind(dat1, dat2)
dat$type = gsub('[0-9]', '', dat$var)
# > dat
#   id         var value       type
# 1  1  firstName1     1  firstName
# 2  1  firstName2     2  firstName
# 3  1 secondName1     3 secondName
# 4  1 secondName2     4 secondName
# 5  2  firstName1     5  firstName
# 6  2  firstName2     6  firstName
# 7  2 secondName1     7 secondName

I would like to get the following result:

id firstName secondName
 1         1          3
 1         2          4
 2         5          7
 2         6         NA

I tried unstack(dat, form = value ~ type), but it does not work.

Update: firstName1 should be paired with secondName1, so if I change dat2 to

dat2 <- data.frame(
   id = rep(2, 3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[2:4], c(rep(1:2, 2))[2:4]),
   value = c(5:7)
 )
# > dat
#    id         var value       type
# 1:  1  firstName1     1  firstName
# 2:  1  firstName2     2  firstName
# 3:  1 secondName1     3 secondName
# 4:  1 secondName2     4 secondName
# 5:  2  firstName2     5  firstName
# 6:  2 secondName1     6 secondName
# 7:  2 secondName2     7 secondName

For id = 2, the (firstName, secondName) rows should then be c(NA, 6) and c(5, 7). How can I handle this case?

r dataframe reshape
3 Answers
6 votes

In my opinion, a better option is to use the rowid function from data.table:

library(data.table)
dcast(setDT(dat), id + rowid(type) ~ type, value.var = 'value')[, type := NULL][]

This gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2         5          7
4:  2         6         NA
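
To see why this works, it can help to look at what rowid(type) returns on its own. The sketch below rebuilds dat from the question; the column names occ and occ_by_id are only illustrative and are not part of the answer above. Note that rowid(type) counts occurrences over the whole table, not within each id, so the counter keeps running into id 2. That is harmless here because id is also on the left-hand side of the formula, and rowid(id, type) would restart the count per id if that reads more naturally.

```r
library(data.table)

# Rebuild the example data from the question
dat <- data.table(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName1", "firstName2", "secondName1"),
  value = 1:7
)
dat[, type := gsub("[0-9]", "", var)]

# occ (illustrative name): running count of each type over the whole table
dat[, occ := rowid(type)]

# occ_by_id (illustrative name): the same count, restarted within each id
dat[, occ_by_id := rowid(id, type)]

print(dat[, .(id, var, occ, occ_by_id)])
```

Both counters pair the n-th firstName with the n-th secondName inside an id, which is why this approach depends on row order and not on the digit at the end of var.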

For the updated question:

setDT(dat)[, num := gsub('.*([0-9])', '\\1', var)
           ][, dcast(.SD, id + num ~ type, value.var = 'value')
             ][, num := NULL][]

This gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2        NA          6
4:  2         5          7
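
A short sketch of what the pattern in num does, in case the regex is not obvious. Because .* is greedy, the capture group is left with only the final digit of each string. The value "secondName12" is a hypothetical input that does not occur in the question; it is included only to show that a two-digit suffix would be cut down to its last digit. The second pattern is my own assumed variant for that case, not something from the answer above.

```r
# "secondName12" is a hypothetical value, added to show the multi-digit case
var <- c("firstName2", "secondName1", "secondName12")

# Pattern from the answer: greedy .* leaves only the final digit in the group
last_digit <- gsub(".*([0-9])", "\\1", var)

# Assumed variant: drop everything up to the last non-digit character,
# which keeps the whole trailing run of digits
suffix <- sub("^.*[^0-9]", "", var)

print(last_digit)
print(suffix)
```

For the data in the question every suffix is a single digit, so both patterns give the same key.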

3 votes

Try dcast:

res <- data.table::dcast(
    dat,
    id  + substring(as.character(var), nchar(as.character(var))) ~ type,
    value.var = 'value')

res[2] <- NULL

# > res
#   id firstName secondName
# 1  1         1          3
# 2  1         2          4
# 3  2         5          7
# 4  2         6         NA

substring(as.character(var), nchar(as.character(var))) extracts the last character of the second column (var), which is then used as the grouping variable.
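
As a minimal illustration of that expression, here it is applied to the var values of id 2 from the updated question. I am assuming var is stored as a factor, which was the data.frame default before R 4.0; nchar() refuses factor input, which is why the as.character() call is needed. The names v and key are only illustrative.

```r
# var values for id 2 in the updated question, stored as a factor
var <- factor(c("firstName2", "secondName1", "secondName2"))

v <- as.character(var)

# Start at the last character; substring() then runs to the end of the string
key <- substring(v, nchar(v))

print(key)
```

Since the key comes from the suffix itself and not from row order, firstName2 ends up in the same group as secondName2.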


3 votes

library(tidyr)

rbind(dat1,dat2) %>% separate(var,c("name","index"),"(?=\\d+$)") %>%
spread(key=name,value=value)

Result

  id index firstName secondName
1  1     1         1          3
2  1     2         2          4
3  2     1         5          7
4  2     2         6         NA

Note

If you want to drop the index column, add %>% dplyr::select(-index) at the end.
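
In tidyr 1.0.0 and later, spread() is marked as superseded in favour of pivot_wider(). The following is a rough sketch of the same idea with pivot_wider(), assuming that version is installed. It splits var with base sub() calls in place of separate(), which is my own substitution to keep the example simple. The names long, wide and result are only illustrative.

```r
library(tidyr)

# Rebuild the example data from the question
long <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName1", "firstName2", "secondName1"),
  value = 1:7
)

# Split var into the text part and the trailing digits
long$name  <- sub("[0-9]+$", "", long$var)
long$index <- sub("^.*[^0-9]", "", long$var)
long$var   <- NULL

# One row per (id, index); one column per distinct name, NA where missing
wide <- pivot_wider(long, names_from = name, values_from = value)

# Drop the helper column if it is not needed
result <- wide[, c("id", "firstName", "secondName")]
print(result)
```

The combination id = 2, index = 2 has no secondName value, so pivot_wider() fills that cell with NA.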
