Missing values when using transform() and colsplit()

Question (votes: 0, answers: 2)

I have a df:

Name    Letter
 1      A;B;C;D;E
 2      A;B;C;
 3      A;
 4      A;B;C;D;E

I use the following code to build a df in which each Letter is split into its own column:

library(reshape2)

new_df = transform(df, taxa = colsplit(Letter, split = ";", names = c("A", "B", "C", "D", "E"))) 

When I do this, I get a new df that looks like:

Name    .A   .B   .C   .D   .E
  1     A    B    C    D    E
  2     A    B    C    C    C
  3     A    A    A    A    A
  4     A    B    C    D    E

How can I make it so that a missing letter is not replaced by the previous letter, but by a specific indicator such as "unclass"? For example:

Name    .A   .B   .C   .D   .E        
   2     A    B    C    C    C

becomes:

Name    .A   .B   .C       .D       .E
   2     A    B    C    unclass  unclass
r dataframe
2 Answers
2
votes

We can use the cSplit function from the splitstackshape package. After that, replace the NA values with "unclass".

library(splitstackshape)

df2 <- cSplit(df, "Letter", sep = ";", type.convert = FALSE)

df2[is.na(df2)] <- "unclass"

df2
#    Name Letter_1 Letter_2 Letter_3 Letter_4 Letter_5
# 1:    1        A        B        C        D        E
# 2:    2        A        B        C  unclass  unclass
# 3:    3        A  unclass  unclass  unclass  unclass
# 4:    4        A        B        C        D        E

Data

df <- read.table(text = "Name    Letter
 1      A;B;C;D;E
 2      A;B;C;
 3      A;
 4      A;B;C;D;E",
                 header = TRUE, stringsAsFactors = FALSE)
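
If adding a package is not an option, the same split-then-fill idea can be sketched in base R. This is only an illustration, not part of the answer above: the column names Letter_1 to Letter_5 are chosen to mirror the cSplit output, and the names pieces, padded, mat and new_df are made up for the example.

```r
# Example data from the question (trailing ";" kept on purpose).
df <- data.frame(
  Name   = 1:4,
  Letter = c("A;B;C;D;E", "A;B;C;", "A;", "A;B;C;D;E"),
  stringsAsFactors = FALSE
)

# Split each string on ";". strsplit() does not return an empty
# piece for a separator at the very end of the string.
pieces <- strsplit(df$Letter, ";", fixed = TRUE)

# Pad every vector to the longest length; `length<-` fills with NA.
n_col  <- max(lengths(pieces))
padded <- lapply(pieces, `length<-`, n_col)

# Bind the rows into a character matrix, then swap NA for the placeholder.
mat <- do.call(rbind, padded)
mat[is.na(mat)] <- "unclass"
colnames(mat) <- paste0("Letter_", seq_len(n_col))

new_df <- data.frame(Name = df$Name, mat, stringsAsFactors = FALSE)
new_df
```

Padding before binding is the important step: every row then has the same number of entries, so nothing is left for rbind() to recycle.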

1
vote

For a tidyverse-style approach, I offer:

library(tidyr)
library(dplyr)
library(purrr)
library(tibble)

df <- tribble(
  ~name, ~letter,
  1, "A;B;C;D;E",
  2, "A;B;C;E",
  3, "A;",
  4, "A;B;C;D;E",
  5, "D;A;C"
)

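df %>%
  # turn each string into a list of letters, then one row per letter
  mutate(letter = strsplit(letter, ";")) %>%
  unnest(letter) %>%
  # one column per distinct letter; letters absent from a row become NA
  spread(letter, -name) %>%
  # rebuild each column: keep name as text, fill NA, else use the column name
  imap_dfr(~case_when(
    .y == "name" ~ as.character(.x),
    is.na(.x) ~ "unclass",
    TRUE ~ .y
  ))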

# # A tibble: 5 x 6
#   name  A     B       C       D       E      
#   <chr> <chr> <chr>   <chr>   <chr>   <chr>  
# 1 1     A     B       C       D       E      
# 2 2     A     B       C       unclass E      
# 3 3     A     unclass unclass unclass unclass
# 4 4     A     B       C       D       E      
# 5 5     A     unclass C       D       unclass

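Note: the key benefit here is that column positions are respected when the sequence has gaps or is out of order. See the changed values for name == 2 (A;B;C;E) and name == 5 (D;A;C).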
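
The position-respecting behaviour can also be sketched in base R, which may help show what the pipeline is doing. This is a minimal illustration under the assumption that the set of columns is simply every distinct letter found in the data; the names pieces, all_letters, mat and wide are made up for the example.

```r
# Same example data as in this answer, including the gap (row 2)
# and the out-of-order row (row 5).
df <- data.frame(
  name   = 1:5,
  letter = c("A;B;C;D;E", "A;B;C;E", "A;", "A;B;C;D;E", "D;A;C"),
  stringsAsFactors = FALSE
)

pieces <- strsplit(df$letter, ";", fixed = TRUE)

# Columns are the sorted set of all letters seen anywhere in the data.
all_letters <- sort(unique(unlist(pieces)))

# For each row, keep a letter in its own column if it is present,
# otherwise write the placeholder.
mat <- t(vapply(
  pieces,
  function(p) ifelse(all_letters %in% p, all_letters, "unclass"),
  character(length(all_letters))
))
colnames(mat) <- all_letters

wide <- data.frame(name = df$name, mat, stringsAsFactors = FALSE)
wide
```

Here each value is matched to a column by its content, not by its position in the string, which is why D;A;C ends up under A, C and D.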
