Can I get factors in a data frame from list values?


Each row has several descriptors. I want them to be factors.

data <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery") )
)

This setup produces:

> data$type
Levels: sticky, warm ugly, glittery ugly, matte warm, glittery

But I want it to be:

> data$type
Levels: sticky warm ugly glittery matte

I tried using strsplit() to build a list of values before calling factor(), but that didn't help.
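A likely reason that didn't help: strsplit() needs a character vector (so a factor column has to go through as.character() first), and it returns a list with one character vector per row, whereas factor() expects an atomic vector. A minimal base-R sketch of what flattening does and doesn't give you:

```r
data <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery"))
)

# strsplit() works on character input and returns a list, one element per row
parts <- strsplit(as.character(data$type), ", ")

# Flattening the list yields the five descriptor levels,
# but the link between each descriptor and its row is lost
all_desc <- factor(unlist(parts))
levels(all_desc)
# -> "glittery" "matte" "sticky" "ugly" "warm"

# Same levels in order of first appearance instead of alphabetical order
levels(factor(unlist(parts), levels = unique(unlist(parts))))
# -> "sticky" "warm" "ugly" "matte" "glittery"
```

So the levels themselves are easy to get; the hard part is keeping them attached to the original rows.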

One solution would be to create a separate boolean column for each descriptor, but I'm hoping for something more elegant.
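For comparison, a minimal base-R sketch of that boolean-column alternative, assuming descriptors are always separated by ", " (the column names are simply the descriptors themselves):

```r
data <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery"))
)

parts <- strsplit(as.character(data$type), ", ")
desc_levels <- sort(unique(unlist(parts)))

# One logical column per descriptor: TRUE if the row has that descriptor.
# vapply() returns a descriptors-by-rows matrix, so transpose it to rows-by-descriptors.
flags <- t(vapply(parts, function(p) desc_levels %in% p, logical(length(desc_levels))))
colnames(flags) <- desc_levels

data <- cbind(data, as.data.frame(flags))

# Rows that have the "warm" descriptor
data[data$warm, "type"]
```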

r dataframe factors
1 Answer

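You can create a list column of factors and use all 5 levels for every list element.
How useful that is depends mostly on what you plan to do with it. For example, the data.frame and tibble print methods won't show you the level labels in this kind of structure, and filtering/subsetting becomes more verbose. If at some point in your workflow you really do need factors, you will probably want to reshape the frame.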
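One way to reshape is to a long format with one row per (original row, descriptor) pair, which gives a single ordinary factor column with the five levels. A minimal base-R sketch; the `id` and `desc` column names are illustrative choices, not anything the data requires:

```r
data <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery"))
)

parts <- strsplit(as.character(data$type), ", ")

# Long format: `id` records which original row each descriptor came from
long <- data.frame(
  id   = rep(seq_len(nrow(data)), lengths(parts)),
  desc = factor(unlist(parts))
)

levels(long$desc)
# -> "glittery" "matte" "sticky" "ugly" "warm"

# Original rows that have the "warm" descriptor
unique(long$id[long$desc == "warm"])
# -> 1 3
```

In this shape printing, counting (e.g. `table(long$desc)`) and filtering work on a plain factor column, at the cost of no longer having one row per original observation.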

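Anyway, here is the list-column approach in base R (the native pipe |> and the \(x) lambda shorthand need R 4.1 or later):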

data <- data2 <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery") )
)

# from factors to list column of strings
data$str_lst <- 
  as.character(data$type) |> strsplit(", ")

# build a vector of level labels
desc_levels <- 
  unlist(data$str_lst) |> unique() |> sort()

# from list column of strings to list column of factors
data$fct_lst <- 
  lapply(data$str_lst, factor, levels = desc_levels)

data
#>             type        str_lst fct_lst
#> 1   sticky, warm   sticky, warm    3, 5
#> 2    ugly, matte    ugly, matte    4, 2
#> 3 warm, glittery warm, glittery    5, 1
#> 4 ugly, glittery ugly, glittery    4, 1

str(data)
#> 'data.frame':    4 obs. of  3 variables:
#>  $ type   : Factor w/ 4 levels "sticky, warm",..: 1 3 4 2
#>  $ str_lst:List of 4
#>   ..$ : chr  "sticky" "warm"
#>   ..$ : chr  "ugly" "matte"
#>   ..$ : chr  "warm" "glittery"
#>   ..$ : chr  "ugly" "glittery"
#>  $ fct_lst:List of 4
#>   ..$ : Factor w/ 5 levels "glittery","matte",..: 3 5
#>   ..$ : Factor w/ 5 levels "glittery","matte",..: 4 2
#>   ..$ : Factor w/ 5 levels "glittery","matte",..: 5 1
#>   ..$ : Factor w/ 5 levels "glittery","matte",..: 4 1

str(desc_levels)
#>  chr [1:5] "glittery" "matte" "sticky" "ugly" "warm"

# subset by factor level:
subset(data, sapply(fct_lst, \(x) any(x == "warm")))
#>             type        str_lst fct_lst
#> 1   sticky, warm   sticky, warm    3, 5
#> 3 warm, glittery warm, glittery    5, 1

Or with dplyr and purrr:

library(dplyr, warn.conflicts = FALSE)
library(purrr)

data2 |> 
  as_tibble() |> 
  mutate(str_lst = as.character(type) |> strsplit( ", ")) |> 
  mutate(
    fct_lst = map(
      str_lst, 
      \(x, levels) factor(x, levels = levels), 
      levels = unlist(str_lst) |> unique() |> sort()
    )
  ) |> 
  # rowwise() for slightly less verbose filtering
  rowwise() |> 
  filter(any(fct_lst == "warm")) |> 
  ungroup() |> 
  print() |> 
  glimpse() 
#> # A tibble: 2 × 3
#>   type           str_lst   fct_lst  
#>   <fct>          <list>    <list>   
#> 1 sticky, warm   <chr [2]> <fct [2]>
#> 2 warm, glittery <chr [2]> <fct [2]>

#> Rows: 2
#> Columns: 3
#> $ type    <fct> "sticky, warm", "warm, glittery"
#> $ str_lst <list> <"sticky", "warm">, <"warm", "glittery">
#> $ fct_lst <list> <sticky, warm>, <warm, glittery>
