Each row holds several comma-separated descriptors. I'd like each individual descriptor to be a factor level.
data <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery"))
)
This setup produces:
> data$type
Levels: sticky, warm ugly, glittery ugly, matte warm, glittery
But I want it to be:
> data$type
Levels: sticky warm ugly glittery matte
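I tried using strsplit() to build a list of values before calling factor(), but that didn't help.
One solution is to create a separate logical (boolean) column for each descriptor, but I'm hoping for something more elegant.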
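For comparison, the boolean-column approach mentioned above could look roughly like this minimal base R sketch. The helper names `parts` and `desc_levels` are illustrative only, and it assumes descriptors are always separated by exactly ", ".

```r
data <- data.frame(
  type = c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery")
)

# Split each string into its individual descriptors (assumes ", " separator)
parts <- strsplit(as.character(data$type), ", ", fixed = TRUE)

# Every distinct descriptor, sorted alphabetically
desc_levels <- sort(unique(unlist(parts)))

# Add one logical column per descriptor: TRUE if that row contains it
for (lvl in desc_levels) {
  data[[lvl]] <- vapply(parts, function(x) lvl %in% x, logical(1))
}

data
```

This works, but the number of columns grows with the number of distinct descriptors, which is why it can feel clumsy.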
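You can create a list column of factors and use all 5 levels for every element of the list.
How useful this is depends mostly on what you plan to do with it. For example, the data.frame and tibble print methods won't show you the level labels in this kind of structure, and filtering/subsetting becomes more verbose. If you do need factors at some point in your workflow, you may want to reshape the frame instead.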
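One way to do that reshaping is to go to a long format with one row per descriptor, so a single ordinary factor column holds all 5 levels. This is a minimal base R sketch; the `id` column is a hypothetical addition used to remember which original row each descriptor came from.

```r
# Hypothetical `id` column to track the original row
data <- data.frame(
  id = 1:4,
  type = c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery")
)

# Split each string into its descriptors (assumes ", " separator)
parts <- strsplit(as.character(data$type), ", ", fixed = TRUE)

# Long format: repeat each id once per descriptor it contains
long <- data.frame(
  id = rep(data$id, lengths(parts)),
  descriptor = unlist(parts)
)
long$descriptor <- factor(long$descriptor)

levels(long$descriptor)
# Ids of the original rows that contain "warm": 1 and 3
unique(long$id[long$descriptor == "warm"])
```

In this shape the factor prints with its labels and filtering is a plain comparison, at the cost of no longer having one row per original item.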
Anyway, the list-column version:
data <- data2 <- data.frame(
  type = factor(c("sticky, warm", "ugly, matte", "warm, glittery", "ugly, glittery"))
)
# from factors to list column of strings
data$str_lst <-
as.character(data$type) |> strsplit(", ")
# build a vector of level labels
desc_levels <-
unlist(data$str_lst) |> unique() |> sort()
# from list column of strings to list column of factors
data$fct_lst <-
lapply(data$str_lst, factor, levels = desc_levels)
data
#> type str_lst fct_lst
#> 1 sticky, warm sticky, warm 3, 5
#> 2 ugly, matte ugly, matte 4, 2
#> 3 warm, glittery warm, glittery 5, 1
#> 4 ugly, glittery ugly, glittery 4, 1
str(data)
#> 'data.frame': 4 obs. of 3 variables:
#> $ type : Factor w/ 4 levels "sticky, warm",..: 1 3 4 2
#> $ str_lst:List of 4
#> ..$ : chr "sticky" "warm"
#> ..$ : chr "ugly" "matte"
#> ..$ : chr "warm" "glittery"
#> ..$ : chr "ugly" "glittery"
#> $ fct_lst:List of 4
#> ..$ : Factor w/ 5 levels "glittery","matte",..: 3 5
#> ..$ : Factor w/ 5 levels "glittery","matte",..: 4 2
#> ..$ : Factor w/ 5 levels "glittery","matte",..: 5 1
#> ..$ : Factor w/ 5 levels "glittery","matte",..: 4 1
str(desc_levels)
#> chr [1:5] "glittery" "matte" "sticky" "ugly" "warm"
# subset by factor level:
subset(data, sapply(fct_lst, \(x) any(x == "warm")))
#> type str_lst fct_lst
#> 1 sticky, warm sticky, warm 3, 5
#> 3 warm, glittery warm, glittery 5, 1
Or with dplyr and purrr:
library(dplyr, warn.conflicts = FALSE)
library(purrr)
data2 |>
  as_tibble() |>
  mutate(str_lst = as.character(type) |> strsplit(", ")) |>
  mutate(
    fct_lst = map(
      str_lst,
      \(x, levels) factor(x, levels = levels),
      levels = unlist(str_lst) |> unique() |> sort()
    )
  ) |>
  # rowwise() makes the filter a bit less verbose
  rowwise() |>
  filter(any(fct_lst == "warm")) |>
  ungroup() |>
  print() |>
  glimpse()
#> # A tibble: 2 × 3
#> type str_lst fct_lst
#> <fct> <list> <list>
#> 1 sticky, warm <chr [2]> <fct [2]>
#> 2 warm, glittery <chr [2]> <fct [2]>
#> Rows: 2
#> Columns: 3
#> $ type <fct> "sticky, warm", "warm, glittery"
#> $ str_lst <list> <"sticky", "warm">, <"warm", "glittery">
#> $ fct_lst <list> <sticky, warm>, <warm, glittery>