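Suppose we take the mtcars data and run a PCA. We then want to know which makes of car are most similar in PC space, i.e. which are nearest neighbours. So someone ran a nearest-neighbour analysis and recorded the results.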
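What I end up with is a data frame like the one below, with the focal car in the car column and its first and second nearest neighbours in the nn1 and nn2 columns respectively.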
tibble(car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
nn2 = c("Mazda","Mazda", "Honda", "Honda", "Fiat"))
# A tibble: 5 × 3
car nn1 nn2
<chr> <chr> <chr>
1 Honda Toyota Mazda
2 Toyota Honda Mazda
3 Mazda Toyota Honda
4 Fiat Lotus Honda
5 Lotus Mazda Fiat
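I want to convert this into a one-hot style data frame in which the 5 focal car makes are the rows and the possible neighbours are the columns, each cell coded 0 or 1 depending on whether that make is one of the focal car's nearest neighbours. As a tibble, it would look like this: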
# A tibble: 5 × 6
cars Honda Toyota Mazda Fiat Lotus
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Honda 0 1 1 0 0
2 Toyota 1 0 1 0 0
3 Mazda 1 1 0 0 0
4 Fiat 1 0 0 0 1
5 Lotus 0 0 1 1 0
Or it could be a data frame like this:
Honda Toyota Mazda Fiat Lotus
Honda 0 1 1 0 0
Toyota 1 0 1 0 0
Mazda 1 1 0 0 0
Fiat 1 0 0 0 1
Lotus 0 0 1 1 0
This is an adjacency matrix rather than a one-hot encoded matrix. Calling your data df:
library(tidyr)
library(dplyr)
df |>
pivot_longer(-car) |>
mutate(fill = 1) |>
pivot_wider(id_cols = car, names_from = value, values_from = fill, values_fill = 0)
# # A tibble: 5 × 6
# car Toyota Mazda Honda Lotus Fiat
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 Honda 1 1 0 0 0
# 2 Toyota 0 1 1 0 0
# 3 Mazda 1 0 1 0 0
# 4 Fiat 0 0 1 1 0
# 5 Lotus 0 1 0 0 1
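The columns above come out in order of first appearance rather than in the row order the question shows. A minimal sketch of one way to line them up and, if wanted, get the plain matrix form with row names. The names wide, adj_tbl and adj_mat are made up for illustration, and the data is retyped from the question:

```r
library(tidyr)
library(dplyr)
library(tibble)

# Data retyped from the question
df <- tibble(
  car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
  nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
  nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat")
)

# Same pipeline as above
wide <- df |>
  pivot_longer(-car) |>
  mutate(fill = 1) |>
  pivot_wider(id_cols = car, names_from = value,
              values_from = fill, values_fill = 0)

# Put the neighbour columns in the same order as the rows.
# any_of() is used instead of all_of() because a car that is never
# anyone's neighbour would have no column at all.
adj_tbl <- wide |>
  select(car, any_of(df$car))

# Optional: a numeric matrix with the cars as row names
adj_mat <- adj_tbl |>
  column_to_rownames("car") |>
  as.matrix()
```

With this data every car appears as a neighbour at least once, so all five columns exist. If one did not, its column would simply be absent and would have to be added by hand.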
Perhaps you could try table, as follows:
> with(df, table(rep(car, each = ncol(df) - 1), t(df[-1])))
Fiat Honda Lotus Mazda Toyota
Fiat 0 1 1 0 0
Honda 0 0 0 1 1
Lotus 1 0 0 1 0
Mazda 0 1 0 0 1
Toyota 0 1 0 1 0
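table sorts its rows and columns alphabetically, which is why the layout differs from the one in the question. If the original order matters, one option (a sketch, not the only way) is to pass factors with explicit levels. The names lv, adj and adj_df are made up for illustration, and the data is retyped from the question:

```r
# Data retyped from the question
df <- data.frame(
  car = c("Honda", "Toyota", "Mazda", "Fiat", "Lotus"),
  nn1 = c("Toyota", "Honda", "Toyota", "Lotus", "Mazda"),
  nn2 = c("Mazda", "Mazda", "Honda", "Honda", "Fiat")
)

# Desired row and column order: the order of the focal cars
lv <- df$car

# Neighbours flattened row by row: nn1 and nn2 of car 1, then of car 2, ...
neighbours <- as.vector(t(as.matrix(df[-1])))

# Each focal car repeated once per neighbour column
focal <- rep(df$car, each = ncol(df) - 1)

# Explicit factor levels fix the order and keep cars with zero counts
adj <- table(factor(focal, levels = lv), factor(neighbours, levels = lv))

# Optional: a plain data frame with the cars as row names
adj_df <- as.data.frame.matrix(adj)
```

Because the levels are fixed up front, a car that is never anyone's neighbour still gets an all-zero column.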
as.data.frame.matrix(table(reshape2::melt(df, id = 1)[-2]))
#> Fiat Honda Lotus Mazda Toyota
#> Fiat 0 1 1 0 0
#> Honda 0 0 0 1 1
#> Lotus 1 0 0 1 0
#> Mazda 0 1 0 0 1
#> Toyota 0 1 0 1 0