如何找出哪个图与另一个图最相似？

Question

我在 R 中有这个矩阵列表：

 my_list =  structure(list(
        matrix(c(2,2,2,2,3, 1,2,2,2,3, 1,2,3,3,3, 1,2,1,3,3, 1,1,1,3,3), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,2,2, 1,1,1,2,2, 1,1,1,3,3, 1,1,3,3,3, 1,1,3,3,3), nrow=5, byrow=TRUE),
        matrix(c(2,2,2,3,3, 2,2,2,3,3, 1,1,3,3,3, 1,1,3,3,3, 1,1,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,3, 2,2,3,3,3, 2,2,2,3,3, 2,2,2,1,1, 1,1,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,3, 1,1,1,2,2, 1,1,1,2,2, 1,1,1,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,3,3, 1,1,1,3,3, 1,1,1,2,2, 1,1,2,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,3,3,2,2, 1,3,3,2,2, 1,1,3,3,2, 1,1,1,3,2, 1,1,1,1,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,2,2, 1,1,1,2,2, 1,1,1,2,3, 1,1,1,2,3, 1,1,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(2,2,2,2,2, 3,3,3,3,3, 3,3,3,3,3, 1,1,3,3,3, 1,1,3,3,3), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,1,1,1,2, 3,1,1,1,2, 3,3,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,1,1,1,1, 3,3,1,1,1, 3,3,2,2,1, 2,2,2,1,1, 2,2,2,1,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,3, 3,3,1,3,1, 2,2,1,1,1, 2,2,1,1,1, 2,2,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,1,1,1, 3,1,1,1,1, 3,2,2,1,1, 3,2,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,1,1,1,2, 1,1,1,1,2, 3,3,3,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,2, 1,1,1,3,2, 1,3,1,3,2, 1,3,3,3,2, 1,1,3,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,3, 3,3,2,2,2, 3,3,2,2,2, 1,1,2,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,3,3, 1,1,1,3,3, 1,1,2,2,3, 1,1,1,2,3, 1,1,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,3, 1,1,2,3,3, 1,1,2,3,3, 1,1,2,2,2, 1,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,2,2,1,1, 1,1,2,2,1, 3,3,2,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,1,1, 1,1,1,1,2, 1,1,1,1,2, 1,2,1,2,2, 1,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(2,3,3,3,3, 2,3,3,3,3, 2,3,3,3,3, 2,2,2,1,1, 1,1,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(2,2,2,2,2, 2,2,2,2,2, 2,2,2,2,2, 2,3,1,1,2, 3,3,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,1,1,1,1, 3,1,1,1,1, 2,2,1,1,1, 2,2,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,3,3, 2,1,1,3,3, 2,2,1,1,3, 2,2,2,1,1, 2,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,1,1,1,1, 3,3,3,3,1, 2,2,2,2,1, 2,2,2,2,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,1, 3,3,2,1,1, 3,3,2,1,1, 3,3,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,1,1, 3,3,3,1,1, 3,2,2,1,1, 2,2,2,1,1, 2,2,2,1,1), nrow=5, byrow=TRUE),
        matrix(c(2,2,2,1,1, 2,2,2,1,1, 2,2,1,1,1, 3,2,2,1,1, 3,3,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,1,1, 1,2,1,1,1, 1,2,2,1,1, 1,1,2,3,3, 1,1,2,3,3), nrow=5, byrow=TRUE),
        matrix(c(1,1,3,3,3, 1,2,2,2,3, 1,2,2,3,3, 1,2,2,3,3, 1,1,1,3,3), nrow=5, byrow=TRUE),
        matrix(c(3,1,1,1,1, 3,1,1,1,1, 3,3,1,1,2, 3,1,1,2,2, 3,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(1,1,1,3,3, 1,1,1,3,3, 2,3,3,3,3, 2,3,3,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,2,2,2,2, 3,2,2,2,2, 3,1,2,1,1, 3,1,1,1,1, 3,3,3,3,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,3, 3,3,3,1,1, 2,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
        matrix(c(3,3,3,3,2, 3,3,3,3,2, 3,1,1,1,2, 3,1,1,1,1, 3,1,1,1,1), nrow=5, byrow=TRUE),
        matrix(c(3,3,2,2,2, 3,1,1,2,2, 3,1,1,2,2, 1,1,1,2,2, 1,1,1,2,2), nrow=5, byrow=TRUE)
    ), class = "list")

然后我使用以下代码绘制了所有这些：

library(ggplot2)
library(gridExtra)
library(reshape2)
library(dplyr)

plot_matrix <- function(mat, plot_number) {
    df <- melt(mat)
    names(df) <- c("row", "col", "value")
    
    df$index <- (df$row - 1) * 5 + df$col
    
    colors <- c(
        "1" = "#FFB3B3",
        "2" = "#B3D9FF",
        "3" = "#B3FFB3"
    )
    
    p <- ggplot(df, aes(x = col, y = -row, fill = factor(value))) +
        geom_tile(color = "black", linewidth = 0.5) +
        geom_text(aes(label = index), size = 3) +
        scale_fill_manual(values = colors) +
        labs(title = paste("Object", plot_number)) +
        coord_equal() +
        theme_minimal() +
        theme(
            legend.position = "none",
            plot.title = element_text(hjust = 0.5, margin = margin(b = 10)),
            axis.text = element_blank(),
            axis.title = element_blank(),
            panel.grid = element_blank(),
            plot.margin = margin(5, 5, 5, 5)
        )
    
    return(p)
}

plot_list <- lapply(seq_along(my_list), function(i) {
    plot_matrix(my_list[[i]], i)
})

n_plots <- length(plot_list)
n_cols <- 6
n_rows <- ceiling(n_plots / n_cols)

grid.arrange(
    grobs = plot_list,
    ncol = n_cols,
    nrow = n_rows,
    padding = unit(2, "mm")
)

我有以下问题：如果我们采用对象 1 - 我们可以根据以下条件找出其余对象中哪个与对象 1“最相似”：A) 颜色分布和 B) 颜色形状边界和 C) 颜色边界的放置？

我目前的方法是分别回答每一个问题并对其进行平均。例如：

A) 找出每个对象的颜色分布作为向量，并取对象 1 和所有其他对象之间的欧氏距离。
B) 和 C) 使用对象 1 和所有其他对象之间的 Jaccard 距离或 Hausdorf 距离
取所有差异的平均值来了解一般相似性。平均值最低的对 (object1, object_i) 最相似

我不确定这种方法有多正确，我想知道是否有更简单的方法。

Answer 1

这个问题很难回答，但是A部分和B部分应该相对简单，

hclust()

并且可以计算某种jaccard距离（尽管不是传统的jaccard“有多少共同点”，因为所有对象都有相同的值（1、2 和 3），但它们处于不同的位置）。

我正在努力解决的部分是：

B) 颜色边界的形状和 C) 颜色边界的位置？

这超出了我的专业领域，但也许聚类类型的方法可能会帮助您入门。

例如

library(tidyverse)
my_list =  structure(list(
  matrix(c(2,2,2,2,3, 1,2,2,2,3, 1,2,3,3,3, 1,2,1,3,3, 1,1,1,3,3), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,2,2, 1,1,1,2,2, 1,1,1,3,3, 1,1,3,3,3, 1,1,3,3,3), nrow=5, byrow=TRUE),
  matrix(c(2,2,2,3,3, 2,2,2,3,3, 1,1,3,3,3, 1,1,3,3,3, 1,1,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,3, 2,2,3,3,3, 2,2,2,3,3, 2,2,2,1,1, 1,1,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,3, 1,1,1,2,2, 1,1,1,2,2, 1,1,1,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,3,3, 1,1,1,3,3, 1,1,1,2,2, 1,1,2,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,3,3,2,2, 1,3,3,2,2, 1,1,3,3,2, 1,1,1,3,2, 1,1,1,1,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,2,2, 1,1,1,2,2, 1,1,1,2,3, 1,1,1,2,3, 1,1,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(2,2,2,2,2, 3,3,3,3,3, 3,3,3,3,3, 1,1,3,3,3, 1,1,3,3,3), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,1,1,1,2, 3,1,1,1,2, 3,3,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,1,1,1,1, 3,3,1,1,1, 3,3,2,2,1, 2,2,2,1,1, 2,2,2,1,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,3, 3,3,1,3,1, 2,2,1,1,1, 2,2,1,1,1, 2,2,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,1,1,1, 3,1,1,1,1, 3,2,2,1,1, 3,2,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,1,1,1,2, 1,1,1,1,2, 3,3,3,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,2, 1,1,1,3,2, 1,3,1,3,2, 1,3,3,3,2, 1,1,3,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,3, 3,3,2,2,2, 3,3,2,2,2, 1,1,2,2,2, 1,1,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,3,3, 1,1,1,3,3, 1,1,2,2,3, 1,1,1,2,3, 1,1,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,3, 1,1,2,3,3, 1,1,2,3,3, 1,1,2,2,2, 1,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,2,2,1,1, 1,1,2,2,1, 3,3,2,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,1,1, 1,1,1,1,2, 1,1,1,1,2, 1,2,1,2,2, 1,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(2,3,3,3,3, 2,3,3,3,3, 2,3,3,3,3, 2,2,2,1,1, 1,1,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(2,2,2,2,2, 2,2,2,2,2, 2,2,2,2,2, 2,3,1,1,2, 3,3,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,1,1,1,1, 3,1,1,1,1, 2,2,1,1,1, 2,2,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,3,3, 2,1,1,3,3, 2,2,1,1,3, 2,2,2,1,1, 2,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,1,1,1,1, 3,3,3,3,1, 2,2,2,2,1, 2,2,2,2,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,1, 3,3,2,1,1, 3,3,2,1,1, 3,3,2,2,2, 3,3,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,1,1, 3,3,3,1,1, 3,2,2,1,1, 2,2,2,1,1, 2,2,2,1,1), nrow=5, byrow=TRUE),
  matrix(c(2,2,2,1,1, 2,2,2,1,1, 2,2,1,1,1, 3,2,2,1,1, 3,3,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,1,1, 1,2,1,1,1, 1,2,2,1,1, 1,1,2,3,3, 1,1,2,3,3), nrow=5, byrow=TRUE),
  matrix(c(1,1,3,3,3, 1,2,2,2,3, 1,2,2,3,3, 1,2,2,3,3, 1,1,1,3,3), nrow=5, byrow=TRUE),
  matrix(c(3,1,1,1,1, 3,1,1,1,1, 3,3,1,1,2, 3,1,1,2,2, 3,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(1,1,1,3,3, 1,1,1,3,3, 2,3,3,3,3, 2,3,3,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,2,2,2,2, 3,2,2,2,2, 3,1,2,1,1, 3,1,1,1,1, 3,3,3,3,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,3, 3,3,3,1,1, 2,1,1,1,1, 2,2,2,2,2, 2,2,2,2,2), nrow=5, byrow=TRUE),
  matrix(c(3,3,3,3,2, 3,3,3,3,2, 3,1,1,1,2, 3,1,1,1,1, 3,1,1,1,1), nrow=5, byrow=TRUE),
  matrix(c(3,3,2,2,2, 3,1,1,2,2, 3,1,1,2,2, 1,1,1,2,2, 1,1,1,2,2), nrow=5, byrow=TRUE)
), class = "list")

library(ggplot2)
library(gridExtra)
#> 
#> Attaching package: 'gridExtra'
#> The following object is masked from 'package:dplyr':
#> 
#>     combine
library(reshape2)
#> 
#> Attaching package: 'reshape2'
#> The following object is masked from 'package:tidyr':
#> 
#>     smiths
library(dplyr)

plot_matrix <- function(mat, plot_number) {
  df <- melt(mat)
  names(df) <- c("row", "col", "value")
  
  df$index <- (df$row - 1) * 5 + df$col
  
  colors <- c(
    "1" = "#FFB3B3",
    "2" = "#B3D9FF",
    "3" = "#B3FFB3"
  )
  
  p <- ggplot(df, aes(x = col, y = -row, fill = factor(value))) +
    geom_tile(color = "black", linewidth = 0.5) +
    geom_text(aes(label = index), size = 3) +
    scale_fill_manual(values = colors) +
    labs(title = paste("Object", plot_number)) +
    coord_equal() +
    theme_minimal() +
    theme(
      legend.position = "none",
      plot.title = element_text(hjust = 0.5, margin = margin(b = 10)),
      axis.text = element_blank(),
      axis.title = element_blank(),
      panel.grid = element_blank(),
      plot.margin = margin(5, 5, 5, 5)
    )
  
  return(p)
}

plot_list <- lapply(seq_along(my_list), function(i) {
  plot_matrix(my_list[[i]], i)
})

n_plots <- length(plot_list)
n_cols <- 6
n_rows <- ceiling(n_plots / n_cols)

grid.arrange(
  grobs = plot_list,
  ncol = n_cols,
  nrow = n_rows,
  padding = unit(2, "mm")
)

starting plot

my_list
#> [[1]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    2    2    2    2    3
#> [2,]    1    2    2    2    3
#> [3,]    1    2    3    3    3
#> [4,]    1    2    1    3    3
#> [5,]    1    1    1    3    3
#> 
#> [[2]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    1    1    2    2
#> [2,]    1    1    1    2    2
#> [3,]    1    1    1    3    3
#> [4,]    1    1    3    3    3
#> [5,]    1    1    3    3    3
#> 
#> ...
#> 
#> [[36]]
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    3    3    2    2    2
#> [2,]    3    1    1    2    2
#> [3,]    3    1    1    2    2
#> [4,]    1    1    1    2    2
#> [5,]    1    1    1    2    2
#> 
#> attr(,"class")
#> [1] "list"

# combine all of the melted matrices into a single dataframe
list_of_dfs <- map(seq_along(my_list), ~melt(my_list[[.x]]) %>%
                     mutate(id = .x) %>%
                     pivot_wider(id_cols = id,
                                 names_from = c(Var1, Var2),
                                 values_from = value)) %>%
  bind_rows()

# get all of the combinations, e.g. Object 1 vs 1, 1 vs 2, etc
combinations <- expand.grid(1:nrow(list_of_dfs), 1:nrow(list_of_dfs)) %>%
  filter(Var1 != Var2)
output <- map2(combinations$Var2, combinations$Var1, ~sum(list_of_dfs[.x,] == list_of_dfs[.y,]))
combinations$total_matches <- unlist(output)

# check the top matches i.e. how many values are in the same place
head(combinations[order(combinations$total_matches, decreasing = TRUE),], 15)
#>     Var1 Var2 total_matches
#> 328   14   10            23
#> 465   10   14            23
#> 125   21    4            21
#> 191   17    6            21
#> 566    6   17            21
#> 704    4   21            21
#> 261   17    8            20
#> 376   27   11            20
#> 568    8   17            20
#> 921   11   27            20
#> 29    30    1            19
#> 42     8    2            19
#> 145    6    5            19
#> 159   20    5            19
#> 175   36    5            19

看起来对象 30 与对象 1 最匹配（即它在矩阵中的相同位置有 19 个值）。使用 hclust 你会得到相同的答案：


list_of_dfs <- map(seq_along(my_list), ~melt(my_list[[.x]]) %>%
                     mutate(id = .x) %>%
                     pivot_wider(id_cols = id,
                                 names_from = c(Var1, Var2),
                                 values_from = value)) %>%
  bind_rows() %>%
  dplyr::mutate(across(-id, ~scale(.x)))

hc <- hclust(dist(list_of_dfs[-1]), method = "complete")
plot(hc)

hclust diagram

^{创建于 2024 年 12 月 16 日，使用 reprex v2.1.0}

希望这有帮助；祝你好运！

如何找出哪个图与另一个图最相似？

问题描述投票：0回答：1

1个回答

最新问题

如何找出哪个图与另一个图最相似？

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1