How to create a dendrogram colored by cluster using hclust and cutreeDynamic


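I am working on a clustering problem. I want to build a dendrogram with the hclust function and then use cutreeDynamic to derive clusters from that dendrogram. I have already managed to do that: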

library(dynamicTreeCut)

  # Preprocessing: keep numeric features, plus the class column for filtering
  omicData_clustering <- omicData
  # assumes omicData and clinicDataSVM have their rows in the same order
  omicData_clustering[[classVariable]] <- clinicDataSVM[[classVariable]]

  # keep only the samples of the two classes of interest
  omicData_clustering <- omicData_clustering[omicData_clustering[[classVariable]] %in% c(changedClass, class), ]

  # drop the ID and class columns; !(... %in% ...) is safe even when a column
  # is absent, whereas -which() would then drop every column
  omicData_clustering <- omicData_clustering[, !names(omicData_clustering) %in% idColumn, drop = FALSE]
  omicData_num <- omicData_clustering[, !names(omicData_clustering) %in% classVariable, drop = FALSE]

  # scale the data
  omicData_clustering_scaled <- scale(omicData_num)

  # distance matrix (named d so it does not mask stats::dist)
  d <- dist(omicData_clustering_scaled)

  # hierarchical clustering
  hc <- hclust(d, method = "complete")

  # minimum cluster size = number of samples in the changed class
  num <- sum(clinicDataSVM[[classVariable]] == changedClass)

  # dynamic tree cut; one label per row, 0 = unassigned
  dynamic_clusters <- cutreeDynamic(hc, distM = as.matrix(d), minClusterSize = num)

  # keep labels only for the changed class (still in the original row order)
  labels <- as.character(omicData_clustering[[classVariable]])
  labels[labels != changedClass] <- ""

where dynamic_clusters contains, for example, the following values:

> dynamic_clusters
7 1 4 1 1 3 7 2 4 6 1 1 2 3 3 2 1 2 6 1 1 3 1 2 2 7 1 6 7 1 1 2 1 6 3 7 1 2 7 1 5 2 6 6 7 2 6 6 5 7 3 1 6 5 1 2 2 6 2 1 6 7 4 6 2 1 4 1 6 5 4 4 7 1 
4 1 5 1 1 6 4 2 5 3 1 1 2 6 6 2 1 2 3 1 1 6 1 2 2 4 1 3 4 1 1 2 1 3 6 4 1 2 4 1 7 2 3 3 4 2 3 3 7 4 6 1 3 7 1 2 2 3 2 1 3 4 5 3 2 1 5 1 3 7 5 5 4 1 
4 7 2 2 1 1 5 1 6 3 4 6 7 5 2 7 5 6 5 1 4 4 7 3 5 2 4 2 6 2 7 1 1 1 2 7 2 2 6 7 6 3 6 7 1 5 2 7 4 2 1 3 7 6 1 4 6 2 2 5 7 3 7 2 7 2 6 1 6 6 1 6 1 1 
5 4 2 2 1 1 7 1 3 6 5 3 4 7 2 4 7 3 7 1 5 5 4 6 7 2 5 2 3 2 4 1 1 1 2 4 2 2 3 4 3 6 3 4 1 7 2 4 5 2 1 6 4 3 1 5 3 2 2 7 4 6 4 2 4 2 3 1 3 3 1 3 1 1 
3 7 6 4 7 4 2 2 7 7 7 4 4 5 2 3 4 1 2 4 1 1 3 6 2 6 2 
6 4 3 5 4 5 2 2 4 4 4 5 5 7 2 6 5 1 2 5 1 1 6 3 2 3 2 

and labels contains the following:

> labels
  [1] ""             ""             ""             ""             ""             ""             "Control2Case" ""             ""            
 [10] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [19] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [28] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [37] ""             ""             ""             ""             "Control2Case" ""             ""             "Control2Case" ""            
 [46] ""             ""             ""             "Control2Case" ""             ""             ""             ""             ""            
 [55] ""             ""             ""             ""             ""             ""             ""             "Control2Case" ""            
 [64] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [73] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [82] ""             ""             ""             ""             ""             ""             ""             ""             ""            
 [91] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[100] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[109] ""             ""             ""             ""             ""             ""             "Control2Case" ""             ""            
[118] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[127] ""             ""             ""             ""             ""             ""             ""             ""             "Control2Case"
[136] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[145] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[154] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[163] ""             ""             ""             ""             ""             ""             ""             ""             ""            
[172] ""             ""             ""             ""

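The problem is that I want to plot the dendrogram together with the clusters and identify which clusters the "Control2Case" samples belong to. Is this possible?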
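Identifying the clusters does not need a plot at all: dynamic_clusters and labels are both in the original row order of omicData_clustering, so a cross-tabulation answers the question directly. The sketch below uses stand-ins so that it runs on its own (assumptions, not the asker's data): iris replaces omicData, base stats::cutree() with an arbitrary k = 4 replaces cutreeDynamic(), and the species "virginica" plays the role of "Control2Case".

```r
# Stand-ins so the sketch runs on its own: iris instead of omicData,
# base cutree() instead of cutreeDynamic(), "virginica" instead of "Control2Case"
x  <- scale(iris[, -5])
hc <- hclust(dist(x), method = "complete")
clusters  <- cutree(hc, k = 4)            # one cluster id per row, in row order
is_target <- iris$Species == "virginica"  # TRUE for the samples of interest

# Rows: cluster id; columns: whether the sample belongs to the target class.
# With cutreeDynamic(), a row labelled 0 would hold unassigned samples.
tab <- table(cluster = clusters, target = is_target)
print(tab)

# Cluster ids that contain at least one target sample
target_clusters <- sort(unique(clusters[is_target]))
print(target_clusters)
```

With the objects from the question, the equivalent call would be table(dynamic_clusters, labels == "Control2Case").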

I ran the following code (from https://cran.r-project.org/web/packages/dendextend/vignettes/dendextend.html):

library(dendextend)    # branches_attr_by_clusters(), color_labels(), colored_bars(), %>%
library(dynamicTreeCut)
library(colorspace)    # rainbow_hcl()

data(iris)
x  <- iris[, -5] %>% as.matrix
hc <- x %>% dist %>% hclust
dend <- hc %>% as.dendrogram

# Find special clusters:
clusters <- cutreeDynamic(hc, distM = as.matrix(dist(x)), method = "tree")
# we need to sort them to the order of the dendrogram:
clusters <- clusters[order.dendrogram(dend)]
clusters_numbers <- unique(clusters) - (0 %in% clusters)
n_clusters <- length(clusters_numbers)

cols <- rainbow_hcl(n_clusters)
# iris-specific: color the leaf labels by the true species (column 5)
true_species_cols <- rainbow_hcl(3)[as.numeric(iris[order.dendrogram(dend), 5])]
dend2 <- dend %>%
         branches_attr_by_clusters(clusters, values = cols) %>%
         color_labels(col = true_species_cols)
plot(dend2)
clusters <- factor(clusters)
# iris-specific: reorder the colors by hand so the bars match the branch colors
levels(clusters)[-1] <- cols[-5][c(1,4,2,3)]
colored_bars(clusters, dend, sort_by_labels_order = FALSE)


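But I don't know how to adapt this to my own problem, because the code above contains some lines that are specific to the iris data, and I don't understand why they are there.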
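In the iris example, the true_species_cols line (it colors leaf labels by the known species in column 5) and the hand-ordered palette cols[-5][c(1,4,2,3)] (which appears to be tuned to the clusters iris happens to produce) are the iris-specific parts. The line clusters[order.dendrogram(dend)] is the generic one: cutreeDynamic() returns one label per row in the original row order, while the dendrogram draws its leaves in a different order, which order.dendrogram() reports as original row indices. A small base-R check on iris (default hclust settings, chosen only for illustration) shows that this is the same permutation plot(hc) uses via hc$order.

```r
hc   <- hclust(dist(iris[, -5]))
dend <- as.dendrogram(hc)

# Original row indices of the leaves, from left to right in the plot
leaf_rows <- order.dendrogram(dend)

# Same permutation that plot(hc) uses
stopifnot(all(leaf_rows == hc$order))

# Per-row vectors (cluster ids, class labels) must be reordered identically
# before they can be drawn under the leaves
species_by_leaf <- iris$Species[leaf_rows]
print(head(species_by_leaf))
```

For the question's data, this means dynamic_clusters[order.dendrogram(dend)] and labels[order.dendrogram(dend)] are the vectors to pass to functions that expect dendrogram order.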

Tags: r, cluster-analysis, hierarchical-clustering

1 answer (votes: 0)

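In the development version of ggalign, I introduced a new cutree argument that lets you apply any custom function to cut the tree. Just replace the iris data with your own data. The result is a ggplot-like object, so you can color the branches through an aesthetic mapping.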

library(ggalign)
#> Loading required package: ggplot2
ggstack(iris[, -5L], "v") +
    align_dendro(
        aes(color = branch),
        cutree = function(tree, dist, k, h) {
            dynamicTreeCut::cutreeDynamic(tree, distM = dist, method = "tree")
        }
    ) +
    scale_y_continuous(expand = expansion()) +
    scale_color_brewer(palette = "Dark2") +
    theme(axis.text.x = element_text(angle = -90, hjust = 0))

Created on 2024-10-13 with reprex v2.1.0
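If you would rather not depend on a development version, a base-R sketch can avoid the iris-specific lines altogether by attaching colors to each leaf through its label (row name), so the leaf order never has to be matched by hand. It uses dendrapply() and the nodePar attribute that plot.dendrogram() reads, instead of ggalign or dendextend. As in the question, iris, cutree(k = 4) and "virginica" are stand-ins for omicData, cutreeDynamic() and "Control2Case", and the sample ids s1, s2, ... are hypothetical.

```r
# Stand-ins: iris for omicData, cutree() for cutreeDynamic(), "virginica" for "Control2Case"
x <- scale(iris[, -5])
rownames(x) <- paste0("s", seq_len(nrow(x)))  # hypothetical sample ids used as leaf labels
hc <- hclust(dist(x), method = "complete")
clusters  <- cutree(hc, k = 4)                 # row order, like dynamic_clusters
is_target <- iris$Species == "virginica"       # row order, like labels == "Control2Case"

# Grey for cluster 0 (cutreeDynamic's "unassigned"), then one color per cluster
pal <- c("grey70", hcl.colors(max(clusters), "Dark 3"))
point_col  <- setNames(pal[clusters + 1], rownames(x))
label_text <- setNames(ifelse(is_target, "virginica", ""), rownames(x))

# Decorate every leaf by looking it up through its label (the row name),
# so no manual reordering of colors is needed
dend <- dendrapply(as.dendrogram(hc), function(node) {
  if (is.leaf(node)) {
    id <- attr(node, "label")
    attr(node, "nodePar") <- list(pch = 19, cex = 0.6, col = point_col[[id]],
                                  lab.cex = 0.6)
    attr(node, "label") <- label_text[[id]]    # only target samples keep a label
  }
  node
})

plot(dend, main = "Leaf points colored by cluster; labeled leaves are targets")
```

With the question's objects you would use clusters <- dynamic_clusters and is_target <- labels == "Control2Case", assuming omicData_clustering has unique, explicit row names (scale() drops automatic ones, in which case set them as above).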

