R density scatter plot from ggplot

Question

I have this pseudobulk single-cell dataset:

se1 <- MakePseudoBulk(scData, group.var = "sample_type", assay = "RNA", sample_id = NA, verbose = FALSE)

I am now correlating two different categories in a scatter plot:

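library(SummarizedExperiment)  # assays()
library(ggplot2)
library(ggpubr)                # stat_cor()

# Use the log2-normalized counts; otherwise the correlation is dominated by highly expressed genes
counts <- assays(se1)$log2_tmm_cpm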

# get column names for counts matrix
colnames(counts)

# choose columns for plotting
data.plot <- counts[, c(1, 2, 3)]

ggplot(data.frame(data.plot), aes(x = data.plot[, 1], y = data.plot[, 2], label = rownames(counts))) +
  geom_point() +
  labs(title = "FLEX_single-cell vs FLEX_tissue", x = "FLEX_single-cell", y = "FLEX_tissue") +
  stat_cor() +
  geom_text(nudge_x = 0.3, nudge_y = 0.3)

Now I am trying to use a density scatter plot to show what the genes have in common:

Density scatter plot

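library(ggplot2)
library(ggpointdensity)  # geom_pointdensity()
library(ggpubr)          # stat_cor()

# Create a function to generate a density scatter plot
density_scatter_plot <- function(data, x_col, y_col, title, x_label, y_label) {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_pointdensity() +      # colour each point by its number of neighbours
    scale_color_viridis_c() +  # ggplot2's built-in viridis colour scale
    labs(title = title, x = x_label, y = y_label) +
    stat_cor()
}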



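# Create the density scatter plot for one comparison.
# ggplot() needs a data frame, so convert the matrix first; data.frame() also
# makes the column names syntactic (e.g. "-" becomes ".").
plot1 <- density_scatter_plot(data.frame(data.plot), "FLEX_single.cell", "FLEX_tissue",
                              "FLEX_single.cell vs FLEX_tissue", "FLEX_single.cell", "FLEX_tissue")
plot1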
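The column name `"FLEX_single.cell"` used in the call is what `data.frame()` produces from a column named `FLEX_single-cell` (the name shown in the first plot's title, assumed here to be the matrix column name). A small base R sketch of that renaming, using a made-up matrix `m` as a stand-in for `data.plot`:

```r
# Hypothetical two-column matrix standing in for data.plot
m <- matrix(1:4, ncol = 2,
            dimnames = list(c("geneA", "geneB"),
                            c("FLEX_single-cell", "FLEX_tissue")))

# data.frame() defaults to check.names = TRUE, which repairs
# non-syntactic names with make.names(): "-" becomes "."
names(data.frame(m))            # "FLEX_single.cell" "FLEX_tissue"
make.names("FLEX_single-cell")  # "FLEX_single.cell"

# check.names = FALSE keeps the original names instead,
# in which case the function would be called with "FLEX_single-cell"
names(data.frame(m, check.names = FALSE))
```

Either convention works with `.data[[x_col]]`; what matters is that the strings passed to the function match the column names of the data frame actually handed to `ggplot()`.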

But the result I get looks strange. How can I fix the function?


Answer 1

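This is simply what you get when a large number of points are piled up in one place (in this case the origin). Suppose 20,000 points all sit at the origin. Then every point within a fixed distance of the origin has at least 20,000 neighbours, which produces a bright circular region. More than one bandwidth radius away from the origin, the count drops to something around 1,000. Even if the density varies considerably outside that circle, the variation is hard to see, because it is still small compared with the density at the origin.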
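To make the "number of neighbours" idea concrete, here is a brute-force base R sketch. It only mimics the idea: ggpointdensity's own radius and implementation may differ, the helper `count_neighbours()` and the radius `r = 0.2` are illustrative assumptions, and the data are scaled down to 2,000 cloud points plus 2,000 origin points so the O(n²) loop stays fast.

```r
# Hypothetical helper: for every point, count the points (itself included)
# that lie within Euclidean distance r. Brute force, O(n^2).
count_neighbours <- function(x, y, r) {
  vapply(seq_along(x), function(i) {
    sum((x - x[i])^2 + (y - y[i])^2 <= r^2)
  }, integer(1))
}

set.seed(1)
n <- 2000
cloud_x <- rnorm(n)
cloud_y <- rnorm(n)

# The same cloud plus n extra points stacked exactly on the origin
x <- c(cloud_x, rep(0, n))
y <- c(cloud_y, rep(0, n))
r <- 0.2
n_nb <- count_neighbours(x, y, r)

is_origin <- seq_along(x) > n
# Cloud points farther than r from the origin cannot see the stacked points
far_from_origin <- c(cloud_x^2 + cloud_y^2 > r^2, rep(FALSE, n))

min(n_nb[is_origin])        # at least n: every origin point counts the whole stack
max(n_nb[far_from_origin])  # cloud points outside the circle see far fewer
```

With a colour scale spanning these counts, all the variation among the cloud points is squeezed into the bottom few percent of the colour bar.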

We can reproduce this by generating a bivariate normal cloud and adding a large number of extra points at c(0, 0):

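set.seed(1)

# 20,000 points from a correlated bivariate normal cloud
# (symmetric covariance matrix: var(x) = 3, var(y) = 4, cov(x, y) = 3)
cloud <- MASS::mvrnorm(n = 20000, mu = c(0, 0),
                       Sigma = matrix(c(3, 3, 3, 4), ncol = 2))

# plus 20,000 extra points stacked exactly on the origin
origin <- matrix(0, ncol = 2, nrow = 20000)

df <- setNames(as.data.frame(rbind(cloud, origin)), c("x", "y"))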

Now we have:

library(ggplot2)
library(ggpointdensity)

ggplot(df, aes(x, y)) +
  geom_pointdensity() +
  scale_color_viridis_c() +
  coord_cartesian(xlim = c(0, 8), ylim = c(0, 8), expand = FALSE)

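There are several ways to make the variation in density more visible. You can remove all the points at the origin that are causing the problem: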

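# Keep only points with both coordinates > 0.01: this drops the pile-up at
# the origin (and everything outside the positive quadrant being shown)
ggplot(df[df$x > 0.01 & df$y > 0.01, ], aes(x, y)) +
  geom_pointdensity() +
  scale_color_viridis_c() +
  coord_cartesian(xlim = c(0, 8), ylim = c(0, 8), expand = FALSE)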
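Before dropping points from real data, it can help to check how many genes actually sit at the origin. A minimal base R sketch on made-up data; the `expr` data frame, its sizes and the exact-zero test are assumptions. With log2 CPM values computed using a prior count, unexpressed genes may sit at a constant floor value rather than exactly 0, in which case compare against the column minimum instead.

```r
set.seed(42)
# Made-up stand-in for two columns of expression values:
# 500 genes with zero in both samples plus 1,500 expressed genes
expr <- data.frame(
  x = c(rep(0, 500), rlnorm(1500)),
  y = c(rep(0, 500), rlnorm(1500))
)

# Genes stacked exactly on the origin
at_origin <- expr$x == 0 & expr$y == 0
sum(at_origin)   # how many points pile up at (0, 0)
mean(at_origin)  # ...and as a fraction of all points

# Keep only the points that are not at the origin
expr_kept <- expr[!at_origin, ]
nrow(expr_kept)
```

Unlike the `> 0.01` threshold filter, this removes only the points exactly at (0, 0) and leaves points near the axes in place.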

Or change the limits of scale_color_viridis_c:

ggplot(df, aes(x, y)) +
  geom_pointdensity() +
  # counts above 3000 fall outside the limits and are drawn in na.value (yellow)
  scale_color_viridis_c(limits = c(0, 3000), na.value = "#f5e51f") +
  coord_cartesian(xlim = c(0, 8), ylim = c(0, 8), expand = FALSE)
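Why `na.value` matters here: by default a continuous ggplot2 scale censors values outside `limits` to `NA` (its default `oob` is `scales::censor`), so every point with more than 3,000 neighbours is drawn in `na.value`, which is chosen to look like the bright yellow end of viridis. The arithmetic can be sketched in base R with made-up neighbour counts; passing `oob = scales::squish` to the scale would instead clamp those values to the top colour.

```r
# Made-up neighbour counts, including one far above the colour limit
n_neighbors <- c(5, 800, 2500, 3000, 20000)
limits <- c(0, 3000)

# What censoring does: out-of-range values become NA (drawn with na.value)
censored <- ifelse(n_neighbors < limits[1] | n_neighbors > limits[2],
                   NA, n_neighbors)

# Position of each value on the colour bar, from 0 (dark) to 1 (yellow)
pos <- (censored - limits[1]) / diff(limits)
round(pos, 3)

# What squishing would do instead: clamp values into the limits
squished <- pmin(pmax(n_neighbors, limits[1]), limits[2])
```

The low counts now spread over most of the colour bar, while the origin pile-up simply saturates.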

Or use a logarithmic colour scale:

ggplot(df, aes(x, y)) +
  # colour by the log of the neighbour count computed by geom_pointdensity
  geom_pointdensity(aes(color = after_stat(log(n_neighbors)))) +
  scale_color_viridis_c() +
  coord_cartesian(xlim = c(0, 8), ylim = c(0, 8), expand = FALSE)
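The log scale helps because it compresses the large counts. Assuming neighbour counts ranging from 1 to 20,000 (the illustrative figures from the start of this answer), a point with 1,000 neighbours sits about 5% of the way along a linear colour bar but about 70% of the way along a log one. A small base R check of that arithmetic (the vector `nb` is hypothetical):

```r
# Hypothetical range of neighbour counts and one mid-density point
nb <- c(min = 1, mid = 1000, max = 20000)

# Fraction of the colour bar reached by the 1,000-neighbour point
linear_pos <- (nb[["mid"]] - nb[["min"]]) / (nb[["max"]] - nb[["min"]])
log_pos <- (log(nb[["mid"]]) - log(nb[["min"]])) /
           (log(nb[["max"]]) - log(nb[["min"]]))

round(c(linear = linear_pos, log = log_pos), 3)
```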
