I have this pseudobulk single-cell dataset.
se1 <- MakePseudoBulk(scData, group.var = "sample_type", assay = "RNA", sample_id = NA, verbose = FALSE)
I am now correlating two different categories on a scatter plot:
# Use the log2 normalized counts; this way it is dominated by highly expressed genes
counts <- assays(se1)$log2_tmm_cpm
# get column names for counts matrix
colnames(counts)
# choose columns for plotting
data.plot <- counts[, c(1, 2, 3)]
library(ggplot2)
library(ggpubr)  # provides stat_cor()
ggplot(data.frame(data.plot), aes(x = data.plot[, 1], y = data.plot[, 2], label = rownames(counts))) +
  geom_point() +
  labs(title = "FLEX_single-cell vs FLEX_tissue", x = "FLEX_single-cell", y = "FLEX_tissue") +
  stat_cor() +
  geom_text(nudge_x = 0.3, nudge_y = 0.3)
Now I am trying to use a density scatter plot to show what the genes have in common:
# Density scatter plot
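# Create a function to generate a density scatter plot
library(ggpointdensity)  # provides geom_pointdensity()
library(viridis)         # provides scale_color_viridis()
density_scatter_plot <- function(data, x_col, y_col, title, x_label, y_label) {
  ggplot(data, aes(x = .data[[x_col]], y = .data[[y_col]])) +
    geom_pointdensity() +  # colour each point by its number of neighbours
    scale_color_viridis() +
    labs(title = title, x = x_label, y = y_label) +
    stat_cor()
}
# Create a density scatter plot for the comparison.
# data.frame() converts the matrix to a data frame and rewrites "-" in column
# names as "." (assuming the original column is named "FLEX_single-cell").
plot1 <- density_scatter_plot(data.frame(data.plot), "FLEX_single.cell", "FLEX_tissue", "FLEX_single.cell vs FLEX_tissue", "FLEX_single.cell", "FLEX_tissue")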
But I get this result, which looks strange. How can I fix the function?
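This is simply the appearance you get when a large number of points are clustered in one place (in this case, the origin). If, say, 20,000 points all sit at the origin, then every point within a fixed distance of the origin has a neighbor count of at least 20,000, which produces a circular region of very high density. Beyond one bandwidth radius from the origin, the count drops to something like 1,000. Even if the density varies considerably outside that circle, the variation is hard to see because it is small compared with the density at the origin.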
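The effect described above can be checked numerically with a minimal base R sketch. The helper `count_neighbors()`, the sample sizes, and the radius of 0.1 are all hypothetical choices made for illustration; in particular, the radius is not meant to match whatever bandwidth `geom_pointdensity()` picks by default.

```r
set.seed(1)

# A diffuse cloud plus a spike of identical points at the origin
# (sizes chosen for illustration only).
n_cloud <- 2000
n_spike <- 2000
cloud <- cbind(rnorm(n_cloud, sd = 2), rnorm(n_cloud, sd = 2))
spike <- matrix(0, nrow = n_spike, ncol = 2)
pts <- rbind(cloud, spike)

# Hypothetical helper: number of points within `radius` of `centre`.
count_neighbors <- function(pts, centre, radius) {
  d2 <- (pts[, 1] - centre[1])^2 + (pts[, 2] - centre[2])^2
  sum(d2 <= radius^2)
}

radius <- 0.1  # assumed radius, not ggpointdensity's default
at_origin  <- count_neighbors(pts, c(0, 0), radius)
off_origin <- count_neighbors(pts, c(1, 1), radius)

# The count at the origin includes every spike point, so it dwarfs
# the count one unit away in each direction.
print(c(at_origin = at_origin, off_origin = off_origin))
```

Because a linear color scale has to span both values, nearly all of its range is spent on the spike, and the differences among the remaining points collapse into a single color.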
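We can reproduce this by making a bivariate normal cloud and adding a large number of extra points at c(0, 0):
set.seed(1)
# Sigma is written as a symmetric covariance matrix; mvrnorm() only reads the
# lower triangle, so this draws the same sample as matrix(c(3, 3, 4, 4), ncol = 2).
df <- as.data.frame(rbind(MASS::mvrnorm(n = 20000, mu = c(0, 0),
                                        Sigma = matrix(c(3, 3, 3, 4), ncol = 2)),
                          as.data.frame(matrix(0, ncol = 2, nrow = 20000))))
df <- setNames(df, c("x", "y"))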
Now we have:
library(ggplot2)
library(ggpointdensity)
ggplot(df, aes(x, y)) +
  geom_pointdensity() +
  scale_color_viridis_c() +
  coord_cartesian(ylim = c(0, 8), xlim = c(0, 8), expand = FALSE)
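There are several ways to make the variation in density easier to see. You could remove all the points at the origin that are causing the problem:
ggplot(df[df[[1]] > 0.01 & df[[2]] > 0.01, ], aes(x, y)) +
  geom_pointdensity() +
  scale_color_viridis_c() +
  coord_cartesian(ylim = c(0, 8), xlim = c(0, 8), expand = FALSE)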
Or change the limits of scale_color_viridis_c, so that everything above the upper limit is drawn in the na.value color:
ggplot(df, aes(x, y)) +
  geom_pointdensity() +
  scale_color_viridis_c(limits = c(0, 3000), na.value = "#f5e51f") +
  coord_cartesian(ylim = c(0, 8), xlim = c(0, 8), expand = FALSE)
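As a possible variation on the capped scale, the out-of-bounds handler can clamp values instead of turning them into NA, which avoids hard-coding a yellow to stand in for the top of the palette. This is a sketch rather than part of the original answer: `capped_scale` is a hypothetical name, and the counts passed to `scales::squish()` are made-up values used only to show the clamping.

```r
library(ggplot2)

# Colour scale capped at 3000 neighbours. `oob = scales::squish` clamps
# out-of-range values to the nearest limit instead of mapping them to NA.
capped_scale <- scale_color_viridis_c(limits = c(0, 3000), oob = scales::squish)

# What squish does to raw neighbour counts (illustrative values only):
squished <- scales::squish(c(10, 2500, 20000), range = c(0, 3000))
print(squished)
```

Adding `capped_scale` to the plot in place of the `scale_color_viridis_c(limits = ..., na.value = ...)` call should color the points at the origin with the top color of the palette.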
Or use a logarithmic color scale:
ggplot(df, aes(x, y)) +
  geom_pointdensity(aes(color = after_stat(log(n_neighbors)))) +
  scale_color_viridis_c() +
  coord_cartesian(ylim = c(0, 8), xlim = c(0, 8), expand = FALSE)
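To see why capping the limits or taking logs helps, it can be useful to look at where a few neighbor counts land on a color scale that runs from 0 to 1. The sketch below uses base R only; the counts and the `rescale01()` helper are hypothetical and chosen for illustration, not taken from the plots above.

```r
# Hypothetical neighbour counts: three "ordinary" points and one at the spike.
counts <- c(10, 100, 1000, 20000)

# Hypothetical helper: map values linearly onto [0, 1], as a continuous
# colour scale does before looking up a colour.
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

linear     <- rescale01(counts)               # default linear scale
capped     <- rescale01(pmin(counts, 3000))   # limits capped at 3000
log_scaled <- rescale01(log(counts))          # logarithmic scale

print(round(rbind(linear, capped, log_scaled), 3))
```

On the linear scale the three ordinary counts are squeezed into the bottom few percent of the palette, whereas capping or log-transforming spreads them over a much larger share of it.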