library(readxl)
Data <- read_excel("20240214.xlsx")
library(tidyverse)
Data_frame <- separate(Data, col = "FUN", into = c("Bacteria", "Protein"), remove = TRUE, sep = ";" )
View(Data_frame)
Bacteria     Protein          Log2quantity Sample 1  Log2quantity Sample 2  Log2quantity Sample 3  ...
clostridium  ABC transporter  15.2                   5.2                    2.1
clostridium  Kinase1          8.2                    1.2                    8.2
bacillus     ABC transporter  5.5                    8.8                    24.2
bacillus     Oxidoreductase   3.2                    10.2                   12.2
bacillus     Kinase1          2.1                    1.2                    42.2
firmicutes   Kinase1          9.9                    9.2                    22.2
...          ...              ...                    ...                    ...
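My protein data show (1) the protein name, (2) the bacteria name (deduced from the protein), and (3) the protein quantity.
I want to summarize the proportion of proteins and bacteria in each sample. After that, because there are too many different bacteria and protein names, I want a short data table that contains only the 20 most abundant bacteria and proteins in each sample.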
For instance, it would look like -
clostridium= 15%, bacillus = 5%, firmicutes= 2% ... in sample 1
clostridium= 2%, bacillus = 15%, firmicutes= 42% ...in sample 2
clostridium= 12%, bacillus = 11%, firmicutes= 6% ...in sample 3
ABC transporter= 30%, Kinase1= 15%, Oxidoreductase= 3% ...in sample 1
ABC transporter= 10%, Kinase1= 11%, Oxidoreductase= 12% ...in sample 2
ABC transporter= 20%, Kinase1= 55%, Oxidoreductase= 21% ...in sample 3
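How should I do this in R and export the data to an Excel file? I expect the summary data to be huge, with long lists, so I would like to view it in Excel, where it is more readable.
I am an R beginner and only know some visualization methods with ggplot2. It looks like this can be done with dplyr, but I am not sure how to start. In particular, I would like to know whether R can export relatively large data to an Excel file.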
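You should separate the data into bacteria and protein, calculate the proportions within each sample, and use a filtering operation to select the top 20 entries per sample. You can then write the data to an Excel file.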
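To see what the proportion step does before running it on the full file, here is a minimal sketch on the six example rows from the question. The object names (`toy`, `long`, `bacteria_share`, `bacteria_wide`) are made up for illustration, and the sketch assumes the quantity columns all start with "Log2quantity", as in the question.

```r
library(dplyr)
library(tidyr)

# Toy data: the example rows from the question, after FUN has been split
toy <- tibble::tribble(
  ~Bacteria,     ~Protein,          ~`Log2quantity Sample 1`, ~`Log2quantity Sample 2`, ~`Log2quantity Sample 3`,
  "clostridium", "ABC transporter", 15.2,  5.2,  2.1,
  "clostridium", "Kinase1",          8.2,  1.2,  8.2,
  "bacillus",    "ABC transporter",  5.5,  8.8, 24.2,
  "bacillus",    "Oxidoreductase",   3.2, 10.2, 12.2,
  "bacillus",    "Kinase1",          2.1,  1.2, 42.2,
  "firmicutes",  "Kinase1",          9.9,  9.2, 22.2
)

# Long format: one row per Bacteria / Protein / Sample combination
long <- toy %>%
  pivot_longer(starts_with("Log2quantity"),
               names_to = "Sample", values_to = "Quantity")

# Percentage of each bacterium within each sample
bacteria_share <- long %>%
  group_by(Sample, Bacteria) %>%
  summarise(Total = sum(Quantity, na.rm = TRUE), .groups = "drop") %>%
  group_by(Sample) %>%                        # so sum(Total) is per sample
  mutate(Percent = 100 * Total / sum(Total)) %>%
  ungroup()

# Wide layout: one row per sample, one column per bacterium
bacteria_wide <- bacteria_share %>%
  select(Sample, Bacteria, Percent) %>%
  pivot_wider(names_from = Bacteria, values_from = Percent)

print(bacteria_wide)
```

The percentages in each row of `bacteria_wide` add up to 100. Because the quantities are log2 values, these are shares of summed log2 values; if shares of raw abundance are what is wanted, one option is to back-transform with `2^Quantity` before summing.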
library(readxl)
library(dplyr)
library(tidyr)
library(writexl)
# Read the workbook, split FUN into two columns, and reshape to one row per sample
Data <- read_excel("20240214.xlsx") %>%
  separate("FUN", into = c("Bacteria", "Protein"), sep = ";") %>%
  pivot_longer(cols = starts_with("Log2quantity"), names_to = "Sample", values_to = "Quantity")
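# Share of each bacterium within each sample, keeping the 20 largest per sample
top_20_bacteria <- Data %>%
  group_by(Sample, Bacteria) %>%
  summarise(Total = sum(Quantity, na.rm = TRUE), .groups = "drop") %>%
  group_by(Sample) %>%  # regroup so the proportion and the top 20 are per sample
  mutate(Proportion = Total / sum(Total)) %>%
  slice_max(Proportion, n = 20) %>%
  ungroup()

# Same calculation for proteins
top_20_protein <- Data %>%
  group_by(Sample, Protein) %>%
  summarise(Total = sum(Quantity, na.rm = TRUE), .groups = "drop") %>%
  group_by(Sample) %>%
  mutate(Proportion = Total / sum(Total)) %>%
  slice_max(Proportion, n = 20) %>%
  ungroup()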
# Each named list element becomes one sheet in the workbook
write_xlsx(
  list("Top 20 Bacteria" = top_20_bacteria, "Top 20 Protein" = top_20_protein),
  "Top20_Bacteria_Protein_Summary.xlsx"
)