How to merge the values of duplicate rows into a single row

Question Votes: 0 Answers: 2

I have a data frame like this:

df <- structure(list(A = c(2, 3, 1), B = c(3, 2, 1), C = c(4, 5, 1), 
    D = c(4, 4, 1), Genus = c("Ensifer", "Ensifer", "Ensifer"
    )), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

      A     B     C     D Genus  
  <dbl> <dbl> <dbl> <dbl> <chr>  
1     2     3     4     4 Ensifer
2     3     2     5     4 Ensifer
3     1     1     1     1 Ensifer

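This data frame has five columns: four hold values and the fifth holds a name. The same name, Ensifer, is repeated in several rows, but I want it to appear only once, with all the values summed into a single row, like this: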

      A     B     C     D Genus  
  <dbl> <dbl> <dbl> <dbl> <chr>  
1     6     6    10     9 Ensifer

I want to do this in R because the data set is very large.

I have already tried this code, but it takes too long:

count <- read.csv("count_data.csv", header=T)
shl <- aggregate(count, by=count['Genus'], sum)
r dataframe aggregate summarize
2 Answers

0 votes

Use the aggregate() function together with subsetting:

count <- read.csv("count_data.csv", header = TRUE)
# Keep only the grouping column and the numeric columns to be summed
count_subset <- count[, c("Genus", "Sample1", "Sample2", "Sample3")]
# Group by Genus and sum every other column (the "." on the left-hand side);
# the formula interface keeps the original column names, so no renaming is needed
count_agg <- aggregate(. ~ Genus, data = count_subset, FUN = sum)

count_agg

Replace the Sample column names with the ones shown in your image.
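
As a minimal sketch, the same formula approach can be applied to the example data from the question. Here the columns are A to D rather than the placeholder Sample1 to Sample3 names, and df is rebuilt as a plain data.frame (an assumption made only to keep the example self-contained).

```r
# Example data from the question, rebuilt as a plain data.frame
df <- data.frame(
  A = c(2, 3, 1),
  B = c(3, 2, 1),
  C = c(4, 5, 1),
  D = c(4, 4, 1),
  Genus = c("Ensifer", "Ensifer", "Ensifer")
)

# "." stands for every column other than Genus;
# each of those columns is summed within each Genus
count_agg <- aggregate(. ~ Genus, data = df, FUN = sum)
count_agg
```

Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), so the data may need checking for missing values first.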


0 votes

We can also do it like this:

dplyr: here we apply the sum() function across columns A:D.

library(dplyr)
df %>% 
  summarise(across(A:D, sum), .by=Genus)

  Genus       A     B     C     D
  <chr>   <dbl> <dbl> <dbl> <dbl>
1 Ensifer     6     6    10     9
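
The .by argument of summarise() is only available in dplyr 1.1.0 and later. For older versions, a roughly equivalent sketch uses group_by(). It also selects the columns with where(is.numeric) instead of A:D, which assumes that every numeric column should be summed.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  A = c(2, 3, 1),
  B = c(3, 2, 1),
  C = c(4, 5, 1),
  D = c(4, 4, 1),
  Genus = c("Ensifer", "Ensifer", "Ensifer")
)

result <- df %>%
  group_by(Genus) %>%                              # one group per Genus
  summarise(across(where(is.numeric), sum),        # sum every numeric column
            .groups = "drop")                      # return an ungrouped result
result
```

Selecting by type avoids listing the columns by hand when the real data has many sample columns.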

Base R 1:

result <- aggregate(df[, 1:4], by = list(df$Genus), FUN = sum)
names(result)[1] <- "Genus"

    Genus A B  C D
1 Ensifer 6 6 10 9

Base R 2:

t(sapply(split(df[, 1:4], df$Genus), colSums))

        A B  C D
Ensifer 6 6 10 9
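
The split()/colSums() approach returns a matrix with the Genus values as row names rather than a data frame. If a data frame with an ordinary Genus column is wanted, one possible sketch is the following (the names m and result are illustrative only).

```r
# Example data from the question
df <- data.frame(
  A = c(2, 3, 1),
  B = c(3, 2, 1),
  C = c(4, 5, 1),
  D = c(4, 4, 1),
  Genus = c("Ensifer", "Ensifer", "Ensifer")
)

# split() yields one data frame per Genus, colSums() totals its columns,
# sapply() binds the totals as matrix columns, and t() makes each Genus a row
m <- t(sapply(split(df[, 1:4], df$Genus), colSums))

# Move the row names into a regular Genus column
result <- data.frame(Genus = rownames(m), m)
rownames(result) <- NULL
result
```

Keeping Genus as a regular column makes the result easier to write out with write.csv() or to join with other tables.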