How do I group odd-indexed columns by the even-indexed columns?

Question  Votes: 0  Answers: 1

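I am a beginner in R programming. I need to group a data frame in R under the following condition: every odd-indexed column must be grouped by the values in its corresponding even-indexed column. I have attached example images.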

Example data input

Expected output data

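I used the following R script, which I got from ChatGPT, but it does not give the right result. It groups the values correctly, but it returns more rows than expected, and the column order in the output also changes.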

library(dplyr)
library(purrr)
library(tidyr)

group_odd_by_even <- function(df) {
  # Get the column names
  col_names <- colnames(df)
  
  # Identify odd and even indexed columns
  odd_cols <- col_names[seq(1, length(col_names), by = 2)]
  even_cols <- col_names[seq(2, length(col_names), by = 2)]
  
  # Initialize an empty list to store grouped data
  grouped_list <- map2(odd_cols, even_cols, ~ {
    # Group odd columns by their respective even columns
    df %>%
      select(!!sym(.x), !!sym(.y)) %>%
      group_by(!!sym(.y)) %>%
      summarise(!!sym(.x) := paste(!!sym(.x), collapse = ", ")) %>%
      rename(!!sym(.y) := !!sym(.y))
  })
  
  # Reduce all grouped dataframes into a single dataframe by joining on even columns
  grouped_df <- reduce(grouped_list, ~full_join(.x, .y, by = intersect(colnames(.x), colnames(.y))))
  
  return(grouped_df)
}

Example input data:

df <- structure(list(col1 = c("A", "B", "C", "D"), col2 = c(1, 2, 3, 
4), col3 = c("A", "B", "C", "D"), col4 = c(2, 3, 2, 3), col5 = c("A", 
"B", "C", "D"), col6 = c(1, 2, 3, 1), col7 = c("A", "B", "C", 
"D"), col8 = c(1, 1, 1, 1), col9 = c("A", "B", "C", "D"), col10 = c(1, 
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))

Please help me write a general function for this. Thanks in advance.

r dplyr
1 answer
0 votes

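I would love to see a simpler, shorter approach. Here is one that uses reshaping.

First, I add a row number for tracking and reshape to long form. From there, we can add variables that track which pair of columns we are in and which kind of column it is (value or group).

Then we can reshape wide again, so that each row has one grouping column and one value column. We can use summarize to concatenate the values within each group of each column pair.

Finally, reshape wide once more.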

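library(tidyverse)
df |>
  # Tag each row, then reshape long: one row per (row, column) cell
  mutate(row = row_number()) |>
  pivot_longer(-row, values_transform = as.character) |>
  # Within each original row, number the column pairs and label each cell
  # as a value (odd position) or a group (even position)
  mutate(pair_num = (row_number() + 1) %/% 2, 
         type = if_else(row_number() %% 2 == 1, "val", "grp"), .by = row) |>
  select(-name) |>
  # Reshape wide: one row per (row, pair) with a `val` and a `grp` column
  pivot_wider(names_from = type, values_from = value) |>
  # Concatenate the values within each group of each pair
  summarize(vals = paste0(val, collapse = ", "),
            .by = c(pair_num, grp)) |>
  # Renumber rows within each pair, then spread the pairs back out as columns
  mutate(row = row_number(), .by = pair_num) |>
  pivot_wider(names_from = pair_num, values_from = c(grp, vals), names_vary = "slowest")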

(Note that the input data differs from the data shown in the question, so the results do not match.)

    row grp_1 vals_1 grp_2 vals_2 grp_3 vals_3 grp_4 vals_4     grp_5 vals_5    
  <int> <chr> <chr>  <chr> <chr>  <chr> <chr>  <chr> <chr>      <chr> <chr>     
1     1 1     A      2     A, C   1     A, D   1     A, B, C, D 1     A, B, C, D
2     2 2     B      3     B, D   2     B      NA    NA         NA    NA        
3     3 3     C      NA    NA     3     C      NA    NA         NA    NA        
4     4 4     D      NA    NA     NA    NA     NA    NA         NA    NA 
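
Since the question asks for a general function, here is a minimal base R sketch of one possible alternative. It is not part of the answer above. The function name `group_odd_by_even` is borrowed from the question's script, and the `sep` argument is a hypothetical addition for illustration. It assumes the data frame has an even number of columns arranged as (value, group) pairs and that the group columns contain no NA values. It keeps the original column names and order, lists groups in order of first appearance, and pads pairs that have fewer groups with NA.

```r
# Sketch: collapse each odd-indexed (value) column by the even-indexed
# (group) column that follows it. Assumes an even number of columns and
# no NA values in the group columns.
group_odd_by_even <- function(df, sep = ", ") {
  stopifnot(is.data.frame(df), ncol(df) >= 2, ncol(df) %% 2 == 0)

  # Process each (value, group) pair of columns independently
  pairs <- lapply(seq(1, ncol(df), by = 2), function(i) {
    vals <- df[[i]]
    grp  <- df[[i + 1]]
    keys <- unique(grp)  # groups in order of first appearance
    collapsed <- vapply(
      keys,
      function(k) paste(vals[grp == k], collapse = sep),
      character(1),
      USE.NAMES = FALSE
    )
    out <- data.frame(collapsed, keys, stringsAsFactors = FALSE)
    names(out) <- names(df)[c(i, i + 1)]  # keep the original names and order
    out
  })

  # Pad every pair with NA rows up to the largest group count, then bind
  n_max <- max(vapply(pairs, nrow, integer(1)))
  padded <- lapply(pairs, function(p) {
    p <- p[seq_len(n_max), , drop = FALSE]  # out-of-range rows become NA
    rownames(p) <- NULL
    p
  })
  do.call(cbind, padded)
}

# Usage with the sample data from the question
df <- structure(list(col1 = c("A", "B", "C", "D"), col2 = c(1, 2, 3, 4),
  col3 = c("A", "B", "C", "D"), col4 = c(2, 3, 2, 3),
  col5 = c("A", "B", "C", "D"), col6 = c(1, 2, 3, 1),
  col7 = c("A", "B", "C", "D"), col8 = c(1, 1, 1, 1),
  col9 = c("A", "B", "C", "D"), col10 = c(1, 1, 1, 1)),
  class = "data.frame", row.names = c(NA, -4L))

result <- group_odd_by_even(df)
print(result)
```

Each pair is summarized on its own and the pieces are bound side by side, so no join is involved and the row count is simply the largest number of groups found in any pair. If the group column should come first in each pair, as in the output above, swapping the two columns inside `data.frame()` and in the `names()` assignment should be enough.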