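I am a beginner in R programming. I need to group a data frame in R under the following condition: the values in every odd-numbered column need to be grouped by the corresponding values in the even-numbered column that follows it. I have attached a sample image.
I used the following R script, obtained from ChatGPT, but it does not give the correct result. Although it groups the values correctly, it returns more rows than expected, and the column order in the output also changes.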
library(dplyr)
library(purrr)
library(tidyr)
group_odd_by_even <- function(df) {
  # Get the column names
  col_names <- colnames(df)
  # Identify odd and even indexed columns
  odd_cols <- col_names[seq(1, length(col_names), by = 2)]
  even_cols <- col_names[seq(2, length(col_names), by = 2)]
  # Initialize an empty list to store grouped data
  grouped_list <- map2(odd_cols, even_cols, ~ {
    # Group odd columns by their respective even columns
    df %>%
      select(!!sym(.x), !!sym(.y)) %>%
      group_by(!!sym(.y)) %>%
      summarise(!!sym(.x) := paste(!!sym(.x), collapse = ", ")) %>%
      rename(!!sym(.y) := !!sym(.y))
  })
  # Reduce all grouped dataframes into a single dataframe by joining on even columns
  grouped_df <- reduce(grouped_list, ~full_join(.x, .y, by = intersect(colnames(.x), colnames(.y))))
  return(grouped_df)
}
Sample input data:
structure(list(col1 = c("A", "B", "C", "D"), col2 = c(1, 2, 3,
4), col3 = c("A", "B", "C", "D"), col4 = c(2, 3, 2, 3), col5 = c("A",
"B", "C", "D"), col6 = c(1, 2, 3, 1), col7 = c("A", "B", "C",
"D"), col8 = c(1, 1, 1, 1), col9 = c("A", "B", "C", "D"), col10 = c(1,
1, 1, 1)), class = "data.frame", row.names = c(NA, -4L))
Please help me write a generic function for this. Thanks in advance.
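I'd be curious to see any simpler, shorter approaches. Here is one that uses reshaping.
First, I add a row number to keep track of rows and reshape to long format. From there, we can add variables that track which pair of columns we are in and which type of column it is (value or group).
Then we can reshape wider so that each row has one grouping column and one value column. We can use summarize to concatenate the values within each group in each column pair.
Finally, renumber the rows within each pair and reshape wider once more.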
library(tidyverse)
df |>
  mutate(row = row_number()) |>
  pivot_longer(-row, values_transform = as.character) |>
  mutate(pair_num = (row_number() + 1) %/% 2,
         type = if_else(row_number() %% 2 == 1, "val", "grp"), .by = row) |>
  select(-name) |>
  pivot_wider(names_from = type, values_from = value) |>
  summarize(vals = paste0(val, collapse = ", "),
            .by = c(pair_num, grp)) |>
  mutate(row = row_number(), .by = pair_num) |>
  pivot_wider(names_from = pair_num, values_from = c(grp, vals), names_vary = "slowest")
(Note that the input data differs from the data shown in the question, so the result does not match.)
    row grp_1 vals_1 grp_2 vals_2 grp_3 vals_3 grp_4 vals_4     grp_5 vals_5
  <int> <chr> <chr>  <chr> <chr>  <chr> <chr>  <chr> <chr>      <chr> <chr>
1     1 1     A      2     A, C   1     A, D   1     A, B, C, D 1     A, B, C, D
2     2 2     B      3     B, D   2     B      NA    NA         NA    NA
3     3 3     C      NA    NA     3     C      NA    NA         NA    NA
4     4 4     D      NA    NA     NA    NA     NA    NA         NA    NA
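Since the question asks for a generic function that keeps the original column names and order, here is a minimal base R sketch of the same idea. It is only an illustration under a few assumptions: `collapse_pairs` and its `sep` argument are hypothetical names of mine, the columns are assumed to come in (value, group) pairs with no missing grouping values, and the grouping columns come back as character, as in the pipeline above.

```r
# Sample input from the question
df <- data.frame(
  col1 = c("A", "B", "C", "D"), col2 = c(1, 2, 3, 4),
  col3 = c("A", "B", "C", "D"), col4 = c(2, 3, 2, 3),
  col5 = c("A", "B", "C", "D"), col6 = c(1, 2, 3, 1),
  col7 = c("A", "B", "C", "D"), col8 = c(1, 1, 1, 1),
  col9 = c("A", "B", "C", "D"), col10 = c(1, 1, 1, 1)
)

# Hypothetical helper (not from the posts above): for every (odd, even)
# column pair, paste the odd-column values together within each group
# defined by the even column, then pad the shorter pairs with NA.
collapse_pairs <- function(df, sep = ", ") {
  stopifnot(ncol(df) >= 2, ncol(df) %% 2 == 0)
  odd <- seq(1, ncol(df), by = 2)

  pieces <- lapply(odd, function(i) {
    keys <- df[[i + 1]]
    # Levels follow order of first appearance, so groups are not re-sorted
    groups <- split(as.character(df[[i]]), factor(keys, levels = unique(keys)))
    list(vals = unname(vapply(groups, paste, character(1), collapse = sep)),
         grp  = names(groups))
  })

  # Every pair is padded to the length of the pair with the most groups
  n_rows <- max(lengths(lapply(pieces, `[[`, "grp")))
  pad <- function(x) { length(x) <- n_rows; x }

  # Reassemble in the original column order, under the original names
  out <- list()
  for (k in seq_along(odd)) {
    out[[names(df)[odd[k]]]]     <- pad(pieces[[k]]$vals)
    out[[names(df)[odd[k] + 1]]] <- pad(pieces[[k]]$grp)
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

res <- collapse_pairs(df)
```

For the sample data, `res` has four rows (the largest number of groups in any pair) and the same column names as `df`. Because each pair is summarized independently and then padded, nothing is joined across pairs, which avoids the extra rows described in the question.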