I have the following dataset.
Email Relationship Q1 Q2 Q3 Q4
1 [email protected] Self 1 2 2 3
2 [email protected] Peer 3 3 4 5
3 [email protected] Peer 5 2 3 1
4 [email protected] Peer 4 1 2 3
5 [email protected] Peer 2 3 3 4
6 [email protected] Direct Report 3 3 4 4
7 [email protected] Direct Report 5 2 4 4
8 [email protected] Self 3 4 4 2
9 [email protected] Peer 2 2 3 4
10 [email protected] Peer 3 3 3 2
11 [email protected] Peer 2 5 5 3
12 [email protected] Direct Report 4 4 4 3
13 [email protected] Direct Report 5 3 2 1
14 [email protected] Direct Report 2 4 5 3
I want to reshape it from long to wide so that I can compute the mean for each relationship group and overall, like this:
Email Q1-Overall Q1-Self Q1-Peer Q1-Direct Report Q2-Overall Q2-Self Q2-Peer Q2-Direct Report
[email protected] 3.00 3.00 2.33 3.67 3.57 4.00 3.33 3.67
[email protected] 3.29 1.00 3.50 4.00 2.28 2.00 2.25 2.50
I tried melting it:
df<-dcast(melt(Data_Long, id.vars=c("Email", "Relationship")), Email~Q1+Relationship)
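But my problem is how to take the next step and compute the means, or whether there is a more efficient approach. My data has several hundred questions, so is there a way to transform all of them efficiently?
I also tried the summarise and spread commands from the dplyr package, but could not find a way to combine the variables and create new variables from them. Any suggestions are appreciated.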
Here is a tidyverse solution. It groups by Email and Relationship to compute the means of the Q* columns, then reshapes the result to wide format with pivot_wider.
library(tidyverse)
Data_long %>%
  group_by(Email, Relationship) %>%
  summarise_at(vars(matches('^Q')), list(mean)) %>%
  pivot_wider(
    id_cols = Email,
    names_from = Relationship,
    values_from = matches('^Q')
  )
## A tibble: 2 x 13
## Groups: Email [2]
# Email `Q1_Direct Repo… Q1_Peer Q1_Self `Q2_Direct Repo… Q2_Peer Q2_Self `Q3_Direct Repo…
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 othe… 3.67 2.33 3 3.67 3.33 4 3.67
#2 samp… 4 3.5 1 2.5 2.25 2 4
## … with 5 more variables: Q3_Peer <dbl>, Q3_Self <dbl>, `Q4_Direct Report` <dbl>,
## Q4_Peer <dbl>, Q4_Self <dbl>
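The desired output in the question also has an Overall column per question, which the code above does not produce. One possible way to get it, sketched here under assumptions, is to stack a copy of the data with Relationship set to "Overall" before summarising. The emails below are hypothetical placeholders and dat is just an illustrative name; the scores are the Q1 and Q2 columns from the question. It assumes dplyr >= 1.0.0 (for across) and tidyr >= 1.0.0 (for pivot_wider).

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in data: placeholder emails, Q1/Q2 scores copied from the question
dat <- data.frame(
  Email = rep(c("a@example.com", "b@example.com"), each = 7L),
  Relationship = c(
    "Self", "Peer", "Peer", "Peer", "Peer", "Direct Report", "Direct Report",
    "Self", "Peer", "Peer", "Peer", "Direct Report", "Direct Report", "Direct Report"
  ),
  Q1 = c(1, 3, 5, 4, 2, 3, 5, 3, 2, 3, 2, 4, 5, 2),
  Q2 = c(2, 3, 2, 1, 3, 3, 2, 4, 2, 3, 5, 4, 3, 4)
)

wide <- dat %>%
  # Stack a copy of every row labelled "Overall" so it is averaged like any other group
  bind_rows(mutate(dat, Relationship = "Overall")) %>%
  group_by(Email, Relationship) %>%
  # Mean of every column whose name starts with "Q"
  summarise(across(starts_with("Q"), mean), .groups = "drop") %>%
  # One column per question/relationship pair, named e.g. "Q1-Overall"
  pivot_wider(
    names_from = Relationship,
    values_from = starts_with("Q"),
    names_sep = "-"
  )

wide
```

Because the columns are picked with starts_with("Q"), the same code should carry over to a few hundred question columns, provided they all start with Q and no other column does.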
Data.
Data_long <- read.table(text = "
Email Relationship Q1 Q2 Q3 Q4
1 [email protected] Self 1 2 2 3
2 [email protected] Peer 3 3 4 5
3 [email protected] Peer 5 2 3 1
4 [email protected] Peer 4 1 2 3
5 [email protected] Peer 2 3 3 4
6 [email protected] 'Direct Report' 3 3 4 4
7 [email protected] 'Direct Report' 5 2 4 4
8 [email protected] Self 3 4 4 2
9 [email protected] Peer 2 2 3 4
10 [email protected] Peer 3 3 3 2
11 [email protected] Peer 2 5 5 3
12 [email protected] 'Direct Report' 4 4 4 3
13 [email protected] 'Direct Report' 5 3 2 1
14 [email protected] 'Direct Report' 2 4 5 3
", header = TRUE)
A data.table solution.
library(data.table)
setDT(df)
df[, melt(.SD, id.vars = c("Email", "Relationship"))
][, dcast(.SD, Email ~ paste(variable, Relationship, sep = "-"), fun.aggregate = mean)]
Email Q1-Direct Report Q1-Peer Q1-Self Q2-Direct Report Q2-Peer Q2-Self Q3-Direct Report Q3-Peer Q3-Self Q4-Direct Report Q4-Peer Q4-Self
1: [email protected] 3.666667 2.333333 3 3.666667 3.333333 4 3.666667 3.666667 4 2.333333 3.00 2
2: [email protected] 4.000000 3.500000 1 2.500000 2.250000 2 4.000000 3.000000 2 4.000000 3.25 3
Data (please provide it yourself next time):
df <- data.frame(
  Email = rep(c("[email protected]", "[email protected]"), each = 7L),
  Relationship = c(
    "Self", "Peer", "Peer", "Peer", "Peer", "Direct Report",
    "Direct Report", "Self", "Peer", "Peer", "Peer", "Direct Report",
    "Direct Report", "Direct Report"
  ),
  Q1 = c(1L, 3L, 5L, 4L, 2L, 3L, 5L, 3L, 2L, 3L, 2L, 4L, 5L, 2L),
  Q2 = c(2L, 3L, 2L, 1L, 3L, 3L, 2L, 4L, 2L, 3L, 5L, 4L, 3L, 4L),
  Q3 = c(2L, 4L, 3L, 2L, 3L, 4L, 4L, 4L, 3L, 3L, 5L, 4L, 2L, 5L),
  Q4 = c(3L, 5L, 1L, 3L, 4L, 4L, 4L, 2L, 4L, 2L, 3L, 3L, 1L, 3L)
)
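If the Overall columns from the question are also needed, one option (a sketch, not the only way) is to append a relabelled copy of the melted table before casting, so that "Overall" is averaged like any other relationship. The emails here are hypothetical placeholders, and dt, long, long_all, Question and Score are names chosen for illustration; the scores are the Q1 and Q2 columns from the question.

```r
library(data.table)

# Hypothetical stand-in data: placeholder emails, Q1/Q2 scores copied from the question
dt <- data.table(
  Email = rep(c("a@example.com", "b@example.com"), each = 7L),
  Relationship = c(
    "Self", "Peer", "Peer", "Peer", "Peer", "Direct Report", "Direct Report",
    "Self", "Peer", "Peer", "Peer", "Direct Report", "Direct Report", "Direct Report"
  ),
  Q1 = c(1, 3, 5, 4, 2, 3, 5, 3, 2, 3, 2, 4, 5, 2),
  Q2 = c(2, 3, 2, 1, 3, 3, 2, 4, 2, 3, 5, 4, 3, 4)
)

# Long format: one row per Email / Relationship / question
long <- melt(dt, id.vars = c("Email", "Relationship"),
             variable.name = "Question", value.name = "Score")

# Append a copy relabelled "Overall" so the overall mean is cast like any other group
long_all <- rbind(long, copy(long)[, Relationship := "Overall"])

# Wide format: one column per question/relationship pair, e.g. "Q1-Overall"
wide <- dcast(long_all, Email ~ Question + Relationship,
              fun.aggregate = mean, value.var = "Score", sep = "-")

wide
```

Since melt treats every column outside id.vars as a measure, this should scale to many question columns without listing them by name.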