I have a dataset like the one below.
library(dplyr)
library(tidyr)
df= tibble::tibble(
variety=rep(c("CV1", "CV2", "CV3"), each=16L),
irrigation=rep(rep(c("yes", "no"), 3), each=8L),
fertilizer=rep(rep(c("Organic", "Urea"), 6), each=4L),
reps=c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 3, 4, 3, 1, 3, 4, 3, 2, 1, 4,
2, 3, 1, 4, 1, 3, 1, 2, 2, 4, 1, 2, 1, 4, 2, 3, 1, 4, 2, 3, 2, 4),
yield=c(8.379842, 8.058658, 9.73285, 9.224371999999999, NA, 6.996108000000001,
9.865782, 7.112071666666666, 5.968758, 8.976471666666667, 7.980724, 9.35065,
5.5111574999999995, 6.998728, 6.164252, 5.118412857142857, 7.748125, 8.58071,
NA, NA, 7.673354999999999, 7.91948, NA, NA, 11.190445, 8.463484999999999,
9.61818, 10.89841, 7.83943, 8.44905, 9.844165, 9.98026, 10.130675, 9.59432,
NA, NA, 9.502525, 9.216965, NA, NA, 7.807259999999999, 9.94434, 7.92808,
11.88664, 10.700185000000001, 10.723835000000001, 11.363140000000001,
11.846934999999998),
nutrients=c(0.42549600000000004, 0.417924, 0.47264, 0.45002, NA, 0.381154, 0.484084,
0.3597316666666666, 0.32555, 0.45681666666666665, 0.38164600000000004,
0.456822, 0.30655, 0.363892, 0.350876, 0.30200857142857146, 0.26754,
0.30954499999999996, NA, NA, 0.328395, 0.30893, NA, NA, 0.37877, 0.33532,
0.40417000000000003, 0.4581, 0.32077500000000003, 0.33331500000000003,
0.39925, 0.40179000000000004, 0.40585499999999997, 0.339465, NA, NA, 0.339545,
0.34077500000000005, NA, NA, 0.3227, 0.37770000000000004, 0.34663, 0.48564,
0.43601500000000004, 0.38200500000000004, 0.47248500000000004, 0.506255),
)
head(df,5)
variety irrigation fertilizer reps yield nutrients
CV1 yes Organic 1 8.379842 0.425496
CV1 yes Organic 2 8.058658 0.417924
CV1 yes Organic 3 9.732850 0.472640
CV1 yes Organic 4 9.224372 0.450020
CV1 yes Urea 1 NA NA
.
.
.
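I want to line up the yield under Organic fertilizer with the yield under Urea, and likewise the nutrients for the two fertilizers, so that I can draw regression plots between the two yields and between the two nutrient measurements. At first, I tried pivot_wider().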
df2= data.frame(df %>%
  group_by(variety, irrigation) %>%
  pivot_wider(names_from=fertilizer, values_from=yield))
head(df2,8)
variety irrigation reps nutrients Organic Urea
1 CV1 yes 1 0.4254960 8.379842 NA
2 CV1 yes 2 0.4179240 8.058658 NA
3 CV1 yes 3 0.4726400 9.732850 NA
4 CV1 yes 4 0.4500200 9.224372 NA
5 CV1 yes 1 NA NA NA
6 CV1 yes 2 0.3811540 NA 6.996108
7 CV1 yes 3 0.4840840 NA 9.865782
8 CV1 yes 4 0.3597317 NA 7.112072
.
.
.
As it stands, the Organic and Urea yields do not end up on the same rows. I am aiming for a layout like the one below.
variety irrigation reps nutrients Organic Urea
1 CV1 yes 1 0.4254960 8.379842 NA
2 CV1 yes 2 0.4179240 8.058658 6.996108
3 CV1 yes 3 0.4726400 9.732850 9.865782
4 CV1 yes 4 0.4500200 9.224372 7.112072
.
.
.
How can I fix this? Also, is there a way to pivot yield and nutrients at the same time?
Thanks,
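Here is my best guess. In some cases you appear to have several observations for the same variety/irrigation/reps/fertilizer combination. To handle that, I added a variable, obs, to tell them apart.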
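df %>%
  arrange(variety, irrigation, reps) %>%
  # obs numbers repeated observations within each combination
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) %>%
  # pivot yield and nutrients together, one column pair per fertilizer
  pivot_wider(names_from = fertilizer, values_from = c(yield, nutrients))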
Result:
# A tibble: 28 × 8
variety irrigation reps obs yield_Organic yield_Urea nutrients_Organic nutrients_Urea
<chr> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
1 CV1 no 1 1 5.97 5.51 0.326 0.307
2 CV1 no 2 1 8.98 7.00 0.457 0.364
3 CV1 no 3 1 7.98 6.16 0.382 0.351
4 CV1 no 4 1 9.35 5.12 0.457 0.302
5 CV1 yes 1 1 8.38 NA 0.425 NA
6 CV1 yes 2 1 8.06 7.00 0.418 0.381
7 CV1 yes 3 1 9.73 9.87 0.473 0.484
8 CV1 yes 4 1 9.22 7.11 0.450 0.360
9 CV2 no 1 1 11.2 7.84 0.379 0.321
10 CV2 no 1 2 NA 9.84 NA 0.399
# ℹ 18 more rows
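To check which combinations are actually repeated, something like the following might help. It is only a sketch: the small tibble sub is a name introduced here for illustration and holds just the CV2 / irrigation "yes" rows copied from the question, and the .by argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Illustration only: the CV2 / irrigation == "yes" rows of df from the question
sub <- tibble::tibble(
  variety    = "CV2",
  irrigation = "yes",
  fertilizer = rep(c("Organic", "Urea"), each = 4L),
  reps       = c(3, 4, 3, 1, 3, 4, 3, 2),
  yield      = c(7.748125, 8.58071, NA, NA, 7.673355, 7.91948, NA, NA),
  nutrients  = c(0.26754, 0.309545, NA, NA, 0.328395, 0.30893, NA, NA)
)

# Combinations of the design columns that occur more than once
dups <- sub %>%
  count(variety, irrigation, reps, fertilizer) %>%
  filter(n > 1)
print(dups)

# Same idea as above: number the repeats, then pivot both value columns
wide <- sub %>%
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) %>%
  pivot_wider(names_from = fertilizer, values_from = c(yield, nutrients))
print(wide)
```

In this subset reps 3 appears twice for each fertilizer, which is why obs is needed to keep the rows uniquely identified. Pivoting yield and nutrients together also keeps nutrients out of the identifying columns, so Organic and Urea values for the same reps can share a row.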
I don't think pivot_wider is the right function here. This is how I solved it:
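# Split the data by fertilizer
wu <- filter(df, fertilizer != 'Urea')  # Organic rows
hu <- filter(df, fertilizer == 'Urea')  # Urea rows
# Join the two halves on the design columns; suffix "a" marks Organic, "b" marks Urea
fr <- full_join(wu, hu, by=c("variety", "irrigation", "reps"), suffix=c("a","b"))
# Keep the Organic yield and nutrients plus the Urea yield, then rename
fr2 <- select(fr, "variety", "irrigation", "reps", "yielda", "nutrientsa", "yieldb")
final <- rename(fr2, `Organic`=`yielda`,`nutrients`=`nutrientsa`,`Urea`=`yieldb`)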
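One thing to watch with the join approach: where a variety/irrigation/reps combination is repeated, the join pairs every Organic row with every Urea row for that key. The sketch below illustrates this on the CV2 / irrigation "yes" rows from the question; the names sub, organic, urea and joined are introduced here for illustration, and the relationship argument assumes dplyr 1.1.0 or later (it only silences the many-to-many warning).

```r
library(dplyr)

# Illustration only: the CV2 / irrigation == "yes" rows of df from the question
sub <- tibble::tibble(
  variety    = "CV2",
  irrigation = "yes",
  fertilizer = rep(c("Organic", "Urea"), each = 4L),
  reps       = c(3, 4, 3, 1, 3, 4, 3, 2),
  yield      = c(7.748125, 8.58071, NA, NA, 7.673355, 7.91948, NA, NA),
  nutrients  = c(0.26754, 0.309545, NA, NA, 0.328395, 0.30893, NA, NA)
)

organic <- filter(sub, fertilizer == "Organic")
urea    <- filter(sub, fertilizer == "Urea")

joined <- full_join(
  organic, urea,
  by = c("variety", "irrigation", "reps"),
  suffix = c("_Organic", "_Urea"),
  relationship = "many-to-many"  # assumes dplyr >= 1.1.0
)

# reps 3 occurs twice on each side, so it contributes 2 x 2 = 4 rows
nrow(joined)  # 7
```

Whether those cross-paired rows are wanted depends on what the repeated reps mean in the experiment. If repeats should be paired one-to-one instead, an index such as obs can be added to both halves and included in the join keys.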