How do I use pivot_wider() correctly to align the values of two variables?

Question description  Votes: 0  Answers: 2

I have a dataset like the following.

library(dplyr)
library(tidyr)

df= tibble::tibble(
    variety=rep(c("CV1", "CV2", "CV3"), each=16L),
    irrigation=rep(rep(c("yes", "no"), 3), each=8L),
    fertilizer=rep(rep(c("Organic", "Urea"), 6), each=4L),
    reps=c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 3, 4, 3, 1, 3, 4, 3, 2, 1, 4,
    2, 3, 1, 4, 1, 3, 1, 2, 2, 4, 1, 2, 1, 4, 2, 3, 1, 4, 2, 3, 2, 4),
    yield=c(8.379842, 8.058658, 9.73285, 9.224371999999999, NA, 6.996108000000001,
    9.865782, 7.112071666666666, 5.968758, 8.976471666666667, 7.980724, 9.35065,
    5.5111574999999995, 6.998728, 6.164252, 5.118412857142857, 7.748125, 8.58071,
    NA, NA, 7.673354999999999, 7.91948, NA, NA, 11.190445, 8.463484999999999,
    9.61818, 10.89841, 7.83943, 8.44905, 9.844165, 9.98026, 10.130675, 9.59432,
    NA, NA, 9.502525, 9.216965, NA, NA, 7.807259999999999, 9.94434, 7.92808,
    11.88664, 10.700185000000001, 10.723835000000001, 11.363140000000001,
    11.846934999999998),
    nutrients=c(0.42549600000000004, 0.417924, 0.47264, 0.45002, NA, 0.381154, 0.484084,
    0.3597316666666666, 0.32555, 0.45681666666666665, 0.38164600000000004,
    0.456822, 0.30655, 0.363892, 0.350876, 0.30200857142857146, 0.26754,
    0.30954499999999996, NA, NA, 0.328395, 0.30893, NA, NA, 0.37877, 0.33532,
    0.40417000000000003, 0.4581, 0.32077500000000003, 0.33331500000000003,
    0.39925, 0.40179000000000004, 0.40585499999999997, 0.339465, NA, NA, 0.339545,
    0.34077500000000005, NA, NA, 0.3227, 0.37770000000000004, 0.34663, 0.48564,
    0.43601500000000004, 0.38200500000000004, 0.47248500000000004, 0.506255),
)
head(df,5)
variety irrigation  fertilizer  reps  yield     nutrients
CV1     yes         Organic     1     8.379842  0.425496
CV1     yes         Organic     2     8.058658  0.417924
CV1     yes         Organic     3     9.732850  0.472640
CV1     yes         Organic     4     9.224372  0.450020
CV1     yes         Urea        1     NA        NA
.
.
.

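I want to align the yield under Organic fertilizer with the yield under Urea fertilizer, and likewise the nutrients for the two fertilizers, so that I can draw regression plots between the two yields and between the two nutrient values. Initially, I tried pivot_wider():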

df2= data.frame(df %>%
                group_by(variety, irrigation) %>%
                pivot_wider(names_from=fertilizer, values_from=yield))
head(df2,8)
variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  NA
3   CV1     yes         3    0.4726400  9.732850  NA
4   CV1     yes         4    0.4500200  9.224372  NA
5   CV1     yes         1    NA         NA        NA
6   CV1     yes         2    0.3811540  NA        6.996108
7   CV1     yes         3    0.4840840  NA        9.865782
8   CV1     yes         4    0.3597317  NA        7.112072
    .
    .
    .

At the moment, the Organic and Urea yields are not aligned on the same rows. My goal is a layout like the one below.

variety irrigation  reps     nutrients  Organic   Urea
1   CV1     yes         1    0.4254960  8.379842  NA
2   CV1     yes         2    0.4179240  8.058658  6.996108
3   CV1     yes         3    0.4726400  9.732850  9.865782
4   CV1     yes         4    0.4500200  9.224372  7.112072
    .
    .
    .

How can I fix this? Also, is there a way to pivot yield and nutrients at the same time?

Thanks,

r dplyr pivot tidyr transpose
2 Answers
0
votes

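Here is my best guess. In some cases you appear to have multiple observations for the same variety/irrigation/reps/fertilizer combination. To handle that, I added a variable obs to distinguish them.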

df |>
  arrange(variety, irrigation, reps) |>
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) |>
  pivot_wider(names_from = fertilizer, values_from = c(yield, nutrients))

Result

# A tibble: 28 × 8
   variety irrigation  reps   obs yield_Organic yield_Urea nutrients_Organic nutrients_Urea
   <chr>   <chr>      <dbl> <int>         <dbl>      <dbl>             <dbl>          <dbl>
 1 CV1     no             1     1          5.97       5.51             0.326          0.307
 2 CV1     no             2     1          8.98       7.00             0.457          0.364
 3 CV1     no             3     1          7.98       6.16             0.382          0.351
 4 CV1     no             4     1          9.35       5.12             0.457          0.302
 5 CV1     yes            1     1          8.38      NA                0.425         NA    
 6 CV1     yes            2     1          8.06       7.00             0.418          0.381
 7 CV1     yes            3     1          9.73       9.87             0.473          0.484
 8 CV1     yes            4     1          9.22       7.11             0.450          0.360
 9 CV2     no             1     1         11.2        7.84             0.379          0.321
10 CV2     no             1     2         NA          9.84            NA              0.399
# ℹ 18 more rows
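
To see why obs is needed, it can help to count how often each key combination occurs. The following is only a sketch on a small made-up tibble (toy and its yield values are invented for illustration and are not the asker's data), assuming dplyr 1.1.0 or later for the .by argument:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: Urea has two observations for reps == 1
toy <- tibble::tibble(
  variety    = "CV2",
  irrigation = "no",
  fertilizer = rep(c("Organic", "Urea"), each = 4L),
  reps       = c(1, 4, 2, 3, 1, 4, 1, 3),
  yield      = c(11.2, 8.5, 9.6, 10.9, 7.8, 8.4, 9.8, 10.0)
)

# Key combinations that occur more than once
dupes <- toy |>
  count(variety, irrigation, fertilizer, reps) |>
  filter(n > 1)

# Without obs, the duplicated key forces pivot_wider() to return
# list-columns (it also warns, which is suppressed here)
wide_list <- suppressWarnings(
  toy |> pivot_wider(names_from = fertilizer, values_from = yield)
)

# With obs, every row is uniquely identified and the columns stay numeric
wide <- toy |>
  arrange(variety, irrigation, reps) |>
  mutate(obs = row_number(), .by = c(variety, irrigation, reps, fertilizer)) |>
  pivot_wider(names_from = fertilizer, values_from = yield)
```

Here dupes lists the repeated key, wide_list shows the list-column fallback, and wide has one row per variety/irrigation/reps/obs combination, with NA wherever one fertilizer has no matching observation.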

0
votes

I doubt that pivot_wider is the right function here. This is how I solved it:

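# Split by fertilizer, then join the two halves on the shared keys.
# Note: repeated variety/irrigation/reps keys are matched many-to-many.
wu <- filter(df, fertilizer != 'Urea')
hu <- filter(df, fertilizer == 'Urea')
fr <- full_join(wu, hu, by=c("variety", "irrigation", "reps"), suffix=c("a","b"))
# Keep the Organic yield and nutrients plus the Urea yield, then rename.
fr2 <- select(fr, "variety", "irrigation", "reps", "yielda", "nutrientsa", "yieldb")
final <- rename(fr2, `Organic`=`yielda`,`nutrients`=`nutrientsa`,`Urea`=`yieldb`)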
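
One thing to watch with a join like this: when the same variety/irrigation/reps key occurs more than once on both sides, every Organic row is paired with every Urea row for that key, so the row count grows. A minimal sketch on invented toy data (toy, organic, urea and the obs helper column are hypothetical names, and the relationship and .by arguments assume dplyr 1.1.0 or later):

```r
library(dplyr)

# Hypothetical toy data: reps == 3 occurs twice for each fertilizer
toy <- tibble::tibble(
  variety    = "CV2",
  irrigation = "yes",
  fertilizer = rep(c("Organic", "Urea"), each = 3L),
  reps       = c(3, 4, 3, 3, 4, 3),
  yield      = c(7.7, 8.6, 7.1, 7.6, 7.9, 6.8)
)

organic <- filter(toy, fertilizer == "Organic")
urea    <- filter(toy, fertilizer == "Urea")

# Plain join: the two reps == 3 rows on each side give 2 x 2 = 4 matches
joined <- full_join(
  organic, urea,
  by = c("variety", "irrigation", "reps"),
  suffix = c("_organic", "_urea"),
  relationship = "many-to-many"  # declare the duplication as expected
)

# Numbering repeated keys on each side first gives one-to-one pairs
organic <- mutate(organic, obs = row_number(), .by = c(variety, irrigation, reps))
urea    <- mutate(urea,    obs = row_number(), .by = c(variety, irrigation, reps))

paired <- full_join(
  organic, urea,
  by = c("variety", "irrigation", "reps", "obs"),
  suffix = c("_organic", "_urea")
)
```

In this toy case joined has more rows than either input, while paired keeps one row per observation. Whether pairing duplicates by their order of appearance is meaningful depends on how the replicates were recorded, so treat the obs index as an assumption rather than a rule.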