R data frame: row-wise mean over the whole data frame, with a condition on other columns


I'll start with the code for what I want (everything is written as an example):

df <- data.frame(comp = c(10, 12, 14, 17, 17),
                 val = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4),
                 mean_cond_text1 = c("Average of VAL lines whose COMP is between 8 12", 
                                     "Average of VAL lines whose COMP is between 9.6 14.4",
                                     "Average of VAL lines whose COMP is between 11.2 16.8",
                                     "Average of VAL lines whose COMP is between 13.6 20.4", 
                                     "Average of VAL lines whose COMP is between 13.6 20.4"),
                 mean_cond_text2 = c("(val_row1+val_row2)/2", "(val_row1+val_row2+val_row3)/3", "(val_row2+val_row3)/2", "(val_row3+val_row4+val_row5)/3", "(val_row3+val_row4+val_row5)/3"),
                 mean_cond_text3 = c("(0+5)/2", "(0+5+10)/3", "(5+10)/2", "(10+15+20)/3", "(10+15+20)/3"),
                 mean_cond_num = c((0+5)/2, (0+5+10)/3, (5+10)/2, (10+15+20)/3, (10+15+20)/3))

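For each row of the data frame, I want to compute the mean of the VAL column over all rows whose COMP value falls within the COND_INF to COND_SUP interval of the row for which the mean is being computed. So there is one mean to compute for every row of the data frame.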
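A minimal base-R sketch of this rule, assuming only the four input columns of the example above; `row_means` is an illustrative name. It scans every row for each row (O(n²)), so it is meant to pin down the expected values rather than to be fast:

```r
# Input columns from the example above
df <- data.frame(comp     = c(10, 12, 14, 17, 17),
                 val      = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4))

# For row i: mean of val over every row j with
# cond_inf[i] <= comp[j] <= cond_sup[i] (both bounds inclusive)
row_means <- vapply(seq_len(nrow(df)), function(i) {
  keep <- df$comp >= df$cond_inf[i] & df$comp <= df$cond_sup[i]
  mean(df$val[keep])
}, numeric(1))

row_means
# [1]  2.5  5.0  7.5 15.0 15.0
```

These are the values of `mean_cond_num` in the example.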

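Each row of the data frame always has these 4 columns filled in:

COMP = the column on which the condition is tested, to decide whether a row is included in the mean

VAL = the value used in the mean when the row is included

COND_INF = lower bound (COMP - 20%); a row's COMP must be greater than or equal to it to be included

COND_SUP = upper bound (COMP + 20%); a row's COMP must be less than or equal to it to be included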
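If the bounds always follow the ±20% rule described above, they can be derived from COMP instead of being stored. A sketch, assuming a fixed 20% tolerance (`pct` is an illustrative name, not part of the question):

```r
comp <- c(10, 12, 14, 17, 17)
pct  <- 0.2  # +/-20% tolerance, as described in the question

# Derive each row's interval from its own COMP
cond_inf <- comp * (1 - pct)   # COMP - 20%
cond_sup <- comp * (1 + pct)   # COMP + 20%

# The products may differ from the typed values in the last bits,
# so compare with all.equal() rather than ==
stopifnot(isTRUE(all.equal(cond_inf, c(8, 9.6, 11.2, 13.6, 13.6))),
          isTRUE(all.equal(cond_sup, c(12, 14.4, 16.8, 20.4, 20.4))))
```

Both comparisons in the rule are inclusive, so a COMP lying exactly on a bound (row 2's comp of 12 against row 1's upper bound of 12) is counted. When bounds are computed rather than typed in, such ties can be affected by floating-point rounding, so a small tolerance may be worth adding on real data.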

Thanks for your help, I'm lost...

Tags: r, dataframe, conditional-statements, mean

2 answers

0 votes

I'm not entirely sure which approach you need, but this seems close to what you're looking for.

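Looking at your data frame, it isn't clear how rows 3 to 5 were calculated. For example, row 3 has a comp of 14. Shouldn't that fall within the ranges of rows 2 to 5, rather than only rows 2 and 3? Rows 4 and 5 have the range (13.6, 20.4); shouldn't they be included in the calculation for a comp of 14? I also get different means for rows 4 and 5.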
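The mismatch described here comes down to which row supplies the interval. A base-R sketch that computes both readings on the example data with a logical pair matrix (the names are illustrative; an n × n matrix is only practical for small data):

```r
df <- data.frame(comp     = c(10, 12, 14, 17, 17),
                 val      = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4))

# [i, j] is TRUE when comp[j] lies in row i's interval (the question's rule)
comp_in_own_interval <- outer(df$cond_inf, df$comp, "<=") &
                        outer(df$cond_sup, df$comp, ">=")

# Transposed: [i, j] is TRUE when comp[i] lies in row j's interval
# (the reading used by the expand_grid() code in this answer)
comp_in_other_interval <- t(comp_in_own_interval)

# For each row i, average val over the columns j that are TRUE
cond_mean <- function(m) apply(m, 1, function(keep) mean(df$val[keep]))

cond_mean(comp_in_own_interval)    # [1]  2.5  5.0  7.5 15.0 15.0
cond_mean(comp_in_other_interval)  # [1]  2.5  5.0 12.5 17.5 17.5
```

The first result reproduces the question's `mean_cond_num`; the second reproduces the output of this answer.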

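Let me know whether I've understood correctly. Based on my understanding so far, here is one approach. I suspect there are better alternatives using data.table, sqldf, etc.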

df <- data.frame(comp = c(10, 12, 14, 17, 17),
                 val = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4))

library(dplyr)
library(tidyr)  # expand_grid() is provided by tidyr, not dplyr

# Add index for row number
df$idx <- seq(1, nrow(df))

# Split the data frame into (comp, index) and a lookup table of values and ranges
df1 <- df[c(1,5)]
df2 <- df[2:4]

# Expand grid to get multiple combinations and filter to those where comp in range 
expand_grid(df1, df2) %>%
  filter(between(comp, cond_inf, cond_sup)) %>%
  group_by(idx) %>%
  mutate(mean_cond_num = mean(val)) %>%
  right_join(df)

   comp   idx   val cond_inf cond_sup mean_cond_num
  <dbl> <int> <dbl>    <dbl>    <dbl>         <dbl>
1    10     1     0      8       12             2.5
2    12     2     5      9.6     14.4           5  
3    14     3    10     11.2     16.8          12.5
4    17     4    15     13.6     20.4          17.5
5    17     5    20     13.6     20.4          17.5
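Keeping the same pairs-then-filter idea but taking the interval from the row being averaged gives the question's expected values. A base-R sketch without tidyr or dplyr: `merge(..., by = NULL)` builds the cross product, `subset()` filters it and `aggregate()` takes the grouped mean (the `.a`/`.b` suffixes are illustrative naming):

```r
df <- data.frame(comp     = c(10, 12, 14, 17, 17),
                 val      = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4))
df$id <- seq_len(nrow(df))

# Cross join: every (a, b) pair of rows; shared column names get .a / .b
pairs <- merge(df, df, by = NULL, suffixes = c(".a", ".b"))

# Keep pairs where row b's comp lies inside row a's interval
pairs <- subset(pairs, cond_inf.a <= comp.b & cond_sup.a >= comp.b)

# For each row a, average the val of its matching rows b
res <- aggregate(val.b ~ id.a, data = pairs, FUN = mean)
names(res) <- c("id", "mean_cond_num")
res
```

Like `expand_grid()`, the cross product has n² rows before filtering, so memory grows quickly with the size of the data.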

0 votes

Thanks for your help. Building on your idea, I did this:

library(sqldf)
library(dplyr)

df <- data.frame(comp = c(10, 12, 14, 17, 17),
                 val = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4),
                 mean_cond_num = c((0+5)/2, (0+5+10)/3, (5+10)/2, (10+15+20)/3, (10+15+20)/3))

df$id <- seq(1, nrow(df))
df2 <- sqldf("SELECT a.*, b.val as val2, b.cond_inf as cond_inf2, b.cond_sup as cond_sup2
       FROM df as a, df as b
       where a.cond_inf <= b.comp
          and a.cond_sup >= b.comp")

df3 <- df2 %>%
  group_by(id, mean_cond_num) %>%
  summarise(mmoy = mean(val2))

It works. I still need to check the computation time on my real data; if it's acceptable, I'll mark the question as solved. Thanks.
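On large data, both the cross join and the self-join create up to n² row pairs, which is usually what makes them slow. Because each condition is an interval on a single column, a different technique (not one used in this thread) avoids the pairs entirely: sort by comp, locate each interval's ends with `findInterval()`, and take the mean from cumulative sums. A sketch, assuming comp has no NA values and both bounds are inclusive as in the question; `interval_means` is an illustrative name:

```r
interval_means <- function(comp, val, lower, upper) {
  o <- order(comp)
  comp_s <- comp[o]                 # findInterval() needs a non-decreasing vector
  cs <- c(0, cumsum(val[o]))        # cs[k + 1] = sum of the first k sorted vals

  # hi: number of comp values <= upper; lo: number of comp values < lower
  hi <- findInterval(upper, comp_s)
  lo <- findInterval(lower, comp_s, left.open = TRUE)

  n <- hi - lo                      # number of rows inside [lower, upper]
  ifelse(n > 0, (cs[hi + 1] - cs[lo + 1]) / n, NA_real_)
}

df <- data.frame(comp     = c(10, 12, 14, 17, 17),
                 val      = c(0, 5, 10, 15, 20),
                 cond_inf = c(8, 9.6, 11.2, 13.6, 13.6),
                 cond_sup = c(12, 14.4, 16.8, 20.4, 20.4))

df$mmoy <- interval_means(df$comp, df$val, df$cond_inf, df$cond_sup)
df$mmoy
# [1]  2.5  5.0  7.5 15.0 15.0
```

This runs in O(n log n) instead of O(n²). Differences of cumulative sums can accumulate rounding error when val is very large or varies widely, so it is worth checking the result against the slower join on a sample of the real data.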

© www.soinside.com 2019 - 2024. All rights reserved.