Shift values upward in an R data frame by group


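I have a data frame in R in which I need to shift the values in a particular column upward by a given number of rows. The number of rows to shift by is tied to a grouping variable and usually differs from group to group. Here is a mock data frame as an example: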

# mock data frame
df<-data.frame(ID=c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c"),
               var_description = c("description 1", "description 2","description 3","description 4","description 5",
                                   "description 1", "description 2","description 3","description 4","description 5",
                                   "description 1", "description 2","description 3","description 4","description 5"),
               var_value = c(NA,NA,NA,"desc 1 val","desc 2 val",
                             NA,"desc 1 val","desc 2 val","desc 3 val","desc 4 val",
                             NA,NA,NA,NA,"desc 1 val"),
               shift_val = c(3,3,3,3,3,
                             1,1,1,1,1,
                             4,4,4,4,4))

This is my current working solution:


# %>% and filter() used below come from dplyr
library(dplyr)

#getting unique vals
unique_id = unique(df$ID)

#initialising dataframe to collect results
df_results<-data.frame()


#function to shift values up
shift <- function(x,n){
  c(x[-(seq(n))],rep(NA,n))
}


#for loop

for (i in unique_id) {
  
  #filtering by each ID
  one_id<-df%>%filter(ID==i)
  
  #getting value to shift values by
  Move_by_val = unique(one_id$shift_val)
  #shifting
  one_id$new_var_value<-shift(one_id$var_value, Move_by_val)
  
  #binding one_id onto df_results
  
  df_results<-rbind(df_results,one_id)
  
}


head(df_results,10)

# ID var_description  var_value shift_val new_var_value
# 1   a   description 1       <NA>         3    desc 1 val
# 2   a   description 2       <NA>         3    desc 2 val
# 3   a   description 3       <NA>         3          <NA>
# 4   a   description 4 desc 1 val         3          <NA>
# 5   a   description 5 desc 2 val         3          <NA>
# 6   b   description 1       <NA>         1    desc 1 val
# 7   b   description 2 desc 1 val         1    desc 2 val
# 8   b   description 3 desc 2 val         1    desc 3 val
# 9   b   description 4 desc 3 val         1    desc 4 val
# 10  b   description 5 desc 4 val         1          <NA>


Using the shift_val associated with each unique df$ID, I can shift var_value upward so that the values line up with var_description, as shown in the new_var_value column.
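To make the shifting step concrete, here is a minimal sketch on a single group's vector. The helper name shift_up is hypothetical and not part of the original code; it behaves like shift() above for ordinary inputs, but also clamps n. That matters because seq(0) evaluates to c(1, 0), so the original shift(x, 0) would drop the first element instead of leaving x unchanged.

```r
# Hypothetical helper (not from the original post): move the values of x up
# by n positions and pad the tail with NA. n is clamped to [0, length(x)],
# so n = 0 returns x unchanged and a too-large n returns all NA.
shift_up <- function(x, n) {
  n <- min(max(n, 0), length(x))
  c(x[seq_len(length(x) - n) + n], rep(NA, n))
}

# Values of group "a" from the mock data frame, where shift_val is 3
x <- c(NA, NA, NA, "desc 1 val", "desc 2 val")

shift_up(x, 3)  # the two non-missing values move to positions 1 and 2
shift_up(x, 0)  # no shift: x comes back as-is
```

Whether a shift of 0 or a shift longer than the group can occur in the real data is an assumption; if it cannot, the original shift() is sufficient.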

The solution I currently use works, but it becomes slow and inefficient when applied to larger data frames (e.g. more than ~100k rows).

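Can anyone recommend an alternative that produces the same output as mine but is likely to be more efficient? Perhaps a data.table or purrr solution would be best, but I am not familiar with either. Any help would be much appreciated!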

r for-loop dplyr data.table purrr
1 Answer

Using your shift() function, try:

# note: the \(x) lambda shorthand requires R >= 4.1
shift = \(x, k) c(x[-seq(k)], rep(NA, k))
lapply(split(df, df$ID), 
  \(l) transform(l, new_var_value = shift(var_value, k = shift_val[1L]))) |>
  do.call(what = rbind) |> `row.names<-`(NULL)

which gives

   ID var_description  var_value shift_val new_var_value
1   a   description 1       <NA>         3    desc 1 val
2   a   description 2       <NA>         3    desc 2 val
3   a   description 3       <NA>         3          <NA>
4   a   description 4 desc 1 val         3          <NA>
5   a   description 5 desc 2 val         3          <NA>
6   b   description 1       <NA>         1    desc 1 val
7   b   description 2 desc 1 val         1    desc 2 val
8   b   description 3 desc 2 val         1    desc 3 val
9   b   description 4 desc 3 val         1    desc 4 val
10  b   description 5 desc 4 val         1          <NA>
11  c   description 1       <NA>         4    desc 1 val
12  c   description 2       <NA>         4          <NA>
13  c   description 3       <NA>         4          <NA>
14  c   description 4       <NA>         4          <NA>
15  c   description 5 desc 1 val         4          <NA>
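Since the question mentions data.table, here is a minimal sketch of a grouped alternative, assuming the data.table package is installed. It relies on data.table's own shift() with type = "lead" rather than the user-defined shift() above. The call is written as data.table::shift because a shift() defined in the global environment would otherwise mask it. The object name dt is illustrative, and the table is rebuilt here (without var_description) only to keep the example self-contained. No timings were measured, so any speed gain on ~100k rows is an expectation rather than a tested result.

```r
library(data.table)

# Same values as the mock data frame in the question
dt <- data.table(
  ID = rep(c("a", "b", "c"), each = 5),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5)
)

# For each ID, lead var_value by that group's shift_val and pad with NA.
# := adds the column by reference, so no per-group rbind() is needed.
dt[, new_var_value := data.table::shift(var_value,
                                        n = as.integer(shift_val[1L]),
                                        type = "lead"),
   by = ID]

dt
```

An existing data frame can be converted with as.data.table(df) or, in place, with setDT(df). The sketch assumes shift_val is constant within each ID, as in the mock data, and therefore uses only the first value per group.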