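I have a data frame in R and need to shift the values in one column up by a number of positions. The number of positions is tied to a grouping variable and usually differs from group to group. Here is an example mock data frame: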
# mock data frame
df <- data.frame(ID = c("a","a","a","a","a","b","b","b","b","b","c","c","c","c","c"),
                 var_description = c("description 1", "description 2", "description 3", "description 4", "description 5",
                                     "description 1", "description 2", "description 3", "description 4", "description 5",
                                     "description 1", "description 2", "description 3", "description 4", "description 5"),
                 var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                               NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                               NA, NA, NA, NA, "desc 1 val"),
                 shift_val = c(3, 3, 3, 3, 3,
                               1, 1, 1, 1, 1,
                               4, 4, 4, 4, 4))
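The per-group approaches in this thread read a single shift value per ID, for example `unique(one_id$shift_val)` in a loop or `shift_val[1L]` in a grouped call. They therefore assume that `shift_val` is constant within each ID. A minimal base R check of that assumption follows. The compact rebuild of `df` is meant to be equivalent to the mock data above, and `n_shift_per_id` is an illustrative name:

```r
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 5),
  var_description = rep(paste("description", 1:5), times = 3),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5),
  stringsAsFactors = FALSE
)

# number of distinct shift_val values inside each ID; each should be exactly 1
n_shift_per_id <- tapply(df$shift_val, df$ID, function(v) length(unique(v)))
stopifnot(all(n_shift_per_id == 1))
n_shift_per_id
```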
Here is my current working solution:
library(dplyr)  # for %>% and filter()

# getting unique vals
unique_id = unique(df$ID)
# initialising dataframe to collect results
df_results <- data.frame()
# function to shift values up
shift <- function(x, n){
  c(x[-(seq(n))], rep(NA, n))
}
# for loop
for (i in unique_id) {
  # filtering by each ID
  one_id <- df %>% filter(ID == i)
  # getting value to shift values by
  Move_by_val = unique(one_id$shift_val)
  # shifting
  one_id$new_var_value <- shift(one_id$var_value, Move_by_val)
  # binding one_id onto df_results (rbind() copies the growing result on every iteration)
  df_results <- rbind(df_results, one_id)
}
head(df_results,10)
# ID var_description var_value shift_val new_var_value
# 1 a description 1 <NA> 3 desc 1 val
# 2 a description 2 <NA> 3 desc 2 val
# 3 a description 3 <NA> 3 <NA>
# 4 a description 4 desc 1 val 3 <NA>
# 5 a description 5 desc 2 val 3 <NA>
# 6 b description 1 <NA> 1 desc 1 val
# 7 b description 2 desc 1 val 1 desc 2 val
# 8 b description 3 desc 2 val 1 desc 3 val
# 9 b description 4 desc 3 val 1 desc 4 val
# 10 b description 5 desc 4 val 1 <NA>
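The `shift()` helper drops the first `n` elements and pads the end with `n` NAs, so the vector keeps its length. It has two edge cases:

- `seq(0)` is `c(1, 0)`, so `shift(x, 0)` drops the first element instead of returning `x` unchanged.
- An `n` larger than the group returns a vector longer than the group.

A guarded variant is sketched below. It is called `shift_up()` here, a hypothetical name chosen so it does not clash with `data.table::shift()`:

```r
# Hypothetical guarded variant of the question's shift() helper
shift_up <- function(x, n) {
  n <- min(n, length(x))          # never shift by more than the group length
  if (n == 0) return(x)           # seq(0) is c(1, 0), which would drop x[1]
  c(x[-seq_len(n)], rep(NA, n))   # drop the first n values, pad the end with NAs
}

x <- c("v1", "v2", "v3", "v4")
shift_up(x, 1)   # "v2" "v3" "v4" NA
shift_up(x, 0)   # unchanged
shift_up(x, 10)  # all NA, still length 4
```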
Using the shift_val associated with each unique df$ID, I can shift var_value up so that each value lines up with its matching var_description, as shown in the new_var_value column.
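Put differently, within each ID the new value in row i is the `var_value` found `shift_val` rows further down in the same ID. It is NA when that would run past the end of the group. That rule can be written as plain index arithmetic, with no per-group loop at all.

Below is a minimal base R sketch. It assumes the rows of each ID are contiguous and already in the intended order, as in the mock data. The names `pos`, `grp_n` and `src` are illustrative:

```r
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 5),
  var_description = rep(paste("description", 1:5), times = 3),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5),
  stringsAsFactors = FALSE
)

rows  <- seq_len(nrow(df))
pos   <- ave(rows, df$ID, FUN = seq_along)  # position of each row within its ID
grp_n <- ave(rows, df$ID, FUN = length)     # size of the ID group of each row

# source row for each new value: shift_val rows further down, or NA if that
# would leave the group (relies on each ID's rows being contiguous)
src <- ifelse(pos + df$shift_val <= grp_n, rows + df$shift_val, NA)
df$new_var_value <- df$var_value[src]
df
```

If the rows of an ID are not already grouped together, sorting first with `df <- df[order(df$ID), ]` restores that assumption. `order()` is stable, so the original order within each ID is kept.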
My current solution works, but it becomes slow and inefficient on larger data frames (e.g. more than ~100k rows).
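Can anyone suggest an alternative solution that gives the same output as mine but is likely to be more efficient? A data.table or purrr solution might be best, but I am not familiar with either. Any help would be greatly appreciated!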
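Since data.table was mentioned: data.table ships its own `shift()`, and `type = "lead"` moves values up and fills the vacated slots with NA. It can be applied per group with `by = ID`. This sketch assumes the data.table package is installed.

The call is written as `data.table::shift()` so that a hand-written `shift()` in the workspace, like the one in the question, cannot shadow it. `dt` is an illustrative name:

```r
library(data.table)

df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 5),
  var_description = rep(paste("description", 1:5), times = 3),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5),
  stringsAsFactors = FALSE
)

dt <- as.data.table(df)
# within each ID, lead var_value by that ID's shift_val; vacated slots become NA
dt[, new_var_value := data.table::shift(var_value,
                                        n = as.integer(shift_val[1L]),
                                        type = "lead"),
   by = ID]
dt
```

`:=` adds the column by reference, so no intermediate copies of the table are built per group.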
Using your shift(), try:
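# needs R >= 4.1.0 for the \(x) lambda shorthand and the native |> pipe
shift = \(x, k) c(x[-seq(k)], rep(NA, k))
# split df by ID, shift var_value within each piece, then stack the pieces back together
lapply(split(df, df$ID),
       \(l) transform(l, new_var_value = shift(var_value, k = shift_val[1L]))) |>
  do.call(what = rbind) |>
  `row.names<-`(NULL)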
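The answer splits the whole data frame and binds the pieces back with `rbind()`. A lighter variant of the same split/apply/combine idea splits only the two columns that are needed. It then uses `unsplit()` to put the results back in the original row order, so it also copes with IDs whose rows are not contiguous. It needs neither R 4.1 nor any package.

This is a sketch under the assumption that only `new_var_value` has to be computed:

```r
df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 5),
  var_description = rep(paste("description", 1:5), times = 3),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5),
  stringsAsFactors = FALSE
)

# the question's helper: drop the first n values, pad the end with n NAs
shift <- function(x, n) c(x[-seq(n)], rep(NA, n))

df$new_var_value <- unsplit(
  Map(function(v, k) shift(v, k[1L]),   # k holds one group's shift_val values
      split(df$var_value, df$ID),
      split(df$shift_val, df$ID)),
  df$ID                                 # same grouping, so values return to their rows
)
df
```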
gives
ID var_description var_value shift_val new_var_value
1 a description 1 <NA> 3 desc 1 val
2 a description 2 <NA> 3 desc 2 val
3 a description 3 <NA> 3 <NA>
4 a description 4 desc 1 val 3 <NA>
5 a description 5 desc 2 val 3 <NA>
6 b description 1 <NA> 1 desc 1 val
7 b description 2 desc 1 val 1 desc 2 val
8 b description 3 desc 2 val 1 desc 3 val
9 b description 4 desc 3 val 1 desc 4 val
10 b description 5 desc 4 val 1 <NA>
11 c description 1 <NA> 4 desc 1 val
12 c description 2 <NA> 4 <NA>
13 c description 3 <NA> 4 <NA>
14 c description 4 <NA> 4 <NA>
15 c description 5 desc 1 val 4 <NA>
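Because the question's loop already uses dplyr, a grouped `lead()` is another way to get the same `new_var_value` column without splitting the data. This sketch assumes dplyr is installed. `shift_val[1L]` picks the single shift value of the current group:

```r
library(dplyr)

df <- data.frame(
  ID = rep(c("a", "b", "c"), each = 5),
  var_description = rep(paste("description", 1:5), times = 3),
  var_value = c(NA, NA, NA, "desc 1 val", "desc 2 val",
                NA, "desc 1 val", "desc 2 val", "desc 3 val", "desc 4 val",
                NA, NA, NA, NA, "desc 1 val"),
  shift_val = rep(c(3, 1, 4), each = 5),
  stringsAsFactors = FALSE
)

df_results <- df %>%
  group_by(ID) %>%
  # lead() moves values up by n and pads the end of each group with NA
  mutate(new_var_value = lead(var_value, n = as.integer(shift_val[1L]))) %>%
  ungroup()
df_results
```

The result is a tibble. `as.data.frame(df_results)` converts it back if a plain data frame is needed.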