R data.table: update by reference in a join, but update the right table

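Context: the amazing new joins vignette in data.table.

The last part of the vignette explains how to use the merge syntax (x[i, on=.(id), ...]) to update the left table by reference (this is also described in SO questions like this one). Here is a simplified example: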

library(data.table)

x = data.table(id = 1:5, newvar1 = LETTERS[1:5])
               #In practice x would have more vars: newvar2, ..., newvarN
i = data.table(id = 1:7, var1 = c('bla','ble','bli','blo','blu','blA','blS'))

#Updating the LEFT table by reference
x[i,on = .(id),
  j = `:=`(id=id,var1=var1,newvar1=newvar1)]

#Result: the column var1 (from i) is added to x by reference
> print(x)
      id newvar1   var1
   <int>  <char> <char>
1:     1       A    bla
2:     2       B    ble
3:     3       C    bli
4:     4       D    blo
5:     5       E    blu
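As a minimal sketch of the same update join, the new column can also be assigned with the `i.` prefix, which refers explicitly to columns of the table in the `i` position. The `address()` check is an addition for illustration (it is not in the vignette excerpt above) and is meant to show that `x` is modified in place rather than copied.

```r
library(data.table)

x <- data.table(id = 1:5, newvar1 = LETTERS[1:5])
i <- data.table(id = 1:7,
                var1 = c("bla", "ble", "bli", "blo", "blu", "blA", "blS"))

addr_before <- address(x)   # memory address of x before the update

# Update join: for each row of x, look up the matching row of i on id
# and add i's var1 (written i.var1) as a new column of x, by reference.
x[i, on = .(id), var1 := i.var1]

addr_after <- address(x)    # should be the same object as before
same_object <- identical(addr_before, addr_after)
print(x)
```

Rows of `i` with no match in `x` (ids 6 and 7 here) are ignored, so `x` keeps its own five rows.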

I need the opposite: update i by reference.

x = data.table(id = 1:5, newvar1 = LETTERS[1:5])
               #In practice x would have more vars: newvar2, ..., newvarN
i = data.table(id = 1:7, var1 = c('bla','ble','bli','blo','blu','blA','blS'))

#Right join then overwrite the content of i
i <- x[i, on = .(id)]

Is there a way to do this by reference?

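In my application (and, I assume, that of several heavy DT users), i is the main dataset, with millions of observations and dozens of columns. That is, it is the dataset to which I want to keep adding new variables, e.g. newvar1, newvar2, ..., newvarN from x. So I need to preserve the columns and rows (cardinality) of i, not those of x. AFAIK, overwriting i involves making a copy of i, which causes a memory spike (roughly double) that may exceed the available RAM and crash R.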
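The copy described above can be made visible on the toy data. This is only a sketch: comparing `address()` before and after is an illustrative check added here, not something from the question, and on seven rows the memory cost is of course negligible.

```r
library(data.table)

x <- data.table(id = 1:5, newvar1 = LETTERS[1:5])
i <- data.table(id = 1:7,
                var1 = c("bla", "ble", "bli", "blo", "blu", "blA", "blS"))

addr_original <- address(i)   # where the original i lives in memory

# Right join, then overwrite: x[i, ...] builds a brand-new data.table and
# the name i is re-pointed at it. The old i is only freed later by the
# garbage collector, so both tables exist at the same time for a while.
i <- x[i, on = .(id)]

addr_joined <- address(i)
moved <- !identical(addr_original, addr_joined)   # TRUE means i is a new object
```

With millions of rows and many columns, that temporary overlap of the old and the new table is where the memory spike comes from.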
My current workaround is:

i_id <- i[, .(id)]            # create an auxiliary data.table with just the 'id' var
i_id <- x[i_id, on = .(id)]   # this still makes a copy, but with just the 'id' column + the columns from x
                              # (instead of hundreds of columns from i)
i[, newvar1 := i_id$newvar1]
i[, newvar2 := i_id$newvar2]
...
i[, newvarN := i_id$newvarN]
#This works, but the code is quite large (if N is large), verbose and error prone

Update 1: I added a related issue on data.table's GitHub.

r data.table memory-efficient right-join

1 Answer

4 votes

Would this work? It uses a join on the right-hand side of := so that only the new columns are built and then assigned into i:

cols <- c("newvar1", "newvar2") # or setdiff(names(x), "id")
i[, (cols) := x[.SD, on = "id", .SD, .SDcols = cols]]
# or to avoid the double .SD.
# i[, (cols) := x[i, on = "id", .SD, .SDcols = cols]]
# i[, (cols) := x[.SD, on = "id", mget(cols)]]
i
# (the output below is for an x that also has a character column newvar2 with values F to J)
#       id   var1 newvar1 newvar2
#    <int> <char>  <char>  <char>
# 1:     1    bla       A       F
# 2:     2    ble       B       G
# 3:     3    bli       C       H
# 4:     4    blo       D       I
# 5:     5    blu       E       J
# 6:     6    blA    <NA>    <NA>
# 7:     7    blS    <NA>    <NA>
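Another way to get the same result, sketched here as an alternative and not taken from the answer, is to flip the join so that i is the table being updated: in i[x, on = ...], the columns of x are available under the `i.` prefix and can be assigned in bulk with `mget()`. The helper name `add_cols_by_ref` and its arguments are hypothetical, x is assumed to have unique ids, and newvar2 is given the values F to J to match the output shown in the answer.

```r
library(data.table)

x <- data.table(id = 1:5, newvar1 = LETTERS[1:5], newvar2 = LETTERS[6:10])
i <- data.table(id = 1:7,
                var1 = c("bla", "ble", "bli", "blo", "blu", "blA", "blS"))

# Hypothetical helper (not part of data.table): add the columns `cols` of
# `src` to `target` by reference, matching rows on the column `key_col`.
add_cols_by_ref <- function(target, src, key_col,
                            cols = setdiff(names(src), key_col)) {
  force(cols)
  # Update join: `src` sits in the i position, so its columns are visible
  # as i.<name>; rows of `target` without a match are left as NA.
  target[src, on = key_col, (cols) := mget(paste0("i.", cols))]
  invisible(target)
}

addr_before <- address(i)
add_cols_by_ref(i, x, "id")
same_object <- identical(addr_before, address(i))   # i was updated in place
print(i)
```

Compared with the answer's form, this never materialises a joined table with one row per row of i; the trade-off is that the naming is easy to misread, because the `i.` prefix here points at x.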