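The last part of the joins documentation shows how to use the merge syntax (x[i, on=.(id), ...]) to update the left table by reference (also described in this SO question). Here is a simplified example: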
x = data.table(id = 1:5, newvar1=LETTERS[1:5])
#In practice x would have more vars: newvar2, ..., newvarN
i = data.table(id = 1:7, var1 = c('bla','ble','bli','blo','blu','blA','blS') )
#Updating the LEFT table by reference
x[i,on = .(id),
j = `:=`(id=id,var1=var1,newvar1=newvar1)]
#Result: the column var1 (from i) is added to x by reference
> print(x)
      id newvar1   var1
   <int>  <char> <char>
1:     1       A    bla
2:     2       B    ble
3:     3       C    bli
4:     4       D    blo
5:     5       E    blu
I need the opposite: to update i by reference.
x = data.table(id = 1:5, newvar1=LETTERS[1:5])
#In practice x would have more vars: newvar2, ..., newvarN
i = data.table(id = 1:7,var1 = c('bla','ble','bli','blo','blu','blA','blS'))
#Right join then overwrite the content of i
i <- x[i,on = .(id)]
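A quick way to see what this overwrite does is to compare the object's address before and after the assignment. This is only a minimal sketch on the toy data; addr_before and addr_after are names introduced here for illustration, and address() is the helper exported by data.table.

```r
library(data.table)

x <- data.table(id = 1:5, newvar1 = LETTERS[1:5])
i <- data.table(id = 1:7, var1 = c('bla','ble','bli','blo','blu','blA','blS'))

addr_before <- address(i)   # memory address of the original i

# Right join, then rebind the name i to the join result
i <- x[i, on = .(id)]

addr_after <- address(i)    # address of the object i now points to

# The join result is built while the old i still exists,
# so the two addresses cannot be the same object
identical(addr_before, addr_after)
```

The comparison returns FALSE: the join result is a new table, and the old i is only released once the name has been rebound.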
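In my application (and, I assume, that of several heavy DT users), i is the main dataset, with millions of observations and dozens of columns. That is, it is the dataset to which I want to keep adding new variables from x, such as newvar1, newvar2, ..., newvarN. So I need to keep the columns and rows (cardinality) of i, not those of x.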
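AFAIK, overwriting i with the result of x[i, on = .(id)] involves making a copy of i, which causes a memory spike (~double) that may exceed the available RAM and crash R.
Update 1: I added a related issue on data.table's GitHub. Should this work?
My current workaround is: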
i_id <- i[, .(id)] # create an auxiliary data.table with just the 'id' var
i_id <- x[i_id, on = .(id)] # this still makes a copy, but with just the 'id' column + columns from x (instead of hundreds of columns from i)
i[, newvar1 := i_id$newvar1]
i[, newvar2 := i_id$newvar2]
...
i[, newvarN := i_id$newvarN] # This works, but the code is quite large (if N is large), verbose and error prone
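For comparison, the repeated assignments can probably be collapsed into a single update join with the roles swapped, so that i is the table being modified. This is a sketch under assumptions, not a tested solution for the full-size data: new_cols is a name introduced here, x is given a newvar2 column for illustration, and id is assumed to be unique in x (with duplicate ids, the last matching row would win).

```r
library(data.table)

x <- data.table(id = 1:5, newvar1 = LETTERS[1:5], newvar2 = 5:1)
i <- data.table(id = 1:7, var1 = c('bla','ble','bli','blo','blu','blA','blS'))

# Columns to bring over from x: everything except the join key
new_cols <- setdiff(names(x), "id")

# Update join: the data.table named i is the one being modified.
# Inside j, the "i." prefix refers to columns of the table passed
# as the i-argument of `[`, which here is x.
i[x, on = .(id), (new_cols) := mget(paste0("i.", new_cols))]

print(i)
```

Rows of i with no match in x (ids 6 and 7 here) get NA in the new columns, and i keeps its own row count and column order.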
Using