Assign a new column based on unique combinations of values in other columns


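I have a dataset of bird observation records with about 300,000 rows and 7 columns. I want to create a new column based on the unique combinations of 3 other columns, all of which are factor variables: "gridref", the 1 km grid square in which the record was made; "observer", the person who made the observation; and "date", the date of the observation. I want a new column, "visit_ID", that identifies each unique "visit" to a 1 km grid square, i.e. each unique combination of gridref, observer and date.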

I tried the following code:

birds_raw$vid <- as.integer(interaction(birds_raw$gridref, birds_raw$observer, birds_raw$date))

This returns the following error message:

Error: cannot allocate vector of size 636.1 Gb
In addition: Warning message:
In ans * length(l) : NAs produced by integer overflow
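A likely cause of the error: with its default `drop = FALSE`, `interaction()` creates one level for every *possible* combination of the input levels, not only the combinations present in the data, so the number of levels is the product of the three level counts. With many grid squares, observers and dates that product can easily exceed the integer range, which matches the overflow warning. A toy illustration (the factors `g`, `o`, `d` are made up, not the asker's data):

```r
# Hypothetical toy factors: 3 levels each, but only 3 of the
# 3 * 3 * 3 = 27 possible combinations actually occur.
g <- factor(c("grid 1", "grid 2", "grid 3"))
o <- factor(c("person 1", "person 2", "person 3"))
d <- factor(c("date 1", "date 2", "date 3"))

# Default drop = FALSE: one level per possible combination.
all_combos <- interaction(g, o, d)
nlevels(all_combos)
#> [1] 27

# drop = TRUE keeps only the combinations that occur in the data.
seen_combos <- interaction(g, o, d, drop = TRUE)
nlevels(seen_combos)
#> [1] 3
```

`drop = TRUE` discards unused combinations, but it may still build large intermediate level sets when the factors have many levels, so a grouping approach that only looks at the observed rows is the safer choice at this scale.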

I'm sure there must be a simple way to do this. Can anyone help?

r indexing combinations factors
1 Answer

You can do this efficiently with data.table:

library(data.table)
# Small example table standing in for the real data
birds_raw <-
  data.table(
    other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
    gridref = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
    observer = factor(c("person 1", "person 2", "person 2", "person 1")),
    date = factor(c("date 1", "date 2", "date 1", "date 1"))
  )
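# .GRP is the group counter: each distinct (gridref, observer, date)
# combination gets its own integer ID; the trailing [] prints the result
birds_raw[, visit_id := .GRP, by = c("gridref", "observer", "date")][]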
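If data.table is not an option, a base-R sketch of the same idea builds one key per row and numbers the distinct keys with `match()`. It only considers combinations that actually occur, so it avoids the level explosion of `interaction()`. It reuses the toy columns above; `visit_key` is a hypothetical helper name, and the `"\r"` separator is an assumption that no value contains that character. Because `by =` groups in order of first appearance, this should give the same numbering as the data.table call.

```r
# Same toy data as above, as a plain data.frame.
birds_raw <- data.frame(
  other_var = factor(c("other 1", "other 2", "other 3", "other 4")),
  gridref   = factor(c("grid 1", "grid 2", "grid 1", "grid 1")),
  observer  = factor(c("person 1", "person 2", "person 2", "person 1")),
  date      = factor(c("date 1", "date 2", "date 1", "date 1"))
)

# One character key per row (paste() uses the factor labels).
# Assumes "\r" never occurs in the values; otherwise keys could collide.
visit_key <- paste(birds_raw$gridref, birds_raw$observer, birds_raw$date,
                   sep = "\r")

# match() against the unique keys yields 1, 2, 3, ... in order of first
# appearance, one ID per observed combination.
birds_raw$visit_id <- match(visit_key, unique(visit_key))
birds_raw$visit_id
#> [1] 1 2 3 1
```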
© www.soinside.com 2019 - 2025. All rights reserved.