Sequentially generate columns based on multiple existing columns

Question    Votes: 4    Answers: 3

I have a data frame that looks like this:

 df <- data.frame(project = c("A", "B"),
                  no_dwellings = c(150, 180),
                  first_occupancy = c(2020, 2019))

  project no_dwellings first_occupancy
1       A          150            2020
2       B          180            2019

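project is a column that identifies residential development areas, no_dwellings indicates how many dwellings will eventually be built in each area, and first_occupancy is an estimate of when the first residents will start moving into the newly built apartments.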

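I need to incorporate this information into a population forecast. Our best estimate is that 60 dwellings are occupied each year, starting from first_occupancy. I therefore need to sequentially generate columns that combine the information in first_occupancy and no_dwellings to indicate how many dwellings are likely to be occupied in each year. Since the number of dwellings built is not necessarily divisible by 60, the remainder needs to go into the last column for the respective project.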
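The allocation rule described above can be sketched in base R with integer division and modulo. The helper name yearly_move_ins and its per_year argument are hypothetical and only serve as an illustration; they are not part of the original post.

```r
# Hypothetical helper (not from the original post): split a dwelling count
# into yearly move-ins of `per_year`, with any remainder in the last year.
yearly_move_ins <- function(no_dwellings, per_year = 60) {
  full_years <- no_dwellings %/% per_year   # number of years with a full batch
  remainder  <- no_dwellings %% per_year    # dwellings left for the final year
  out <- rep(per_year, full_years)
  if (remainder > 0) out <- c(out, remainder)
  out
}

yearly_move_ins(150)  # 60 60 30
yearly_move_ins(180)  # 60 60 60
```

Placing these values in columns year_<first_occupancy>, year_<first_occupancy + 1>, and so on, and filling the rest with 0, would give the layout shown in the expected output.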

This is what I expect my data frame to look like for further processing:

  project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1       A          150            2020         0        60        60        30
2       B          180            2019        60        60        60         0
r dplyr
3 Answers
5 votes

With the data.table package, you can approach it as follows:

library(data.table)

setDT(df)[, .(yr = first_occupancy:(first_occupancy + no_dwellings %/% 60),
              dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60))
          , by = .(project, no_dwellings, first_occupancy)
          ][, dcast(.SD, project + no_dwellings + first_occupancy ~ paste0('year_',yr), value.var = 'dw', fill = 0)]

This gives:

   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
1:       A          150            2020         0        60        60        30
2:       B          180            2019        60        60        60         0

The same logic with the tidyverse:

library(dplyr)
library(tidyr)

df %>% 
  group_by(project) %>% 
  do(data.frame(no_dwellings = .$no_dwellings, first_occupancy = .$first_occupancy,
                yr = paste0('year_',.$first_occupancy:(.$first_occupancy + .$no_dwellings %/% 60)),
                dw = c(rep(60, .$no_dwellings %/% 60), .$no_dwellings %% 60))) %>% 
  spread(yr, dw, fill = 0)
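In current dplyr and tidyr, do() and spread() are superseded. As a sketch of the same logic with newer verbs, assuming dplyr 1.1.0 or later (for reframe()) and tidyr 1.1.0 or later (for names_sort), one could write the following. The object name wide is chosen here for illustration only.

```r
library(dplyr)
library(tidyr)

df <- data.frame(project = c("A", "B"),
                 no_dwellings = c(150, 180),
                 first_occupancy = c(2020, 2019))

# One row per project and year: full batches of 60, then the remainder.
wide <- df %>%
  group_by(project, no_dwellings, first_occupancy) %>%
  reframe(
    yr = paste0("year_", first_occupancy + 0:(no_dwellings %/% 60)),
    dw = c(rep(60, no_dwellings %/% 60), no_dwellings %% 60)
  ) %>%
  # Spread the years into columns; years outside a project's range become 0.
  pivot_wider(names_from = yr, values_from = dw,
              values_fill = 0, names_sort = TRUE)

wide
```

As in the original version, a project whose dwelling count is an exact multiple of 60 gets one extra year with a value of 0.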

3 votes

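Generating a long data frame with what you need is fairly simple, and we can do it with make_pop_df. All you then have to do is call that function inside mutate, store the resulting data frames in the very handy 'list columns' that tibbles allow, use unnest to pull the data frames out of the list column, and finally use tidyr::spread to display the data in wide form.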

library(tidyverse)

make_pop_df <- function(no_dwellings, first_year, decay = -60) {
    # pop counts down the dwellings not yet occupied at the start of each year
    seq(from = no_dwellings, to = 0, by = decay) %>%
    tibble(pop  = ., year = first_year + 1:length(.) - 1)
}

df %>%
    group_by(project) %>% 
    mutate(pop_df = list(make_pop_df(no_dwellings, first_occupancy))) %>% 
    unnest(pop_df) %>%
    spread(key = year, value = pop)
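Note that seq(from = no_dwellings, to = 0, by = -60) counts down the dwellings still to be occupied (150, 90, 30 for project A) rather than the number moving in each year. As a minimal sketch, the yearly move-ins can be recovered by differencing that countdown; the helper name remaining_to_move_ins is hypothetical and not part of the answer.

```r
# Hypothetical helper: turn the countdown of remaining dwellings
# into the number of dwellings occupied in each year.
remaining_to_move_ins <- function(no_dwellings, per_year = 60) {
  remaining <- seq(from = no_dwellings, to = 0, by = -per_year)
  # Append 0 so the last remaining batch is also counted as a move-in.
  -diff(c(remaining, 0))
}

remaining_to_move_ins(150)  # 60 60 30
remaining_to_move_ins(180)  # 60 60 60 0
```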

2 votes

Another solution, using the complete function to create all the years and then fill in the numbers.

library(dplyr)
library(tidyr)

df2 <- df %>%
  mutate(year = first_occupancy) %>%
  group_by(project) %>%
  complete(nesting(no_dwellings, first_occupancy), 
         year = full_seq(c(year, min(year) + unique(no_dwellings) %/% 60), period = 1)) %>%
  mutate(number = c(rep(60, unique(no_dwellings) %/% 60), unique(no_dwellings) %% 60),
         year = paste("year", year, sep = "_")) %>%
  spread(year, number, fill = 0) %>%
  ungroup()
df2
# # A tibble: 2 x 7
#   project no_dwellings first_occupancy year_2019 year_2020 year_2021 year_2022
#   <fct>          <dbl>           <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
# 1 A               150.           2020.        0.       60.       60.       30.
# 2 B               180.           2019.       60.       60.       60.        0.
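The year range in the complete() call comes from tidyr::full_seq, which fills in every value between the smallest and largest element at the given period. A small sketch with hypothetical inputs mirroring project A (the variable names first_year, last_year and years are assumptions for illustration):

```r
library(tidyr)

# Hypothetical inputs mirroring project A: first year 2020, 150 dwellings.
first_year <- 2020
last_year  <- first_year + 150 %/% 60   # 2022

# full_seq() expands the two end points into the complete yearly sequence.
years <- full_seq(c(first_year, last_year), period = 1)
years  # 2020 2021 2022
```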