Include all variables in a tsibble formula


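I want to fit a linear regression model to a tsibble (using TSLM from the fable package), and I want to include a bunch of dummy variables in the analysis. An example dataset: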

library(tsibble)
library(dplyr)
library(fable)

ex = structure(list(id = c("KEY1", "KEY1", "KEY1", "KEY1", "KEY1", 
"KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", "KEY1", 
"KEY1", "KEY1"), sales = c(0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0), date = structure(c(15003, 15004, 15005, 15006, 15007, 
15008, 15009, 15010, 15011, 15012, 15013, 15014, 15015, 15016, 
15017), class = "Date"), wday = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
1L, 2L, 3L, 4L, 5L, 6L, 7L, 1L), dummy_1 = c(0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0), dummy_2 = c(0, 0, 0, 0, 0, 0, 1, 
0, 0, 0, 0, 0, 0, 0, 0), dummy_3 = c(0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -15L), key = structure(list(
    id = "KEY1", .rows = list(1:15)), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), index = structure("date", ordered = TRUE), index2 = "date", interval = structure(list(
    year = 0, quarter = 0, month = 0, week = 0, day = 1, hour = 0, 
    minute = 0, second = 0, millisecond = 0, microsecond = 0, 
    nanosecond = 0, unit = 0), class = "interval"), class = c("tbl_ts", 
"tbl_df", "tbl", "data.frame"))

> ex
# A tsibble: 15 x 7 [1D]
# Key:       id [1]
   id    sales date        wday dummy_1 dummy_2 dummy_3
   <chr> <dbl> <date>     <int>   <dbl>   <dbl>   <dbl>
 1 KEY1      0 2011-01-29     1       0       0       0
 2 KEY1      5 2011-01-30     2       0       0       0
 3 KEY1      0 2011-01-31     3       0       0       1
 4 KEY1      0 2011-02-01     4       1       0       0
 5 KEY1      0 2011-02-02     5       0       0       0
 6 KEY1      0 2011-02-03     6       0       0       0
 7 KEY1      0 2011-02-04     7       0       1       0
 8 KEY1      0 2011-02-05     1       0       0       0
 9 KEY1      0 2011-02-06     2       0       0       0
10 KEY1      0 2011-02-07     3       0       0       0
11 KEY1      0 2011-02-08     4       0       0       0
12 KEY1      0 2011-02-09     5       0       0       0
13 KEY1      0 2011-02-10     6       0       0       0
14 KEY1      0 2011-02-11     7       0       0       0
15 KEY1      0 2011-02-12     1       0       0       0 

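There are too many dummy columns to specify by hand, so I would like a quicker way. Normally I would use the . notation in the formula, like this: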

fit = ex %>% 
  model(TSLM(sales ~ trend() + season() + .))

But this does not work:

Warning message:
1 error encountered for TSLM(sales ~ trend() + season() + .)
[1] '.' in formula and no 'data' argument
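
For context, the wording of this message matches the error that base R's terms() raises when a formula contains . but no data frame is available to expand it. The sketch below reproduces that behaviour with base R only; the small data frame df is a made-up stand-in for the example data, and the assumption that TSLM hits the same code path is an inference from the identical message, not something confirmed here.

```r
# Without a 'data' argument, terms() cannot expand the dot and errors out.
err <- tryCatch(
  terms(sales ~ .),
  error = function(e) conditionMessage(e)
)
print(err)

# With a data frame supplied, the dot expands to every column
# except the response. 'df' is a hypothetical stand-in for 'ex'.
df <- data.frame(sales = c(0, 5), dummy_1 = c(0, 1), dummy_2 = c(1, 0))
labels_with_data <- attr(terms(sales ~ ., data = df), "term.labels")
print(labels_with_data)
```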

Is there a systematic tsibble way to handle this, or do I have to build the formula dynamically from the column names of the dataset?

r time-series linear-regression tsibble
1 Answer

We can use reformulate to build the formula from the names of the 'dummy' columns:

nm1 <- names(ex)[startsWith(names(ex), 'dummy')]
ex %>%
    model(lm = TSLM(reformulate(c(nm1, 'trend()', 'season()'), 'sales') ))
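
To see what formula this produces, here is a minimal base R sketch that needs neither tsibble nor fable. The vector cols is copied from the column names of the example data. The second variant, which keeps every column except the key, index, and response (closer to what . would normally mean), is an illustrative assumption and not part of the answer above.

```r
# Column names of the example tsibble 'ex' from the question
cols <- c("id", "sales", "date", "wday", "dummy_1", "dummy_2", "dummy_3")

# Variant 1: pick the dummy columns by prefix, as in the answer
dummies <- cols[startsWith(cols, "dummy")]
f <- reformulate(c(dummies, "trend()", "season()"), response = "sales")
print(f)

# Variant 2 (assumption): use every column that is not the key,
# the index, or the response, mimicking the usual meaning of '.'
others <- setdiff(cols, c("id", "date", "sales"))
f_all <- reformulate(c(others, "trend()", "season()"), response = "sales")
print(f_all)
```

Either formula object can then be passed to TSLM inside model() in the same way as the reformulate call above.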