I have the following mock data, constructed so that the use and budget columns have different lengths:
library(dplyr)  # needed for %>%, rename(), full_join() and arrange()
year <- seq(2010, 2019, 1)
budget <- runif(10)
df1 <- data.frame(year, budget)
years2 <- seq(2010, 2019, 3)
use <- runif(4)
gender <- c("w", "w", "w", "w")
df2 <- data.frame(years2, gender, use) %>% rename(year = years2)
df_corr <- df2 %>% full_join(df1, by = "year") %>% arrange(year)
I have already filtered out the NAs and regressed use on budget for the same year.
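The question does not show the code for this same-year step. One way to do it is sketched below. It rebuilds the mock data, keeps only the years where use was observed, and fits lm(use ~ budget). The set.seed(1) call and the names same_year and fit_same are additions made for illustration only.

```r
library(dplyr)

# Rebuild the question's mock data; set.seed(1) is added only for reproducibility
set.seed(1)
df1 <- data.frame(year = seq(2010, 2019, 1), budget = runif(10))
df2 <- data.frame(year = seq(2010, 2019, 3), gender = "w", use = runif(4))
df_corr <- df2 %>% full_join(df1, by = "year") %>% arrange(year)

# Keep only the years where `use` was observed (2010, 2013, 2016, 2019)
same_year <- df_corr %>% filter(!is.na(use))

# Regress use on budget from the same year
fit_same <- lm(use ~ budget, data = same_year)
coef(fit_same)
```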
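I would like to regress use on the previous year's budget. For example, if use increased from 2010 to 2013, I want to see what happened to budget from 2010 to 2012, because budget may take a year to affect use.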
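The sketch below applies a one-year lag to the question's own df_corr. It pairs each observed use with the budget from the calendar year immediately before it. It does not average the whole 2010 to 2012 window; that would be a different design. The main point is that lag() has to run before the NA rows are dropped. Otherwise the "previous row" would be the previous observed use year, three years back. The names df_lag, fit_lag and prev_year_budget are hypothetical.

```r
library(dplyr)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(year = seq(2010, 2019, 1), budget = runif(10))
df2 <- data.frame(year = seq(2010, 2019, 3), gender = "w", use = runif(4))

df_lag <- df2 %>%
  full_join(df1, by = "year") %>%
  arrange(year) %>%
  # lag() works by row position; every calendar year is present at this point,
  # so the previous row is the previous year
  mutate(prev_year_budget = lag(budget, order_by = year)) %>%
  # now drop years without `use`, and 2010, which has no 2009 budget
  filter(!is.na(use), !is.na(prev_year_budget))

# Regress use on the previous year's budget (3 rows: 2013, 2016, 2019)
fit_lag <- lm(use ~ prev_year_budget, data = df_lag)
```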
Thanks for your help!
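Your example data and the way you describe the problem confuse me a little, so let me give you some reproducible data and see whether this is close to what you're looking for...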
library(dplyr)
set.seed(2020)
group <- rep(letters[1:10], each = 10)
year <- rep(2010:2019, times= 10)
budget <- runif(100, min = 10, max = 100)
use <- runif(100)
df <- data.frame(group, year, budget, use)
glimpse(df)
#> Rows: 100
#> Columns: 4
#> $ group <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b…
#> $ year <int> 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 20…
#> $ budget <dbl> 68.22126, 45.48032, 65.66516, 52.92020, 22.24875, 16.06459, 21…
#> $ use <dbl> 0.42512874, 0.26608799, 0.72619972, 0.93059516, 0.25023194, 0.…
df <- df %>% group_by(group) %>% mutate(prev_year_use = lag(use, order_by = year))
glimpse(df)
#> Rows: 100
#> Columns: 5
#> Groups: group [10]
#> $ group <chr> "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", …
#> $ year <int> 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2…
#> $ budget <dbl> 68.22126, 45.48032, 65.66516, 52.92020, 22.24875, 16.06…
#> $ use <dbl> 0.42512874, 0.26608799, 0.72619972, 0.93059516, 0.25023…
#> $ prev_year_use <dbl> NA, 0.42512874, 0.26608799, 0.72619972, 0.93059516, 0.2…
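Two properties of the group_by() plus lag(order_by = year) step above are worth checking on a tiny example. First, the rows do not need to be sorted by year, and the result comes back in the original row order. Second, the first year of each group gets NA instead of a value leaking in from another group. The toy data frame below is hypothetical.

```r
library(dplyr)

# Hypothetical toy data, rows deliberately out of year order
toy <- data.frame(
  group = c("a", "a", "a", "b", "b"),
  year  = c(2012, 2010, 2011, 2011, 2010),
  use   = c(3, 1, 2, 20, 10)
)

toy_lagged <- toy %>%
  group_by(group) %>%
  # lag within each group, ordered by year rather than by row position
  mutate(prev_year_use = lag(use, order_by = year)) %>%
  ungroup()

toy_lagged$prev_year_use
```

Here prev_year_use is 2, NA, 1, 10, NA in the original row order. The earliest year in each group is NA.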
lm(budget ~ use, data = df)
#>
#> Call:
#> lm(formula = budget ~ use, data = df)
#>
#> Coefficients:
#> (Intercept) use
#> 50.604 7.639
lm(budget ~ prev_year_use, data = df)
#>
#> Call:
#> lm(formula = budget ~ prev_year_use, data = df)
#>
#> Coefficients:
#> (Intercept) prev_year_use
#> 47.60 10.66
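The model above regresses budget on the previous year's use. The question asks for the opposite direction: use as a function of the previous year's budget. A minimal sketch of that variant on the same reproducible data follows. It lags budget instead of use; prev_year_budget and fit are hypothetical names. No coefficients are shown because they were not computed here.

```r
library(dplyr)

set.seed(2020)
df <- data.frame(
  group  = rep(letters[1:10], each = 10),
  year   = rep(2010:2019, times = 10),
  budget = runif(100, min = 10, max = 100),
  use    = runif(100)
)

df <- df %>%
  group_by(group) %>%
  # previous year's budget within each group
  mutate(prev_year_budget = lag(budget, order_by = year)) %>%
  ungroup()

# lm() drops the 10 rows where prev_year_budget is NA (first year of each group)
fit <- lm(use ~ prev_year_budget, data = df)
coef(fit)
```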