子集数据帧由许多因素在R中的同时

Question

我有几个变量的数据帧：地区，季节，年，海拔高度和响应（这里为例）：

region   season   year   altitud   response
IT       wint     2013   800       45
IT       wint     2013   815       47
IT       wint     2013   840       54
IT       wint     2014   800       49
IT       wint     2014   815       59

等等。有三个区域，四季两年了，我想altitud与反应之间执行数线性模型和绘制，子集化根据所有可能的组合数据。即

subset(region&season&year)   and get  altitud~response
IT&wint&2013
IT&wint&2014
IT&spring&2013
IT&spring&2014

等等。因此，24个的组合。有任何想法吗？

预先感谢您非常

大卫

Answer 1

我的解决方案使用broom与tidy功能。

读取数据：

library(readr)

data <- read_table("region   season   year   altitud   response
IT       wint     2013   800       45
IT       wint     2013   815       47
IT       wint     2013   840       54
IT       wint     2014   800       49
IT       wint     2014   815       59")

实际的解决方案：

library(dplyr)
library(broom)
data_fit <- data %>%
    group_by(region, season, year) %>%
    do(fit = lm(altitud ~ response, data = .))

dfCoefs <- tidy(data_fit, fit)
dfCoefs

其给出的示例数据如下回归系数：

# A tibble: 4 x 8
# Groups:   region, season, year [2]
  region season  year term        estimate std.error statistic  p.value
  <chr>  <chr>  <dbl> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 IT     wint    2013 (Intercept)   613.      34.7       17.7    0.0360
2 IT     wint    2013 response        4.22     0.711      5.93   0.106 
3 IT     wint    2014 (Intercept)   726.     NaN        NaN    NaN     
4 IT     wint    2014 response        1.5    NaN        NaN    NaN

虽然，你想altitud ~ response（即预测从响应高度）或response ~ altitud（预测给予高度的反应如何？）

Answer 2

但愿我给你正确的，这里有一个purrr的解决方案：

library(purrr)
library(dplyr)
nested<-df %>% 
  mutate_if(is.character,as.factor) %>% 
  group_by(year,season,region) %>% 
  nest()
my_model<-function(df){
  lm(altitud~response,data=df)
}

nested %>% 
  mutate(Mod=map(data,my_model))

结果：部分修改的数据得到的因素。

 A tibble: 3 x 5
   year season region data             Mod     
  <int> <fct>  <fct>  <list>           <list>  
1  2013 wint   IT     <tibble [3 x 2]> <S3: lm>
2  2014 wint   IT     <tibble [1 x 2]> <S3: lm>
3  2014 Summer IF     <tibble [1 x 2]> <S3: lm>

与modelr预测。你可以使用broom统计如图对方的回答。

require(modelr)
nested %>% 
  mutate(Mod=map(data,my_model)) %>% 
  mutate(Preds=map2(data,Mod,add_predictions)) %>% 
  unnest(Preds)
# A tibble: 5 x 6
   year season region altitud response  pred
  <int> <fct>  <fct>    <int>    <int> <dbl>
1  2013 wint   IT         800       45  44.4
2  2013 wint   IT         815       47  47.9
3  2013 wint   IT         840       54  53.7
4  2014 wint   IT         800       49  49  
5  2014 Summer IF         815       59  59

掌握broom和purrr汇总统计：

# A tibble: 4 x 8
   year season region term        estimate std.error statistic p.value
  <int> <fct>  <fct>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1  2013 wint   IT     (Intercept) -140.      31.8        -4.40   0.142
2  2013 wint   IT     altitud        0.231    0.0389      5.93   0.106
3  2014 wint   IT     (Intercept)   49      NaN         NaN    NaN    
4  2014 Summer IF     (Intercept)   59      NaN         NaN    NaN

nested %>% 
  mutate(Mod=map(data,my_model)) %>% 
  mutate(Preds=map2(data,Mod,add_predictions),Tidy=map(Mod,tidy)) %>% 
  unnest(Tidy)

数据：

df<-read.table(text="region   season   year   altitud   response
IT       wint     2013   800       45
               IT       wint     2013   815       47
               IT       wint     2013   840       54
               IT       wint     2014   800       49
               IF       Summer     2014   815       59",header=T)

Answer 3

为了完整起见，这里也是基础R和data.table解决方案。

Base R

使用split()和lapply()一个可能的基础R的方法是suggested by Jogo：

result <- lapply(split(DT, list(DT$region, DT$season, DT$year)), 
                 lm, formula = response ~ altitud)
print(result)

$IT.wint.2013

Call:
FUN(formula = ..1, data = X[[i]])

Coefficients:
(Intercept)      altitud  
  -140.0510       0.2306  


$IT.wint.2014

Call:
FUN(formula = ..1, data = X[[i]])

Coefficients:
(Intercept)      altitud  
  -484.3333       0.6667

或者，使用管道来提高可读性

library(magrittr)
result <- split(DT, list(DT$region, DT$season, DT$year)) %>% 
  lapply(lm, formula = response ~ altitud)

data.table

随着broom的一些帮助：

library(data.table)
library(magrittr)
setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::tidy(), by = .(region, season, year)]

   region season year        term     estimate   std.error statistic   p.value
1:     IT   wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2:     IT   wint 2013     altitud    0.2306122  0.03888277  5.930962 0.1063382
3:     IT   wint 2014 (Intercept) -484.3333333         NaN       NaN       NaN
4:     IT   wint 2014     altitud    0.6666667         NaN       NaN       NaN

setDT(DT)[, lm(response ~ altitud, .SD) %>% broom::glance(), by = .(region, season, year)]

   region season year r.squared adj.r.squared    sigma statistic   p.value df    logLik      AIC    BIC deviance df.residual
1:     IT   wint 2013 0.9723576     0.9447152 1.111168  35.17631 0.1063382  2 -2.925132 11.85026 9.1461 1.234694           1
2:     IT   wint 2014 1.0000000           NaN      NaN       NaN       NaN  2       Inf     -Inf   -Inf 0.000000           0

如果计算lm()为不同的基团是耗时这可能是值得的存储结果，并使用这些用于随后的处理步骤：

mod <- setDT(DT)[, .(model = .(lm(response ~ altitud, .SD))), by = .(region, season, year)]
mod

   region season year models
1:     IT   wint 2013   <lm>
2:     IT   wint 2014   <lm>

mod$models是模型等价的列表result。

现在，我们可以从计算模型中提取所需的信息，例如，

mod[, models[[1]] %>% broom::tidy(), by = .(region, season, year)]

   region season year        term     estimate   std.error statistic   p.value
1:     IT   wint 2013 (Intercept) -140.0510204 31.82553603 -4.400586 0.1422513
2:     IT   wint 2013     altitud    0.2306122  0.03888277  5.930962 0.1063382
3:     IT   wint 2014 (Intercept) -484.3333333         NaN       NaN       NaN
4:     IT   wint 2014     altitud    0.6666667         NaN       NaN       NaN

Data

library(data.table)
DT <- fread("
region   season   year   altitud   response
IT       wint     2013   800       45
IT       wint     2013   815       47
IT       wint     2013   840       54
IT       wint     2014   800       49
IT       wint     2014   815       59")

子集数据帧由许多因素在R中的同时

问题描述投票：1回答：3

3个回答

Base R

data.table

Data

最新问题

子集数据帧由许多因素在R中的同时

问题描述 投票：1回答：3

3个回答

Base R

data.table

Data

最新问题

问题描述投票：1回答：3