Grouping data based on conditions from multiple columns

Votes: 0 | Answers: 3

Problem description:

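I am trying to calculate recency based on the most recent value in the Year column where the target-achieved indicator equals 1. If 0 is the only indicator value available for a salesman (the Salesman + Year key), select the lowest year in that case.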

Data:

   Salesman_ID  Year         Yearly_Targets_Achieved_Indicator
 1     AA-5468  2012                                 1
 2     AA-5468  2013                                 0
 3     AA-5468  2014                                 0
 4     AA-5468  2015                                 0
 5     AA-5468  2016                                 1
 6     AL-3791  2012                                 1
 7     AL-3791  2013                                 1
 8     AL-3791  2014                                 0
 9     AL-3893  2015                                 0
10     AL-3893  2016                                 0

Expected output:

  Salesman_ID  Year Yearly_Targets_Achieved_Indicator
         <chr> <dbl>                             <dbl>
 1     AA-5468  2016                                 1
 2     AL-3791  2013                                 1
 9     AL-3893  2015                                 0
r group-by dplyr
3 Answers
0 votes

Using the tidyverse package, I suggest the following code:

library(tidyverse)

Prashant_df <- data.frame(
    c("AA-5468","AA-5468","AA-5468","AA-5468","AA-5468","AL-3791","AL-3791","AL-3791","AL-3893","AL-3893"),
    c(2012,2013,2014,2015,2016,2012,2013,2014,2015,2016),
    c(1,0,0,0,1,1,1,0,0,0)
)
names(Prashant_df) <- c("Salesman_ID","Year","Yearly_Targets_Achieved_Indicator")

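# Per salesman: the latest year in which the target was achieved, if any;
# otherwise the earliest year on record. (Using max(Year) over the whole
# group would return 2014 for AL-3791 instead of the expected 2013.)
Prashant_df <- Prashant_df %>% 
    group_by(Salesman_ID) %>% 
    mutate(Year_target = if (any(Yearly_Targets_Achieved_Indicator == 1)) {
        max(Year[Yearly_Targets_Achieved_Indicator == 1])
    } else {
        min(Year)
    })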

Prashant_df_collapsed <- Prashant_df %>% 
    group_by(Salesman_ID) %>% 
    summarise(Year=max(Year_target),
              Yearly_Targets_Achieved_Indicator=max(Yearly_Targets_Achieved_Indicator))

0 votes

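For each salesman you can store the latest year in which the target was achieved, the earliest year, and the maximum of the binary indicator. The overall max(Year) is not enough here: for AL-3791 it would give 2014 rather than the expected 2013.

newdf = df %>% group_by(Salesman_ID) %>% summarise(
  # latest year with indicator 1; NA if the target was never achieved
  maximum = if (any(Yearly_Targets_Achieved_Indicator == 1))
    max(Year[Yearly_Targets_Achieved_Indicator == 1]) else NA_real_,
  minimum = min(Year),
  maxInd = max(Yearly_Targets_Achieved_Indicator))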

From here you can construct your result variable.
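One way that last step might look, as a minimal sketch rather than the answerer's own code. It assumes the question's data sits in a data frame named `df` (the name the snippet uses; the question never defines it), and `last_achieved` is a hypothetical column name. The latest year is taken only over rows where the indicator is 1, because the overall maximum year would give 2014 for AL-3791.

```r
library(dplyr)

# The question's data; `df` is the name the answer's snippet assumes.
df <- data.frame(
  Salesman_ID = c(rep("AA-5468", 5), rep("AL-3791", 3), rep("AL-3893", 2)),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
)

result <- df %>%
  group_by(Salesman_ID) %>%
  summarise(
    # Latest year with the target achieved; NA when there is none.
    last_achieved = if (any(Yearly_Targets_Achieved_Indicator == 1)) {
      max(Year[Yearly_Targets_Achieved_Indicator == 1])
    } else {
      NA_real_
    },
    minimum = min(Year),
    maxInd = max(Yearly_Targets_Achieved_Indicator)
  ) %>%
  # Fall back to the earliest year when the target was never achieved.
  mutate(Year = if_else(maxInd == 1, last_achieved, minimum)) %>%
  select(Salesman_ID, Year, Yearly_Targets_Achieved_Indicator = maxInd)

print(result)
```

On the sample data this yields 2016 for AA-5468, 2013 for AL-3791 and 2015 for AL-3893, matching the expected output in the question.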


0 votes

Using base R:

  c(by(dat, dat[1], function(x) if (all(x[, 3] == 0)) min(x[, 2]) else max(x[which(x[, 3] == 1), 2])))

   AA-5468 AL-3791 AL-3893 
      2016    2013    2015 

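The code is a bit messy, but it produces the desired output. Here is the explanation:

First the data is grouped by Salesman_ID. Then, for each group, it checks whether all indicators are zero; if so, it returns the group's first year (the lowest, since the data is sorted by year). Otherwise it returns the latest (maximum) year among the rows where the indicator is 1.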
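The same logic can be spelled out more readably. This is a minimal sketch, not the answerer's code: it assumes the question's data is in a data frame called `dat` (the name used in the one-liner), and `recency_year` is a hypothetical helper. It refers to columns by name and uses the minimum year as the fallback, so it does not depend on column positions or row order.

```r
# The question's data; `dat` is the name the one-liner assumes.
dat <- data.frame(
  Salesman_ID = c(rep("AA-5468", 5), rep("AL-3791", 3), rep("AL-3893", 2)),
  Year = c(2012, 2013, 2014, 2015, 2016, 2012, 2013, 2014, 2015, 2016),
  Yearly_Targets_Achieved_Indicator = c(1, 0, 0, 0, 1, 1, 1, 0, 0, 0)
)

# Hypothetical helper: the recency year for one salesman's rows.
recency_year <- function(x) {
  achieved <- x$Yearly_Targets_Achieved_Indicator == 1
  if (any(achieved)) {
    max(x$Year[achieved])  # latest year with the target achieved
  } else {
    min(x$Year)            # never achieved: earliest year on record
  }
}

# split() yields one data frame per salesman; sapply() applies the helper to each.
res <- sapply(split(dat, dat$Salesman_ID), recency_year)
print(res)
```

The result is a named numeric vector with one entry per Salesman_ID, the same shape as the output of the `by()` call.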
