Filter hierarchical logic across columns - higher or lower values by type


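In R, I am trying to implement the following filtering logic by group on a large dataset:

Within each group:

If there is more than one L, keep the row with the lowest L value.

If there is more than one N, keep the row with the highest N value.

If both L and N are present, remove any rows where N is higher than L.

If both L and N are present, keep the row with the highest value of N that is below the lowest value of L (in addition to the row with the lowest value of L).

Keep all values of B.

Sample data: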

dat <- data.frame(group=c("AB","AB","AB","AB","BC","BC","B","B","AD","AD","AD","G"),
type=c("B","L","N","N","N","L","N","N","B","L","L","L"),
value=c(2,4,3,2,5,3,8,9,4,3,9,7))


Desired output:

desired_output <- data.frame(group=c("AB","AB","AB","BC","B","AD","AD","G"),
type=c("B","L","N","L","N","B","L","L"),
value=c(2,4,3,3,9,4,3,7))


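I am looking for a dplyr/tidyr solution. I have tried applying the filtering logic after pivot_wider, and with case_when inside filter, but I have not gotten very close. I thought this would be simple, but applying the logic across columns has me stumped.

This follows my line of thinking, but it does not produce the desired output (for example, the minimum for L is taken across all types within the group rather than only within L):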

df <- dat %>%
group_by(group) %>% 
filter(type=="B"|type=="L" & value==min(value)|type=="N" & value==max(value))
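
To make the problem with `min(value)` concrete, here is a minimal sketch using only the AB rows from the sample data (the name `ab` is introduced for illustration). Inside a grouped `filter()`, `min(value)` sees every row of the group, so the L row is compared against the minimum of all types; subsetting by type first gives the minimum within L only.

```r
# The four AB rows from the sample data
ab <- data.frame(type  = c("B", "L", "N", "N"),
                 value = c(2, 4, 3, 2))

# Minimum over every type in the group: this is what min(value) computes
min_all <- min(ab$value)

# Minimum over the L rows only: this is what the L rule needs
min_L <- min(ab$value[ab$type == "L"])

# The L row (value 4) matches min_L but not min_all, so the attempt drops it
ab$value[ab$type == "L"] == min_all
ab$value[ab$type == "L"] == min_L
```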

1 Answer

You can try this:

### Packages
library(dplyr)
library(tidyr)

### Data
dat <- data.frame(group=c("AB","AB","AB","AB","BC","BC","B","B","AD","AD","AD","G"),
                  type=c("B","L","N","N","N","L","N","N","B","L","L","L"),
                  value=c(2,4,3,2,5,3,8,9,4,3,9,7))

### Count the L and N rows in each group
dat2 <- dat %>%
  group_by(group) %>%
  mutate(nb_L = sum(type == "L"),
         nb_N = sum(type == "N")) %>%
  ungroup()

### Build three data frames, one per condition
# Groups with several L rows: keep the lowest L
keep_L <- dat2 %>% group_by(group) %>% filter(nb_L > 1 & type == "L") %>% slice_min(value, n = 1) %>% ungroup()
# Groups with several N rows: keep the highest N
keep_N <- dat2 %>% group_by(group) %>% filter(nb_N > 1 & type == "N") %>% slice_max(value, n = 1) %>% ungroup()
# Every B row, plus any L or N that is the only row of its type in its group
keep_rest <- dat2 %>% filter(type == "B" | (nb_L <= 1 & type == "L") | (nb_N <= 1 & type == "N"))

### Stack the data frames
dat2 <- bind_rows(keep_L, keep_N, keep_rest)

### Copy the retained L and N values onto every row of the group,
### then drop the N rows that are higher than the L of their group
### (.default in case_when() needs dplyr 1.1.0 or later)
dat2 <- dat2 %>%
  group_by(group) %>%
  mutate(val_L = ifelse(type == "L", value, NA_real_),
         val_N = ifelse(type == "N", value, NA_real_)) %>%
  fill(c(val_L, val_N), .direction = "downup") %>%
  mutate(across(c(val_L, val_N), ~replace_na(.x, 0)),
         keep = case_when(nb_L > 0 & type == "N" & val_N > val_L ~ "remove", .default = "keep")) %>%
  filter(keep == "keep") %>%
  select(group, type, value) %>%
  arrange(group, type) %>%
  ungroup()

Output:

# A tibble: 8 × 3
  group type  value
  <chr> <chr> <dbl>
1 AB    B         2
2 AB    L         4
3 AB    N         3
4 AD    B         4
5 AD    L         3
6 B     N         9
7 BC    L         3
8 G     L         7
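
One case the stacked approach above may not cover is a group where the highest N is above the lowest L but a lower N sits below it (for example L = 4 with N = 5 and N = 3). There the highest N is selected first and then dropped, so no N survives, whereas the fourth rule in the question would keep N = 3. The following is a sketch of an alternative that compares N with the lowest L before picking the highest N. The helpers `safe_min` and `safe_max` are hypothetical names introduced here, and treating an N equal to the lowest L as "not higher" (so it is kept) is an assumption; the question does not say how ties should be handled.

```r
library(dplyr)

dat <- data.frame(
  group = c("AB","AB","AB","AB","BC","BC","B","B","AD","AD","AD","G"),
  type  = c("B","L","N","N","N","L","N","N","B","L","L","L"),
  value = c(2,4,3,2,5,3,8,9,4,3,9,7),
  stringsAsFactors = FALSE
)

# Hypothetical helpers: min/max that return Inf/-Inf on empty input
# instead of emitting a warning
safe_min <- function(x) if (length(x) > 0) min(x) else Inf
safe_max <- function(x) if (length(x) > 0) max(x) else -Inf

result <- dat %>%
  group_by(group) %>%
  # Lowest L in the group (Inf when the group has no L, so no N is excluded)
  mutate(min_L = safe_min(value[type == "L"])) %>%
  # Keep every B, the lowest L, and only the N rows not above the lowest L
  filter(type == "B" |
           (type == "L" & value == min_L) |
           (type == "N" & value <= min_L)) %>%
  # Among the N rows that survived, keep the highest one
  filter(type != "N" | value == safe_max(value[type == "N"])) %>%
  ungroup() %>%
  select(-min_L)

result
```

On the sample data this should give the same eight rows as `desired_output`, in the original row order. Rows tied for the lowest L or the highest remaining N are all kept; add a tie-breaking step if exactly one row per type is required.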