group_by function not working on long-format data


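I have long-format data that I cannot yet convert to wide format (at this stage the data is too large and complex). My data contains information about hospital episodes, with each row corresponding to a new episode.

I am trying to create a new variable from another variable, but I cannot get the group_by function to work.

I am trying to create a new binary variable that tells me whether a person has ever been hospitalised for cardiac arrest (yes/no), using a variable that gives the reason for each hospitalisation.

Because each person has multiple entries, I assume I need to group by "ID" to get the same binary result on every entry for each person.

This is my code: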

data %>%
  group_by(ID) %>%
  mutate(Ever_Cardiac =
    ifelse(reason_for_hospitalisation == "Cardiac", "1", "0"))

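The second part of the code works: it creates an "Ever_Cardiac" column with "1" for "Cardiac" and "0" for any other category.

However, for individuals with multiple hospital episodes, I get "1" only on the episodes caused by cardiac arrest and not on their other episodes.

Can anyone help me?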

r dplyr group-by mutate
1 Answer

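Welcome to SO. Read the link @Setefan posted about a minimal reproducible example, then try the code below (toy data is at the end). Your ifelse() compares each row on its own, so grouping has no effect on it; wrapping the comparison in any() gives one value per group, which mutate() then repeats on every row of that group.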

library(tidyverse)

# `.by` groups for this mutate() call only (needs dplyr >= 1.1.0);
# `any()` returns a single TRUE/FALSE per group
new_df <- my_df %>% 
  mutate(
    .by = c(id, name),
    ever_cardiac = if_else(any(reason == "Cardiac"), 1, 0))

Output:

> arrange(new_df, name, date)
# A tibble: 23 × 5
      id name          date       reason   ever_cardiac
   <dbl> <chr>         <date>     <chr>           <dbl>
 1     3 Alice Johnson 2024-05-02 Asthma              1 #
 2     3 Alice Johnson 2024-05-05 Cold                1 #
 3     3 Alice Johnson 2024-05-10 Cardiac             1 # < group with just one
 4     4 Bob Brown     2024-05-03 Migraine            0
 5     5 Charlie Davis 2024-05-04 Cardiac             1 
 6     6 Eve Clark     2024-05-05 Flu                 0
 7     7 Frank White   2024-05-05 Cold                0
 8     8 Grace Lewis   2024-05-06 Asthma              0
 9     9 Hank Walker   2024-05-06 Migraine            0
10    10 Ivy Hall      2024-05-07 Fracture            0
11    11 Jack Young    2024-05-07 Flu                 0
12     2 Jane Smith    2024-05-01 Cold                0
13     2 Jane Smith    2024-05-09 Migraine            0
14     1 John Doe      2024-05-01 Flu                 1
15     1 John Doe      2024-05-08 Asthma              1
16     1 John Doe      2024-05-10 Cardiac             1 
17    12 Karen Allen   2024-05-08 Cold                0
18    13 Leo King      2024-05-09 Asthma              0
19    14 Mia Wright    2024-05-10 Migraine            0
20    15 Nick Scott    2024-05-08 Cardiac             1 # < All cardiac reasons
21    15 Nick Scott    2024-05-10 Cardiac             1 # <
22    16 Olivia Green  2024-05-09 Cardiac             1 
23    17 Paul Baker    2024-05-10 Cardiac             1 
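
Since the question used group_by(), here is a minimal sketch of the same idea in that style, which should also work on dplyr versions that predate `.by`. The `visits` tibble is a hypothetical stand-in that borrows the question's column names, and `Cardiac_This_Visit` is an illustrative extra column added only to contrast the row-wise comparison with the grouped one. The `na.rm = TRUE` argument is an assumption about how missing reasons should be treated (ignored rather than propagated).

```r
library(dplyr)

# Hypothetical mini data set using the question's column names
visits <- tibble::tibble(
  ID = c(1, 1, 1, 2, 2),
  reason_for_hospitalisation = c("Flu", "Asthma", "Cardiac", "Cold", "Migraine")
)

result <- visits %>%
  group_by(ID) %>%
  mutate(
    # Row-wise comparison: 1 only on the cardiac rows, grouping has no effect
    Cardiac_This_Visit = as.integer(reason_for_hospitalisation == "Cardiac"),
    # any() collapses each group to one value, which mutate() recycles to every row
    Ever_Cardiac = as.integer(
      any(reason_for_hospitalisation == "Cardiac", na.rm = TRUE)
    )
  ) %>%
  ungroup()  # drop the grouping so later steps are not grouped by accident

print(result)
```

Person 1 gets `Ever_Cardiac = 1` on all three rows, while `Cardiac_This_Visit` is 1 only on the cardiac row, which is the behaviour the question describes.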

Counts:

> new_df %>% 
+   count(id, name, ever_cardiac, reason) %>% 
+   pivot_wider(
+     id_cols = c(id, name, ever_cardiac),
+     names_from = reason, values_from = n, 
+     values_fill = "-", values_fn = as.character)
# A tibble: 17 × 9
      id name          ever_cardiac Asthma Cardiac Flu   Cold  Migraine Fracture
   <dbl> <chr>                <dbl> <chr>  <chr>   <chr> <chr> <chr>    <chr>   
 1     1 John Doe                 1 1      1       1     -     -        -       
 2     2 Jane Smith               0 -      -       -     1     1        -       
 3     3 Alice Johnson            1 1      1       -     1     -        -       
 4     4 Bob Brown                0 -      -       -     -     1        -       
 5     5 Charlie Davis            1 -      1       -     -     -        -       
 6     6 Eve Clark                0 -      -       1     -     -        -       
 7     7 Frank White              0 -      -       -     1     -        -       
 8     8 Grace Lewis              0 1      -       -     -     -        -       
 9     9 Hank Walker              0 -      -       -     -     1        -       
10    10 Ivy Hall                 0 -      -       -     -     -        1       
11    11 Jack Young               0 -      -       1     -     -        -       
12    12 Karen Allen              0 -      -       -     1     -        -       
13    13 Leo King                 0 1      -       -     -     -        -       
14    14 Mia Wright               0 -      -       -     -     1        -       
15    15 Nick Scott               1 -      2       -     -     -        -       
16    16 Olivia Green             1 -      1       -     -     -        -       
17    17 Paul Baker               1 -      1       -     -     -        -   
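
If only a per-person check is needed, a shorter alternative to the wide count table might be summarise() with one row per person. This is a sketch on a hypothetical `visits` tibble, not the toy data itself; the column names `n_visits` and `n_cardiac` are made up for illustration, and `.by` again assumes dplyr >= 1.1.0.

```r
library(dplyr)

# Hypothetical mini data set with the same column names as the toy data
visits <- tibble::tibble(
  id = c(1, 1, 2, 3, 3),
  reason = c("Flu", "Cardiac", "Cold", "Cardiac", "Cardiac")
)

# One row per person: number of episodes, number of cardiac episodes,
# and the ever-cardiac flag derived with any()
per_person <- visits %>%
  summarise(
    .by = id,
    n_visits = n(),
    n_cardiac = sum(reason == "Cardiac"),
    ever_cardiac = as.integer(any(reason == "Cardiac"))
  )

print(per_person)
```

A row where `n_cardiac` is greater than 0 but `ever_cardiac` is 0 (or the reverse) would indicate a problem with the flag.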

Toy data:

# Toy data
my_df <- tibble::tribble(
  ~id,           ~name,        ~date,    ~reason,
    1,      "John Doe", "2024-05-01",      "Flu",
    2,    "Jane Smith", "2024-05-01",     "Cold",
    3, "Alice Johnson", "2024-05-02",   "Asthma",
    4,     "Bob Brown", "2024-05-03", "Migraine",
    5, "Charlie Davis", "2024-05-04",  "Cardiac",
    6,     "Eve Clark", "2024-05-05",      "Flu",
    7,   "Frank White", "2024-05-05",     "Cold",
    8,   "Grace Lewis", "2024-05-06",   "Asthma",
    9,   "Hank Walker", "2024-05-06", "Migraine",
   10,      "Ivy Hall", "2024-05-07", "Fracture",
    1,      "John Doe", "2024-05-08",   "Asthma",
    2,    "Jane Smith", "2024-05-09", "Migraine",
    3, "Alice Johnson", "2024-05-05",     "Cold",
    3, "Alice Johnson", "2024-05-10",  "Cardiac",
   11,    "Jack Young", "2024-05-07",      "Flu",
   12,   "Karen Allen", "2024-05-08",     "Cold",
   13,      "Leo King", "2024-05-09",   "Asthma",
   14,    "Mia Wright", "2024-05-10", "Migraine",
   15,    "Nick Scott", "2024-05-08",  "Cardiac",
   15,    "Nick Scott", "2024-05-10",  "Cardiac",
   16,  "Olivia Green", "2024-05-09",  "Cardiac",
    1,      "John Doe", "2024-05-10",  "Cardiac",
   17,    "Paul Baker", "2024-05-10",  "Cardiac")

my_df <- mutate(my_df, date = ymd(date))

Created on 2024-05-22 with reprex v2.1.0
