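I have long-format data that I can't yet convert to wide format (it is too large and complex at this stage). The data contain information about hospital episodes, with each row corresponding to a new episode.
I'm trying to create a new variable from another variable, but I can't get the group_by function to work.
I want a new binary variable that tells me whether a person has ever been hospitalised for cardiac arrest (yes/no), based on a variable that gives the reason for each hospitalisation.
Since each person has multiple entries, I assume I need to group by "ID" so that every entry for a given person gets the same binary result.
Here is my code: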
data %>%
  group_by(ID) %>%
  mutate(Ever_Cardiac =
    ifelse(reason_for_hospitalisation == "Cardiac", '1', '0'))
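The second part of the code works: it creates an "Ever_Cardiac" column with "1" for "Cardiac" and "0" for any other category.
However, for people with multiple hospital episodes, I only get "1" on the episodes that were caused by cardiac arrest, not on their other episodes.
Can anyone help me?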
Welcome to SO. Read the link from @Setefan about a minimal reproducible example, and try this:
(Toy data at the end)
library(tidyverse)

# `mutate by` and `any`
new_df <- my_df %>%
  mutate(
    .by = c(id, name),
    ever_cardiac = if_else(any(reason == "Cardiac"), 1, 0))
Output:
> arrange(new_df, name, date)
# A tibble: 23 × 5
      id name          date       reason   ever_cardiac
   <dbl> <chr>         <date>     <chr>           <dbl>
 1     3 Alice Johnson 2024-05-02 Asthma              1 #
 2     3 Alice Johnson 2024-05-05 Cold                1 #
 3     3 Alice Johnson 2024-05-10 Cardiac             1 # < group with just one
 4     4 Bob Brown     2024-05-03 Migraine            0
 5     5 Charlie Davis 2024-05-04 Cardiac             1
 6     6 Eve Clark     2024-05-05 Flu                 0
 7     7 Frank White   2024-05-05 Cold                0
 8     8 Grace Lewis   2024-05-06 Asthma              0
 9     9 Hank Walker   2024-05-06 Migraine            0
10    10 Ivy Hall      2024-05-07 Fracture            0
11    11 Jack Young    2024-05-07 Flu                 0
12     2 Jane Smith    2024-05-01 Cold                0
13     2 Jane Smith    2024-05-09 Migraine            0
14     1 John Doe      2024-05-01 Flu                 1
15     1 John Doe      2024-05-08 Asthma              1
16     1 John Doe      2024-05-10 Cardiac             1
17    12 Karen Allen   2024-05-08 Cold                0
18    13 Leo King      2024-05-09 Asthma              0
19    14 Mia Wright    2024-05-10 Migraine            0
20    15 Nick Scott    2024-05-08 Cardiac             1 # < All cardiac reasons
21    15 Nick Scott    2024-05-10 Cardiac             1 # <
22    16 Olivia Green  2024-05-09 Cardiac             1
23    17 Paul Baker    2024-05-10 Cardiac             1
Counts:
> new_df %>%
+   count(id, name, ever_cardiac, reason) %>%
+   pivot_wider(
+     id_cols = c(id, name, ever_cardiac),
+     names_from = reason, values_from = n,
+     values_fill = "-", values_fn = as.character)
# A tibble: 17 × 9
      id name          ever_cardiac Asthma Cardiac Flu   Cold  Migraine Fracture
   <dbl> <chr>                <dbl> <chr>  <chr>   <chr> <chr> <chr>    <chr>
 1     1 John Doe                 1 1      1       1     -     -        -
 2     2 Jane Smith               0 -      -       -     1     1        -
 3     3 Alice Johnson            1 1      1       -     1     -        -
 4     4 Bob Brown                0 -      -       -     -     1        -
 5     5 Charlie Davis            1 -      1       -     -     -        -
 6     6 Eve Clark                0 -      -       1     -     -        -
 7     7 Frank White              0 -      -       -     1     -        -
 8     8 Grace Lewis              0 1      -       -     -     -        -
 9     9 Hank Walker              0 -      -       -     -     1        -
10    10 Ivy Hall                 0 -      -       -     -     -        1
11    11 Jack Young               0 -      -       1     -     -        -
12    12 Karen Allen              0 -      -       -     1     -        -
13    13 Leo King                 0 1      -       -     -     -        -
14    14 Mia Wright               0 -      -       -     -     1        -
15    15 Nick Scott               1 -      2       -     -     -        -
16    16 Olivia Green             1 -      1       -     -     -        -
17    17 Paul Baker               1 -      1       -     -     -        -
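For what it's worth, the reason the code in the question only flags the cardiac rows is that `ifelse()` is evaluated row by row, so `group_by()` alone does not change its result. Wrapping the comparison in `any()` collapses each group to a single TRUE/FALSE, which `mutate()` then recycles to every row of that person. Below is a minimal sketch using the column names from the question (`ID`, `reason_for_hospitalisation`). The `episodes` tibble and the `Cardiac_This_Episode` column are made up for illustration, and `na.rm = TRUE` is an assumption about how missing reasons should be treated. It uses `group_by()`, so it should also work on dplyr versions that predate the `.by` argument.

```r
library(dplyr)

# Hypothetical miniature of the asker's data: one row per hospital episode.
episodes <- tibble::tribble(
  ~ID, ~reason_for_hospitalisation,
  1,   "Flu",
  1,   "Cardiac",
  2,   "Cold",
  2,   NA_character_,
  3,   "Cardiac"
)

result <- episodes %>%
  group_by(ID) %>%
  mutate(
    # Row-wise test: only the Cardiac rows themselves get 1
    # (this reproduces the behaviour described in the question).
    Cardiac_This_Episode = if_else(
      reason_for_hospitalisation == "Cardiac", 1, 0, missing = 0),
    # any() returns one value per group, recycled to all rows of that ID.
    # na.rm = TRUE (an assumption) treats a missing reason as "not Cardiac".
    Ever_Cardiac = as.integer(
      any(reason_for_hospitalisation == "Cardiac", na.rm = TRUE))
  ) %>%
  ungroup()

print(result)
```

Without `na.rm = TRUE`, a person with a missing reason and no cardiac episode would get `NA` rather than 0, so it is worth deciding which behaviour you want before running this on the real data.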
Toy data:
# Toy data
my_df <- tibble::tribble(
~id, ~name, ~date, ~reason,
1, "John Doe", "2024-05-01", "Flu",
2, "Jane Smith", "2024-05-01", "Cold",
3, "Alice Johnson", "2024-05-02", "Asthma",
4, "Bob Brown", "2024-05-03", "Migraine",
5, "Charlie Davis", "2024-05-04", "Cardiac",
6, "Eve Clark", "2024-05-05", "Flu",
7, "Frank White", "2024-05-05", "Cold",
8, "Grace Lewis", "2024-05-06", "Asthma",
9, "Hank Walker", "2024-05-06", "Migraine",
10, "Ivy Hall", "2024-05-07", "Fracture",
1, "John Doe", "2024-05-08", "Asthma",
2, "Jane Smith", "2024-05-09", "Migraine",
3, "Alice Johnson", "2024-05-05", "Cold",
3, "Alice Johnson", "2024-05-10", "Cardiac",
11, "Jack Young", "2024-05-07", "Flu",
12, "Karen Allen", "2024-05-08", "Cold",
13, "Leo King", "2024-05-09", "Asthma",
14, "Mia Wright", "2024-05-10", "Migraine",
15, "Nick Scott", "2024-05-08", "Cardiac",
15, "Nick Scott", "2024-05-10", "Cardiac",
16, "Olivia Green", "2024-05-09", "Cardiac",
1, "John Doe", "2024-05-10", "Cardiac",
17, "Paul Baker", "2024-05-10", "Cardiac")
my_df <- mutate(my_df, date = lubridate::ymd(date))
Created on 2024-05-22 with reprex v2.1.0