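I am searching several diagnosis columns for certain codes (S00-S99, T07-T34, T4112, W00-W99) and want to flag cases where at least one of these codes is present. When I run the following code: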
library(tidyverse)
library(stringr)
raw_df <- tibble::tribble(
~dx1, ~dx2, ~ecm1, ~ecm2,
"S045", "T401", "X64", "V99",
"R901", "T5621A", "Y141", "U033",
"J76", "I51", "K44", "G304"
)
dat <- raw_df %>%
  mutate(diagn = case_when(
    if_any(c(dx1:dx2, ecm1:ecm2),
           ~str_detect(., regex("[S]+[00-99]|[T]+[07-34]|T4112|[W]+[00-99]"))) ~ 1,
    TRUE ~ 0))
I get this error:
Error in `mutate()`:
ℹ In argument: `diagn = case_when(...)`.
Caused by error in `case_when()`:
! Failed to evaluate the left-hand side of formula 1.
Caused by error in `if_any()`:
! Can't compute column `dx1`.
Caused by error in `stri_detect_regex()`:
! In a character range [x-y], x is greater than y. (U_REGEX_INVALID_RANGE, context=`[S]+[00-99]|[T]+[07-34]|T4112|[W]+[00-99]`)
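One approach is to define the codes in a vector, write a helper function that checks whether an input code falls within a range, and then build a reference matrix recording which rows meet each condition. This also lets you see which row matched which code group, or only which rows matched, whichever you need.
Your sample data has only one row and one column that meet your criteria (row 1, column 1). To test this more thoroughly, I added a fourth row to the sample data in which each column matches a different code: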
raw_df <- tibble::tribble(
~dx1, ~dx2, ~ecm1, ~ecm2,
"S045", "T401", "X64", "V99",
"R901", "T5621A", "Y141", "U033",
"J76", "I51", "K44", "G304",
"S33", "T14", "T4112", "W01"
)
First, define the vector of codes and the helper function:
codes <- c("S00-S99", "T07-T34", "T4112", "W00-W99")
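range_fun <- function(code, range_str) {
  if (grepl("-", range_str)) {
    # Range such as "S00-S99": compare on the prefix so that longer codes
    # like "S991" still fall inside the range.
    # Character input to dplyr::between() needs dplyr >= 1.1.0.
    range_parts <- stringr::str_split(range_str, "-", simplify = TRUE)
    dplyr::between(substr(code, 1, nchar(range_parts[1])), range_parts[1], range_parts[2])
  } else {
    # Single code such as "T4112": exact match
    code == range_str
  }
}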
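One thing worth checking with string-based range tests: a longer code such as "S991" sorts after "S99", so a plain comparison against the upper bound excludes it. The sketch below illustrates this in base R; `prefix_in_range` is a hypothetical helper introduced only for this illustration, not part of the answer's code.

```r
# Hypothetical helper: compare only the leading characters of a code
# (as many as the lower bound has) against the range bounds.
prefix_in_range <- function(code, lower, upper) {
  prefix <- substr(code, 1, nchar(lower))
  prefix >= lower & prefix <= upper
}

"S991" <= "S99"                        # plain string comparison
# [1] FALSE
prefix_in_range("S991", "S00", "S99")  # comparison on the "S99" prefix
# [1] TRUE
prefix_in_range(c("T07", "T349", "T401"), "T07", "T34")
# [1]  TRUE  TRUE FALSE
```

Whether sub-codes like "S991" should count as inside S00-S99 depends on your coding scheme; for ICD-style codes they usually should.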
Then use the *apply functions to run the check for every code group and every row:
ref_matrix <- sapply(codes, \(x)
apply(raw_df, 1, \(y) any(range_fun(y, x))))
# S00-S99 T07-T34 T4112 W00-W99
# [1,] TRUE FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE FALSE
# [3,] FALSE FALSE FALSE FALSE
# [4,] TRUE TRUE TRUE TRUE
If you only want to identify the rows, you can index with apply:
raw_df[apply(ref_matrix, 1, any),]
# dx1 dx2 ecm1 ecm2
# 1 S045 T401 X64 V99
# 4 S33 T14 T4112 W01
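Since the question asked for a 0/1 flag rather than a filtered table, the same matrix can be collapsed into a column. This is a minimal sketch: the logical matrix is typed in by hand from the `ref_matrix` output shown above so that the block runs on its own, and `diagn` is the column name used in the question.

```r
library(tibble)

raw_df <- tribble(
  ~dx1,   ~dx2,     ~ecm1,   ~ecm2,
  "S045", "T401",   "X64",   "V99",
  "R901", "T5621A", "Y141",  "U033",
  "J76",  "I51",    "K44",   "G304",
  "S33",  "T14",    "T4112", "W01"
)

# Logical matrix copied from the printed ref_matrix:
# one row per case, one column per code group.
ref_matrix <- matrix(
  c(TRUE,  FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE,
    FALSE, FALSE, FALSE, FALSE,
    TRUE,  TRUE,  TRUE,  TRUE),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("S00-S99", "T07-T34", "T4112", "W00-W99"))
)

# 1 if any code group matched in that row, 0 otherwise
raw_df$diagn <- as.integer(rowSums(ref_matrix) > 0)
raw_df$diagn
# [1] 1 0 0 1
```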
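If you want to know which row matched which code group, you can use which (the col index refers to the column of ref_matrix, that is, the position of the code group in codes):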
which(ref_matrix, arr.ind = TRUE)
# row col
# [1,] 1 1
# [2,] 4 1
# [3,] 4 2
# [4,] 4 3
# [5,] 4 4
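For comparison, the regex approach from the question can also be made to work. The error arises because square brackets define a set of single characters, not a numeric interval: `[07-34]` is read as `0`, the range `7-3`, and `4`, and `7-3` is an invalid range. The sketch below spells the two-digit intervals out digit by digit; `dx_pattern` is a name introduced here for illustration, and the pattern assumes codes always start with the letter followed by at least two digits.

```r
library(dplyr)
library(stringr)

raw_df <- tibble::tribble(
  ~dx1,   ~dx2,     ~ecm1,   ~ecm2,
  "S045", "T401",   "X64",   "V99",
  "R901", "T5621A", "Y141",  "U033",
  "J76",  "I51",    "K44",   "G304",
  "S33",  "T14",    "T4112", "W01"
)

# S00-S99 | T07-T34 | T4112 | W00-W99, anchored at the start of the code.
# T07-T34 is written as T07-T09, T10-T29, T30-T34.
dx_pattern <- "^(S\\d{2}|T(0[7-9]|[12]\\d|3[0-4])|T4112|W\\d{2})"

dat <- raw_df %>%
  mutate(diagn = as.integer(
    if_any(c(dx1:dx2, ecm1:ecm2), ~ str_detect(.x, dx_pattern))
  ))

dat$diagn
# [1] 1 0 0 1
```

Because the pattern is anchored only at the start, longer codes such as "S045" count as inside S00-S99, and anything beginning with "T4112" matches the single code.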