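I have a data frame, and I am trying to extract part of a column's contents and append it to the data frame as a new column.
For example, my data frame looks like this: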
> head(df)
id event_params
1 {"type":"L","maximumangle":-87.618,"duration":25}
2 {"type":"L","maximumangle":1.62,"duration":25}
3 {"maximumangle":-29.661,"type":"L","duration":20}
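I want to extract the maximum angle and append it to the existing data frame as a new column named maximumangle. My first idea was to use the grep function. However, that does not work, because maximumangle does not appear in the same position in every row.
What can I do to achieve this?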
1) Use fromJSON from the rjson package to parse the last column. This adds every field in the JSON data as its own column.
library(rjson)
L <- lapply(as.character(DF$event_params), fromJSON)
cbind(DF, do.call("rbind", lapply(L, as.data.frame, stringsAsFactors = FALSE)))
giving:
id event_params type maximumangle duration
1 1 {"type":"L","maximumangle":-87.618,"duration":25} L -87.618 25
2 2 {"type":"L","maximumangle":1.62,"duration":25} L 1.620 25
3 3 {"maximumangle":-29.661,"type":"L","duration":20} L -29.661 20
2) If you really only need maximumangle, we can simplify this slightly:
maximumangle <- function(x) fromJSON(as.character(x))$maximumangle
transform(DF, maximumangle = sapply(DF$event_params, maximumangle, USE.NAMES = FALSE))
giving:
id event_params maximumangle
1 1 {"type":"L","maximumangle":-87.618,"duration":25} -87.618
2 2 {"type":"L","maximumangle":1.62,"duration":25} 1.620
3 3 {"maximumangle":-29.661,"type":"L","duration":20} -29.661
Note: We assume the input, in reproducible form, is given by:
Lines <- '
id event_params
1 {"type":"L","maximumangle":-87.618,"duration":25}
2 {"type":"L","maximumangle":1.62,"duration":25}
3 {"maximumangle":-29.661,"type":"L","duration":20}'
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)
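Both variants above assume that every row contains a maximumangle key. As a minimal sketch of how a missing key could be handled, the example below uses a hypothetical data frame DF2 and a hypothetical helper get_maximumangle (neither is part of the original answer) and returns NA when the key is absent:

```r
library(rjson)

# Hypothetical input: the third row has no "maximumangle" key.
DF2 <- data.frame(
  id = 1:3,
  event_params = c(
    '{"type":"L","maximumangle":-87.618,"duration":25}',
    '{"maximumangle":1.62,"type":"L","duration":25}',
    '{"type":"L","duration":20}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the parsed value, or NA when the key is absent.
get_maximumangle <- function(x) {
  value <- fromJSON(x)[["maximumangle"]]
  if (is.null(value)) NA_real_ else as.numeric(value)
}

# vapply() enforces exactly one numeric value per row,
# so the result is always a plain numeric vector.
DF2$maximumangle <- vapply(DF2$event_params, get_maximumangle,
                           numeric(1), USE.NAMES = FALSE)
```

Using vapply rather than sapply here is a design choice: with sapply, a NULL result for one row would silently turn the whole result into a list.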
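1) We can use str_extract from stringr with a regex lookbehind: it matches the string 'maximumangle' followed by a quote (") and a colon (:), and extracts the pattern that comes after it, i.e. zero or more - (-*) followed by one or more digits or dots ([0-9.]+).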
library(dplyr)
library(stringr)
df %>%
mutate(maximumangle = as.numeric(str_extract(event_params,
'(?<=maximumangle":)-*[0-9.]+')))
# id event_params maximumangle
#1 1 {"type":"L","maximumangle":-87.618,"duration":25} -87.618
#2 2 {"type":"L","maximumangle":1.62,"duration":25} 1.620
#3 3 {"maximumangle":-29.661,"type":"L","duration":20} -29.661
2) Alternatively, the same thing can be done in base R with regexpr/regmatches:
df$maximumangle <- as.numeric(regmatches(df$event_params,
regexpr('(?<=maximumangle":)-*[0-9.]+', df$event_params, perl = TRUE)))
df <- structure(list(id = 1:3, event_params = c("{\"type\":\"L\",\"maximumangle\":-87.618,\"duration\":25}",
"{\"type\":\"L\",\"maximumangle\":1.62,\"duration\":25}", "{\"maximumangle\":-29.661,\"type\":\"L\",\"duration\":20}"
)), .Names = c("id", "event_params"), class = "data.frame", row.names = c(NA,
-3L))
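One caveat with the base R version: regmatches returns only the substrings that matched, so the assignment fails with a length mismatch if any row lacks the key. A minimal sketch of one way around this, using a hypothetical vector params in which the second element has no maximumangle:

```r
# Hypothetical input: the second element has no "maximumangle" key.
params <- c(
  '{"type":"L","maximumangle":-87.618,"duration":25}',
  '{"type":"L","duration":25}',
  '{"maximumangle":-29.661,"type":"L","duration":20}'
)

pattern <- '(?<=maximumangle":)-*[0-9.]+'
m <- regexpr(pattern, params, perl = TRUE)

# regexpr() returns -1 where nothing matched. Start with NA everywhere
# and fill in only the positions that did match.
matched <- m != -1
maximumangle <- rep(NA_real_, length(params))
maximumangle[matched] <- as.numeric(regmatches(params, m))
```

The result has the same length as the input, so it can be assigned to a data frame column directly.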