String manipulation in R when the content of interest appears in a different order

Question (votes: 0, answers: 2)

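I have a data frame, and I am trying to extract part of a column's contents and append it to the data frame as a new column.

For example, my data frame looks like this: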

> head(df)  
id event_params  
1 {"type":"L","maximumangle":-87.618,"duration":25}  
2 {"type":"L","maximumangle":1.62,"duration":25}  
3 {"maximumangle":-29.661,"type":"L","duration":20}  

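I want to extract the maximum angle and append it to the existing data frame as a new column named maximumangle. My first idea was to use the grep function. However, that does not work because maximumangle does not appear at the same position in every row.

How can I achieve this?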

r regex string
2 answers
2 votes

1) Parse the last column with fromJSON from the rjson package. This adds every field of the JSON data as a column.

library(rjson)

L <- lapply(as.character(DF$event_params), fromJSON)
cbind(DF, do.call("rbind", lapply(L, as.data.frame, stringsAsFactors = FALSE)))

giving:

  id                                      event_params type maximumangle duration
1  1 {"type":"L","maximumangle":-87.618,"duration":25}    L      -87.618       25
2  2    {"type":"L","maximumangle":1.62,"duration":25}    L        1.620       25
3  3 {"maximumangle":-29.661,"type":"L","duration":20}    L      -29.661       20

2) If you really only need maximumangle, this can be simplified slightly:

maximumangle <- function(x) fromJSON(as.character(x))$maximumangle
transform(DF, maximumangle = sapply(DF$event_params, maximumangle, USE.NAMES = FALSE))

giving:

  id                                      event_params maximumangle
1  1 {"type":"L","maximumangle":-87.618,"duration":25}      -87.618
2  2    {"type":"L","maximumangle":1.62,"duration":25}        1.620
3  3 {"maximumangle":-29.661,"type":"L","duration":20}      -29.661
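
As a hedged variation on 2), the sketch below uses vapply so the result is always a numeric vector, and it returns NA when a record has no maximumangle key. The helper name get_angle and the sample vector params (whose second record lacks the key) are hypothetical and not part of the original data.

```r
library(rjson)

# Hypothetical sample: the second record has no maximumangle key.
params <- c('{"type":"L","maximumangle":-87.618,"duration":25}',
            '{"type":"L","duration":25}',
            '{"maximumangle":-29.661,"type":"L","duration":20}')

# Hypothetical helper: parse one JSON string and return the angle,
# or NA_real_ when the key is absent.
get_angle <- function(x) {
  val <- fromJSON(x)$maximumangle
  if (is.null(val)) NA_real_ else as.numeric(val)
}

# vapply enforces one numeric value per record.
angles <- vapply(params, get_angle, numeric(1), USE.NAMES = FALSE)
angles
```

With sapply, a record that lacks the key would yield NULL and turn the whole result into a list, which is why this sketch declares the return type explicitly.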

Note

We assume the input, in reproducible form, is:

Lines <- '
id event_params  
1 {"type":"L","maximumangle":-87.618,"duration":25}  
2 {"type":"L","maximumangle":1.62,"duration":25}  
3 {"maximumangle":-29.661,"type":"L","duration":20}'
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

0 votes

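1) We can use str_extract from stringr with a regex lookbehind that matches the string 'maximumangle' followed by a quote (") and a colon (:), and then extracts the pattern after it: zero or more minus signs (-*) followed by digits and decimal points ([0-9.]+).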

library(dplyr)
library(stringr)
df %>%
   mutate(maximumangle =  as.numeric(str_extract(event_params, 
                        '(?<=maximumangle":)-*[0-9.]+')))
#  id                                      event_params maximumangle
#1  1 {"type":"L","maximumangle":-87.618,"duration":25}      -87.618
#2  2    {"type":"L","maximumangle":1.62,"duration":25}        1.620
#3  3 {"maximumangle":-29.661,"type":"L","duration":20}      -29.661

2) Or, in base R, the same can be done with regexpr/regmatches:

df$maximumangle <-  as.numeric(regmatches(df$event_params, 
     regexpr('(?<=maximumangle":)-*[0-9.]+', df$event_params, perl = TRUE)))
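
One caveat with the base R form: regmatches returns only the matched substrings, so the assignment fails with a length mismatch if any row lacks the key. A minimal sketch of one way around that, assuming a hypothetical vector event_params whose second element has no maximumangle, is to pre-fill with NA and assign only where regexpr found a match:

```r
# Hypothetical input: the second element has no maximumangle key.
event_params <- c('{"type":"L","maximumangle":-87.618,"duration":25}',
                  '{"type":"L","duration":25}',
                  '{"maximumangle":-29.661,"type":"L","duration":20}')

pattern <- '(?<=maximumangle":)-*[0-9.]+'

# regexpr returns -1 for elements with no match.
m <- regexpr(pattern, event_params, perl = TRUE)

# Start with NA everywhere, then fill in the rows that matched.
maximumangle <- rep(NA_real_, length(event_params))
maximumangle[m != -1] <- as.numeric(regmatches(event_params, m))
maximumangle
```

The str_extract version in 1) needs no such step, because str_extract already returns NA for non-matching elements.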

Data

df <- structure(list(id = 1:3, event_params = c("{\"type\":\"L\",\"maximumangle\":-87.618,\"duration\":25}", 
"{\"type\":\"L\",\"maximumangle\":1.62,\"duration\":25}", "{\"maximumangle\":-29.661,\"type\":\"L\",\"duration\":20}"
)), .Names = c("id", "event_params"), class = "data.frame", row.names = c(NA, 
-3L))