Substring extraction in R

Question (votes: 2, answers: 2)

I have a string that looks like this:

{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I've collected 72,455 gold coins! http:\/\/t.co\/eTEbfxpAr0 #iphone"}

I want the result to be:

"Tue May 12 09:45:33 +0000 2015"  

598061439090196480

"598061439090196480"

"I've collected 72,455 gold coins! http:\/\/t.co\/eTEbfxpAr0 #iphone"

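Splitting on a delimiter works, but for some strings it breaks the value across lines and starts a new one. Please suggest a function to which I can give the start and end patterns of the substring; a different approach would also be very helpful. Thanks.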

r string
2 Answers
8
votes

Since your content is JSON, use a JSON parser (for example the jsonlite package) instead of splitting the string by hand.

Example:

string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"}'
library(jsonlite)
fromJSON(string)
# $created_at
# [1] "Tue May 12 09:45:33 +0000 2015"
# 
# $id
# [1] 5.980614e+17
# 
# $id_str
# [1] "598061439090196480"
# 
# $text
# [1] "I've collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"
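A slightly longer sketch of the same jsonlite approach, assuming the package is installed (the names tweet, lines, tweets and ids are only illustrative). Fields are read by name from the list that fromJSON() returns, and id_str is used for the ID: the numeric id is stored as a double, and integers above 2^53 are not in general exactly representable as doubles, so the string field is the safer copy. Parsing each line on its own is one way to handle input with one JSON object per line.

```r
library(jsonlite)

string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://example.com/eTEbfxpAr0 #iphone"}'

# fromJSON() returns a named list with one element per JSON key
tweet <- fromJSON(string)

# Read individual fields by name
tweet$created_at
tweet$text

# "id" is parsed as a double (printed as 5.980614e+17);
# "id_str" carries the same ID as an exact character string
tweet$id_str

# Several objects, one per element (e.g. one tweet per line of a file):
# parse each one separately, then collect a field from every result.
lines  <- c(string, string)   # stand-in for readLines() on a hypothetical file
tweets <- lapply(lines, fromJSON)
ids    <- vapply(tweets, function(t) t$id_str, character(1))
ids
```

Parsing line by line keeps one malformed or oddly wrapped record from shifting every field after it, which is the failure mode of splitting on a delimiter.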

0
votes

You can also use the regmatches function. Still, it is better to go with Ananda's answer: a parser built specifically for JSON is the way to go.

> string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone"}'
> regmatches(string, gregexpr("(?<=:)(?:\"[^\"]*\"|[^,}]*)", string, perl = TRUE))[[1]]
[1] "\"Tue May 12 09:45:33 +0000 2015\""                                  
[2] "598061439090196480"                                                  
[3] "\"598061439090196480\""                                              
[4] "\"I've collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone\""
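The start/end-pattern extraction asked for in the question can be sketched with the same regmatches() and gregexpr() functions. extract_between() below is a hypothetical helper, not an existing R function. It assumes x is a single string, that start and end are PCRE regular expressions, and that start has a fixed length (it is placed inside a lookbehind). Like any regex approach, it would break on values that contain escaped quotes (\"), which is one more reason to prefer a JSON parser.

```r
# Hypothetical helper: return every substring of the single string `x` that
# lies between a `start` pattern and the nearest following `end` pattern.
# `start` goes into a PCRE lookbehind, so it must have a fixed length.
extract_between <- function(x, start, end) {
  pattern <- paste0("(?<=", start, ").*?(?=", end, ")")
  regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
}

string <- '{"created_at":"Tue May 12 09:45:33 +0000 2015","id":598061439090196480,"id_str":"598061439090196480","text":"I\'ve collected 72,455 gold coins! http://t.co/eTEbfxpAr0 #iphone"}'

extract_between(string, '"created_at":"', '"')  # quoted value: stop at next quote
extract_between(string, '"id":', ',')           # bare number: stop at next comma
extract_between(string, '"id_str":"', '"')
extract_between(string, '"text":"', '"')
```

The lazy .*? stops at the first match of end, so the colons inside the time and the comma inside "72,455" do not cut the values short, because each end pattern is anchored to the delimiter that actually closes that field.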