I have the following URLs:
www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box
www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box
www.google.com?utm_source=site_fattoquotidiano&utm_medium=social&utm_content=box
www.google.com?utm_source=site_inter&utm_medium=video&utm_content=box
www.google.com?utm_source=site_foglio&utm_medium=video&utm_content=box
Using the stringr package, I want to extract only the value between "utm_source=" and "&".
So I expect:
site_corriere
site_rep
site_fattoquotidiano
site_inter
site_foglio
I am using this regular expression:
(?<=utm_source=)(.*)(?=&)
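But it does not work as intended, because the match still includes the trailing part, e.g. &utm_medium=video (everything up to the last "&").
Can you help me?
Thanks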
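It should work if you change the regex pattern slightly, so that the match stops at the first "&" instead of running on to the last one: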
(?<=\butm_source=)[^&]+
R script:
library(stringr)
x <- c("www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box",
"www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box",
"www.google.com?utm_source=site_fattoquotidiano&utm_medium=social&utm_content=box",
"www.google.com?utm_source=site_inter&utm_medium=video&utm_content=box",
"www.google.com?utm_source=site_foglio&utm_medium=video&utm_content=box")
output <- str_extract(x, "(?<=\\butm_source=)[^&]+")
output
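
To see why the original pattern over-matches, it may help to compare three variants side by side. This is only a sketch: the second URL below is a made-up test case (not from the question) in which utm_source is the last parameter, and the variable names greedy, lazy and negated are chosen here for illustration.

```r
library(stringr)

# First URL is from the question; the second is a hypothetical case
# where utm_source comes last, so no "&" follows its value.
x <- c("www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box",
       "www.google.com?utm_medium=display&utm_source=site_rep")

# Original pattern: ".*" is greedy, so it runs up to the LAST "&".
greedy <- str_extract(x, "(?<=utm_source=)(.*)(?=&)")

# Lazy quantifier: ".*?" stops at the FIRST "&", but still needs one to exist.
lazy <- str_extract(x, "(?<=utm_source=).*?(?=&)")

# Negated character class: take everything that is not "&".
negated <- str_extract(x, "(?<=\\butm_source=)[^&]+")

greedy   # "site_corriere&utm_medium=video" NA
lazy     # "site_corriere" NA
negated  # "site_corriere" "site_rep"
```

The lazy version fixes the over-matching, but both lookahead variants return NA when no "&" follows the value. The negated class does not depend on a trailing "&", which is one reason to prefer it here.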