Regex to extract part of a URL using the stringr R package


I have the following URLs:

www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box

www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box

www.google.com?utm_source=site_fattoquotidiano&utm_medium=social&utm_content=box

www.google.com?utm_source=site_inter&utm_medium=video&utm_content=box

www.google.com?utm_source=site_foglio&utm_medium=video&utm_content=box

Using the stringr package, I want to extract only the value between "utm_source=" and the next "&".

So I expect:

site_corriere

site_rep

site_fattoquotidiano

site_inter

site_foglio

I am using this regex:

(?<=utm_source=)(.*)(?=&)

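But it does not work correctly, because the match still includes the part that follows, for example &utm_medium=video, and only stops before the last parameter (&utm_content=box).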

Can you help me?

Thanks

r stringr data-wrangling
1 Answer

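The greedy (.*) in your pattern runs on to the last & in the string rather than the first. If you change the regex pattern slightly to the following, so that the match stops at the first &, it should work: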

(?<=\butm_source=)[^&]+

R script:

library(stringr)

x <- c("www.google.com?utm_source=site_corriere&utm_medium=video&utm_content=box",
       "www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box",
       "www.google.com?utm_source=site_fattoquotidiano&utm_medium=social&utm_content=box",
       "www.google.com?utm_source=site_inter&utm_medium=video&utm_content=box",
       "www.google.com?utm_source=site_foglio&utm_medium=video&utm_content=box")
# Take everything after "utm_source=" up to, but not including, the next "&"
output <- str_extract(x, "(?<=\\butm_source=)[^&]+")
output
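
To see why the pattern from the question overshoots, here is a minimal sketch comparing it with a lazy variant and with the negated character class used above. The variable names (url, last_param, greedy, lazy, negated and so on) and the second URL are only illustrative assumptions, not part of the original data.

```r
library(stringr)

url <- "www.google.com?utm_source=site_rep&utm_medium=display&utm_content=box"

# Greedy: .* consumes as much as possible, so the lookahead (?=&)
# is satisfied at the LAST "&" in the string.
greedy <- str_extract(url, "(?<=utm_source=)(.*)(?=&)")

# Lazy: .*? consumes as little as possible, so it stops at the FIRST "&".
lazy <- str_extract(url, "(?<=utm_source=).*?(?=&)")

# Negated class: [^&]+ matches every character up to the first "&".
negated <- str_extract(url, "(?<=utm_source=)[^&]+")

# Hypothetical URL where utm_source is the final parameter (no trailing "&").
last_param <- "www.google.com?utm_medium=video&utm_source=site_rep"
lazy_last    <- str_extract(last_param, "(?<=utm_source=).*?(?=&)")
negated_last <- str_extract(last_param, "(?<=utm_source=)[^&]+")

print(c(greedy = greedy, lazy = lazy, negated = negated))
print(c(lazy_last = lazy_last, negated_last = negated_last))
```

Here greedy comes back as "site_rep&utm_medium=display", while lazy and negated both give "site_rep". On the second URL the lazy pattern returns NA because no "&" follows the value, whereas the negated class still returns "site_rep", which is one reason to prefer [^&]+ when the parameter order is not guaranteed.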