C++: Finding a substring using regex

Question

I have a test string:

<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>

I want to find <a href="4.%20Functions,%20scope.ppt"> (as a substring).

Searching with Dr. Google turned up: regex e("<a href=.*?>"); cmatch cm; to mark the substring I want to find.

What should I do next?

Am I right to use regex_match(htmlString, cm, e); with htmlString as a wchar_t*?

Answer 1

If you want to find all matching substrings, you need to use regex iterators:

#include <iostream>
#include <regex>
#include <string>

// example data
std::wstring const html = LR"(

<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>
<td><a href="4.%20Functions,%20scope.ppt">4. Functions, scope.ppt</a></td>

)";

// for convenience
constexpr auto fast_n_loose = std::regex_constants::optimize|std::regex_constants::icase;

// extract href's
std::wregex const e_link{LR"~(href=(["'])(.*?)\1)~", fast_n_loose};

int main()
{
    // regex iterators       
    std::wsregex_iterator itr_end;
    std::wsregex_iterator itr{std::begin(html), std::end(html), e_link};

    // iterate through the matches
    for(; itr != itr_end; ++itr)
    {
        std::wcout << itr->str(2) << L'\n';
    }
}

Answer 2

This matches the complete a tag and captures the href attribute value in capture group 2.

It should be done this way because the href attribute can be anywhere in the tag.

<a(?=(?:[^>"']|"[^"]*"|'[^']*')*?\shref\s*=\s*(?:(['"])([\S\s]*?)\1))\s+(?:"[\S\s]*?"|'[\S\s]*?'|[^>]*?)+>

You can substitute [\w:]+ for the a tag to get the href of every tag.

https://regex101.com/r/LHZXUM/1

Formatted and tested

 < a                    # a tag, substitute [\w:]+ for any tag

 (?=                    # Assertion (a pseudo atomic group)
      (?: [^>"'] | " [^"]* " | ' [^']* ' )*?
      \s href \s* = \s* 
      (?:
           ( ['"] )               # (1), Quote
           ( [\S\s]*? )           # (2), href value
           \1 
      )
 )
 \s+ 
 (?: " [\S\s]*? " | ' [\S\s]*? ' | [^>]*? )+
 >
