I have this string:
"<span class='break'><div class='name-and-date'><strong>Mr. Talon
Williamson - Dec 18, 1:47 PM Eastern</div></strong><div class='note-
contents'>- wrong</div></span><span class='break'><div class='name-and-
date'><strong>Mr. Talon Williamson - Dec 18, 1:47 PM Eastern</div>
</strong><div class='note-contents'>- Wrong again</div></span><span
class='break'><div class='name-and-date'><strong>Mr. Talon Williamson
- Dec 18, 1:47 PM Eastern</div></strong><div class='note-contents'>-
okay what is the matter with you.</div></span><span class='break'><div
class='name-and-date'><strong>Mr. Talon Williamson - Dec 18, 1:50 PM
Eastern</div></strong><div class='note-contents'>- Bro!</div></span>"
How can I remove the last span from this string so that I get this return value:
"<span class='break'><div class='name-and-date'><strong>Mr. Talon
Williamson - Dec 18, 1:47 PM Eastern</div></strong><div class='note-
contents'>- wrong</div></span><span class='break'><div class='name-and-
date'><strong>Mr. Talon Williamson - Dec 18, 1:47 PM Eastern</div>
</strong><div class='note-contents'>- Wrong again</div></span><span
class='break'><div class='name-and-date'><strong>Mr. Talon Williamson
- Dec 18, 1:47 PM Eastern</div></strong><div class='note-contents'>-
okay what is the matter with you.</div></span>"
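I understand that parsing the HTML with Nokogiri is the better practice, but for my use case it is very important to preserve the integrity of the string. That means it must stay exactly the same, apart from the last span being removed.
I was thinking of doing something like this: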
string.scan(/<span class='break'>/)
But this doesn't capture the whole string and break it up into array elements.
See if this helps. Is this what you are looking for?
txt = "<span class='break'><div class='name-and-date'><strong>Mr. Talon
Williamson - Dec 18, 1:47 PM Eastern</div></strong><div class='note-
contents'>- wrong</div></span><span class='break'><div class='name-and-
date'><strong>Mr. Talon Williamson - Dec 18, 1:47 PM Eastern</div>
</strong><div class='note-contents'>- Wrong again</div></span><span
class='break'><div class='name-and-date'><strong>Mr. Talon Williamson
- Dec 18, 1:47 PM Eastern</div></strong><div class='note-contents'>-
okay what is the matter with you.</div></span><span class='break'><div
class='name-and-date'><strong>Mr. Talon Williamson - Dec 18, 1:50 PM
Eastern</div></strong><div class='note-contents'>- Bro!</div></span>"
txt.rindex('<span')
# => 540
txt.rindex('</span')
# => 700
# The second argument of String#[] is a length, not an end index, so slice with a range that runs to the end of the string.
txt[txt.rindex('<span')..-1]
# => "<span class='break'><div \n class='name-and-date'><strong>Mr. Talon Williamson - Dec 18, 1:50 PM \n Eastern</div></strong><div class='note-contents'>- Bro!</div></span>"
txt[txt.rindex('<span')..-1] = ""
txt
# => "<span class='break'><div class='name-and-date'><strong>Mr. Talon \n Williamson - Dec 18, 1:47 PM Eastern</div></strong><div class='note-\n contents'>- wrong</div></span><span class='break'><div class='name-and-\n date'><strong>Mr. Talon Williamson - Dec 18, 1:47 PM Eastern</div>\n </strong><div class='note-contents'>- Wrong again</div></span><span \n class='break'><div class='name-and-date'><strong>Mr. Talon Williamson \n - Dec 18, 1:47 PM Eastern</div></strong><div class='note-contents'>- \n okay what is the matter with you.</div></span>"
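The rindex approach above can also be wrapped in a small non-mutating method. This is only a sketch: the name drop_last_span is hypothetical, and it assumes that the last "<span" in the string opens the element to drop and that nothing follows its closing tag.

```ruby
# Hypothetical helper: returns a copy of html without its final <span>...</span>.
# Assumption: the last "<span" opens the element to remove, and that element
# runs to the end of the string.
def drop_last_span(html)
  start = html.rindex('<span')
  return html.dup if start.nil? # no span at all: nothing to remove

  html[0...start] # everything before the last opening tag, untouched
end

sample = "<span class='break'>a</span><span class='break'>b</span>"
puts drop_last_span(sample)
# prints: <span class='break'>a</span>
```

Because it slices rather than assigns, the original string is left as it was.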
There are many ways to do this.
Assuming you have that string in the txt variable, txt.split("<span class='break'")[0..-2].join("<span class='break'")
should do the job. It is a fairly simple problem.
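To see why the split/join round trip works, here is a minimal sketch on a shortened sample string (the sample and the variable names are made up for illustration). The point is that the text passed to join has to be identical to the text passed to split, otherwise the rebuilt tags differ from the originals.

```ruby
# Shortened stand-in for the real string; assumes every span opens with
# exactly this text.
delimiter = "<span class='break'"
txt = "<span class='break'>a</span><span class='break'>b</span><span class='break'>c</span>"

parts = txt.split(delimiter)
# => ["", ">a</span>", ">b</span>", ">c</span>"]
# The leading "" is what puts the delimiter back in front of the first span.

result = parts[0..-2].join(delimiter)
# => "<span class='break'>a</span><span class='break'>b</span>"
puts result
```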
Unless there are nested spans with the class "break", the following will work.
input.scan(%r|<span\s+class=['"]break["']>.*?</span>|m)[0...-1].join
Slightly slower, but always works as expected:
input[%r|.*(?=<span\s+class=['"]break["']>.*?</span>\z)|m]
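The latter solution uses a positive lookahead to capture everything except the last occurrence of the pattern that is immediately followed by the end of the string (\z).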
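A short comparison of the two regular-expression approaches, run on a made-up sample that has newlines between the spans. It is a sketch under two assumptions: the spans are not nested, and the string ends exactly at the last closing tag (the lookahead version returns nil otherwise, because of \z).

```ruby
# Made-up sample: three spans separated by newlines.
input = "<span class='break'>a</span>\n<span class='break'>b</span>\n<span class='break'>c</span>"

# scan collects only the matched spans, so text between them is dropped.
by_scan = input.scan(%r|<span\s+class=['"]break["']>.*?</span>|m)[0...-1].join
# => "<span class='break'>a</span><span class='break'>b</span>"

# The greedy .* backtracks to the start of the last span; everything
# before that point is returned exactly as it was.
by_lookahead = input[%r|.*(?=<span\s+class=['"]break["']>.*?</span>\z)|m]
# => "<span class='break'>a</span>\n<span class='break'>b</span>\n"

puts by_scan
puts by_lookahead
```

If the text between the spans has to survive byte for byte, the lookahead form is the one that keeps it.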
See the documentation for String#[] with a regular expression as its argument for more information.
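For reference, a minimal illustration of String#[] with a regular expression, using a made-up string: it returns the matched substring, a capture group when an index is given, or nil when nothing matches.

```ruby
s = "note: <b>bold</b> text"

whole   = s[/<b>.*?<\/b>/]      # => "<b>bold</b>"  (the whole match)
capture = s[/<b>(.*?)<\/b>/, 1] # => "bold"         (first capture group)
missing = s[/<i>/]              # => nil            (no match)

p whole, capture, missing
```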