Say I have x:
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'
and I want
x <- 'This is the text I would like. But we must check I can have another chevron, >, in the string.'
How would I do that?
So far I did the following, but it got rid of text I wanted to keep:
sub("<span.*>", "", x)
#> [1] "This is the , in the string."
Thanks
Possible duplicate: Remove html tags from a string in R
Are you trying to parse html with regex? Try this:
library(rvest)
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'
html_text(read_html(x))
Output of x:
[1] "This is the text I would like. But we must check I can have another chevron, >, in the string."
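As background on why the original sub() call overshoots: `.*` is greedy, so it runs to the last `>` in the string rather than the one that closes the tag. If the input is known to contain just this one simple tag, a narrower pattern can work as a rough sketch. The variable name `cleaned` is only illustrative, and this is not a substitute for a real HTML parser on arbitrary markup.

```r
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'

# [^>]* matches any run of characters that are not '>', so the match
# stops at the first '>' after "<span" instead of the last one in the string.
# The leading [[:space:]]* also consumes the space before the tag, so no
# double space is left behind.
cleaned <- sub("[[:space:]]*<span[^>]*>", "", x)
cleaned
```

Because sub() replaces only the first match, a string with several tags would need gsub(), and nested or malformed markup is where the parser-based approach is the safer choice.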
From a txt file:
x <- readr::read_file("temp.txt") # Content below
cat(rvest::html_text(read_html(x)))
Output for temp.txt:
Say I have x
x <- 'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'
and I want x <- 'This is the text I would like. But we must check I can have another chevron, >, in the string.'
How would I do that?
So far I did the following, but it got rid of text I wanted to keep:
sub("<span.*>", "", x)
#> [1] "This is the , in the string."
Thanks
Contents of temp.txt:
<div class="postcell post-layout--right">
<div class="s-prose js-post-body" itemprop="text">
<p>Say I have x</p>
<pre class="lang-r s-code-block"><code class="hljs language-r">x <span class="hljs-operator"><-</span> <span class="hljs-string">'This is the <span I want to remove this> text I would like. But we must check I can have another chevron, >, in the string.'</span>
</code></pre>
<p>and I want <code>x <- 'This is the text I would like. But we must check I can have another chevron, >, in the string.'</code></p>
<p>How would I do that?</p>
<p>So far I did the following, but it got rid of text I wanted to keep:</p>
<pre class="lang-r s-code-block"><code class="hljs language-r">sub<span class="hljs-punctuation">(</span><span class="hljs-string">"<span.*>"</span><span class="hljs-punctuation">,</span> <span class="hljs-string">""</span><span class="hljs-punctuation">,</span> x<span class="hljs-punctuation">)</span>
<span class="hljs-comment">#> [1] "This is the , in the string."</span>
</code></pre>
<p>Thanks</p>
</div>
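To try the file-based approach without the original temp.txt, here is a minimal sketch that writes a small stand-in fragment to a temporary file first. The fragment, the temporary file, and the names `tmp`, `html_string`, `doc`, and `paragraphs` are all assumptions made for illustration. It also assumes rvest 1.0.0 or later, which provides html_elements().

```r
library(rvest)

# Hypothetical stand-in for temp.txt: a tiny HTML fragment in a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines("<div><p>Say I have x</p><p>Thanks</p></div>", tmp)

# Read the whole file into a single string, then parse that string as HTML.
html_string <- paste(readLines(tmp), collapse = "\n")
doc <- read_html(html_string)

# Extract the text of each <p> element separately instead of the whole page.
paragraphs <- html_text(html_elements(doc, "p"))
paragraphs
```

Selecting elements before calling html_text() keeps each paragraph as its own entry, which may be easier to post-process than one long string.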