replace_emoticon function incorrectly replaces characters inside words - R


I am working in R and using the replace_emoticon function from the textclean package to replace emoticons with their corresponding words:

library(textclean)
test_text <- "i had a great experience xp :P"
replace_emoticon(test_text)

[1] "i had a great e tongue sticking out erience tongue sticking out tongue sticking out "

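As shown above, the function works, but it also replaces characters that look like an emoticon yet sit inside a word (e.g. the "xp" in "experience"). I looked for a fix and found the following function override, which claims to solve the problem: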

 replace_emoticon <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){

     trimws(gsub(
         "\\s+", 
         " ", 
         mgsub_regex(x, paste0('\\b\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
     ))

 }

replace_emoticon(test_text)

[1] "i had a great experience tongue sticking out :P"

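While this does fix the problem with the word "experience", it creates an entirely new one: it no longer replaces ":P", which is an emoticon and should normally be replaced by the function.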

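Also, I only know about this error for the characters "xp". I am not sure whether other character sequences besides "xp" are also incorrectly replaced when they occur inside a word.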

Is there a way to tell the replace_emoticon function to replace "emoticons" only when they are not part of a word?

Thanks!

r regex data-cleaning sentiment-analysis emoticons
1 Answer

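Wiktor is right: the word-boundary check is what causes the problem. I adjusted it slightly in the function below by dropping the leading \\b and keeping only the trailing one. One issue remains: an emoticon that is immediately followed by a word, with no space between the two, is not replaced. The question is whether that last case matters. See the examples below.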
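To make the word-boundary point concrete, here is a small base R check. It is only an illustration and assumes nothing beyond grepl with perl = TRUE; the variable names are made up for this example. \b matches only between a word character and a non-word character, so it cannot match in front of ":P" when the preceding character is a space.

```r
# Leading and trailing \b: no boundary exists between " " and ":",
# because both are non-word characters, so ":P" is never matched.
both_sides <- grepl("\\b\\Q:P\\E\\b", "xp :P", perl = TRUE)

# Trailing \b only: "P" is a word character followed by end of string,
# so the boundary exists and ":P" is matched.
trailing_only <- grepl("\\Q:P\\E\\b", "xp :P", perl = TRUE)

# "xp" inside "experience" is followed by "e" (a word character),
# so the trailing \b fails and the word is left alone.
inside_word <- grepl("\\Qxp\\E\\b", "experience", perl = TRUE)

print(c(both_sides = both_sides,
        trailing_only = trailing_only,
        inside_word = inside_word))
# both_sides = FALSE, trailing_only = TRUE, inside_word = FALSE
```

The same trailing \b explains the remaining failure: in ":Pnewword" the "P" is followed by another word character, so there is no boundary after it.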

Note: I have added this information to the textclean issue tracker.

replace_emoticon2 <- function(x, emoticon_dt = lexicon::hash_emoticons, ...){
  trimws(gsub(
    "\\s+", 
    " ", 
    mgsub_regex(x, paste0('\\Q', emoticon_dt[['x']], '\\E\\b'), paste0(" ", emoticon_dt[['y']], " "))
  ))
}

# works
replace_emoticon2("i had a great experience xp :P")
[1] "i had a great experience tongue sticking out tongue sticking out"
replace_emoticon2("i had a great experiencexp:P:P")
[1] "i had a great experience tongue sticking out tongue sticking out tongue sticking out"


# does not work:
replace_emoticon2("i had a great experience xp :Pnewword")
[1] "i had a great experience tongue sticking out :Pnewword"
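
If the last case does matter, one possible variation is to require word boundaries only for emoticons made up entirely of letters and digits (such as "xp") and to replace emoticons containing punctuation (such as ":P") wherever they occur. The sketch below is base R only and is an assumption on my part, not something textclean provides: toy_emoticons is a hypothetical two-row stand-in for lexicon::hash_emoticons, and replace_emoticon_sketch is a made-up name. It has not been checked against the full emoticon table.

```r
# Hypothetical stand-in for lexicon::hash_emoticons (same column names x and y)
toy_emoticons <- data.frame(
  x = c("xp", ":P"),
  y = c("tongue sticking out", "tongue sticking out"),
  stringsAsFactors = FALSE
)

# Hypothetical helper, not part of textclean
replace_emoticon_sketch <- function(x, emoticon_dt = toy_emoticons) {
  for (i in seq_len(nrow(emoticon_dt))) {
    emo <- emoticon_dt$x[i]
    quoted <- paste0("\\Q", emo, "\\E")  # match the emoticon literally
    # Alphanumeric-only emoticons must stand alone as a word;
    # emoticons with punctuation are replaced wherever they appear.
    pattern <- if (grepl("^[[:alnum:]]+$", emo)) {
      paste0("\\b", quoted, "\\b")
    } else {
      quoted
    }
    x <- gsub(pattern, paste0(" ", emoticon_dt$y[i], " "), x, perl = TRUE)
  }
  # Collapse the extra spaces introduced by the padded replacements
  trimws(gsub("\\s+", " ", x))
}

replace_emoticon_sketch("i had a great experience xp :P")
# [1] "i had a great experience tongue sticking out tongue sticking out"
replace_emoticon_sketch("i had a great experience xp :Pnewword")
# [1] "i had a great experience tongue sticking out tongue sticking out newword"
```

The trade-off is that "xp" glued to the end of a word (as in "experiencexp") is no longer replaced, and because the replacements run one after another, text inserted by an earlier replacement could in principle be matched by a later emoticon when the full table is used.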