Remove all special characters except the apostrophe

Question (votes: 0, answers: 3)

Given a sentence, I want to count how many times each word occurs. This is the Word Count exercise from Exercism.io.

For example, for the input "olly olly in come free" the result should be:

olly: 2
in: 1
come: 1
free: 1

I have this test case:

  def test_with_quotations
    phrase = Phrase.new("Joe can't tell between 'large' and large.")
    counts = {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
    assert_equal counts, phrase.word_count
  end

This is my method:

  def word_count
    phrase = @phrase.downcase.split(/\W+/)
    counts = phrase.group_by{|word| word}.map {|k,v| [k, v.count]}
    Hash[*counts.flatten]
  end

For the tests above, this is what I get when I run them in the terminal:

  2) Failure:
PhraseTest#test_with_apostrophes [word_count_test.rb:69]:
--- expected
+++ actual
@@ -1 +1 @@
-{"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
+{"first"=>1, "don"=>2, "t"=>2, "laugh"=>1, "then"=>1, "cry"=>1}

My problem is removing every character except the apostrophe...

The regex in my method almost works: phrase = @phrase.downcase.split(/\W+/). But it removes the apostrophes as well...
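As a minimal sketch of why this happens: \W matches any character that is not a letter, digit, or underscore, so the apostrophe is treated as a separator just like spaces and punctuation. The sample string is an assumption modeled on the failing test.

```ruby
# \W matches every non-word character, and the apostrophe is one of them,
# so split breaks "don't" into "don" and "t".
words = "First: don't laugh.".downcase.split(/\W+/)
p words  # => ["first", "don", "t", "laugh"]
```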

I also don't want to keep single quotes that surround a word: 'Hello' => Hello, whereas Don't be cruel => Don't be cruel.

ruby regex
3 Answers
4
votes

Maybe something like this:

string.scan(/\b[\w']+\b/i).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }

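The regex uses word boundaries (\b). scan returns an array of the words it finds; each word in that array is then added to a hash whose default value is zero, and its count is incremented.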
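The two steps can be seen separately in the following sketch. The sample string and the variable names are assumptions chosen for illustration; only the pattern and the Hash.new(0) idea come from the line above.

```ruby
string = "First: don't laugh. Then: don't cry."

# Step 1: scan returns every match of the pattern as an array of strings.
words = string.scan(/\b[\w']+\b/)
p words  # => ["First", "don't", "laugh", "Then", "don't", "cry"]

# Step 2: Hash.new(0) returns 0 for a missing key, so += 1 works
# the first time a word is seen.
counts = words.each_with_object(Hash.new(0)) { |word, tally| tally[word] += 1 }
p counts  # => {"First"=>1, "don't"=>2, "laugh"=>1, "Then"=>1, "cry"=>1}
```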

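Note that although the match is case-insensitive and finds every word, my solution keeps each word in the case in which it was originally found. It is now up to Nelly to decide whether to accept that as is, or to downcase either the original string or each array item as it is added to the hash.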

I'll leave that decision to you :)
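For illustration, here is one way that decision could go: downcasing the whole string before scanning. This is a sketch, not the answer's own code; the Phrase class is a hypothetical reconstruction shaped after the test in the question.

```ruby
# Hypothetical Phrase class, shaped after the test in the question.
class Phrase
  def initialize(phrase)
    @phrase = phrase
  end

  def word_count
    # Downcase first so that "Joe" and "joe" land on the same key.
    words = @phrase.downcase.scan(/\b[\w']+\b/)
    words.each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
  end
end

result = Phrase.new("Joe can't tell between 'large' and large.").word_count
p result
# => {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
```

Because \b only matches next to a word character, the quotes around 'large' fall outside the match while the apostrophe inside can't is kept.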


1
votes

Given:

irb(main):015:0> phrase
=> "First: don't laugh. Then: don't cry."

Try:

irb(main):011:0> Hash[phrase.downcase.scan(/[a-z']+/)
                     .group_by{|word| word.downcase}
                     .map{|word, words|[word, words.size]}
                    ]
=> {"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}

With your update, if you want to remove the single quotes, do this first:

irb(main):038:0> p2
=> "Joe can't tell between 'large' and large."
irb(main):039:0> p2.gsub(/(?<!\w)'|'(?!\w)/,'')
=> "Joe can't tell between large and large."

Then use the same approach as before.
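Put together, the two steps might look like the sketch below. The method name count_words is hypothetical; the regexes are the ones shown in the irb sessions above.

```ruby
# Hypothetical helper combining the quote-stripping gsub with the counting step.
def count_words(text)
  # Remove a quote that has no word character before it or none after it;
  # an apostrophe inside a word (as in "can't") has both and is kept.
  cleaned = text.downcase.gsub(/(?<!\w)'|'(?!\w)/, '')

  Hash[cleaned.scan(/[a-z']+/)
              .group_by { |word| word }
              .map { |word, words| [word, words.size] }]
end

result = count_words("Joe can't tell between 'large' and large.")
p result
# => {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
```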

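But, you say, gsub(/(?<!\w)'|'(?!\w)/,'') will remove the apostrophe in 'Twas the night before. To which I reply: if /(?<!\w)'|'(?!\w)/ is not enough, you will eventually need to build a parser that can tell the difference between an apostrophe and a single quote.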

You can also use word boundaries:

irb(main):041:0> Hash[p2.downcase.scan(/\b[a-z']+\b/)
                  .group_by{|word| word.downcase}
                  .map{|word, words|[word, words.size]}
                 ]
=> {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}

But this does not solve the 'Tis the night problem either.
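The limitation is easy to reproduce. In this small sketch (the variable names are assumptions), both approaches drop the leading apostrophe, because neither regex can tell it apart from an opening quote:

```ruby
phrase = "'Twas the night before."

# The gsub approach sees a quote with no word character before it and removes it.
stripped = phrase.gsub(/(?<!\w)'|'(?!\w)/, '')
p stripped  # => "Twas the night before."

# The word-boundary scan skips it too: there is no word boundary
# in front of a leading apostrophe, so the match starts at the "t".
scanned = phrase.downcase.scan(/\b[a-z']+\b/)
p scanned   # => ["twas", "the", "night", "before"]
```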


0
votes

Another way:

str = "First: don't 'laugh'. Then: 'don't cry'."
reg = /
      [a-z]         #single letter
      [a-z']+       #one or more letters or apostrophe
      [a-z]         #single letter
      '?            #optional single apostrophe

      /ix           #case-insensitive and free-spacing regex

str.scan(reg).group_by(&:itself).transform_values(&:count)
  #=> {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}
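Two properties of this pattern are worth noting: it needs at least three characters per word, so a word such as "in" would be skipped, and the optional trailing apostrophe keeps a closing quote (as in cry'). The sketch below swaps in a different pattern, which is an assumption and not part of this answer: runs of letters joined only by interior apostrophes. Like the other regex approaches it would still lose the leading apostrophe of 'Twas.

```ruby
# Assumed alternative pattern (not from the answer above):
# one or more letters, optionally followed by apostrophe-plus-letters groups.
word_pattern = /[a-z]+(?:'[a-z]+)*/i

str = "First: don't 'laugh'. Then: 'don't cry'."
counts = str.scan(word_pattern).group_by(&:itself).transform_values(&:count)
p counts  # => {"First"=>1, "don't"=>2, "laugh"=>1, "Then"=>1, "cry"=>1}

# Short words are matched as well.
short = "olly olly in come free".scan(word_pattern)
p short   # => ["olly", "olly", "in", "come", "free"]
```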