c#删除多余空格的最快方法

问题描述 投票:39回答:25

将额外的空白区域替换为一个空白区域的最快方法是什么? 例如

foo      bar 

foo bar
c# string replace whitespace removing-whitespace
25个回答
48
投票

最快的方法?迭代字符串并按字符在StringBuilder中构建第二个副本,仅为每组空格复制一个空格。

更容易键入Replace变体将创建额外字符串的桶负载(或浪费时间构建正则表达式DFA)。

使用比较结果进行编辑

使用http://ideone.com/NV6EzU,n = 50(因为花了这么长时间,他们不得不杀死我的进程,因此必须减少它,我得到:

正则表达式:7771ms。

Stringbuilder:894ms。

这确实如预期的那样,Regex对于这么简单的事情是非常低效的。


3
投票

它并不快,但如果简单有帮助,这可行:

while (text.Contains("  ")) text=text.Replace("  ", " ");

2
投票

试试这个:

System.Text.RegularExpressions.Regex.Replace(input, @"\s+", " ");

2
投票

在这个值得一些思考的问题中,有一些要求尚不清楚。

  1. 你想要一个前导或尾随空格字符吗?
  2. 使用单个字符替换所有空格时,是否希望该字符保持一致? (即许多这些解决方案将用\ t和''''代替\ t \ t。

这是一个非常有效的版本,它用一个空格替换所有空格,并在for循环之前删除任何前导和尾随空格。

  public static string WhiteSpaceToSingleSpaces(string input)
  {
    if (input.Length < 2) 
        return input;

    StringBuilder sb = new StringBuilder();

    input = input.Trim();
    char lastChar = input[0];
    bool lastCharWhiteSpace = false;

    for (int i = 1; i < input.Length; i++)
    {
        bool whiteSpace = char.IsWhiteSpace(input[i]);

        //Skip duplicate whitespace characters
        if (whiteSpace && lastCharWhiteSpace)
            continue;

        //Replace all whitespace with a single space.
        if (whiteSpace)
            sb.Append(' ');
        else
            sb.Append(input[i]);

        //Keep track of the last character's whitespace status
        lastCharWhiteSpace = whiteSpace;
    }

    return sb.ToString();
  }

2
投票

这段代码很好用。我没有衡量表现。

string text = "   hello    -  world,  here   we go  !!!    a  bc    ";
string.Join(" ", text.Split().Where(x => x != ""));
// Output
// "hello - world, here we go !!! a bc"

1
投票

您可以使用indexOf首先抓取空格序列开始的位置,然后使用replace方法将空格更改为“”。从那里,您可以使用您抓取的索引并在该位置放置一个空格字符。


1
投票

这很有趣,但在我的电脑上,下面的方法和Sergey Povalyaev的StringBulder方法一样快 - (对于1000次重复,大约282ms,10k src字符串)。虽然不确定内存使用情况。

string RemoveExtraWhiteSpace(string src, char[] wsChars){
   return string.Join(" ",src.Split(wsChars, StringSplitOptions.RemoveEmptyEntries));
}

显然它适用于任何字符 - 而不仅仅是空格。

虽然这不是OP所要求的 - 但如果您真正需要的是用一个实例替换字符串中的特定连续字符,您可以使用这种相对有效的方法:

    string RemoveDuplicateChars(string src, char[] dupes){  
        var sd = (char[])dupes.Clone();  
        Array.Sort(sd);

        var res = new StringBuilder(src.Length);

        for(int i = 0; i<src.Length; i++){
            if( i==0 || src[i]!=src[i-1] || Array.BinarySearch(sd,src[i])<0){
                res.Append(src[i]); 
            }
        }
        return res.ToString();
    }

1
投票
public string GetCorrectString(string IncorrectString)
    {
        string[] strarray = IncorrectString.Split(' ');
        var sb = new StringBuilder();
        foreach (var str in strarray)
        {
            if (str != string.Empty)
            {
                sb.Append(str).Append(' ');
            }
        }
        return sb.ToString().Trim();
    }

1
投票

我只是掀起了这个,但还没有测试过。但我觉得这很优雅,避免了正则表达式:

    /// <summary>
    /// Removes extra white space.
    /// </summary>
    /// <param name="s">
    /// The string
    /// </param>
    /// <returns>
    /// The string, with only single white-space groupings. 
    /// </returns>
    public static string RemoveExtraWhiteSpace(this string s)
    {
        if (s.Length == 0)
        {
            return string.Empty;
        }

        var stringBuilder = new StringBuilder();
        var whiteSpaceCount = 0;
        foreach (var character in s)
        {
            if (char.IsWhiteSpace(character))
            {
                whiteSpaceCount++;
            }
            else
            {
                whiteSpaceCount = 0;
            }

            if (whiteSpaceCount > 1)
            {
                continue;
            }

            stringBuilder.Append(character);
        }

        return stringBuilder.ToString();
    }

1
投票

我在这里错过了什么吗?我想出了这个:

// Input: "HELLO     BEAUTIFUL       WORLD!"
private string NormalizeWhitespace(string inputStr)
{
    // First split the string on the spaces but exclude the spaces themselves
    // Using the input string the length of the array will be 3. If the spaces
    // were not filtered out they would be included in the array
    var splitParts = inputStr.Split(' ').Where(x => x != "").ToArray();

   // Now iterate over the parts in the array and add them to the return
   // string. If the current part is not the last part, add a space after.
   for (int i = 0; i < splitParts.Count(); i++)
   {
        retVal += splitParts[i];
        if (i != splitParts.Count() - 1)
        {
            retVal += " ";
        }
   }
    return retVal;
}
// Would return "HELLO BEAUTIFUL WORLD!"

我知道我在这里创建第二个字符串以返回它以及创建splitParts数组。刚认为这很简单。也许我没有考虑到一些潜在的情况。


1
投票

我知道这很老,但压缩空格的最简单方法(用一个“空格”字符替换任何重复的空白字符)如下:

    public static string CompactWhitespace(string astring)
    {
        if (!string.IsNullOrEmpty(astring))
        {
            bool found = false;
            StringBuilder buff = new StringBuilder();

            foreach (char chr in astring.Trim())
            {
                if (char.IsWhiteSpace(chr))
                {
                    if (found)
                    {
                        continue;
                    }

                    found = true;
                    buff.Append(' ');
                }
                else
                {
                    if (found)
                    {
                        found = false;
                    }

                    buff.Append(chr);
                }
            }

            return buff.ToString();
        }

        return string.Empty;
    }

39
投票

你可以使用正则表达式:

static readonly Regex trimmer = new Regex(@"\s\s+");

s = trimmer.Replace(s, " ");

为了增加性能,请通过RegexOptions.Compiled


1
投票
public static string RemoveExtraSpaces(string input)
{
    input = input.Trim();
    string output = "";
    bool WasLastCharSpace = false;
    for (int i = 0; i < input.Length; i++)
    {
        if (input[i] == ' ' && WasLastCharSpace)
            continue;
        WasLastCharSpace = input[i] == ' ';
        output += input[i];
    }
    return output;
}

1
投票

对于那些只想复制粘贴并继续下去的人:

    private string RemoveExcessiveWhitespace(string value)
    {
        if (value == null) { return null; }

        var builder = new StringBuilder();
        var ignoreWhitespace = false;
        foreach (var c in value)
        {
            if (!ignoreWhitespace || c != ' ')
            {
                builder.Append(c);
            }
            ignoreWhitespace = c == ' ';
        }
        return builder.ToString();
    }

1
投票

我对C#不太熟悉,因此我的代码不是优雅/最有效的代码。我来到这里找到一个适合我的用例的答案,但我找不到一个(或者我找不到一个)。

对于我的用例,我需要使用以下条件对所有White Spaces(WS:{spacetabcr lf})进行标准化:

  • WS可以任意组合
  • 用最重要的WS替换WS序列
  • 在某些情况下需要保留tab(例如,标签分隔文件。在这种情况下,还需要保留重复的标签)。但在大多数情况下,他们必须转换成空格。

所以这是一个示例输入和预期输出(免责声明:我的代码仅针对此示例进行测试)



        Every night    in my            dreams  I see you, I feel you
    That's how    I know you go on

Far across the  distance and            places between us   



You            have                 come                    to show you go on


被转换成

Every night in my dreams I see you, I feel you
That's how I know you go on
Far across the distance and places between us
You have come to show you go on

这是我的代码

using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main(string text)
    {
        bool preserveTabs = false;

        //[Step 1]: Clean up white spaces around the text
        text = text.Trim();
        //Console.Write("\nTrim\n======\n" + text);

        //[Step 2]: Reduce repeated spaces to single space. 
        text = Regex.Replace(text, @" +", " ");
        // Console.Write("\nNo repeated spaces\n======\n" + text);

        //[Step 3]: Hande Tab spaces. Tabs needs to treated with care because 
        //in some files tabs have special meaning (for eg Tab seperated files)
        if(preserveTabs)
        {
            text = Regex.Replace(text, @" *\t *", "\t");
        }
        else
        {
            text = Regex.Replace(text, @"[ \t]+", " ");
        }
        //Console.Write("\nTabs preserved\n======\n" + text);

        //[Step 4]: Reduce repeated new lines (and other white spaces around them)
                  //into a single new line.
        text = Regex.Replace(text, @"([\t ]*(\n)+[\t ]*)+", "\n");
        Console.Write("\nClean New Lines\n======\n" + text);    
    }
}

请在此处查看此代码:https://dotnetfiddle.net/eupjIU


1
投票

我不知道这是否是最快的方式,但我使用它,这对我有用:

    /// <summary>
    /// Remove all extra spaces and tabs between words in the specified string!
    /// </summary>
    /// <param name="str">The specified string.</param>
    public static string RemoveExtraSpaces(string str)
    {
        str = str.Trim();
        StringBuilder sb = new StringBuilder();
        bool space = false;
        foreach (char c in str)
        {
            if (char.IsWhiteSpace(c) || c == (char)9) { space = true; }
            else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
        }
        return sb.ToString();
    }

-1
投票

不需要复杂的代码!这是一个简单的代码,将删除任何重复:

public static String RemoveCharOccurence(String s, char[] remove)
{
    String s1 = s;
    foreach(char c in remove)
    {
        s1 = RemoveCharOccurence(s1, c);
    }

    return s1;
}

public static String RemoveCharOccurence(String s, char remove)
{
    StringBuilder sb = new StringBuilder(s.Length);

    Boolean removeNextIfMatch = false;
    foreach(char c in s)
    {
        if(c == remove)
        {
            if(removeNextIfMatch)
                continue;
            else
                removeNextIfMatch = true;
        }
        else
            removeNextIfMatch = false;

        sb.Append(c);
    }

    return sb.ToString();
}

-1
投票

这很简单,只需使用.Replace()方法:

string words = "Hello     world!";
words = words.Replace("\\s+", " ");

输出>>>“Hello world!”


23
投票

有点晚了,但我做了一些基准测试,以获得删除额外空格的最快方法。如果有更快的答案,我很乐意添加它们。

结果:

  1. NormalizeWhiteSpaceForLoop:156毫秒(by Me - From my answer on removing all whitespace
  2. NormalizeWhiteSpace:267毫秒(by Alex K.
  3. Regex编译:1950 ms(by SLaks
  4. 正则表达式:2261毫秒(by SLaks

码:

public class RemoveExtraWhitespaces
{
    public static string WithRegex(string text)
    {
        return Regex.Replace(text, @"\s+", " ");
    }

    public static string WithRegexCompiled(Regex compiledRegex, string text)
    {
        return compiledRegex.Replace(text, " ");
    }

    public static string NormalizeWhiteSpace(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        int current = 0;
        char[] output = new char[input.Length];
        bool skipped = false;

        foreach (char c in input.ToCharArray())
        {
            if (char.IsWhiteSpace(c))
            {
                if (!skipped)
                {
                    if (current > 0)
                        output[current++] = ' ';

                    skipped = true;
                }
            }
            else
            {
                skipped = false;
                output[current++] = c;
            }
        }

        return new string(output, 0, current);
    }

    public static string NormalizeWhiteSpaceForLoop(string input)
    {
        int len = input.Length,
            index = 0,
            i = 0;
        var src = input.ToCharArray();
        bool skip = false;
        char ch;
        for (; i < len; i++)
        {
            ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    if (skip) continue;
                    src[index++] = ch;
                    skip = true;
                    continue;
                default:
                    skip = false;
                    src[index++] = ch;
                continue;
            }
        }

        return new string(src, 0, index);
    }
}

测试:

[TestFixture]
public class RemoveExtraWhitespacesTest
{
    private const string _text = "foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo foo                  bar                  foobar                     moo ";
    private const string _expected = "foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo ";

    private const int _iterations = 10000;

    [Test]
    public void Regex()
    {
        var result = TimeAction("Regex", () => RemoveExtraWhitespaces.WithRegex(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void RegexCompiled()
    {
        var compiledRegex = new Regex(@"\s+", RegexOptions.Compiled);
        var result = TimeAction("RegexCompiled", () => RemoveExtraWhitespaces.WithRegexCompiled(compiledRegex, _text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpace()
    {
        var result = TimeAction("NormalizeWhiteSpace", () => RemoveExtraWhitespaces.NormalizeWhiteSpace(_text));
        Assert.AreEqual(_expected, result);
    }

    [Test]
    public void NormalizeWhiteSpaceForLoop()
    {
        var result = TimeAction("NormalizeWhiteSpaceForLoop", () => RemoveExtraWhitespaces.NormalizeWhiteSpaceForLoop(_text));
        Assert.AreEqual(_expected, result);
    }

    public string TimeAction(string name, Func<string> func)
    {
        var timer = Stopwatch.StartNew();
        string result = string.Empty; ;
        for (int i = 0; i < _iterations; i++)
        {
            result = func();
        }

        timer.Stop();
        Console.WriteLine(string.Format("{0}: {1} ms", name, timer.ElapsedMilliseconds));
        return result;
    }
}

12
投票

我使用下面的方法 - 它们处理所有空格字符不仅是空格,修剪前导和尾随空格,删除额外的空格,所有空格都替换为空格char(所以我们有统一的空格分隔符)。这些方法很快。

public static String CompactWhitespaces( String s )
{
    StringBuilder sb = new StringBuilder( s );

    CompactWhitespaces( sb );

    return sb.ToString();
}

public static void CompactWhitespaces( StringBuilder sb )
{
    if( sb.Length == 0 )
        return;

    // set [start] to first not-whitespace char or to sb.Length

    int start = 0;

    while( start < sb.Length )
    {
        if( Char.IsWhiteSpace( sb[ start ] ) )
            start++;
        else 
            break;
    }

    // if [sb] has only whitespaces, then return empty string

    if( start == sb.Length )
    {
        sb.Length = 0;
        return;
    }

    // set [end] to last not-whitespace char

    int end = sb.Length - 1;

    while( end >= 0 )
    {
        if( Char.IsWhiteSpace( sb[ end ] ) )
            end--;
        else 
            break;
    }

    // compact string

    int dest = 0;
    bool previousIsWhitespace = false;

    for( int i = start; i <= end; i++ )
    {
        if( Char.IsWhiteSpace( sb[ i ] ) )
        {
            if( !previousIsWhitespace )
            {
                previousIsWhitespace = true;
                sb[ dest ] = ' ';
                dest++;
            }
        }
        else
        {
            previousIsWhitespace = false;
            sb[ dest ] = sb[ i ];
            dest++;
        }
    }

    sb.Length = dest;
}

8
投票
string q = " Hello     how are   you           doing?";
string a = String.Join(" ", q.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));

7
投票
string text = "foo       bar";
text = Regex.Replace(text, @"\s+", " ");
// text = "foo bar"

此解决方案适用于空格,制表符和换行符。如果您只想要空格,请将'\ s'替换为''。


7
投票

我需要其中一个用于更大的字符串,并提出了下面的例程。

任何连续的空格(包括制表符,换行符)都会被normalizeTo中的任何内容替换。前导/尾随空格被删除。

它比我的5k-> 5mil char字符串的RegEx快8倍。

internal static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;

    int current = 0;
    char[] output = new char[input.Length];
    bool skipped = false;

    foreach (char c in input.ToCharArray())
    {
        if (char.IsWhiteSpace(c))
        {
            if (!skipped)
            {
                if (current > 0)
                    output[current++] = normalizeTo;

                skipped = true;
            }
        }
        else
        {
            skipped = false;
            output[current++] = c;
        }
    }

    return new string(output, 0, skipped ? current - 1 : current);
}

6
投票
string yourWord = "beep boop    baap beep   boop    baap             beep";

yourWord = yourWord .Replace("  ", " |").Replace("| ", "").Replace("|", "");

5
投票

我尝试过使用StringBuilder来:

  1. 删除额外的空格子串
  2. Blindy建议,接受循环遍历原始字符串的字符

这是我发现的性能和可读性的最佳平衡(使用100,000次迭代计时运行)。有时这种测试比不易读的版本更快,最多慢5%。在我的小测试字符串上,正则表达式需要4.24倍的时间。

public static string RemoveExtraWhitespace(string str)
    {
        var sb = new StringBuilder();
        var prevIsWhitespace = false;
        foreach (var ch in str)
        {
            var isWhitespace = char.IsWhiteSpace(ch);
            if (prevIsWhitespace && isWhitespace)
            {
                continue;
            }
            sb.Append(ch);
            prevIsWhitespace = isWhitespace;
        }
        return sb.ToString();
    }
最新问题
© www.soinside.com 2019 - 2025. All rights reserved.