Get-Content and show control characters such as `r - visualize control characters in strings

Question   Votes: 5   Answers: 3

What flag can we pass to Get-Content to display control characters such as \r\n or \n?

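What I am trying to do is determine whether a file's line endings are Unix or DOS style. I tried simply running Get-Content, which does not show any line endings. I also tried Vim with set list, which shows only $ no matter what the line ending is.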

I would like to do this with PowerShell, because that would be very useful.

powershell diagnostics control-characters
3 Answers
7
votes

One way is to use Get-Content's -Encoding parameter, e.g.:

Get-Content foo.txt -Encoding byte | % {"0x{0:X2}" -f $_}
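Note that -Encoding Byte is only accepted by Windows PowerShell; in PowerShell 6 and later, Get-Content uses the -AsByteStream switch instead. As a rough sketch that should behave the same in both editions, the bytes can be read through .NET directly. The function name Get-HexBytes is a hypothetical helper introduced here for illustration, not part of the original answer:

```powershell
# Hypothetical helper (not from the original answer): returns every byte of a
# file as a "0xNN" string, without relying on Get-Content's byte options.
function Get-HexBytes {
  param(
    [Parameter(Mandatory)]
    [string] $Path
  )

  # Resolve the path against PowerShell's current location, because .NET's
  # working directory can differ from it.
  $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath

  # Read the raw bytes and format each one as two uppercase hex digits.
  [System.IO.File]::ReadAllBytes($fullPath) | ForEach-Object { '0x{0:X2}' -f $_ }
}
```

In the output of Get-HexBytes foo.txt, a 0x0D immediately followed by 0x0A indicates a DOS-style (CRLF) line ending, while a 0x0A on its own indicates a Unix-style (LF) one.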

If you have the PowerShell Community Extensions, you can use the Format-Hex command:

Format-Hex foo.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 61 73 66 09 61 73 64 66 61 73 64 66 09 61 73 64 asf.asdfasdf.asd
00000010 66 61 73 0D 0A 61 73 64 66 0D 0A 61 73 09 61 73 fas..asdf..as.as

If you really want to see "\r\n" in the output, then do what BaconBits suggests, but you have to use the -Raw parameter, e.g.:

(Get-Content foo.txt -Raw) -replace '\r','\r' -replace '\n','\n' -replace '\t','\t'

Output:

asf\tasdfasdf\tasdfas\r\nasdf\r\nas\tasd\r\nasdfasd\tasf\tasdf\t\r\nasdf
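To answer the original question directly (Unix or DOS line endings), the raw text can also be classified by counting the line-ending sequences it contains. The following is only a minimal sketch under that assumption; the function name Get-LineEndingStyle and its result labels are hypothetical and do not come from any of the answers:

```powershell
# Hypothetical helper (not from the answers): classifies the line endings
# found in a string by counting CRLF, bare LF, and bare CR sequences.
function Get-LineEndingStyle {
  param(
    [Parameter(Mandatory)]
    [AllowEmptyString()]
    [string] $Text
  )

  # CRLF pairs, LFs not preceded by a CR, and CRs not followed by an LF.
  $crlf = [regex]::Matches($Text, '\r\n').Count
  $lf   = [regex]::Matches($Text, '(?<!\r)\n').Count
  $cr   = [regex]::Matches($Text, '\r(?!\n)').Count

  # Collect a label for every kind of line ending that occurs at least once.
  $kinds = @(
    if ($crlf -gt 0) { 'Dos (CRLF)' }
    if ($lf   -gt 0) { 'Unix (LF)' }
    if ($cr   -gt 0) { 'Classic Mac (CR)' }
  )

  # No label: no line endings at all; one label: consistent; otherwise mixed.
  switch ($kinds.Count) {
    0       { 'None' }
    1       { $kinds[0] }
    default { 'Mixed' }
  }
}
```

A call such as Get-LineEndingStyle (Get-Content foo.txt -Raw) would then report the style for the whole file. As above, -Raw matters here, because without it Get-Content splits the input into lines and discards the line endings.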

5
votes

Below is the custom function Debug-String, which visualizes control characters in strings:

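  • Where a native PowerShell escape is available, it uses PowerShell's own `-prefixed escape-sequence notation (e.g., `r for CR).
  • Otherwise, it falls back to caret notation (e.g., the ASCII-range control character with code point 0x4, END OF TRANSMISSION, is represented as ^D). Alternatively, you can use the -CaretNotation switch to represent all ASCII-range control characters in caret notation, which gives you output similar to cat -A on Linux and cat -et on macOS / BSD.
  • All other control characters, namely those outside the ASCII range (which spans code points 0x0 - 0x7F), are represented in the form `u{<hex>}, where <hex> is the hexadecimal representation of the code point, with up to 6 digits; e.g., `u{85} is the Unicode character U+0085, the NEXT LINE control character. This notation is now also supported in expandable strings ("..."), but only in PowerShell Core.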

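Applied to your use case, you would use the following (requires PSv3+, because Get-Content -Raw is used to ensure that the file is read as a whole; without it, the information about the line endings would be lost):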

Get-Content -Raw $file | Debug-String

Two simple examples:


Using PowerShell's escape-sequence notation. Note that this only looks like a no-op: the `-prefixed sequences inside "..." strings create actual control characters.

PS> "a`ab`t c`0d`r`n" | Debug-String
a`ab`t c`0d`r`n

Using -CaretNotation, with output similar to cat -A on Linux:

PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation
a^Gb^I c^@d^M$

Debug-String source code:

Function Debug-String {
  param(
    [Parameter(ValueFromPipeline, Mandatory)]
    [string] $String
    ,
    [switch] $CaretNotation
  )

  begin {
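    # \p{C} matches every character in the Unicode category "Other": control
    # characters (Cc), both inside and outside the ASCII range, as well as
    # format, surrogate, private-use, and unassigned code points.
    # Note that tabs (`t) are control characters too, but spaces are not.
    $re = [regex] '\p{C}'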
  }

  process {

    $re.Replace($String, {
      param($match)
      $handled = $False
      if (-not $CaretNotation) {
        # Translate control chars. that have native PS escape sequences into them.
        $handled = $True
        switch ([int] [char] $match.Value) { # [int], not [Int16]: code points above 0x7FFF (e.g., U+FEFF) overflow Int16
          0  { '`0'; break }
          7  { '`a'; break }
          8  { '`b'; break }
          12 { '`f'; break }
          10 { '`n'; break }
          13 { '`r'; break }
          9  { '`t'; break }
          11 { '`v'; break }
          default { $handled = $false }
        } # switch
      }
      if (-not $handled) {
          switch ([int] [char] $match.Value) { # [int], not [Int16]: code points above 0x7FFF (e.g., U+FEFF) overflow Int16
            10 { '$'; break } # cat -A / cat -e visualizes LFs as '$'
            # If it's a control character in the ASCII range, 
            # use caret notation too (C0 range).
            # See https://en.wikipedia.org/wiki/Caret_notation
            { $_ -ge 0 -and $_ -le 31 -or $_ -eq 127 } {
              # Caret notation is based on the character obtained by flipping
              # bit 0x40 of the control-character code point: 0x0 - 0x1F map
              # to '@' through '_', and DEL (0x7F) maps to '?'.
              '^' + [char] ($_ -bxor 64)
              break
            }
            # NON-ASCII control characters; use the - PS Core-only - Unicode
            # escape-sequence notation:
            default { '`u{{{0}}}' -f ([int] $_).ToString('x') }
          }
      } # if (-not $handled)
    })  # .Replace
  } # process

}

For brevity, I have not included the comment-based help above; here it is:

<#
.SYNOPSIS
Outputs a string in diagnostic form.

.DESCRIPTION
Prints a string with normally hidden control characters visualized.

Common control characters are visualized using PowerShell's own escaping 
notation by default, such as
"`t" for a tab, "`n" for a LF, and "`r" for a CR.

Any other control characters in the ASCII range (C0 control characters)
are represented in caret notation (see https://en.wikipedia.org/wiki/Caret_notation).

If you want all ASCII range control characters visualized using caret notation,
except LF visualized as "$", similar to `cat -A` on Linux, for instance, 
use -CaretNotation.

Non-ASCII control characters are visualized by their Unicode code point
in the form `u{<hex>}, where <hex> is the hex. representation of the
code point with up to 6 digits; e.g., `u{85} is U+0085, the NEXT LINE
control char.

.PARAMETER CaretNotation
Causes LF to be visualized as "$" and all other ASCII-range control characters
in caret notation, similar to `cat -A` on Linux.

.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String
a`ab`t c`0d`r`n

.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation
a^Gb^I c^@d^M$
#>

2
votes

Here is one way, using a regular-expression replace:

function Printable([string] $s) {
    $Matcher = 
    {  
      param($m) 

      $x = $m.Groups[0].Value
      $c = [int]($x.ToCharArray())[0]
      switch ($c)
      {
          9 { '\t' }
          13 { '\r' }
          10 { '\n' }
          92 { '\\' }
          Default { "\$c" }
      }
    }
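    # Match anything outside printable ASCII (space through ~), or a backslash,
    # so that literal backslashes reach the 92 case and are escaped as '\\'.
    return ([regex]'[^ -~]|\\').Replace($s, $Matcher)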
}

PS C:\> $a = [char[]](65,66,67, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)

PS C:\> $b = $a -join ""

PS C:\> Printable $b
ABC\1\2\3\4\5\6\7\8\t\n\11\12\r