PowerShell: remove all comments from a script

Question

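I'm looking for a way to strip all comments from a file. There are several ways to write comments, but I'm only interested in the simple `#` form. The reason is that I only use `<# #>` for in-function `.SYNOPSIS` blocks, which are functional code rather than plain comments, so I want to keep those.

EDIT: I have updated this question using the helpful answers below.

So there are only a couple of scenarios I need to handle:

a) A whole-line comment with `#` at the start of the line, possibly preceded by whitespace. The regex `^\s*#` seems to work for this.

b) Code at the start of the line followed by a comment at the end of the line. I want to avoid stripping lines such as `Write-Host "#####"`, but I think the code I have covers that.

I was able to remove end-of-line comments with a split because I couldn't work out how to do it with a regex. Does anyone know how to achieve this with a regex?

The split was not ideal, because a `<#` on a line would be removed by the `-split`, but I've fixed that by splitting on `" #"` instead. This is not perfect but might be good enough. Is there perhaps a more reliable regex-based way?

When I run the code below against my 7,000-line script, it works(!) and strips a huge number of comments, BUT the output file almost doubles in size(!?), from 400 KB to about 700 KB. Does anyone understand why that happens and how to prevent it? Is it something to do with BOMs, Unicode, or the like? Out-File really seems to balloon the file size!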

$x = Get-Content ".\myscript.ps1"   # $x is an array, not a string
$out = ".\myscript.ps1"
$x = $x -split "[\r\n]+"               # Remove all consecutive line-breaks, in any format '-split "\r?\n|\r"' would just do line by line
$x = $x | ? { $_ -notmatch "^\s*$" }   # Remove empty lines
$x = $x | ? { $_ -notmatch "^\s*#" }   # Remove all lines starting with #, including with whitespace before
$x = $x | % { ($_ -split " #")[0] }    # Remove end of line comments
$x = ($x -replace $regex).Trim()       # Remove whitespace only at start and end of line
$x | Out-File $out
# $x | more

Answer 1

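Honestly, the best way to identify and process all comments is to use PowerShell's language parser or one of the Ast classes. I apologize that I don't know which Ast contains comments, so here is an uglier way that filters out both block and line comments.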

$code = Get-Content file.txt -Raw
$comments = [System.Management.Automation.PSParser]::Tokenize($code,[ref]$null) |
    Where Type -eq 'Comment' | Select -Expand Content
$regex = ( $comments |% { [regex]::Escape($_) } ) -join '|'

# Output to remove all empty lines
$code -replace $regex -split '\r?\n' -notmatch '^\s*$'

# Output that Removes only Beginning and Ending Blank Lines
($code -replace $regex).Trim()

Answer 2

Do the inverse of your example: emit only the lines that do NOT match:

## Output to console
Get-Content .\file.ps1 | Where-Object { $_ -notmatch '#' }

## Output to file
Get-Content .\file.ps1 | Where-Object { $_ -notmatch '#' } | Out-file .\newfile.ps1 -Append

Answer 3

Based on @AdminOfThings' helpful answer using the Abstract Syntax Tree (AST) class parser approach, but avoiding any regular expressions:

$Code = $Code.ToString() # Prepare any ScriptBlock for the substring method
$Tokens = [System.Management.Automation.PSParser]::Tokenize($Code, [ref]$null)
-Join $Tokens.Where{ $_.Type -ne 'Comment' }.ForEach{ $Code.Substring($_.Start, $_.Length) }

Answer 4

As for the incidental problem, the output file being roughly twice the size of the input file:

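As AdminOfThings points out, `Out-File` in Windows PowerShell defaults to UTF-16LE ("Unicode") encoding, in which characters are represented by (at least) two bytes, whereas ANSI encoding, as used by default by `Set-Content` in Windows PowerShell, encodes all (supported) characters as a single byte. Similarly, UTF-8-encoded files use only one byte for characters in the ASCII range (note that PowerShell (Core) 7+ now consistently defaults to (BOM-less) UTF-8). Use the `-Encoding` parameter as needed.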
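The size difference can be checked with a small experiment. The following is a minimal sketch, assuming a throwaway one-line script; the sample text and the temp-file names are hypothetical and only serve the illustration. It writes the same content once as UTF-16LE and once as UTF-8 and compares the resulting file sizes.

```powershell
# Hypothetical sample content and temp paths, for illustration only.
$text  = 'Write-Host "hello"'
$utf16 = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-utf16.ps1'
$utf8  = Join-Path ([System.IO.Path]::GetTempPath()) 'demo-utf8.ps1'

# UTF-16LE ("Unicode"): two bytes per ASCII-range character.
$text | Out-File -FilePath $utf16 -Encoding Unicode

# UTF-8: one byte per ASCII-range character.
$text | Out-File -FilePath $utf8 -Encoding utf8

# Compare the sizes in bytes; the UTF-16LE file comes out roughly twice as large.
$sizeUtf16 = (Get-Item $utf16).Length
$sizeUtf8  = (Get-Item $utf8).Length
"UTF-16LE: $sizeUtf16 bytes, UTF-8: $sizeUtf8 bytes"
```

Exact byte counts depend on the PowerShell edition (BOM or no BOM) and the platform's newline, so the sketch only compares the two sizes rather than asserting specific numbers.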

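A regex-based solution to your problem will never be fully robust, even if you try to limit the comment removal to single-line comments.

For full robustness, you do indeed have to use PowerShell's language parser, as noted in the other answers.

However, care must be taken when reconstructing the original source code after the comments have been removed: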
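To make the robustness point concrete, here is a small sketch under stated assumptions: the sample line is hypothetical, and the split is the `" #"` split from the question. It contrasts that text-based split with the tokenizer used in the other answers.

```powershell
# Hypothetical input line: the string literal itself contains " #".
$line = 'Write-Host "issue #42 fixed"  # trailing comment'

# Text-based split: cuts at the first ' #', which here sits inside the string literal.
$naive = ($line -split ' #')[0]
$naive    # Write-Host "issue

# Tokenizer: only the real comment is reported as a Comment token.
$commentTokens = @(
    [System.Management.Automation.PSParser]::Tokenize($line, [ref]$null) |
        Where-Object Type -eq 'Comment'
)
$commentTokens.Content
```

The split truncates the statement in the middle of the string, whereas the tokenizer knows where the string ends and the comment begins.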

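AdminOfThings' answer risks removing too much, given the subsequent global regex-based processing with `-replace`: while the scenario may be unlikely, if the text of a comment is repeated inside a string, it is mistakenly removed from there too.

iRon's answer risks syntax errors by joining the tokens without whitespace, so that, for instance, `. .\foo.ps1` turns into `..\foo.ps1`. Blindly putting a space between tokens is not an option, because property-access syntax would break (e.g., `$host.Name` would turn into `$host . Name`, but whitespace between a value and the `.` operator isn't allowed).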
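Both risks can be reproduced in a few lines. This is a minimal sketch that reuses the approaches from the two answers named above; the sample inputs (`$code`, `$code2`) are hypothetical.

```powershell
# Risk 1: a global regex replace also hits the comment text inside a string.
$code = @'
$s = "# note"
# note
'@
$comments = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$null) |
    Where-Object Type -eq 'Comment' | Select-Object -ExpandProperty Content
$regex = ($comments | ForEach-Object { [regex]::Escape($_) }) -join '|'
$stripped = ($code -replace $regex).Trim()
$stripped    # the string literal has lost its content as well

# Risk 2: joining tokens without whitespace fuses the dot-sourcing
# operator with its argument.
$code2  = '. .\foo.ps1 # load helpers'
$tokens = [System.Management.Automation.PSParser]::Tokenize($code2, [ref]$null)
$joined = -join $tokens.Where{ $_.Type -ne 'Comment' }.ForEach{ $code2.Substring($_.Start, $_.Length) }
$joined      # the space between "." and ".\foo.ps1" is gone
```

In the first case the assignment ends up with an empty string literal; in the second, the dot-sourcing command becomes a relative path.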

# Tokenize the file content.
# Note that tabs, if any, are replaced by 2 spaces first; adjust as needed.
$tokens = $null
$null = [System.Management.Automation.Language.Parser]::ParseInput(    
  ((Get-Content -Raw .\myscript.ps1) -replace '\t', '  '), 
  [ref] $tokens,
  [ref] $null
)  

# Loop over all tokens while omitting comments, and rebuild the source code 
# without them, trying to preserve the original formatting as much as possible.
$sb = [System.Text.StringBuilder]::new() 
$prevExtent = $null; $numConsecNewlines = 0
$tokens.
  Where({ $_.Kind -ne 'Comment' }).
  ForEach({ 
    $startColumn = if ($_.Extent.StartLineNumber -eq $prevExtent.StartLineNumber) { $prevExtent.EndColumnNumber }
                   else { 1 }
    if ($_.Kind -eq 'NewLine') {
      # Fold multiple blank or empty lines into a single empty one.
      if (++$numConsecNewlines -ge 3) { return }
    } else {
      $numConsecNewlines = 0
      $null = $sb.Append(' ' * ($_.Extent.StartColumnNumber - $startColumn))
    }
    $null = $sb.Append($_.Text)
    $prevExtent = $_.Extent
  })

# Output the result.
# Pipe to Set-Content as needed.
$sb.ToString()

Answer 5

# $code = ...get string contents of file or script block...
$commentTokens = [System.Management.Automation.PSParser]::Tokenize($code, [ref]$null) | Where Type -eq 'Comment'
$newCode = $code

# any unique token that we will replace at the end (could generate guid if we want to)
$newComment = "<#~xRMx~#>"
# $newComment = "<#{0}#>" -f [Guid]::NewGuid()
$overlapSize = 0

# Normalize all comments to a known comment value `$newComment`
$commentTokens | foreach {
    # adjust starting position based on previous replacement overlap sizes
    $start = $_.Start + $overlapSize
    $newCode = $newCode.Remove($start, $_.Length).Insert($start, $newComment)
    $overlapSize += ($newComment.Length - $_.Length) # calculate overlap sizes
}

# Remove the placeholder; the pattern is escaped so that it is matched literally
$newCode = $newCode -replace [regex]::Escape($newComment), ""
# uncomment to remove blank lines
# $newCode = ($newCode -split '\r?\n' -notmatch '^\s*$') -join "`n"
$newCode

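In my testing, this handles a variety of comment configurations (single-line, multi-line, partial-line). See this repl project; you can fork it and play around with it.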
