I have a large file that contains a "before" and an "after" case for each item, like this:
case1 (BEF) ACT
(AFT) BLK
case2 (BEF) ACT
(AFT) ACT
case3 (BEF) ACT
(AFT) CLC
...
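I need to select every pair of lines where the first line contains (BEF) ACT and the second line contains (AFT) BLK, and write the result to a file.
The idea is to build a condition like this: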
IF (stringX.LineNumber consists of "(BEF) ACT" AND stringX+1.LineNumber consists of (AFT) BLK)
{OutFile $stringX+$stringX+1}
Sorry about the syntax, I have only just started using PS :)
$logfile = 'c:\temp\file.txt'
$matchphrase = '\(BEF\) ACT'
$linenum=Get-Content $logfile | Select-String $matchphrase | ForEach-Object {$_.LineNumber+1}
$linenum
#I've worked out how to get a line number after the line with first required phrase
The goal is to create a new file that contains each line with "(BEF) ACT" that is followed by a line with "(AFT) BLK".
Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' c:\temp\file.txt -Context 0,1 |
  ForEach-Object {
    $lineAfter = $_.Context.PostContext[0]
    # $lineAfter is $null if the match is on the last line of the file.
    if ($lineAfter -and $lineAfter.Contains('(AFT) BLK')) {
      $_.Line, $lineAfter # output
    }
  } # | Set-Content ...
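To try the command end to end, here is a minimal sketch that writes a small sample file, runs the same pipeline, and saves the pairs to an output file. The file names under the temp directory and the sample lines are assumptions made for illustration, not the asker's real data.

```powershell
# Hypothetical input and output files in the temp directory.
$tempDir = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tempDir 'bef-aft-sample.txt'
$outFile = Join-Path $tempDir 'bef-aft-pairs.txt'

# Sample data modelled on the question.
@(
    'case1 (BEF) ACT'
    '(AFT) BLK'
    'case2 (BEF) ACT'
    '(AFT) ACT'
    'case3 (BEF) ACT'
    '(AFT) CLC'
) | Set-Content -Path $inFile

# Find each '(BEF) ACT' line, capture the one line that follows it,
# and keep the pair only if that following line contains '(AFT) BLK'.
Select-String -SimpleMatch -CaseSensitive '(BEF) ACT' $inFile -Context 0,1 |
    ForEach-Object {
        $lineAfter = $_.Context.PostContext[0]
        # Guard: there is no following line if the match is the last line.
        if ($lineAfter -and $lineAfter.Contains('(AFT) BLK')) {
            $_.Line, $lineAfter
        }
    } | Set-Content -Path $outFile

# Read the result back for inspection.
$pairs = @(Get-Content -Path $outFile)
$pairs
```

With this sample only the case1 pair qualifies, so the output file should hold those two lines.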
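-SimpleMatch performs literal substring matching, which means you can pass the search string as is, without escaping it.
However, if you need to constrain the search further, such as ensuring that the match occurs only at the end of a line ($), you do need a regular expression, passed to the (implied) -Pattern parameter: '\(BEF\) ACT$'
Also note that PowerShell is generally case-insensitive by default, which is why the -CaseSensitive switch is used.
Note how Select-String accepts file paths directly, so no preceding Get-Content call is needed.
-Context 0,1 captures 0 lines before and 1 line after each match, and includes them in the [Microsoft.PowerShell.Commands.MatchInfo] instances that Select-String outputs.
Inside the ForEach-Object script block, $_.Context.PostContext[0] retrieves the line after the match, and .Contains() performs a literal substring search in it.
Note that .Contains() is a method of the .NET System.String type, and such methods, unlike PowerShell, are case-sensitive by default, but you can use an optional parameter to change that (the overload that takes a StringComparison value is available in .NET Core 2.1 and later, not in .NET Framework).
To limit the output, append | Select-Object -First 2 to the Select-String call.

Another approach is to read $logFile as a single string and use regex matching to extract the desired parts: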
$logFile = 'c:\temp\file.txt'
$outFile = 'c:\temp\file2.txt'
# read the content of the logfile as a single string
$content = Get-Content -Path $logFile -Raw
$regex = [regex] '(case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK)'
$match = $regex.Match($content)
($output = while ($match.Success) {
    $match.Value
    $match = $match.NextMatch()
}) | Set-Content -Path $outFile -Force
When used, the result looks like this:
case1 (BEF) ACT
(AFT) BLK
case7 (BEF) ACT
(AFT) BLK
Regex details:
(       Match the regular expression below and capture its match into backreference number 1
case    Match the characters “case” literally
\d      Match a single digit 0..9
+       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\(      Match the character “(” literally
BEF     Match the characters “BEF” literally
\)      Match the character “)” literally
\s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
ACT     Match the characters “ACT” literally
\s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
\(      Match the character “(” literally
AFT     Match the characters “AFT” literally
\)      Match the character “)” literally
\s      Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
+       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
BLK     Match the characters “BLK” literally
)
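As a quick way to experiment with this pattern, the sketch below applies it to an in-memory string instead of a file. The sample text is made up for illustration, and [regex]::Matches is used here as a compact alternative to the Match()/NextMatch() loop, not as the answer's own method. Note that the .NET regex class is case-sensitive unless told otherwise.

```powershell
# Hypothetical sample content, joined with newlines to mimic a file read with -Raw.
$content = @(
    'case1 (BEF) ACT'
    '(AFT) BLK'
    'case2 (BEF) ACT'
    '(AFT) ACT'
    'case3 (BEF) ACT'
    '(AFT) BLK'
) -join "`n"

# Same pattern as above; \s+ also spans the line break between the two lines.
$pattern = 'case\d+\s+\(BEF\)\s+ACT\s+\(AFT\)\s+BLK'

# Matches() returns every match at once; keep only the matched text.
$found = @([regex]::Matches($content, $pattern) | ForEach-Object { $_.Value })
$found
```

Each element of $found is a two-line string holding one matching pair; here the case1 and case3 pairs match, while case2 does not.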
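Regarding the Select-String solution attempt: Select-String is versatile but slow, although it is suitable for processing files that are too large to fit into memory as a whole, because it processes files line by line.
However, PowerShell offers a faster alternative for line-by-line processing: switch -File. See the solution below.
$(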
  $firstLine = ''
  switch -CaseSensitive -Regex -File t.txt {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' {
      # Pair found, output it.
      # If you don't want to look for further pairs,
      # append `; break` inside the block.
      if ($firstLine) { $firstLine, $_ }
      # Look for further pairs.
      $firstLine = ''; continue
    }
    default { $firstLine = '' }
  }
) # | Set-Content ...
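Note: The enclosing $(...) is only needed if you want to send the output directly through the pipeline to a cmdlet such as Set-Content; capturing the output in a variable does not require it: $pair = switch ...
-Regex interprets the branch conditions as regular expressions.
$_ inside a branch's action script block ({ ... }) refers to the line at hand.
$firstLine stores the first line of interest; when the second line's pattern is found and $firstLine is set (non-empty), the pair is output.
The default handler resets $firstLine, to ensure that only two consecutive lines containing the strings of interest are considered.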
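To illustrate the role of the default handler, here is a small sketch that runs the same switch statement against a throwaway sample file in which one (BEF) ACT line is separated from its (AFT) BLK line by an unrelated line. The file name and the sample lines are assumptions made for this example.

```powershell
# Hypothetical sample file in the temp directory.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'switch-sample.txt'
@(
    'case1 (BEF) ACT'
    '(AFT) BLK'
    'case2 (BEF) ACT'
    'unrelated line'
    '(AFT) BLK'
) | Set-Content -Path $sample

$firstLine = ''
# Capturing in a variable, so no enclosing $(...) is needed.
$pairs = switch -CaseSensitive -Regex -File $sample {
    '\(BEF\) ACT' { $firstLine = $_; continue }
    '\(AFT\) BLK' {
        # Output the pair only if the previous line was a '(BEF) ACT' line.
        if ($firstLine) { $firstLine, $_ }
        $firstLine = ''
        continue
    }
    # Any other line breaks the adjacency, so forget the remembered line.
    default { $firstLine = '' }
}

Remove-Item -Path $sample
$pairs
```

Only the case1 pair is returned: the unrelated line after case2 triggers the default branch, which clears $firstLine before the second (AFT) BLK line is seen.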