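My goal is to recursively search a directory for every file containing a match for a regular expression, with speed in mind, and then write the results to a CSV with one column for the exact match and another for the file it was found in. Thanks to user woxxom, I started experimenting with IO.File, since it is apparently much faster than Select-String.
This is a project I have been working on for a long time. I got it working with Select-String and Export-Csv, but that approach is fairly slow.
What am I missing in my new attempt?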
$ResultsCSV = "C:\TEMP\Results.csv"
$Directory = "C:\TEMP\examples"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse
$out = [Text.StringBuilder]
foreach ($FileSearched in $TextFiles) {
    $text = [IO.File]::ReadAllText($FileSearched)
    foreach ($match in ([regex]$RX).Matches($text)) {
        if (!(Test-Path $ResultsCSV)) {
            'Matches,File Path' | Out-File $ResultsCSV -Encoding ASCII
            $out.AppendLine('' + $match.value + ',' + $FileSearched.fullname)
            $match.value | Out-File $ResultsCSV -Encoding ascii -Append
            $FileSearched.Fullname | Out-File $ResultsCSV -Encoding ascii -Append
            $out.ToString() | Out-File $ResultsCSV -Encoding ascii -Append -NoNewline
        }
    }
}
You can improve performance by using streams for both reading and writing:
$ResultsCSV = "C:\TEMP\Results.csv"
$Directory = "C:\TEMP\examples"
$RX = "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.|dot|\[dot\]|\[\.\])){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
$TextFiles = Get-ChildItem $Directory -Include *.txt*,*.csv*,*.rtf*,*.eml*,*.msg*,*.dat*,*.ini*,*.mht* -Recurse
$regex = [regex]$RX # build the regex once instead of once per line
$file2 = New-Object System.IO.StreamWriter($ResultsCSV) # output stream
$file2.WriteLine('Matches,File Path') # write header
foreach ($FileSearched in $TextFiles) { # loop over files in folder
    # $text = [IO.File]::ReadAllText($FileSearched)
    # Pass FullName explicitly so the reader always gets the full path
    $file = New-Object System.IO.StreamReader($FileSearched.FullName) # input stream
    # Compare against $null: a bare assignment would treat an empty line as false and stop early
    while ($null -ne ($text = $file.ReadLine())) { # read line by line
        foreach ($match in $regex.Matches($text)) {
            # write line to output stream
            $file2.WriteLine("{0},{1}", $match.Value, $FileSearched.FullName)
        } # foreach $match
    } # while $file
    $file.Close()
} # foreach
$file2.Close()
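If the files are small enough to read whole, the ReadAllText and StringBuilder approach from the question can also be made to work. The sketch below is one possible arrangement, not a benchmarked solution: the function name `Export-RegexMatches` and its parameters are hypothetical, and it assumes PowerShell 3.0 or later for `-File`. It creates an actual StringBuilder instance (the question assigned the type itself), adds the header once, and writes the file once at the end rather than calling Out-File for every match.

```powershell
# Hypothetical helper: the name and parameters are illustrative, not from the original post.
function Export-RegexMatches {
    param(
        [string]$Directory,
        [string]$Pattern,
        [string]$ResultsCsv,
        [string[]]$Include = @('*.txt', '*.csv')
    )

    $regex = [regex]$Pattern                       # build the regex once
    $out = New-Object System.Text.StringBuilder    # an instance, not the type itself
    [void]$out.AppendLine('Matches,File Path')     # header is added exactly once

    $files = Get-ChildItem -Path $Directory -Include $Include -Recurse -File
    foreach ($file in $files) {
        $text = [IO.File]::ReadAllText($file.FullName)   # whole file in one read
        foreach ($match in $regex.Matches($text)) {
            # [void] keeps AppendLine's return value out of the pipeline
            [void]$out.AppendLine($match.Value + ',' + $file.FullName)
        }
    }

    # One write at the end instead of one Out-File call per match.
    # Use an absolute path here: .NET resolves relative paths against the
    # process working directory, which may differ from the PowerShell location.
    [IO.File]::WriteAllText($ResultsCsv, $out.ToString())
}
```

It could be called as `Export-RegexMatches -Directory $Directory -Pattern $RX -ResultsCsv $ResultsCSV`. Note that this simple concatenation does not quote fields, so a file path containing a comma would produce a malformed row.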