How to parse a CSV file, find triggers, and split it into new files with PowerShell

Question

I have a CSV file with the following structure:

"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"

The value in the second field (delimiter ';') marks the rows that belong together, and the value 140000001 or 140000671 is the trigger. So the result should be:

First file: 140000001.txt

"SA1";"21020180123155514000000000000000002"
"SA2";"21020180123155514000000000000000002";"210"
"SA4";"21020180123155514000000000000000002";"210";"200000001"
"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"
"SA1";"21020180123155567000000000000000002"
"SA2";"21020180123155567000000000000000002";"210"
"SA4";"21020180123155567000000000000000002";"210";"200000001"
"SA5";"21020180123155567000000000000000002";"210";"200000001";"140000001";"ZZ"

Second file: 140000671.txt

"SA1";"21020180123155522000000000000000002"
"SA2";"21020180123155522000000000000000002";"210"
"SA4";"21020180123155522000000000000000002";"210";"200000001"
"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"
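To make the field numbering concrete: PowerShell's -split returns a zero-based array, so the "second field" that groups rows is index 1 and the trigger in the fifth field is index 4. A minimal sketch on one sample line, assuming (as holds for the sample data) that no field contains an embedded ';':

```powershell
# One line from the sample data (the SA5 row of the first group)
$line = '"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"'

# Split on ';' and strip the surrounding double quotes from each field.
# Assumes no field contains an embedded ';' (true for the sample data).
$fields = $line -split ';' | ForEach-Object { $_.Trim('"') }

$groupKey = $fields[1]  # second field (index 1): marks rows that belong together
$trigger  = $fields[4]  # fifth field (index 4): only the SA5 rows have it

"Group key: $groupKey"
"Trigger:   $trigger"
```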

I found a snippet that splits the large file by the second field:

$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files"
Remove-Item -Path "$dstDir\*"

$header = Get-Content -Path $src | select -First 1

Get-Content -Path $src | select -Skip 1 | foreach {
    # Name the output file after field 2, with its quotes removed
    $file = "$(($_ -split ";")[1]).txt".Replace('"', "")
    Write-Verbose "Writing to $file"
    if (-not (Test-Path -Path $dstDir\$file))
    {
        Out-File -FilePath $dstDir\$file -InputObject $header -Encoding ascii
    }
    Out-File -FilePath $dstDir\$file -InputObject $_ -Encoding ascii -Append
}

Beyond that, I'm in the dark. Please help.

Answer 1

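In case you didn't know, the Import-CSV cmdlet is useful here. I'd use it because it returns every row as a separate object in an array, with the column values as properties, and you don't have to strip the quotes manually. Assuming the second column is a date-time value that is unique to each group of 4 consecutive rows, this will work: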

$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
$DateTimeGroups = $csv | Group-Object -Property 'ColumnTwoHeader'
foreach ($group in $DateTimeGroups) {
    # Only one row per group has a value in column five; skip the empty ones
    $filename = $group.Group.'ColumnFiveHeader' | Where-Object { $_ } | select -Unique
    $group.Group | Export-CSV "$dstDir\$filename.txt" -Delimiter ';' -Append -NoTypeInformation
}

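However, this breaks if two of those "groups of 4 consecutive rows" have the same values in the second and fifth columns. There's no way around that unless you're sure every date-time group always consists of exactly 4 consecutive rows, in which case: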

$src = "C:\temp\ORD001.txt"
$dstDir = "C:\temp\files"
Remove-Item -Path "$dstDir\*"
$csv = Import-CSV $src -Delimiter ';'
if ($csv.Count % 4 -ne 0) {
    Write-Error "CSV does not have a proper number of rows. Attempting to continue will be bad :)"
    return
}
for ($i = 0; $i -lt $csv.Count; $i += 4) {
    # $i..($i+3) selects exactly the 4 rows of the current group
    $group = $csv[$i..($i+3)]
    $group | Export-Csv "$dstDir\$($group[3].'ColumnFiveHeader').txt" -Delimiter ';' -Append -NoTypeInformation
}

Be sure to replace ColumnTwoHeader and ColumnFiveHeader with the appropriate column names.

Answer 2

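If performance isn't a concern, the most concise and direct way to express your intent is to combine Import-Csv / Export-Csv with Group-Object, taking advantage of PowerShell's ability to convert CSV to objects and back: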

$src =    "C:\temp\ORD001.txt"  # Input CSV file
$dstDir = "C:\temp\files"       # Output directory

# Delete previous output files, if necessary.
Remove-Item -Path "$dstDir\*" -WhatIf

# Import the source CSV into custom objects with properties named for the columns.
# Note: This assumes that your CSV header line defines the columns "Col1", "Col2", ...
Import-Csv $src -Delimiter ';' | 
  # Group the resulting objects by column 2
  Group-Object -Property Col2 | 
    ForEach-Object {  # Process each resulting group.
      # Determine the output filename via the group's last row's column 5 value.
      $outFile = '{0}\{1}.txt' -f $dstDir, $_.Group[-1].Col5
      # Append the group at hand to the target file.
      $_.Group | Export-Csv -Append -Encoding Ascii $outFile -Delimiter ';' -NoTypeInformation
    }

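Note:

The assumption, consistent with your sample data, is that within each group of rows sharing the same column-2 value, it is always the last row whose column 5 contains the root of the output file name (e.g., 140000001).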
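Answer 2 (like Answer 1) relies on a header line that names the columns. If the file has none, Import-Csv and ConvertFrom-Csv both accept a -Header parameter that supplies the names instead. A minimal sketch under these assumptions: six columns named Col1 through Col6 (hypothetical names chosen to match Answer 2), and the trigger taken from whichever row in a group has a non-empty Col5 rather than relying on it being the last row:

```powershell
# Two groups from the sample data, without a header line
$lines = @(
    '"SA1";"21020180123155514000000000000000002"'
    '"SA5";"21020180123155514000000000000000002";"210";"200000001";"140000001";"ZZ"'
    '"SA1";"21020180123155522000000000000000002"'
    '"SA5";"21020180123155522000000000000000002";"210";"200000001";"140000671";"ZZ"'
)

# -Header names the columns, so every line is treated as data.
# Col1..Col6 are hypothetical names matching the ones Answer 2 assumes.
$rows = $lines | ConvertFrom-Csv -Delimiter ';' -Header Col1, Col2, Col3, Col4, Col5, Col6

# Group by Col2; take the trigger from whichever row has a non-empty Col5
$triggers = $rows | Group-Object -Property Col2 | ForEach-Object {
    [pscustomobject]@{
        Key     = $_.Name
        Trigger = ($_.Group | Where-Object { $_.Col5 } | Select-Object -First 1).Col5
    }
}

$triggers
```

For the real file, the same -Header argument works with Import-Csv $src -Delimiter ';'. Keep in mind that Export-Csv would then write a header line by default and emit all six fields for every row, so the output files would not match the expected sample line for line.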

Answer 3

Sorry, but I don't have a header line. It's a semicolon-delimited interface text file.
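Since there is no header line and the expected files reproduce the input lines verbatim (quotes included, no padded fields), a variant that works on the raw text lines, in the spirit of the snippet in the question, may be closer to what's wanted. This is a sketch, not a tested drop-in: the function name Split-ByTrigger is hypothetical, and it assumes no field contains an embedded ';' and that one row per group carries the trigger in field 5.

```powershell
# Hypothetical helper: splits a header-less, ';'-delimited file into one file per
# trigger value (field 5), copying each group's lines verbatim.
function Split-ByTrigger {
    param(
        [Parameter(Mandatory)] [string] $Path,        # input file
        [Parameter(Mandatory)] [string] $Destination  # existing output directory
    )

    # Collect the raw lines of each group, keyed by field 2 (quotes stripped),
    # in the order the groups first appear. Assumes no embedded ';' in fields.
    $groups = [ordered]@{}
    foreach ($line in Get-Content -Path $Path) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }
        $key = ($line -split ';')[1].Trim('"')
        if (-not $groups.Contains($key)) {
            $groups[$key] = [System.Collections.Generic.List[string]]::new()
        }
        $groups[$key].Add($line)
    }

    foreach ($key in $groups.Keys) {
        $lines = $groups[$key]
        # The trigger is field 5 of whichever row in the group has one (SA5 in the sample);
        # rows without a fifth field yield an empty string and are filtered out.
        $trigger = $lines |
            ForEach-Object { "$(($_ -split ';')[4])".Trim('"') } |
            Where-Object { $_ } |
            Select-Object -First 1
        if (-not $trigger) {
            Write-Warning "No trigger found for group $key; skipping it."
            continue
        }
        $outFile = Join-Path $Destination ($trigger + '.txt')
        # Append, so groups that share a trigger end up in the same file
        Add-Content -Path $outFile -Value $lines -Encoding ascii
    }
}

# Example with the paths from the question:
# Split-ByTrigger -Path 'C:\temp\ORD001.txt' -Destination 'C:\temp\files'
```

Because Add-Content appends, clear the destination directory first (as the question's snippet does with Remove-Item) before re-running it on the same folder.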
