Efficiently processing multiple CSVs in PowerShell

Question (votes: -1, answers: 1)

I am retrieving two CSVs from an API. One is named students.csv and looks like this:

StudentNo,PreferredFirstnames,PreferredSurname,UPN
111, john, smith, [email protected]
222, jane, doe, [email protected]

The other is named rooms.csv:

roomName, roomNo, students
room1, 1, {@{StudentNo=111; StudentName=john smith; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},....
room2, 2,{@{StudentNo=222; StudentName=jane doe; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},...   

The third column in rooms.csv is an array provided by the API.

What is the best way to merge the two into the following?

StudentNo,PreferredFirstnames,PreferredSurname,UPN, roomName
111, john, smith, [email protected], room1
222, jane, doe, [email protected], room2

I was thinking of something like...

$rooms = Import-Csv rooms.csv
$students  = Import-Csv students.csv
$combined = $students | select-object StudentNo,PreferredSurname,PreferredFirstnames,UPN,
@{Name="roomName";Expression={ ForEach ($r in $rooms) {
    if ($r.Students.StudentNo.Contains($_.StudentNo) -eq "True") 
{return $r.roomName}}}} 

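This works, but is foreach the right approach here? Am I mixing things up, or is there a more efficient way?

--- Original post ---

With all of this information, I need to compare the student data and update AzureAD, then compile a list of data including first name, last name, UPN, room, and other data retrieved from AzureAD.

My problem is "efficiency". My code mostly works, but it takes hours to run. Currently I loop through students.csv and then, for each student, loop through rooms.csv to find the room they are in, all while waiting on multiple API calls in between.

What is the most efficient way to find each student's room? Is importing the CSV as custom PSObjects comparable to using a hash table?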

powershell csv import-csv export-csv
1 Answer

I was able to get your proposed code to work, but it required some tweaks to both the code and the data:

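  • There must be some additional step in which you deserialize the students column of rooms.csv into a collection of objects. It looks like a ScriptBlock that evaluates to an array of HashTables, but some changes to the CSV input are still needed:
      • The StartDate and EndDate property values need to be quoted and cast to [DateTime].
      • At least for rooms containing multiple students, the whole value must be quoted so that Import-Csv does not interpret the , separating array elements as the start of an additional column.
  • The drawback of using CSV as an intermediate format is that the original property types are lost; everything becomes a [String] on import. Sometimes it is desirable to cast back to the original type for efficiency, and sometimes it is absolutely necessary for certain operations to work. You could cast these properties every time you use them, but I prefer to cast them once, immediately after import.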
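To illustrate the type loss, here is a minimal sketch that uses ConvertFrom-Csv with inline sample data in place of Import-Csv. The rows are made up for illustration; both cmdlets produce objects whose properties are all strings.

```powershell
# Hypothetical inline sample; ConvertFrom-Csv parses text the way Import-Csv parses a file
$rows = @'
StudentNo,PreferredFirstnames
9,john
10,jane
'@ | ConvertFrom-Csv

# Every property arrives as a [String]
$typeName = $rows[0].StudentNo.GetType().Name

# With a [String] on the left, -lt compares as text, so '9' is not less than '10'
$stringCompare = $rows[0].StudentNo -lt $rows[1].StudentNo

# Casting once, right after import, restores numeric semantics
$numbers = $rows | ForEach-Object { [Int32] $_.StudentNo }
$intCompare = $numbers[0] -lt $numbers[1]
```

Here $typeName is String, $stringCompare is False, and $intCompare is True, which is why the script below casts StudentNo to [Int32] before comparing it.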

With those changes, rooms.csv becomes...

roomName, roomNo, students
room1, 1, "{@{StudentNo=111; StudentName='john smith'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
room2, 2, "{@{StudentNo=222; StudentName='jane doe'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"

...and the script becomes...

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        }
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
    | Select-Object `
        -ExcludeProperty 'StudentNo' `
        -Property '*', @{
            Name = 'StudentNo'
            Expression = { [Int32] $_.StudentNo }
        }
$combined = $students `
    | Select-Object -Property `
        'StudentNo', `
        'PreferredSurname', `
        'PreferredFirstnames', `
        'UPN', `
        @{
            Name = "roomName";
            Expression = {
                foreach ($r in $rooms)
                {
                    if ($r.Students.StudentNo -contains $_.StudentNo)
                    {
                        return $r.roomName
                    }
                }

                #TODO: Return text indicating room not found?
            }
        }

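The reason this can be slow is that you are performing a linear search for every student object. Actually, you are performing two of them: first through the collection of rooms (foreach), then through each room's collection of students (-contains). This can quickly add up to a large number of iterations and equality comparisons, because for every room the current student is not assigned to, you iterate that room's entire student collection, and you keep doing so until you reach the student's room.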
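To put a rough number on that, here is a small sketch that counts the equality comparisons made by a nested linear search. The data is synthetic (20 rooms of 25 students each, numbered 1 to 500) and the function name Measure-LinearSearch is made up for this illustration.

```powershell
# Hypothetical synthetic data: 20 rooms of 25 students each, numbered 1..500
$roomCount = 20
$studentsPerRoom = 25
$rooms = foreach ($i in 0..($roomCount - 1)) {
    $first = $i * $studentsPerRoom + 1
    [PSCustomObject] @{
        roomName   = "room$($i + 1)"
        StudentNos = @($first..($first + $studentsPerRoom - 1))
    }
}

# Count the equality comparisons a nested linear search makes to locate one student
function Measure-LinearSearch([Int32] $studentNo) {
    $comparisons = 0
    foreach ($r in $rooms) {
        foreach ($no in $r.StudentNos) {
            $comparisons++
            if ($no -eq $studentNo) { return $comparisons }
        }
    }
    return $comparisons
}

$best  = Measure-LinearSearch 1     # first student of the first room
$worst = Measure-LinearSearch 500   # last student of the last room

# Total comparisons needed to find a room for every one of the 500 students
$total = 0
foreach ($n in 1..500) { $total += Measure-LinearSearch $n }
```

Under these assumptions $best is 1, $worst is 500, and $total is 125250, or roughly 250 comparisons per student on average. The count grows with the product of the number of students and the total size of the room lists.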

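One simple optimization you can make when performing a linear search is to sort the items you are searching (in this case, the Students property would be sorted by each student's StudentNo property)...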

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock) `
                    | Sort-Object -Property @{ Expression = { $_.StudentNo } }

                return $studentsArray
            }
        }

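...then, when you search that same collection, if you encounter an item greater than the one you are searching for, you know the rest of the collection cannot possibly contain it and you can abort the search immediately...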

@{
    Name = "roomName";
    Expression = {
        foreach ($r in $rooms)
        {
            # Requires $r.Students to be sorted by StudentNo
            foreach ($roomStudentNo in $r.Students.StudentNo)
            {
                if ($roomStudentNo -eq $_.StudentNo)
                {
                    # Return the matched room name and stop searching this and further rooms
                    return $r.roomName
                }
                elseif ($roomStudentNo -gt $_.StudentNo)
                {
                    # Stop searching this room
                    break
                }

                # $roomStudentNo is less than $_.StudentNo; keep searching this room
            }
        }

        #TODO: Return text indicating room not found?
    }
}

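Better yet, with a sorted collection you can also perform a binary search, which is faster* than a linear search. The Array class already provides a BinarySearch static method, so we can accomplish this with less code, too...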

@{
    Name = "roomName";
    Expression = {
        foreach ($r in $rooms)
        {
            # Requires $r.Students to be sorted by StudentNo
            if ([Array]::BinarySearch($r.Students.StudentNo, $_.StudentNo) -ge 0)
            {
                return $r.roomName
            }
        }

        #TODO: Return text indicating room not found?
    }
}
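The -ge 0 test above relies on how [Array]::BinarySearch reports its result. As a minimal standalone sketch, assuming a made-up sorted [Int32[]] array of student numbers:

```powershell
# Hypothetical sorted student numbers; BinarySearch requires ascending order
[Int32[]] $studentNos = 111, 222, 333, 444

# A match returns the zero-based index of the element
$foundIndex = [Array]::BinarySearch($studentNos, 333)

# A miss returns a negative number: the bitwise complement of the index
# at which the value would have to be inserted to keep the array sorted
$missIndex = [Array]::BinarySearch($studentNos, 200)
$insertAt = -bnot $missIndex
```

Here $foundIndex is 2, $missIndex is -2, and $insertAt is 1. Any non-negative result therefore means the student number is present in that room's list.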

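However, the way I would solve this problem is to use a [HashTable] that maps StudentNo to room. Building the [HashTable] requires a little preprocessing, but it provides constant-time lookups when retrieving the room for a student.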

function GetRoomsByStudentNoTable()
{
    $table = @{ }

    foreach ($room in $rooms)
    {
        foreach ($student in $room.Students)
        {
            #NOTE: It is assumed each student belongs to at most one room
            $table[$student.StudentNo] = $room
        }
    }

    return $table
}

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        }
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
    | Select-Object `
        -ExcludeProperty 'StudentNo' `
        -Property '*', @{
            Name = 'StudentNo'
            Expression = { [Int32] $_.StudentNo }
        }
$roomsByStudentNo = GetRoomsByStudentNoTable
$combined = $students `
    | Select-Object -Property `
        'StudentNo', `
        'PreferredSurname', `
        'PreferredFirstnames', `
        'UPN', `
        @{
            Name = "roomName";
            Expression = {
                $room = $roomsByStudentNo[$_.StudentNo]
                if ($null -ne $room)
                {
                    return $room.roomName
                }

                #TODO: Return text indicating room not found?
            }
        }
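One detail worth keeping in mind with this approach is that [HashTable] keys are type-sensitive, so the key type used when building the table must match the type used for lookups. The following sketch uses made-up entries to show the difference between an [Int32] key and a [String] key:

```powershell
# Hypothetical table keyed by [Int32] student numbers, as built from the cast data
$roomsByStudentNo = @{ }
$roomsByStudentNo[111] = 'room1'
$roomsByStudentNo[222] = 'room2'

# Looking up with the same key type succeeds
$byInt = $roomsByStudentNo[111]

# A [String] key, as produced by an uncast Import-Csv property, does not match an [Int32] key
$byString = $roomsByStudentNo['111']

# ContainsKey distinguishes "no entry" from an entry whose value happens to be $null
$hasInt = $roomsByStudentNo.ContainsKey(111)
$hasString = $roomsByStudentNo.ContainsKey('111')
```

Here $byInt is room1 while $byString is $null, and $hasInt is True while $hasString is False. This is another reason to cast StudentNo to [Int32] right after importing students.csv.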

You could lessen the hit of building $roomsByStudentNo by doing so at the same time as importing rooms.csv...

# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
    | Select-Object `
        -ExcludeProperty 'students' `
        -Property '*', @{
            Name = 'Students'
            Expression = {
                $studentsText = $_.students
                $studentsScriptBlock = Invoke-Expression -Command $studentsText
                $studentsArray = @(& $studentsScriptBlock)

                return $studentsArray
            }
        } `
    | ForEach-Object -Begin {
        $roomsByStudentNo = @{ }
    } -Process {
        foreach ($student in $_.Students)
        {
            #NOTE: It is assumed each student belongs to at most one room
            $roomsByStudentNo[$student.StudentNo] = $_
        }

        return $_
    }

* Except for small arrays
