I'm retrieving two CSVs from an API. One is named students.csv and looks like this:
StudentNo,PreferredFirstnames,PreferredSurname,UPN
111, john, smith, [email protected]
222, jane, doe, [email protected]
The other is named rooms.csv:
roomName, roomNo, students
room1, 1, {@{StudentNo=111; StudentName=john smith; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},....
room2, 2,{@{StudentNo=222; StudentName=jane doe; StartDate=2018-01-01T00:00:00; EndDate=2018-07-06T00:00:00},...
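The third column in rooms.csv is an array, exactly as the API provides it.
What is the best way to merge the two into something like this?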
StudentNo,PreferredFirstnames,PreferredSurname,UPN, roomName
111, john, smith, [email protected], room1
222, jane, doe, [email protected], room2
I was thinking of something like this:
$rooms = Import-Csv rooms.csv
$students = Import-Csv students.csv
$combined = $students | select-object StudentNo,PreferredSurname,PreferredFirstnames,UPN,
@{Name="roomName";Expression={ ForEach ($r in $rooms) {
if ($r.Students.StudentNo.Contains($_.StudentNo) -eq "True")
{return $r.roomName}}}}
This works, but is foreach the right approach? Am I mixing things up, or is there a more efficient way?
--- Original post ---
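With all of this information I need to compare the student data and update AzureAD, then compile a list of data that includes first name, last name, upn, room and other data retrieved from AzureAD.
My problem is efficiency. My code mostly works, but it takes hours to run. At the moment I loop through students.csv and then, for each student, loop through rooms.csv to find the room they are in, and obviously there are multiple API calls to wait for in between.
What is the most efficient way to find each student's room? Is importing the CSV as custom PSObjects comparable to using a hash table?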
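I was able to get your proposed code to work, but it required some tweaks to both the code and the data:
It isn't clear how the students column of rooms.csv gets deserialized into a collection of objects. The value looks like a ScriptBlock that evaluates to an array of HashTables, but the CSV input still needs some changes:
The StartDate and EndDate properties need to be quoted and cast to [DateTime] (StudentName needs quoting too).
At least for rooms that contain multiple students, the whole value must be quoted so that Import-Csv doesn't interpret the , separating array elements as the start of an additional column.
Every property that Import-Csv returns is a [String]. Sometimes it's worth casting back to the original types for efficiency, and sometimes it's absolutely necessary for certain operations to work. You could cast these properties each time you use them, but I prefer to cast them once, right after import. With these changes, rooms.csv becomes...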
roomName, roomNo, students
room1, 1, "{@{StudentNo=111; StudentName='john smith'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
room2, 2, "{@{StudentNo=222; StudentName='jane doe'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
...and the script becomes...
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
}
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
| Select-Object `
-ExcludeProperty 'StudentNo' `
-Property '*', @{
Name = 'StudentNo'
Expression = { [Int32] $_.StudentNo }
}
$combined = $students `
| Select-Object -Property `
'StudentNo', `
'PreferredSurname', `
'PreferredFirstnames', `
'UPN', `
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
if ($r.Students.StudentNo -contains $_.StudentNo)
{
return $r.roomName
}
}
#TODO: Return text indicating room not found?
}
}
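The reason this may be slow is that you're performing a linear search for every student object; two of them, in fact: first through the collection of rooms (foreach), then through each room's collection of students (-contains). This can quickly add up to a lot of iterations and equality comparisons, because for every room the current student is not assigned to, you iterate that room's entire student collection, and you keep doing so until you reach the student's room.
One simple optimization you can make when performing a linear search is to sort the items you're searching (in this case, the Students property would be sorted by each student's StudentNo property)...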
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock) `
| Sort-Object -Property @{ Expression = { $_.StudentNo } }
return $studentsArray
}
}
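...then, when you search that same collection, if you encounter an item greater than the one you're looking for, you know the rest of the collection cannot contain it and you can abort the search immediately...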
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
# Requires $r.Students to be sorted by StudentNo
foreach ($roomStudentNo in $r.Students.StudentNo)
{
if ($roomStudentNo -eq $_.StudentNo)
{
# Return the matched room name and stop searching this and further rooms
return $r.roomName
}
elseif ($roomStudentNo -gt $_.StudentNo)
{
# Stop searching this room
break
}
# $roomStudentNo is less than $_.StudentNo; keep searching this room
}
}
#TODO: Return text indicating room not found?
}
}
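Better yet, with a sorted collection you can also perform a binary search, which is faster* than a linear search. The Array class already provides a BinarySearch static method, so we can accomplish this with less code, too...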
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
# Requires $r.Students to be sorted by StudentNo
if ([Array]::BinarySearch($r.Students.StudentNo, $_.StudentNo) -ge 0)
{
return $r.roomName
}
}
#TODO: Return text indicating room not found?
}
}
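To make the -ge 0 check in the snippet above easier to follow, here is a minimal sketch of what [Array]::BinarySearch returns for a hit and for a miss. The student numbers are made-up sample data, and the array must already be sorted in ascending order.

```powershell
# Hypothetical sample data: the student numbers of one room, sorted ascending
[Int32[]] $studentNos = 111, 222, 333, 444

# A hit returns the zero-based index of the matching element
$foundIndex = [Array]::BinarySearch($studentNos, 333)

# A miss returns a negative number: the bitwise complement of the index
# of the first element that is larger than the value searched for
$missingIndex = [Array]::BinarySearch($studentNos, 250)

# -bnot recovers the position where the missing value would be inserted
$insertAt = -bnot $missingIndex

"found at index $foundIndex; 250 is absent and would be inserted at index $insertAt"
```

Because any negative result means "not found", testing for -ge 0 is enough when all you need is a membership check.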
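However, the way I would solve this is to use a [HashTable] that maps StudentNo to room. Building the [HashTable] requires a little preprocessing, but it gives you constant-time lookups when retrieving a student's room.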
function GetRoomsByStudentNoTable()
{
$table = @{ }
foreach ($room in $rooms)
{
foreach ($student in $room.Students)
{
#NOTE: It is assumed each student belongs to at most one room
$table[$student.StudentNo] = $room
}
}
return $table
}
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
}
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
| Select-Object `
-ExcludeProperty 'StudentNo' `
-Property '*', @{
Name = 'StudentNo'
Expression = { [Int32] $_.StudentNo }
}
$roomsByStudentNo = GetRoomsByStudentNoTable
$combined = $students `
| Select-Object -Property `
'StudentNo', `
'PreferredSurname', `
'PreferredFirstnames', `
'UPN', `
@{
Name = "roomName";
Expression = {
$room = $roomsByStudentNo[$_.StudentNo]
if ($room -ne $null)
{
return $room.roomName
}
#TODO: Return text indicating room not found?
}
}
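One detail worth spelling out is why the [Int32] cast of StudentNo matters more for the hash table than for -contains. The sketch below uses made-up values and a hypothetical one-row CSV; it shows that -contains converts the right-hand value for comparison, while a hash table lookup only matches a key of the same type.

```powershell
# Hypothetical lookup table keyed by an [Int32] student number
$table = @{ }
$table[111] = 'room1'

# -contains converts the [String] '111' when comparing it to [Int32] elements
$containsMatch = @(111, 222) -contains '111'

# A hash table does not: the [String] '111' is a different key than 111
$byInt = $table[111]
$byString = $table['111']

# Values parsed from CSV text are strings, so cast before the lookup
$row = 'StudentNo,PreferredSurname', '111,smith' | ConvertFrom-Csv
$byCast = $table[[Int32] $row.StudentNo]

"contains: $containsMatch; by int: $byInt; by string is null: $($null -eq $byString); by cast: $byCast"
```

This is why the script casts StudentNo once at import time rather than at every lookup.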
You can reduce the cost of building $roomsByStudentNo by doing it at the same time as you import rooms.csv...
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
} `
| ForEach-Object -Begin {
$roomsByStudentNo = @{ }
} -Process {
foreach ($student in $_.Students)
{
#NOTE: It is assumed each student belongs to at most one room
$roomsByStudentNo[$student.StudentNo] = $_
}
return $_
}
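For experimenting without the real files, the hash table approach can be sketched end to end with in-memory data. Everything here is an assumption for illustration: the sample rows, the pre-parsed room objects (which skip the Invoke-Expression step), and the '(none)' placeholder, which is just one possible answer to the TODO about students with no room.

```powershell
# Hypothetical stand-in for students.csv
$students = @(
    'StudentNo,PreferredFirstnames,PreferredSurname'
    '111,john,smith'
    '222,jane,doe'
    '333,alex,roe'
) | ConvertFrom-Csv

# Hypothetical stand-in for the rooms after their students column was parsed
$rooms = @(
    [PSCustomObject] @{ roomName = 'room1'; Students = @(@{ StudentNo = 111 }) }
    [PSCustomObject] @{ roomName = 'room2'; Students = @(@{ StudentNo = 222 }) }
)

# One pass over the rooms builds the StudentNo -> room lookup table
$roomsByStudentNo = @{ }
foreach ($room in $rooms)
{
    foreach ($student in $room.Students)
    {
        #NOTE: It is assumed each student belongs to at most one room
        $roomsByStudentNo[$student.StudentNo] = $room
    }
}

# One constant-time lookup per student; the cast matches the [Int32] keys
$combined = $students | Select-Object -Property '*', @{
    Name = 'roomName'
    Expression = {
        $room = $roomsByStudentNo[[Int32] $_.StudentNo]
        if ($null -ne $room) { $room.roomName } else { '(none)' }
    }
}

$combined | Format-Table
```

Student 333 is not in any room, so that row gets the placeholder instead of an empty roomName.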
* Except for small arrays