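When the $searchWord variable is set to "yani", the script runs as expected; however, when I change it to "KOLİ", it finds no match. How can I make a script search for UTF-8-encoded text in Word files?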
# Define the directory to search and the word to search for
$OutputEncoding = [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8
$directoryPath = "D:\BAKIM_ARIZA_TAKIP_FORMU\2024\AGUSTOS_AYI"
$searchWord = "KOLİ"
# Load the Word application
$word = New-Object -ComObject Word.Application
$word.Visible = $false
# Get all .docx files in the directory
$docxFiles = Get-ChildItem -Path $directoryPath -Filter *.doc
foreach ($file in $docxFiles) {
    # Open the document
    $document = $word.Documents.Open($file.FullName)
    # Search for the word
    $found = $false
    foreach ($range in $document.StoryRanges) {
        if ($range.Text -match [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::UTF8.GetBytes($searchWord))) {
            $found = $true
            break
        }
    }
    # Output the file name if the word is found
    if ($found) {
        Write-Output "Found '$searchWord' in file: $($file.FullName)"
    }
    # Close the document
    $document.Close()
}
# Quit the Word application
$word.Quit()
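Save the script file as UTF-8 with a BOM; otherwise the Windows PowerShell engine will misinterpret any non-ASCII-range characters in the script (such as İ).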
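If you need to use non-ASCII characters in your scripts, save them as UTF-8 with BOM. Without the BOM, Windows PowerShell misinterprets your script as being encoded in the legacy "ANSI" codepage. Conversely, files that do have the UTF-8 BOM can be problematic on Unix-like platforms. Many Unix tools such as cat, sed, awk, and some editors such as gedit don't know how to treat the BOM.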
Source: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_character_encoding
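As a rough sketch of the advice above, the following PowerShell re-saves a script as UTF-8 with a BOM and checks for the EF BB BF signature at the start of the file. The function names Test-Utf8Bom and ConvertTo-Utf8Bom are hypothetical helpers, not built-in cmdlets, and the sketch assumes the file is currently valid UTF-8 (with or without a BOM); a file saved in some other encoding would need that encoding passed to ReadAllText instead.

```powershell
# Hypothetical helper: returns $true if the file starts with the UTF-8 BOM (EF BB BF).
function Test-Utf8Bom {
    param([string]$Path)
    $fullPath = (Get-Item -LiteralPath $Path).FullName
    $bytes = [System.IO.File]::ReadAllBytes($fullPath)
    return ($bytes.Length -ge 3 -and
            $bytes[0] -eq 0xEF -and
            $bytes[1] -eq 0xBB -and
            $bytes[2] -eq 0xBF)
}

# Hypothetical helper: rewrites a UTF-8 file in place so that it carries a BOM.
# Assumption: the file is already valid UTF-8 (with or without a BOM).
function ConvertTo-Utf8Bom {
    param([string]$Path)
    $fullPath = (Get-Item -LiteralPath $Path).FullName
    # Read explicitly as UTF-8; an existing BOM is detected and not duplicated.
    $text = [System.IO.File]::ReadAllText($fullPath, [System.Text.Encoding]::UTF8)
    # UTF8Encoding($true) writes the byte order mark at the start of the file.
    $utf8WithBom = New-Object System.Text.UTF8Encoding $true
    [System.IO.File]::WriteAllText($fullPath, $text, $utf8WithBom)
}
```

Running ConvertTo-Utf8Bom on the script file and then Test-Utf8Bom on the same path is one way to confirm the change took effect before re-running the search.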
BTW, there is no need to explicitly set [Console]::OutputEncoding = [Text.UTF8Encoding]::UTF8, nor to encode the string to bytes and back. You can simply use $range.Text -match $searchWord
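To illustrate the simplification, here is a minimal sketch of the match step on its own. Test-ContainsWord is a hypothetical helper name, and the call to [regex]::Escape is an optional addition not mentioned in the answer: since -match treats its right-hand side as a regular expression, escaping makes the search word match literally even if it contains characters such as a dot or parentheses.

```powershell
# Hypothetical helper: does $Text contain $Word as a literal substring?
# -match interprets the right-hand side as a regex, so the word is escaped first.
function Test-ContainsWord {
    param(
        [string]$Text,
        [string]$Word
    )
    return ($Text -match [regex]::Escape($Word))
}

# Inside the original loop, the condition would then read something like:
#   if (Test-ContainsWord -Text $range.Text -Word $searchWord) { $found = $true; break }
```

No byte round-trip is involved: .NET strings are already Unicode, so once the script file itself is decoded correctly, the comparison works on the intended characters.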