Extracting a URL from a text file next to a specific string

Question

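I have a large text file that contains something like this:

View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2bb1612577510b&id=3D2c8be)

The URL shown doesn't work; it changes frequently. Sometimes part of the URL wraps onto the next line.

I just need to extract that URL, without the parentheses, using PowerShell so that I can download it as an HTML file.

Thanks in advance.

I've tried doing this in batch, which I'm most familiar with, but that turned out to be impossible, and it seems that it is possible in PowerShell.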

Answer 1

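The following uses regex-based operators and .NET APIs.

In both solutions, `-replace '\r?\n'` removes any embedded line breaks (newlines) from the URLs found, using the `-replace` operator (`\r?\n` is a regex that matches both Windows-format CRLF and Unix-format LF-only newlines).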

If you need only one (or only the first) URL, use the `-match` operator, which, if it returns `$true`, reports what it matched in the automatic `$Matches` variable.

# Sample multi-line input string.
# To read such a string from a file, use, e.g.:
#     $str = Get-Content -Raw file.txt
$str = @'
  Initial text.

  View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2b
b1612577510b&id=3D2c8be)

  More text.
'@

# Find the (first) embedded URL...
if ($str -match '(?<=\()https?://[^)]+') {
  # ... remove any line breaks from it, and output the result.
  $Matches.0 -replace '\r?\n'
}
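
Since the question asks for the URL next to a specific phrase, anchoring the match to that phrase may be safer when the file contains other parenthesized URLs. The following is a minimal sketch and not part of the original answer: the marker phrase, the extra `example.org` line in the sample, and the `$phrase`, `$pattern` and `$url` variable names are assumptions made for illustration.

```powershell
# Sample input with an unrelated parenthesized URL before the one we want.
$str = @'
  See our site (https://example.org/home) for details.

  View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2b
b1612577510b&id=3D2c8be)
'@

# Assumed marker phrase; [regex]::Escape() guards against regex metacharacters in it.
$phrase  = 'View this email in your browser'
$pattern = [regex]::Escape($phrase) + '\s*\((https?://[^)]+)\)'

if ($str -match $pattern) {
  # Capture group 1 holds the URL without the surrounding parentheses;
  # remove any line breaks from it, and output the result.
  $url = $Matches[1] -replace '\r?\n'
  $url
}
```

Here a capture group is used instead of the lookbehind, so the URL is read from `$Matches[1]` rather than `$Matches[0]`.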

If you need all (or a fixed number of) matches, you need to use the `System.Text.RegularExpressions.Regex.Matches` .NET API directly:

# Extract *all* URLs and remove any embedded line breaks from each
[regex]::Matches(
  $str, 
  '(?<=\()https?://[^)]+'
).Value -replace '\r?\n'

For an explanation of the first regex and the ability to experiment with it, see this regex101.com page.
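
The question also mentions downloading the extracted URL as an HTML file. The sketch below is one possible way to do that, not part of the original answer. It assumes that the `=3D` sequences in the sample are quoted-printable encodings of `=` (common in raw email text) and should be decoded before the URL is used; if that assumption does not hold for your file, drop the second `-replace`. The function name `Get-EmailUrl` and the output file name `page.html` are hypothetical.

```powershell
# Hypothetical helper: returns the first URL that follows "(" in the text,
# with embedded line breaks removed and "=3D" decoded to "=" (assumed quoted-printable).
function Get-EmailUrl {
  param([Parameter(Mandatory)] [string] $Text)
  if ($Text -match '(?<=\()https?://[^)]+') {
    $Matches[0] -replace '\r?\n' -replace '=3D', '='
  }
}

$str = @'
  View this email in your browser (https://us15.campaign-archive.com/?e=3D1460&u=3Df6e2b
b1612577510b&id=3D2c8be)
'@

$url = Get-EmailUrl -Text $str
$url

# To save the page as an HTML file (requires network access), uncomment:
# Invoke-WebRequest -Uri $url -OutFile page.html
```

To run this against a real file, read it as a single string first, e.g. `Get-EmailUrl -Text (Get-Content -Raw file.txt)`.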
