Windows batch regular expression search and replace

Question (votes: 0, answers: 4)

I have a set of data like this:

7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)

32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)

The lines are read from a file and passed one at a time to a method in my batch script. What I want that method to do is extract only

7859

22174479

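from each line: basically whatever follows "\d+:\d\d\s+", then the number I need, and then another "\d\d.*".

Is this possible using only batch script, regular expressions, and search and replace? I have tried and read a bunch of articles but could not find a solution. In the end I want to add the numbers together.

Thanks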

Edit
Based on Andrei's comment on David Ruhmann's answer, Andrei wants the token 2 positions before "(xfer#", rather than the 3rd token from the start.

windows batch-file
4 Answers
0
votes

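Note that batch is not the best language for regular expressions! Cmd processes input one line at a time, whereas regular expressions allow multi-line processing.

It sounds like you just need to grab a token from the line. Suppose a more complete regular expression for the line looks like this: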

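(\d+\s+\d+:\d\d\s+)+\(xfer#\d+, to-check=\d+/\d+\)

This tells us that the line has constant delimiters: the colon ":" and whitespace "\s+". From there, it is just a matter of using those anchors to determine the token position.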


Extract the third whitespace-delimited token from the line.

for /f "tokens=3" %%A in ("line") do echo %%A

Extract the second whitespace-delimited token from the second colon-delimited token of the line.

for /f "tokens=2 delims=:" %%A in ("line") do (
    for /f "tokens=2" %%B in ("%%A") do echo %%B
)
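As a cross-check on the two for /f forms above, here is a minimal sketch in plain JavaScript (close to the JScript dialect used elsewhere in this thread). nthToken is a hypothetical helper written for illustration, not a cmd or JScript built-in. It assumes the for /f convention that a run of delimiters counts as a single separator, and it only handles a single token number.

```javascript
// Hypothetical helper that mimics: for /f "tokens=N delims=..." (single token only).
function nthToken(line, n, delims) {
  var set = (delims === undefined) ? " \t" : delims; // for /f defaults to space and tab
  var escaped = set.replace(/[\\\]^-]/g, "\\$&");    // make the set safe inside [...]
  var parts = line.split(new RegExp("[" + escaped + "]+"));
  if (parts.length > 0 && parts[0] === "") {
    parts.shift();                                    // leading delimiters are skipped
  }
  return parts.length >= n ? parts[n - 1] : "";
}

var sample = "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)";

// Equivalent of: for /f "tokens=3" ...
var third = nthToken(sample, 3);

// Equivalent of the nested loop: token 2 by colon, then token 2 by whitespace.
var nested = nthToken(nthToken(sample, 2, ":"), 2);
```

On this sample line both approaches select the same field, which is why either batch form works as long as the number of fields before the wanted one stays fixed.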

Update

Extract the second token of the segment that precedes the last colon.

@echo off
setlocal EnableExtensions EnableDelayedExpansion
set "Line=32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)"

set "Last="
set "This="
for /f "delims=" %%A in ('echo("%Line::="^&echo("%"') do (
    for /f "tokens=2" %%B in ("%%A") do (
        if defined This set "Last=!This!"
        set "This=%%B"
    )
)
echo %Last%

endlocal
pause >nul

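Limitations

  1. Lines containing an odd number of double quotes (") will break the script. One way to prevent this is to strip the quotes with
    set Line=%Line:"=%
    before the for loop.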
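The logic of the update script can be hard to read through the echo( trick, so here is a minimal JavaScript sketch of the same idea: split the line on colons, take the segment before the last colon, and return its second whitespace-delimited token. The function name tokenBeforeLastColon is hypothetical. Unlike the batch version, this sketch does not skip segments that have fewer than two tokens; it simply returns an empty string in that case.

```javascript
// Hypothetical helper: second token of the segment that precedes the last colon.
function tokenBeforeLastColon(line) {
  var segments = line.split(":");
  if (segments.length < 2) {
    return "";                                   // no colon in the line
  }
  var segment = segments[segments.length - 2];   // segment just before the last colon
  var tokens = segment.replace(/^\s+/, "").split(/\s+/);
  return tokens.length >= 2 ? tokens[1] : "";
}

var updateLine = "32768 004:47 2686976 2200:03 11707819 10000:01 (xfer#5264, to-check=1020/6975)";
var wanted = tokenBeforeLastColon(updateLine);
```

For this line the segment before the last colon is "03 11707819 10000", so the second token is the number the batch script echoes.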

0
votes

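Based on your comment on David Ruhmann's answer, you want the token 2 positions before the "(xfer#" string. I suppose this could be done with native batch commands, but it is a nasty problem.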

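I assume you are restricted to commands that are native to Windows, with no downloaded executables.

I hope you can use JScript, since it is native to Windows.

I have written a hybrid JScript/batch utility script called "REPL.BAT" that performs regular expression search and replace. It does not take much code, but it is a very useful utility, and it makes the solution very simple.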

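I use FINDSTR to filter out lines that do not fit the template of at least 2 space-delimited tokens before "(xfer#". I pipe those results to my REPL utility and keep only the desired token. The result is sent to stdout.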

findstr /r /c:" [^ ][^ ]* [^ ][^ ]* (xfer#" test.txt | repl ".* ([^ ]+) ([^ ]+) \(xfer#.*" "$1"
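To make the capture groups in that command concrete, here is a minimal JavaScript sketch that applies the same pattern to the two sample lines from the question and then adds up the extracted numbers, which is what the question ultimately asks for. The function names tokenBeforeXfer and sumTokens are hypothetical and are not part of REPL.BAT.

```javascript
// Same pattern that the command above passes to REPL.BAT.
var xferPattern = /.* ([^ ]+) ([^ ]+) \(xfer#.*/;

// Returns the token two positions before "(xfer#", or null if the line does not fit.
function tokenBeforeXfer(line) {
  var m = xferPattern.exec(line);
  return m ? m[1] : null;
}

// Sum the extracted tokens, ignoring lines that do not match or are not numeric.
function sumTokens(lines) {
  var total = 0;
  for (var i = 0; i < lines.length; i++) {
    var token = tokenBeforeXfer(lines[i]);
    if (token !== null && /^\d+$/.test(token)) {
      total += parseInt(token, 10);
    }
  }
  return total;
}

var lines = [
  "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)",
  "32768 000:17 22174479 10000:00 (xfer#2, to-check=1032/1035)"
];
var total = sumTokens(lines);
```

The leading ".* " is greedy, so the two capture groups always bind to the last two tokens before "(xfer#", no matter how many fields come earlier in the line.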

Here is the code for the REPL.BAT utility script. Full documentation is embedded in the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
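The L option is the least obvious part of the script, so here is a minimal JavaScript sketch of just that step, using the same two replace calls as the listing above. The wrapper function literalReplace is hypothetical and exists only to show the effect in isolation.

```javascript
// Hypothetical wrapper around the two escaping steps used for the L option.
function literalReplace(text, search, replacement) {
  // Backslash-escape every regex metacharacter so the search matches literally.
  var escapedSearch = search.replace(/([.^$*+?()[{\\|])/g, "\\$1");
  // Double every $ so the replace method emits it as a literal dollar sign.
  var escapedReplacement = replacement.replace(/\$/g, "$$$$");
  return text.replace(new RegExp(escapedSearch, "g"), escapedReplacement);
}

var literalResult = literalReplace("a.b axb (xfer#1", "a.b", "$1");
var parenResult = literalReplace("a.b axb (xfer#1", "(xfer#", "[");
```

Without the escaping, the search "a.b" would also match "axb", the replacement "$1" would be read as a backreference, and "(xfer#" would fail to compile as a regular expression because of the unbalanced parenthesis.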

0
votes
  :: Does %variable% =~ s/old/new/
  setlocal ENABLEDELAYEDEXPANSION     
  for /f "delims=" %%a in ('echo !variable! ^|perl -pe "s/regexp/replace/" ') do set variable=%%a  

0
votes

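The easiest and most flexible way to accomplish what you want is to use awk or sed from GnuWin32, for example:

sed -i -r -e "s/([0-9]+:[0-9][0-9][[:space:]]+)[0-9]+/\1replacementstring/g" filename

Both support POSIX extended regular expressions. Note that this is not Perl syntax, so a digit is written [0-9] rather than \d. I think what you are dealing with is exactly what awk was designed for.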
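To see what that substitution does to one of the sample lines, here is a minimal JavaScript sketch of the same search and replace. JavaScript regular expressions accept the \d and \s shorthands, and the backreference is written $1 instead of \1. The text "replacementstring" is the placeholder from the command above, not a real value.

```javascript
// A time-like field followed by whitespace, then the number to be replaced.
var timeThenNumber = /(\d+:\d\d\s+)\d+/g;

var before = "7859 10000:00 7859 10000:00 (xfer#1, to-check=1033/1035)";

// Keep the captured time field and swap out only the number that follows it.
var after = before.replace(timeThenNumber, "$1replacementstring");
```

Only the number after the first time field changes, because the second time field is followed by "(xfer#" rather than by digits.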

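If you would rather stick to the tools already available, without any 3rd-party tools, you can do the regular expression matching with VBScript. You invoke VBScript by echoing a script to a .vbs file, running it with cscript, and capturing its output. Here is a proof of concept.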

@echo off & setlocal enabledelayedexpansion

:: rxp.bat
:: rxp /? for usage instructions

if #%4==# goto usage
set global=false
set replace=false
for %%I in (%*) do (
    if not #!next!==# (
        if !next!==string set string=%%I
        if !next!==pattern set pattern=%%I
        if !next!==replace set replace=%%I
        set next=
    )
    if #%%I==#/s set next=string
    if #%%I==#/p set next=pattern
    if #%%I==#/r set next=replace
    if #%%I==#/g set global=true
)
if #!string!==# goto usage
if #!pattern!==# goto usage

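:: Double every quote for VBScript, then turn backslash-escaped quotes into plain doubled quotes
set string=!string:"=""!
set string=!string:\""=""!
set pattern=!pattern:"=""!
set pattern=!pattern:\""=""!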
if #!replace!==#false (
    call :rxp !string:~1,-1! !pattern:~1,-1! !global!
) else (
    set replace=!replace:"=""!
    set replace=!replace:\""=""!
    call :rxp !string:~1,-1! !pattern:~1,-1! !global! !replace:~1,-1!
)
goto :EOF

:rxp string pattern global replacement
echo Set rxp = New RegExp>regexp.vbs
echo rxp.Pattern = %2>>regexp.vbs
echo rxp.Global = %3>>regexp.vbs
if #%4==# (
    echo Set res = rxp.Execute^(%1^)>>regexp.vbs
    echo For Each match in res>>regexp.vbs
    echo Wscript.Echo match.value>>regexp.vbs
    echo Next>>regexp.vbs
) else (
    echo Wscript.echo rxp.Replace^(%1, %4^)>>regexp.vbs
)
cscript /nologo regexp.vbs
del /q regexp.vbs
goto :EOF

:usage
echo Usage: %~nx0 /s "string" /p "regexp" [/g] [/r "replacement text"]
echo;
echo    /s -- search string
echo;
echo    /p -- regular expression pattern
echo          Example: /p "<[^>]+>" to search for markup tags
echo          matches ^<span class='a'^> or similar
echo;
echo    /r -- replacement text (optional)
echo          If specified, replace the matched text
echo          Example: /p "(<div class=')blue('>)" /r "$1red$2"
echo          matches ^<div class='blue'^>
echo          replaces match with ^<div class='red'^>
echo;
echo    /g -- global match (optional)
echo          match every occurrence (matches only the first by default)
echo;
echo notes: If the regexp pattern includes capturing parentheses, use ^$1-^$9 as
echo backreferences in your replacement text.  If any of your strings include
echo quotation marks, they can be escaped with a backslash (\).
echo;
echo Example:
echo %~nx0 /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
echo /r "$1 class=\"bar\"$2"
echo;
echo matches ^<div id="foo"^>, replaces match with ^<div class="bar"^>
echo output: text begin ^<div class="bar"^> text end

Example output:

C:\Users\me\Desktop>rxp /s "7859 10000:00 7849 10000:00 (xfer#1, to-check=1033/1035)" /p "(\d+:\d\d\s+)\d+" /r "$1foo"
7859 10000:00 foo 10000:00 (xfer#1, to-check=1033/1035)

C:\Users\me\Desktop>rxp
Usage: rxp.bat /s "string" /p "regexp" [/g] [/r "replacement text"]

   /s -- search string

   /p -- regular expression pattern
         Example: /p "<[^>]+>" to search for markup tags
         matches <span class='a'> or similar

   /r -- replacement text (optional)
         If specified, replace the matched text

   /g -- global match (optional)
         match every occurrence (matches only the first by default)

notes: If the regexp pattern includes capturing parentheses, use $1-$9 as
backreferences in your replacement text.  If any of your strings include
quotation marks, they can be escaped with a backslash (\).

Example:
rxp.bat /s "text begin <div id=\"foo\"> text end" /p "(<div)[^>]+(>)"
/r "$1 class=\"bar\"$2"

matches <div id="foo">, replaces match with <div class="bar">
output: text begin <div class="bar"> text end