How to access `len` and `pos` in the result of `StringCvt.scanString (RE.find compiledComment) input`

Question (votes: 0, answers: 1)

Background: I am trying to use a regular expression to parse comments in a language where a comment starts with `//`:

    structure Main = struct
      structure RE = RegExpFn(
        structure P = AwkSyntax
        structure E = ThompsonEngine
      )
      val regexes = [
        ("[a-zA-z@= ]* *//.*", fn match => ("comment", match)),
        ("[0-9]*", fn match => ("2nd", match)),
        ("1tom|2jerry", fn match => ("3rd", match))
      ]
      fun main () =
        let
          val input = "@=abs //sdfasdfdfa sdf as"
          val comment = "[a-zA-z@= ]* *//"
          val compiledComment = RE.compileString comment
        in
          (* #1 StringCvt.scanString (RE.match regexes) input *)
          (* #2 StringCvt.scanString (RE.find compiledComment) input *)
          (* #3 case ... of ... *)
        end
    end

`input` is my test case. I want to trim only `//sdfasdfdfa sdf as` and preserve `@=abs`.

Here are some of my attempts:

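  • With `StringCvt.scanString (RE.find compiledComment) input` as the return value of `fun main`:

    - Main.main();
    [autoloading]
    [autoloading done]
    val it = SOME (Match ({len=8,pos=0},[])) : StringCvt.cs Main.RE.match option

  • With `StringCvt.scanString (RE.match regexes) input` as the return value:

    - Main.main();
    [autoloading]
    [autoloading done]
    val it = SOME ("comment",Match ({len=#,pos=#},[])) : (string * StringCvt.cs Main.RE.match) option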

These two cases tell me that `StringCvt.scanString (RE.find compiledComment) input` is what I want, because its value contains `({len=8,pos=0},[])`, which can be used to trim all comments. But I am a bit confused by its value and type: `val it = SOME (Match ({len=8,pos=0},[])) : StringCvt.cs Main.RE.match option`. How can I access `len` and `pos` here? Why are `StringCvt.cs` and `Main.RE.match` separated only by a space?

After searching the SML documentation, I collected the following information:

    #+BEGIN_SRC sml
    StringCvt.scanString (RE.match regexes) input
    val it = SOME ("comment", Match ({len=#,pos=#},[])) : (string * StringCvt.cs Main.RE.match) option
    StringCvt.scanString (RE.find compiledComment) input
    val it = SOME (Match ({len=8,pos=0},[])) : StringCvt.cs Main.RE.match option
    val find : regexp -> (char,'a) StringCvt.reader -> ({pos : 'a, len : int} option MatchTree.match_tree,'a) StringCvt.reader
    val scanString : ((char, cs) reader -> ('a, cs) reader) -> string -> 'a option
    val match : (string * ({pos : 'a, len : int} option MatchTree.match_tree -> 'b)) list -> (char,'a) StringCvt.reader -> ('b,'a) StringCvt.reader
    #+END_SRC

    type cs
    The abstract type of the character stream used by scanString. A value of this type represents the state of a character stream. The concrete type is left unspecified to allow implementations a choice of representations. Typically, cs will be an integer index into a string.

IIUC, the type of `Match` should be `StringCvt.cs`, and the type of `({len=8,pos=0},[]))` or `({len=#,pos=#},[]))` should be `Main.RE.match`. Then I started pattern matching:

    let
    ...
    in
    case StringCvt.scanString (RE.find compiledComment) input of
                NONE => ""
             |  SOME (
                    StringCvt.cs ({len = b, pos = a}, _)) => String.substring (input a b)

Unfortunately,

    main.sml:23.19-23.39 Error: non-constructor applied to argument in pattern
    main.sml:23.92 Error: unbound variable or constructor: a
    main.sml:23.94 Error: unbound variable or constructor: b
    main.sml:23.86-23.95 Error: operator is not a function [tycon mismatch]
      operator: string
      in expression:
        input <errorvar>
    [autoloading failed: unable to load module(s)]
    stdIn:1.2-1.11 Error: unbound structure: Main in path Main.main

It seems I cannot use `StringCvt.cs` as a pattern, because it is not a constructor. Then I tried a wildcard:

    case StringCvt.scanString (RE.find compiledComment) input of
        NONE => ""
     |  SOME (_ ({len = b, pos = a}, _)) => String.substring (input a b)

    main.sml:23.19 Error: non-constructor applied to argument in pattern

So, is the `Match` constructor required? I cannot dig any further. Do you have any ideas? Thanks in advance.

regex pattern-matching sml smlnj
1 Answer
0 votes

Solved:

Match
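A minimal sketch of reading `pos` and `len` through the `Match` constructor, which is defined in the `MatchTree` structure of the SML/NJ regexp library and so is written `MatchTree.Match` in a pattern. This is an illustration under stated assumptions, not necessarily the code the answer originally contained. It assumes SML/NJ with the regexp library loaded, and a library version in which `find` returns a `{pos, len}` record directly, as the REPL output in the question shows. The helper name `matchedText` is hypothetical. To avoid relying on the concrete representation of `StringCvt.cs`, the sketch runs `RE.find` over a `substring` stream instead of going through `StringCvt.scanString`.

```sml
(* Sketch for SML/NJ. Assumes the regexp library is available, for example
   after CM.make "$/regexp-lib.cm"; in the REPL. *)
structure RE = RegExpFn (structure P = AwkSyntax
                         structure E = ThompsonEngine)

(* Same pattern string as in the question. *)
val compiledComment = RE.compileString "[a-zA-z@= ]* *//"

(* matchedText is a hypothetical helper: it returns the text of the first
   match of compiledComment in input, or "" when nothing matches. *)
fun matchedText (input : string) : string =
  case RE.find compiledComment Substring.getc (Substring.full input) of
      NONE => ""
    | SOME (MatchTree.Match ({pos, len}, _), _) =>
        (* With a substring stream, pos is the stream positioned at the
           start of the match, so the first len characters are the match. *)
        Substring.string (Substring.slice (pos, 0, SOME len))
```

Two details from the question are worth noting. The printed type `StringCvt.cs Main.RE.match` is a type application: `match` is a type constructor and `StringCvt.cs` is its argument, the type of `pos`. When going through `StringCvt.scanString`, the same `MatchTree.Match ({pos, len}, _)` pattern applies, with `pos` of type `StringCvt.cs`. Also, `String.substring` takes a single tuple `(s, i, n)`, not curried arguments.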