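I've written a few hundred lines of parsing code for a toy language I'm working on, and I thought I was starting to really understand Parsec. But now I'm stuck on a seemingly very simple parsing task, so I'm clearly missing some basic piece of understanding.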
I want to match inputs such as "name end" and "name Foo end" (well, not really, but in this minimal example I do).
I've reduced it to a contrived minimal example (in a file MinEx.hs):
module MinEx where

import Text.Parsec
import Text.Parsec.Token
import Data.Char (isSpace)
import Data.Maybe (fromMaybe)
import System.Environment (getArgs)
import System.IO.Unsafe (unsafePerformIO)

myDef :: LanguageDef st
myDef = LanguageDef
  { commentStart = ""
  , commentEnd = ""
  , commentLine = "#"
  , nestedComments = True
  , identStart = letter
  , identLetter = alphaNum
  , opStart = opLetter myDef
  , opLetter = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , reservedOpNames = []
  , reservedNames = []
  , caseSensitive = True
  }

TokenParser{ parens = myParens
           , identifier = myIdentifier
           , reservedOp = myReservedOp
           , reserved = myReserved
           , semiSep1 = mySemiSep1
           , whiteSpace = myWhiteSpace } = makeTokenParser myDef

simpleSpace :: Parsec String st ()
simpleSpace = skipMany1 (satisfy isSpace)

upperIdentifier :: Parsec String st String
upperIdentifier = lookAhead upper >> myIdentifier

x `uio` y = unsafePerformIO x `seq` y

nameThenEnd :: Parsec String st String
nameThenEnd = do
  print "at name" `uio` string "name"
  print "at spaces after name" `uio` simpleSpace
  maybeName <- print "at ident" `uio` optionMaybe upperIdentifier
  -- I only match this here and not above with `optionMaybe (upperIdentifier <* simpleSpace)` for debugging purposes.
  case maybeName of
    Nothing -> (print "at no spaces after no ident" >> print maybeName) `uio` return ()
    Just name -> (print "at spaces after ident" >> print maybeName) `uio` simpleSpace
  string "end"
  return (fromMaybe "" maybeName)

main :: IO ()
main = getArgs >>= \args -> print (parse (nameThenEnd <* eof) "" (args !! 0))
Running the example in the case where it works, with no ident given:
> runhaskell MinEx.hs "name end"
"at name"
"at spaces after name"
"at ident"
"at no spaces after no ident"
Nothing
Right ""
The non-working example:
> runhaskell MinEx.hs "name Foo end"
"at name"
"at spaces after name"
"at ident"
"at spaces after ident"
Just "Foo"
Left (line 1, column 10):
unexpected "e"
Thanks, and apologies if I'm missing something very obvious.
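The generated `identifier` parser is a lexeme parser, which means it eats any whitespace following the identifier. So your `simpleSpace` fails: there is no whitespace left to consume, and you defined it with `skipMany1`, which requires at least one character.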
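To see the lexeme behaviour in isolation, here is a minimal sketch that prints the input left over after an identifier has been parsed. It assumes a throwaway lexer built from Parsec's stock `emptyDef` rather than the `myDef` from the question, and `restAfterLexeme` / `restAfterPlain` are hypothetical helper names introduced only for this illustration.

```haskell
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Hypothetical lexer built from the stock emptyDef, only for this demo.
lexer :: Tok.TokenParser st
lexer = Tok.makeTokenParser emptyDef

-- Input left over after the generated (lexeme) identifier parser.
restAfterLexeme :: String -> Either ParseError String
restAfterLexeme = parse (Tok.identifier lexer *> getInput) ""

-- Input left over after a plain, non-lexeme identifier parser.
restAfterPlain :: String -> Either ParseError String
restAfterPlain = parse (many1 alphaNum *> getInput) ""

main :: IO ()
main = do
  print (restAfterLexeme "Foo end")  -- Right "end"   (the space is already gone)
  print (restAfterPlain "Foo end")   -- Right " end"  (the space is still there)
```

In the first case nothing is left for a following `skipMany1 (satisfy isSpace)` to match, which is the failure at column 10 in the question.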
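If you're using the parsers generated by `makeTokenParser`, you usually don't need to handle whitespace manually (for example, use `symbol`, `reserved` or `reservedOp` instead of `string`). See the documentation for more information.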
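As a rough sketch of that advice applied to the example from the question, the parser below drops `simpleSpace` and `string` entirely and lets the token parsers consume trailing whitespace. Some assumptions that are not in the original code: the language definition is derived from `emptyDef` instead of the hand-written `myDef`, `"name"` and `"end"` are registered as reserved names, and the debugging `print` calls are omitted.

```haskell
module Main where

import Data.Maybe (fromMaybe)
import Text.Parsec
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- Assumption: start from emptyDef and treat the two keywords as reserved.
langDef :: Tok.LanguageDef st
langDef = emptyDef
  { Tok.commentLine   = "#"
  , Tok.reservedNames = ["name", "end"]
  }

lexer :: Tok.TokenParser st
lexer = Tok.makeTokenParser langDef

-- An identifier that must start with an upper-case letter.
upperIdentifier :: Parsec String st String
upperIdentifier = lookAhead upper >> Tok.identifier lexer

-- Every token parser skips the whitespace after its token,
-- so no explicit whitespace handling is needed between tokens.
nameThenEnd :: Parsec String st String
nameThenEnd = do
  Tok.reserved lexer "name"
  maybeName <- optionMaybe upperIdentifier
  Tok.reserved lexer "end"
  return (fromMaybe "" maybeName)

main :: IO ()
main = do
  -- whiteSpace is run once up front; lexeme parsers only skip trailing space.
  let p = Tok.whiteSpace lexer *> nameThenEnd <* eof
  print (parse p "" "name end")      -- Right ""
  print (parse p "" "name Foo end")  -- Right "Foo"
```

Using `reserved` rather than `string` also means that an input like "nameFoo end" is rejected, since `reserved` checks that the keyword is not immediately followed by another identifier character.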