How do I change multi-line mode and single-line mode?


It seems that neither of these two similar questions contains the information I am looking for.

I don't understand how mode changes work in CL-PPCRE. I tried both embedded modifiers and keyword arguments. Can you explain this behavior to me?

;; From: https://perldoc.perl.org/perlretut
;;    "Here are the four possible combinations:
;;     $x = "There once was a girl\nWho programmed in Perl\n";
;;     $x =~ /^Who/;   # doesn't match, "Who" not at start of string
;;     $x =~ /^Who/s;  # doesn't match, "Who" not at start of string
;;     $x =~ /^Who/m;  # matches, "Who" at start of second line
;;     $x =~ /^Who/sm; # matches, "Who" at start of second line
;;     $x =~ /girl.Who/;   # doesn't match, "." doesn't match "\n"
;;     $x =~ /girl.Who/s;  # matches, "." matches "\n"
;;     $x =~ /girl.Who/m;  # doesn't match, "." doesn't match "\n"
;;     $x =~ /girl.Who/sm; # matches, "." matches "\n"

(defparameter *x* (str:concat "There once was a lady"
                              (string #\Newline)
                              "Who programmed in Lisp"
                              (string #\Newline)))
;; It was pointed out to me on #commonlisp that I had simply copied
;; the Perl string: "There once was a lady\nWho programmed in Lisp".
;; But, unfortunately, that did not solve it yet. :-(

(ppcre:scan "^Who" *x*)         ; => NIL
(ppcre:scan "^(?s)Who" *x*)     ; => NIL
(ppcre:scan "^(?m)Who" *x*)     ; => NIL -> unfortunately
(ppcre:scan "^(?sm)Who" *x*)    ; => NIL -> unfortunately
(ppcre:scan "lady.Who" *x*)     ; => 17, 25, #(), #() -> unfortunately
(ppcre:scan "(?s)lady.Who" *x*) ; => 17, 25, #(), #() -> unfortunately
(ppcre:scan "(?m)lady.Who" *x*) ; => 17, 25, #(), #() -> unfortunately
(ppcre:scan "^(?sm)Who" *x*)    ; => NIL -> unfortunately

;; Maybe I embedded the modifiers at the wrong place?
(let ((s (ppcre:create-scanner "^Who")))
  (ppcre:scan s *x*)) ; => NIL
(let ((s (ppcre:create-scanner "^Who" :single-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL
(let ((s (ppcre:create-scanner "^Who" :single-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL -> nnnnope
(let ((s (ppcre:create-scanner "lady.Who" :single-line-mode t
                                          :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> At least consistent
(let ((s (ppcre:create-scanner "lady.Who" :single-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> as well
(let ((s (ppcre:create-scanner "lady.Who" :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => 17, 25, #(), #() -> as well
(let ((s (ppcre:create-scanner "^Who" :single-line-mode t
                                      :multi-line-mode t)))
  (ppcre:scan s *x*)) ; => NIL -> as well

The starting point of my confusion is this:

;; So (three let forms just for three explicit outputs):
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above

;; This seems to be the default setting. Now, let's try the opposite:

(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode t)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above

;; Oops.

;; The documentation:
;;   "* Consider using 'single-line mode' if it makes sense for your task.
;;      By default (following Perl's practice), a dot means to search for
;;      any character except line breaks. In single-line mode a dot searches
;;      for any character which in some cases means that large parts of
;;      the target can actually be skipped. This can be vastly more
;;      efficient for large targets."

;; So, by default :MULTI-LINE-MODE is T. But why there is no effect?

(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "\\n"))                              ; => NIL -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "a"))                     ; => 0, 1, #(), #() -> like above
(let ((s (ppcre:create-scanner "^.$" :multi-line-mode nil :single-line-mode t)))
  (ppcre:scan s "a\\n"))                             ; => NIL -> like above

Thank you very much for any hints.

regex common-lisp modifier cl-ppcre
1 Answer

Your regular expressions have a few problems. For example, "^(?m)Who" should be "(?m)^Who". With that change:

CL-USER> (ppcre:scan "(?m)^Who" *x*)
22
25
#()
#()

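The scanner must already be in multi-line mode when it sees the ^, so that it knows to match at the start of a line and not only at the start of the string.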
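
To make the ordering concrete, here is a minimal sketch. It assumes CL-PPCRE is loaded through Quicklisp (an assumption about your setup) and that the target string holds real newline characters. The names *lines* and match-start are made up for this illustration. The last four calls use the keyword arguments of create-scanner, which apply a mode to the whole regex and so avoid the placement problem.

```lisp
;; Assumption: Quicklisp is installed; load CL-PPCRE however you normally do.
(ql:quickload :cl-ppcre :silent t)

;; *lines* is a hypothetical name for this sketch. FORMAT's ~% directive
;; inserts a real newline character.
(defparameter *lines*
  (format nil "There once was a lady~%Who programmed in Lisp~%"))

;; The modifier comes after ^, so ^ is still a start-of-string anchor.
(ppcre:scan "^(?m)Who" *lines*)   ; => NIL

;; The modifier comes first, so ^ also matches right after a newline.
(ppcre:scan "(?m)^Who" *lines*)   ; => 22, 25, #(), #()

(defun match-start (regex &rest modes)
  "Hypothetical helper: build a scanner for REGEX with the keyword
arguments in MODES and return only the match start in *LINES*, or NIL."
  (values (ppcre:scan (apply #'ppcre:create-scanner regex modes) *lines*)))

(match-start "^Who")                          ; => NIL
(match-start "^Who" :multi-line-mode t)       ; => 22
(match-start "lady.Who" :multi-line-mode t)   ; => NIL
(match-start "lady.Who" :single-line-mode t)  ; => 17
```

Multi-line mode only changes what ^ and $ match, and single-line mode only changes what . matches, so the two can be combined freely.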

Some others:

  • "lady.Who" should fail to match, because . does not match a newline. Indeed:
CL-USER> (ppcre:scan "lady.Who" *x*) 
NIL

But the comments in your code say that it matches. Are you sure *x* is what you think it is?

  • Next,
CL-USER> (ppcre:scan "(?s)lady.Who" *x*)
17
25
#()
#()

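You also call these results unfortunate, but they are exactly what I would expect. After all, switching to single-line mode makes . match newlines.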

  • Then
CL-USER> (ppcre:scan "(?m)lady.Who" *x*)
NIL

Multi-line mode does not change what . matches, so this fails here; yet your comment again suggests that it matched for you?

  • Finally, "^(?sm)Who" once again has the ^ too early to be affected by the mode change:
CL-USER> (ppcre:scan "^(?sm)Who" *x*) 
NIL
CL-USER> (ppcre:scan "(?sm)^Who" *x*)
22
25
#()
#()
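
On the question of whether *x* is what you think it is, here is one way to check. It is a sketch only, assuming CL-PPCRE is loaded through Quicklisp. In a Common Lisp string literal a backslash merely escapes the next character, so "\n" reads as the plain letter n and not as a newline. The names *copied* and *real* are introduced here for illustration. A stale value like *copied* would be consistent with the results reported in the question, though that is only a guess at what happened.

```lisp
;; Assumption: Quicklisp is installed; load CL-PPCRE however you normally do.
(ql:quickload :cl-ppcre :silent t)

;; Perl-style literal: the Lisp reader drops the backslash and keeps the n.
(defparameter *copied* "There once was a lady\nWho programmed in Lisp")

;; A string with real newlines, built with FORMAT's ~% directive.
(defparameter *real*
  (format nil "There once was a lady~%Who programmed in Lisp~%"))

(find #\Newline *copied*)          ; => NIL, the string contains no newline
(char *copied* 21)                 ; => #\n
(char *real* 21)                   ; => #\Newline

;; With no newline in the way, . has an ordinary character to match,
;; and ^ never finds a line start in the middle of the string.
(ppcre:scan "lady.Who" *copied*)   ; => 17, 25, #(), #()
(ppcre:scan "(?m)^Who" *copied*)   ; => NIL

;; With a real newline the results agree with the ones shown above.
(ppcre:scan "lady.Who" *real*)     ; => NIL
(ppcre:scan "(?m)^Who" *real*)     ; => 22, 25, #(), #()
```

Inspecting the string at the REPL, for example with (find #\Newline *x*), is a quick check before suspecting the regex modes.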