Split a Markdown snippet at each heading

Question Votes: 0 Answers: 2

I have the following Markdown snippet:

# Glossary

This guide is aimed to familiarize the users with definitions to relevant DVC
concepts and terminologies which are frequently used.

## Workspace directory

Also abbreviated as workspace, it is the root directory of a project where DVC
is initialized by running `dvc init` command. Therefore, this directory will
contain a `.dvc` directory as well.

## Cache directory

DVC cache is a hidden storage which is found at `.dvc/cache`. This storage is
used to manage different versions of files which are under DVC control. For more
information on cache, please refer to the this
[guide](/doc/commands-reference/config#cache).

I want to split it so that the matches would be:

# Glossary
...
## Workspace directory
...
## Cache directory
...

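I tried to match them with the regex `/#{1,2}\s.+\n{2}[^(#{2}\s)]*/`. My intention was to first match the heading with the `#{1,2}\s.+\n{2}` part, and then end the match when `##\s` is found. But I failed at the second part. Can someone guide me?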

javascript regex markdown
2 Answers

2 votes

Use `split` with the `/^(?=#+ )/m` regex, or use `match(/^#+ [^#]*(?:#(?!#)[^#]*)*/gm)`:

let contents = `# Glossary

This guide is aimed to familiarize the users with definitions to relevant DVC
concepts and terminologies which are frequently used.

## Workspace directory

Also abbreviated as workspace, it is the root directory of a project where DVC
is initialized by running \`dvc init\` command. Therefore, this directory will
contain a \`.dvc\` directory as well.

## Cache directory

DVC cache is a hidden storage which is found at \`.dvc/cache\`. This storage is
used to manage different versions of files which are under DVC control. For more
information on cache, please refer to the this
[guide](/doc/commands-reference/config#cache).`;

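// Split before every line that starts with one or more "#" followed by a space.
console.log(contents.split(/^(?=#+ )/m).filter(Boolean));
// Or match each heading plus the text after it, stopping at the next run of two or more "#".
console.log(contents.match(/^#+ [^#]*(?:#(?!#)[^#]*)*/gm));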

Output (both calls print the same array for this input):

[
  "# Glossary\n\nThis guide is aimed to familiarize the users with definitions to relevant DVC\nconcepts and terminologies which are frequently used.\n\n",
  "## Workspace directory\n\nAlso abbreviated as workspace, it is the root directory of a project where DVC\nis initialized by running `dvc init` command. Therefore, this directory will\ncontain a `.dvc` directory as well.\n\n",
  "## Cache directory\n\nDVC cache is a hidden storage which is found at `.dvc/cache`. This storage is\nused to manage different versions of files which are under DVC control. For more\ninformation on cache, please refer to the this\n[guide](/doc/commands-reference/config#cache)."
]
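
As for why the attempt in the question stops early: inside square brackets, `(`, `#`, `{`, `2`, `}`, `\s` and `)` are individual members of a character set, not a group, so `[^(#{2}\s)]*` cannot mean "anything up to `##` plus whitespace". The following is a minimal sketch of that behaviour; the shortened `sample` string is an assumption made for illustration.

```javascript
// The regex from the question, unchanged.
const attempt = /#{1,2}\s.+\n{2}[^(#{2}\s)]*/;

// Shortened sample input (hypothetical, for illustration only).
const sample = "# Glossary\n\nThis guide is aimed at users.\n\n## Workspace directory\n";

// [^(#{2}\s)]* matches a run of characters that are none of: ( ) # { } 2 or whitespace.
// It therefore stops at the first space in the body text, long before the next heading.
const m = sample.match(attempt);
console.log(JSON.stringify(m[0])); // "# Glossary\n\nThis"
```

A negated character class can only exclude single characters. Excluding a sequence such as `## ` needs a lookahead, which is what both regexes above rely on.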


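0 votes

I know this is an old post, but the topic is still relevant, and I hope someone who knows more about regular expressions than I do will see this comment and provide an update.

I have been using Wiktor's match regex to find headings together with the text that follows them up to the next heading.

It works well unless there is an h1 (#) heading somewhere in the body of the text. If there is, it gets "swallowed" and becomes part of the previous section, because the regex effectively stops only when it sees two or more consecutive # characters, and a single "#" does not meet that condition.

This fails: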

## header 2
some text
# header 1
some more text
## header 2b

The first match will be:

## header 2
some text
# header 1
some more text

Instead of:

## header 2
some text

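The assumption seems to be that there is only one h1 (#) heading and that no other heading comes before it; in that case I see no problem.

To be honest, this is not a real problem for me in practice. I only noticed it while trying to understand the regex on regex101.com.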
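
One possible way around this, offered as a sketch rather than a definitive fix: the `split` regex from the first answer already cuts before a heading of any level, and a line-based `match` pattern can do the same. The `match` pattern below is an assumption and does not come from either answer. Both variants treat any line starting with `#` characters and a space as a heading, so a `# comment` line inside a fenced code block would also start a new section.

```javascript
const text = "## header 2\nsome text\n# header 1\nsome more text\n## header 2b";

// split variant from the first answer: cut before every line that starts with "#... ".
const bySplit = text.split(/^(?=#+ )/m);

// Line-based match variant (hypothetical): a heading line, followed by every
// subsequent line that does not itself start with "#... ".
const byMatch = text.match(/^#+ .*(?:\n(?!#+ ).*)*/gm);

console.log(bySplit);
// [ "## header 2\nsome text\n", "# header 1\nsome more text\n", "## header 2b" ]
console.log(byMatch);
// [ "## header 2\nsome text", "# header 1\nsome more text", "## header 2b" ]
```

The only difference in the results is that `split` keeps the newline at the end of each section, while the `match` variant leaves it out.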
