Filtering out unwanted classes with Pandoc and Lua


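I'd like to filter Markdown from the command line with Pandoc, based on a list of classes that I pass in. The rules are:

  • If an element has one or more classes that are not in the passed list, it is removed, along with its surrounding whitespace.
  • If an element has no classes, it is kept.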
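
Read literally, the two rules amount to: keep an element only if every one of its classes is in the list. A minimal pure-Lua sketch of that check follows; should_keep and allowed are hypothetical names used for illustration, and nothing here depends on pandoc's API.

```lua
-- Hypothetical helper: `classes` is the list of class names on an element,
-- `allowed` is a set (name -> true) built from the list that was passed in.
local function should_keep(classes, allowed)
    -- An element with no classes is always kept: the loop body never runs.
    for _, name in ipairs(classes or {}) do
        if not allowed[name] then
            return false -- at least one class is not in the list
        end
    end
    return true
end

local allowed = { other = true, ["bw-only"] = true }
print(should_keep({}, allowed))
print(should_keep({ "bw-only" }, allowed))
print(should_keep({ "other", "color-only" }, allowed))
```

Under this reading an element carrying both an allowed and a non-allowed class is dropped; a filter that keeps an element as soon as any one class matches would behave differently for such elements.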

The Markdown I'm using to test it looks like this:

---
title: Example doc
---
    
## Images

![Generic image](generic_image.png)

![Generic image](generic_image.png){width=30px}

![Color only image](color_image.png){.color-only}

![Color only image](color_image.png){.color-only width=30px}

![Color only image](color_image.png){width=30px .color-only}

![BW only image](bw_image.png){.bw-only}

![BW only image](bw_image.png){.bw-only width=30px}

![BW only image](bw_image.png){width=30px .bw-only}

## Blocks

::: {.other}
Block that shouldn't be filtered.
:::

:::
Block that shouldn't be filtered.
:::

::: {.color-only}
Color only block.
:::

::: {.bw-only}
BW only block.
:::

## Spans

[Span that shouldn't be filtered]{.other}

[Color only span]{.color-only}

[BW only span]{.bw-only}

## Links

[Link that shouldn't be filtered](link.html)

[Link that shouldn't be filtered](link.html){.other}

[Color only link](link.html){.color-only}

[BW only link](link.html){.bw-only}

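I tried writing a Lua filter, but I'm new to this and simply don't have the background for it. I also suspect that a filter doing this may already exist, but I haven't found one. Can anyone point me in the right direction?

Thanks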

lua markdown pandoc
1 answer

I have a similar filter that seems to work, although there is one very frustrating problem, which I'll try to explain below.

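My approach is to pass the names of the classes I want to keep as a comma-separated list in a metadata argument on the pandoc CLI. Inside the filter, I use LPeg to parse that list into a table; I then apply one filter each to block and inline elements, and use the table to test for membership.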
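
The list-parsing step does not strictly need LPeg. As a rough sketch, assuming the argument arrives as a plain comma-separated string, Lua's built-in string.gmatch can build the same kind of lookup table. parse_keeplist is a hypothetical name, not part of the filter described here.

```lua
-- Hypothetical helper: turn "a,b,c" into a set { a = true, b = true, c = true }.
local function parse_keeplist(arg)
    local set = {}
    for name in arg:gmatch("[^,]+") do
        -- Trim surrounding whitespace so "a, b" behaves like "a,b".
        local trimmed = name:match("^%s*(.-)%s*$")
        if trimmed ~= "" then
            set[trimmed] = true
        end
    end
    return set
end

local keep = parse_keeplist("other, bw-only")
print(keep["other"], keep["bw-only"], keep["color-only"])
```

Storing the names as keys rather than as a list makes the later membership test a single table lookup.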

Given this filter (classfilter.lua, on the LUA_PATH):

-- split arglist by lpeg, function as it appears on lpeg documentation:
-- https://www.inf.puc-rio.br/~roberto/lpeg/
local function split(s, sep)
    sep = lpeg.P(sep)
    local elem = lpeg.C((1 - sep) ^ 0)
    local p = lpeg.Ct(elem * (sep * elem) ^ 0) -- make a table capture
    return lpeg.match(p, s)
end

local keeplist = {}
-- This function will go inside the Meta filter
-- the keeplist table will be available to the rest
-- of the filters to consult
local function collect_vars(m)
    for _, classname in pairs(split(m.keeplist, ",")) do
        keeplist[classname] = true
    end
end

local function keep_elem(_elem)
    -- keep if no class designation
    if not _elem.classes or #_elem.classes == 0 then
        return true
    end
    -- keep if class name in keeplist
    for _, classname in ipairs(_elem.classes) do
        if keeplist[classname] ~= nil then
            return true
        end
    end
    -- don't keep otherwise

    return false
end

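local function filter_list_by_classname(_elems)
    -- iterate backwards: removing entries while ipairs walks forwards
    -- would skip the element that follows each removed one
    for _elemidx = #_elems, 1, -1 do
        if not keep_elem(_elems[_elemidx]) then
            _elems:remove(_elemidx)
        end
    end
    return _elems
end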

-- forcing the meta filter to run first
return { { Meta = collect_vars }, { Inlines = filter_list_by_classname, Blocks = filter_list_by_classname } }

and this pandoc CLI command:

pandoc -f markdown -t markdown --lua-filter=classfilter.lua orn.md -M 'keeplist=other,bw-only'

I get the following output:

## Images

![Generic image](generic_image.png)

![Generic image](generic_image.png){width="30px"}

<figure>

<figcaption>Color only image</figcaption>
</figure>

<figure>

<figcaption>Color only image</figcaption>
</figure>

<figure>

<figcaption>Color only image</figcaption>
</figure>

![BW only image](bw_image.png){.bw-only}

![BW only image](bw_image.png){.bw-only width="30px"}

![BW only image](bw_image.png){.bw-only width="30px"}

## Blocks

::: other
Block that shouldn't be filtered.
:::

::: Block that shouldn't be filtered. :::

::: bw-only
BW only block.
:::

## Spans

[Span that shouldn't be filtered]{.other}

[BW only span]{.bw-only}

## Links

[Link that shouldn't be filtered](link.html)

[Link that shouldn't be filtered](link.html){.other}

[BW only link](link.html){.bw-only}

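...which seems to be the desired output, except for the <figure> tags, which I have not been able to remove. Assuming that my Inline filter only removes the src from the Figure element and leaves the caption intact, I tried, in a separate filter, to iterate over the Figure elements, find the Figure blocks whose contents field is empty (no Image left), and replace them with an empty table, but this does not change the result at all. What I mean is that adding the following to the filter_list_by_classname function, before the ipairs loop: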

_elems:walk({
    Figure = function(a_figure)
        if not a_figure.content[1].content[1] then
            return {}
        else
            return a_figure
        end
    end
})

did nothing.
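
One possible reason the walk call has no effect: as far as I understand pandoc's Lua API, walk returns a filtered copy of the list rather than changing it in place, so a call whose result is discarded changes nothing. Below is a sketch of an alternative, assuming pandoc 3.x, where a Figure's content is a list of blocks (typically a single Plain that held the Image). The helper names are hypothetical, and the check is written against plain tables so it can be tried outside pandoc.

```lua
-- Hypothetical helper: true when every block inside the figure has an
-- empty `content` list (e.g. the only Image was removed by the Inlines filter).
-- Blocks without a `content` field are treated as non-empty, to be safe.
local function figure_is_empty(fig)
    for _, blk in ipairs(fig.content or {}) do
        if blk.content == nil or #blk.content > 0 then
            return false
        end
    end
    return true
end

-- Block filter function: returning an empty list deletes the element,
-- returning nil leaves it unchanged.
local function drop_empty_figure(fig)
    if figure_is_empty(fig) then
        return {}
    end
    return nil
end

-- Assumed registration, in the second table returned by classfilter.lua:
--   { Inlines = filter_list_by_classname,
--     Blocks  = filter_list_by_classname,
--     Figure  = drop_empty_figure }
```

With pandoc's default typewise traversal, the Inlines function should run before block-level functions such as Figure, so the image would already be gone by the time drop_empty_figure sees the figure. I have not verified this against every pandoc version.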

So perhaps this could be the start of a solution.
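
One more thing worth checking when building on this: removing entries from a list while ipairs walks it forwards can skip the element right after each removal, because the remaining items shift down by one. The test document would not reveal this, since it never places two removable elements side by side. A minimal pure-Lua sketch of the difference, with hypothetical function names:

```lua
-- Forward removal: after removing index i, the next item moves into slot i,
-- but ipairs continues with i + 1, so that item is never examined.
local function remove_forward(list, should_remove)
    for i, v in ipairs(list) do
        if should_remove(v) then
            table.remove(list, i)
        end
    end
    return list
end

-- Backward removal: items that shift down have already been examined.
local function remove_backward(list, should_remove)
    for i = #list, 1, -1 do
        if should_remove(list[i]) then
            table.remove(list, i)
        end
    end
    return list
end

local function is_x(v) return v == "x" end

local forward = remove_forward({ "x", "x", "a" }, is_x)
local backward = remove_backward({ "x", "x", "a" }, is_x)
print(#forward, #backward)
```

Here the forward version leaves one "x" behind, while the backward version removes both.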
