HTML::Element endtag for <br> and <img>


I'm using the following Perl code to traverse and format some HTML:

#!/usr/bin/env perl 
use v5.38;
use HTML::TreeBuilder;
my $indent = 3;
my $content = do {local $/; <DATA>};
my $tree = HTML::TreeBuilder->new();
$tree->parse_content($content);
visit($tree);

sub visit($x) {
    my $depth = $x->depth;
    my $in = ' ' x ($indent * $depth);
    foreach my $e ($x->content_list) {
        # element
        if (ref ($e)) {
            say $in . $e->starttag;
            visit($e);
            say $in . $e->endtag;
        }
        # text
        else {
            say $in . $e;
        }
    }
}
__DATA__
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
    <font size=3><strong>
    5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
    </strong></font>
    <br>
    <img src="poster.png" alt="poster/ad" title="poster/ad">
    <i>(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)</i>
    <br><br>

    <font size=3><strong>
    5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
    </strong></font>
    <br>
    Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
    <br>
    <i>(*included on
        <a href="https://www.garciafamilyprovisions.com/product/JY148COMBO/before-the-dead-4cd-set?cp=640_62123_100764" target="_blank">Before The Dead
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26.jpg" target="_blank">birthday doodle for Barbara by Jerry
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26a.jpg" target="_blank">the master tape
        </a>
    )
    </i>
    <br><br>

My problem is that every

<br>

is output as:

<br />
</br>

<br />
</br>

and each of these renders a new line. I'm surprised that endtag produces anything at all for the br (and img) tags.
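A stripped-down reproduction may make the behaviour easier to see. This is a sketch that assumes HTML::TreeBuilder is installed; the HTML fragment is invented for illustration. It parses a tiny snippet, finds the br and img nodes with look_down, and prints what starttag and endtag return for each. The br node gives the same <br /> and </br> pair shown above, and img likewise gets a </img>.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;

# Parse a tiny (made-up) fragment containing two empty elements.
my $tree = HTML::TreeBuilder->new_from_content(
    '<p>first line<br>second line <img src="poster.png"></p>'
);

my %tags;    # tag name => [ starttag output, endtag output ]
for my $tag (qw(br img)) {
    # In scalar context look_down returns the first matching element.
    my $e = $tree->look_down(_tag => $tag);
    $tags{$tag} = [ $e->starttag, $e->endtag ];
    say for @{ $tags{$tag} };
}
$tree->delete;    # free the tree's parent/child references
```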

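I avoid the traverse method (documented in HTML::Element::traverse) because its documentation discourages using it:

[I]f you want to recursively visit every node in the tree, it's almost always simpler to write a subroutine that does just that, than it is to bundle up the pre- and/or post-order code in callbacks for the traverse method.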

The documentation gives no example, so I wrote the code above myself.

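Am I using starttag and endtag correctly? Should I detect which tags are rendered without an end tag and skip the endtag call for them? What is the correct/best/simplest way to traverse an HTML tree and pretty-print it?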

Update:

Following Stephen Ullrich's suggestion, I tried formatting with as_HTML():

#!/usr/bin/env perl 
use v5.38;
use HTML::TreeBuilder;
say "\%HTML::Element::optionalEndTag= ",
    join ', ', keys %HTML::Element::optionalEndTag;
my $content = do {local $/; <DATA>};
my $tree = HTML::TreeBuilder->new();
$tree->parse_content($content);
# don't encode any entities; indent with three spaces; 
say $tree->as_HTML('', '   ');
__DATA__
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
    <font size=3><strong>
    5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
    </strong></font>
    <br>
    <img src="poster.png" alt="poster/ad" title="poster/ad">
    <i>(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)</i>
    <br><br>

    <font size=3><strong>
    5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
    </strong></font>
    <br>
    Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
    <br>
    <i>(*included on
        <a href="https://www.garciafamilyprovisions.com/product/JY148COMBO/before-the-dead-4cd-set?cp=640_62123_100764" target="_blank">Before The Dead
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26.jpg" target="_blank">birthday doodle for Barbara by Jerry
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26a.jpg" target="_blank">the master tape
        </a>
    )
    </i>
    <br><br>

Output:

%HTML::Element::optionalEndTag= dt, dd, li, p
<!DOCTYPE html>
<html lang="en">
   <head>
      <meta charset="utf-8" />
   </head>
   <body><font size="3"><strong> 5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA </strong></font><br /><img alt="poster/ad" src="poster.png" title="poster/ad" /> <i>(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)</i><br />
      <br /><font size="3"><strong> 5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA </strong></font><br /> Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway <br /><i>(*included on <a href="https://www.garciafamilyprovisions.com/product/JY148COMBO/before-the-dead-4cd-set?cp=640_62123_100764" target="_blank">Before The Dead </a>; <a href="https://gdsets.com/63posters/1961_05_26.jpg" target="_blank">birthday doodle for Barbara by Jerry </a>; <a href="https://gdsets.com/63posters/1961_05_26a.jpg" target="_blank">the master tape </a> ) </i><br />
      <br />
   </body>
</html>

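Unfortunately, this isn't "pretty" enough. I don't understand why the indentation disappears after the first few levels. I did notice, however, that it doesn't generate

</br>
</img>

even though those tags are not mentioned in %HTML::Element::optionalEndTag!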

Update 2

(Although they are listed in %HTML::Tagset::emptyElement, which as_HTML checks.)
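The two hashes answer different questions, which a short sketch can make visible. It assumes HTML::Tagset and HTML::Element are installed, and the tag list is arbitrary. %HTML::Tagset::emptyElement marks void elements that never take content or an end tag, while %HTML::Element::optionalEndTag lists elements such as p and li that do have content but whose end tag HTML allows you to leave out. br and img appear only in the first.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::Tagset;
use HTML::Element;

# For a few sample tags, record membership in the two lookup tables.
my %report;    # tag name => [ in emptyElement?, in optionalEndTag? ]
for my $tag (qw(br img meta p li div)) {
    my $empty    = $HTML::Tagset::emptyElement{$tag}    ? 1 : 0;
    my $optional = $HTML::Element::optionalEndTag{$tag} ? 1 : 0;
    $report{$tag} = [ $empty, $optional ];
    printf "%-5s emptyElement=%d optionalEndTag=%d\n", $tag, $empty, $optional;
}
```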

html perl pretty-print html-tree
1 Answer

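<br> and <img> (among others) are empty elements: they are not meant to enclose anything, so no separate end tag is needed. Nevertheless, HTML::Element::endtag always produces the string </tag>, whether or not tag is an empty element. (A browser's HTML parser treats a stray </br> as another <br>, which is why every <br> in the question's output turns into two line breaks.)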

(Note that starttag is smart enough to write empty tags in the self-closing form <tag attr=... />, e.g. <br /> and <img ... />.)

So the programmer has to test explicitly whether an end tag is appropriate. Fortunately, the hash %HTML::Tagset::emptyElement maps every empty element's tag name to 1 (true).
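That test can be factored into a small helper. This is a minimal sketch: closing_tag is a hypothetical name and the input fragment is made up. It returns an empty string for anything listed in %HTML::Tagset::emptyElement and defers to endtag for everything else.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::Tagset;
use HTML::TreeBuilder;

# Hypothetical helper: the end tag to print after an element's content,
# or '' for empty elements (br, img, ...) that must not be closed.
sub closing_tag($e) {
    return $HTML::Tagset::emptyElement{ $e->tag } ? '' : $e->endtag;
}

my $tree = HTML::TreeBuilder->new_from_content('<p>a<br>b <i>c</i></p>');
my %closing;    # tag name => what closing_tag returned
for my $tag (qw(br i p)) {
    $closing{$tag} = closing_tag($tree->look_down(_tag => $tag));
    say "$tag => '$closing{$tag}'";
}
$tree->delete;
```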

The following code prints the HTML from the question in a simple indented format, with each tag on its own line:

#!/usr/bin/env perl
use v5.38;
use HTML::TreeBuilder;
use HTML::Tagset;    # provides %HTML::Tagset::emptyElement
use Text::Wrap;      # for extra prettiness

my $indent = 3;
$Text::Wrap::columns  = 132;
$Text::Wrap::unexpand = 0;    # keep indentation as spaces, not tabs

my $content = do {local $/; <DATA>};
my $tree = HTML::TreeBuilder->new();
$tree->parse_content($content);
visit($tree);

sub visit($x) {
    my $depth = $x->depth;
    my $in = ' ' x ($indent * $depth);
    for my $e ($x->content_list) {
        if (ref ($e)) {
            # element: start tag, then (unless empty) content and end tag
            say $in . $e->starttag;
            if (! $HTML::Tagset::emptyElement{$e->tag}) {
                visit($e);
                say $in . $e->endtag;
            }
        }
        else {
            # text: skip whitespace-only nodes, trim and wrap the rest
            next unless $e =~ /\S/;
            (my $text = $e) =~ s/^\s+|\s+$//g;
            say wrap($in, $in, $text);
        }
    }
}
__DATA__
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
</head>
<body>
    <font size=3><strong>
    5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
    </strong></font>
    <br>
    <img src="poster.png" alt="poster/ad" title="poster/ad">
    <i>(Robert Hunter and Jerry Garcia; source: McNally, Jackson research)</i>
    <br><br>

    <font size=3><strong>
    5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
    </strong></font>
    <br>
    Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
    <br>
    <i>(*included on
        <a href="https://www.garciafamilyprovisions.com/product/JY148COMBO/before-the-dead-4cd-set?cp=640_62123_100764" target="_blank">Before The Dead
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26.jpg" target="_blank">birthday doodle for Barbara by Jerry
        </a>;
        <a href="https://gdsets.com/63posters/1961_05_26a.jpg" target="_blank">the master tape
        </a>
    )
    </i>
    <br><br>

Output:

<head>
   <meta charset="utf-8" />
</head>
<body>
   <font size="3">
      <strong>
         5/5/61 Bob & Jerry - Arroyo Lounge, Stanford University, Palo Alto, CA
      </strong>
   </font>
   <br />
   <img alt="poster/ad" src="poster.png" title="poster/ad" />
   <i>
      (Robert Hunter and Jerry Garcia; source: McNally, Jackson research)
   </i>
   <br />
   <br />
   <font size="3">
      <strong>
         5/26/61 Bob & Jerry - Barbara Meier's 16th birthday party, Menlo Park, CA
      </strong>
   </font>
   <br />
   Follow The Drinking Gourd, John Henry, Santy Anno*, Poor Paddy Works On The Railway
   <br />
   <i>
      (*included on
      <a href="https://www.garciafamilyprovisions.com/product/JY148COMBO/before-the-dead-4cd-set?cp=640_62123_100764" target="_blank">
         Before The Dead
      </a>
      ;
      <a href="https://gdsets.com/63posters/1961_05_26.jpg" target="_blank">
         birthday doodle for Barbara by Jerry
      </a>
      ;
      <a href="https://gdsets.com/63posters/1961_05_26a.jpg" target="_blank">
         the master tape
      </a>
      )
   </i>
   <br />
   <br />
</body>
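If the pretty-printed markup is needed as a value (to test it, or to write it to a file) rather than printed straight to STDOUT, the same recursive walk can accumulate lines instead. This is a sketch, not the answer's code: collect_lines and pretty_html are hypothetical names, the input fragment is made up, Text::Wrap is left out, whitespace-only text nodes are skipped, and the remaining text is trimmed.

```perl
#!/usr/bin/env perl
use v5.38;
use HTML::Tagset;
use HTML::TreeBuilder;

# Recursively append one line per start tag, text node and end tag to @$lines.
sub collect_lines($x, $indent, $lines) {
    my $in = ' ' x ($indent * $x->depth);
    for my $e ($x->content_list) {
        if (ref $e) {
            push @$lines, $in . $e->starttag;
            # Empty elements get neither children nor an end tag.
            next if $HTML::Tagset::emptyElement{ $e->tag };
            collect_lines($e, $indent, $lines);
            push @$lines, $in . $e->endtag;
        }
        elsif ($e =~ /\S/) {
            # Trim text nodes; whitespace-only nodes are dropped.
            (my $text = $e) =~ s/^\s+|\s+$//g;
            push @$lines, $in . $text;
        }
    }
}

# Hypothetical wrapper: pretty-print a whole tree into one string.
sub pretty_html($tree, $indent = 3) {
    my @lines;
    collect_lines($tree, $indent, \@lines);
    return join '', map { "$_\n" } @lines;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<p>Follow The Drinking Gourd<br>John Henry <img src="poster.png"></p>'
);
my $out = pretty_html($tree);
print $out;
$tree->delete;
```

Returning a string keeps the traversal separate from the output channel, so the caller decides whether to print it, save it, or compare it in a test.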
    