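I can use JEditorPane to parse RTF text and convert it to HTML, but the HTML output is missing some formatting, namely the strike-through markup in this example. As you can see in the output, the underlined text is correctly wrapped in <u>, but nothing wraps the strike-through text. Any ideas?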
public void testRtfToHtml()
{
    JEditorPane pane = new JEditorPane();
    pane.setContentType("text/rtf");
    StyledEditorKit kitRtf = (StyledEditorKit) pane.getEditorKitForContentType("text/rtf");
    try
    {
        kitRtf.read(
            new StringReader(
                "{\\rtf1\\ansi \\deflang1033\\deff0{\\fonttbl {\\f0\\froman \\fcharset0 \\fprq2 Times New Roman;}}{\\colortbl;\\red0\\green0\\blue0;} {\\stylesheet{\\fs20 \\snext0 Normal;}} {\\plain \\fs26 \\strike\\fs26 This is supposed to be strike-through.}{\\plain \\fs26 \\fs26 } {\\plain \\fs26 \\ul\\fs26 Underline text here} {\\plain \\fs26 \\fs26 .{\\u698\\'20}}"),
            pane.getDocument(), 0);
        kitRtf = null;
        StyledEditorKit kitHtml =
            (StyledEditorKit) pane.getEditorKitForContentType("text/html");
        Writer writer = new StringWriter();
        kitHtml.write(writer, pane.getDocument(), 0, pane.getDocument().getLength());
        System.out.println(writer.toString());
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}
Output:
<html>
<head>
<style>
<!--
p.Normal {
RightIndent:0.0;
FirstLineIndent:0.0;
LeftIndent:0.0;
}
-->
</style>
</head>
<body>
<p class=default>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
This is supposed to be strike-through.
</span>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
</span>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
<u>Underline text here</u>
</span>
<span style="color: #000000; font-size: 13pt; font-family: Times New Roman">
.?
</span>
</p>
</body>
</html>
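You could try doing the conversion with OpenOffice or LibreOffice, using this converter library, as described in this blog post.
Here is the function I use to convert the RTF body of a .msg file to HTML. See the repository for yamp, my Outlook message parser, on GitHub.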
public static String rtfToHtml(String rtfText) {
    if (rtfText != null) {
        // Unwrap the HTML fragments that Outlook encapsulates in the RTF body
        rtfText = rtfText.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*(.*)}", "$1")
            .replaceAll("\\\\htmlrtf[1]?(.*)\\\\htmlrtf0", "")
            .replaceAll("\\\\htmlrtf[01]?", "")
            .replaceAll("\\\\htmlbase", "")
            // Structural control words become plain whitespace
            .replaceAll("\\\\par", "\n")
            .replaceAll("\\\\tab", "\t")
            .replaceAll("\\\\line", "\n")
            .replaceAll("\\\\page", "\n\n")
            .replaceAll("\\\\sect", "\n\n")
            // Typographic control words become hexadecimal character references
            .replaceAll("\\\\emdash", "&#x2014;")
            .replaceAll("\\\\endash", "&#x2013;")
            .replaceAll("\\\\emspace", "&#x2003;")
            .replaceAll("\\\\enspace", "&#x2002;")
            .replaceAll("\\\\qmspace", "&#x2005;")
            .replaceAll("\\\\bullet", "&#x2022;")
            .replaceAll("\\\\lquote", "&#x2018;")
            .replaceAll("\\\\rquote", "&#x2019;")
            .replaceAll("\\\\ldblquote", "&#x201C;")
            .replaceAll("\\\\rdblquote", "&#x201D;")
            .replaceAll("\\\\row", "\n")
            .replaceAll("\\\\cell", "|")
            .replaceAll("\\\\nestcell", "|")
            // Drop unescaped group braces, then unescape literal braces
            .replaceAll("([^\\\\])\\{", "$1")
            .replaceAll("([^\\\\])}", "$1")
            .replaceAll("[\\\\](\\{)", "$1")
            .replaceAll("[\\\\](})", "$1")
            // Unicode (decimal) and code-page (hex) character escapes
            .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
            .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
            .replaceAll("\"cid:(.*)@.*\"", "\"$1\"");
        int index = rtfText.indexOf("<html");
        if (index != -1) {
            return rtfText.substring(index);
        }
    }
    return null;
}
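To make the escape handling in that function easier to follow, here is a minimal sketch that isolates three of its substitutions: named control words, decimal Unicode escapes, and two-digit hex escapes. The class name `RtfEscapeDemo` and the method `decodeEscapes` are hypothetical and are not part of yamp; the regular expressions are the same ones used above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class RtfEscapeDemo {
    // A few named RTF control words and the HTML character references they map to.
    private static final Map<String, String> NAMED = new LinkedHashMap<>();
    static {
        NAMED.put("emdash", "&#x2014;");
        NAMED.put("endash", "&#x2013;");
        NAMED.put("bullet", "&#x2022;");
        NAMED.put("ldblquote", "&#x201C;");
        NAMED.put("rdblquote", "&#x201D;");
    }

    static String decodeEscapes(String rtf) {
        String out = rtf;
        for (Map.Entry<String, String> e : NAMED.entrySet()) {
            // Four backslashes in the Java literal match one backslash in the input.
            out = out.replaceAll("\\\\" + e.getKey(), e.getValue());
        }
        // Decimal Unicode escape: backslash, 'u', then 2 to 5 digits.
        out = out.replaceAll("\\\\u([0-9]{2,5})", "&#$1;");
        // Code-page escape: backslash, apostrophe, then two hex digits.
        out = out.replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;");
        return out;
    }

    public static void main(String[] args) {
        System.out.println(decodeEscapes("caf\\'e9 \\emdash \\u8212"));
    }
}
```

Note that the hex escape is resolved against the document's code page in real RTF, so emitting `&#x..;` directly is only safe for values that coincide with Unicode code points, which is an assumption this sketch shares with the function above.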
Because of some bugs, I modified your function like this:
public static String rtfToHtml(String rtfText) {
    StringBuilder sb = new StringBuilder();
    if (rtfText != null) {
        // Process the input line by line so no pattern can match across lines
        String[] lignes = rtfText.split("[\\r\\n]+");
        for (String ligne : lignes) {
            String tempLine = ligne
                // [^}]* stops at the first closing brace instead of the last one
                .replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*([^}]*)\\}", "$1")
                .replaceAll("\\\\htmlrtf0([^\\\\]*)\\\\htmlrtf", "$1")
                .replaceAll("\\\\htmlrtf \\{(.*)\\}\\\\htmlrtf0", "$1")
                .replaceAll("\\\\htmlrtf (.*)\\\\htmlrtf0", "")
                .replaceAll("\\\\htmlrtf[0]?", "")
                // Strip field instructions and keep only the field result text
                .replaceAll("\\\\field\\{\\\\\\*\\\\fldinst\\{[^}]*\\}\\}", "")
                .replaceAll("\\{\\\\fldrslt\\\\cf1\\\\ul([^}]*)\\}", "$1")
                .replaceAll("\\\\htmlbase", "")
                .replaceAll("\\\\par", "\n")
                .replaceAll("\\\\tab", "\t")
                .replaceAll("\\\\line", "\n")
                .replaceAll("\\\\page", "\n\n")
                .replaceAll("\\\\sect", "\n\n")
                // Typographic control words become hexadecimal character references
                .replaceAll("\\\\emdash", "&#x2014;")
                .replaceAll("\\\\endash", "&#x2013;")
                .replaceAll("\\\\emspace", "&#x2003;")
                .replaceAll("\\\\enspace", "&#x2002;")
                .replaceAll("\\\\qmspace", "&#x2005;")
                .replaceAll("\\\\bullet", "&#x2022;")
                .replaceAll("\\\\lquote", "&#x2018;")
                .replaceAll("\\\\rquote", "&#x2019;")
                .replaceAll("\\\\ldblquote", "&#x201C;")
                .replaceAll("\\\\rdblquote", "&#x201D;")
                .replaceAll("\\\\row", "\n")
                .replaceAll("\\\\cell", "|")
                .replaceAll("\\\\nestcell", "|")
                .replaceAll("([^\\\\])\\{", "$1")
                .replaceAll("([^\\\\])}", "$1")
                .replaceAll("[\\\\](\\{)", "$1")
                .replaceAll("[\\\\](})", "$1")
                .replaceAll("\\\\u([0-9]{2,5})", "&#$1;")
                .replaceAll("\\\\'([0-9A-Fa-f]{2})", "&#x$1;")
                .replaceAll("\"cid:(.*)@.*\"", "\"$1\"")
                // Collapse runs of spaces left behind by the removed control words
                .replaceAll(" {2,}", " ");
            // Skip lines that contain only whitespace
            if (!tempLine.replaceAll("\\s+", "").isEmpty()) {
                sb.append(tempLine).append("\r\n");
            }
        }
        rtfText = sb.toString();
        int index = rtfText.indexOf("<html");
        if (index != -1) {
            return rtfText.substring(index);
        }
    }
    return null;
}
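One visible difference between the two versions is the `htmltag` pattern: the first uses a greedy `(.*)`, the second uses `([^}]*)`. The answer does not say which bugs it fixed, so treating this as one of them is an assumption. The sketch below compares the two patterns on a line that holds two encapsulated tags; the class `HtmlTagRegexDemo` and its method names are hypothetical.

```java
class HtmlTagRegexDemo {
    // Pattern from the first function: the greedy (.*) runs to the LAST '}' on the line.
    static String greedy(String line) {
        return line.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*(.*)}", "$1");
    }

    // Pattern from the revised function: [^}]* stops at the FIRST '}' after the tag.
    static String bounded(String line) {
        return line.replaceAll("\\{\\\\\\*\\\\[m]?htmltag[\\d]*([^}]*)\\}", "$1");
    }

    public static void main(String[] args) {
        // Two encapsulated HTML tags on a single line, followed by text.
        String line = "{\\*\\htmltag64 <p>}{\\*\\htmltag84 <b>}Hi";
        System.out.println(greedy(line));
        System.out.println(bounded(line));
    }
}
```

With the greedy pattern the whole line is consumed as a single match, so the second `htmltag` group survives in the output along with a stray `}`. The bounded pattern matches each group separately and leaves only ` <p> <b>Hi`.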
The open-source rtf-to-html library offers a pretty good solution, and it is still actively maintained (1.1.0 was just released).
public String rtfToHtml(String rtfContent) {
    return RTF2HTMLConverterRFCCompliant.INSTANCE.rtf2html(rtfContent);
}
For legacy purposes, RTF2HTMLConverterClassic and RTF2HTMLConverterJEditorPane are also available.