Generating a PDF from HTML containing non-Latin characters with ITextRenderer does not work

Question

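This is the second day of my investigation, with no result so far. At least I can now ask some very specific questions.

I am trying to use iText, or more specifically ITextRenderer from Flying Saucer, to write valid HTML containing some non-Latin characters to a PDF file.

My short example first initializes a String variable doc with the following value: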

String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
            + "<body>Some greek characters: Καλημέρα Some greek characters"
            + "</body></html>";

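Here is the code I use for debugging. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters: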

//write for debugging purposes in an html file
File newTextFile = new File("C:/work/test.html");
FileWriter fw = new FileWriter(newTextFile);
fw.write(doc);
fw.close();
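
A note on the debugging step above: FileWriter encodes with the JVM's default charset, which on the Java 1.6 runtime listed in the question depends on the platform, so the debug file is not guaranteed to be UTF-8. A minimal sketch of writing the file with an explicit encoding follows. It assumes Java 7 or later (for try-with-resources and StandardCharsets; on Java 6 the charset name "UTF-8" can be passed as a string instead), and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

final class DebugHtmlWriter {

    // Hypothetical helper: writes the content as UTF-8 regardless of the JVM default charset.
    static void writeUtf8(File target, String content) throws IOException {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            writer.write(content);
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes spell the Greek word from the question, so the
        // example does not depend on the encoding of this source file.
        String doc = "<html><body>\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1</body></html>";
        File out = File.createTempFile("test", ".html");
        writeUtf8(out, doc);
        System.out.println("Wrote " + out.length() + " bytes to " + out.getAbsolutePath());
    }
}
```

If the browser shows the Greek text correctly when opening a file written this way, the string itself is intact and the problem lies in the rendering step.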

The next step is to write this value to a PDF file. Here is my code:

ITextRenderer renderer = new ITextRenderer();
    //add some fonts - if paths are not right, an exception will be thrown
    renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);


    final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
            .newInstance();
    documentBuilderFactory.setValidating(false);
    DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
    builder.setEntityResolver(FSEntityResolver.instance());
    org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(
            doc.toString().getBytes("UTF-8")));

    renderer.setDocument(document, null);
    renderer.layout();
    renderer.createPDF(os);

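The end result of my code is:

In the HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)

In the PDF file I get: Some greek characters: Some greek characters (unexpected - the Greek characters are ignored!!)

Dependencies:

java version "1.6.0_27"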
itext-2.0.8.jar
de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar

I have also tried several other fonts, but I suspect my problem is not caused by using the wrong font. Any help is more than welcome.

Thanks

Answer 1

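I'm from the Czech Republic and had the same problem with our national characters! After some searching, I managed to solve it with this solution.

In particular (which you already have):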

renderer
    .getFontResolver()
    .addFont(fonts.get(i).getFile().getPath(), 
             BaseFont.IDENTITY_H, 
             BaseFont.NOT_EMBEDDED);

And then the important part, in the CSS:

* {
  font-family: Verdana;
/*  font-family: Times New Roman; - alternative. Without ""! */
}

It seems to me that without that CSS your fonts are not used at all. When I removed those lines from the CSS, the encoding was broken again.

Hope this helps!
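
Applied to the question's document, this advice amounts to adding a style rule whose font-family names one of the fonts registered with addFont. Below is a minimal sketch using only the JDK. The class StyledDoc and its methods are hypothetical names, and "Times New Roman" is assumed to be the family name stored in the question's TIMES.TTF file. The parsed Document is what the question already passes to renderer.setDocument(document, null).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

final class StyledDoc {

    // Hypothetical helper: wraps body markup in XHTML with a global font-family rule.
    // The family is written unquoted, following the note in the CSS above.
    // bodyMarkup is inserted verbatim, so it must already be well-formed XHTML.
    static String buildXhtml(String fontFamily, String bodyMarkup) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<head><style type=\"text/css\">"
                + "* { font-family: " + fontFamily + "; }"
                + "</style></head>"
                + "<body>" + bodyMarkup + "</body></html>";
    }

    // Parses the markup the same way the question does, with an explicit UTF-8 byte conversion.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes spell the Greek word from the question.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        String doc = buildXhtml("Times New Roman",
                "Some greek characters: " + greek + " Some greek characters");
        Document document = parse(doc);
        System.out.println("Root element: " + document.getDocumentElement().getTagName());
    }
}
```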

Answer 2

Add something like this to the HTML:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
        <style type='text/css'> 
            * { font-family: 'Arial Unicode MS'; }
        </style>
    </head>
    <body>
        <span>Some text with šđčćž characters</span>
    </body>
</html>

Then, in the Java code, register the font with the ITextRenderer's FontResolver:

ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

This works perfectly for Croatian characters.

The jars used to generate the PDF are:

core-renderer.jar
iText-2.0.8.jar

Answer 3

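Let iText read the header information from your HTML content, which should declare the content as UTF-8. Add a meta content-type tag with the utf-8 charset to the HTML code, then run iText to generate the PDF and check the result.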

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
 </head>
 <body>
  Some greek characters: Καλημέρα Some greek characters
 </body>
</html>
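
Because the question feeds the markup to an XML DocumentBuilder, the document has to be well-formed XML, so an empty element such as meta needs to be self-closed (ending in "/>"); otherwise parsing fails before any rendering happens. A small check using only the JDK is sketched below. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

final class WellFormedCheck {

    // Returns true if the markup parses as XML, false if the parser rejects it.
    // The JDK's default error handler may also print the parse error to stderr.
    static boolean isWellFormed(String xhtml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String prefix = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>";
        String suffix = "</head><body>text</body></html>";
        String meta = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"";

        System.out.println(isWellFormed(prefix + meta + "/>" + suffix)); // true
        System.out.println(isWellFormed(prefix + meta + ">" + suffix));  // false
    }
}
```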

Update:
If the above does not work, see the section ENCODING VERSUS THE DEFAULT CHARSET USED BY THE JVM in the document published at

http://www.manning.com/lowagie2/iText2E_MEAP_CH02.pdf
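
To illustrate why the charset used for byte conversion matters, here is a small JDK-only sketch (class and method names are hypothetical). It encodes the Greek word from the question to bytes and decodes it again, once with UTF-8 and once with ISO-8859-1, which cannot represent Greek letters. Note that the question's code already calls getBytes("UTF-8") explicitly, so this particular step is probably not the cause there; the sketch only shows what goes wrong when the charset is left to a non-Unicode default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {

    // Encodes the text to bytes and decodes it again using the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Unicode escapes spell the Greek word from the question.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";

        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // UTF-8 can represent every character, so the text survives unchanged.
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek)); // true

        // ISO-8859-1 has no Greek letters; each one is replaced by '?'.
        System.out.println(roundTrip(greek, StandardCharsets.ISO_8859_1)); // ????????
    }
}
```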

Answer 4

I had a similar problem rendering Thai characters. I solved it as follows:

Attach the TTF file to the ITextRenderer. The TTF file I used: arialuni.ttf. Link: https://code.google.com/archive/p/ipwn/downloads
Add Arial Unicode MS to your CSS font family.

* {
  font-family: 'Arial Unicode MS';
}

Hope this is useful to others who need it.
