我正在尝试使用docx4j将HTML5文件转换为docx。更大的图景是HTML包含阿拉伯数据和英语数据。我已经在HTML中的元素上设置了样式。我的HTML在chrome上看起来很整洁,但是当我使用docx4j转换为docx时,阿拉伯文本格式会丢失。在MS word上,它表明我的阿拉伯文字设置了粗体样式,但不是粗体。同样,RTL方向也会丢失。表从RTL反转为LTR。解决方法是,我使用BufferedWriter生成.doc文件,该文件将我的HTML文件与样式属性匹配,但html中存在Base64图像,而.doc文件中没有该图像。因此,需要转换为.docx格式。我的要求是从HTML生成的可编辑文档。当我挠头时,请引导我。没有源示例代码也可以正常工作。
这是我用来将HTML转换为docx的代码。
public boolean convertHTMLToDocx(String inputFilePath, String outputFilePath, boolean headerFlag,
boolean footerFlag,String orientation, String logoPath, String margin, JSONObject json,boolean isArabic) {
boolean conversionFlag;
boolean orientationFlag = false;
try {
if(!orientation.equalsIgnoreCase("Y")){
orientationFlag = true;
}
String stringFromFile = FileUtils.readFileToString(new File(inputFilePath), "UTF-8");
String unescaped = stringFromFile;
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
NumberingDefinitionsPart ndp = new NumberingDefinitionsPart();
wordMLPackage.getMainDocumentPart().addTargetPart(ndp);
ndp.unmarshalDefaultNumbering();
ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Bidi.Heuristic", true);
ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.Element.Heading.MapToStyle", true);
ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.serif", "Frutiger LT Arabic 45 Light");
ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.sans-serif", "Frutiger LT Arabic 45 Light");
ImportXHTMLProperties.setProperty("docx4j-ImportXHTML.fonts.default.monospace", "Frutiger LT Arabic 45 Light");
XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
xHTMLImporter.setHyperlinkStyle("Hyperlink");
xHTMLImporter.setParagraphFormatting(FormattingOption.CLASS_PLUS_OTHER);
xHTMLImporter.setTableFormatting(FormattingOption.CLASS_PLUS_OTHER);
xHTMLImporter.setRunFormatting(FormattingOption.CLASS_PLUS_OTHER);
wordMLPackage.getMainDocumentPart().getContent().addAll(xHTMLImporter.convert(unescaped, ""));
XmlUtils.marshaltoString(wordMLPackage.getMainDocumentPart().getJaxbElement(),true,true);
File output = new File(outputFilePath);
wordMLPackage.save(output);
Console.log("file path where it is stored is" + " " + output.getAbsolutePath());
if (headerFlag || footerFlag) {
File file = new File(outputFilePath);
InputStream in = new FileInputStream(file);
wordMLPackage = WordprocessingMLPackage.load(in);
if (headerFlag) {
// set Header
}
if (footerFlag) {
// set Footer
}
wordMLPackage.save(file);
Console.log("Finished editing the word document");
}
conversionFlag = true;
} catch (InvalidFormatException e) {
Error.log("Invalid format found:-" + getStackTrace(e));
conversionFlag = false;
} catch (Exception e) {
Error.log("Error while converting:-" + getStackTrace(e));
conversionFlag = false;
}
return conversionFlag;
}
我将从@JasonPlutext对docx4j论坛上类似问题之一给出的答案开始。我不得不提到,我使用的罐子存在此问题。然后,我点击了以下链接:
并在上面的链接页面上的以下注释中找到了罐子:
您可以尝试https://docx4java.org/docx4j/docx4j-Imp ... 180801.jar
它包含https://github.com/plutext/docx4j-Impor ... f378022303
[能否请您看一下https://github.com/plutext/docx4j-Impor ... iTest.java并为混合的希伯来语/阿拉伯语和从左到右的文本添加其他测试,尤其是在您觉得实现不正确的任何情况下。
而且,jar也没有下载,所以我搜索了jar的名称,并从jardownload.com下载了所有依赖项。尽管commons-codec和commons-io jar为1.3,但需要将其升级到最新的jar,以便在转换后以docx格式显示图像。但是,请确保html格式正确,因为docx4j要求严格遵守格式良好的html。
现在,我将讨论真正的部分,即如何使所有内容与html相同。我使用写入到.doc文件而不是.docx的简单字节数组来克服它。这样,该文档将与html完全相同。我面临的唯一问题是二进制图像未显示。仅一个框出现在图像位置。所以,我写了两个文件:1,读取html文件中的所有二进制图像标签,并使用Base64解码器解码图像,将图像保存在远程服务器的本地磁盘上,并将所有此类img标签的src属性替换为磁盘上的新位置。 (新位置之前是http:// {remote_server}:{remote_port} / {war_deployment_descriptor} / images /
2,我在部署在服务器上的war文件中创建了一个简单的servlet,该servlet侦听/ images上的获取请求,并在接收到具有路径名的get请求后,在outputstream上返回该图像。瞧,图片开始出现了。
由您决定,您要进行哪种转换。 .docx(需要使用docx4j作为转换之一)或.doc(需要将html作为字节数组简单地写入.doc文件并放置2个代码文件)。我的建议是将英文文档转换为.docx。对于阿拉伯语,希伯来语或其他RTL语言,如果不严格要求生成.docx,则应进行.doc转换。
Listing the two files, please change as per your need:
File1.java
------------------------------------------------------------------------------------------
public static void writeHTMLDatatoDoc(String content, String inputHTMLFile,String outputDocFile,String uniqueName) throws Exception {
String baseTag = getRemoteServerURL()+"/{war_deployment_desciptor}/images?image=";
String tag = "Image_";
String ext = ".png";
String srcTag = "";
String pathOnServer = getDiskPath() + File.separator + "TemplateGeneration"
+ File.separator + "generatedTemplates" + File.separator + uniqueName + File.separator + "images" + File.separator;
int i = 0;
boolean binaryimgFlag = false;
Pattern p = Pattern.compile("<img [^>]*src=[\\\"']([^\\\"^']*)");
Matcher m = p.matcher(content);
while (m.find()) {
String src = m.group();
int startIndex = src.indexOf("src=") + 5;
int endIndex = src.length();
// srcTag will contain data as data:image/png;base64,AAABAAEAEBAAAAEAGABoAw.........
// Replace this whole later with path on local disk
srcTag = src.substring(startIndex, src.length());
if(srcTag.contains("base64")) {
binaryimgFlag = true;
}
if(binaryimgFlag) {
// Extract image mime type and image extension from srcTag containing binary image
ext = extractMimeType(srcTag);
if(ext.lastIndexOf(".") != -1 && ext.lastIndexOf(".") != 0)
ext = ext.substring(ext.lastIndexOf(".")+1);
else
ext = ".png";
// read files already created for the different documents for this unique entity.
// The location contains all image files as Image_{i}.{image_extension}
// Sort files and read max counter in image names.
// Increase value of i to generate next image as Image_{incremented_i}.{image_entension}
i = findiDynamicallyFromFilesCreatedForWI(pathOnServer);
i++; // Increase count for next image
// save whole data to replace later
String srcTagBegin = srcTag;
// Remove data:image/png;base64, from srcTag , so I get only encoded image data.
// Decode this using Base64 decoder.
srcTag = srcTag.substring(srcTag.indexOf(",") + 1, srcTag.length());
byte[] imageByteArray = decodeImage(srcTag);
// Constrcu replacement tag
String replacement = baseTag+pathOnServer+tag+i+ext;
replacement = replacement.replace("\\", "/");
// Writing image inside local directory on server
FileOutputStream imageOutFile = new FileOutputStream(pathOnServer+tag+i+ext);
imageOutFile.write(imageByteArray);
content = content.replace(srcTagBegin, replacement);
imageOutFile.close();
}
}
//Re write HTML file
writeHTMLData(content,inputHTMLFile);
// write content to doc file
writeHTMLData(content,outputDocFile);
}
public static int findiDynamicallyFromFilesCreatedForWI(String pathOnServer) {
String path = pathOnServer;
int nextFileCount = 0;
String number = "";
String[] dirListing = null;
File dir = new File(path);
dirListing = dir.list();
if(dirListing.length != 0) {
Arrays.sort(dirListing);
int length = dirListing.length;
int index = dirListing[length - 1].indexOf('.');
number = dirListing[length - 1].substring(0,index);
int index1 = number.indexOf('_');
number = number.substring(index1+1,number.length());
nextFileCount = Integer.parseInt(number);
}
return nextFileCount;
}
private static String extractMimeType(final String encoded) {
final Pattern mime = Pattern.compile("^data:([a-zA-Z0-9]+/[a-zA-Z0-9]+).*,.*");
final Matcher matcher = mime.matcher(encoded);
if (!matcher.find())
return "";
return matcher.group(1).toLowerCase();
}
private static void writeHTMLData(String inputData, String outputFilepath) {
BufferedWriter writer = null;
try {
writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(outputFilepath)), Charset.forName("UTF-8")));
writer.write(inputData);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if(writer != null)
writer.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
public static byte[] decodeImage(String imageDataString) {
return Base64.decodeBase64(imageDataString);
}
private static String readHTMLData(String inputFile) {
String data = "";
String str = "";
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(new FileInputStream(new File(inputFile)), StandardCharsets.UTF_8))) {
while ((str = reader.readLine()) != null) {
data += str;
}
} catch (IOException e) {
e.printStackTrace();
}
return data;
}
------------------------------------------------------------------------------------------
File2.java
------------------------------------------------------------------------------------------
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.newgen.clos.logging.consoleLogger.Console;
public class ImageServlet extends HttpServlet {
public void init() throws ServletException {
public ImageServlet() {
super();
}
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String param = request.getParameter("image");
Console.log("Image Servlet executed");
Console.log("File Name Requested: " + param);
param.replace("\"", "");
param.replace("%20"," ");
File file = new File(param);
response.setHeader("Content-Type", getServletContext().getMimeType(param));
response.setHeader("Content-Length", String.valueOf(file.length()));
response.setHeader("Content-Disposition", "inline; filename=\"" + param + "\"");
Files.copy(file.toPath(), response.getOutputStream());
}
}
------------------------------------------------------------------------------------------