How to replace bookmarked content in a Word document with Apache POI?


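I am working on a feature that replaces the content of bookmarks, and I wrote an example. In the input Word document, I set a bookmark named "name" on the text Bill. The code below replaces the content of the "name" bookmark (Bill) with Ryan. The example runs and produces the expected result, but it is only a simple demo. In practice there is a problem: after the replacement, iterating over the objects returned by xwpfParagraph.getRuns() throws an exception. The code below deletes the content inside the bookmark and inserts a new XWPFRun. In my actual project, each bookmark corresponds to exactly one XWPFRun. Is it possible to find the corresponding XWPFRun directly through the bookmark and set its content? Since each XWPFRun can carry its own font, setting the text on the existing run directly would spare me from setting the font separately.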

package com.office;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.xmlbeans.XmlCursor;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.w3c.dom.Node;

import java.io.*;
import java.math.BigInteger;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BookMarkTest {


    public static void main(String[] args) {
        try {
            FileInputStream is = new FileInputStream("f:\\test.docx");
            XWPFDocument document = new XWPFDocument(is);

            Map<String, Object> bookTagMap = new HashMap<>();
            bookTagMap.put("name", "Ryan");
            replaceBookTag(document, bookTagMap);

            FileOutputStream os = new FileOutputStream("f:\\test1.docx");
            document.write(os);
            os.close();       // release the output file handle
            document.close();

        } catch (Exception e) {
            e.printStackTrace();
        }
    }


    public static void replaceBookTag(XWPFDocument document, Map<String, Object> bookTagMap) {
        List<XWPFParagraph> paragraphList = document.getParagraphs();
        for (XWPFParagraph xwpfParagraph : paragraphList) {
            CTP ctp = xwpfParagraph.getCTP();
            List<CTBookmark> bookmarks =  ctp.getBookmarkStartList();
            for(CTBookmark bookmark: bookmarks) {
                if (bookTagMap.containsKey(bookmark.getName())) {

                    XWPFRun run = xwpfParagraph.createRun();
                    run.setText(bookTagMap.get(bookmark.getName()).toString());

                    Node firstNode = bookmark.getDomNode();
                    Node nextNode = firstNode.getNextSibling();
                    while (nextNode != null) {
                        // Walk forward until the bookmarkEnd marker is found
                        String nodeName = nextNode.getNodeName();
                        if (nodeName.equals("w:bookmarkEnd")) {
                            break;
                        }

                        // Delete every node before the end marker, i.e. remove the old bookmark content
                        Node delNode = nextNode;
                        nextNode = nextNode.getNextSibling();

                        ctp.getDomNode().removeChild(delNode);
                    }

                    if (nextNode == null) {
                        // No end marker was found: insert the new run before the bookmarkStart
                        ctp.getDomNode().insertBefore(run.getCTR().getDomNode(), firstNode);
                    } else {
                        // End marker found: insert the new run before it, i.e. inside the bookmark
                        ctp.getDomNode().insertBefore(run.getCTR().getDomNode(), nextNode);
                    }
                }
            }

        }

        for (XWPFParagraph xwpfParagraph : paragraphList) {
            for(XWPFRun run: xwpfParagraph.getRuns()){
                System.out.println(run.text());
            }
        }
    }

}

Here is the exception:

org.apache.xmlbeans.impl.values.XmlValueDisconnectedException
    at org.apache.xmlbeans.impl.values.XmlObjectBase.check_orphaned(XmlObjectBase.java:1258)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.newCursor(XmlObjectBase.java:286)
    at org.apache.poi.xwpf.usermodel.XWPFRun.text(XWPFRun.java:1262)
    at com.office.BookMarkTest.replaceBookTag(BookMarkTest.java:79)
    at com.office.BookMarkTest.main(BookMarkTest.java:27)

Input Word document: test.docx

Output Word document: test1.docx

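In the real project, the Word files are often large and contain many bookmarks, which may cause performance problems. Is there an efficient way to locate the XWPFRun through its bookmark?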

java replace ms-word apache-poi bookmarks

In Microsoft Word's Office Open XML format (*.docx), the bookmarked part of the text lies between the bookmarkStart and bookmarkEnd elements in document.xml. For a run element it looks like this:

test.docx

Unzip that file and have a look at /word/document.xml. You will find XML like this:

...
<w:bookmarkStart w:id="0" w:name="name"/>
<w:r>
 <w:t>name</w:t>
</w:r>
<w:bookmarkEnd w:id="0"/>
...
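To make that structure concrete, here is a small sketch that uses only the JDK's DOM parser (not POI) on a hypothetical document.xml fragment. It collects the text of every run between a named w:bookmarkStart and the w:bookmarkEnd carrying the same w:id. The class name BookmarkTextReader and the sample XML are made up for illustration, and the sketch only looks inside a single paragraph.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkTextReader {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: the bookmarked text "Bill" is split across two runs.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>"
        + "<w:r><w:t>Dear </w:t></w:r>"
        + "<w:bookmarkStart w:id=\"0\" w:name=\"name\"/>"
        + "<w:r><w:t>Bi</w:t></w:r><w:r><w:t>ll</w:t></w:r>"
        + "<w:bookmarkEnd w:id=\"0\"/>"
        + "<w:r><w:t>, welcome.</w:t></w:r>"
        + "</w:p></w:body></w:document>";

    /** Text of the runs between the named bookmarkStart and its bookmarkEnd, or null if not found. */
    static String bookmarkText(String xml, String name) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // required for getElementsByTagNameNS / getAttributeNS
            doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable XML", e);
        }
        NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            if (!name.equals(start.getAttributeNS(W, "name"))) continue;
            String id = start.getAttributeNS(W, "id"); // bookmarkEnd is matched by w:id, not by name
            StringBuilder text = new StringBuilder();
            for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() != Node.ELEMENT_NODE || !W.equals(n.getNamespaceURI())) continue;
                Element e = (Element) n;
                if ("bookmarkEnd".equals(e.getLocalName()) && id.equals(e.getAttributeNS(W, "id"))) break;
                if ("r".equals(e.getLocalName())) {
                    NodeList ts = e.getElementsByTagNameNS(W, "t");
                    for (int j = 0; j < ts.getLength(); j++) text.append(ts.item(j).getTextContent());
                }
            }
            // If the bookmarkEnd sits in a later paragraph, only this paragraph's part is returned.
            return text.toString();
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(bookmarkText(SAMPLE, "name")); // prints: Bill
    }
}
```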

So, while looping over the paragraphs and their runs, you can check whether a run has a bookmarkStart as its previous sibling in the XML. If so, that run is bookmarked.
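The previous-sibling check itself does not depend on POI. The following is a rough analogue, not the POI code: it uses the JDK's org.w3c.dom API on a hypothetical XML string in place of XWPFRun and XmlCursor. For each w:r it skips whitespace nodes, looks at the preceding element, and, if that is a w:bookmarkStart with the wanted w:name, overwrites the run's first w:t. The run's w:rPr (its formatting, bold in this sample) is left untouched, which is why the font does not need to be set again. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkedRunReplacer {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample: "Dear " + bold bookmarked run "Bill" + ", welcome."
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body><w:p>"
        + "<w:r><w:t>Dear </w:t></w:r>"
        + "<w:bookmarkStart w:id=\"0\" w:name=\"name\"/>"
        + "<w:r><w:rPr><w:b/></w:rPr><w:t>Bill</w:t></w:r>"
        + "<w:bookmarkEnd w:id=\"0\"/>"
        + "<w:r><w:t>, welcome.</w:t></w:r>"
        + "</w:p></w:body></w:document>";

    /** Previous sibling that is an element, skipping text/whitespace nodes. */
    static Element prevElement(Node n) {
        Node p = n.getPreviousSibling();
        while (p != null && p.getNodeType() != Node.ELEMENT_NODE) p = p.getPreviousSibling();
        return (Element) p;
    }

    /** Same idea as isXWPFRunAfterBookmark: is the previous sibling a bookmarkStart with this name? */
    static boolean isRunAfterBookmark(Element run, String bookmarkName) {
        Element prev = prevElement(run);
        return prev != null
            && W.equals(prev.getNamespaceURI())
            && "bookmarkStart".equals(prev.getLocalName())
            && bookmarkName.equals(prev.getAttributeNS(W, "name"));
    }

    /** Sets the first w:t of each matching run (w:rPr is kept) and returns all w:t text afterwards. */
    static String replace(String xml, String bookmarkName, String newText) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable XML", e);
        }
        NodeList runs = doc.getElementsByTagNameNS(W, "r");
        for (int i = 0; i < runs.getLength(); i++) {
            Element run = (Element) runs.item(i);
            NodeList ts = run.getElementsByTagNameNS(W, "t");
            if (ts.getLength() > 0 && isRunAfterBookmark(run, bookmarkName)) {
                ts.item(0).setTextContent(newText);
            }
        }
        StringBuilder all = new StringBuilder();
        NodeList texts = doc.getElementsByTagNameNS(W, "t");
        for (int i = 0; i < texts.getLength(); i++) all.append(texts.item(i).getTextContent());
        return all.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace(SAMPLE, "name", "Ryan")); // prints: Dear Ryan, welcome.
    }
}
```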

In POI, getting the previous sibling in the XML requires an XmlCursor.

Complete code example:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.apache.xmlbeans.XmlCursor;

import java.util.Map;
import java.util.HashMap;

public class WordFindBookmarkedXWPFRun {
    
 static boolean isXWPFRunAfterBookmark(XWPFRun run, String bookmarkName) {
  XmlCursor cursor = run.getCTR().newCursor();
  try {
   if (!cursor.toPrevSibling()) { // is there a previous sibling?
    return false; // if not, then there is no bookmark
   }
   if (!(cursor.getObject() instanceof CTBookmark)) { // is the previous sibling a CTBookmark (w:bookmarkStart)?
    return false; // if not, then there is no bookmark
   }
   CTBookmark bookmarkStart = (CTBookmark)cursor.getObject();
   if (!bookmarkName.equals(bookmarkStart.getName())) { // is the bookmark name equal to the searched name?
    return false; // if not, then this is not the searched bookmark
   }
   return true; // this run comes immediately after the searched bookmark
  } finally {
   cursor.dispose(); // release the cursor
  }
 }

 public static void main(String[] args) throws Exception {

  XWPFDocument document = new XWPFDocument(new FileInputStream("./test.docx"));

  Map<String, String> bookTagMap = new HashMap<>();
  bookTagMap.put("name", "Axel Richter");  
  bookTagMap.put("amount", "4,567.89");  
  bookTagMap.put("date", "2023-12-27");  

  for (XWPFParagraph paragraph : document.getParagraphs()) {
   for (XWPFRun run : paragraph.getRuns()) {
    for (String key : bookTagMap.keySet()) {
     if (isXWPFRunAfterBookmark(run, key )) { // check if the run is after a bookmark having this key as name
      // if so, set run text to given string
      run.setText(bookTagMap.get(key), 0);
     }
    }
   }
  }

  FileOutputStream out = new FileOutputStream("./test1.docx");
  document.write(out);
  out.close();
  document.close();

 }
}
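The loop above calls isXWPFRunAfterBookmark for every combination of run and key, which is the cost the question worries about for large documents with many bookmarks. One alternative, sketched here with the JDK DOM API rather than POI, is to index the bookmarks in a single pass (bookmark name to the run that immediately follows its bookmarkStart) and then do one map lookup per key. BookmarkIndex, its methods, and the sample XML are hypothetical; porting the idea to POI (for example by walking CTP.getBookmarkStartList() as in the question) is an untested assumption here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkIndex {
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical sample with two bookmarks ("name" and "date"), one per paragraph.
    static final String SAMPLE =
        "<w:document xmlns:w=\"" + W + "\"><w:body>"
        + "<w:p><w:r><w:t>Name: </w:t></w:r><w:bookmarkStart w:id=\"0\" w:name=\"name\"/>"
        + "<w:r><w:t>Bill</w:t></w:r><w:bookmarkEnd w:id=\"0\"/></w:p>"
        + "<w:p><w:r><w:t>Date: </w:t></w:r><w:bookmarkStart w:id=\"1\" w:name=\"date\"/>"
        + "<w:r><w:t>today</w:t></w:r><w:bookmarkEnd w:id=\"1\"/></w:p>"
        + "</w:body></w:document>";

    /** Single pass: bookmark name -> the w:r element immediately following its bookmarkStart. */
    static Map<String, Element> indexBookmarkedRuns(Document doc) {
        Map<String, Element> index = new HashMap<>();
        NodeList starts = doc.getElementsByTagNameNS(W, "bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            Node next = start.getNextSibling();
            while (next != null && next.getNodeType() != Node.ELEMENT_NODE) {
                next = next.getNextSibling(); // skip whitespace text nodes
            }
            if (next != null && W.equals(next.getNamespaceURI()) && "r".equals(next.getLocalName())) {
                index.put(start.getAttributeNS(W, "name"), (Element) next);
            }
        }
        return index;
    }

    /** Sets the first w:t of each bookmarked run; returns each paragraph's text, one per line. */
    static String replaceAll(String xml, Map<String, String> values) {
        Document doc;
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            doc = f.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable XML", e);
        }
        Map<String, Element> index = indexBookmarkedRuns(doc); // built once
        for (Map.Entry<String, String> entry : values.entrySet()) {
            Element run = index.get(entry.getKey()); // one hash lookup per key
            if (run == null) continue; // unknown bookmark, or not directly followed by a run
            NodeList ts = run.getElementsByTagNameNS(W, "t");
            if (ts.getLength() > 0) ts.item(0).setTextContent(entry.getValue());
        }
        StringBuilder out = new StringBuilder();
        NodeList ps = doc.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < ps.getLength(); i++) {
            if (i > 0) out.append('\n');
            NodeList ts = ((Element) ps.item(i)).getElementsByTagNameNS(W, "t");
            for (int j = 0; j < ts.getLength(); j++) out.append(ts.item(j).getTextContent());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("name", "Ryan");
        values.put("date", "2023-12-27");
        System.out.println(replaceAll(SAMPLE, values));
    }
}
```

Building the index costs one walk over the bookmarkStart elements, so adding more keys no longer multiplies the number of cursor or sibling checks.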
