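I want to split a large XML file into many parts using StAX, splitting on a given node.
The problem occurs when the split nodes are adjacent, with no space, tab, or line break between them. With the transform call in place, the parser appears to skip such nodes. However, when I comment out the transform call, these nodes are read correctly and printed to the console.
Sample XML below. The split node is AB.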
<?xml version="1.0" encoding="UTF-8"?>
<root>
<AB Id="1"><BC attB="valB1">b1</BC><CD attC="valC1"><EF attE="valD1">c1</EF></CD></AB><AB Id="2"><BC attB="valB2">b2</BC><CD attC="valC2"><EF attE="valD2">c2</EF></CD></AB>
<AB Id="3">
<BC attB="valB3">b3</BC>
<CD attC="valC3">
<EF attE="valD3">c3</EF>
</CD>
</AB>
</root>
The expected output is three files named Part_1.xml, Part_2.xml, and Part_3.xml. They should contain <AB Id="1">, <AB Id="2">, and <AB Id="3"> respectively, each with its child elements. Every file should have <root> as the parent node.
Unfortunately, I only get Part_1.xml and Part_2.xml. Part_1.xml contains <AB Id="1"> and its children, as expected. Part_2.xml, however, contains <AB Id="3"> and its children instead of <AB Id="2">. <AB Id="2"> and its children are never written.
When I comment out the line transformer.transform(staxs, staxr); the stream reader reads <AB Id="2"> correctly.
Code:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXResult;
import javax.xml.transform.stax.StAXSource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
final class Split {

    private static final Logger LOG = LoggerFactory.getLogger(Split.class);
    private static final String inputXmlFile = "Big xml file to split. Above a dummy sample";
    private static final String pathToOutputFolder = System.getProperty("java.io.tmpdir");
    // Bare element names: getLocalName() and writeStartElement() do not use angle brackets.
    private static final String[] parentTags = new String[] {"root"};
    private static final String splitTag = "AB";
    private static final String chunkNumber = "1";

    public void run() {
        final Transformer transformer;
        XMLStreamReader xsr = null;
        XMLStreamWriter xsw = null;
        try {
            transformer = TransformerFactory.newInstance().newTransformer();
            final XMLInputFactory xif = XMLInputFactory.newInstance();
            xsr = xif.createXMLStreamReader(new FileInputStream(inputXmlFile));
            // Plain ints: a boxed Short never equals the Integer literal 1.
            int fileNumber = 0;
            int dataRepetitions = 0;
            xsw = write(pathToOutputFolder, ++fileNumber, parentTags);
            int tagCount = 0;
            while (xsr.hasNext()) {
                xsr.next();
                if (xsr.getEventType() == XMLEvent.START_ELEMENT) {
                    tagCount++;
                    System.out.println("Tag _" + tagCount + ": " + xsr.getLocalName());
                    if (xsr.getLocalName().equals(splitTag)) {
                        System.out.println(xsr.getLocalName() + ": [" + xsr.getAttributeLocalName(0) + ", " + xsr.getAttributeValue(0) + "]");
                        // One split node per file: close the current part and open the next one.
                        if (dataRepetitions == 1) {
                            xsw.writeEndDocument();
                            xsw.flush();
                            xsw.close();
                            xsw = write(pathToOutputFolder, ++fileNumber, parentTags);
                            dataRepetitions = 0;
                        }
                        // Copy the current split node and its children to the current part.
                        final StAXSource staxs = new StAXSource(xsr);
                        final StAXResult staxr = new StAXResult(xsw);
                        transformer.transform(staxs, staxr);
                        dataRepetitions++;
                    }
                }
            }
        } catch (final TransformerException | FileNotFoundException | XMLStreamException e) {
            // SplitXmlRuntimeException is a project-specific exception (not shown here).
            throw new SplitXmlRuntimeException(e.getMessage());
        } finally {
            try {
                if (xsr != null) {
                    xsr.close();
                }
                if (xsw != null) {
                    xsw.writeEndDocument();
                    xsw.flush();
                    xsw.close();
                }
            } catch (final XMLStreamException e) {
                LOG.error(e.getMessage());
            }
        }
    }

    // Opens Part_<n>.xml and writes the XML declaration plus the wrapping parent tags.
    private XMLStreamWriter write(final String pathToOutputFolder, final int fileNumber, final String[] rootTags) throws XMLStreamException, FileNotFoundException {
        XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = xmlOutputFactory.createXMLStreamWriter(new FileOutputStream(new File(pathToOutputFolder, "Part_" + fileNumber + ".xml"), true));
        writer.writeStartDocument();
        for (final String s : rootTags) {
            writer.writeStartElement(s);
        }
        return writer;
    }
}
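
A possible explanation for the symptom, offered as an assumption rather than a confirmed diagnosis: with the JDK's built-in identity transformer, transform() seems to leave the reader one event past the closing tag of the copied node. When two AB nodes are adjacent, that event is already the next AB start tag, and the unconditional xsr.next() at the top of the loop steps over it. The sketch below re-checks the current event after each transform instead of always advancing. The class AdjacentSplitSketch and its split method are hypothetical names used only for illustration, and it collects each fragment as a string instead of writing Part_N.xml files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical demo class; not part of the Split component above.
class AdjacentSplitSketch {

    // Returns one serialized fragment per element named splitTag.
    static List<String> split(String xml, String splitTag) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> parts = new ArrayList<>();
        try {
            while (true) {
                if (xsr.getEventType() == XMLStreamConstants.START_ELEMENT
                        && splitTag.equals(xsr.getLocalName())) {
                    StringWriter out = new StringWriter();
                    transformer.transform(new StAXSource(xsr), new StreamResult(out));
                    parts.add(out.toString());
                    // Do not advance here: the reader may already be positioned
                    // on the next split element, so inspect the current event again.
                    continue;
                }
                if (!xsr.hasNext()) {
                    break;
                }
                xsr.next();
            }
        } finally {
            xsr.close();
        }
        return parts;
    }

    public static void main(String[] args) throws Exception {
        // The first two AB nodes are adjacent; the third is preceded by a line break.
        String xml = "<root><AB Id=\"1\">b1</AB><AB Id=\"2\">b2</AB>\n<AB Id=\"3\">b3</AB>\n</root>";
        for (String part : split(xml, "AB")) {
            System.out.println(part);
        }
    }
}
```

The loop only calls next() when the current event is not a split start tag, so it should behave the same whether a given transformer implementation stops on the closing tag or on the event after it.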