我正在使用 XMLStreamReader 来实现我的目标(分割 xml 文件)。看起来不错,但仍然没有给出想要的结果。我的目标是从输入文件中分割每个节点“nextTag”:
<?xml version="1.0" encoding="UTF-8"?>
<firstTag>
<nextTag>1</nextTag>
<nextTag>2</nextTag>
</firstTag>
结果应该是这样的:
<?xml version="1.0" encoding="UTF-8"?><nextTag>1</nextTag>
<?xml version="1.0" encoding="UTF-8"?><nextTag>2</nextTag>
参考使用Java分割1GB Xml文件我用这段代码实现了我的目标:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo4 {
public static void main(String[] args) throws Exception {
InputStream inputStream = new FileInputStream("input.xml");
BufferedReader in = new BufferedReader(new InputStreamReader(inputStream));
XMLInputFactory factory = XMLInputFactory.newInstance();
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
XMLStreamReader streamReader = factory.createXMLStreamReader(in);
while (streamReader.hasNext()) {
streamReader.next();
if (streamReader.getEventType() == XMLStreamReader.START_ELEMENT
&& "nextTag".equals(streamReader.getLocalName())) {
StringWriter writer = new StringWriter();
t.transform(new StAXSource(streamReader), new StreamResult(
writer));
String output = writer.toString();
System.out.println(output);
}
}
}
}
其实很简单。但是,我的输入文件是单行形式:
<?xml version="1.0" encoding="UTF-8"?><firstTag><nextTag>1</nextTag><nextTag>2</nextTag></firstTag>
我的 Java 代码不再产生所需的输出,而只是这个结果:
<?xml version="1.0" encoding="UTF-8"?><nextTag>1</nextTag>
花了几个小时后,我很确定已经找出原因了:
t.transform(new StAXSource(streamReader), new StreamResult(writer));
这是因为,执行完transform方法后,光标会自动前进到下一个事件。在代码中,我有这个分数:
while (streamReader.hasNext()) {
streamReader.next();
...
t.transform(new StAXSource(streamReader), new StreamResult(writer));
...
}
第一次变换后,streamReader直接获取2次next():
1. from the transform method
2. from the next method in the while loop
因此,对于此特定行 XML,光标永远无法到达第二个打开标记。 相反,如果输入 XML 具有漂亮的打印形式,则可以从光标到达第二个,因为第一个结束标记后面有一个空格事件
不幸的是,我找不到任何如何进行设置的内容,因此变换器在执行变换方法后不会自动跳到下一个事件。这太令人沮丧了。
有人知道我该如何处理吗?在语义上也很受欢迎。非常感谢。
问候,
拉特纳
PS。我肯定可以为这个问题写一个解决方法(在转换之前漂亮地打印 xml 文档,但这意味着输入 xml 之前已被修改,这是不允许的)
正如您所阐述的,如果元素节点直接相互跟随,转换步骤是否会继续到下一个创建元素。
为了解决这个问题,您可以使用嵌套 while 循环重写代码,如下所示:
while(reader.next() != XMLStreamConstants.END_DOCUMENT) {
while(reader.getEventType() == XMLStreamConstants.START_ELEMENT && reader.getLocalName().equals("nextTag")) {
StringWriter writer = new StringWriter();
// will transform the current node to a String, moves the cursor to the next START_ELEMENT
t.transform(new StAXSource(reader), new StreamResult(writer));
System.out.println(writer.toString());
}
}
如果您的
xml
文件适合内存,您可以在 JOOX
库的帮助下尝试,该库在 gradle 中导入,例如:
compile 'org.jooq:joox:1.3.0'
还有主类,比如:
import java.io.File;
import java.io.IOException;
import org.joox.JOOX;
import org.joox.Match;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import static org.joox.JOOX.$;
public class Main {
public static void main(String[] args)
throws IOException, SAXException, TransformerException {
DocumentBuilder builder = JOOX.builder();
Document document = builder.parse(new File(args[0]));
Transformer transformer =
TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty("omit-xml-declaration", "no");
final Match $m = $(document);
$m.find("nextTag").forEach(tag -> {
try {
transformer.transform(
new DOMSource(tag),
new StreamResult(System.out));
System.out.println();
}
catch (TransformerException e) {
System.exit(1);
}
});
}
}
它产生:
<?xml version="1.0" encoding="UTF-8"?><nextTag>1</nextTag>
<?xml version="1.0" encoding="UTF-8"?><nextTag>2</nextTag>