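I am writing data that includes some XML nodes (org.w3c.dom.Node) to a Spark RDD, which causes the data to be serialized and deserialized with Kryo. For the most part this works: I can serialize and deserialize all nodes without errors. However, as soon as I call .getNodeValue() on a deserialized attribute node (java.xml/com.sun.org.apache.xerces.internal.dom.AttrImpl), I get an internal NullPointerException with the following stack trace: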
java.lang.NullPointerException
at java.xml/com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.clearChunkIndex(DeferredDocumentImpl.java:1991)
at java.xml/com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getLastChild(DeferredDocumentImpl.java:794)
at java.xml/com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.getLastChild(DeferredDocumentImpl.java:779)
at java.xml/com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl.synchronizeChildren(DeferredDocumentImpl.java:1661)
at java.xml/com.sun.org.apache.xerces.internal.dom.DeferredAttrImpl.synchronizeChildren(DeferredAttrImpl.java:143)
at java.xml/com.sun.org.apache.xerces.internal.dom.AttrImpl.getValue(AttrImpl.java:447)
at java.xml/com.sun.org.apache.xerces.internal.dom.AttrImpl.getNodeValue(AttrImpl.java:315)
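Accessing the node value before serialization works fine, but after serialization something inside the data structure is broken. My relevant code looks like this: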
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.w3c.dom.Node;

public class AttributeItem implements Item {
    private static final long serialVersionUID = 1L;
    private Node attributeNode;
    private Item parent;

    public AttributeItem(Node attributeNode) {
        this.attributeNode = attributeNode;
    }

    @Override
    public void write(Kryo kryo, Output output) {
        kryo.writeObject(output, this.parent);
        kryo.writeObject(output, this.attributeNode);
    }

    @Override
    public void read(Kryo kryo, Input input) {
        this.parent = kryo.readObject(input, Item.class);
        this.attributeNode = kryo.readObject(input, Node.class);
    }
}
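I was able to work around the problem by calling getNodeValue() on the node before serialization. The call appears to have a side effect inside the node that has to happen before serialization; judging by the stack trace, it probably makes the deferred attribute node synchronize its children with the owning document. I hope this helps someone.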
import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import org.w3c.dom.Node;

public class AttributeItem implements Item {
    private static final long serialVersionUID = 1L;
    private Node attributeNode;
    private Item parent;

    public AttributeItem(Node attributeNode) {
        this.attributeNode = attributeNode;
        // getNodeValue() has a side effect: it synchronizes some internal state
        // of the node. Without this call, Kryo serialization breaks the node.
        attributeNode.getNodeValue();
    }

    @Override
    public void write(Kryo kryo, Output output) {
        kryo.writeObject(output, this.parent);
        kryo.writeObject(output, this.attributeNode);
    }

    @Override
    public void read(Kryo kryo, Input input) {
        this.parent = kryo.readObject(input, Item.class);
        this.attributeNode = kryo.readObject(input, Node.class);
    }
}
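Since the stack trace goes through DeferredDocumentImpl and DeferredAttrImpl, an alternative that may be worth trying is to stop the parser from creating deferred nodes in the first place. The sketch below is not part of my original code. It assumes the documents are parsed with the JDK's built-in Xerces DocumentBuilderFactory, and the names EagerDomExample and parseRootAttribute are made up for illustration. It relies on the Xerces feature http://apache.org/xml/features/dom/defer-node-expansion. I have not verified that this alone makes the Kryo round trip work, so treat it as a starting point rather than a confirmed fix.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

public class EagerDomExample {
    // Xerces feature id that controls lazy ("deferred") DOM node creation.
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    // Hypothetical helper: parses the XML and returns the named attribute
    // node of the root element. With deferred == false the parser is asked
    // to build all nodes eagerly instead of expanding them on first access.
    static Node parseRootAttribute(String xml, String attributeName, boolean deferred)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(DEFER_NODE_EXPANSION, deferred);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return document.getDocumentElement().getAttributeNode(attributeName);
    }

    public static void main(String[] args) throws Exception {
        Node attr = parseRootAttribute("<root id=\"42\"/>", "id", false);
        // Print the implementation class to check whether it is a Deferred* type.
        System.out.println(attr.getClass().getName());
        System.out.println(attr.getNodeValue());
    }
}
```

If the printed class name no longer starts with Deferred, the node does not depend on lazily expanded state in its owner document, which is the code path that fails in the stack trace above. The getNodeValue() call in the constructor can stay in place as an extra safeguard.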