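I'm trying to migrate JSON documents from Couchbase to Neo4j. When I receive a document, I read some of its fields to work out which type of object to create. Every node class inherits some node properties from a Transaction class, such as the ID, the labels and a version (used for optimistic locking).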
Example node class:
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.Relationship;

@Node
public class User extends Transaction {
    public static final String featureID = "12109";
    public static final String featureVariantID = "000";

    @Property(name = "FullName")
    private String fullName;
    @Property(name = "Alias")
    private String alias;
    @Property(name = "EmploymentType")
    private String employmentType;
    @Property(name = "EmployeeCode")
    private String employeeCode;

    @Relationship("LineManager")
    Set<RelatedTo<User>> lineManagers;
    @Relationship("FunctionalManager")
    Set<RelatedTo<User>> functionalManagers;
    @Relationship("LegalEntity")
    Set<RelatedTo<LegalEntity>> legalEntities;
    // More relations like these

    public User() {
    }

    public User(String transactionID, String tenantID) {
        super(featureID, featureVariantID, transactionID, tenantID);
        this.fullName = "";
        this.alias = "";
        this.employmentType = "";
        this.employeeCode = "";
        lineManagers = new LinkedHashSet<>();
        functionalManagers = new LinkedHashSet<>();
        legalEntities = new LinkedHashSet<>();
        // ....
    }

    // Getters and Setters

    // Maps each relationship type to the field of the JSON document used to populate that relationship set
    private static final Map<String, String> allowedRelationships = new HashMap<>();
    static {
        allowedRelationships.put("LineManager", "Data.LineManagerUserID");
        allowedRelationships.put("FunctionalManager", "Data.FunctionalManagerUserID");
        allowedRelationships.put("LegalEntity", "Data.EmployeeLegalEntityID");
    }

    public Map<String, String> getAllowedRelationships() {
        return User.allowedRelationships;
    }
}
There are 12 other classes with a similar structure.
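The allowedRelationships map only stores dotted paths into the JSON document; the code that follows those paths isn't shown. As a JDK-only sketch, assuming the document has already been parsed into nested Maps and Lists (for example with Jackson's ObjectMapper.readValue(json, Map.class)), a resolver could look like this. RelationshipPaths and resolveIds are hypothetical names, and the sample document values are made up.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class RelationshipPaths {

    /**
     * Follows a dotted path such as "Data.LineManagerUserID" through a JSON document
     * parsed into nested Maps and Lists. A scalar value becomes a one-element list,
     * a JSON array becomes one entry per non-null element, and a missing key or a
     * step through a non-object value yields an empty list.
     */
    public static List<String> resolveIds(Map<String, ?> document, String dottedPath) {
        Object current = document;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return Collections.emptyList(); // the path runs through a non-object value
            }
            current = ((Map<?, ?>) current).get(key);
            if (current == null) {
                return Collections.emptyList(); // key absent or JSON null
            }
        }
        List<String> ids = new ArrayList<>();
        if (current instanceof Collection) {
            for (Object value : (Collection<?>) current) {
                if (value != null) {
                    ids.add(value.toString());
                }
            }
        } else {
            ids.add(current.toString());
        }
        return ids;
    }

    public static void main(String[] args) {
        // Same paths as User.allowedRelationships; the document contents are made up.
        Map<String, String> allowedRelationships = Map.of(
                "LineManager", "Data.LineManagerUserID",
                "LegalEntity", "Data.EmployeeLegalEntityID");
        Map<String, Object> document = Map.of("Data", Map.of(
                "LineManagerUserID", "u-42",
                "EmployeeLegalEntityID", List.of("le-1", "le-2")));
        allowedRelationships.forEach((type, path) ->
                System.out.println(type + " -> " + resolveIds(document, path)));
    }
}
```

Resolving all target IDs of a document up front like this would also make it possible to look up or create the target nodes in bulk, rather than making one getOrCreateNode round trip per relationship.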
The relationship entity is structured as follows:
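import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.RelationshipProperties;

@RelationshipProperties
public class RelatedTo<T extends Transaction> extends BaseRelationship<T> {
    @Property("DocumentID")
    private String documentID;

    public RelatedTo() {
    }

    public RelatedTo(String effectiveFromTimestamp, String effectiveTillTimestamp, String status, T target, String documentID) {
        // BaseRelationship's constructor takes "till" before "from", the reverse of this constructor
        super(effectiveTillTimestamp, effectiveFromTimestamp, status, target);
        this.documentID = documentID;
    }

    // Getter and Setter
}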
It extends this parent class:
import org.springframework.data.neo4j.core.schema.GeneratedValue;
import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Property;
import org.springframework.data.neo4j.core.schema.RelationshipProperties;
import org.springframework.data.neo4j.core.schema.TargetNode;

@RelationshipProperties
public class BaseRelationship<Target extends Transaction> {
    @Id
    @GeneratedValue
    private Long id;
    @Property("EffectiveTillTimestamp")
    private String effectiveTillTimestamp;
    @Property("EffectiveFromTimestamp")
    private String effectiveFromTimestamp;
    @Property("Status")
    private String status;
    @TargetNode
    private Target targetNode;

    public BaseRelationship(String effectiveTillTimestamp, String effectiveFromTimestamp, String status, Target targetNode) {
        this.effectiveTillTimestamp = effectiveTillTimestamp;
        this.effectiveFromTimestamp = effectiveFromTimestamp;
        this.status = status;
        this.targetNode = targetNode;
    }

    public BaseRelationship() {
    }

    // Getters and Setters
}
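To determine the type of object to create I use reflection, which returns a Class<T> object for some <T extends Transaction>. I then create the node and save it to Neo4j. After that I read the document again to build the node's relationships, do some data processing, and save the node again, for example: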
{
    // fetches the node from the database
    LegalEntity parentNode = transactionRepository.getOrCreateNode(transactionMap, tenantID, relTransactionID, LegalEntity.class);
    RelatedTo<LegalEntity> newParentRelationship = new RelatedTo<>(effectiveFromTimestamp, effectiveTillTimestamp, status, parentNode, documentID);
    Set<RelatedTo<LegalEntity>> existingRelationships = user.getLegalEntities();
    // some processing of data and then updating the above set
}
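The reflection code itself isn't shown. Next to the database round trips it is probably not the main cost, but if it scans classes per document, one way to keep it off the hot path is to read each node class's public static featureID and featureVariantID once into a lookup table and only call the (transactionID, tenantID) constructor per document. This is a sketch under that assumption: Transaction and User below are stand-ins without the Spring annotations so it runs on the JDK alone, and NodeTypeRegistry, resolve and instantiate are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeTypeRegistry {

    // Stand-in for the question's Transaction base class, reduced to what this sketch needs.
    public abstract static class Transaction {
        private final String featureKey;
        private final String transactionID;
        private final String tenantID;

        protected Transaction(String featureID, String featureVariantID, String transactionID, String tenantID) {
            this.featureKey = featureID + ":" + featureVariantID;
            this.transactionID = transactionID;
            this.tenantID = tenantID;
        }

        public String getFeatureKey() { return featureKey; }
        public String getTransactionID() { return transactionID; }
        public String getTenantID() { return tenantID; }
    }

    // Stand-in for the question's User node class (constants copied from it).
    public static class User extends Transaction {
        public static final String featureID = "12109";
        public static final String featureVariantID = "000";

        public User(String transactionID, String tenantID) {
            super(featureID, featureVariantID, transactionID, tenantID);
        }
    }

    private final Map<String, Class<? extends Transaction>> byFeature = new HashMap<>();

    // Reads each class's public static featureID/featureVariantID once, up front.
    public NodeTypeRegistry(List<Class<? extends Transaction>> nodeTypes) {
        for (Class<? extends Transaction> type : nodeTypes) {
            try {
                String key = type.getField("featureID").get(null) + ":" + type.getField("featureVariantID").get(null);
                byFeature.put(key, type);
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException(type.getName() + " lacks public featureID/featureVariantID", e);
            }
        }
    }

    // Per-document lookup: a HashMap get instead of a reflective scan.
    public Class<? extends Transaction> resolve(String featureID, String featureVariantID) {
        Class<? extends Transaction> type = byFeature.get(featureID + ":" + featureVariantID);
        if (type == null) {
            throw new IllegalArgumentException("No node class for " + featureID + "/" + featureVariantID);
        }
        return type;
    }

    // Calls the public (String transactionID, String tenantID) constructor each node class is assumed to have.
    public static <T extends Transaction> T instantiate(Class<T> type, String transactionID, String tenantID) {
        try {
            return type.getConstructor(String.class, String.class).newInstance(transactionID, tenantID);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        NodeTypeRegistry registry = new NodeTypeRegistry(List.of(User.class));
        Transaction node = instantiate(registry.resolve("12109", "000"), "tx-1", "tenant-1");
        System.out.println(node.getClass().getSimpleName() + " " + node.getFeatureKey()); // prints "User 12109:000"
    }
}
```

The other node classes would be added to the list passed to the constructor in the same way.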
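The problem is that each node takes 4-10 seconds, including processing, fetching and saving. Each node can have 10-15 relationships, so every relationship costs several hundred milliseconds (roughly 270 ms to 1 s).
Is there any way to speed this up? I have to import more than 50,000 documents from Couchbase, which at 4-10 seconds each would take roughly 55 to 140 hours.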
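At the moment I update each node and its relationships in a single thread. I tried to parallelize this with an ExecutorService, but the saves deadlock and throw optimistic locking exceptions or transient data access exceptions.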
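If the work stays in Spring Data, a common way to live with occasional conflicts between worker threads is to retry the whole load-modify-save unit with a backoff. This is a generic JDK-only sketch, not something from the question; Retry and withRetry are hypothetical names, and the exceptions are simulated.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public final class Retry {

    private Retry() {
    }

    /**
     * Runs work, retrying when it throws a RuntimeException that isRetryable accepts,
     * up to maxAttempts calls in total. Waits baseDelayMillis * 2^(attempt - 1) ms
     * between attempts (exponential backoff). Non-retryable exceptions, and the last
     * retryable one, are rethrown unchanged.
     */
    public static <T> T withRetry(Supplier<T> work, int maxAttempts, long baseDelayMillis,
                                  Predicate<RuntimeException> isRetryable) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                try {
                    Thread.sleep(baseDelayMillis << (attempt - 1));
                } catch (InterruptedException interrupted) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
                    e.addSuppressed(interrupted);
                    throw e;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "saved";
        }, 5, 10, e -> e instanceof IllegalStateException);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "saved after 3 attempts"
    }
}
```

In the real code the supplier would reload the node by its ID, reapply the changes and call save, so that a retried save carries the current @Version value instead of the stale one. The predicate would match Spring's OptimisticLockingFailureException and TransientDataAccessException. Retrying only helps when conflicts are occasional. If many documents point at the same LegalEntity or manager nodes, routing all updates for a given node to the same thread should avoid most of the contention in the first place.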
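Spring Data is not a good fit for ETL: every record has to be deserialized into entity objects and serialized back again, which does not scale to large numbers of records.
It will be much faster to export from Couchbase into an intermediate format such as CSV and then import that into Neo4j, for example with the neo4j-admin tool.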
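A minimal sketch of the export side follows. It assumes one node file per label and one relationship file per relationship type, using the neo4j-admin import header syntax (:ID, :LABEL, :START_ID, :END_ID, :TYPE), with ID spaces such as (User) so that User and LegalEntity IDs cannot clash. Only the property names come from the question's entity classes. AdminImportCsv, csv and row are hypothetical names, and the sample values are made up. Rows are built as plain strings so the sketch runs without a CSV library.

```java
import java.util.StringJoiner;

public class AdminImportCsv {

    // Node file header: "userId:ID(User)" is the import ID in a "User" ID space,
    // ":LABEL" holds the node label(s); the other columns become properties.
    public static final String USER_HEADER =
            "userId:ID(User),FullName,Alias,EmploymentType,EmployeeCode,:LABEL";

    // Relationship file header for User-to-User relationships such as LineManager.
    public static final String USER_RELATIONSHIP_HEADER =
            ":START_ID(User),:END_ID(User),:TYPE,EffectiveFromTimestamp,EffectiveTillTimestamp,Status,DocumentID";

    /** Quotes a field if it contains a comma, quote or line break, doubling embedded quotes. */
    public static String csv(String value) {
        if (value == null) {
            return ""; // null is written as an empty field
        }
        if (value.contains(",") || value.contains("\"") || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    /** Joins already-extracted field values into one CSV line (without the line terminator). */
    public static String row(String... fields) {
        StringJoiner line = new StringJoiner(",");
        for (String field : fields) {
            line.add(csv(field));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // In a real export these values would be read from each Couchbase document.
        System.out.println(USER_HEADER);
        System.out.println(row("u-1", "Doe, Jane", "jdoe", "Permanent", "E001", "User"));
        System.out.println(row("u-2", "Roe, Rick", null, "Contract", "E002", "User"));
        System.out.println();
        System.out.println(USER_RELATIONSHIP_HEADER);
        System.out.println(row("u-2", "u-1", "LineManager", "2020-01-01T00:00:00Z", null, "Active", "doc-123"));
    }
}
```

The files would then be loaded with something like neo4j-admin database import full --nodes=users.csv --relationships=line_managers.csv neo4j on Neo4j 5, or neo4j-admin import with the same --nodes/--relationships options on 4.x; check the options for your version, including the setting for line breaks inside quoted fields. This is an offline bulk loader meant for populating a new database, so it suits a one-off migration rather than incremental syncing. If Spring Data Neo4j will read these nodes afterwards, the files also need whatever properties the Transaction base class maps, such as the @Version field, which the question does not show.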