Building a JanusGraph index on Cassandra with Java


Problem

What are the remaining steps to finish building JanusGraph indices?

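I have tried reading the JanusGraph documentation line by line; tried committing closer to the source, on janusGraph rather than on janusGraphManagement; and tried the suggestions offered so far. Nothing has worked yet.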

[o.j.g.t.StandardJanusGraphTx.main] ::   Query requires iterating over all vertices [[]]. For better performance, use indexes

Attempts

Attempt 1

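This attempt recreates what I did originally, together with part of the suggestion to revisit the JanusGraph documentation.
The problem is that the log says I do not have an index yet.
That is what originally made me check whether I needed janusGraph.commit().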

Log

...

2023-05-15 09:58:05,984 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-15T14:58:05.984686Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@48b4a043
2023-05-15 09:58:06,047 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-15 09:58:06,270 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:06,275 [INFO] [Main.main] ::    drop g.V().hasLabel("entity").count().next():  0
2023-05-15 09:58:07,020 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:58:07,033 [INFO] [Main.main] ::    addVertex g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:58:07,046 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED

...

[o.j.g.d.m.GraphIndexStatusWatcher.main] ::  Some key(s) on index _id do not currently have status(es) [REGISTERED]: _id=ENABLED
2023-05-15 09:59:07,520 [INFO] [o.j.g.d.m.GraphIndexStatusWatcher.main] ::   Timed out (PT1M) while waiting for index _id to converge on status(es) [REGISTERED]
2023-05-15 09:59:07,522 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:07,531 [INFO] [Main.main] ::    awaitGraphIndexStatus g.V().hasLabel("entity").count().next(): 1
2023-05-15 09:59:07,627 [INFO] [o.j.g.o.j.IndexRepairJob.Thread-68] ::   Index _id metrics: success-tx: 2 doc-updates: 0 succeeded: 1

...

2023-05-15 09:59:08,297 [INFO] [o.j.g.d.m.ManagementSystem.Thread-51] ::     Index update job successful for [_id]
2023-05-15 09:59:08,299 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 09:59:08,304 [INFO] [Main.main] ::    updateIndex g.V().hasLabel("entity").count().next():   1

Process finished with exit code 0

Code

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphIndex;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.core.schema.SchemaAction;
import org.janusgraph.graphdb.database.management.GraphIndexStatusReport;
import org.janusgraph.graphdb.database.management.ManagementSystem;

import java.util.concurrent.ExecutionException;

public class Test13 {
    private static final Logger logger = LogManager.getLogger(Main.class);
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        g.V().drop().iterate();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphVertex vertex = janusGraph.addVertex("entity");
        vertex.property("_id", "Test1");
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        if (!janusGraphManagement.containsGraphIndex("_id"))
            janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        janusGraphManagement = janusGraph.openManagement();
        GraphIndexStatusReport report = ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
        logger.info("awaitGraphIndexStatus g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphIndex test = janusGraphManagement.getGraphIndex("_id");
        janusGraphManagement.updateIndex(test, SchemaAction.REINDEX).get();
        janusGraphManagement.commit();
        logger.info("updateIndex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraph.close();
    }
}

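Attempt 2

JanusGraph tells me it cannot enable the index because its status is INSTALLED.
I expected JanusGraph to report it as REGISTERED at that point.
None of this would matter if I just made a basic awaitGraphIndexStatus() call like the example does.
It also would not matter if I knew how to create the index before any vertices exist.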

Log

2023-05-12 12:50:04,641 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-12 12:50:04,737 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-12 12:50:05,236 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-12 12:50:05,518 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=4fe054a)=null; please provide the correct local DC, or check your contact points
2023-05-12 12:50:05,763 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c21084-rmt-lap-win201
2023-05-12 12:50:05,786 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-12 12:50:05,823 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-12 12:50:05,861 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=5ac9b054)=null; please provide the correct local DC, or check your contact points
2023-05-12 12:50:05,880 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-12 12:50:05,998 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-12 12:50:06,026 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-12T17:50:06.025476Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@48b4a043
2023-05-12 12:50:06,086 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-12 12:50:06,344 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-12 12:50:06,353 [INFO] [Main.main] ::    drop g.V().count().next(): 0
2023-05-12 12:50:07,107 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-12 12:50:07,110 [INFO] [Main.main] ::    addVertex g.V().count().next():    1
2023-05-12 12:50:08,197 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-12 12:50:08,202 [INFO] [Main.main] ::    buildIndex g.V().count().next():   1
2023-05-12 12:50:08,209 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-12 12:50:08,212 [INFO] [Main.main] ::    updateIndex g.V().count().next():  1
Exception in thread "main" java.lang.IllegalArgumentException: Update action [ENABLE_INDEX] cannot be invoked for index with status [INSTALLED]
    at org.janusgraph.core.schema.SchemaAction.isApplicableStatus(SchemaAction.java:85)
    at org.janusgraph.graphdb.database.management.ManagementSystem.updateIndex(ManagementSystem.java:864)
    at org.janusgraph.graphdb.database.management.ManagementSystem.updateIndex(ManagementSystem.java:845)
    at Test12.main(Test12.java:39)
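
The exception above shows ENABLE_INDEX being rejected while the index is still INSTALLED. As a way to reason about that guard, here is a minimal, self-contained Java sketch of a status lifecycle in which each action is only valid from certain statuses. It is a toy model, not JanusGraph's implementation: the class IndexLifecycleSketch and its nested Status and Action enums are hypothetical, the lifecycle is simplified to INSTALLED, REGISTERED, ENABLED, and the real library may accept more statuses per action than shown here.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of an index status lifecycle. Hypothetical; NOT JanusGraph's real classes. */
final class IndexLifecycleSketch {

    /** Simplified statuses, named after the ones that appear in the logs. */
    enum Status { INSTALLED, REGISTERED, ENABLED }

    /** Each action lists the statuses it accepts and the status it produces (assumed, simplified). */
    enum Action {
        REGISTER_INDEX(EnumSet.of(Status.INSTALLED), Status.REGISTERED),
        ENABLE_INDEX(EnumSet.of(Status.REGISTERED), Status.ENABLED);

        private final Set<Status> applicableTo;
        private final Status result;

        Action(Set<Status> applicableTo, Status result) {
            this.applicableTo = applicableTo;
            this.result = result;
        }

        /** Returns the next status, or throws if this action does not apply to the current one. */
        Status apply(Status current) {
            if (!applicableTo.contains(current)) {
                throw new IllegalArgumentException("Update action [" + this
                        + "] cannot be invoked for index with status [" + current + "]");
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Status status = Status.INSTALLED;

        // Skipping a step is rejected, mirroring the message in the stack trace.
        try {
            Action.ENABLE_INDEX.apply(status);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // Walking the lifecycle in order succeeds.
        status = Action.REGISTER_INDEX.apply(status);
        status = Action.ENABLE_INDEX.apply(status);
        System.out.println("Final status: " + status);
    }
}
```

Read against this attempt, the model suggests one thing to check: whether the index ever left INSTALLED before ENABLE_INDEX was called, for example whether the management transaction that requested the registration was itself committed.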

Code

public class Test12 {
    private static final Logger logger = LogManager.getLogger(Main.class);
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        g.V().drop().iterate();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphVertex vertex = janusGraph.addVertex("entity");
        vertex.property("_id", "Test1");
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("buildIndex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraphManagement.updateIndex(janusGraphManagement.getGraphIndex("_id"), SchemaAction.REGISTER_INDEX).get();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("REGISTER_INDEX g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraphManagement.updateIndex(janusGraphManagement.getGraphIndex("_id"), SchemaAction.ENABLE_INDEX).get();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("ENABLE_INDEX g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        ManagementSystem.awaitGraphIndexStatus(janusGraph, "_id").call();
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("awaitGraphIndexStatus g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraph.close();
    }
}

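Attempt 3

After following the suggestion to revisit the JanusGraph documentation, I no longer think the output is a misnomer.
I cleaned the code up to what it looks like now.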

Log

2023-05-15 11:02:15,142 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-15 11:02:15,230 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-15 11:02:15,853 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-15 11:02:16,152 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=705b0e14)=null; please provide the correct local DC, or check your contact points
2023-05-15 11:02:16,410 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a856493416-rmt-lap-win201
2023-05-15 11:02:16,433 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-15 11:02:16,472 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-15 11:02:16,522 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=1b473c95)=null; please provide the correct local DC, or check your contact points
2023-05-15 11:02:16,548 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-15 11:02:16,678 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-15 11:02:16,704 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-15T16:02:16.704454Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@74ea46e2
2023-05-15 11:02:16,762 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-15 11:02:16,989 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 11:02:16,992 [INFO] [Main.main] ::    drop g.V().hasLabel("entity").count().next():  0
2023-05-15 11:02:17,010 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 11:02:17,014 [INFO] [Main.main] ::    makePropertyKey g.V().hasLabel("entity").count().next():   0
2023-05-15 11:02:17,748 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 11:02:17,759 [INFO] [Main.main] ::    addVertex g.V().hasLabel("entity").count().next(): 1
2023-05-15 11:02:17,761 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 11:02:17,764 [INFO] [Main.main] ::    updateIndex g.V().hasLabel("entity").count().next():   1

Process finished with exit code 0

Code

public class Test14 {
    private static final Logger logger = LogManager.getLogger(Main.class);
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        g.V().drop().iterate();
        janusGraph.tx().commit();
        logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        if (!janusGraphManagement.containsPropertyKey("_id"))
            janusGraphManagement.makePropertyKey("_id").dataType(String.class).make();
        janusGraphManagement.commit();
        logger.info("makePropertyKey g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphVertex vertex = janusGraph.addVertex("entity");
        vertex.property("_id", "Test1");
        janusGraph.tx().commit();
        janusGraph.tx().open();
        logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        janusGraph.close();
    }
}

Reproduction

This run uses Java 17, Maven 3, and CQL (Cassandra Query Language).
The storage backend for JanusGraph v1.x.x is a Cassandra v3.x.x server [Docker container].
Cassandra can be hosted in a Docker container.

full-text-search cassandra-3.0 janusgraph tinkerpop3 java-17
2 Answers
1 vote

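Please look again at the examples in the reference docs:

  • JanusGraph and JanusGraphManagement have separate transactions. Use mgmt.commit() to commit the index build.
  • SchemaAction.REGISTER_INDEX and SchemaAction.ENABLE_INDEX are not called explicitly in the examples; they are implicit in the build call.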

0 votes

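The last piece was understanding that a query must supply both the propertyKey and its value.

Thanks again to Stephen Mallette! He pointed me to a great resource from Jason Plurad, which helps if you extrapolate a little beyond what you read.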

2023-05-15 15:33:56,415 [INFO] [o.j.d.c.b.ReadConfigurationBuilder.main] ::  Set default timestamp provider MICRO
2023-05-15 15:33:56,428 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8564917832-rmt-lap-win201
2023-05-15 15:33:56,440 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-15 15:33:56,488 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-15 15:33:56,494 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-15T20:33:56.494323Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@54e7391d
2023-05-15 15:33:56,576 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-15 15:33:56,599 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 15:33:56,601 [INFO] [Main.main] ::    drop g.V().hasLabel("entity").count().next():  0
2023-05-15 15:33:56,805 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 15:33:56,806 [INFO] [Main.main] ::    REGISTER_INDEX g.V().hasLabel("entity").count().next():    0
2023-05-15 15:33:56,961 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 15:33:56,971 [INFO] [Main.main] ::    addVertex g.V().hasLabel("entity").count().next(): 1
2023-05-15 15:33:56,972 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity]]. For better performance, use indexes
2023-05-15 15:33:56,973 [INFO] [Main.main] ::    g.V().hasLabel("entity").count().next():   1
2023-05-15 15:33:56,991 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[_id <> null]]. For better performance, use indexes
2023-05-15 15:33:56,998 [INFO] [Main.main] ::    g.V().hasLabel("entity").count().next():   1
2023-05-15 15:33:56,998 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[~label = entity, _id <> null]]. For better performance, use indexes
2023-05-15 15:33:56,999 [INFO] [Main.main] ::    g.V().hasLabel("entity").count().next():   1
2023-05-15 15:33:57,000 [INFO] [Main.main] ::    g.V().hasLabel("entity").count().next():   1

Process finished with exit code 0
public class Test {
    private static final Logger logger = LogManager.getLogger(Main.class);
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "inmemory").open();
        GraphTraversalSource g = janusGraph.traversal();
        g.V().drop().iterate();
        janusGraph.tx().commit();
        logger.info("drop g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        logger.info("REGISTER_INDEX g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        JanusGraphVertex vertex = janusGraph.addVertex("entity");
        vertex.property("_id", "Test1");
        janusGraph.tx().commit();
        logger.info("addVertex g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        logger.info("g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").count().next());
        logger.info("g.V().hasLabel(\"entity\").count().next():\t" + g.V().has("_id").count().next());
        logger.info("g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").has("_id").count().next());
        logger.info("g.V().hasLabel(\"entity\").count().next():\t" + g.V().hasLabel("entity").has("_id", "Test1").count().next());
        janusGraph.close();
    }
}
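
The four final counts in the code above run different queries even though their log labels are identical: label only, has("_id"), label plus has("_id"), and label plus has("_id", "Test1"). In the log, only the last one is not preceded by the "iterating over all vertices" warning. A rough way to picture why is to treat a composite index as a map from property value to vertices, which can only be consulted when the value is known. The sketch below is a toy model under that assumption, not JanusGraph's implementation; CompositeIndexSketch, ToyVertex, countHas, and fullScans are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of an equality-only index. Hypothetical; NOT JanusGraph's implementation. */
final class CompositeIndexSketch {

    /** Hypothetical vertex: an id plus its properties. */
    record ToyVertex(long id, Map<String, String> properties) {}

    private final String indexedKey;
    private final List<ToyVertex> allVertices = new ArrayList<>();
    private final Map<String, List<ToyVertex>> byValue = new HashMap<>();

    /** Counts how many queries had to iterate over every vertex. */
    int fullScans = 0;

    CompositeIndexSketch(String indexedKey) {
        this.indexedKey = indexedKey;
    }

    /** Stores the vertex and, if it carries the indexed key, files it under that value. */
    void add(ToyVertex v) {
        allVertices.add(v);
        String value = v.properties().get(indexedKey);
        if (value != null) {
            byValue.computeIfAbsent(value, k -> new ArrayList<>()).add(v);
        }
    }

    /** Key and value both known: answered by a single map lookup when the key is indexed. */
    long countHas(String key, String value) {
        if (key.equals(indexedKey)) {
            return byValue.getOrDefault(value, List.of()).size();
        }
        fullScans++;
        return allVertices.stream().filter(v -> value.equals(v.properties().get(key))).count();
    }

    /** Only the key known: this model never consults the index and scans every vertex. */
    long countHas(String key) {
        fullScans++;
        return allVertices.stream().filter(v -> v.properties().containsKey(key)).count();
    }

    public static void main(String[] args) {
        CompositeIndexSketch graph = new CompositeIndexSketch("_id");
        graph.add(new ToyVertex(1L, Map.of("_id", "Test1")));

        System.out.println("has(_id, Test1): " + graph.countHas("_id", "Test1")
                + ", full scans so far: " + graph.fullScans);
        System.out.println("has(_id): " + graph.countHas("_id")
                + ", full scans so far: " + graph.fullScans);
    }
}
```

In this model the key-and-value lookup leaves fullScans untouched, while the key-only lookup increments it, which mirrors the pattern of warnings in the log.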

I switched back to inmemory, as Kelvin Lawrence had suggested earlier, since I do not need much memory just to understand the concept.
