为什么这个 JanusGraph 没有使用我创建的索引(我认为)?
[WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
我不明白,因为我是 JanusGraph 的新手。 而且,虽然不是新手,但我很少写 Stack Overflow,所以请原谅我遗漏了一些礼节并忘记了一些写这篇文章的步骤。 我也确实认识到 Stack Overflow 社区中有不同的圈子,所以请分享我在评论中遗漏的任何内容,我会尽可能多地纠正自己。
我很确定索引存在,因为当我不检查索引时,我会看到
IllegalArgumentException: An index with name '_id' has already been defined
.
2023-05-09 15:21:37,854 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:21:37,937 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] :: DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:21:38,397 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:21:38,639 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=ca3ba3b)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:21:38,861 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] :: Generated unique-instance-id=c0a8563c8-rmt-lap-win201
2023-05-09 15:21:38,876 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:21:38,902 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:21:38,935 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=61faabfd)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:21:38,946 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] :: Initiated fixed thread pool of size 40
2023-05-09 15:21:39,053 [INFO] [o.j.g.d.StandardJanusGraph.main] :: Gremlin script evaluation is disabled
2023-05-09 15:21:39,079 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-09T20:21:39.079040Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalArgumentException: An index with name '_id' has already been defined
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
at org.janusgraph.graphdb.database.management.ManagementSystem.checkIndexName(ManagementSystem.java:661)
at org.janusgraph.graphdb.database.management.ManagementSystem.createCompositeIndex(ManagementSystem.java:728)
at org.janusgraph.graphdb.database.management.ManagementSystem.access$300(ManagementSystem.java:130)
at org.janusgraph.graphdb.database.management.ManagementSystem$IndexBuilder.buildCompositeIndex(ManagementSystem.java:824)
at Test3.main(Test3.java:24)
所以我希望类
Vertex
(所有顶点)使用索引。 2023-05-09 15:18:38,678 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:38,759 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] :: DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:18:39,215 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:18:39,468 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=60cfcdc3)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,691 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] :: Generated unique-instance-id=c0a8563c21376-rmt-lap-win201
2023-05-09 15:18:39,707 [INFO] [c.d.o.d.i.c.ContactPoints.main] :: Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:39,732 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] :: Using native clock for microsecond precision
2023-05-09 15:18:39,766 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] :: [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=3d0c7306)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,786 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] :: Initiated fixed thread pool of size 40
2023-05-09 15:18:39,897 [INFO] [o.j.g.d.StandardJanusGraph.main] :: Gremlin script evaluation is disabled
2023-05-09 15:18:39,919 [INFO] [o.j.d.l.k.KCVSLog.main] :: Loaded unidentified ReadMarker start time 2023-05-09T20:18:39.919098Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-09 15:18:39,971 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-09 15:18:41,311 [INFO] [Main.main] :: g.V().count().next(): 69999
2023-05-09 15:18:41,314 [WARN] [o.j.g.t.StandardJanusGraphTx.main] :: Query requires iterating over all vertices [[]]. For better performance, use indexes
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;
public class Test3 {
private static final Logger logger = LogManager.getLogger(Main.class);
public static void main(String[] args) {
JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
GraphTraversalSource g = janusGraph.traversal();
logger.info("g.V().count().next():\t" + g.V().count().next());
g.V().drop().iterate();
logger.info("g.V().count().next():\t" + g.V().count().next());
JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
if (!janusGraphManagement.containsGraphIndex("_id"))
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
janusGraphManagement.commit();
JanusGraphVertex janusGraphVertex = janusGraph.addVertex();
janusGraphVertex.property("test", "test1");
janusGraph.tx().commit();
janusGraphVertex = janusGraph.addVertex();
janusGraphVertex.property("test", "test2");
janusGraph.tx().commit();
GraphTraversal<Vertex, Vertex> graphTraversal = g.V();
while (graphTraversal.hasNext()) {
Vertex vertex = graphTraversal.next();
logger.info(vertex.property("test"));
}
janusGraph.close();
}
}
我之前用 OrientDB 做过这个,我想我知道如何使用 Tinkerpop3 + Gremlin 相关的图形数据库。但是当我从 OrientDB 过渡到 JanusGraph 时,他们似乎没有相同的索引创建过程。
OrientGraph orientGraph = OrientGraph.open(configuration);
if (!orientGraph.getVertexIndexedKeys("V").contains("_id"))
orientGraph.createVertexIndex("_id", "V", configuration);
orientGraph.commit();
创建索引后,您需要等待它可用,然后查询才能利用它。这是在这里描述
我们可以使用一个简单的
inmemory
JanusGraph 实例和 Gremlin 控制台来测试它。
// New graph
gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]
gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]
// Define our schema - just one vertex key for now
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@133aacbe
gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name
gremlin> mgmt.commit()
==>null
// This is a key step, always make sure no other transactions are in flight
gremlin> graph.tx().rollback()
==>null
// Create the index
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@65f3e805
gremlin> name = mgmt.getPropertyKey('name')
==>name
gremlin> mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
==>byNameComposite
gremlin> mgmt.commit()
==>null
gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[REGISTERED], notConverged={}, converged={name=REGISTERED}, elapsed=PT0.002S]
gremlin> g.addV('Dog').property('name','Baxter')
==>v[4344]
// Note this still does not use the index as we have not re-indexed yet
gremlin> g.V().has('name','Baxter')
17:50:25 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(name = Baxter)]. For better performance, use indexes
==>v[4344]
// Re-index
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7af1d072
gremlin> mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics@53d30d23
gremlin> mgmt.commit()
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7af1d072
gremlin> mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics@53d30d23
gremlin> mgmt.commit()
==>null
gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[ENABLED], notConverged={}, converged={name=ENABLED}, elapsed=PT0.001S]
// Now it works!
gremlin> g.V().has('name','Baxter')
==>v[4344]