I thought I had created a JanusGraph vertex index, but I guess not; what went wrong?


Why is JanusGraph not using the index that I (think I) created?

[WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();

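I do not understand this, as I am new to JanusGraph. Also, although I am not new here, I rarely post on Stack Overflow, so please forgive me if I have missed some etiquette or skipped some steps in writing this post. I also recognize that there are different circles within the Stack Overflow community, so please point out in the comments anything I have missed and I will correct as much as I can.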

I am fairly sure the index exists, because when I skip the check for an existing index, I see:

IllegalArgumentException: An index with name '_id' has already been defined

2023-05-09 15:21:37,854 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:21:37,937 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:21:38,397 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:21:38,639 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=ca3ba3b)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:21:38,861 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c8-rmt-lap-win201
2023-05-09 15:21:38,876 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:21:38,902 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:21:38,935 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=61faabfd)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:21:38,946 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:21:39,053 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:21:39,079 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:21:39.079040Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
Exception in thread "main" java.lang.IllegalArgumentException: An index with name '_id' has already been defined
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:220)
    at org.janusgraph.graphdb.database.management.ManagementSystem.checkIndexName(ManagementSystem.java:661)
    at org.janusgraph.graphdb.database.management.ManagementSystem.createCompositeIndex(ManagementSystem.java:728)
    at org.janusgraph.graphdb.database.management.ManagementSystem.access$300(ManagementSystem.java:130)
    at org.janusgraph.graphdb.database.management.ManagementSystem$IndexBuilder.buildCompositeIndex(ManagementSystem.java:824)
    at Test3.main(Test3.java:24)

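So I expect the class Vertex (all vertices) to use the index. But what I actually see is the warning, which says that this is not the case, and in my experience reading and changing data is slower than expected.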

2023-05-09 15:18:38,678 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:38,759 [INFO] [c.d.o.d.i.c.DefaultMavenCoordinates.main] ::     DataStax Java driver for Apache Cassandra(R) (com.datastax.oss:java-driver-core) version 4.15.0
2023-05-09 15:18:39,215 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,468 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/[0:0:0:0:0:0:0:1]:9042, hostId=null, hashCode=60cfcdc3)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,691 [INFO] [o.j.g.i.UniqueInstanceIdRetriever.main] ::   Generated unique-instance-id=c0a8563c21376-rmt-lap-win201
2023-05-09 15:18:39,707 [INFO] [c.d.o.d.i.c.ContactPoints.main] ::   Contact point localhost:9042 resolves to multiple addresses, will use them all ([localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1])
2023-05-09 15:18:39,732 [INFO] [c.d.o.d.i.c.t.Clock.JanusGraph Session-admin-0] ::   Using native clock for microsecond precision
2023-05-09 15:18:39,766 [WARN] [c.d.o.d.i.c.l.h.OptionalLocalDcHelper.JanusGraph Session-admin-0] ::     [JanusGraph Session|default] You specified datacenter1 as the local DC, but some contact points are from a different DC: Node(endPoint=localhost/127.0.0.1:9042, hostId=null, hashCode=3d0c7306)=null; please provide the correct local DC, or check your contact points
2023-05-09 15:18:39,786 [INFO] [o.j.d.c.ExecutorServiceBuilder.main] ::  Initiated fixed thread pool of size 40
2023-05-09 15:18:39,897 [INFO] [o.j.g.d.StandardJanusGraph.main] ::  Gremlin script evaluation is disabled
2023-05-09 15:18:39,919 [INFO] [o.j.d.l.k.KCVSLog.main] ::   Loaded unidentified ReadMarker start time 2023-05-09T20:18:39.919098Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@f1d0004
2023-05-09 15:18:39,971 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
2023-05-09 15:18:41,311 [INFO] [Main.main] ::    g.V().count().next():  69999
2023-05-09 15:18:41,314 [WARN] [o.j.g.t.StandardJanusGraphTx.main] ::    Query requires iterating over all vertices [[]]. For better performance, use indexes
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphVertex;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;

public class Test3 {
    private static final Logger logger = LogManager.getLogger(Main.class);

    public static void main(String[] args) {
        JanusGraph janusGraph = JanusGraphFactory.build().set("storage.backend", "cql").set("storage.hostname", "localhost:9042").open();
        GraphTraversalSource g = janusGraph.traversal();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        g.V().drop().iterate();
        logger.info("g.V().count().next():\t" + g.V().count().next());
        JanusGraphManagement janusGraphManagement = janusGraph.openManagement();
        PropertyKey propertyKey = janusGraphManagement.getOrCreatePropertyKey("_id");
        if (!janusGraphManagement.containsGraphIndex("_id"))
            janusGraphManagement.buildIndex("_id", Vertex.class).addKey(propertyKey).buildCompositeIndex();
        janusGraphManagement.commit();
        JanusGraphVertex janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test", "test1");
        janusGraph.tx().commit();
        janusGraphVertex = janusGraph.addVertex();
        janusGraphVertex.property("test", "test2");
        janusGraph.tx().commit();

        GraphTraversal<Vertex, Vertex> graphTraversal = g.V();
        while (graphTraversal.hasNext()) {
            Vertex vertex = graphTraversal.next();
            logger.info(vertex.property("test"));
        }
        janusGraph.close();
    }
}

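I have done this before with OrientDB, and I thought I knew how to work with TinkerPop 3 + Gremlin graph databases. But in moving from OrientDB to JanusGraph, it seems the two do not share the same index creation process.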

        OrientGraph orientGraph = OrientGraph.open(configuration);
        if (!orientGraph.getVertexIndexedKeys("V").contains("_id"))
            orientGraph.createVertexIndex("_id", "V", configuration);
        orientGraph.commit();
full-text-search gremlin cassandra-3.0 janusgraph tinkerpop3
1 Answer

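After creating an index, you need to wait for it to become available before queries can take advantage of it. The JanusGraph documentation on index management describes this process.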

We can test this using a simple inmemory JanusGraph instance and the Gremlin Console.

// New graph
gremlin> graph = JanusGraphFactory.open('inmemory')
==>standardjanusgraph[inmemory:[127.0.0.1]]

gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]

// Define our schema - just one vertex key for now
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@133aacbe

gremlin> name = mgmt.makePropertyKey('name').dataType(String.class).cardinality(Cardinality.SINGLE).make()
==>name

gremlin> mgmt.commit()
==>null

// This is a key step, always make sure no other transactions are in flight
gremlin> graph.tx().rollback()
==>null

// Create the index
gremlin> mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@65f3e805

gremlin> name = mgmt.getPropertyKey('name')
==>name

gremlin> mgmt.buildIndex('byNameComposite', Vertex.class).addKey(name).buildCompositeIndex()
==>byNameComposite

gremlin> mgmt.commit()
==>null

gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[REGISTERED], notConverged={}, converged={name=REGISTERED}, elapsed=PT0.002S]

gremlin> g.addV('Dog').property('name','Baxter')
==>v[4344]

// Note this still does not use the index as we have not re-indexed yet
gremlin> g.V().has('name','Baxter')
17:50:25 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx  - Query requires iterating over all vertices [(name = Baxter)]. For better performance, use indexes
==>v[4344]

// Re-index
gremlin>  mgmt = graph.openManagement()
==>org.janusgraph.graphdb.database.management.ManagementSystem@7af1d072

gremlin> mgmt.updateIndex(mgmt.getGraphIndex("byNameComposite"), SchemaAction.REINDEX).get()
==>org.janusgraph.diskstorage.keycolumnvalue.scan.StandardScanMetrics@53d30d23

gremlin> mgmt.commit()
==>null

gremlin> mgmt.awaitGraphIndexStatus(graph, 'byNameComposite').status(SchemaStatus.ENABLED).call()
==>GraphIndexStatusReport[success=true, indexName='byNameComposite', targetStatus=[ENABLED], notConverged={}, converged={name=ENABLED}, elapsed=PT0.001S]

// Now it works!
gremlin> g.V().has('name','Baxter')
==>v[4344]                        
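
To make the behavior in the console session easier to reason about, here is a minimal toy model in plain Java. It is only a sketch under stated assumptions: ToyGraph, hasName, all, reindex and lastQueryScanned are hypothetical names invented for illustration, and nothing here is JanusGraph's actual API or implementation. It mimics three things seen above: a lookup on the indexed key scans every vertex while the index is not yet ENABLED, the same lookup uses the index once a reindex has enabled it, and a traversal with no condition at all (like the g.V() calls in the question, logged as [[]]) has nothing for an index to narrow.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a hypothetical stand-in, not JanusGraph's implementation.
class ToyGraph {
    enum Status { REGISTERED, ENABLED }

    private final Map<Long, String> names = new LinkedHashMap<>(); // vertex id -> "name" value
    private final Map<String, Set<Long>> byName = new HashMap<>(); // index: "name" value -> vertex ids
    private Status status = Status.REGISTERED;                     // index defined, not yet usable
    boolean lastQueryScanned;                                      // did the last query visit every vertex?

    void addVertex(long id, String name) {
        names.put(id, name);
        if (status == Status.ENABLED) {
            byName.computeIfAbsent(name, k -> new TreeSet<>()).add(id);
        }
    }

    // Stands in for REINDEX followed by the index reaching ENABLED.
    void reindex() {
        byName.clear();
        names.forEach((id, name) -> byName.computeIfAbsent(name, k -> new TreeSet<>()).add(id));
        status = Status.ENABLED;
    }

    // Stands in for g.V().has('name', value).
    Set<Long> hasName(String value) {
        if (status == Status.ENABLED) {
            lastQueryScanned = false;
            return byName.getOrDefault(value, Set.of());
        }
        lastQueryScanned = true; // index not enabled: fall back to a full scan
        Set<Long> hits = new TreeSet<>();
        names.forEach((id, name) -> { if (name.equals(value)) hits.add(id); });
        return hits;
    }

    // Stands in for g.V() with no condition: there is no key to look up.
    Set<Long> all() {
        lastQueryScanned = true;
        return names.keySet();
    }

    public static void main(String[] args) {
        ToyGraph g = new ToyGraph();
        g.addVertex(4344L, "Baxter");
        g.hasName("Baxter");
        System.out.println("scanned before reindex: " + g.lastQueryScanned);
        g.reindex();
        g.hasName("Baxter");
        System.out.println("scanned after reindex: " + g.lastQueryScanned);
        g.all();
        System.out.println("scanned for unconditioned query: " + g.lastQueryScanned);
    }
}
```

Read against the question's code, the model suggests two things to check in addition to the index status: the traversals there carry no has(...) step, and the vertices are written with a "test" property rather than the indexed "_id" key, so a composite index on "_id" would have nothing to match.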