GraphDB preload of a TriG file fails with an error


I am trying to preload a TriG file of about 24 GB. This is the command I am using:

docker run -v $(pwd)/graphdb-data:/opt/graphdb/home \
       -v $(pwd)/preload:/opt/graphdb-import \
       --entrypoint /opt/graphdb/dist/bin/importrdf \
       -e GDB_JAVA_OPTS="-Xmx12g -Xms2g" \
       -e graphdb.page.cache.size=512m \
       -e graphdb.workers.limit=2 \
       -e graphdb.query.evaluation.mode=disk \
       -e graphdb.repository.index.enable=false \
       -e graphdb.compression.enabled=true \
       -e graphdb.use.native.jena.model=true -e graphdb.verify-literals=false \
       ontotext/graphdb:10.7.1 \
       preload -s --force --recursive -q /tmp -c /opt/graphdb-import/graphdb-repo.ttl /opt/graphdb-import/backup.trig

It appears to ingest the data, but toward the end I get the following error:

12:05:31.663 [resolver] INFO  c.ontotext.graphdb.importrdf.Preload - 310,000,000 statements ...
12:06:30.853 [resolver] INFO  c.ontotext.graphdb.importrdf.Preload - 320,000,000 statements ...
12:06:50.697 [monitor file position] INFO  c.ontotext.graphdb.importrdf.Preload - File backup.trig processed to position 23,774,363,648 from 26,664,959,488 bytes
12:07:54.327 [resolver] INFO  c.ontotext.graphdb.importrdf.Preload - 330,000,000 statements ...
12:08:50.702 [monitor file position] INFO  c.ontotext.graphdb.importrdf.Preload - File backup.trig processed to position 25,267,535,872 from 26,664,959,488 bytes
12:09:28.692 [resolver] INFO  c.ontotext.graphdb.importrdf.Preload - 340,000,000 statements ...
java.lang.NumberFormatException: empty String
    at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842)
    at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
    at java.base/java.lang.Double.parseDouble(Double.java:543)
    at org.eclipse.rdf4j.model.base.AbstractLiteral$NumberLiteral.parseDouble(AbstractLiteral.java:367)
    at java.base/java.util.Optional.map(Optional.java:265)
    at org.eclipse.rdf4j.model.base.AbstractLiteral.value(AbstractLiteral.java:100)
    at org.eclipse.rdf4j.model.base.AbstractLiteral.doubleValue(AbstractLiteral.java:141)
    at com.ontotext.graphdb.importrdf.Preload.processLiteral(Preload.java:2124)
    at com.ontotext.graphdb.importrdf.Resolver.write(Resolver.java:61)
    at com.ontotext.graphdb.importrdf.Resolver.createid(Resolver.java:124)
    at com.ontotext.graphdb.importrdf.Resolver.run(Resolver.java:203)
java.lang.InterruptedException
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
    at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
    at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:433)
    at com.ontotext.graphdb.importrdf.Preload$LocalHandler.handleStatement(Preload.java:434)
    at org.eclipse.rdf4j.rio.trig.TriGParser.reportStatement(TriGParser.java:248)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObject(TurtleParser.java:453)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parseObjectList(TurtleParser.java:374)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parsePredicateObjectList(TurtleParser.java:347)
    at org.eclipse.rdf4j.rio.trig.TriGParser.parseTriples(TriGParser.java:236)
    at org.eclipse.rdf4j.rio.trig.TriGParser.parseGraph(TriGParser.java:163)
    at org.eclipse.rdf4j.rio.trig.TriGParser.parseStatement(TriGParser.java:115)
    at org.eclipse.rdf4j.rio.turtle.TurtleParser.parse(TurtleParser.java:164)
    at org.eclipse.rdf4j.repository.util.RDFLoader.loadInputStreamOrReader(RDFLoader.java:304)
    at org.eclipse.rdf4j.repository.util.RDFLoader.load(RDFLoader.java:249)
    at com.ontotext.load.GraphdbRDFLoader.load(GraphdbRDFLoader.java:89)
    at com.ontotext.graphdb.importrdf.Preload.processSingleFileInternal(Preload.java:2103)
    at com.ontotext.graphdb.importrdf.Preload.lambda$processSingleFile$22(Preload.java:2053)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
12:10:32.001 [sorting] INFO  c.ontotext.graphdb.importrdf.Preload - Sorter thread finished.

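This file is a direct backup of a development instance. I am looking for a way to ignore type errors/inference. Is it possible to ignore such errors, or to load only the valid data, using GraphDB's preload command?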
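Note: the NumberFormatException: empty String in the trace is thrown while a literal is being parsed as a double. One plausible cause (an assumption, the log does not name the offending statement) is a literal with an empty lexical form and a numeric datatype, such as ""^^xsd:double. A rough sketch for locating such literals before the import is shown below. The function name find_empty_numeric_literals, the list of datatypes, and the assumption that the file uses either the xsd: prefix or the full XML Schema IRI are all illustrative; literals written with single quotes or triple quotes are not detected.

```shell
# Hypothetical helper: print (with line numbers) every line that contains an
# empty double-quoted literal typed as a numeric XSD datatype.
# Assumptions: datatypes are written as xsd:<type> or as the full XML Schema IRI;
# only double, float, decimal and integer are checked.
find_empty_numeric_literals() {
  # (^|[^\\"]) rules out an escaped quote (\") or a longer run of quotes before ""
  grep -nE '(^|[^\\"])""\^\^(xsd:|<http://www\.w3\.org/2001/XMLSchema#)(double|float|decimal|integer)' "$1"
}
```

With the volume mapping from the command above, the file would be on the host at preload/backup.trig, so a call could look like find_empty_numeric_literals preload/backup.trig | head. Scanning a file of this size takes a while, but it avoids re-running the whole import just to find the bad statement.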

sparql preload graphdb ontotext trig
1 Answer

You can try loading the data with the -s option removed and -p added. More about these options:

 -p,--partialLoad              allow partial load of file that contains corrupt line

 -s,--stopOnFirstError         stop process if the dataset contains a corrupt file
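
Applied to the command from the question, the change might look like the sketch below. Everything except the swap of -s for -p is copied from the question unchanged; the wrapper function name run_preload_partial is only illustrative, and whether a partial load actually gets past this particular exception has not been verified here.

```shell
# Sketch: the question's command with -s (stop on first error) replaced by
# -p (allow partial load). run_preload_partial is a hypothetical wrapper name.
run_preload_partial() {
  docker run -v "$(pwd)/graphdb-data:/opt/graphdb/home" \
         -v "$(pwd)/preload:/opt/graphdb-import" \
         --entrypoint /opt/graphdb/dist/bin/importrdf \
         -e GDB_JAVA_OPTS="-Xmx12g -Xms2g" \
         -e graphdb.page.cache.size=512m \
         -e graphdb.workers.limit=2 \
         -e graphdb.query.evaluation.mode=disk \
         -e graphdb.repository.index.enable=false \
         -e graphdb.compression.enabled=true \
         -e graphdb.use.native.jena.model=true -e graphdb.verify-literals=false \
         ontotext/graphdb:10.7.1 \
         preload -p --force --recursive -q /tmp -c /opt/graphdb-import/graphdb-repo.ttl /opt/graphdb-import/backup.trig
}
```

Run it from the directory that contains graphdb-data and preload by calling run_preload_partial.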