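I am trying to upload RDF data stored in a .ttl file on my computer to an Apache Jena Fuseki server. I run Fuseki as a standalone server, following the guidance on the Apache Jena Fuseki page (https://jena.apache.org/documentation/fuseki2/fuseki-webapp.html#fuseki-web-application) and an online article (https://medium.com/@fadirra/setting-up-jena-fuseki-with-update-in-windows-10-2c8a2802ee8f). The server appears to be running when I visit localhost:3030. The code I wrote to upload the data works fine for smaller files. For large files, however, the data is not uploaded. Looking at the server log, I found the following error: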
```
Caused by: java.lang.IllegalStateException: form too large > 20000000
    at org.eclipse.jetty.server.FormFields.checkMaxLength(FormFields.java:318) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.server.FormFields.parse(FormFields.java:307) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.server.FormFields.parse(FormFields.java:39) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.io.content.ContentSourceCompletableFuture.parse(ContentSourceCompletableFuture.java:104) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.server.handler.ContextHandler$ScopedContext.run(ContextHandler.java:1212) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.server.handler.ContextRequest$OnContextDemand.run(ContextRequest.java:74) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.util.thread.SerializedInvoker$Link.run(SerializedInvoker.java:191) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.server.internal.HttpConnection$DemandContentCallback.succeeded(HttpConnection.java:679) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99) ~[fuseki-server.jar:5.0.0]
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53) ~[fuseki-server.jar:5.0.0]
```
This is the code I use to upload the RDF data:
```
from SPARQLWrapper import SPARQLWrapper, POST

input_location = "C:/......../Added_Triples.ttl"
with open(input_location, 'r') as f:
    content = f.read()
#print(type(content))
# Drop the Turtle @prefix lines; the prefixes are re-declared in SPARQL syntax below
rdf_string_no_prefixes = "\n".join(line for line in content.split("\n") if not line.startswith("@prefix"))
update_query = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX CSRO: <http://www.semanticweb.org/aagr657/ontologies/2023/9/CraneSpaceRepresentationOntology#>
PREFIX LinkOnt: <http://purl.org/ConstructLinkOnt/LinkOnt#>
PREFIX bot: <https://w3id.org/bot#>
PREFIX expr: <https://w3id.org/express#>
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geom: <http://rdf.bg/geometry.ttl#>
PREFIX ifc: <https://standards.buildingsmart.org/IFC/DEV/IFC2X3/TC1/OWL>
PREFIX inst: <https://www.ugent.be/myAwesomeFirstBIMProject#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX sf: <http://www.opengis.net/ont/sf#>
PREFIX omg: <https://w3id.org/omg#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX lbd: <https://linkedbuildingdata.org/LBD#>
PREFIX props: <http://lbd.arch.rwth-aachen.de/props#>
PREFIX unit: <http://qudt.org/vocab/unit/>
PREFIX IFC4-PSD: <https://www.linkedbuildingdata.net/IFC4-PSD#>
PREFIX smls: <https://w3id.org/def/smls-owl#>
PREFIX fog: <https://w3id.org/fog#>
PREFIX cc: <http://creativecommons.org/ns#>
PREFIX dce: <http://purl.org/dc/elements/1.1/>
PREFIX express: <https://w3id.org/express#>
PREFIX list: <https://w3id.org/list#>
PREFIX vann: <http://purl.org/vocab/vann/>
PREFIX expr: <https://w3id.org/express#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX : <https://standards.buildingsmart.org/IFC/DEV/IFC2x3/TC1/OWL#>
INSERT DATA {
%s
}
""" % (rdf_string_no_prefixes)
sparql = SPARQLWrapper("http://localhost:3030/your-dataset/update")
sparql.setMethod(POST)
sparql.setQuery(update_query)
# Execute the SPARQL Update query
sparql.query()
```
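I have read some Stack Overflow questions about similar errors on other servers, where the suggestion was to edit the jetty.xml file. In my case, however, I cannot find any such file on my computer. As mentioned above, the code works perfectly well for smaller files; the problem only occurs with larger ones.

For now I split the larger RDF file into smaller chunks and upload them separately. However, this takes a lot of time, because the time needed for chunking keeps growing. So I do not want to use this as the solution.

Any help on how to solve this without chunking would be much appreciated. Ideally, I would like to upload the whole graph file in one go, in the shortest possible time.

I also tried the `requests.post` method, using the following code: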
```
import requests

file_location = "C:/.........../Added_Triples.ttl"
sparql_endpoint = "http://localhost:3030/construction_dataset_2/update" # Adjust the URL accordingly
headers = {'Content-Type': 'text/turtle;charset=utf-8'}
data = open(file_location, 'r').read()
response = requests.post(sparql_endpoint, headers=headers, data=data)
```
The error I am getting is as follows:
```
Exception has occurred: ConnectionError
('Connection aborted.', ConnectionAbortedError(10053, 'An established connection was aborted by the software in your host machine', None, 10053, None))
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine
```
Also, the server logs show the following:
```
11:40:03 INFO Fuseki :: [23] 415 Unsupported Media Type (0 ms)
11:40:03 INFO Fuseki :: [24] POST http://localhost:3030/construction_dataset_2/update
11:40:03 INFO Fuseki :: [24] 415 Unsupported Media Type (0 ms)
11:42:17 INFO Fuseki :: [25] POST http://localhost:3030/construction_dataset_2/update
```
Instead of using a form and INSERT DATA (here via SPARQLWrapper), try POSTing the file itself, with the Content-Type header set appropriately.
Or use an external process:
curl -XPOST -T DATA.ttl --header "Content-type: text/turtle" http://localhost:3030/ds
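
The direct file POST suggested above can also be done from Python. The sketch below is one possible way, not a confirmed fix: it uses only the standard library, streams the file from disk instead of reading it into a string, and sends it as `text/turtle` to the dataset URL, as in the curl command. Because the body is not sent as an HTML form, the form-size limit from the stack trace should not come into play. The function names `build_upload_request` and `upload_turtle` are hypothetical, and the endpoint is an assumption; the `/update` endpoint expects SPARQL Update requests, which would explain the 415 responses in the log when Turtle was posted to it.

```python
import os
import urllib.request


def build_upload_request(body, size, endpoint):
    """Build a POST request that sends `body` (bytes or a binary file) as Turtle."""
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            # Tells the server the body is Turtle, not a form or a SPARQL Update
            "Content-Type": "text/turtle; charset=utf-8",
            # An explicit length avoids chunked transfer encoding for file bodies
            "Content-Length": str(size),
        },
    )


def upload_turtle(path, endpoint):
    """Stream the file at `path` to `endpoint` and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with open(path, "rb") as f:  # binary mode: send the bytes exactly as stored
        request = build_upload_request(f, os.path.getsize(path), endpoint)
        with urllib.request.urlopen(request) as response:
            return response.status
```

A call might look like `upload_turtle("DATA.ttl", "http://localhost:3030/ds")`, with the file name and dataset name replaced by your own. With the `requests` library, passing an open binary file as `data=` streams the upload in a similar way.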
Or load the database (TDB2) before starting the server. That way you can use the TDB2 bulk loader.
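
If the bulk-loading route is driven from Python, it could be sketched as below. This assumes the Apache Jena command-line tools are installed and that `tdb2.tdbloader` is on the PATH; on Windows the distribution ships batch scripts instead, so the `loader` argument would need to point at the matching script. The helper names and the database directory are hypothetical. The server should be stopped while the loader runs, and afterwards Fuseki has to be started against the same database directory.

```python
import subprocess


def tdbloader_command(db_dir, files, loader="tdb2.tdbloader"):
    """Return the argument list for a TDB2 bulk load of `files` into `db_dir`."""
    return [loader, "--loc", db_dir, *files]


def bulk_load(db_dir, files, loader="tdb2.tdbloader"):
    """Run the bulk loader and raise CalledProcessError if it fails."""
    subprocess.run(tdbloader_command(db_dir, files, loader), check=True)
```

For example, `bulk_load("DB2", ["Added_Triples.ttl"])` would load the file into a database directory named `DB2` (an illustrative name).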