When reading an external Excel (.xlsx) file published by the UK government (see https://assets.publishing.service.gov.uk/media/5d8b3abded915d0373d3540f/File_1_-_IMD2019_Index_of_Multiple_Deprivation.xlsx), no exception is thrown, but an error is logged:
09:13:35.150 [nREPL-session-b3e5d112-80a2-45f3-9e26-095934eeb06a] ERROR org.apache.poi.openxml4j.opc.PackageRelationshipCollection - Cannot convert All%20of%20the%20data%20files%20and%20supporting%20documents%20for%20the%20English%20Indices%20of%20Deprivation%202019%20are%20available%20from:%20www.gov.uk/government/statistics/english-indices-of-deprivation-2019 in a valid relationship URI-> dummy-URI used
java.net.URISyntaxException: Illegal character in scheme name at index 3: All%20of%20the%20data%20files%20and%20supporting%20documents%20for%20the%20English%20Indices%20of%20Deprivation%202019%20are%20available%20from:%20www.gov.uk/government/statistics/english-indices-of-deprivation-2019
at java.base/java.net.URI$Parser.fail(URI.java:2976) ~[?:?]
at java.base/java.net.URI$Parser.checkChars(URI.java:3147) ~[?:?]
at java.base/java.net.URI$Parser.parse(URI.java:3173) ~[?:?]
at java.base/java.net.URI.<init>(URI.java:623) ~[?:?]
at org.apache.poi.openxml4j.opc.PackagingURIHelper.toURI(PackagingURIHelper.java:723) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:358) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:160) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:130) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:565) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:751) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:322) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:97) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.xssf.usermodel.XSSFWorkbookFactory.create(XSSFWorkbookFactory.java:36) ~[poi-ooxml-5.2.3.jar:5.2.3]
at org.apache.poi.ss.usermodel.WorkbookFactory.lambda$create$2(WorkbookFactory.java:224) ~[poi-5.2.3.jar:5.2.3]
at org.apache.poi.ss.usermodel.WorkbookFactory.wp(WorkbookFactory.java:329) [poi-5.2.3.jar:5.2.3]
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:224) [poi-5.2.3.jar:5.2.3]
at org.apache.poi.ss.usermodel.WorkbookFactory.create(WorkbookFactory.java:185) [poi-5.2.3.jar:5.2.3]
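I presume that the source file contains an error, or that some of its content has been wrongly classified as a URI, which java.net.URI then cannot parse. Despite the logged error, I can go on to read the file's contents with POI without any difficulty. I can suppress the error by changing the logging level, but is there any other way to prevent this error from being logged?
Here is a log4j2 configuration that suppresses the error logging for this error, but it seems a rather drastic step for what does not appear to be an exceptional condition in an otherwise good xlsx file.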
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
  <Appenders>
    <Console name="Console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} [%t] %-5level %logger{36} - %msg%n"/>
    </Console>
  </Appenders>
  <Loggers>
    <Root level="error">
      <AppenderRef ref="Console"/>
    </Root>
    <!-- Turn off ERROR logging from the internals of POI when a URI cannot be parsed -->
    <Logger name="org.apache.poi.openxml4j.opc.PackageRelationshipCollection" level="off"/>
  </Loggers>
</Configuration>
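When I download the file and open it in Excel, the whole text box on the first worksheet, "Notes", has a hyperlink set on it. But that link is broken, and even Excel cannot follow it. So Apache POI cannot handle it either. That is the cause of the error.
You can check this by hovering the mouse over an empty part of the text box. The hand cursor appears, and when you click, Excel reports that it cannot follow the link.
Since all of this is open source, one can look at the source code to follow the program flow. The stack trace leads to PackagingURIHelper.toURI: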
...
return new URI(value);
...
Here java.net.URI throws java.net.URISyntaxException if the given string value violates RFC 2396, as augmented by the deviations described in the documentation of the java.net.URI constructor.
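The failure can be reproduced with the JDK alone, without POI. The following is a minimal sketch, assuming a shortened stand-in for the hyperlink target from the log (the class name UriParseDemo, the helper failureIndex and the shortened string are illustrative, not taken from the file or from POI). Because the string contains a colon before the first slash, the parser treats everything before the colon as a scheme name and rejects the '%' character in it.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriParseDemo {

    // Returns the index at which java.net.URI rejects the string,
    // or -1 if the string parses without error.
    static int failureIndex(String value) {
        try {
            new URI(value);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the broken hyperlink target:
        // URL-encoded free text followed by "from:" and a web address.
        String broken = "All%20files%20from:%20www.gov.uk/x";
        String valid = "https://www.gov.uk/";

        System.out.println("broken fails at index " + failureIndex(broken));
        System.out.println("valid fails at index " + failureIndex(valid));
    }
}
```

The text before the first colon is read as the scheme, which is why the message in the log talks about an illegal character "in scheme name" rather than about the address itself.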
PackagingURIHelper.toURI is called from PackageRelationshipCollection.parseRelationshipsPart inside a try-catch block:
...
try {
    // when parsing of the given uri fails, we can either
    // ignore this relationship, which leads to IllegalStateException
    // later on, or use a dummy value and thus enable processing of the
    // package
    target = PackagingURIHelper.toURI(value);
} catch (URISyntaxException e) {
    LOG.atError().withThrowable(e).log("Cannot convert {} in a valid relationship URI-> dummy-URI used", value);
}
addRelationship(target, targetMode, type, id);
...
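The comment here explains why the URISyntaxException is only logged. Otherwise it would break the whole relationship-parsing process while reading the *.xlsx file. That would be much worse than an error log entry.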
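The trade-off described in that comment can be illustrated with the JDK alone. This is only a sketch of the general "log and fall back to a placeholder" pattern, not POI's actual code: the class name DummyUriFallback, the method toUriOrDummy and the placeholder value http://invalid.uri are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class DummyUriFallback {

    // Hypothetical placeholder used when a relationship target cannot be parsed.
    static final URI DUMMY = URI.create("http://invalid.uri");

    // Parses the value as a URI; on failure, reports the problem and
    // returns the placeholder so that processing can continue.
    static URI toUriOrDummy(String value) {
        URI target = DUMMY;
        try {
            target = new URI(value);
        } catch (URISyntaxException e) {
            // Stand-in for the ERROR log entry: report the problem, do not rethrow.
            System.err.println("Cannot convert " + value + ": " + e.getReason());
        }
        return target;
    }

    public static void main(String[] args) {
        // A well-formed relative target is returned unchanged.
        System.out.println(toUriOrDummy("worksheets/sheet1.xml"));
        // A malformed target is replaced by the placeholder.
        System.out.println(toUriOrDummy("All%20files%20from:%20www.gov.uk/x"));
    }
}
```

With this pattern the caller always receives a usable URI object, so one broken hyperlink does not stop the remaining relationships in the package from being read.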
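So this behavior is by design. One might argue that the log entry should not be at ERROR level, since the problem is not critical. But only the Apache POI developers can change that. So file a bug report or a change request with Apache POI.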