I downloaded tika-app to test it:
java -jar tika-app-2.9.2.jar --metadata test.xlsx
Content-Length: 9217
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
X-TIKA:Parsed-By: org.apache.tika.parser.DefaultParser
X-TIKA:Parsed-By: org.apache.tika.parser.microsoft.ooxml.OOXMLParser
X-TIKA:origResourceName: C:\Users\users\Documents\
dc:creator: daniele grillo
dc:publisher:
dcterms:created: 2024-04-17T07:44:01Z
dcterms:modified: 2024-04-17T13:58:35Z
extended-properties:AppVersion: 16.0300
extended-properties:Application: Microsoft Excel
extended-properties:Company:
extended-properties:DocSecurityString: None
meta:last-author: daniele grillo
protected: false
resourceName: test.xlsx
Then I ran the command
java -jar tika-app-2.9.2.jar --text test.xlsx
and this is the output:
Foglio1
date name
2/9/72 one
2/10/98 two
1/3/09 three
1/1/00 four
4/11/00 five
I have read that a tika-config.xml can be passed to control the parsers:
java -jar /tika-app-2.9.2.jar --text test.xlsx --config=tika-config.xml
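For the dates, I would like the output in dd/mm/yyyy format, as they are displayed in the .XLSX file.
Is that possible? If so, how?
I tried this tika-config.xml, but the output is the same: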
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mime>
      <parser-exclude class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser"/>
    </parser>
  </parsers>
  <dateFormats>
    <dateFormat>dd/MM/yyyy</dateFormat>
  </dateFormats>
</properties>
OOXMLParser has a setDateFormatOverride(String) method inherited from AbstractOfficeParser.
This parameter can be set inside the parser's <params> element:
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <parser class="org.apache.tika.parser.microsoft.ooxml.OOXMLParser">
      <params>
        <param name="dateFormatOverride" type="string">dd/mm/yyyy</param>
      </params>
    </parser>
  </parsers>
</properties>
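One thing that may look odd is the lowercase mm in dd/mm/yyyy. As far as I can tell from the Javadoc for this setting, the override is an Excel-style format string rather than a Java SimpleDateFormat pattern, so mm is read as the month there. In plain Java the same letters mean minutes, which is worth keeping in mind if you reuse the pattern elsewhere. A minimal sketch of that difference using only the JDK (the class name PatternLetters and the sample date are made up for illustration and are not part of Tika):

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class PatternLetters {

    // Formats a date with a java.text.SimpleDateFormat pattern
    // (this is NOT how Tika interprets dateFormatOverride).
    static String format(String pattern, Date date) {
        return new SimpleDateFormat(pattern, Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        // Arbitrary example date: 9 February 1972, 10:45 local time.
        Date date = new GregorianCalendar(1972, Calendar.FEBRUARY, 9, 10, 45).getTime();

        // In SimpleDateFormat, uppercase MM is the month...
        System.out.println(format("dd/MM/yyyy", date)); // 09/02/1972

        // ...while lowercase mm is the minute of the hour.
        System.out.println(format("dd/mm/yyyy", date)); // 09/45/1972
    }
}
```

So dd/mm/yyyy is the form to try in the Tika config, while dd/MM/yyyy would be the equivalent if you later format the dates yourself in Java.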