XML to Hive table: SAXParseException


I have the following XML:

<tns:TAG>
<REQUEST_ID>1</REQUEST_ID>
<APPLICATION_ID>2</APPLICATION_ID>
<EXTERNAL_SYSTEM_CODE>CF</EXTERNAL_SYSTEM_CODE>
<CCM_CHECK>
<CCM_CHECK_ID>44</CCM_CHECK_ID>
<CCM_CHECK_RESULT>21</CCM_CHECK_RESULT>
</CCM_CHECK>
</tns:TAG>

If I remove the tns: prefix from it, I can create a Hive table that reads it, like this.

But if I leave the prefix in, I get the following error:

java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.lang.RuntimeException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 42; The prefix "tns" for element "tns:WSCCMVerifyApplicationResultRequest" is not bound.

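The only thing I can think of is to preprocess the file beforehand and remove all of these tns: prefixes. I imagine something like regexp_replace() would do it. But my question is: is there another way, keeping the table as I have currently created it?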
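A minimal sketch of the preprocessing idea, written in Java because that is what the stack trace above runs on. The class name StripPrefix and the method stripTns are hypothetical. It assumes tns is the only prefix in the file and that the text "<tns:" never appears inside element content; a regular expression is not an XML parser, so this is a quick workaround rather than a general fix.

```java
import java.util.regex.Pattern;

public class StripPrefix {

    // Matches "<tns:" (start tags) and "</tns:" (end tags); group 1 keeps "<" or "</".
    // Assumption: "tns" is the only prefix used, and "<tns:" never occurs in text content.
    private static final Pattern TNS_PREFIX = Pattern.compile("(</?)tns:");

    // Hypothetical helper: drops the tns: prefix from element names only.
    static String stripTns(String xml) {
        return TNS_PREFIX.matcher(xml).replaceAll("$1");
    }

    public static void main(String[] args) {
        String xml = "<tns:TAG>\n"
                + "  <REQUEST_ID>1</REQUEST_ID>\n"
                + "</tns:TAG>";
        System.out.println(stripTns(xml));
    }
}
```

Hive's regexp_replace() is built on Java regular expressions, so the same pattern '(</?)tns:' with the replacement '$1' should carry over, although that is untested here.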

1 Answer

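Declaring the namespace in your file will eliminate this error: the tns prefix has to be bound to a namespace URI with an xmlns:tns attribute on the element that uses it. Something like this: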

<tns:TAG xmlns:tns="http://localhost">
  <REQUEST_ID>1</REQUEST_ID>
  <APPLICATION_ID>2</APPLICATION_ID>
  <!-- remaining elements unchanged -->
</tns:TAG>
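To see why the declaration matters, here is a small check using the JDK's built-in DOM parser with namespace awareness switched on. It is an assumption that the SerDe parses records in a comparable way (its internals aren't shown here), but the error in the question has the same Xerces wording this parser produces for an undeclared prefix. The class name PrefixBindingCheck is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrefixBindingCheck {

    // Returns true if a namespace-aware parser accepts the document.
    static boolean parsesWithNamespaces(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // every prefix must now be bound to a URI
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            // For an undeclared prefix this is a "... is not bound." message.
            System.out.println("SAXParseException: " + e.getMessage());
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O failures are not what this check is about.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unbound = "<tns:TAG><REQUEST_ID>1</REQUEST_ID></tns:TAG>";
        String bound = "<tns:TAG xmlns:tns=\"http://localhost\"><REQUEST_ID>1</REQUEST_ID></tns:TAG>";
        System.out.println("without xmlns:tns: " + parsesWithNamespaces(unbound));
        System.out.println("with xmlns:tns:    " + parsesWithNamespaces(bound));
    }
}
```

Only the xmlns:tns attribute differs between the two inputs, which is the whole fix suggested above.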

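Update: a table definition that reads the prefixed file, matching elements with local-name() so the tns prefix does not get in the way: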

CREATE EXTERNAL TABLE myxml(
      request_id string
  , application_id string
  , external_system_code string
  , ccm_check map<string, string>
  , verif_answers array<map<string, string>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.request_id"="//*[local-name()='REQUEST_ID']/text()",
  "column.xpath.application_id"="//*[local-name()='APPLICATION_ID']/text()",
  "column.xpath.external_system_code"="//*[local-name()='EXTERNAL_SYSTEM_CODE']/text()",
  "column.xpath.ccm_check"="//*[local-name()='CCM_CHECK']/*",
  "column.xpath.verif_answers"="//*[local-name()='VERIF_ANSWERS']/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 'file:///home/cloudera/xmlfiles'
TBLPROPERTIES (
  "xmlinput.start"="<tns:TAG",
  "xmlinput.end"="</tns:TAG>"
);
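The column XPaths above use local-name() so that an element is matched by its name regardless of prefix or namespace. This sketch evaluates the same kind of expression with the JDK's javax.xml.xpath API, not with XmlSerDe itself, so it only illustrates the XPath behaviour; LocalNameXPath and evaluate are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class LocalNameXPath {

    // Parses the document namespace-aware and returns the XPath result as a string.
    static String evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so tns:TAG gets local name "TAG"
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<tns:TAG xmlns:tns=\"http://localhost\">"
                + "<REQUEST_ID>1</REQUEST_ID>"
                + "<APPLICATION_ID>2</APPLICATION_ID>"
                + "<CCM_CHECK><CCM_CHECK_ID>44</CCM_CHECK_ID>"
                + "<CCM_CHECK_RESULT>21</CCM_CHECK_RESULT></CCM_CHECK>"
                + "</tns:TAG>";
        // Same expression shape as "column.xpath.request_id" in the table definition.
        System.out.println(evaluate(xml, "//*[local-name()='REQUEST_ID']/text()"));
        // The prefixed root is found by local name...
        System.out.println(evaluate(xml, "count(//*[local-name()='TAG'])"));
        // ...but not by a bare name test, because it lives in the http://localhost namespace.
        System.out.println(evaluate(xml, "count(//TAG)"));
    }
}
```

The unprefixed children such as REQUEST_ID are in no namespace, so a plain //REQUEST_ID would also find them; local-name() is simply the form that keeps working whichever elements carry the prefix.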