Creating a Hive table for nested JSON data


I am unable to load nested JSON data into a Hive table. Here is what I have tried.

Sample input:

{"DocId":"ABC","User1":{"Id":1234,"Username":"sam1234","Name":"Sam","ShippingAddress":{"Address1":"123 Main St.","Address2":null,"City":"Durham","State":"NC"},"Orders":[{"ItemId":6789,"OrderDate":"11/11/2012"},{"ItemId":4352,"OrderDate":"12/12/2012"}]}}

On Hive (CDH3):

ADD JAR /usr/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar;

CREATE TABLE json_tab(
    DocId string,
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
STORED AS TEXTFILE;  

hive> select * from json_tab;
OK
NULL    null

All I get here is NULL.

I also tried the HCatalog JAR:

ADD JAR /home/training/Desktop/hcatalog-core-0.11.0.jar;
 
CREATE TABLE json_tab(
    DocId string,
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

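But my CREATE TABLE statement fails with the following error:

FAILED: Error in metadata: Cannot validate serde: org.apache.hive.hcatalog.data.JsonSerDe
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

What can I try to fix this?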

json hadoop hive hdfs
3 Answers

4 votes

You can use the org.openx.data.jsonserde.JsonSerDe class to read the JSON data.

Download the JAR file from http://www.congiu.net/hive-json-serde/1.3.6-SNAPSHOT/cdh4/ and then run the following steps:

add jar /path/to/jar/json-serde-1.3.6-jar-with-dependencies.jar;

CREATE TABLE json_tab(
    DocId string,
    user1 struct<Id: int, Username: string, Name:string,ShippingAddress:struct<address1:string,address2:string,city:string,state:string>,orders:array<struct<ItemId:int,orderdate:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe';

LOAD DATA LOCAL INPATH  '/path/to/data/nested.json' INTO TABLE json_tab;

SELECT DocId, User1.Id, User1.ShippingAddress.City as city,
User1.Orders[0].ItemId as order0id,
User1.Orders[1].ItemId as order1id from json_tab;


Result:
ABC     1234    Durham  6789    4352
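
Indexing Orders[0] and Orders[1] only works when the number of orders is known in advance. A minimal sketch of a common alternative, assuming the json_tab table defined above, uses Hive's LATERAL VIEW with explode() to produce one row per array element. The aliases order_view and ord are illustrative names, not part of the original answer.

```sql
-- One output row per element of the user1.orders array.
-- order_view and ord are hypothetical aliases chosen for this sketch.
SELECT DocId,
       user1.Id      AS user_id,
       ord.ItemId    AS item_id,
       ord.OrderDate AS order_date
FROM json_tab
LATERAL VIEW explode(user1.orders) order_view AS ord;
```

For the sample record this should return two rows, one for each entry in Orders.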

0 votes

I was getting the same exception.

I added the following JARs, and it worked for me:

ADD JAR /home/cloudera/Data/json-serde-1.3.7.3.jar;
ADD JAR /home/cloudera/Data/hive-hcatalog-core-0.13.0.jar;
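
The "Cannot validate serde" error generally means Hive could not load the named class from its classpath. One plausible cause in the question, offered here as an assumption rather than a confirmed diagnosis, is that HCatalog releases up to 0.11 used the package org.apache.hcatalog, while the org.apache.hive.hcatalog name only appears in later JARs such as the 0.13.0 one above. A sketch of how the registration could be checked, with json_tab_hcat as a hypothetical table name:

```sql
-- Register the HCatalog JAR that is assumed to ship the org.apache.hive.hcatalog package.
ADD JAR /home/cloudera/Data/hive-hcatalog-core-0.13.0.jar;

-- Show the JARs added to the current session, to confirm the path was accepted.
LIST JARS;

-- json_tab_hcat is a hypothetical name used only for this sketch.
CREATE TABLE json_tab_hcat (
    DocId string,
    user1 struct<Id:int, Username:string, Name:string,
                 ShippingAddress:struct<address1:string, address2:string, city:string, state:string>,
                 orders:array<struct<ItemId:int, orderdate:string>>>
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- The "SerDe Library" row of the output should name the class given above.
DESCRIBE FORMATTED json_tab_hcat;
```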

0 votes

To analyze JSON files with HiveQL, you need one of the following SerDes:

org.openx.data.jsonserde.JsonSerDe
org.apache.hive.hcatalog.data.JsonSerDe

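org.apache.hive.hcatalog.data.JsonSerDe
This is Apache's default JSON SerDe. It is commonly used to process JSON data such as events, which are represented as blocks of JSON-encoded text separated by newlines. The Hive JSON SerDe does not allow duplicate keys in map or struct key names.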

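org.openx.data.jsonserde.JsonSerDe
The OpenX JSON SerDe is similar to the native Apache one; however, it offers several optional properties, such as "ignore.malformed.json" and "case.insensitive". In my opinion, it usually works better with nested JSON files.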
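
As a sketch of how such properties are passed, the table below sets ignore.malformed.json through WITH SERDEPROPERTIES. It assumes the OpenX JAR has already been added to the session, and json_tab_lenient is a hypothetical table name.

```sql
-- json_tab_lenient is a hypothetical name used only for this sketch.
-- 'ignore.malformed.json' is meant to make the SerDe tolerate lines that are
-- not valid JSON instead of failing the whole query.
CREATE TABLE json_tab_lenient (
    DocId string,
    user1 struct<Id:int, Username:string, Name:string,
                 ShippingAddress:struct<address1:string, address2:string, city:string, state:string>,
                 orders:array<struct<ItemId:int, orderdate:string>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
STORED AS TEXTFILE;
```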

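See the working example below:

-- Address2 is null in the sample record; STRING is assumed to match Address1.
-- Orders is a JSON array of objects, hence ARRAY<STRUCT<...>>.
CREATE EXTERNAL TABLE IF NOT EXISTS `dbname`.`tablename` (
  `DocId` STRING,
  `User1` STRUCT<
    `Id`:INT,
    `Username`:STRING,
    `Name`:STRING,
    `ShippingAddress`:STRUCT<`Address1`:STRING, `Address2`:STRING, `City`:STRING, `State`:STRING>,
    `Orders`:ARRAY<STRUCT<`ItemId`:INT, `OrderDate`:STRING>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://awsexamplebucket1-logs/AWSLogs/';

The CREATE TABLE statement was generated with: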

https://www.hivetablegenerator.com/
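
When a SerDe-backed table returns only NULL, it can help to cross-check the raw data without any SerDe. A minimal sketch, assuming the nested.json file from the first answer and a hypothetical single-column table json_raw, uses Hive's built-in get_json_object function:

```sql
-- json_raw is a hypothetical staging table: one JSON document per line, stored as plain text.
CREATE TABLE json_raw (line string)
STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH '/path/to/data/nested.json' INTO TABLE json_raw;

-- Extract nested values with JSON path expressions. A NULL here suggests that
-- the line is not valid JSON or that the path does not match the key names.
SELECT get_json_object(line, '$.DocId')                      AS doc_id,
       get_json_object(line, '$.User1.Id')                   AS user_id,
       get_json_object(line, '$.User1.ShippingAddress.City') AS city,
       get_json_object(line, '$.User1.Orders[0].ItemId')     AS first_item_id
FROM json_raw;
```

If these paths return values while the SerDe table still shows NULL, the table definition or the SerDe JAR is the more likely place to look than the data file.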
