Before I get to the question, here is the setup:
Table 1
CREATE EXTERNAL TABLE `table1`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string,
`load_dt` string)
PARTITIONED BY (
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654609315')
Table 2
CREATE EXTERNAL TABLE `table2`(
`mac_address` string,
`node` string,
`wave_found` string,
`wave_data` string,
`calc_dt` string)
PARTITIONED BY (
`load_dt` string,
`site_id` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://foobucket/object-thing'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1654147830')
Running the following Athena SQL throws the error below:
insert into table2
select * from table1;
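"HIVE_UNSUPPORTED_FORMAT: Output Format org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat for SerDe org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is not supported."
The error looks fairly self-explanatory, but I'm still stuck on building a solution, even after searching for an alternative to HiveIgnoreKeyTextOutputFormat. The two tables are also partitioned differently, but I'm not sure whether that has anything to do with the error shown here.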
I believe you should use the following SerDe/format combination:
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
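Put together, a minimal sketch of table2 recreated with that combination might look like this. It reuses the question's columns and partition keys. The LOCATION is a hypothetical placeholder (a separate prefix, so table2's files do not land among table1's). The INSERT lists the columns explicitly because Athena maps SELECT columns to the target table by position, with the partition columns last, in PARTITIONED BY order.

```sql
-- Sketch only: run each statement separately in Athena.
-- The LOCATION below is a hypothetical placeholder; point it at your own prefix.
CREATE EXTERNAL TABLE `table2`(
  `mac_address` string,
  `node` string,
  `wave_found` string,
  `wave_data` string,
  `calc_dt` string)
PARTITIONED BY (
  `load_dt` string,
  `site_id` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://foobucket/object-thing-table2/';

-- Columns are matched by position: regular columns first,
-- then the partition columns in PARTITIONED BY order (load_dt, site_id).
INSERT INTO table2
SELECT mac_address, node, wave_found, wave_data, calc_dt, load_dt, site_id
FROM table1;
```

With table1's column order, this SELECT list yields the same order as the original select *; spelling it out simply makes the partition mapping explicit.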
You can also create table2 with CTAS and skip the manual create/insert altogether:
create table table2
with
(
format='parquet',
parquet_compression='snappy',
partitioned_by=array['load_dt', 'site_id'],
external_location = 's3://foobucket/object-thing-table2/'
)
as
select * from table1;
If you now run
show create table table2;
you will see that the CTAS generated MapredParquetInputFormat + MapredParquetOutputFormat.
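One caveat worth checking: Athena caps CTAS and INSERT INTO at 100 partitions written per query. If table1 holds more than 100 distinct (load_dt, site_id) combinations, the CTAS above fails. The usual workaround is to create the table from one slice of the data and append the rest in batches. A sketch follows. The date boundaries are hypothetical placeholders, and the range filters assume load_dt holds ISO-formatted date strings, so that string comparison matches date order.

```sql
-- Sketch only: run each statement separately in Athena.
-- Step 1: create table2 from a first slice (boundary value is hypothetical).
create table table2
with (
  format = 'parquet',
  parquet_compression = 'snappy',
  partitioned_by = array['load_dt', 'site_id'],
  external_location = 's3://foobucket/object-thing-table2/'
)
as
select * from table1
where load_dt < '2022-06-01';

-- Step 2: append the next slice; repeat with further ranges,
-- keeping each query at or under 100 partitions.
insert into table2
select * from table1
where load_dt >= '2022-06-01' and load_dt < '2022-07-01';
```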