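I have a directory, for example `/user/name/folder`. Inside this folder I have more folders named `dt=2020-06-01`, `dt=2020-06-02`, `dt=2020-06-03`, and so on. These folders contain Parquet files, all with the same schema.

Is it possible to create an Impala table using `/user/name/folder`? Every time I try, I get a table with 0 records. Is there a way to tell Impala to pick up the Parquet files from all the subdirectories?
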
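One approach is to load the data through static partitioning, where you define the different partitions manually. With static partitioning, you create each partition with an `ALTER TABLE … ADD PARTITION` statement and then load the data into it.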
````mysql
CREATE TABLE customers_by_date
(cust_id STRING, name STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

-- Attach the existing folder as the partition's data directory
ALTER TABLE customers_by_date ADD PARTITION (dt='2020-06-01') LOCATION '/user/name/folder/dt=2020-06-01';
````
If no location is specified, Impala creates the partition directory under the table's own data directory:
````mysql
ALTER TABLE customers_by_date
ADD PARTITION (dt='2020-06-01');
````
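You can also copy the data into the partition directory with HDFS commands, then run `REFRESH customers_by_date` in Impala so it picks up the new files: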
````
$ hdfs dfs -cp /user/name/folder/dt=2020-06-01 /user/directory_impala/table/partition
````
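Because there is one folder per date, the `ADD PARTITION` statement with an explicit location has to be repeated for every `dt` value. A minimal sketch of generating those statements with a shell loop is below. The helper name `gen_add_partitions` is made up for illustration, and the table name and base directory are taken from the example above; adjust them to your setup.

```shell
#!/bin/sh
# Print one ADD PARTITION statement per dt value.
# Usage: gen_add_partitions TABLE BASE_DIR DT [DT ...]
# gen_add_partitions is a hypothetical helper, not part of Impala.
gen_add_partitions() {
  table=$1
  base_dir=$2
  shift 2
  for dt in "$@"; do
    # IF NOT EXISTS makes it safe to re-run for partitions already added
    printf "ALTER TABLE %s ADD IF NOT EXISTS PARTITION (dt='%s') LOCATION '%s/dt=%s';\n" \
      "$table" "$dt" "$base_dir" "$dt"
  done
}

# Example: statements for the three folders mentioned in the question
gen_add_partitions customers_by_date /user/name/folder \
  2020-06-01 2020-06-02 2020-06-03
```

The output could be redirected to a file and run with `impala-shell -f`. The function only prints text, so nothing changes in Impala until you execute the statements yourself.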
See the Cloudera documentation for further details:
[Impala Create table statement][1]
[Impala Alter table statement][2]
[1]: https://docs.cloudera.com/documentation/enterprise/5-9-x/topics/impala_create_table.html#create_table
[2]: https://docs.cloudera.com/documentation/enterprise/5-9-x/topics/impala_alter_table.html#alter_table