In my Presto (358) installation I have two working Hive connectors.
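Everything works fine, but when I run DROP (TABLE/SCHEMA) or DELETE FROM, the deletion happens only in the metastore and the data is not physically deleted. This happens with both S3 and ABFS.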
This becomes quite a problem when replacing data:
> DROP TABLE hive.abc;
-- ok
> CREATE TABLE hive.abc AS (...)
-- ERROR: Target directory 'abc' already exists.
The same happens when dropping partitions, etc.
Is there a way to actually delete the data?
Found the solution. The key difference is setting location on the schema versus setting external_location on its tables.
CREATE SCHEMA hive.xyz WITH (location = 'abfs://...');
CREATE TABLE hive.xyz.test AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE physically deleted
CREATE SCHEMA hive.xyz;
CREATE TABLE hive.xyz.test
WITH (external_location = 'abfs://...')
AS SELECT (...);
DELETE FROM hive.xyz.test WHERE TRUE;
-- Data ARE NOT physically deleted.
Conclusion: setting external_location on a table prevents its data from being physically deleted.
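A minimal sketch of how the replace-the-data scenario from the question could look once the table is managed, assuming the schema hive.xyz was created with location (as in the first example above) and the table is created without external_location. SHOW CREATE TABLE is used here to check whether an existing table carries an external_location property; SELECT (...) stands for your actual query and is left elided, as in the original examples.

```sql
-- Inspect the table definition: if the WITH clause lists
-- external_location, the table is external and DROP/DELETE
-- only affect the metastore, not the files in storage.
SHOW CREATE TABLE hive.xyz.test;

-- For a managed table (no external_location), dropping it is expected
-- to remove its data directory as well, so re-creating the table should
-- no longer fail with "Target directory ... already exists".
DROP TABLE hive.xyz.test;
CREATE TABLE hive.xyz.test AS SELECT (...);
```

If SHOW CREATE TABLE does list external_location, the files under that path have to be removed from S3/ABFS by other means before the same location can be reused.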