How to get detailed information for all tables under a specific schema

Question

I use the following query in Hive to get table details, but I have not found an equivalent in Athena.

use schema_name;
SHOW TABLE EXTENDED LIKE '*'

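As part of the output of the above query, I get values for the following attributes for each table: table name, owner, location, input format, output format, partitioned, partition columns, totalFileSize, maxFileSize, minFileSize, lastAccessTime, lastUpdateTime.

I want all of the above details in Athena, and this is the approach I am following.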

-- Lists all the tables under the logging schema.
SELECT table_name FROM information_schema.tables WHERE table_schema = 'logging';
-- Returns the details as plain text for one table; the output can be parsed to extract the relevant data.
-- Repeat for every other table under the schema.
DESCRIBE EXTENDED AwsDataCatalog.logging.logtable1;

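The limitation of the above queries is that we have to run a query for each table rather than getting the details for all tables at once.

Is there a better way to query and get the required information?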

Answer 1

Yes, you are right.

Athena can currently run only one DESCRIBE command at a time.

You can try an external script like the following:

import boto3

# Initialize Athena client
client = boto3.client('athena', region_name='your-region')

# List of tables you want to describe
tables = ['table1', 'table2', 'table3']

for table in tables:
    query = f"DESCRIBE EXTENDED database_name.{table}"
    
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={'Database': 'database_name'},
        ResultConfiguration={'OutputLocation': 's3://your-bucket/athena-results/'}
    )
    
    print(f"Started query for table {table}. QueryExecutionId: {response['QueryExecutionId']}")
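The script above only starts each query; it does not wait for it or read the output. The following is a minimal sketch of how the results might be collected, assuming the standard boto3 Athena calls `get_query_execution` and `get_query_results`. The helper names (`wait_for_query`, `fetch_rows`, `describe_tables`) and the polling limits are hypothetical and chosen for illustration only.

```python
import time


def wait_for_query(client, query_execution_id, poll_seconds=1.0, max_polls=60):
    """Poll Athena until the query reaches a terminal state and return that state."""
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Query {query_execution_id} did not finish in time")


def fetch_rows(client, query_execution_id):
    """Return all result rows as lists of cell strings, following NextToken pages."""
    rows, token = [], None
    while True:
        kwargs = {"QueryExecutionId": query_execution_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # A cell without a value has no 'VarCharValue' key.
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows


def describe_tables(client, database, tables, output_location, poll_seconds=1.0):
    """Run DESCRIBE EXTENDED for each table, one at a time, and collect the raw rows."""
    results = {}
    for table in tables:
        response = client.start_query_execution(
            QueryString=f"DESCRIBE EXTENDED {database}.{table}",
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_location},
        )
        query_id = response["QueryExecutionId"]
        state = wait_for_query(client, query_id, poll_seconds=poll_seconds)
        if state != "SUCCEEDED":
            raise RuntimeError(f"DESCRIBE for {table} ended in state {state}")
        results[table] = fetch_rows(client, query_id)
    return results
```

With a real client this could be called as `describe_tables(boto3.client('athena', region_name='your-region'), 'database_name', tables, 's3://your-bucket/athena-results/')`. The rows come back as plain text and still need to be parsed to pull out individual attributes.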

Answer 2

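Option 1: Make fuller use of the information_schema tables

Athena provides a set of metadata tables in information_schema that can help retrieve some metadata, although not in as much detail as Hive.

For example, you can collect basic information such as the table name, schema, catalog, and type as follows:

SELECT table_name, table_schema, table_catalog, table_type
FROM information_schema.tables
WHERE table_schema = 'your_schema_name';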

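Option 2: Use AWS Glue metadata through Athena

Athena uses AWS Glue as its data catalog, and Glue stores detailed metadata about tables. To get more detailed information, such as the input format, output format, and partition columns, you can access the Glue Data Catalog directly through Athena by querying the Glue catalog tables.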

SELECT table_name,
       owner,
       parameters['EXTERNAL'] AS external,
       parameters['transient_lastDdlTime'] AS last_update_time,
       parameters['numFiles'] AS total_files,
       parameters['totalSize'] AS total_size,
       parameters['maxFileSize'] AS max_file_size,
       parameters['minFileSize'] AS min_file_size,
       location
FROM information_schema.tables
WHERE table_schema = 'your_schema_name';

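The parameters field is a map that holds assorted metadata about the table, which may include the formats (input format, output format) and partition details, depending on the table structure. Note that this query assumes the owner, parameters, and location columns are exposed; Athena's information_schema.tables typically lists only table_catalog, table_schema, table_name, and table_type, so check the available columns first and fall back to Option 3 if they are missing.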

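Option 3: Programmatic access with the AWS SDK/CLI or Boto3

If the SQL queries above do not meet your needs in Athena, you can use the AWS Glue API (through Boto3 or the AWS CLI) to get detailed metadata about tables. This approach gives you more flexibility for collecting all the metadata for multiple tables at once.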

aws glue get-table --database-name your_database --name your_table

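import boto3

client = boto3.client('glue')

# A single get_tables call may return only the first page of tables
response = client.get_tables(DatabaseName='your_database')

for table in response['TableList']:
    # Owner is optional in Glue, so read it with .get()
    print(table['Name'], table.get('Owner'), table['StorageDescriptor']['Location'], ...)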

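This way, you can programmatically collect all the details for all tables, such as the location, input/output formats, partition columns, and so on.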
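As a rough sketch of how the Glue approach could cover every table in a database, the code below pages through `get_tables` with a boto3 paginator and flattens each table definition into the Hive-style fields from the question. The function names and output keys are hypothetical. It assumes the usual Glue table fields (`Name`, `Owner`, `StorageDescriptor`, `PartitionKeys`, `LastAccessTime`, `UpdateTime`); file-size statistics are left out because Glue does not reliably store them.

```python
def summarize_table(table):
    """Flatten one Glue table definition into Hive-style fields (missing values become None)."""
    storage = table.get("StorageDescriptor", {})
    partition_keys = table.get("PartitionKeys", [])
    return {
        "table_name": table.get("Name"),
        "owner": table.get("Owner"),
        "location": storage.get("Location"),
        "input_format": storage.get("InputFormat"),
        "output_format": storage.get("OutputFormat"),
        "partitioned": bool(partition_keys),
        "partition_columns": [key["Name"] for key in partition_keys],
        "last_access_time": table.get("LastAccessTime"),
        "last_update_time": table.get("UpdateTime"),
    }


def list_table_details(glue_client, database):
    """Collect a summary for every table in the database, across all result pages."""
    paginator = glue_client.get_paginator("get_tables")
    details = []
    for page in paginator.paginate(DatabaseName=database):
        details.extend(summarize_table(table) for table in page["TableList"])
    return details
```

For example, `list_table_details(boto3.client('glue'), 'your_database')` would return one dictionary per table, which can then be printed or written to a CSV file.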
