Does AWS Athena partition projection support multiple `storage.location.template`s?


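The CloudTrail trail managed by AWS Control Tower wrote account trail logs to the S3 bucket under the log path `/org id/AWSLogs/…`. The Landing Zone 3.0 update replaced these with organization trail logs, which use the new log path `/org id/AWSLogs/org id/…`.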

https://docs.aws.amazon.com/controltower/latest/userguide/2022-all.html

This is a problem for AWS Athena partition projection. The original DDL is:

```sql
CREATE EXTERNAL TABLE cloudtrail_logs_partition_projected(
    eventVersion STRING,
    userIdentity STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        invokedBy: STRING,
        accessKeyId: STRING,
        userName: STRING,
        sessionContext: STRUCT<
            attributes: STRUCT< mfaAuthenticated: STRING, creationDate: STRING>,
            sessionIssuer: STRUCT< type: STRING, principalId: STRING, arn: STRING, accountId: STRING, userName: STRING>>>,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    awsRegion STRING,
    sourceIpAddress STRING,
    userAgent STRING,
    errorCode STRING,
    errorMessage STRING,
    requestParameters STRING,
    responseElements STRING,
    additionalEventData STRING,
    requestId STRING,
    eventId STRING,
    readOnly STRING,
    resources ARRAY<STRUCT< arn: STRING, accountId: STRING, type: STRING>>,
    eventType STRING,
    apiVersion STRING,
    recipientAccountId STRING,
    serviceEventDetails STRING,
    sharedEventID STRING,
    vpcEndpointId STRING )
PARTITIONED BY ( `accountid` string, `region` string, `date_created` string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<s3-bucket-name>/<org-id>/AWSLogs/'
TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.accountid.type'='injected',
  'projection.region.type'='enum',
  'projection.region.values'='eu-north-1,ap-south-1,eu-west-3,eu-west-2,eu-west-1,ap-northeast-3,ap-northeast-2,ap-northeast-1,sa-east-1,ca-central-1,ap-southeast-1,ap-southeast-2,eu-central-1,us-east-1,us-east-2,us-west-1,us-west-2',
  'projection.date_created.format'='yyyy/MM/dd',
  'projection.date_created.interval'='1',
  'projection.date_created.interval.unit'='DAYS',
  'projection.date_created.range'='2021/01/01,NOW',
  'projection.date_created.type'='date',
  'storage.location.template'='s3://<s3-bucket-name>/<org-id>/AWSLogs/${accountid}/CloudTrail/${region}/${date_created}')
```

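Since `LOCATION` and `storage.location.template` differ between the older and newer S3 objects (i.e. the logs delivered by CloudTrail), what is the best way to query both the old and the new CloudTrail logs? I would prefer a single Athena table covering both, but I am not sure whether multiple `LOCATION`s are supported.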
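To illustrate why one template cannot cover both layouts, here is a query against the table above. It is a sketch under stated assumptions: the account ID `111122223333`, the region and the date are hypothetical placeholder values, not taken from the post. With partition projection, Athena substitutes the filter values into `storage.location.template`, so each partition maps to exactly one S3 prefix, and objects written under the new `/org id/AWSLogs/org id/…` layout fall outside every prefix this table can produce.

```sql
-- Hypothetical values: account 111122223333, region us-east-1, day 2023/05/01.
-- accountid uses 'injected' projection, so the query must pin it to a value.
SELECT eventTime, eventSource, eventName
FROM cloudtrail_logs_partition_projected
WHERE accountid = '111122223333'
  AND region = 'us-east-1'
  AND date_created = '2023/05/01';
-- Athena expands storage.location.template for this partition into one prefix:
--   s3://<s3-bucket-name>/<org-id>/AWSLogs/111122223333/CloudTrail/us-east-1/2023/05/01
-- Logs delivered after Landing Zone 3.0 under /<org-id>/AWSLogs/<org-id>/... are not under it.
```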

amazon-athena amazon-cloudtrail aws-control-tower aws-landing-zone
1 Answer

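As a workaround, I created a second Athena table on the same bucket that uses the old partition projection template, and used `UNION` to query across both tables.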
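A minimal sketch of that workaround, under these assumptions: the original table is repointed at the organization trail layout, which is assumed here to be `<org-id>/AWSLogs/<org-id>/<account id>/CloudTrail/<region>/<yyyy/MM/dd>`; `cloudtrail_logs_legacy` is a hypothetical name for a second table created from the question's DDL unchanged (same columns and partition keys, old template); `cloudtrail_logs_all` is a hypothetical view name. It uses `UNION ALL` rather than `UNION`, because `UNION` also removes duplicate rows, which is extra work if the two prefixes hold disjoint sets of log files.

```sql
-- Assumption: new organization-trail layout inserts a second <org-id> segment.
ALTER TABLE cloudtrail_logs_partition_projected SET TBLPROPERTIES (
  'storage.location.template'='s3://<s3-bucket-name>/<org-id>/AWSLogs/<org-id>/${accountid}/CloudTrail/${region}/${date_created}'
);

-- cloudtrail_logs_legacy (hypothetical) keeps the original template:
--   s3://<s3-bucket-name>/<org-id>/AWSLogs/${accountid}/CloudTrail/${region}/${date_created}
-- Both tables share the same column list, so SELECT * lines up in the union.
CREATE OR REPLACE VIEW cloudtrail_logs_all AS
SELECT * FROM cloudtrail_logs_partition_projected
UNION ALL
SELECT * FROM cloudtrail_logs_legacy;

-- accountid is still an injected projection column in both tables,
-- so queries through the view must still filter on it.
SELECT eventTime, eventSource, eventName
FROM cloudtrail_logs_all
WHERE accountid = '111122223333'
  AND region = 'eu-west-1'
  AND date_created BETWEEN '2022/12/01' AND '2023/01/31';
```

The view keeps callers on a single name while each table keeps its own `storage.location.template`; the account ID, region and dates in the final query are placeholders.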
