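The CloudTrail setup managed by AWS Control Tower used to create account trails that wrote logs to the S3 bucket under the path /<org-id>/AWSLogs/…, until the Landing Zone 3.0 update replaced them with an organization trail whose log path is /<org-id>/AWSLogs/<org-id>/….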
https://docs.aws.amazon.com/controltower/latest/userguide/2022-all.html
This poses a challenge for Amazon Athena partition projection. The original DDL is as follows:
CREATE EXTERNAL TABLE cloudtrail_logs_partition_projected(
eventVersion STRING,
userIdentity STRUCT<
type: STRING,
principalId: STRING,
arn: STRING,
accountId: STRING,
invokedBy: STRING,
accessKeyId: STRING,
userName: STRING,
sessionContext: STRUCT<
attributes: STRUCT< mfaAuthenticated: STRING, creationDate: STRING>,
sessionIssuer: STRUCT< type: STRING, principalId: STRING, arn: STRING, accountId: STRING, userName: STRING>>>,
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
readOnly STRING,
resources ARRAY<STRUCT< arn: STRING, accountId: STRING, type: STRING>>,
eventType STRING,
apiVersion STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING )
PARTITIONED BY ( `accountid` string, `region` string, `date_created` string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<s3-bucket-name>/<org-id>/AWSLogs/'
TBLPROPERTIES (
'projection.enabled'='true',
'projection.accountid.type'='injected',
'projection.region.type'='enum',
'projection.region.values'='eu-north-1,ap-south-1,eu-west-3,eu-west-2,eu-west-1,ap-northeast-3,ap-northeast-2,ap-northeast-1,sa-east-1,ca-central-1,ap-southeast-1,ap-southeast-2,eu-central-1,us-east-1,us-east-2,us-west-1,us-west-2',
'projection.date_created.format'='yyyy/MM/dd',
'projection.date_created.interval'='1',
'projection.date_created.interval.unit'='DAYS',
'projection.date_created.range'='2021/01/01,NOW',
'projection.date_created.type'='date',
'storage.location.template'='s3://<s3-bucket-name>/<org-id>/AWSLogs/${accountid}/CloudTrail/${region}/${date_created}')
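Since LOCATION and storage.location.template differ between the older and the newer S3 objects (i.e. the logs delivered by CloudTrail), what is the best way to query the CloudTrail logs, both old and new? I would prefer a single Athena table that covers the old and the new logs, but I am not sure whether multiple LOCATIONs are supported.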
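As a workaround, I created a second Athena table on the same bucket using the old partition projection template, and I query across the two tables with UNION.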
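
For illustration, the workaround might look roughly like the query below. This is a minimal sketch, not a tested setup. The table names `cloudtrail_logs_account_trail` and `cloudtrail_logs_org_trail` are hypothetical; both are assumed to share the columns, partition keys and projection settings of the DDL above and to differ only in `storage.location.template`. The organization-trail template shown in the comments assumes that the path after the second `<org-id>` continues with the account ID, `CloudTrail`, the region and the date, which should be checked against the actual bucket. The account ID, region and dates are placeholders.

```sql
-- Hypothetical table names; only 'storage.location.template' differs between them:
--   cloudtrail_logs_account_trail (account trails, before Landing Zone 3.0):
--     s3://<s3-bucket-name>/<org-id>/AWSLogs/${accountid}/CloudTrail/${region}/${date_created}
--   cloudtrail_logs_org_trail (organization trail, assumed layout after Landing Zone 3.0):
--     s3://<s3-bucket-name>/<org-id>/AWSLogs/<org-id>/${accountid}/CloudTrail/${region}/${date_created}

SELECT eventtime, eventsource, eventname, accountid, region
FROM cloudtrail_logs_account_trail
WHERE accountid = '111122223333'   -- 'injected' partition keys need an equality filter in every branch
  AND region = 'eu-west-1'
  AND date_created BETWEEN '2022/01/01' AND '2022/12/31'

UNION ALL

SELECT eventtime, eventsource, eventname, accountid, region
FROM cloudtrail_logs_org_trail
WHERE accountid = '111122223333'
  AND region = 'eu-west-1'
  AND date_created BETWEEN '2022/01/01' AND '2022/12/31'

ORDER BY eventtime;                -- applies to the combined result
```

`UNION ALL` is used here instead of `UNION` because it skips duplicate elimination; that only gives the same result if the two trails never delivered the same event, which is an assumption. The `yyyy/MM/dd` format sorts lexicographically, so a string range on `date_created` selects the intended days.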