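Looking at the Datadog AWS integration documentation, I found a mention that AWS alarms can be streamed into Datadog. According to the alarm collection section, there are two different methods you can choose from to send AWS CloudWatch alarms to the Datadog Event Stream. However, there is no further explanation of how to do this or what has to be set up. Googling something like "Datadog aws Alarm polling" only turns up vague descriptions of other features, nothing about AWS CloudWatch alarms.
My question is: is this possible?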
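What I have tried so far is setting up the Datadog Lambda Forwarder, which sends CloudWatch logs (and, I assumed, metrics and alarms as well?) to DD. I granted that Lambda the permissions it needs. I created some AWS metric filters and AWS alarms that fire when specific events occur. I then ran some Lambda code that raises an exception and makes the CloudWatch alarm change its state.
I can clearly see the Lambda logs in DD, but I cannot find anything related to my alarm in DD Events. I don't think the DD-AWS integration itself is the problem, since we use it across a large organization and it was configured before I joined the company. What am I doing wrong?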
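For anyone reproducing the test, the alarm state change can also be forced directly instead of raising an exception in a Lambda. This is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured; the helper name `force_alarm` is hypothetical, and the alarm name is the one from the template. CloudWatch re-evaluates the alarm on its next period, so the forced state is temporary.

```python
"""Force a CloudWatch alarm into ALARM so a state change is produced on demand."""


def force_alarm(client, alarm_name, reason="Manual test of Datadog alarm collection"):
    """Set the alarm to ALARM and return the state CloudWatch reports afterwards.

    `client` is expected to behave like boto3.client("cloudwatch").
    """
    # set_alarm_state overrides the state only until the next evaluation.
    client.set_alarm_state(
        AlarmName=alarm_name,
        StateValue="ALARM",
        StateReason=reason,
    )
    # Read the state back to confirm the change was accepted.
    response = client.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    return alarms[0]["StateValue"] if alarms else None


if __name__ == "__main__":
    import boto3  # imported here so the helper can be exercised without AWS access

    cloudwatch = boto3.client("cloudwatch")
    print(force_alarm(cloudwatch, "IncomingQueueHasMessagesExceptionAlarm"))
```

If the state change shows up in the CloudWatch alarm history but not in Datadog, the problem is on the collection side rather than in the alarm definition.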
The CloudFormation script is below (I removed some parts, so it will not work as is):
Resources:
  DatadogForwarderLambda:
    Type: AWS::Lambda::Function
    Properties:
      Description: Pushes logs, metrics and traces from AWS to Datadog.
      Role: !GetAtt "DatadogForwarderLambdaRole.Arn"
      Handler: lambda_function.lambda_handler
      Code:
        S3Bucket: config-sandbox
        S3Key: 'aws-dd-forwarder-3.38.0.zip'
      MemorySize: 1024
      Runtime: python3.7
      Timeout: 120
      Tags:
        - Key: "dd_forwarder_version"
          Value: 3.38.0
      Environment:
        Variables:
          DD_ENHANCED_METRICS: "false"
          DD_API_KEY_SECRET_ARN:
            Ref: DdApiKeySecret
          DD_S3_BUCKET_NAME: config-sandbox
          DD_SITE: datadoghq.com
          DD_: datadoghq.com
          DD_TAGS_CACHE_TTL_SECONDS: 300
          DD_FETCH_LAMBDA_TAGS: true
          DD_USE_TCP: false
          DD_NO_SSL: false
          REDACT_IP: false
          REDACT_EMAIL: false
          DD_USE_PRIVATE_LINK: false
          DD_USE_VPC: false
      ReservedConcurrentExecutions: 100
  DatadogReadonlyPolicy:
    Type: 'AWS::IAM::Policy'
    Properties:
      PolicyName: !Sub "DatadogReadonlyPolicy"
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - 'cloudwatch:Get*'
              - 'cloudwatch:List*'
              - 'cloudwatch:DescribeAlarmHistory'
              - 'cloudtrail:LookupEvents'
              - 'ec2:Describe'
              - 's3:GetObject'
              - 's3:PutObject'
              - 's3:DeleteObject'
              - 's3:ListBucket'
              - 'lambda:List*'
              - 'tag:GetResources'
              - 'tag:GetTagKeys'
              - 'tag:GetTagValues'
              - 'support:*'
            Resource: !GetAtt DatadogForwarderLambda.Arn
          - Effect: Allow
            Action:
              - secretsmanager:GetSecretValue
            Resource:
              - Ref: DdApiKeySecret
      Roles:
        - !Ref DatadogForwarderLambdaRole
  DatadogForwarderLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
              AWS:
                - Fn::Sub:
                    - "arn:aws:iam::${AccountId}:role/human-role/some-role-name"
                    - { AccountId: !Ref 'AWS::AccountId' }
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
      Path: /
      PermissionsBoundary:
        Fn::Join:
          - ''
          - - 'arn:aws:iam::'
            - Ref: AWS::AccountId
            - ':policy/some-organisation-permission-boundary'
      RoleName:
        Fn::Sub:
          - 'a${AIID}-dd-forwarder-lambda-${StackID}'
          - { StackID: !Select [4, !Split ["-", !Ref 'AWS::StackId']],
              AIID: !Ref AIID }
  IncomingQueueHasMessagesExceptionAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmDescription: Incoming queue has unprocessed messages, new processing round can't be started
      AlarmName: !Sub "IncomingQueueHasMessagesExceptionAlarm"
      ComparisonOperator: GreaterThanThreshold
      Threshold: 0 # no messages are allowed in queue if new round started
      EvaluationPeriods: 1
      Period: 10
      Namespace: dev-logs
      MetricName: QueueHasMessagesException
      Statistic: Sum
      TreatMissingData: missing
  IncomingQueueHasMessagesExceptionMetricFilter:
    Type: AWS::Logs::MetricFilter
    Properties:
      LogGroupName:
        !Sub '/aws/lambda/${SomeLambdaName}'
      FilterPattern: "QueueHasMessagesException"
      MetricTransformations:
        -
          MetricNamespace: dev-logs
          MetricName: QueueHasMessagesException
          MetricValue: 1
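In the end I found out that my AWS account was not fully integrated into DD.
In the Datadog AWS integration, on the "General" tab, there is an option that can be enabled: "Enable CloudWatch alarm collection".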
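Once that option is enabled, one way to confirm that alarm events actually arrive is to query the Datadog Events API for a recent time window and look for the alarm name. The following is only a rough sketch using the v1 events query endpoint with an API key and an application key. The helper names, the environment variable names `DD_API_KEY` and `DD_APP_KEY`, and the assumption that the alarm name appears in the event title or text are all mine, not something the Datadog documentation guarantees.

```python
"""Check the Datadog event stream for events that mention a CloudWatch alarm."""
import json
import os
import time
import urllib.parse
import urllib.request


def build_events_url(site, start, end):
    """Build the v1 events query URL for a POSIX-timestamp window."""
    query = urllib.parse.urlencode({"start": int(start), "end": int(end)})
    return f"https://api.{site}/api/v1/events?{query}"


def alarm_events(events, alarm_name):
    """Keep events whose title or text mentions the alarm (assumed naming)."""
    return [
        event
        for event in events
        if alarm_name in (event.get("title") or "")
        or alarm_name in (event.get("text") or "")
    ]


def fetch_events(site, api_key, app_key, start, end):
    """Call the events endpoint and return the list under the "events" key."""
    request = urllib.request.Request(
        build_events_url(site, start, end),
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("events", [])


if __name__ == "__main__":
    now = time.time()
    events = fetch_events(
        "datadoghq.com",  # same value as DD_SITE in the template
        os.environ["DD_API_KEY"],
        os.environ["DD_APP_KEY"],
        now - 3600,  # look back one hour
        now,
    )
    for event in alarm_events(events, "IncomingQueueHasMessagesExceptionAlarm"):
        print(event.get("date_happened"), event.get("title"))
```

An empty result shortly after enabling the option does not necessarily mean the setup is wrong, since alarm collection is not instantaneous.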