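I have been stuck on this problem for a week now.
I have a Fargate container running in a private subnet, and I want to restrict its traffic to the private network only. With that restriction in place, however, the task cannot pull the image from my private ECR repository.
When the container starts, I get the following error: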
CannotPullContainerError: ref pull has been retried 5 time(s): failed to copy: httpReadSeeker: failed open: failed to do request: Get 956469741060.dkr.ecr.us-east-1.amazonaws.com/my-ecr-repo:latest: dial tcp 52.216.78.32:443: i/o timeout
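So the container is still trying to pull the ECR image through a public IP (my VPC CIDR is 10.0.0.0/16). Needless to say, as soon as I open egress to 0.0.0.0/0 for the Fargate task, the container can pull the ECR image, but I want to avoid that and only allow ingress/egress within the private subnets.
I verified the VPC endpoint configuration by launching an EC2 instance in the private subnet and running nslookup against all of the VPC endpoints mentioned above. Every endpoint resolved to a private IP, which tells me the endpoints themselves are configured correctly.
Given the EC2 nslookup test, I assume the problem is in my Fargate configuration. This is what the Terraform setup looks like: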
resource "aws_ecs_cluster" "test_sdk" {
name = "test-sdk-${var.stage}"
}
resource "aws_ecs_task_definition" "test_task_def" {
family = "test-sdk-${var.stage}"
network_mode = "awsvpc"
task_role_arn = aws_iam_role.ecs_task_execution_role.arn
execution_role_arn = aws_iam_role.ecs_task_execution_role.arn
requires_compatibilities = ["FARGATE"]
cpu = 4096
memory = 8192
container_definitions = jsonencode(
[
{
"name": "test-container",
"image": "${data.aws_caller_identity.self.account_id}.dkr.ecr.${var.region}.amazonaws.com/test-sdk-${var.stage}:latest",
"essential": true,
"portMappings": [
{
"containerPort": var.container_port,
"hostPort": var.container_port
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "ecs-test-${var.stage}",
"awslogs-region": "${var.region}",
"awslogs-stream-prefix": "streaming"
}
}
}
]
)
}
resource "aws_ecs_service" "test_service" {
name = "test-service"
cluster = aws_ecs_cluster.test_sdk.id
task_definition = aws_ecs_task_definition.test_task_def.arn
launch_type = "FARGATE"
desired_count = 1
network_configuration {
subnets = [data.aws_subnet.private-1.id, data.aws_subnet.private-2.id]
security_groups = [aws_security_group.test-sg.id]
}
load_balancer {
target_group_arn = aws_lb_target_group.test-tg.arn
container_name = "test-container"
container_port = var.container_port
}
}
# Create a security group allowing traffic on container port
resource "aws_security_group" "test-sg" {
name = "test-sg-${var.stage}"
vpc_id = data.aws_vpc.vpc.id
ingress {
from_port = var.container_port
to_port = var.container_port
protocol = "tcp"
cidr_blocks = [
data.aws_subnet.private-1.cidr_block,
data.aws_subnet.private-2.cidr_block
]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [
data.aws_subnet.private-1.cidr_block,
data.aws_subnet.private-2.cidr_block
] # Allow traffic from private subnet
}
egress {
from_port = var.container_port
to_port = var.container_port
protocol = "tcp"
cidr_blocks = [
data.aws_subnet.private-1.cidr_block,
data.aws_subnet.private-2.cidr_block
] # Allow traffic to the private subnets
}
egress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [
data.aws_subnet.private-1.cidr_block,
data.aws_subnet.private-2.cidr_block
] # Allow traffic to the private subnets
}
}
# Create Application Load Balancer
resource "aws_lb" "test" {
name = "test-lb-${var.stage}"
internal = true
load_balancer_type = "application"
security_groups = [aws_security_group.test-sg.id]
subnets = [data.aws_subnet.private-1.id, data.aws_subnet.private-2.id]
}
# Create Target Group
resource "aws_lb_target_group" "test-tg" {
name = "test-tg-${var.stage}"
port = var.container_port
protocol = "HTTP"
target_type = "ip"
vpc_id = data.aws_vpc.vpc.id
health_check {
enabled = true
healthy_threshold = 2
interval = 90
path = "/"
matcher = "200-399"
port = var.container_port
protocol = "HTTP"
timeout = 40
unhealthy_threshold = 2
}
}
# Create listener
resource "aws_lb_listener" "test-listener" {
load_balancer_arn = aws_lb.test.arn
port = var.container_port
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.test-tg.arn
}
}
# IAM
resource "aws_iam_role" "ecs_task_execution_role" {
name = "tf-${var.project}-${var.stage}-ecs-task-execution-role"
assume_role_policy = data.aws_iam_policy_document.ecs_assume_role_policy.json
inline_policy {
name = "test-sdk-ecr-repo-policy"
policy = jsonencode({
"Version" : "2012-10-17",
"Statement" : [
{
"Effect" : "Allow",
"Action" : [
"ecr:GetAuthorizationToken",
"ecr:BatchCheckLayerAvailability",
"ecr:GetDownloadUrlForLayer",
"ecr:BatchGetImage",
"logs:CreateLogGroup",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams",
"logs:CreateLogStream",
"logs:PutLogEvents",
"secretsmanager:GetSecretValue",
"events:PutEvents"
],
"Resource" : "*"
}
]
})
}
}
data "aws_iam_policy_document" "ecs_assume_role_policy" {
statement {
actions = [
"sts:AssumeRole"
]
effect = "Allow"
principals {
type = "Service"
identifiers = ["ecs-tasks.amazonaws.com"]
}
}
}
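Because the S3 endpoint is a gateway endpoint, it does not create a network interface in the VPC. Traffic to a gateway endpoint is still addressed to S3's public IP ranges and is only redirected by the route table, so an egress rule scoped to your subnet CIDRs does not match it. That is why, even though your security group allows traffic within your VPC, the task cannot pull the image without some modification (ECR stores image layers in S3 behind the scenes).
As you mentioned in the comments, the solution is to add the prefix list ID that AWS maintains for S3 to the security group's egress rules. In effect, this allowlists the S3 IP addresses for outbound traffic.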
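As a rough illustration of that change, the sketch below looks up the regional S3 prefix list and adds one more inline `egress` block to the security group from the question. It assumes an S3 gateway endpoint already exists and is associated with the route tables of the private subnets, and that `var.region` is the region the task runs in. The data source name `s3` is made up for this example, and the comment standing in for the existing rules is a placeholder, not a complete resource.

```hcl
# Look up the AWS-managed prefix list for S3 in this region.
# (The local name "s3" is an illustrative choice, not from the question.)
data "aws_prefix_list" "s3" {
  name = "com.amazonaws.${var.region}.s3"
}

resource "aws_security_group" "test-sg" {
  name   = "test-sg-${var.stage}"
  vpc_id = data.aws_vpc.vpc.id

  # The existing ingress and egress blocks from the question stay as they are.

  # New rule: allow HTTPS out to S3 only, so image layers can be
  # fetched through the gateway endpoint.
  egress {
    from_port       = 443
    to_port         = 443
    protocol        = "tcp"
    prefix_list_ids = [data.aws_prefix_list.s3.id]
  }
}
```

The rule is kept inline because the question already defines its rules inline; mixing inline rules with separate `aws_security_group_rule` resources on the same group tends to cause conflicts in Terraform.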
This documentation outlines the details: