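I am trying to create an AWS Glue job that loads data from a Redshift cluster into DynamoDB. The connection is established, but I get the following error:
An error occurred while calling o160.pyWriteDynamicFrame. Unable to execute HTTP request: Connect to dynamodb.us-east-1.amazonaws.com:443 [dynamodb.us-east-1.amazonaws.com/52.119.535.345] failed: connect timed out
The Glue connection itself is fine and the crawler works, so I don't understand why this error occurs. The Redshift cluster's Availability Zone is us-east-1b, so I set the subnet to the matching one.
I followed this link: https://aws.amazon.com/premiumsupport/knowledge-center/connection-timeout-glue-redshift-rds/ and added the connection, but I still get the error.
The Glue script is as follows: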
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
# TempDir is read below as redshift_tmp_dir, so it has to be resolved here as well
args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)
# Script generated for node Redshift Cluster
RedshiftCluster_node1 = glueContext.create_dynamic_frame.from_catalog(
    database="redshift_bbd",
    redshift_tmp_dir=args["TempDir"],
    table_name="financial_data",
    transformation_ctx="RedshiftCluster_node1",
)
# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=RedshiftCluster_node1,
    mappings=[
        ("units_7d", "int", "units_7d", "int"),
        ("pcogs_total_13w", "decimal", "pcogs_total_13w", "decimal"),
        (
            "npp_contra_cogs_13w_total",
            "decimal",
            "npp_contra_cogs_13w_total",
            "decimal",
        ),
        ("revenue_7d", "decimal", "revenue_7d", "decimal"),
        ("asin", "string", "asin", "string"),
        ("netppm_4w", "decimal", "netppm_4w", "decimal"),
    ],
    transformation_ctx="ApplyMapping_node2",
)
# Script generated for node DynamoDB bucket
Datasink1 = glueContext.write_dynamic_frame_from_options(
    frame=ApplyMapping_node2,
    connection_type="dynamodb",
    connection_options={
        "dynamodb.output.tableName": "FINANCIAL_DATA",
        "dynamodb.throughput.write.percent": "1.0",
    },
)
job.commit()
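
Since the failure is a connect timeout on port 443 rather than a permissions error, one thing that might help narrow it down is a plain TCP reachability check run from the same subnet as the job. The following is only a minimal sketch using the Python standard library; `can_reach` is a hypothetical helper name introduced here for illustration, not part of Glue or boto3, and the 5 second timeout is an arbitrary assumption.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds.

    Hypothetical diagnostic helper: it only tests network reachability,
    it does not authenticate or send any DynamoDB request.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures
        return False
```

Calling `can_reach("dynamodb.us-east-1.amazonaws.com", 443)` near the top of the job and printing the result would show whether the endpoint is reachable from the job's network at all. If it returns False there, the problem is likely in the subnet's routing to DynamoDB rather than in the script.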