I am trying to run Elastic MapReduce from Eclipse, but I haven't been able to get it to work.
My code is as follows:
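import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RunEMR {

    public static void main(String[] args) {
        AWSCredentials credentials = new BasicAWSCredentials("xxxx", "xxxx");
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

        StepFactory stepFactory = new StepFactory();
        // Step 1: turn on the EMR debugging tool.
        StepConfig enableDebugging = new StepConfig()
            .withName("Enable Debugging")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newEnableDebuggingStep());
        // Step 2: install Hive on the cluster.
        StepConfig installHive = new StepConfig()
            .withName("Install Hive")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newInstallHiveStep());
        // Step 3: run a Hive script. Note: this step is never passed to withSteps()
        // below, and newRunHiveScriptStep expects the S3 path of a Hive script, not a JAR.
        StepConfig hiveScript = new StepConfig().withName("Hive Script")
            .withActionOnFailure("TERMINATE_JOB_FLOW")
            .withHadoopJarStep(stepFactory.newRunHiveScriptStep("s3://mywordcountbuckett/binary/WordCount.jar"));

        RunJobFlowRequest request = new RunJobFlowRequest()
            .withName("Hive Interactive")
            .withSteps(enableDebugging, installHive)
            .withLogUri("s3://mywordcountbuckett/")
            .withInstances(new JobFlowInstancesConfig()
                .withEc2KeyName("xxxx")
                .withHadoopVersion("0.20")
                .withInstanceCount(3)
                .withKeepJobFlowAliveWhenNoSteps(true)
                .withMasterInstanceType("m1.small")
                .withSlaveInstanceType("m1.small"));

        // This call fails with the ValidationException shown below.
        RunJobFlowResult result = emr.runJobFlow(request);
    }
}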
The error I get is:
Exception in thread "main" com.amazonaws.AmazonServiceException: InstanceProfile is required for creating cluster. (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: ValidationException; Request ID: 7a96ee32-9744-11e5-947d-65ca8f7db0a5
I've been at this for hours and can't fix it. Does anyone know what to do?
I ran into the same exception:
InstanceProfile is required for creating cluster
You have to set the service role and the job flow role, like this:
aRunJobFlowRequest.setServiceRole("EMR_DefaultRole");
aRunJobFlowRequest.setJobFlowRole("EMR_EC2_DefaultRole");
After that it worked for me.
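Applied to the code from the question, the fix might look roughly like the sketch below. It assumes the AWS SDK for Java 1.x (aws-java-sdk-emr) is on the classpath, that EMR_DefaultRole and EMR_EC2_DefaultRole already exist in the account (for example after running aws emr create-default-roles), and that a region and credentials are available from the SDK's default provider chain. The fluent withServiceRole/withJobFlowRole calls do the same thing as the setServiceRole/setJobFlowRole calls above. The AMI-era settings (Hadoop 0.20, m1.small, the Install Hive step) are carried over unchanged and may not be accepted by current EMR.

```java
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

public class RunEMRWithRoles {

    // Same request as in the question, plus the two IAM roles EMR requires.
    static RunJobFlowRequest buildRequest() {
        StepFactory stepFactory = new StepFactory();
        StepConfig enableDebugging = new StepConfig()
                .withName("Enable Debugging")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newEnableDebuggingStep());
        StepConfig installHive = new StepConfig()
                .withName("Install Hive")
                .withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newInstallHiveStep());

        return new RunJobFlowRequest()
                .withName("Hive Interactive")
                .withSteps(enableDebugging, installHive)
                .withLogUri("s3://mywordcountbuckett/")
                // Role assumed by the EMR service itself (assumed to exist already).
                .withServiceRole("EMR_DefaultRole")
                // Instance profile for the cluster's EC2 instances (assumed to exist already).
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withEc2KeyName("xxxx")
                        .withHadoopVersion("0.20")
                        .withInstanceCount(3)
                        .withKeepJobFlowAliveWhenNoSteps(true)
                        .withMasterInstanceType("m1.small")
                        .withSlaveInstanceType("m1.small"));
    }

    public static void main(String[] args) {
        // Region and credentials come from the default provider chains, not hard-coded keys.
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
        RunJobFlowResult result = emr.runJobFlow(buildRequest());
        System.out.println("Started cluster " + result.getJobFlowId());
    }
}
```

Splitting out buildRequest() keeps the request construction separate from the network call, so the role settings can be checked without touching AWS.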
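An AWS Identity and Access Management (IAM) role provides a way for IAM users or AWS services to have certain specified permissions and access to resources. For example, this may allow users to access resources or other services to act on your behalf. You must specify two IAM roles for a cluster: a role for the Amazon EMR service (service role), and a role for the EC2 instances (instance profile) that Amazon EMR manages.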
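So the word InstanceProfile in the exception message probably refers to "a role for the EC2 instances (instance profile)" from the documentation, yet I got past the exception after specifying JobFlowRole. A bit strange.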
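For the EC2 role (the JobFlowRole here), an instance profile with the same name is normally created alongside it, which is why the two names are used interchangeably. If you create an EMR cluster from scratch with boto3, you also need to create the EMR service role, an EC2 job flow role, and an instance profile linked to that EC2 job flow role (see the AWS documentation).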
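In Java rather than boto3, the three pieces described above (service role, EC2 role, and an instance profile of the same name wrapping the EC2 role) could be created roughly like this. It is a sketch under assumptions: aws-java-sdk-iam 1.x on the classpath, credentials allowed to create IAM roles, and the managed policies AmazonElasticMapReduceRole and AmazonElasticMapReduceforEC2Role, which are what aws emr create-default-roles has historically attached (AWS now recommends newer, narrower policies for the service role). The helper names trustPolicy and createRole are hypothetical. Running it a second time fails with EntityAlreadyExistsException.

```java
import com.amazonaws.services.identitymanagement.AmazonIdentityManagement;
import com.amazonaws.services.identitymanagement.AmazonIdentityManagementClientBuilder;
import com.amazonaws.services.identitymanagement.model.AddRoleToInstanceProfileRequest;
import com.amazonaws.services.identitymanagement.model.AttachRolePolicyRequest;
import com.amazonaws.services.identitymanagement.model.CreateInstanceProfileRequest;
import com.amazonaws.services.identitymanagement.model.CreateRoleRequest;

public class CreateEmrRoles {

    // Hypothetical helper: trust policy letting the given AWS service assume the role.
    static String trustPolicy(String servicePrincipal) {
        return "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Effect\":\"Allow\","
                + "\"Principal\":{\"Service\":\"" + servicePrincipal + "\"},"
                + "\"Action\":\"sts:AssumeRole\"}]}";
    }

    // Hypothetical helper: create a role and attach one managed policy to it.
    static void createRole(AmazonIdentityManagement iam, String roleName,
                           String servicePrincipal, String managedPolicyArn) {
        iam.createRole(new CreateRoleRequest()
                .withRoleName(roleName)
                .withAssumeRolePolicyDocument(trustPolicy(servicePrincipal)));
        iam.attachRolePolicy(new AttachRolePolicyRequest()
                .withRoleName(roleName)
                .withPolicyArn(managedPolicyArn));
    }

    public static void main(String[] args) {
        AmazonIdentityManagement iam = AmazonIdentityManagementClientBuilder.defaultClient();

        // 1. Service role, assumed by EMR itself (the ServiceRole parameter).
        createRole(iam, "EMR_DefaultRole", "elasticmapreduce.amazonaws.com",
                "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceRole");

        // 2. EC2 role, assumed by the cluster's instances.
        createRole(iam, "EMR_EC2_DefaultRole", "ec2.amazonaws.com",
                "arn:aws:iam::aws:policy/service-role/AmazonElasticMapReduceforEC2Role");

        // 3. Instance profile with the same name as the EC2 role, containing that role.
        iam.createInstanceProfile(new CreateInstanceProfileRequest()
                .withInstanceProfileName("EMR_EC2_DefaultRole"));
        iam.addRoleToInstanceProfile(new AddRoleToInstanceProfileRequest()
                .withInstanceProfileName("EMR_EC2_DefaultRole")
                .withRoleName("EMR_EC2_DefaultRole"));
    }
}
```

Giving the instance profile the same name as the role is what lets one string (EMR_EC2_DefaultRole) serve as the JobFlowRole value.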
The version you are trying to use is deprecated and requires IAM roles. Follow the example given in the documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/calling-emr-with-java-sdk.html
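A request in the style of the newer documentation might look like the sketch below: it uses a release label instead of a Hadoop/AMI version and installs Hive as an application instead of through an Install Hive step. The release label emr-6.15.0, the m5.xlarge instance type, the key name, and the log bucket are illustrative assumptions; pick values valid for your region and account. It also assumes the AWS SDK for Java 1.x (aws-java-sdk-emr), existing default roles, and region/credentials from the default provider chain. The debugging step is left out for brevity.

```java
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class RunEmrRelease {

    static RunJobFlowRequest buildRequest(String releaseLabel) {
        return new RunJobFlowRequest()
                .withName("Hive Interactive")
                // The release label takes the place of the old Hadoop/AMI version settings.
                .withReleaseLabel(releaseLabel)
                // Hive is requested as an application rather than installed by a step.
                .withApplications(new Application().withName("Hive"))
                .withLogUri("s3://mywordcountbuckett/")
                .withServiceRole("EMR_DefaultRole")
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withInstances(new JobFlowInstancesConfig()
                        .withEc2KeyName("xxxx")
                        .withInstanceCount(3)
                        .withKeepJobFlowAliveWhenNoSteps(true)
                        .withMasterInstanceType("m5.xlarge")
                        .withSlaveInstanceType("m5.xlarge"));
    }

    public static void main(String[] args) {
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
        RunJobFlowResult result = emr.runJobFlow(buildRequest("emr-6.15.0"));
        System.out.println("Started cluster " + result.getJobFlowId());
    }
}
```

To run the Hive script from the question, a step built with StepFactory could be added via withSteps(), pointing at an S3 path to a .q script rather than a JAR.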
I ran into the same problem with Airflow.
Adding the JobFlowRole parameter fixed it for EmrCreateJobFlowOperator:
{
"Name": "",
"LogUri": "",
"ReleaseLabel": "",
"ServiceRole": "",
"JobFlowRole": "",
"SecurityConfiguration": "",
"AutoTerminationPolicy": {
"IdleTimeout": 120
},
"ScaleDownBehavior": "",
"EbsRootVolumeSize": 50,
"Instances": {
"InstanceGroups": [
{
"Name": "",
"Market": "",
"InstanceRole": "",
"InstanceType": "",
"InstanceCount": 1,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "",
"SizeInGB": 32
},
"VolumesPerInstance": 2
}
]
}
},
{
"Name": "",
"Market": "",
"InstanceRole": "",
"InstanceType": "",
"InstanceCount": 1,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "",
"SizeInGB": 32
},
"VolumesPerInstance": 2
}
]
}
},
{
"Name": "",
"Market": "",
"InstanceRole": "",
"InstanceType": "",
"InstanceCount": 1,
"EbsConfiguration": {
"EbsBlockDeviceConfigs": [
{
"VolumeSpecification": {
"VolumeType": "",
"SizeInGB": 32
},
"VolumesPerInstance": 2
}
]
}
}
],
"KeepJobFlowAliveWhenNoSteps": false,
"TerminationProtected": false,
"EmrManagedMasterSecurityGroup": "",
"EmrManagedSlaveSecurityGroup": "",
"ServiceAccessSecurityGroup": "",
"Ec2KeyName": "",
"Ec2SubnetId": ""
},
"Applications": [
{
"Name": ""
}
],
"Configurations": [
{
"Classification": "",
"Properties": {
"hive.server2.authentication": "",
"hive.server2.thrift.port": "1"
}
}
],
"BootstrapActions": [
{
"Name": "",
"ScriptBootstrapAction": {
"Path": "",
"Args": []
}
}
],
"StepConcurrencyLevel": 4
}