我正在Google Cloud DataProc的hadoop上运行sqoop,以通过Cloud SQL代理访问postgresql,但出现Java依赖项错误:
INFO: First Cloud SQL connection, generating RSA key pair.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.google.cloud.hadoop.services.agent.job.shim.HadoopRunClassShim.main(HadoopRunClassShim.java:19)
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Stopwatch.createStarted()Lcom/google/common/base/Stopwatch;
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch$1.<init>(RateLimiter.java:414)
at com.google.common.util.concurrent.RateLimiter$SleepingStopwatch.createFromSystemTimer(RateLimiter.java:413)
at com.google.common.util.concurrent.RateLimiter.create(RateLimiter.java:127)
at com.google.cloud.sql.core.CloudSqlInstance.<init>(CloudSqlInstance.java:73)
at com.google.cloud.sql.core.CoreSocketFactory.lambda$createSslSocket$0(CoreSocketFactory.java:221)
at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at com.google.cloud.sql.core.CoreSocketFactory.createSslSocket(CoreSocketFactory.java:220)
at com.google.cloud.sql.core.CoreSocketFactory.connect(CoreSocketFactory.java:185)
at com.google.cloud.sql.postgres.SocketFactory.createSocket(SocketFactory.java:71)
at org.postgresql.core.PGStream.<init>(PGStream.java:67)
at org.postgresql.core.v3.ConnectionFactoryImpl.tryConnect(ConnectionFactoryImpl.java:91)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:192)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:211)
at org.postgresql.Driver.makeConnection(Driver.java:458)
at org.postgresql.Driver.connect(Driver.java:260)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:247)
at org.apache.sqoop.manager.SqlManager.makeConnection(SqlManager.java:904)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:763)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:327)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1872)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1671)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:501)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
... 5 more
这将启动集群:
gcloud dataproc clusters create ${CLUSTER_NAME} \
--region=${CLUSTER_REGION} \
--scopes=default,sql-admin \
--initialization-actions=gs://dataproc-initialization-actions/cloud-sql-proxy/cloud-sql-proxy.sh \
--properties=hive:hive.metastore.warehouse.dir=gs://$GCS_BUCKET/export \
--metadata=enable-cloud-sql-hive-metastore=false \
--metadata=additional-cloud-sql-instances=${PSQL_INSTANCE}=tcp:${PSQL_PORT}
这将完成工作:
#!/usr/bin/env bash
export GCS_BUCKET="mybucket"
export CLUSTER_NAME="mycluster"
export CLUSTER_REGION="us-central1"
export SOURCE_DB_NAME="mydb"
export SOURCE_USER="myuser"
export SOURCE_PASSWORD="****"
export SOURCE_HOST="127.0.0.1"
export SOURCE_PORT="5432"
export SQOOP_JAR="gs://$GCS_BUCKET/sqoop-1.4.7.jar"
export AVRO_JAR="gs://$GCS_BUCKET/avro-tools-1.9.1.jar"
export GUAVA_JAR="gs://$GCS_BUCKET/guava-11.0.2.jar"
export PSQL_JAR="gs://$GCS_BUCKET/postgresql-42.2.9.jar"
export PSQL_FACTORY_JAR="gs://$GCS_BUCKET/postgres-socket-factory-1.0.15-jar-with-dependencies.jar"
export INSTANCE_CONNECTION_NAME="myinstance:connection:name"
export CONNECTION_STRING="jdbc:postgresql:///${SOURCE_DB_NAME}?cloudSqlInstance=${INSTANCE_CONNECTION_NAME}&socketFactory=com.google.cloud.sql.postgres.SocketFactory&user=${SOURCE_USER}&password=${SOURCE_PASSWORD}"
gcloud dataproc jobs submit hadoop \
--cluster=$CLUSTER_NAME \
--class=org.apache.sqoop.Sqoop \
--jars=$GUAVA_JAR,$SQOOP_JAR,$PSQL_FACTORY_JAR,$AVRO_JAR,$PSQL_JAR \
--region=$CLUSTER_REGION \
-- import -Dmapreduce.job.user.classpath.first=true \
--connect="${CONNECTION_STRING}" \
--username=${SOURCE_USER} \
--password="${SOURCE_PASSWORD}" \
--target-dir=gs://$GCS_BUCKET/export \
--table=insight_actions \
--as-avrodatafile
我尝试在路径中添加GUAVA_JAR
的不同版本,以为可能是这样,但无法消除错误:guava-11.0.2.jar
,guava-16.0.jar
,guava-18.0.jar
,guava-23.0.jar
,[C0 ]。
[guava-28.2-jre.jar
告诉我dataroc图像是gcloud beta dataflow jobs describe ...
经过进一步研究,我发现Hadoop 2.x覆盖了类路径,因此https://www.googleapis.com/compute/v1/projects/cloud-dataproc/global/images/dataproc-1-3-deb9-20191216-000000-rc01
并将其传递给hadoop。
我也更改为使用特定的sqoop jar代替hadoop260。
所以,我创建了一个the solution is to create an uberjar文件,在其上运行pom.xml
以生成uberjar:
maven package