I am trying to integrate HCatalog with Sqoop in order to import data from an RDBMS (Oracle) into our data lake (in Hive).
sqoop import --connect <connection-string> --username <username> --password <password> --table <table> --hcatalog-database data_extraction --hcatalog-table <hcatalog-table> --hcatalog-storage-stanza 'stored as orcfile' -m 1 --verbose
The job executes successfully, but no data can be found. After checking the location of the table created in HCatalog, I found that no directory was created for it; there is only a zero-byte `_$folder$` file.
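For reference, this is the kind of listing I see at the table's storage location (the bucket and path below are placeholders, not the actual values):

```shell
# List the HCatalog table's S3 location (bucket/path are placeholders).
# Only the zero-byte _$folder$ marker object is present; no data files.
aws s3 ls s3://my-datalake-bucket/data_extraction/my_table/
```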
Please find the stack trace:
19/09/25 17:53:37 INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
19/09/25 17:54:02 DEBUG db.DBConfiguration: Fetching password from job credentials store
19/09/25 17:54:03 INFO db.DBInputFormat: Using read commited transaction isolation
19/09/25 17:54:03 DEBUG db.DataDrivenDBInputFormat: Creating input split with lower bound '1=1' and upper bound '1=1'
19/09/25 17:54:03 INFO mapreduce.JobSubmitter: number of splits:1
19/09/25 17:54:03 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1569355854349_1231
19/09/25 17:54:04 INFO impl.YarnClientImpl: Submitted application application_1569355854349_1231
19/09/25 17:54:04 INFO mapreduce.Job: The url to track the job: http://<PII-removed-by-me>/application_1569355854349_1231/
19/09/25 17:54:04 INFO mapreduce.Job: Running job: job_1569355854349_1231
19/09/25 17:57:34 INFO hive.metastore: Closed a connection to metastore, current connections: 1
19/09/25 18:02:59 INFO mapreduce.Job: Job job_1569355854349_1231 running in uber mode : false
19/09/25 18:02:59 INFO mapreduce.Job: map 0% reduce 0%
19/09/25 18:03:16 INFO mapreduce.Job: map 100% reduce 0%
19/09/25 18:03:18 INFO mapreduce.Job: Job job_1569355854349_1231 completed successfully
19/09/25 18:03:18 INFO mapreduce.Job: Counters: 35
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=425637
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=0
HDFS: Number of read operations=1
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
S3: Number of bytes read=0
S3: Number of bytes written=310154
S3: Number of read operations=0
S3: Number of large read operations=0
S3: Number of write operations=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=29274
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=14637
Total vcore-milliseconds taken by all map tasks=14637
Total megabyte-milliseconds taken by all map tasks=52459008
Map-Reduce Framework
Map input records=145608
Map output records=145608
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=199
CPU time spent (ms)=4390
Physical memory (bytes) snapshot=681046016
Virtual memory (bytes) snapshot=5230788608
Total committed heap usage (bytes)=1483210752
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 582.8069 seconds (0 bytes/sec)
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Retrieved 145608 records.
19/09/25 18:03:18 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table null
19/09/25 18:03:19 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@1d548a08
Solved it. Since we are using AWS EMR (the managed Hadoop service), this behavior is documented on their site:
"When you use Sqoop to write output to an HCatalog table in Amazon S3, disable Amazon EMR direct write by setting the mapred.output.direct.NativeS3FileSystem and mapred.output.direct.EmrFileSystem properties to false. For more information, see Using HCatalog. You can use the Hadoop -D mapred.output.direct.NativeS3FileSystem=false and -D mapred.output.direct.EmrFileSystem=false commands.
If you don't disable direct write, no error occurs, but the table is created in Amazon S3 and no data is written."
This can be found at https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-sqoop-considerations.html
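Putting it together, the import can be re-run with the two direct-write properties disabled. Note that generic Hadoop `-D` options must come immediately after the tool name, before the Sqoop-specific arguments. The connection string, credentials, and table names below are placeholders, not the actual values:

```shell
# Disable EMR direct write so Sqoop/HCatalog actually lands data in S3
# (per the EMR release guide's Sqoop considerations page).
sqoop import \
  -D mapred.output.direct.NativeS3FileSystem=false \
  -D mapred.output.direct.EmrFileSystem=false \
  --connect 'jdbc:oracle:thin:@//db-host:1521/ORCL' \
  --username myuser \
  --password mypass \
  --table MY_TABLE \
  --hcatalog-database data_extraction \
  --hcatalog-table my_table \
  --hcatalog-storage-stanza 'stored as orcfile' \
  -m 1 \
  --verbose
```

With direct write disabled, the map tasks write to a temporary HDFS/staging location and the output is then committed to the S3 table location, so the ORC data files appear alongside (instead of only) the `_$folder$` marker.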