I'm hitting an exception when using Flink to write to a Paimon table that uses a Hive catalog.
java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:4051)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:4019)
at org.apache.paimon.hive.HiveCatalog.createHiveConf(HiveCatalog.java:697)
at org.apache.paimon.hive.HiveCatalog.createHiveConf(HiveCatalog.java:756)
at org.apache.paimon.hive.HiveCatalog.createHiveCatalog(HiveCatalog.java:710)
at org.apache.paimon.hive.HiveCatalogFactory.create(HiveCatalogFactory.java:50)
at org.apache.paimon.catalog.CatalogFactory.createCatalog(CatalogFactory.java:76)
at org.apache.paimon.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:69)
at org.apache.paimon.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:59)
at org.apache.paimon.flink.FlinkCatalogFactory.createCatalog(FlinkCatalogFactory.java:32)
at org.apache.flink.table.factories.FactoryUtil.createCatalog(FactoryUtil.java:488)
The direct cause is that the Hive dependencies sit on the Flink JVM classpath, while the Hadoop dependencies sit on pipeline.classpaths:
org.apache.hadoop.hive.conf.HiveConf is loaded by the AppClassLoader, whereas org.apache.hadoop.mapred.JobConf can only be loaded by the FlinkUserCodeClassLoader.
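To make the split concrete, here is a small diagnostic sketch (my own illustration, not code from the job; it assumes it is called from code running inside the Flink job, where the thread context classloader is the FlinkUserCodeClassLoader that sees pipeline.classpaths):

    // Diagnostic sketch: call check() from inside the Flink job.
    public final class LoaderCheck {
        public static void check() throws Exception {
            // Hive is on the JVM classpath, so HiveConf is defined by the AppClassLoader.
            ClassLoader appLoader =
                    Class.forName("org.apache.hadoop.hive.conf.HiveConf").getClassLoader();
            // Inside job code this is the FlinkUserCodeClassLoader.
            ClassLoader userLoader = Thread.currentThread().getContextClassLoader();

            System.out.println("HiveConf defined by: " + appLoader);
            System.out.println("context loader:      " + userLoader);

            // JobConf is visible through the user-code loader (pipeline.classpaths)...
            Class.forName("org.apache.hadoop.mapred.JobConf", false, userLoader);
            // ...but not through the loader that defined HiveConf, which is
            // exactly the ClassNotFoundException in the stack trace above.
            Class.forName("org.apache.hadoop.mapred.JobConf", false, appLoader);
        }
    }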
==========================================================================
The parent of the FlinkUserCodeClassLoader is the AppClassLoader, so I added the following line before HiveConf is initialized, expecting that both HiveConf and JobConf could then be found:

    Thread.currentThread().setContextClassLoader(flinkUserCodeClassLoader);

In practice, I still got exactly the same exception.
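In context, the attempt looked roughly like this (a sketch, not the real call site; the actual HiveConf construction happens inside org.apache.paimon.hive.HiveCatalog.createHiveConf, and how the user-code classloader is obtained is left as a parameter):

    import org.apache.hadoop.hive.conf.HiveConf;

    // Sketch of the attempted workaround.
    public final class TcclWorkaround {
        static HiveConf createHiveConf(ClassLoader flinkUserCodeClassLoader) {
            ClassLoader previous = Thread.currentThread().getContextClassLoader();
            Thread.currentThread().setContextClassLoader(flinkUserCodeClassLoader);
            try {
                // Still throws ClassNotFoundException: org.apache.hadoop.mapred.JobConf
                return new HiveConf();
            } finally {
                // Always restore the previous context classloader.
                Thread.currentThread().setContextClassLoader(previous);
            }
        }
    }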
The locations of the Hadoop and Flink dependencies cannot easily be changed, since compatibility of the Flink jobs in our production environment has to be preserved.
Could someone familiar with how classloaders work give me some pointers on how to solve this?
If you use IntelliJ IDEA, go to the settings of your application's run configuration and enable the option "Add dependencies with provided scope to classpath"; that may fix it.
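To verify that the option took effect, a quick check like this (my own hypothetical snippet, run as a plain main() from the IDE) should print the classloader instead of throwing:

    // Hypothetical sanity check, run from the IDE after enabling the option.
    public class ProvidedScopeCheck {
        public static void main(String[] args) throws Exception {
            Class<?> c = Class.forName("org.apache.hadoop.mapred.JobConf");
            System.out.println(c.getName() + " loaded by " + c.getClassLoader());
        }
    }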