I am using Apache NiFi 1.28 and I am trying to build a minimal dataflow that generates data and ingests it into HDFS on "HDP (Hortonworks Data Platform) 2.5.0". I am getting the following error:
2024-10-31 12:19:20,860 ERROR [Timer-Driven Process Thread-10] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] Failed to write to HDFS
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configurable
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1023)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:579)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:529)
at java.base/java.lang.Class.forName(Class.java:508)
at org.apache.nifi.processors.hadoop.ExtendedConfiguration.getClassByNameOrNull(ExtendedConfiguration.java:70)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2617)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:182)
at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getCompressionCodec(AbstractHadoopProcessor.java:605)
at org.apache.nifi.processors.hadoop.PutHDFS$1.run(PutHDFS.java:341)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:400)
at java.base/javax.security.auth.Subject.doAs(Subject.java:453)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1930)
at org.apache.nifi.processors.hadoop.PutHDFS.onTrigger(PutHDFS.java:328)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1361)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:247)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:102)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:358)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base/java.lang.Thread.run(Thread.java:1570)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configurable
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
... 37 common frames omitted
2024-10-31 12:19:21,002 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@2cb95c6c checkpointed with 2 Records and 0 Swap Files in 49 milliseconds (Stop-the-world time = 21 milliseconds, Clear Edit Logs time = 14 millis), max Transaction ID 105
2024-10-31 12:19:24,899 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:19:24,908 INFO [pool-7-thread-1] o.a.n.wali.SequentialAccessWriteAheadLog Checkpointed Write-Ahead Log with 4 Records and 0 Swap Files in 9 milliseconds (Stop-the-world time = 4 milliseconds), max Transaction ID 292
2024-10-31 12:19:24,909 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 9 milliseconds
2024-10-31 12:19:29,755 INFO [NiFi Web Server-41] o.a.n.c.s.StandardProcessScheduler Stopping PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd]
2024-10-31 12:19:29,755 INFO [NiFi Web Server-41] o.a.n.controller.StandardProcessorNode Stopping processor: PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd]
2024-10-31 12:19:29,757 INFO [Timer-Driven Process Thread-10] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] to run
2024-10-31 12:19:29,757 INFO [NiFi Web Server-41] o.a.n.c.s.StandardProcessScheduler Stopping GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc]
2024-10-31 12:19:29,757 INFO [NiFi Web Server-41] o.a.n.controller.StandardProcessorNode Stopping processor: GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc]
2024-10-31 12:19:29,757 INFO [Timer-Driven Process Thread-3] o.a.n.c.s.TimerDrivenSchedulingAgent Stopped scheduling GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc] to run
2024-10-31 12:19:29,758 INFO [Timer-Driven Process Thread-3] o.a.n.controller.StandardProcessorNode GenerateFlowFile[id=d2b75046-0192-1000-10ed-4952d26261bc] has completely stopped. Completing any associated Futures.
2024-10-31 12:19:29,762 WARN [org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner] org.apache.hadoop.fs.FileSystem Cleaner thread interrupted, will stop
java.lang.InterruptedException: null
at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1722)
at java.base/java.lang.ref.ReferenceQueue.await(ReferenceQueue.java:67)
at java.base/java.lang.ref.ReferenceQueue.remove0(ReferenceQueue.java:158)
at java.base/java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:234)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:4157)
at java.base/java.lang.Thread.run(Thread.java:1570)
2024-10-31 12:19:29,763 INFO [Timer-Driven Process Thread-10] o.a.n.controller.StandardProcessorNode PutHDFS[id=d2b7ad77-0192-1000-e937-9d1797c4e0fd] has completely stopped. Completing any associated Futures.
2024-10-31 12:19:30,327 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@7e46c2ea // Another save pending = false
2024-10-31 12:19:44,914 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:19:44,914 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:04,921 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:04,921 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:20,880 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2024-10-31 12:20:20,881 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to this container. Bytes used = 184.4 GB, bytes free = 1.02 GB, capacity = 185.42 GB
2024-10-31 12:20:24,934 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:24,934 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:20:44,938 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:20:44,938 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:21:04,949 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Initiating checkpoint of FlowFile Repository
2024-10-31 12:21:04,950 INFO [pool-7-thread-1] o.a.n.c.r.WriteAheadFlowFileRepository Successfully checkpointed FlowFile Repository with 4 records in 0 milliseconds
2024-10-31 12:21:20,921 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2024-10-31 12:21:20,922 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to
Below is the PutHDFS processor configuration.
Below is the entire dataflow of the process group, along with the error:
I also added a minimal core-site.xml, exported from HDP 2.5 and placed in the /conf directory, and I configured its path in the PutHDFS processor together with hdfs-site.xml (a sketch of the corresponding property value follows the file below).
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.136.131:8020</value>
<final>true</final>
</property>
<property>
<name>fs.trash.interval</name>
<value>360</value>
</property>
<property>
<name>ha.failover-controller.active-standby-elector.zk.op.retries</name>
<value>120</value>
</property>
<property>
<name>hadoop.http.authentication.simple.anonymous.allowed</name>
<value>true</value>
</property>
<property>
<name>hadoop.proxyuser.falcon.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.falcon.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hbase.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hbase.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hcat.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hcat.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.livy.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>sandbox.hortonworks.com</value>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>DEFAULT</value>
</property>
<property>
<name>hadoop.security.authentication</name>
<value>simple</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>false</value>
</property>
<property>
<name>hadoop.security.key.provider.path</name>
<value></value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
<property>
<name>io.compression.codecs</name>
<value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>io.serializations</name>
<value>org.apache.hadoop.io.serializer.WritableSerialization</value>
</property>
<property>
<name>ipc.client.connect.max.retries</name>
<value>50</value>
</property>
<property>
<name>ipc.client.connection.maxidletime</name>
<value>30000</value>
</property>
<property>
<name>ipc.client.idlethreshold</name>
<value>8000</value>
</property>
<property>
<name>ipc.server.tcpnodelay</name>
<value>true</value>
</property>
<property>
<name>mapreduce.jobtracker.webinterface.trusted</name>
<value>false</value>
</property>
<property>
<name>net.topology.script.file.name</name>
<value>/etc/hadoop/conf/topology_script.py</value>
</property>
<property>
<name>hadoop.proxyuser.nifi.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.nifi.hosts</name>
<value>*</value>
</property>
</configuration>
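For reference, the PutHDFS processor picks these files up through its Hadoop Configuration Resources property, which takes a comma-separated list of file paths. A sketch of what that value might look like, assuming the files live under the /conf directory mentioned above (the actual paths depend on the installation):

Hadoop Configuration Resources: /conf/core-site.xml,/conf/hdfs-site.xml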
I would appreciate it if someone could help me figure out where I am going wrong.
I resolved all of these problems by switching the technology stack from HDP 2.5 to a standalone Hadoop installation on my Windows 11 machine. HDP 2.5 is no longer supported, which is why the internal IPs of the datanodes could not be resolved during replication. I placed the following hdfs-site.xml and core-site.xml in the Apache NiFi installation directory, i.e. C:\nifi-1.28.0-bin\nifi-1.28.0.
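The exact files are not reproduced here; what follows is a minimal sketch of what such client-side files typically look like for a single-node standalone Hadoop. The NameNode host, port, and values are assumptions, not the actual files from this setup:

core-site.xml:
<configuration>
  <!-- Assumed standalone NameNode address; adjust host and port to the actual installation -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

hdfs-site.xml:
<configuration>
  <!-- Single-node setup, so one replica is sufficient (assumption) -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- A client-side setting commonly used when datanode addresses cannot be resolved
       directly, as described above; an assumption, not verified against this setup -->
  <property>
    <name>dfs.client.use.datanode.hostname</name>
    <value>true</value>
  </property>
</configuration>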