Launching a Dataflow job from a Java application


I am writing an application that launches batch Dataflow pipelines based on the parameters it is given. To do this, I use PipelineOptionsFactory.create().as(...) and then configure the options via setters.
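Roughly, the option setup looks like this (a trimmed-down sketch; the project and bucket values below are placeholders, not what the real application uses):

    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

    // Build the options programmatically instead of parsing command-line arguments.
    DataflowPipelineOptions opts =
        PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
    opts.setRunner(DataflowPipelineRunner.class);
    opts.setProject("my-project");                      // placeholder
    opts.setStagingLocation("gs://my-bucket/staging");  // placeholder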

However, when I create the pipeline object with Pipeline.create(opts), I get the following error:

04:48:08.711 [pool-4-thread-9] ERROR c.g.c.d.s.r.DataflowPipelineRunner - Unable to convert url (jar:file:/ferric.jar!/) to file.
04:48:08.711 [pool-4-thread-9] WARN  BatchJobManager - unable to start materialization for view
java.lang.RuntimeException: Failed to construct instance from factory method DataflowPipelineRunner#fromOptions(interface com.google.cloud.dataflow.sdk.options.PipelineOptions)
    at com.google.cloud.dataflow.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:234)
    at com.google.cloud.dataflow.sdk.util.InstanceBuilder.build(InstanceBuilder.java:163)
    at com.google.cloud.dataflow.sdk.runners.PipelineRunner.fromOptions(PipelineRunner.java:58)
    at com.google.cloud.dataflow.sdk.Pipeline.create(Pipeline.java:135)
    at com.brightcove.rna.dataflow.MaterializationPipeline.<init>(MaterializationPipeline.java:45)
    at com.brightcove.rna.dataflow.MaterializationPipeline.create(MaterializationPipeline.java:92)
    at com.brightcove.rna.ferric.DataflowJobService.createPipeline(DataflowJobService.java:121)
    at javaslang.control.Try.mapTry(Try.java:410)
    at javaslang.control.Try.map(Try.java:380)
    at com.brightcove.rna.ferric.DataflowJobService.create(DataflowJobService.java:102)
    at com.brightcove.rna.ferric.BatchJobScheduler.lambda$null$13(BatchJobScheduler.java:94)
    at javaslang.Value.forEach(Value.java:246)
    at com.brightcove.rna.ferric.BatchJobScheduler.lambda$startMaterializationJobs$14(BatchJobScheduler.java:91)
    at javaslang.control.Try.onSuccess(Try.java:442)
    at com.brightcove.rna.ferric.BatchJobScheduler.startMaterializationJobs(BatchJobScheduler.java:90)
    at com.brightcove.rna.ferric.BatchJobScheduler.run(BatchJobScheduler.java:52)
    at sun.reflect.GeneratedMethodAccessor94.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException: null
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.google.cloud.dataflow.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:223)
    ... 27 common frames omitted
Caused by: java.lang.IllegalArgumentException: Unable to convert url (jar:file:/ferric.jar!/) to file.
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.detectClassPathResourcesToStage(DataflowPipelineRunner.java:3176)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.fromOptions(DataflowPipelineRunner.java:291)
    ... 32 common frames omitted
Caused by: java.lang.IllegalArgumentException: URI is not hierarchical
    at java.io.File.<init>(File.java:418)
    at com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner.detectClassPathResourcesToStage(DataflowPipelineRunner.java:3172)
    ... 33 common frames omitted

It looks like the pipeline runner is trying to determine the path of the jar containing the classes it needs to upload. There is only one jar (an uberjar) that contains all the required classes, but the path it comes up with is clearly wrong.

What workable workarounds can I use to launch a Dataflow job programmatically?

java google-cloud-platform google-cloud-dataflow
2 Answers
2 votes

The classpath detection and upload logic is limited to files and does not support the scenario where jars are embedded within another jar. You can work around this in the following ways:

  1. Flatten the jar-within-a-jar into a single jar containing all of the extracted classes. This is nice because you keep the single-jar property and don't need to write any code to change your pipeline, but it makes your build more complicated if you build this regularly. Take a look at the Maven Shade plugin and its bundling support, or the equivalent technique in your build system, to do this for you.
  2. Use a more traditional setup where each jar is specified separately. You can use the Maven exec plugin to help with the scenario of building and launching the application.
  3. Extract all the jars at runtime and set the filesToStage property within PipelineOptions, supplying all the resources you want staged (a rough sketch follows this list).
  4. Add support for the embedded-jar scenario to Apache Beam / Dataflow. I filed a tracking issue if you would like to contribute this.
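For option 3, a minimal sketch of setting filesToStage might look like the following. It assumes the nested jars have already been extracted to a local directory at runtime; the directory path and method name are illustrative only:

    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;

    // Collect the jars that were extracted from the uberjar (extraction itself not shown)
    // and hand the resulting paths to the runner via filesToStage.
    static void stageExtractedJars(DataflowPipelineOptions opts, File extractedJarDir) {
        List<String> filesToStage = new ArrayList<>();
        for (File f : extractedJarDir.listFiles()) {
            if (f.getName().endsWith(".jar")) {
                filesToStage.add(f.getAbsolutePath());
            }
        }
        // Tell the runner exactly what to upload instead of letting it scan the classpath.
        opts.setFilesToStage(filesToStage);
    }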

There is also this very related SO question, where a user generated an uberjar with their IDE for execution and ran into a similar situation.


0 votes

If you are using Gradle, I was able to shade the jar using this plugin:

    plugins { id "com.github.johnrengelman.shadow" version "5.0.0" }
