Flink state is empty (re-initialized) after rerunning the job

Question

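I am trying to connect two streams. The first one is persisted in a MapState: RocksDB saves the data in the checkpoint folder, but after a new run the state is empty. I run the job both locally and on a Flink cluster. On the cluster I cancel the job and submit it again; locally I simply rerun it.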

 env.setStateBackend(new RocksDBStateBackend(..))
 env.enableCheckpointing(1000)
 ...

 val productDescriptionStream: KeyedStream[ProductDescription, String] = env.addSource(..)
   .keyBy(_.id)

 val productStockStream: KeyedStream[ProductStock, String] = env.addSource(..)
   .keyBy(_.id)

and

 productDescriptionStream
   .connect(productStockStream)
   .process(ProductProcessor())
   .setParallelism(1)

 env.execute("Product aggregator")

ProductProcessor

import org.apache.flink.api.common.state.{MapState, MapStateDescriptor}
import org.apache.flink.streaming.api.functions.co.CoProcessFunction
import org.apache.flink.streaming.api.scala._
import org.apache.flink.util.Collector

case class ProductProcessor() extends CoProcessFunction[ProductDescription, ProductStock, Product] {

  // Keyed map state holding the latest description seen for each product id.
  private[this] lazy val stateDescriptor: MapStateDescriptor[String, ProductDescription] =
    new MapStateDescriptor[String, ProductDescription](
      "productDescription",
      createTypeInformation[String],
      createTypeInformation[ProductDescription]
    )
  private[this] lazy val states: MapState[String, ProductDescription] =
    getRuntimeContext.getMapState(stateDescriptor)

  // Store each incoming description so it can be joined with later stock updates.
  override def processElement1(
      value: ProductDescription,
      ctx: CoProcessFunction[ProductDescription, ProductStock, Product]#Context,
      out: Collector[Product]
  ): Unit = {
    states.put(value.id, value)
  }

  // Emit a Product only if a description for this id is already in state.
  override def processElement2(
      value: ProductStock,
      ctx: CoProcessFunction[ProductDescription, ProductStock, Product]#Context,
      out: Collector[Product]
  ): Unit = {
    if (states.contains(value.id)) {
      val product = Product(
        id = value.id,
        description = Some(states.get(value.id).description),
        stock = Some(value.stock),
        updatedAt = value.updatedAt
      )
      out.collect(product)
    }
  }
}

Answer 1

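Checkpoints are created by Flink for recovering from failures, not for resuming after a manual shutdown. When a job is canceled, Flink deletes its checkpoints by default: a canceled job can no longer fail, so there is nothing left to recover.

You have several options:

(1) Configure checkpointing to retain checkpoints when the job is canceled: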

CheckpointConfig config = env.getCheckpointConfig();
config.enableExternalizedCheckpoints(
  CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

Then, when you restart the job, you need to indicate that you want it to restart from a specific checkpoint:

flink run -s <checkpoint-path> ...

Otherwise, every time you start the job it will begin with an empty state backend.
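
For option (1), the path given to -s has to point at a completed checkpoint. Retained checkpoints are typically laid out as <checkpoint-dir>/<job-id>/chk-<n>/, with a _metadata file inside each completed one. The following is a minimal shell sketch that picks the newest such directory, assuming that layout on a local filesystem. The helper name latest_checkpoint and its directory argument are hypothetical and not part of the Flink CLI.

```shell
# Hypothetical helper: print the newest completed checkpoint directory
# found under a job's checkpoint directory (<checkpoint-dir>/<job-id>).
latest_checkpoint() {
  job_dir="$1"
  best=""
  best_n=-1
  for dir in "$job_dir"/chk-*; do
    # Skip anything without a _metadata file (incomplete or not a checkpoint).
    [ -f "$dir/_metadata" ] || continue
    # Extract the numeric suffix after "chk-".
    n="${dir##*chk-}"
    case "$n" in
      ''|*[!0-9]*) continue ;;
    esac
    # Compare numerically so that chk-10 ranks above chk-2.
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="$dir"
    fi
  done
  if [ -n "$best" ]; then
    printf '%s\n' "$best"
  else
    return 1
  fi
}
```

It could then be used as flink run -s "$(latest_checkpoint <checkpoint-dir>/<job-id>)" ... where the directory is whatever was configured for the state backend.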

(2) Instead of canceling the job, use stop with savepoint:

flink stop [-p targetDirectory] [-d] <jobID>

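After that you will again need to use flink run -s ... to resume from the savepoint.

Stopping with a savepoint is a cleaner approach than relying on a recent checkpoint being available to fall back on.

(3) Alternatively, you could use Ververica Platform Community Edition, which raises the level of abstraction to the point where you do not have to manage these details yourself.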
