Akka Streams flow for processing paginated results never completes

Question

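I want to implement a Flow that handles paginated results (for example, the underlying service returns some results but also indicates that more are available by making another request, passing in a cursor, and so on).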

What I have done so far:

I have implemented the following flow and a test for it, but the flow never completes.

object AdditionalRequestsFlow {

  private def keepRequest[Request, Response](flow: Flow[Request, Response, NotUsed]): Flow[Request, (Request, Response), NotUsed] = {
    Flow.fromGraph(GraphDSL.create() { implicit builder: GraphDSL.Builder[NotUsed] =>
      import GraphDSL.Implicits._
      val in = builder.add(Flow[Request])

      val bcast = builder.add(Broadcast[Request](2))
      val merge = builder.add(Zip[Request, Response]())

      in ~> bcast         ~> merge.in0
            bcast ~> flow ~> merge.in1

      FlowShape(in.in, merge.out)
    })
  }

  def flow[Request, Response, Output](
    inputFlow: Flow[Request, Response, NotUsed],
    anotherRequest: (Request, Response) => Option[Request],
    extractOutput: Response => Output,
    mergeOutput: (Output, Output) => Output
  ): Flow[Request, Output, NotUsed] = {
    Flow.fromGraph(GraphDSL.create() { implicit b =>
      import GraphDSL.Implicits._

      val start = b.add(Flow[Request])
      val merge = b.add(Merge[Request](2))
      val underlying = b.add(keepRequest(inputFlow))
      val unOption = b.add(Flow[Option[Request]].mapConcat(_.toList))
      val unzip = b.add(UnzipWith[(Request, Response), Response, Option[Request]] { case (req, res) =>
        (res, anotherRequest(req, res))
      })
      val finish = b.add(Flow[Response].map(extractOutput)) // this is wrong as we don't keep to 1 Request -> 1 Output, but first let's get the flow to work

      start ~> merge ~> underlying ~> unzip.in
                                      unzip.out0            ~>  finish
               merge <~ unOption   <~ unzip.out1

      FlowShape(start.in, finish.out)
    })
  }       
}

The test:

    import akka.NotUsed
    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{Flow, Sink, Source}
    import org.scalatest.FlatSpec
    import org.scalatest.Matchers._
    import cats.syntax.option._
    import org.scalatest.concurrent.ScalaFutures.whenReady

    class AdditionalRequestsFlowSpec extends FlatSpec {
      implicit val system = ActorSystem()
      implicit val materializer = ActorMaterializer()

      case class Request(max: Int, batchSize: Int, offset: Option[Int] = None)
      case class Response(values: List[Int], nextOffset: Option[Int])

      private val flow: Flow[Request, Response, NotUsed] = {
        Flow[Request]
          .map { request =>
            val start = request.offset.getOrElse(0)
            val end = Math.min(request.max, start + request.batchSize)
            val nextOffset = if (end == request.max) None else Some(end)
            val result = Response((start until end).toList, nextOffset)
            result
          }
      }

      "AdditionalRequestsFlow" should "collect additional responses" in {
        def anotherRequest(request: Request, response: Response): Option[Request] = {
          response.nextOffset.map { nextOffset => request.copy(offset = nextOffset.some) }
        }

        def extract(x: Response): List[Int] = x.values
        def merge(a: List[Int], b: List[Int]): List[Int] = a ::: b

        val requests =
          Request(max = 35, batchSize = 10) ::
          Request(max = 5, batchSize = 10) ::
          Request(max = 100, batchSize = 1) ::
          Nil

        val expected = requests.map { x =>
          (0 until x.max).toList
        }

        val future = Source(requests)
          .via(AdditionalRequestsFlow.flow(flow, anotherRequest, extract, merge))
          .runWith(Sink.seq)

        whenReady(future) { x =>
          x shouldEqual expected
        }
      }
    }

Here is the same flow implemented in a horrible, blocking way, to illustrate what I am trying to achieve:

   def uglyHackFlow[Request, Response, Output](
    inputFlow: Flow[Request, Response, NotUsed],
    anotherRequest: (Request, Response) => Option[Request],
    extractOutput: Response => Output,
    mergeOutput: (Output, Output) => Output
  ): Flow[Request, Output, NotUsed] = {
    implicit val system = ActorSystem()
    implicit val materializer = ActorMaterializer()

    Flow[Request]
      .map { x =>
        def grab(request: Request): Output = {
          val response = Await.result(Source.single(request).via(inputFlow).runWith(Sink.head), 10.seconds) // :(
          val another = anotherRequest(request, response)
          val output = extractOutput(response)
          another.map { another =>
            mergeOutput(output, grab(another))
          } getOrElse output
        }

        grab(x)
      }
  }

This works (but we should not be materializing anything or calling `Await` at this point).

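I have reviewed http://doc.akka.io/docs/akka/2.4/scala/stream/stream-graphs.html#Graph_cycles__liveness_and_deadlocks, which I believe contains the answer, but I can't seem to find it there. In my case, I expect the cycle to contain one element most of the time, so neither a buffer overflow nor complete starvation should occur, yet evidently something of the kind does.

I tried to debug the stream with `.withAttributes(Attributes(LogLevels(...)))`, but it produces no output even though the logger appears to be configured correctly.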

I am looking for hints on how to fix the `flow` method while keeping the same signature and semantics (so that the test passes).

Or perhaps I am doing something completely off base here (for example, there is an existing feature in `akka-stream-contrib` that already solves this)?

Answer 1

I think using `Source.unfold` is much safer than creating a custom graph. Here is what I usually do (with minor variations depending on the API).

  override def getArticles(lastTokenOpt: Option[String], filterIds: (Seq[Id]) => Seq[Id]): Source[Either[String, ImpArticle], NotUsed] = {

    val maxRows = 1000

    def getUri(cursor: String, count: Int) = s"/works?rows=$count&filter=type:journal-article&order=asc&sort=deposited&cursor=${URLEncoder.encode(cursor, "UTF-8")}"

    Source.unfoldAsync(lastTokenOpt.getOrElse("*")) { cursor =>

      println(s"Getting ${getUri(cursor, maxRows)}")
      if (cursor.nonEmpty) {
        sendGetRequest[CrossRefResponse[CrossRefList[JsValue]]](getUri(cursor, maxRows)).map {
          case Some(response) =>
            response.message match {
              case Left(list) if response.status == "ok" =>

                println(s"Got ${list.items.length} items")
                val items = list.items.flatMap { js =>
                  try {
                    parseArticle(js)
                  } catch {
                    case ex: Throwable =>
                      logger.error(s"Error on parsing: ${js.compactPrint}")
                      throw ex
                  }
                }

                list.`next-cursor` match {
                  case Some(nextCursor) =>
                    Some(nextCursor -> (items.map(Right.apply).toList ::: List(Left(nextCursor))))
                  case None =>
                    logger.error(s"`next-cursor` is missing when fetching from CrossRef [status ${response.status}][${getUri(cursor, maxRows)}]")
                    Some("" -> items.map(Right.apply).toList)
                }
              case Left(jsvalue) if response.status != "ok" =>
                logger.error(s"API error on fetching data from CrossRef [status ${response.status}][${getUri(cursor, maxRows)}]")
                None
              case Right(someError) =>
                val cause = someError.fold(errors => errors.map(_.message).mkString(", "), ex => ex.message)
                logger.error(s"API error on fetching data from CrossRef [status $cause][${getUri(cursor, maxRows)}]")
                None
            }

          case None =>
            logger.error(s"Got error on fetching ${getUri(cursor, maxRows)} from CrossRef")
            None
        }
      } else
        Future.successful(None)
    }.mapConcat(identity)
  }

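In your case, you probably don't even need to push the cursor into the stream. I do that because I store the last successful cursor in a database so that I can resume later after a failure.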
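For the `flow` signature in the question, `Source.unfoldAsync` is awkward because it needs a `Future`, and getting one out of `inputFlow` means materializing it. A minimal sketch of a related alternative, which swaps in a recursive `flatMapConcat` instead of `unfold`, is shown below. The object name `RecursivePaginationSketch` and the helper `pages` are hypothetical, and the sketch assumes `inputFlow` emits exactly one response per request.

```scala
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Source}

// Hypothetical name; a sketch, not the original author's solution.
object RecursivePaginationSketch {

  def flow[Request, Response, Output](
    inputFlow: Flow[Request, Response, NotUsed],
    anotherRequest: (Request, Response) => Option[Request],
    extractOutput: Response => Output,
    mergeOutput: (Output, Output) => Output
  ): Flow[Request, Output, NotUsed] = {

    // All responses for one initial request, page after page.
    // Assumes inputFlow emits exactly one response per request.
    def pages(request: Request): Source[Response, NotUsed] =
      Source.single(request).via(inputFlow).flatMapConcat { response =>
        anotherRequest(request, response) match {
          // Emit this page, then continue with the follow-up request.
          // pages(next) only builds a blueprint; nothing runs until it is pulled.
          case Some(next) => Source.single(response).concat(pages(next))
          // Last page: this sub-source completes, which ends the recursion.
          case None       => Source.single(response)
        }
      }

    // One Output per Request: merge the outputs of all pages in order.
    Flow[Request].flatMapConcat { first =>
      pages(first).map(extractOutput).reduce(mergeOutput)
    }
  }
}
```

Because there is no cycle in the graph, completion propagates normally once the last page has been emitted. Two caveats: each additional page nests one more substream, so this is probably not suitable for very long page chains, and `reduce` fails the stream if `inputFlow` emits nothing for a request.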

Answer 2

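It feels like this video covers the gist of what you want to do. They create a custom GraphStage that maintains state and sends it back to the server; the response stream depends on the state that was sent back, and they also have an event that signals completion (in your case, that would be where you do this check: `if (end == request.max) None`).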

Answer 3

I have prepared a simple project that uses `Source.unfoldAsync` to collect data from all pages of a REST API. It may help someone.

GitHub: https://github.com/emaysyuk/akka-streams-pagination

class CatsHttpClientImpl(implicit system: ActorSystem[_], ec: ExecutionContext) extends CatsHttpClient {
  private val logger: Logger = LoggerFactory.getLogger(classOf[CatsHttpClientImpl])
  private val start: Option[String] = Some("https://catfact.ninja/breeds")

  override def getAllBreads: Future[Seq[Cat]] = {
    Source
      .unfoldAsync(start) {
        case Some(next) =>
          val nextChunkFuture: Future[CatsResponse] = sendRequest(next)

          nextChunkFuture.map { resp =>
            resp.nextPageUrl match {
              case Some(url) => Some((Some(url), resp.data))
              case None => Some((None, resp.data))
            }
          }
        case None => Future.successful(None)
      }
      .runWith(Sink.fold(Seq(): Seq[Cat])(_ ++ _))
  }

  private def sendRequest(url: String): Future[CatsResponse] = {
    logger.info(s"CatsHttpClientImpl: Sending request $url")

    val request = HttpRequest(
      uri = Uri(url),
      headers = List(
        RawHeader("Accept", "application/json")
      )
    )
    Http(system).singleRequest(request).flatMap { response =>
      response.status match {
        case StatusCodes.OK =>
          logger.info("CatsHttpClientImpl: Received success")
          Unmarshal(response.entity).to[CatsResponse]

        case _ =>
          logger.error("CatsHttpClientImpl: Received error")
          throw new CatsHttpClientException()
      }
    }
  }
}
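The completion rule behind this pattern is that `Source.unfoldAsync` keeps calling the function with the latest state and completes the stream as soon as the function yields `None`. A minimal, self-contained sketch with an in-memory stand-in for the REST API follows. `Page`, `fetchPage`, and the limits of 35 items and 10 per page are hypothetical and only for illustration; the implicit materializer derived from the `ActorSystem` assumes Akka 2.6 or later.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical names; a sketch with a fake paginated service.
object UnfoldAsyncPaginationSketch {

  final case class Page(items: List[Int], nextCursor: Option[Int])

  private val total = 35     // assumed total number of items
  private val pageSize = 10  // assumed page size

  // Stand-in for an HTTP call: returns one page starting at `cursor`.
  def fetchPage(cursor: Int)(implicit ec: ExecutionContext): Future[Page] =
    Future {
      val end = math.min(cursor + pageSize, total)
      Page((cursor until end).toList, if (end == total) None else Some(end))
    }

  // Collects the items of all pages, in order.
  def collectAll()(implicit system: ActorSystem): Future[Seq[Int]] = {
    import system.dispatcher
    Source
      .unfoldAsync(Option(0)) {
        // A cursor is available: fetch the page and emit its items.
        // The next state is the page's own nextCursor (possibly None).
        case Some(cursor) =>
          fetchPage(cursor).map(page => Some((page.nextCursor, page.items)))
        // No cursor left: returning None completes the stream.
        case None =>
          Future.successful(None)
      }
      .mapConcat(identity) // flatten the pages into individual items
      .runWith(Sink.seq)
  }
}
```

The state is an `Option` so that the last page can still be emitted: the final call returns `Some((None, items))`, and only the following call, which sees `None`, ends the stream.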
