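I'd like to understand how the use cases differ between `beam.Map`, a `DoFn` invoked through `ParDo`, and a composite transform.
All three approaches in the code below give me the same result when applying the list of transforms my pipeline needs. I wrote a small sample to show what I mean by multiple stages.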
```python
import apache_beam as beam


def myTransform(line):
    # All three arithmetic steps in one plain Python function, applied with beam.Map.
    line = line * 10
    line = line + 5
    line = line - 2
    return line


class myPTransform(beam.PTransform):
    # Composite transform: groups several transforms under a single label.
    def expand(self, pcoll):
        # return pcoll | beam.Map(myTransform)
        pcol_output = (pcoll
                       | beam.Map(lambda line: line * 10)
                       | beam.Map(lambda line: line + 5)
                       | beam.Map(lambda line: line - 2)
                       )
        return pcol_output


class mydofunc(beam.DoFn):
    # DoFn applied through beam.ParDo; process() yields its output element(s).
    def process(self, element):
        element = element * 10
        element = element + 5
        element = element - 2
        yield element


with beam.Pipeline() as p:
    lines = p | beam.Create([1, 2, 3, 4, 5])

    ### Map Function
    manual = (lines
              | "Map function" >> beam.Map(myTransform)
              | "Print map" >> beam.Map(print))

    ### Composite Ptransform
    ptrans = (lines
              | "ptransform call" >> myPTransform()
              | "Print ptransform" >> beam.Map(print))

    ### Do Function
    dofnpcol = (lines
                | "Dofn call" >> beam.ParDo(mydofunc())
                | "Print dofnpcol" >> beam.Map(print))
```
In which scenarios should I use a DoFn versus a composite transform? I may be missing the bigger picture of how these three options differ. Any insight would be very helpful.