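I want to generate something like a data flow diagram for an ETL process.
What I am trying to achieve is (as a minimal example) two tables, a source and a target, each shown with its attributes, with a process between them that transforms data from the first table into the second. For the process, I want to show the relationships between input fields and output fields, along with a description of the formula/expression.
Of course, for real-world scenarios I need multiple source tables feeding one destination, and whole chains where data flows from input tables through multiple tables to a final destination.
I tried generating something in Graphviz, but as soon as the graph is slightly more complex than trivial, I cannot force it to keep the expected layout, and it just ends up a mess.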
Very tedious, but also very scriptable. Consult the Graphviz documentation as needed, especially https://graphviz.org/doc/info/shapes.html
digraph ETL {
rankdir=LR
subgraph clusterA {
graph [label="Table A"]
// unfortunately, if we let each attribute have its own node, they end up ordered last-to-first
// not interested in fighting this, so put them all in a table that looks like individual nodes
//
// see the invisible F2->F1 edge in Process P below for a fix to the last-to-first ordering mentioned above
// I am too lazy to apply it to the three tables
//
node [width=2.5]
TA [shape=none label=<
<table border="0" cellborder="1" cellspacing="4">
<tr><td width="130" port="A1">Attribute 1</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A2">Attribute 2</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A3">Attribute 3</td></tr>
</table>>]
}
subgraph clusterP {
graph [label="Process P"]
node [shape=none]
subgraph clusterP1 {
graph [label="Attribute 1 Derivation"]
{rank=same
F1 [label=<<table border="0" cellborder="1" cellspacing="0">
<tr>
<td rowspan="3" port="f1" width="160" align="text" balign="left">Function 1<BR align="right"/>IF<BR align="left"/>blah blah<BR align="left"/>more<BR/>blah blah<BR/>and more<BR/>blah blah<BR/>no more</td>
<td port="sa1">Source 1</td>
</tr>
<tr> <td port="sa2">Source 2</td> </tr>
<tr> <td port="sa3">Source 3</td> </tr>
</table>>]
F2 [label=<<table border="0" cellborder="1" cellspacing="0">
<tr>
<td rowspan="3" port="f1" width="160" align="text" balign="left">Function 2<BR align="right"/>IF<BR align="left"/>blah blah<BR align="left"/>more<BR/>blah blah<BR/>and more<BR/>blah blah<BR/>no more</td>
<td port="sa1">Source 1</td>
</tr>
<tr> <td port="sa2">Source 2</td> </tr>
<tr> <td port="sa3">Source 3</td> </tr>
</table>>]
}
F2->F1 [style=invis] // yes, this is backwards & stupid, but it gets the nodes in the correct order
}
}
subgraph clusterB {
graph [label="Table B"]
TB [shape=none label=<
<table border="0" cellborder="1" cellspacing="4">
<tr><td port="A1">Attribute 1</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A2">Attribute 2</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A3">Attribute 3</td></tr>
</table>>]
}
subgraph clusterC {
graph [label="Table C"]
TC [shape=none label=<
<table border="0" cellborder="1" cellspacing="4">
<tr><td port="A1">Attribute 1</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A2">Attribute 2</td></tr>
<tr><td style="invis"></td></tr>
<tr><td port="A3">Attribute 3</td></tr>
</table>>]
}
TA:A1 -> F1:f1
F1:sa1 -> TB:A1
F1:sa2 -> TB:A2
F1:sa3 -> TC:A1
}
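Since the table clusters above are near-identical, this is the part that lends itself to scripting. Below is a minimal sketch, assuming Python as the scripting language (the answer does not name one). The helper names `table_label`, `table_cluster`, and `etl_graph` are hypothetical, and the sketch only covers the table clusters and edges, not the process cluster. It just builds DOT text, so it needs nothing beyond the standard library.

```python
from html import escape


def table_label(attributes):
    """Build an HTML-like label with one bordered cell per attribute.

    Invisible spacer rows separate the cells, and each attribute cell
    gets a port named A1, A2, ... so that edges can attach to it.
    """
    rows = []
    for index, name in enumerate(attributes, start=1):
        if index > 1:
            rows.append('<tr><td style="invis"></td></tr>')
        rows.append(f'<tr><td port="A{index}">{escape(name)}</td></tr>')
    body = "\n".join(rows)
    return f'<<table border="0" cellborder="1" cellspacing="4">\n{body}\n</table>>'


def table_cluster(cluster_id, title, attributes):
    """Return the DOT text for one table cluster holding a single node.

    Assumption: title contains no double quotes (it is not escaped here).
    """
    return (
        f"subgraph cluster{cluster_id} {{\n"
        f'graph [label="{title}"]\n'
        f"T{cluster_id} [shape=none label={table_label(attributes)}]\n"
        "}"
    )


def etl_graph(tables, edges):
    """Assemble a left-to-right digraph from table specs and port edges."""
    lines = ["digraph ETL {", "rankdir=LR"]
    for cluster_id, (title, attributes) in tables.items():
        lines.append(table_cluster(cluster_id, title, attributes))
    for source, target in edges:
        lines.append(f"{source} -> {target}")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example data: two tables and two attribute-level edges.
    tables = {
        "A": ("Table A", ["Attribute 1", "Attribute 2", "Attribute 3"]),
        "B": ("Table B", ["Attribute 1", "Attribute 2"]),
    }
    print(etl_graph(tables, [("TA:A1", "TB:A1"), ("TA:A3", "TB:A2")]))
```

Save the printed text to a file and render it with `dot -Tsvg` as usual. The process nodes with their function text and source cells could be generated the same way, but their layout needs more hand-tuning, as in the listing above.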