Efficiently accessing data across a series of transient Python scripts

Question

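Pandoc has a filter that accepts Python snippets and uses (for example) Matplotlib to generate charts. I want to produce documents that generate many charts from a common data source (for example, a pandas dataframe).

For example: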

Here's the first chart:

~~~{.matplotlib}
import sqlite3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

conn = sqlite3.connect('somedb.db')
query = '''SELECT something'''

df = pd.read_sql_query(query, conn).dropna()
fig, ax = plt.subplots()
ax.something()
~~~

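The problem is that every chart has to regenerate the dataframe, which is expensive. What I would like to do is:

Run a script at the start of the Markdown document that creates the data source and makes it efficiently available to subsequent filter invocations.
Create as many charts as needed using data from the existing data source.
Close the data source when the pandoc invocation ends (or perhaps use a time-to-live parameter).

Any ideas?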

Answer 1

The author of pandoc-plot provided the following answer on GitHub:

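Out of the box, your use case is not handled by the pandoc-plot filter. Every code block that is turned into a figure is meant to be independent of all other code blocks. This has many benefits, most importantly performance: I wrote pandoc-plot for book-sized workloads, with close to 100 figures.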

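The reason a preamble does not help is that the preamble script is copy-pasted into every code block before pandoc-plot renders the figure. The creation of the dataframe would therefore still be repeated for each figure.

I suggest wrapping your use of pandoc in a script instead. For example (assuming you use bash):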

# Run a script that goes through your expensive computation,
# storing the results as a CSV i
python create-data.py

# Render the document, where plots can reference the file created by 
# your python script instead of re-creating the pandas dataframe for every plot
pandoc --filter pandoc-plot ...

# Clean up temporary data file if you know where it is

You can use environment variables to communicate between the bash script above and the plots in the document.
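
As a rough illustration of that approach, here is a minimal sketch of what the two Python sides might look like. The environment variable name `PLOT_DATA_PATH` and the function names `create_data` and `load_data` are made up for this example and are not part of pandoc-plot; the sketch also assumes the expensive step is an SQLite query, as in the question.

```python
import os
import sqlite3

import pandas as pd

# Hypothetical variable name: the wrapper script would export it before
# calling pandoc so that every plot block can find the cached file.
DATA_ENV_VAR = "PLOT_DATA_PATH"


def create_data(db_path: str, query: str, out_path: str) -> None:
    """Run the expensive query once and cache the result as a CSV file."""
    conn = sqlite3.connect(db_path)
    try:
        df = pd.read_sql_query(query, conn).dropna()
    finally:
        conn.close()
    df.to_csv(out_path, index=False)


def load_data() -> pd.DataFrame:
    """Read the cached CSV named by the environment variable.

    Each plot block would do this instead of re-running the query.
    """
    return pd.read_csv(os.environ[DATA_ENV_VAR])
```

Under these assumptions, `create-data.py` would call `create_data(...)`, the wrapper script would export `PLOT_DATA_PATH` before invoking pandoc, and each plot block would start with `df = pd.read_csv(os.environ["PLOT_DATA_PATH"])`. Reading a CSV is usually far cheaper than repeating the query, although every block still pays the cost of parsing the file.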
