Using the same dataframe as in the previous question,
import polars as pl
df = pl.DataFrame(
[
pl.Series("Time", ['02/01/2018 07:05', '02/01/2018 07:07', '02/01/2018 07:08', '02/01/2018 07:09', '02/01/2018 07:10', '02/01/2018 07:12', '02/01/2018 07:13', '02/01/2018 07:14', '02/01/2018 07:18', '02/01/2018 07:26', '02/01/2018 07:38', '02/01/2018 07:39', '02/01/2018 07:45', '02/01/2018 07:48', '02/01/2018 07:49', '02/01/2018 07:50', '02/01/2018 07:52', '02/01/2018 07:53', '02/01/2018 07:56', '02/01/2018 07:57'], dtype=pl.String),
pl.Series("Open", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("High", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("Low", [8.05, 8.01, 8.01, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("Close", [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("MA14", [8.13, 8.12, 8.11, 8.11, 8.1, 8.09, 8.08, 8.07, 8.06, 8.05, 8.04, 8.03, 8.02, 8.0, 8.0, 7.99, 7.99, 7.98, 7.98, 7.97], dtype=pl.Float64),
pl.Series("MA28", [8.1, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.08, 8.08, 8.08, 8.07, 8.07, 8.06, 8.06, 8.05, 8.04, 8.04, 8.03], dtype=pl.Float64),
pl.Series("PVT", [0.0, -0.3478, -0.2904, -0.2904, -0.2904, -0.3527, -0.3527, -0.3677, -0.374, -0.374, -0.404, -0.3376, -0.3376, -0.3489, -0.4792, -0.459, -0.6224, -0.6224, -0.6224, -0.6362], dtype=pl.Float64),
]
)
and adding more columns as in the previous answer,
ohlc = df.with_columns(
    DateTime = pl.col("Time").str.to_datetime()
).with_columns(
    Date = pl.col("DateTime").dt.date().set_sorted(),
    t = pl.col("DateTime").dt.time(),
    ones = pl.lit(1),
    vol = (2*(pl.col("High")-pl.col("Low"))/(pl.col("Open")+pl.col("Close"))).round(5),
).with_columns(
    MA = pl.col("Close").rolling_mean(20).over("Date"),
    n = pl.col("ones").cum_sum().over("Date")
).select(pl.exclude("ones"))
I want to use the rolling method to create forward-rolling lists:
pl.Config(fmt_str_lengths=100)  # widen the string repr so the lists are not cut off
out = ohlc.rolling(
    index_column = 'n',
    period = '20i',
    offset = '0i',
    group_by = "Date",
).agg(
    pl.format("[{}]", pl.col("Close").str.concat(",")).alias("lists")
)
The problem is that print(out.head()) gives:
shape: (5, 3)
┌────────────┬─────┬─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Date ┆ n ┆ lists │
│ --- ┆ --- ┆ --- │
│ date ┆ i32 ┆ str │
╞════════════╪═════╪═════════════════════════════════════════════════════════════════════════════════════════════════╡
│ 2018-01-02 ┆ 1 ┆ [8.01,8.02,8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
│ 2018-01-02 ┆ 2 ┆ [8.02,8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
│ 2018-01-02 ┆ 3 ┆ [8.02,8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
│ 2018-01-02 ┆ 4 ┆ [8.02,8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
│ 2018-01-02 ┆ 5 ┆ [8.01,8.01,8.0,7.99,7.99,7.98,7.99,7.99,7.98,7.94,7.96,7.94,7.94,7.94,7.93] │
└────────────┴─────┴─────────────────────────────────────────────────────────────────────────────────────────────────┘
So most of the original columns have been dropped. Is there a way to keep them? Ideally, the output would look like this:
shape: (5, 15)
┌──────────────────┬──────┬──────┬───┬──────┬─────┬─────────────────────────────────┐
│ Time ┆ Open ┆ High ┆ … ┆ MA ┆ n ┆ lists │
│ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ ┆ f64 ┆ i32 ┆ str │
╞══════════════════╪══════╪══════╪═══╪══════╪═════╪═════════════════════════════════╡
│ 02/01/2018 07:05 ┆ 8.05 ┆ 8.05 ┆ … ┆ null ┆ 1 ┆ [8.01,8.02,8.02,8.02,8.01,8.01… │
│ 02/01/2018 07:07 ┆ 8.04 ┆ 8.04 ┆ … ┆ null ┆ 2 ┆ [8.02,8.02,8.02,8.01,8.01,8.0,… │
│ 02/01/2018 07:08 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 3 ┆ [8.02,8.02,8.01,8.01,8.0,7.99,… │
│ 02/01/2018 07:09 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 4 ┆ [8.02,8.01,8.01,8.0,7.99,7.99,… │
│ 02/01/2018 07:10 ┆ 8.02 ┆ 8.02 ┆ … ┆ null ┆ 5 ┆ [8.01,8.01,8.0,7.99,7.99,7.98,… │
└──────────────────┴──────┴──────┴───┴──────┴─────┴─────────────────────────────────┘
The result of any group_by/agg contains only the grouping keys and the columns you explicitly ask for in agg, nothing more.
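It isn't clear in what form you want to keep the original columns when grouping, but this is the most general way: each column ends up holding a list of all the original values in its window.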
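ohlc.rolling(
    index_column = 'n',
    period = '20i',
    offset = '0i',
    group_by = "Date",
).agg(
    # list_to_string is the user-defined helper from the previous question (not shown here)
    pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
    # every remaining column is collected into one list per window
    pl.exclude('n','Date')
)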
shape: (20, 15)
┌────────────┬─────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ Date ┆ n ┆ lists ┆ Time ┆ … ┆ DateTime ┆ t ┆ vol ┆ MA │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ i32 ┆ str ┆ list[str] ┆ ┆ list[datet ┆ list[time] ┆ list[f64] ┆ list[f64] │
│ ┆ ┆ ┆ ┆ ┆ ime[μs]] ┆ ┆ ┆ │
╞════════════╪═════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 2018-01-02 ┆ 1 ┆ [8.01,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:07:00, ┆ [0.00374, ┆ [null, │
│ ┆ ┆ ,8.02,8.02 ┆ 18 07:07", ┆ ┆ 2 ┆ 07:08:00, ┆ 0.00125, ┆ null, … │
│ ┆ ┆ ,8.01,8.01 ┆ "02/01/201 ┆ ┆ 07:07:00, ┆ … ┆ … 0.0] ┆ 7.9855] │
│ ┆ ┆ ,8… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 2 ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:08:00, ┆ [0.00125, ┆ [null, │
│ ┆ ┆ ,8.02,8.01 ┆ 18 07:08", ┆ ┆ 2 ┆ 07:09:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ ,8.01,8.0, ┆ "02/01/201 ┆ ┆ 07:08:00, ┆ … ┆ 0.0] ┆ 7.9855] │
│ ┆ ┆ 7.… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 3 ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:09:00, ┆ [0.0, ┆ [null, │
│ ┆ ┆ ,8.01,8.01 ┆ 18 07:09", ┆ ┆ 2 ┆ 07:10:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ ,8.0,7.99, ┆ "02/01/201 ┆ ┆ 07:09:00, ┆ … ┆ 0.0] ┆ 7.9855] │
│ ┆ ┆ 7.… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 4 ┆ [8.02,8.01 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:10:00, ┆ [0.0, ┆ [null, │
│ ┆ ┆ ,8.01,8.0, ┆ 18 07:10", ┆ ┆ 2 ┆ 07:12:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ 7.99,7.99, ┆ "02/01/201 ┆ ┆ 07:10:00, ┆ … ┆ 0.0] ┆ 7.9855] │
...
│ ┆ ┆ ┆ ┆ ┆ 07:57:00] ┆ ┆ ┆ │
│ -109782-01 ┆ 20 ┆ [] ┆ [] ┆ … ┆ [] ┆ [] ┆ [] ┆ [] │
│ -05 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
└────────────┴─────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘
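Here pl.exclude means every column except the ones you name. It is a shortcut for pl.all().exclude(), which is more intuitive but takes more typing. Since you are already asking for 'n' and 'Date' in the group_by, you don't want to ask for them again with pl.all().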
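Per your edit, you should do a self-join to get the columns back. The UDF can also be replicated with expressions, so that is included here as well.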
ohlc.join(
    ohlc.rolling(
        index_column = 'n',
        period = '20i',
        offset = '0i',
        group_by = "Date",
    ).agg(
        # cast to string and join with commas instead of calling the UDF
        pl.col("Close").cast(pl.String).str.concat(',').alias('lists')
    ).with_columns(
        # wrap the joined values in brackets
        (pl.lit("[") + pl.col('lists') + pl.lit("]")).alias('lists')
    ),
    on=['Date','n']
)
shape: (19, 15)
┌──────────────────┬──────┬──────┬──────┬───┬─────────┬──────┬─────┬───────────────────────────────┐
│ Time ┆ Open ┆ High ┆ Low ┆ … ┆ vol ┆ MA ┆ n ┆ lists │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ ┆ f64 ┆ f64 ┆ i32 ┆ str │
╞══════════════════╪══════╪══════╪══════╪═══╪═════════╪══════╪═════╪═══════════════════════════════╡
│ 02/01/2018 07:05 ┆ 8.05 ┆ 8.05 ┆ 8.05 ┆ … ┆ 0.0 ┆ null ┆ 1 ┆ [8.01,8.02,8.02,8.02,8.01,8.0 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ 1,8… │
│ 02/01/2018 07:07 ┆ 8.04 ┆ 8.04 ┆ 8.01 ┆ … ┆ 0.00374 ┆ null ┆ 2 ┆ [8.02,8.02,8.02,8.01,8.01,8.0 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ,7.… │
│ 02/01/2018 07:08 ┆ 8.02 ┆ 8.02 ┆ 8.01 ┆ … ┆ 0.00125 ┆ null ┆ 3 ┆ [8.02,8.02,8.01,8.01,8.0,7.99 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ,7.… │
│ 02/01/2018 07:09 ┆ 8.02 ┆ 8.02 ┆ 8.02 ┆ … ┆ 0.0 ┆ null ┆ 4 ┆ [8.02,8.01,8.01,8.0,7.99,7.99 │
│ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ ,7.… │
│ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … ┆ … │
│ 02/01/2018 07:50 ┆ 7.96 ┆ 7.96 ┆ 7.96 ┆ … ┆ 0.0 ┆ null ┆ 16 ┆ [7.94,7.94,7.94,7.93] │
│ 02/01/2018 07:52 ┆ 7.95 ┆ 7.95 ┆ 7.94 ┆ … ┆ 0.00126 ┆ null ┆ 17 ┆ [7.94,7.94,7.93] │
│ 02/01/2018 07:53 ┆ 7.94 ┆ 7.94 ┆ 7.94 ┆ … ┆ 0.0 ┆ null ┆ 18 ┆ [7.94,7.93] │
│ 02/01/2018 07:56 ┆ 7.94 ┆ 7.94 ┆ 7.94 ┆ … ┆ 0.0 ┆ null ┆ 19 ┆ [7.93] │
└──────────────────┴──────┴──────┴──────┴───┴─────────┴──────┴─────┴───────────────────────────────┘