Using the same dataframe as in the previous question,
import polars as pl

jan_data = pl.DataFrame(
[
pl.Series("Time", ['02/01/2018 07:05', '02/01/2018 07:07', '02/01/2018 07:08', '02/01/2018 07:09', '02/01/2018 07:10', '02/01/2018 07:12', '02/01/2018 07:13', '02/01/2018 07:14', '02/01/2018 07:18', '02/01/2018 07:26', '02/01/2018 07:38', '02/01/2018 07:39', '02/01/2018 07:45', '02/01/2018 07:48', '02/01/2018 07:49', '02/01/2018 07:50', '02/01/2018 07:52', '02/01/2018 07:53', '02/01/2018 07:56', '02/01/2018 07:57'], dtype=pl.Utf8),
pl.Series("Open", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("High", [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("Low", [8.05, 8.01, 8.01, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("Close", [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99, 7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93], dtype=pl.Float64),
pl.Series("MA14", [8.13, 8.12, 8.11, 8.11, 8.1, 8.09, 8.08, 8.07, 8.06, 8.05, 8.04, 8.03, 8.02, 8.0, 8.0, 7.99, 7.99, 7.98, 7.98, 7.97], dtype=pl.Float64),
pl.Series("MA28", [8.1, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.09, 8.08, 8.08, 8.08, 8.07, 8.07, 8.06, 8.06, 8.05, 8.04, 8.04, 8.03], dtype=pl.Float64),
pl.Series("PVT", [0.0, -0.3478, -0.2904, -0.2904, -0.2904, -0.3527, -0.3527, -0.3677, -0.374, -0.374, -0.404, -0.3376, -0.3376, -0.3489, -0.4792, -0.459, -0.6224, -0.6224, -0.6224, -0.6362], dtype=pl.Float64),
]
)
and adding more columns with the answer from that question:
ohlc = jan_data.with_columns(
    # parse the "dd/mm/YYYY HH:MM" strings into a datetime
    DateTime = pl.col("Time").str.to_datetime("%d/%m/%Y %H:%M")
).with_columns(
    Date = pl.col("DateTime").dt.date().set_sorted(),
    t = pl.col("DateTime").dt.time(),
    ones = pl.lit(1),
    # relative bar range: 2 * (High - Low) / (Open + Close)
    vol = (2*(pl.col("High")-pl.col("Low"))/(pl.col("Open")+pl.col("Close"))).round(5),
).with_columns(
    # 20-row moving average of Close within each day
    MA = pl.col("Close").rolling_mean(20).over("Date"),
    # running row counter within each day: 1, 2, 3, ...
    n = pl.col("ones").cumsum().over("Date")
).select(pl.exclude("ones"))
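To make the derived columns concrete, here is a plain-Python sketch (no Polars needed) that recomputes vol, n and MA for the 20 sample rows. It assumes Polars' default for rolling_mean, i.e. the result stays null (None here) until a full window of 20 values is available; rolling_mean_over is a hypothetical helper, not a Polars function.

```python
# Open/High/Low/Close copied from the sample frame; every row is on 2018-01-02
opens  = [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99,
          7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93]
highs  = [8.05, 8.04, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99,
          7.99, 7.99, 7.99, 7.98, 7.96, 7.96, 7.95, 7.94, 7.94, 7.93]
lows   = [8.05, 8.01, 8.01, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99,
          7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93]
closes = [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99,
          7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93]
dates = ["2018-01-02"] * len(closes)

# vol = 2 * (High - Low) / (Open + Close), rounded to 5 decimals
vol = [round(2 * (h - l) / (o + c), 5) for o, h, l, c in zip(opens, highs, lows, closes)]

# n = running count of rows within each Date (cumsum of ones over Date)
counts = {}
n = []
for d in dates:
    counts[d] = counts.get(d, 0) + 1
    n.append(counts[d])

def rolling_mean_over(values, groups, window):
    """Trailing mean of `window` values within each group; None until the window is full."""
    out, seen = [], {}
    for v, g in zip(values, groups):
        seen.setdefault(g, []).append(v)
        hist = seen[g]
        out.append(sum(hist[-window:]) / window if len(hist) >= window else None)
    return out

ma = rolling_mean_over(closes, dates, 20)
print(vol[:3], n[:3], ma[-1])
```

The second and third rows give vol values of 0.00374 and 0.00125, and the last MA is 7.9855; these are the same numbers that show up in the vol and MA lists of the answer's output further down.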
I want to try creating forward-rolling lists with the group_by_rolling method:
def list_to_string(a):
    # render a sequence as "[x1,x2,...]" using str() for each element
    t = '['
    t = t + ','.join(str(x) for x in a)
    t = t + ']'
    return t
out = ohlc.group_by_rolling(
    index_column = 'n',
    period = '20i',   # windows span 20 index steps ...
    offset = '0i',    # ... starting just after the current row: n in (n, n + 20]
    by = "Date",
).agg(
    pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
)
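With offset = '0i' and period = '20i', each row's window covers the index range (n, n + 20], i.e. up to the next 20 rows but not the current one (group_by_rolling defaults to closed='right'). The plain-Python sketch below reproduces that membership rule on the Close column. It ignores by="Date" because every sample row is on the same date, and forward_window is a hypothetical helper, not a Polars function.

```python
closes = [8.05, 8.01, 8.02, 8.02, 8.02, 8.01, 8.01, 8.0, 7.99, 7.99,
          7.98, 7.99, 7.99, 7.98, 7.94, 7.96, 7.94, 7.94, 7.94, 7.93]
n = list(range(1, len(closes) + 1))  # the per-Date row counter built above

def forward_window(index, values, i, period, offset=0):
    """Values whose index lies in (index[i] + offset, index[i] + offset + period]."""
    lo = index[i] + offset
    hi = lo + period
    return [v for k, v in zip(index, values) if lo < k <= hi]

windows = [forward_window(n, closes, i, period=20) for i in range(len(n))]
print(windows[0][:6])  # window for n = 1
print(windows[-1])     # window for n = 20
```

The window for n = 1 starts at 8.01 (the Close of the second row), which matches the first row of the output below, and each later window is one element shorter until the one for n = 20 is empty.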
The problem is that print(out.head(10)) gives:
shape: (10, 3)
┌────────────┬─────┬───────────────────────────────────┐
│ Date ┆ n ┆ lists │
│ --- ┆ --- ┆ --- │
│ date ┆ i32 ┆ str │
╞════════════╪═════╪═══════════════════════════════════╡
│ 2018-01-02 ┆ 1 ┆ [8.01,8.02,8.02,8.02,8.01,8.01,8… │
│ 2018-01-02 ┆ 2 ┆ [8.02,8.02,8.02,8.01,8.01,8.0,7.… │
│ 2018-01-02 ┆ 3 ┆ [8.02,8.02,8.01,8.01,8.0,7.99,7.… │
│ 2018-01-02 ┆ 4 ┆ [8.02,8.01,8.01,8.0,7.99,7.99,7.… │
│ … ┆ … ┆ … │
│ 2018-01-02 ┆ 7 ┆ [8.0,7.99,7.99,7.98,7.99,7.99,7.… │
│ 2018-01-02 ┆ 8 ┆ [7.99,7.99,7.98,7.99,7.99,7.98,7… │
│ 2018-01-02 ┆ 9 ┆ [7.99,7.98,7.99,7.99,7.98,7.94,7… │
│ 2018-01-02 ┆ 10 ┆ [7.98,7.99,7.99,7.98,7.94,7.96,7… │
└────────────┴─────┴───────────────────────────────────┘
So most of the original columns are dropped. Is there a way to keep them?
The result of any group_by/agg contains only the columns you tell it you care about, nothing else.
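It isn't clear in what form you want to keep the original columns once they are grouped, but the most general way is the following, where every column holds a list of all the original values in its group.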
ohlc.group_by_rolling(
    index_column = 'n',
    period = '20i',
    offset = '0i',
    by = "Date",
).agg(
    pl.col("Close").map_elements(lambda col: list_to_string(col.to_list())).alias("lists"),
    # every remaining column, collected as a list of the window's values
    pl.exclude('n', 'Date')
)
shape: (20, 15)
┌────────────┬─────┬────────────┬────────────┬───┬────────────┬────────────┬───────────┬───────────┐
│ Date ┆ n ┆ lists ┆ Time ┆ … ┆ DateTime ┆ t ┆ vol ┆ MA │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ date ┆ i32 ┆ str ┆ list[str] ┆ ┆ list[datet ┆ list[time] ┆ list[f64] ┆ list[f64] │
│ ┆ ┆ ┆ ┆ ┆ ime[μs]] ┆ ┆ ┆ │
╞════════════╪═════╪════════════╪════════════╪═══╪════════════╪════════════╪═══════════╪═══════════╡
│ 2018-01-02 ┆ 1 ┆ [8.01,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:07:00, ┆ [0.00374, ┆ [null, │
│ ┆ ┆ ,8.02,8.02 ┆ 18 07:07", ┆ ┆ 2 ┆ 07:08:00, ┆ 0.00125, ┆ null, … │
│ ┆ ┆ ,8.01,8.01 ┆ "02/01/201 ┆ ┆ 07:07:00, ┆ … ┆ … 0.0] ┆ 7.9855] │
│ ┆ ┆ ,8… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 2 ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:08:00, ┆ [0.00125, ┆ [null, │
│ ┆ ┆ ,8.02,8.01 ┆ 18 07:08", ┆ ┆ 2 ┆ 07:09:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ ,8.01,8.0, ┆ "02/01/201 ┆ ┆ 07:08:00, ┆ … ┆ 0.0] ┆ 7.9855] │
│ ┆ ┆ 7.… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 3 ┆ [8.02,8.02 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:09:00, ┆ [0.0, ┆ [null, │
│ ┆ ┆ ,8.01,8.01 ┆ 18 07:09", ┆ ┆ 2 ┆ 07:10:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ ,8.0,7.99, ┆ "02/01/201 ┆ ┆ 07:09:00, ┆ … ┆ 0.0] ┆ 7.9855] │
│ ┆ ┆ 7.… ┆ 8… ┆ ┆ 2018-01-02 ┆ 07:57:00] ┆ ┆ │
│ ┆ ┆ ┆ ┆ ┆ … ┆ ┆ ┆ │
│ 2018-01-02 ┆ 4 ┆ [8.02,8.01 ┆ ["02/01/20 ┆ … ┆ [2018-01-0 ┆ [07:10:00, ┆ [0.0, ┆ [null, │
│ ┆ ┆ ,8.01,8.0, ┆ 18 07:10", ┆ ┆ 2 ┆ 07:12:00, ┆ 0.0, … ┆ null, … │
│ ┆ ┆ 7.99,7.99, ┆ "02/01/201 ┆ ┆ 07:10:00, ┆ … ┆ 0.0] ┆ 7.9855] │
...
│ ┆ ┆ ┆ ┆ ┆ 07:57:00] ┆ ┆ ┆ │
│ -109782-01 ┆ 20 ┆ [] ┆ [] ┆ … ┆ [] ┆ [] ┆ [] ┆ [] │
│ -05 ┆ ┆ ┆ ┆ ┆ ┆ ┆ ┆ │
└────────────┴─────┴────────────┴────────────┴───┴────────────┴────────────┴───────────┴───────────┘
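The output above follows a simple rule: the by and index_column keys stay scalar, and every column selected by pl.exclude('n', 'Date') becomes the list of its values inside that row's window, so the last window yields empty lists. A plain-Python sketch of that rule, on a hypothetical 4-row, single-date toy table with period = 2 (toy values, not the real data):

```python
# Hypothetical toy table: a single Date, the counter n, and two value columns
table = {
    "Date":  ["2018-01-02"] * 4,
    "n":     [1, 2, 3, 4],
    "Time":  ["07:05", "07:07", "07:08", "07:09"],
    "Close": [8.05, 8.01, 8.02, 8.02],
}
keys = ("Date", "n")
period, offset = 2, 0

rows = []
for d, k in zip(table["Date"], table["n"]):
    lo, hi = k + offset, k + offset + period
    # positions of rows with the same Date whose n lies in (lo, hi]
    members = [j for j, (dj, kj) in enumerate(zip(table["Date"], table["n"]))
               if dj == d and lo < kj <= hi]
    row = {"Date": d, "n": k}
    # every non-key column becomes the list of its values inside the window
    for col, values in table.items():
        if col not in keys:
            row[col] = [values[j] for j in members]
    rows.append(row)

for r in rows:
    print(r)
```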
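Here pl.exclude means every column except the ones you list in it. It is a shortcut for pl.all().exclude(), which is more explicit but more typing. Since you already request 'n' and 'Date' in the group_by_rolling call (as index_column and by), you don't want to request them a second time with pl.all().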
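As a quick check of what pl.exclude('n', 'Date') picks up here, the plain-Python sketch below applies the same "all columns except these" rule to the column names of ohlc, listed by hand in creation order from the code in the question (with ones already dropped), and assembles the result columns in the order the output shows.

```python
# Columns of ohlc in creation order ("ones" was removed by the final select)
ohlc_columns = ["Time", "Open", "High", "Low", "Close", "MA14", "MA28", "PVT",
                "DateTime", "Date", "t", "vol", "MA", "n"]

group_keys = ["Date", "n"]  # by= and index_column=, each returned once
excluded = [c for c in ohlc_columns if c not in ("n", "Date")]  # what pl.exclude('n', 'Date') selects

result_columns = group_keys + ["lists"] + excluded
print(len(excluded), len(result_columns))
print(result_columns[-4:])
```

This gives 12 list columns and 15 columns in total, ending in DateTime, t, vol, MA, which agrees with the shape (20, 15) and the trailing column names in the output above.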