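I am trying to apply a rounding function to a df.summary() DataFrame, excluding the Summary column. So far I have tried select() with a list comprehension, for example: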
df2 = df.select(*[round(column, 2).alias(column) for column in df.columns])
This is the output of df2. The categorical values in the Summary column are converted to NULL:
+---------+-------+-------+-------+-------+
| Summary | col 1 | col 2 | col 3 | col 4 |
+---------+-------+-------+-------+-------+
| NULL | 0 | 0.1 | 0.2 | 0.3 |
+---------+-------+-------+-------+-------+
| NULL | 1 | 1.1 | 1.2 | 1.3 |
+---------+-------+-------+-------+-------+
| NULL | 2 | 2.1 | 2.2 | 2.3 |
+---------+-------+-------+-------+-------+
I only want to round the columns in df.columns[1:] and keep the Summary column as it is:
+---------+-------+-------+-------+-------+
| Summary | col 1 | col 2 | col 3 | col 4 |
+---------+-------+-------+-------+-------+
| min | 0 | 0.1 | 0.2 | 0.3 |
+---------+-------+-------+-------+-------+
| max | 1 | 1.1 | 1.2 | 1.3 |
+---------+-------+-------+-------+-------+
| stddev | 2 | 2.1 | 2.2 | 2.3 |
+---------+-------+-------+-------+-------+
I also tried slicing with df.columns[1:], but then the Summary column is not selected at all:
df2 = df.select(*[round(column, 2).alias(column) for column in df.columns[1:]])
+-------+-------+-------+-------+
| col 4 | col 1 | col 2 | col 3 |
+-------+-------+-------+-------+
| 0.3 | 0 | 0.1 | 0.2 |
+-------+-------+-------+-------+
| 1.3 | 1 | 1.1 | 1.2 |
+-------+-------+-------+-------+
| 2.3 | 2 | 2.1 | 2.2 |
+-------+-------+-------+-------+
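The NULLs most likely appear because round() is also applied to the Summary column, whose labels (min, max, stddev) cannot be cast to a number. To exclude the first column from rounding, select it unchanged and apply round() only to the remaining columns. You can try the following: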
columns_to_round = df.columns[1:]
# Backticks keep column names that contain spaces (e.g. "col 1") valid inside SQL expressions.
rounded_df = df.selectExpr("Summary", *[f"round(`{column}`, 2) as `{column}`" for column in columns_to_round])
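If it helps to see exactly which expressions end up in selectExpr(), here is a minimal sketch in plain Python that only builds the expression strings. The helper name build_round_exprs and the sample column names are my own, not part of PySpark, and it assumes the first column is the label column and that no column name contains a backtick.

```python
def build_round_exprs(columns, scale=2):
    """Return SQL expression strings: first column unchanged, the rest rounded.

    Hypothetical helper for illustration only; it is not a PySpark function.
    """
    label, *numeric = columns
    # Pass the label column (e.g. "summary") through untouched.
    exprs = [f"`{label}`"]
    for name in numeric:
        # Backtick-quote the name so spaces such as "col 1" stay valid.
        exprs.append(f"round(`{name}`, {scale}) as `{name}`")
    return exprs


# Sample column names, assumed to match the layout of the tables in the question.
for expr in build_round_exprs(["summary", "col 1", "col 2"]):
    print(expr)
```

With a real DataFrame the list would be unpacked into the call, for example rounded_df = df.selectExpr(*build_round_exprs(df.columns)).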