polars.read_csv() with German number formatting

Question

Is it possible in Polars to read a CSV file that uses German number formatting, the way pandas.read_csv() can with its "decimal" and "thousands" parameters?

Answer 1

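Currently, the Polars read_csv method does not expose those parameters.

However, there is an easy workaround to convert them. For example, with this CSV, let Polars read the German-formatted numbers as strings (utf8).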

import polars as pl

my_csv = b"""col1\tcol2\tcol3
1.234,5\tabc\t1.234.567
9.876\tdef\t3,21
"""
df = pl.read_csv(my_csv, separator="\t")
print(df)

shape: (2, 3)
┌─────────┬──────┬───────────┐
│ col1    ┆ col2 ┆ col3      │
│ ---     ┆ ---  ┆ ---       │
│ str     ┆ str  ┆ str       │
╞═════════╪══════╪═══════════╡
│ 1.234,5 ┆ abc  ┆ 1.234.567 │
│ 9.876   ┆ def  ┆ 3,21      │
└─────────┴──────┴───────────┘

From here, the conversion is just a few lines of code:

df = df.with_columns(
    pl.col("col1", "col3")
    .str.replace_all(r"\.", "")
    .str.replace(",", ".")
    .cast(pl.Float64)  # or whatever datatype needed
)
print(df)

shape: (2, 3)
┌────────┬──────┬────────────┐
│ col1   ┆ col2 ┆ col3       │
│ ---    ┆ ---  ┆ ---        │
│ f64    ┆ str  ┆ f64        │
╞════════╪══════╪════════════╡
│ 1234.5 ┆ abc  ┆ 1.234567e6 │
│ 9876.0 ┆ def  ┆ 3.21       │
└────────┴──────┴────────────┘

Be careful to apply this logic only to numbers encoded in the German locale. It will mangle numbers formatted in other locales.
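To make that warning concrete, here is a minimal plain-Python sketch of the same two substitutions, plus one possible guard. The names naive_convert, german_to_float, and GERMAN_NUMBER are hypothetical helpers invented for this illustration, and the regular expression is only an assumption about what "German-formatted" means (optional sign, "." between groups of three digits, "," before the decimal part).

```python
import re

# Assumed shape of a German-formatted number: optional sign, digits with
# optional "." thousands groups, and an optional "," decimal part.
GERMAN_NUMBER = re.compile(r"-?(\d{1,3}(\.\d{3})+|\d+)(,\d+)?")


def naive_convert(text: str) -> float:
    """Apply the same substitutions as the Polars expression, unchecked."""
    # Drop every "." (thousands separator), then turn the first "," into ".".
    return float(text.replace(".", "").replace(",", ".", 1))


def german_to_float(text: str) -> float:
    """Convert only strings that match the assumed German pattern."""
    if GERMAN_NUMBER.fullmatch(text) is None:
        raise ValueError(f"not a German-formatted number: {text!r}")
    return naive_convert(text)


print(naive_convert("1.234,5"))  # German input: 1234.5
print(naive_convert("1,234.5"))  # US input: silently becomes 1.2345
```

A guard like this only catches strings that cannot be German-formatted. An ambiguous value such as "1.234" still passes and is read as 1234, so knowing the locale of the source file remains necessary.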

Answer 2

In the current version of Polars (0.20.26), there is a flag: decimal_comma.

Example:

import polars as pl

df = pl.read_csv('foo.csv', decimal_comma=True)

Note: this cannot be combined with the parameter use_pyarrow set to True.
