Reading integer and float values with the Google Sheets API

Question

I am using the Google Sheets API to fetch data from a sheet whose numbers use a European locale. The input Google Sheet looks like this:

product_price impressions clicks ctr avg_click_price Total_spent orders
2296,00 2184 117 5,36 12,63 1478,20 3

However, when I fetch the data with worksheet.get_all_records() and process it in Python, the numbers are misinterpreted. For example, I get:

product_price impressions clicks ctr avg_click_price Total_spent orders
229600 2184 117 536 1263 147820 3

Here is the part of my code that processes the data:

sheet_data = worksheet.get_all_records()

# Processing integer columns
for col in integer_columns:
    if col in df.columns:
        logger.info(f"Processing integer column {col}...")
        df[col] = (
            df[col]
            .astype(str)  # Convert to string
            .str.replace("\u00A0", "")  # Remove non-breaking spaces
            .str.replace(" ", "")  # Remove regular spaces
            .str.extract(r"(\d+,.-)")  # Keep only digits
            .apply(locale.atof)  # Convert to float based on locale
        )

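I suspect the problem is related to how numbers containing commas and dots (for example, 2296,00) are parsed when the Google Sheet uses the pl_PL locale. The locale seems to be ignored, and float values end up multiplied by 100.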

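How can I correctly parse and process floats and integers in this format with Python, so that the output matches the original values, without writing two separate loops for integers and floats?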

Answer 1

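I tried using Babel's parse_decimal to convert the numbers to decimals. Babel also has various options for handling currencies and so on, which you can try as needed.

Here is an example: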

from babel.numbers import parse_decimal
import pandas as pd

data = {
    'product_price': ['2296,00'],
    'impressions': [2184],
    'clicks': [117],
    'ctr': ['5,36'],
    'avg_click_price': ['12,63'],
    'total_spent': ['1478,20'],
    'orders': [3]
}

df = pd.DataFrame(data)
def convert_european_with_babel(df, locale='pl_PL'):
    for col in df.columns:
        if df[col].dtype == 'object': 
            df[col] = df[col].apply(lambda x: parse_decimal(x, locale=locale)) 
    return df

df_cleaned_babel = convert_european_with_babel(df)
print(df_cleaned_babel)

Output

  product_price  impressions  clicks   ctr avg_click_price total_spent  orders
0       2296.00         2184     117  5.36           12.63     1478.20       3
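
If adding Babel is not an option, the same idea can be sketched with the standard library only, handling integers and floats in a single pass. This is a minimal sketch, not a definitive implementation: the helper name parse_european_number, its separator defaults, and the sample records are assumptions for illustration, and it assumes the values arrive either as numbers or as strings such as '2296,00', as in the sample data above.

```python
from decimal import Decimal, InvalidOperation


def parse_european_number(value, decimal_sep=",", group_seps=(".", " ", "\u00a0")):
    """Hypothetical helper: turn '2296,00' into 2296.0 and '2184' into 2184.

    Values that are already numeric, or that are not numbers at all,
    are returned unchanged.
    """
    if isinstance(value, (int, float)):
        return value  # already numeric, nothing to do

    text = str(value).strip()
    for sep in group_seps:
        text = text.replace(sep, "")  # drop thousands separators and spaces
    text = text.replace(decimal_sep, ".")  # normalise the decimal separator

    try:
        number = Decimal(text)
    except InvalidOperation:
        return value  # not a number, leave it untouched
    if not number.is_finite():
        return value  # ignore strings such as 'NaN' or 'Infinity'

    # Values written without a decimal part stay integers; the rest become floats.
    return float(number) if "." in text else int(number)


# Sample rows shaped like the list of dicts used in the example above (assumed data).
records = [
    {
        "product_price": "2296,00",
        "impressions": 2184,
        "clicks": 117,
        "ctr": "5,36",
        "avg_click_price": "12,63",
        "total_spent": "1478,20",
        "orders": 3,
    }
]

# One comprehension covers integer and float columns alike.
cleaned = [
    {key: parse_european_number(val) for key, val in row.items()}
    for row in records
]
print(cleaned[0])
```

Because the decimal separator is swapped for a dot rather than stripped, '2296,00' stays 2296.0 instead of turning into 229600. The result can be passed to pd.DataFrame(cleaned) if a DataFrame is needed.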
