What is the correct order and method for formatting a DataFrame with both colors and number formats?

Question

I'm using Python/pandas and visualizing the results in Streamlit, working in a local environment.

I have a dictionary of DataFrames. Each DataFrame contains text and numbers, with the numbers stored as strings.

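Because the numbers vary widely in magnitude, I want to make the DataFrames more user-friendly, for example showing 1 M (million) instead of 1,000,000. I also want to add some conditional formatting by changing the text color based on the value, for example green if ['P/E'] is < 25.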

I'm stuck:

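- If I change the number format before adding the colors (conditional formatting), I can't apply styles to my values, because they are now strings (I think).
- If I add the colors before changing the number format, I end up with a pandas Styler and get TypeError: 'Styler' object is not subscriptable. The colors are correct, but the displayed value is 20.040102030 instead of 20.04.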
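The first problem can be reproduced without Styler at all. A minimal sketch, assuming Market Cap values in the shape that convert_to_readable produces (the strings are rounded from the sample data below):

```python
import pandas as pd

# Market Cap after convert_to_readable: display strings, no longer numbers
readable_cap = pd.Series(["142.31B", "1.23B", "12.95B", "987.46M"])
# The same column kept as raw numbers
raw_cap = pd.Series([142307622912, 1230484753, 12947592845, 987462847])

# Coercing the display strings back fails for every suffixed value
numeric_cap = pd.to_numeric(readable_cap, errors="coerce")
print(numeric_cap.tolist())             # [nan, nan, nan, nan]

# NaN > 1_000_000 is False, so the color rule can never match
print((numeric_cap > 1_000_000).any())  # False
# On the raw numbers the same rule works as intended
print((raw_cap > 1_000_000).all())      # True
```

Once a number has been turned into a display string, the information needed for the comparison is gone, which suggests keeping the data numeric and changing only how it is displayed.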

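The code below is what produces that screenshot: the values that receive color formatting break because of the non-numeric coercion (errors='coerce').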

Any help would be greatly appreciated!

    import pandas as pd
    
    def apply_conditional_formatting(df, column_rules):
        styled_df = df.style
        # Example: Apply green text if P/E < 25
        if 'P/E' in df.columns:
            df['P/E'] = pd.to_numeric(df['P/E'], errors='coerce')
            styled_df = styled_df.map(
                lambda val: 'color: green' if isinstance(val, (int, float)) and val < 25 else '', 
                subset=['P/E']
            )
        if 'Market Cap' in df.columns:
            df['Market Cap'] = pd.to_numeric(df['Market Cap'], errors='coerce')
            styled_df = styled_df.map(
                lambda val: 'color: green' if isinstance(val, (int, float)) and val > 1000000 else '', 
                subset=['Market Cap']
            )
        # Extend the logic to other rules here...
        return styled_df
    
    def apply_formatting_to_dataframes(dataframes, column_rules):
        # Apply conditional formatting to a dictionary of DataFrames.
        styled_dataframes = {}
    
        for section, df in dataframes.items():
            styled_dataframes[section] = apply_conditional_formatting(df, column_rules)
    
        return styled_dataframes

    def convert_to_readable(num):
        if num is None or num == 'N/A':
            return 'N/A'
        if isinstance(num, str):
            try:
                num = float(num)  # Try converting strings to float
            except ValueError:
                return num
    
        if num >= 1_000_000_000 or num <= -1_000_000_000:
            return f'{num / 1_000_000_000:.2f}B'
        elif num >= 1_000_000 or num <= -1_000_000:
            return f'{num / 1_000_000:.2f}M'
        elif num >= 1_000 or num <= -1_000:
            return f'{num / 1_000:.2f}K'
        else:
            return f'{num:.2f}'
    
    def convert_to_readable_dataframes(df):
        # Convert numeric values to a more readable format in the output df
        for column in df.columns:
            df[column] = df[column].apply(lambda x: convert_to_readable(x))
        return df

    for section, df in financial_sections_dataframes.items():
        readable_df = convert_to_readable_dataframes(df)
        formatted_df = apply_conditional_formatting(readable_df, column_rules)

Thanks in advance for any guidance and solutions. Here is some sample data:

    import pandas as pd

    example_df = pd.DataFrame({
        "Country": ["Netherlands", "France", "Luxembourg", "France"],
        "Market Cap": [142307622912, 1230484753, 12947592845, 987462847],
        "P/E": [33.66, 21.14, 22.87, 7.45],
        "Price": [131.28, 19.80, 22.76, 0.68],
        "Change": [-0.03, -0.02, -0.01, -0.01],
        "Volume": ["1091234", "326568", "629141", "400476"]
    })

    # Assuming this would be part of a larger dictionary
    dataframes = {
        "Example Section": example_df
    }
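In the sample data, Volume is stored as strings. One way to get every frame into a consistent state is to convert such columns to numbers once, before any styling. A minimal sketch; coerce_numeric_columns is a hypothetical helper, and it assumes a column should be converted only if every value in it parses as a number, so nothing is silently turned into NaN:

```python
import pandas as pd


def coerce_numeric_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every fully numeric column has a numeric dtype.

    Hypothetical helper: text columns such as "Country" make pd.to_numeric
    raise, and are then left exactly as they were.
    """
    out = df.copy()
    for column in out.columns:
        try:
            # No errors="coerce": a single unparsable value aborts this column
            out[column] = pd.to_numeric(out[column])
        except (ValueError, TypeError):
            pass  # not purely numeric, keep the original values
    return out


raw = pd.DataFrame({
    "Country": ["Netherlands", "France"],
    "P/E": [33.66, 21.14],
    "Volume": ["1091234", "326568"],
})
clean = coerce_numeric_columns(raw)
print(clean.dtypes)
```

For the whole dictionary this becomes {name: coerce_numeric_columns(df) for name, df in dataframes.items()}, after which all comparisons and formatting operate on real numbers.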

Answer 1

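Pandas distinguishes between the display value (what you see in the visualization) and the actual value (the underlying data). The data should therefore be stored as raw numbers, and only the displayed values changed through the Styler interface: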

    import pandas as pd

    # Restored volume back to ints
    example_df = pd.DataFrame({
        "Country": ["Netherlands", "France", "Luxembourg", "France"],
        "Market Cap": [142307622912, 1230484753, 12947592845, 987462847],
        "P/E": [33.66, 21.14, 22.87, 7.45],
        "Price": [131.28, 19.80, 22.76, 0.68],
        "Change": [-0.03, -0.02, -0.01, -0.01],
        "Volume": [1091234, 326568, 629141, 400476]
    })

    def short_form(val):
        # Display-only formatter: the stored number is never changed
        if pd.isna(val):
            return "N/A"
        if abs(val) >= 1e9:
            return f"{val/1e9:.2f}B"
        elif abs(val) >= 1e6:
            return f"{val/1e6:.2f}M"
        elif abs(val) >= 1000:
            return f"{val/1000:.2f}K"
        return f"{val:.2f}"

    def highlight(col, threshold=25):
        # Green text where the value is below the threshold (P/E < 25)
        return ["color: green" if val < threshold else "" for val in col]

    (example_df.style
        .apply(highlight, subset="P/E", threshold=25)
        .format(short_form, subset="Market Cap")
        .format(precision=2, subset=["P/E", "Price", "Change"]))

Meanwhile,

    example_df.values

still holds the original, unformatted data, so it remains available for further computation:

    array([['Netherlands', 142307622912, 33.66, 131.28, -0.03, 1091234],
           ['France', 1230484753, 21.14, 19.8, -0.02, 326568],
           ['Luxembourg', 12947592845, 22.87, 22.76, -0.01, 629141],
           ['France', 987462847, 7.45, 0.68, -0.01, 400476]], dtype=object)
