Conditional highlighting in sns barplot highlights the wrong value


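I'm trying to highlight a specific city value (categorical) in my seaborn barplot, but every time I give it an x condition, it highlights the wrong bar. In the example below I'm trying to highlight Los Angeles, but it highlights San Francisco instead.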

    import os
    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    import statsmodels.api as sm 
    from statsmodels.formula.api import ols
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import LogisticRegression
    from sklearn import metrics
    
    # Part A: What Predicts the Long-term Home Price Appreciation of a City?
    
    # Set up directory for data imports
    os.chdir("***")
    df = pd.read_excel('W02b_homeprice.xlsx') 
    # NOTE: Data was modified to remove "(Metropolitan Statistical Area)" from the entire MSA column for easier data manipulation
    
    # Plotting Correlation Chart for all the variables
    
    # Replicating Figure 1: Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.
    
    df30 = df[df['pop03']>=1700000]
    plt.figure(figsize=(30,20))
    plt.title("Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.", fontsize = 30)
    cols = ['red' if x == "Los Angeles" else 'green' for x in df30.MSA]
    # NOTE: Return to this - something is off with the x that is being highlighted!
    sns.barplot(x="MSA", y="homepriceg", data=df30, palette = cols, 
                order=df30.sort_values("homepriceg", ascending=False).MSA) 
    plt.xlabel("")
    plt.ylabel("%", size=30)
    plt.xticks(fontsize=20, rotation=60)
    plt.yticks(fontsize=20)
    sns.set_style("whitegrid")
    plt.show()

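As you can see, my code currently highlights the "San Francisco" bar instead of the "Los Angeles" bar, and I'm not sure what I'm doing wrong. I've tried other cities, but it still highlights the wrong one, which is what makes debugging confusing. I'm new to seaborn and Python.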

Bar plot highlighting the wrong city

python python-3.x seaborn
1 Answer

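You are using an old, deprecated way of assigning colors. In current seaborn versions, you should use a `hue` column for coloring. If the color depends on the x values, you can set `hue='MSA'` (the same column as `x=`). Instead of giving the colors as a list, you can create a dictionary that maps city names to colors.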

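In your code, the bars are colored from left to right, but the order of the bars comes from `sort_values()`, while the colors use the original order of the cities.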
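The mismatch can be reproduced without plotting anything. The sketch below uses a hypothetical three-city frame (the values are invented for illustration, not taken from the question's data) and pairs each color in the list with the bar that ends up at the same position after sorting. It also shows that building the list from the sorted order keeps the two aligned.

```python
import pandas as pd

# Hypothetical miniature of df30; the numbers are made up for illustration only.
df = pd.DataFrame({
    "MSA": ["Los Angeles", "San Francisco", "Dallas"],
    "homepriceg": [150, 200, 100],
})

# Colors built in the original row order, as in the question.
cols = ["red" if x == "Los Angeles" else "green" for x in df.MSA]

# Bars are drawn in sorted order, and a list of colors is applied left to right.
order = df.sort_values("homepriceg", ascending=False).MSA.tolist()
wrong = dict(zip(order, cols))

# Building the list from the same sorted order keeps colors and bars aligned.
fixed = dict(zip(order, ["red" if x == "Los Angeles" else "green" for x in order]))

print(wrong)  # which city each color from the original list lands on
print(fixed)  # which city each color lands on after aligning with the sort
```

Here the first row is Los Angeles, so the first color is red, but the first bar after sorting is San Francisco. A dictionary keyed by city name avoids the problem entirely, because the color no longer depends on position.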

By default, the x tick labels are aligned by the center of the label text. Right alignment looks better.

`plt.tight_layout()` recalculates the whitespace so that all labels fit nicely.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np

    # create some dummy test data similar to given one
    cities = ['Long Beach', 'Irvine', 'San Diego', 'Ontario', 'San Jose', 'San Francisco', 'Anaheim', 'Glendale', 'Huntington Beach', 'Chula Vista', 'Riverside', 'Rancho Cucamonga', 'Fontana', 'San Bernardino', 'Modesto', 'Santa Clarita', 'Santa Ana', 'Fresno', 'Fremont', 'Bakersfield', 'Oxnard', 'Oceanside', 'Stockton', 'Los Angeles', 'Elk Grove', 'Santa Rosa', 'Moreno Valley', 'Oakland', 'Sacramento', 'Garden Grove']
    df30 = pd.DataFrame({'MSA': cities, 'homepriceg': np.random.randint(10000, 100000, len(cities))})

    sns.set_style("whitegrid")  # should be called before creating the figure
    plt.figure(figsize=(12, 8))
    colors = {x: 'crimson' if x == 'Los Angeles' else 'limegreen' for x in df30['MSA']}
    sns.barplot(x='MSA', y='homepriceg', hue='MSA', data=df30, palette=colors,
                order=df30.sort_values('homepriceg', ascending=False)['MSA'])
    plt.xticks(fontsize=12, rotation=60, ha='right')
    plt.yticks(fontsize=12)
    plt.xlabel('')
    plt.tight_layout()  # fit labels nicely into the plot
    plt.show()

sns.barplot with one bar highlighted
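
If more than one city needs highlighting, the dictionary comprehension above can be wrapped in a small helper. This is only a sketch: `highlight_palette` and its parameter names are hypothetical and not part of seaborn.

```python
def highlight_palette(categories, highlighted,
                      highlight_color="crimson", base_color="limegreen"):
    """Map every category to base_color, except those listed in `highlighted`."""
    highlighted = set(highlighted)
    return {c: highlight_color if c in highlighted else base_color
            for c in categories}


# Hypothetical usage with a few of the city names from the dummy data.
palette = highlight_palette(
    ["Long Beach", "Los Angeles", "Oakland"],
    highlighted=["Los Angeles", "Oakland"],
)
print(palette)
```

The resulting dictionary could be passed as `palette=` together with `hue='MSA'`, in the same way as `colors` in the code above. Because it is keyed by name, it stays correct whatever `order=` is used.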
