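I'm trying to highlight a specific city (a categorical value) in my seaborn barplot, but every time I give it the x condition, it highlights the wrong bar. In the example below, I'm trying to highlight Los Angeles, but San Francisco gets highlighted instead.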
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.formula.api import ols
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
<h3> Part A: What Predicts the Long-term Home Price Appreciation of a City? </h3>
# Set up directory for data imports
os.chdir("***")
df = pd.read_excel('W02b_homeprice.xlsx')
# NOTE: Data was modified to remove "(Metropolitan Statistical Area)" from the entire MSA column for easier data manipulation
Plotting Correlation Chart for all the variables
Replicating Figure 1: Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.
df30 = df[df['pop03']>=1700000]
plt.figure(figsize=(30,20))
plt.title("Single-Family Home Price Appreciation from 1995Q1 to 2012Q3 for the 30 Largest Metropolitan Areas in the U.S.", fontsize = 30)
cols = ['red' if x == "Los Angeles" else 'green' for x in df30.MSA]
# NOTE: Return to this - something is off with the x that is being highlighted!
sns.barplot(x="MSA", y="homepriceg", data=df30, palette = cols,
order=df30.sort_values("homepriceg", ascending=False).MSA)
plt.xlabel("")
plt.ylabel("%", size=30)
plt.xticks(fontsize=20, rotation=60)
plt.yticks(fontsize=20)
sns.set_style("whitegrid")
plt.show()
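As you can see, my code currently highlights the "San Francisco" bar instead of the "Los Angeles" bar, and I'm not sure what I'm doing wrong. I've tried other cities, but it still highlights the wrong one, which is what makes this confusing to debug. I'm just getting started with seaborn and Python.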
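You are using an old, deprecated way to assign colors. In current seaborn versions, you are supposed to use a hue column for coloring. If the colors depend on the x values, you can set hue='MSA' (the same column as x=). Instead of giving the colors as a list, you can create a dictionary that maps each city name to a color.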
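In your code, the bars are colored from left to right in the order they are drawn. You took the bar order from sort_values(), but built the color list from the original, unsorted order of the cities, so the red entry ends up on whichever city happens to sit at that position after sorting.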
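To make the mismatch concrete, here is a minimal sketch that uses pandas only. The four-row frame and its values are made up for illustration (they are not the asker's data), and the zip() simply mimics the left-to-right, position-based coloring described above.

```python
import pandas as pd

# Hypothetical stand-in for df30; values are invented for illustration
df30 = pd.DataFrame({
    "MSA": ["San Diego", "Los Angeles", "San Francisco", "Sacramento"],
    "homepriceg": [50, 60, 80, 90],
})

# Colors built in the frame's original row order, as in the question
cols = ["red" if x == "Los Angeles" else "green" for x in df30["MSA"]]

# Bar order taken from the sorted frame, as passed to order=
bar_order = list(df30.sort_values("homepriceg", ascending=False)["MSA"])

# Pair colors with bars by position, the way a list palette is applied
painted = dict(zip(bar_order, cols))
wrong_red = [city for city, color in painted.items() if color == "red"]
print(wrong_red)  # → ['San Francisco']

# A name-to-color dictionary does not depend on the order of the bars
by_name = {x: "red" if x == "Los Angeles" else "green" for x in df30["MSA"]}
right_red = [city for city in bar_order if by_name[city] == "red"]
print(right_red)  # → ['Los Angeles']
```

Los Angeles is in the second row of the unsorted frame, so the second bar turns red, and after sorting that bar belongs to a different city.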
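By default, the x tick labels are aligned by the center of the label text. With rotated labels, right alignment looks nicer. plt.tight_layout() recalculates the white space so that all the labels fit nicely.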
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some dummy test data similar to given one
cities =['Long Beach', 'Irvine', 'San Diego', 'Ontario', 'San Jose', 'San Francisco', 'Anaheim', 'Glendale', 'Huntington Beach', 'Chula Vista', 'Riverside', 'Rancho Cucamonga', 'Fontana', 'San Bernardino', 'Modesto', 'Santa Clarita', 'Santa Ana', 'Fresno', 'Fremont', 'Bakersfield', 'Oxnard', 'Oceanside', 'Stockton', 'Los Angeles', 'Elk Grove', 'Santa Rosa', 'Moreno Valley', 'Oakland', 'Sacramento', 'Garden Grove']
df30 = pd.DataFrame({'MSA': cities, 'homepriceg': np.random.randint(10000, 100000, len(cities))})
sns.set_style("whitegrid") # should be called before creating the figure
plt.figure(figsize=(12, 8))
colors = {x: 'crimson' if x == 'Los Angeles' else 'limegreen' for x in df30['MSA']}
sns.barplot(x='MSA', y='homepriceg', hue='MSA', data=df30, palette=colors,
order=df30.sort_values('homepriceg', ascending=False)['MSA'])
plt.xticks(fontsize=12, rotation=60, ha='right')
plt.yticks(fontsize=12)
plt.xlabel('')
plt.tight_layout() # fit labels nicely into the plot
plt.show()
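If you would rather keep a list of colors (for example on an older seaborn release, where I believe adding hue= also changes how the bars are laid out), one possible alternative is to sort the frame once and derive both the bar order and the colors from that same sorted frame. This is only a sketch with made-up data; df_sorted is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df30; values are invented for illustration
df30 = pd.DataFrame({
    "MSA": ["San Francisco", "Los Angeles", "San Diego", "Sacramento"],
    "homepriceg": [40, 90, 70, 55],
})

# Sort once, then build the colors from the sorted frame so that
# position i in the list matches bar i in the plot
df_sorted = df30.sort_values("homepriceg", ascending=False)
cols = ["red" if city == "Los Angeles" else "green" for city in df_sorted["MSA"]]

highlighted = [city for city, color in zip(df_sorted["MSA"], cols) if color == "red"]
print(highlighted)  # → ['Los Angeles']
```

You would then pass data=df_sorted and palette=cols to sns.barplot and drop the order= argument, since the frame is already in plotting order. The dictionary approach above is still the more robust choice because it does not depend on position at all.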