Show data points on a boxplot of a pandas DataFrame


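I am reproducing the results of a paper and need to show data points on a boxplot of a pandas DataFrame. The corresponding figure from the paper looks like this: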

[Image: figure from the paper]

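I tried drawing the boxplot with df.boxplot and the points with plt.scatter, but the result is wrong: the scatter points seem to be shifted one position to the left! How can I fix this?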

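df_encoded.boxplot(column=columns, rot=90, fontsize=15)
# Build a one-row DataFrame from the first training sample
newdata = pd.DataFrame(ndarrX_train[0].reshape(1, -1), columns=columns)
# Plot that row in red, using the column names as x values
plt.scatter(columns, newdata, c='r')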

[Image: resulting plot]

My data differs from the paper's. To plot a single data point, I created a DataFrame with one row. I would be grateful for any help with this.
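The one-position shift comes from how the two calls assign x coordinates. This minimal sketch uses made-up data (the column names a, b, c and the random values are assumptions, not the question's data) to show that DataFrame.boxplot puts the boxes at x = 1, 2, 3, while a scatter with string x values lands at x = 0, 1, 2:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: three columns of random values
rng = np.random.default_rng(0)
columns = ["a", "b", "c"]
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=columns)

# DataFrame.boxplot returns the Axes; its tick locations are the box centres
ax = df.boxplot(column=columns)
box_positions = ax.get_xticks().tolist()

# Scatter one row using the column names as x: the category converter maps them to 0, 1, 2
one_row = df.iloc[[0]].to_numpy().ravel()
points = ax.scatter(columns, one_row, c="r")
scatter_positions = points.get_offsets()[:, 0].tolist()

print("box positions:    ", box_positions)
print("scatter positions:", scatter_positions)
```

The tick labels still read a, b, c under the boxes, so the points look as if they belong to the previous column.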

python pandas dataframe scatter-plot boxplot
1 Answer
0 votes

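I shifted their x-coordinates right by 1. df.boxplot draws the boxes at x = 1, 2, ..., n, whereas plt.scatter with the column names as x values places the points at 0, 1, ..., n-1, so each point needs i + 1: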

plt.scatter(i + 1, q3_values[col],...)
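Applied to the one-row DataFrame from the question, the same idea could look like the sketch below. Instead of hard-coding i + 1, it reads the box positions back from the Axes that df.boxplot returns, whose major tick locations are the box centres. The random df_encoded and ndarrX_train are hypothetical stand-ins, since the question's data is not shown:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's df_encoded and ndarrX_train
rng = np.random.default_rng(1)
columns = ["f1", "f2", "f3", "f4"]
df_encoded = pd.DataFrame(rng.normal(size=(100, 4)), columns=columns)
ndarrX_train = rng.normal(size=(10, 4))

# DataFrame.boxplot returns the matplotlib Axes it drew on
ax = df_encoded.boxplot(column=columns, rot=90, fontsize=15)
newdata = pd.DataFrame(ndarrX_train[0].reshape(1, -1), columns=columns)

# Tick locations are the box centres (1, 2, ..., n), in column order
positions = ax.get_xticks()
# zorder=3 draws the points on top of the box lines
points = ax.scatter(positions, newdata.iloc[0], c="r", zorder=3)
# plt.show()  # uncomment to display the figure interactively
```

np.arange(1, len(columns) + 1) gives the same positions; reading them from the Axes just avoids repeating matplotlib's default.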

To test this, I reproduced a similar setup using selected features of the Titanic dataset:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelEncoder

# Load the Titanic dataset and keep the relevant columns.
# .copy() makes df independent of titanic.frame, which avoids SettingWithCopyWarning
titanic = fetch_openml(name="titanic", version=1, as_frame=True)
df = titanic.frame[['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked', 'survived']].copy()

# Fill missing values and cast explicitly to the intended dtype (float, str, ...)
df['age']      = df['age'].fillna(df['age'].median()).astype(float)
df['embarked'] = df['embarked'].fillna(df['embarked'].mode()[0]).astype(str)
df['fare']     = df['fare'].fillna(df['fare'].median()).astype(float)

# Encode categorical features as integers (replacing the whole column)
le = LabelEncoder()
for col in ['sex', 'embarked']:
    df[col] = le.fit_transform(df[col].astype(str))

# Create the boxplot; matplotlib places the boxes at x = 1, 2, ..., n
columns = ['age', 'fare', 'pclass', 'sibsp', 'parch']
df_encoded = df[columns].copy()
df_encoded.boxplot(column=columns, rot=90, fontsize=15, figsize=(10, 6))

# Calculate Q3 for each column for plotting red crosses
q3_values = df_encoded.quantile(0.75)

# Plot Q3 as red crosses, shifting the x-coordinate by 1 to match the box positions
for i, col in enumerate(columns):
    plt.scatter(i + 1, q3_values.loc[col], marker='x', color='red', s=100)

plt.title('Boxplot and Q3 Markers')
plt.show()

Output plot:

[Image: output plot]
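To check that the red crosses really sit on the upper edge of each box: matplotlib's boxplot, which pandas' DataFrame.boxplot calls, derives each box from matplotlib.cbook.boxplot_stats, which computes the quartiles with np.percentile, and pandas' quantile also defaults to linear interpolation, so the two should agree. A quick sketch comparing them on made-up, NaN-free data (the column names and values are placeholders, not the Titanic data):

```python
import numpy as np
import pandas as pd
from matplotlib import cbook

# Hypothetical NaN-free data standing in for df_encoded
rng = np.random.default_rng(3)
columns = ["age", "fare", "pclass"]
df_encoded = pd.DataFrame({
    "age": rng.normal(30, 12, size=300),
    "fare": rng.gamma(2.0, 15.0, size=300),
    "pclass": rng.integers(1, 4, size=300),
})

# Q3 as computed in the answer (pandas default: linear interpolation)
q3_pandas = df_encoded[columns].quantile(0.75)

# Statistics matplotlib uses to draw each box, one dict per column
stats = cbook.boxplot_stats([df_encoded[c].to_numpy() for c in columns])
q3_box = pd.Series([s["q3"] for s in stats], index=columns)

comparison = pd.concat({"pandas Q3": q3_pandas, "box upper edge": q3_box}, axis=1)
print(comparison)
```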

© www.soinside.com 2019 - 2024. All rights reserved.