I want to plot a dataset that contains several clusters.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.cluster
rng = np.random.default_rng(seed=5)
df_1_3 = pd.DataFrame(rng.normal(loc=(1, 3), size=(30, 2), scale=0.50), columns=["x", "y"])
df_5_1 = pd.DataFrame(rng.normal(loc=(5, 1), size=(30, 2), scale=0.25), columns=["x", "y"])
df_5_5 = pd.DataFrame(rng.normal(loc=(5, 5), size=(30, 2), scale=0.25), columns=["x", "y"])
df = pd.concat([df_1_3, df_5_1, df_5_5], keys=["df_1_3", "df_5_1", "df_5_5"])
A clustering algorithm computes the cluster labels:
model = sklearn.cluster.AgglomerativeClustering(...)
df["cluster"] = model.fit_predict(df[["x", "y"]]) # [0, 0, 0, ... 1, 1, 1 ... 2, 2, 2]
df["cluster"] = df["cluster"].astype("category")
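I want to visualize the data in a single plot. Each original dataset should be distinguished by its own marker, and the cluster labels should be shown by color.
I am actually almost there: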
fig, ax = plt.subplots()
for marker, (name, sdf) in zip(["o", "s", "^", "d"], df.groupby(level=0)):
    sdf.plot.scatter(x="x", y="y", c="cluster", marker=marker, cmap="viridis", ax=ax)
How can I get rid of the extra colorbars? I have been struggling with this for hours.
I tried to reproduce your setup with this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
rng = np.random.default_rng(seed=5)  # seeded generator, as in the question
df_1_3 = pd.DataFrame(rng.normal(loc=(1, 3), size=(30, 2), scale=0.50), columns=["x", "y"])
df_5_1 = pd.DataFrame(rng.normal(loc=(5, 1), size=(30, 2), scale=0.25), columns=["x", "y"])
df_5_5 = pd.DataFrame(rng.normal(loc=(5, 5), size=(30, 2), scale=0.25), columns=["x", "y"])
df_1_3["cluster"] = "0"
df_5_1["cluster"] = "1"
df_5_5["cluster"] = "2"
df = pd.concat([df_1_3, df_5_1, df_5_5], keys=["df_1_3", "df_5_1", "df_5_5"])
df["cluster"] = df["cluster"].astype("category")
This is the part that does what you need:
fig, ax = plt.subplots()
scatter = ax.scatter(df["x"], df["y"], c=df["cluster"].cat.codes, cmap="viridis")
plt.colorbar(scatter, ax=ax, label='Cluster')
plt.show()
This gives:
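The snippet above draws all points with one marker. If the per-source markers from the question also matter, one possible approach is to call `ax.scatter` once per source with a shared color scale and then attach a single colorbar. The following is only a sketch under some assumptions: the helper name `plot_sources_and_clusters` is made up for illustration, and it expects the frame built above (first index level naming the source, a categorical `cluster` column).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_sources_and_clusters(df, markers=("o", "s", "^", "d"), cmap="viridis"):
    """Hypothetical helper: one marker per source, color by cluster, one colorbar."""
    codes = df["cluster"].cat.codes  # integer code for each category
    # Use the same color limits for every group so that equal clusters
    # get equal colors regardless of which source they come from.
    vmin, vmax = int(codes.min()), int(codes.max())

    fig, ax = plt.subplots()
    scatter = None
    for marker, (name, sdf) in zip(markers, df.groupby(level=0)):
        scatter = ax.scatter(
            sdf["x"],
            sdf["y"],
            c=sdf["cluster"].cat.codes,
            marker=marker,
            cmap=cmap,
            vmin=vmin,
            vmax=vmax,
            label=name,
        )

    # All scatters share cmap/vmin/vmax, so any one of them can back the colorbar.
    fig.colorbar(scatter, ax=ax, label="Cluster", ticks=range(vmin, vmax + 1))
    ax.legend(title="Source")
    return fig, ax
```

Calling `fig, ax = plot_sources_and_clusters(df)` followed by `plt.show()` should produce one axes with a single colorbar. The legend identifies sources by marker shape only; the color of each legend entry is incidental.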