找出散点图中最密集区域的中心

问题描述 投票:0回答:1

我有一组 x 和 y 坐标。我想绘制它们并找出哪个区域绘制密集。我想找到该区域中心的坐标。

我编写了一个用于在 x-y 平面上绘制点的代码。

import numpy as np
import matplotlib.pyplot as plt
x = np.array([111,152,153,14,155,156,153,154,153,152,154,151,200,201,200])
y = np.array([112,151,153,12,153,154,155,153,152,153,152,153,202,202,204])
plt.scatter(x,y)
plt.show()

但是,我无法找到密集绘制区域的中心。有人可以帮我吗?

python-3.x matplotlib seaborn
1个回答
10
投票

首先,您需要区分(并确定)密度聚类。您的问题没有明确定义您想要什么,所以我假设您想要评估找到“最大密度”的点之间的位置。老实说,我想要完成一些令人耳目一新的任务,因此在这里选择了密度选项;) “密度”一词本身并没有明确规定,因为有多种方法可以评估/估计/计算点云密度。我建议使用 KDE(内核密度估计器),因为它很容易满足您的需求,并且允许交换内核(平方、线性、高斯、余弦以及任何其他适用的内核)。

注意:

完整代码如下;我现在将用较小的片段逐步完成它。 预赛

从您提供的点开始,以下代码片段在笛卡尔坐标中计算适当的均匀间隔网格:

RESOLUTION = 50 LOCALITY = 2.0 dx = max(pts_x) - min(pts_x) dy = max(pts_y) - min(pts_y) delta = min(dx, dy) / RESOLUTION nx = int(dx / delta) ny = int(dy / delta) radius = (1 / LOCALITY) * min(dx, dy) grid_x = np.linspace(min(pts_x), max(pts_x), num=nx) grid_y = np.linspace(min(pts_y), max(pts_y), num=ny) x, y = np.meshgrid(grid_x, grid_y, indexing='ij')

如果您愿意,您可以通过绘制
plt.scatter(grid_x, grid_y)

轻松检查它。所有这些初步计算只是确保手头有所有必需的值:我希望能够为 KDE 内核扩展指定某种

resolution
locality 设置;此外,我们还需要x
y
方向上的最大点距离来生成网格,计算步长
delta
,计算x和y方向上的网格单元数,分别为
nx
ny
,以及根据我们的位置设置计算内核
radius
核密度估计

为了使用内核估计点密度,我们确实需要下一个片段中概述的两个函数

def gauss(x1, x2, y1, y2): """ Apply a Gaussian kernel estimation (2-sigma) to distance between points. Effectively, this applies a Gaussian kernel with a fixed radius to one of the points and evaluates it at the value of the euclidean distance between the two points (x1, y1) and (x2, y2). The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and have the 2-sigma located at radius distance. """ return ( (1.0 / (2.0 * math.pi)) * math.exp( -1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2 / 0.4) def _kde(x, y): """ Estimate the kernel density at a given position. Simply sums up all the Gaussian kernel values towards all points (pts_x, pts_y) from position (x, y). """ return sum([ gauss(x, px, y, py) # math.sqrt((x - px)**2 + (y - py)**2) for px, py in zip(pts_x, pts_y) ])

第一个,
gauss

同时执行两个任务:它采用由

x1, x2, y1, y2
定义的两个点,计算它们的欧几里德距离,并使用该距离来评估高斯核的函数值。当距离为
1.0
(或
非常
小)时,高斯核被变换为近似产生
0
,并将其2-sigma固定为先前计算的radius。该特性由上述
locality
设置控制。
确定最大值

幸运的是,

numpy

提供了一些简洁的辅助函数,可以将任意Python函数应用于向量和矩阵,因此计算非常简单:

kde = np.vectorize(_kde)  # Let numpy care for applying our kde to a vector
z = kde(x, y)

xi, yi = np.where(z == np.amax(z))
max_x = grid_x[xi][0]
max_y = grid_y[yi][0]
print(f"{max_x:.4f}, {max_y:.4f}")

在您的情况下(给定高斯内核设置和我的网格假设),最大密度为

155.2041, 154.0800

绘图

第一张图片中显示了您的点云(蓝色十字)和已识别的最大值(红十字)。第二张图显示了高斯 KDE 使用第一个代码片段中的设置计算出的估计密度。

完整代码

import math import matplotlib.pyplot as plt import numpy as np pts_x = np.array([ 111, 152, 153, 14, 155, 156, 153, 154, 153, 152, 154, 151, 200, 201, 200]) pts_y = np.array([ 112, 151, 153, 12, 153, 154, 155, 153, 152, 153, 152, 153, 202, 202, 204]) RESOLUTION = 50 LOCALITY = 2.0 dx = max(pts_x) - min(pts_x) dy = max(pts_y) - min(pts_y) delta = min(dx, dy) / RESOLUTION nx = int(dx / delta) ny = int(dy / delta) radius = (1 / LOCALITY) * min(dx, dy) grid_x = np.linspace(min(pts_x), max(pts_x), num=nx) grid_y = np.linspace(min(pts_y), max(pts_y), num=ny) x, y = np.meshgrid(grid_x, grid_y) def gauss(x1, x2, y1, y2): """ Apply a Gaussian kernel estimation (2-sigma) to distance between points. Effectively, this applies a Gaussian kernel with a fixed radius to one of the points and evaluates it at the value of the euclidean distance between the two points (x1, y1) and (x2, y2). The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and have the 2-sigma located at radius distance. """ return ( (1.0 / (2.0 * math.pi)) * math.exp( -1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2 / 0.4) def _kde(x, y): """ Estimate the kernel density at a given position. Simply sums up all the Gaussian kernel values towards all points (pts_x, pts_y) from position (x, y). """ return sum([ gauss(x, px, y, py) # math.sqrt((x - px)**2 + (y - py)**2) for px, py in zip(pts_x, pts_y) ]) kde = np.vectorize(_kde) # Let numpy care for applying our kde to a vector z = kde(x, y) xi, yi = np.where(z == np.amax(z)) max_x = grid_x[xi][0] max_y = grid_y[yi][0] print(f"{max_x:.4f}, {max_y:.4f}") fig, ax = plt.subplots() ax.pcolormesh(x, y, z, cmap='inferno', vmin=np.min(z), vmax=np.max(z)) fig.set_size_inches(4, 4) fig.savefig('density.png', bbox_inches='tight') fig, ax = plt.subplots() ax.scatter(pts_x, pts_y, marker='+', color='blue') ax.scatter(grid_x[xi], grid_y[yi], marker='+', color='red', s=200) fig.set_size_inches(4, 4) fig.savefig('marked.png', bbox_inches='tight')

© www.soinside.com 2019 - 2024. All rights reserved.