我有一组 x 和 y 坐标。我想绘制它们并找出哪个区域绘制密集。我想找到该区域中心的坐标。
我编写了一个用于在 x-y 平面上绘制点的代码。
import numpy as np
import matplotlib.pyplot as plt
x = np.array([111,152,153,14,155,156,153,154,153,152,154,151,200,201,200])
y = np.array([112,151,153,12,153,154,155,153,152,153,152,153,202,202,204])
plt.scatter(x,y)
plt.show()
但是,我无法找到密集绘制区域的中心。有人可以帮我吗?
首先,您需要区分(并确定)密度和聚类。您的问题没有明确定义您想要什么,所以我假设您想要评估找到“最大密度”的点之间的位置。老实说,我想要完成一些令人耳目一新的任务,因此在这里选择了密度选项;) “密度”一词本身并没有明确规定,因为有多种方法可以评估/估计/计算点云密度。我建议使用 KDE(内核密度估计器),因为它很容易满足您的需求,并且允许交换内核(平方、线性、高斯、余弦以及任何其他适用的内核)。
注意:完整代码如下;我现在将用较小的片段逐步完成它。 预赛
RESOLUTION = 50
LOCALITY = 2.0
dx = max(pts_x) - min(pts_x)
dy = max(pts_y) - min(pts_y)
delta = min(dx, dy) / RESOLUTION
nx = int(dx / delta)
ny = int(dy / delta)
radius = (1 / LOCALITY) * min(dx, dy)
grid_x = np.linspace(min(pts_x), max(pts_x), num=nx)
grid_y = np.linspace(min(pts_y), max(pts_y), num=ny)
x, y = np.meshgrid(grid_x, grid_y, indexing='ij')
如果您愿意,您可以通过绘制
plt.scatter(grid_x, grid_y)
轻松检查它。所有这些初步计算只是确保手头有所有必需的值:我希望能够为 KDE 内核扩展指定某种
resolution和 locality 设置;此外,我们还需要
x
和y
方向上的最大点距离来生成网格,计算步长delta
,计算x和y方向上的网格单元数,分别为nx
和ny
,以及根据我们的位置设置计算内核radius
。核密度估计
def gauss(x1, x2, y1, y2):
"""
Apply a Gaussian kernel estimation (2-sigma) to distance between points.
Effectively, this applies a Gaussian kernel with a fixed radius to one
of the points and evaluates it at the value of the euclidean distance
between the two points (x1, y1) and (x2, y2).
The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and
have the 2-sigma located at radius distance.
"""
return (
(1.0 / (2.0 * math.pi))
* math.exp(
-1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2
/ 0.4)
def _kde(x, y):
"""
Estimate the kernel density at a given position.
Simply sums up all the Gaussian kernel values towards all points
(pts_x, pts_y) from position (x, y).
"""
return sum([
gauss(x, px, y, py)
# math.sqrt((x - px)**2 + (y - py)**2)
for px, py in zip(pts_x, pts_y)
])
第一个,
gauss
同时执行两个任务:它采用由
x1, x2, y1, y2
定义的两个点,计算它们的欧几里德距离,并使用该距离来评估高斯核的函数值。当距离为1.0
(或非常小)时,高斯核被变换为近似产生
0
,并将其2-sigma固定为先前计算的radius
。该特性由上述 locality
设置控制。确定最大值
numpy
提供了一些简洁的辅助函数,可以将任意Python函数应用于向量和矩阵,因此计算非常简单:
kde = np.vectorize(_kde) # Let numpy care for applying our kde to a vector
z = kde(x, y)
xi, yi = np.where(z == np.amax(z))
max_x = grid_x[xi][0]
max_y = grid_y[yi][0]
print(f"{max_x:.4f}, {max_y:.4f}")
在您的情况下(给定高斯内核设置和我的网格假设),最大密度为
155.2041, 154.0800
绘图
import math
import matplotlib.pyplot as plt
import numpy as np
pts_x = np.array([
111, 152, 153, 14, 155, 156, 153, 154, 153, 152, 154, 151, 200, 201, 200])
pts_y = np.array([
112, 151, 153, 12, 153, 154, 155, 153, 152, 153, 152, 153, 202, 202, 204])
RESOLUTION = 50
LOCALITY = 2.0
dx = max(pts_x) - min(pts_x)
dy = max(pts_y) - min(pts_y)
delta = min(dx, dy) / RESOLUTION
nx = int(dx / delta)
ny = int(dy / delta)
radius = (1 / LOCALITY) * min(dx, dy)
grid_x = np.linspace(min(pts_x), max(pts_x), num=nx)
grid_y = np.linspace(min(pts_y), max(pts_y), num=ny)
x, y = np.meshgrid(grid_x, grid_y)
def gauss(x1, x2, y1, y2):
"""
Apply a Gaussian kernel estimation (2-sigma) to distance between points.
Effectively, this applies a Gaussian kernel with a fixed radius to one
of the points and evaluates it at the value of the euclidean distance
between the two points (x1, y1) and (x2, y2).
The Gaussian is transformed to roughly (!) yield 1.0 for distance 0 and
have the 2-sigma located at radius distance.
"""
return (
(1.0 / (2.0 * math.pi))
* math.exp(
-1 * (3.0 * math.sqrt((x1 - x2)**2 + (y1 - y2)**2) / radius))**2
/ 0.4)
def _kde(x, y):
"""
Estimate the kernel density at a given position.
Simply sums up all the Gaussian kernel values towards all points
(pts_x, pts_y) from position (x, y).
"""
return sum([
gauss(x, px, y, py)
# math.sqrt((x - px)**2 + (y - py)**2)
for px, py in zip(pts_x, pts_y)
])
kde = np.vectorize(_kde) # Let numpy care for applying our kde to a vector
z = kde(x, y)
xi, yi = np.where(z == np.amax(z))
max_x = grid_x[xi][0]
max_y = grid_y[yi][0]
print(f"{max_x:.4f}, {max_y:.4f}")
fig, ax = plt.subplots()
ax.pcolormesh(x, y, z, cmap='inferno', vmin=np.min(z), vmax=np.max(z))
fig.set_size_inches(4, 4)
fig.savefig('density.png', bbox_inches='tight')
fig, ax = plt.subplots()
ax.scatter(pts_x, pts_y, marker='+', color='blue')
ax.scatter(grid_x[xi], grid_y[yi], marker='+', color='red', s=200)
fig.set_size_inches(4, 4)
fig.savefig('marked.png', bbox_inches='tight')