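I am working on a segmentation model for historical documents. My dataset stores text lines as polygons, but to train a usable model I also need the centerline or baseline of each polygon.
As an example, these are the coordinates of one line, given as a sequence of polygon points: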
603,1220 600,1288 687,1304 691,1304 694,1304 726,1291 762,1278 801,1291 843,1307 846,1307 850,1307 924,1294 976,1285 1009,1294 1054,1310 1057,1310 1061,1310 1158,1298 1203,1291 1223,1298 1262,1314 1265,1314 1268,1314 1272,1314 1320,1298 1366,1285 1388,1301 1405,1310 1408,1310 1411,1310 1745,1323 1749,1323 1752,1323 1788,1304 1810,1294 1891,1294 1963,1307 2057,1327 2060,1327 2161,1310 2222,1301 2317,1314 2359,1317 2362,1317 2388,1314 2440,1304 2459,1314 2495,1336 2498,1336 2502,1336 2505,1336 2602,1317 2651,1307 2738,1317 2878,1336 2881,1336 2884,1336 2943,1320 2972,1314 3248,1323 3258,1262 3254,1203 603,1151 603,1220
This is the baseline for the coordinates above:
603,1220 2138,1255 3258,1262
I tried using Python's shapely.centroid to compute the baseline, but it did not give reliable results.
from shapely import Polygon, centroid
print(centroid(Polygon([(584, 1603), (580, 1654), (649, 1680), (652, 1680), (824, 1684), (827, 1684), (830, 1684), (918, 1661), (947, 1684), (950, 1684), (954, 1684), (957, 1684), (1061, 1671), (1106, 1687), (1109, 1687), (1113, 1687), (1116, 1687), (1187, 1664), (1343, 1690), (1346, 1690), (1732, 1671), (1758, 1671), (1797, 1671), (2158, 1693), (2161, 1693), (2164, 1693), (2193, 1677), (2196, 1677), (2239, 1700), (2242, 1700), (2245, 1700), (2248, 1700), (2320, 1680), (2323, 1680), (2404, 1703), (2407, 1703), (2411, 1703), (2667, 1687), (2712, 1684), (2719, 1687), (2771, 1710), (2774, 1710), (2777, 1710), (3261, 1693), (3267, 1638), (3264, 1570), (584, 1538), (584, 1603)])))
Result:
POINT (1920.3383138928723 1620.2116469627338)
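You could take a look at pygeoops. It has a function that computes the centerline of a polygon: pygeoops.centerline.
You can use the parameters of centerline to fine-tune how short branches are removed, how the result is simplified, and so on, to get a cleaner result.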
Example script:
import geopandas as gpd
import pygeoops
import shapely
poly = shapely.from_wkt("POLYGON ((0 0, 0 8, -2 10, 4 10, 2 8, 2 2, 10 2, 10 0, 0 0))")
centerline = pygeoops.centerline(poly)
# Centerlines for polygons in a geopandas GeoDataFrame
gdf = gpd.GeoDataFrame([{"name": "L-shape", "geometry": poly}], crs="epsg:31370")
gdf.geometry = pygeoops.centerline(gdf.geometry)
Disclaimer: I am the developer of pygeoops.
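If you only need a coarse line through a roughly horizontal text line, a dependency-free alternative is to cut the polygon with vertical slices and take the midpoint between the top and bottom outline at each slice. This is a different and much cruder technique than pygeoops.centerline, and the helper names parse_points and slice_midline are made up for this sketch. It gives a midline, not a typographic baseline, and it assumes the text line is not strongly rotated.

```python
def parse_points(text):
    """Parse 'x,y x,y ...' into a list of (x, y) float tuples."""
    return [tuple(float(v) for v in pair.split(",")) for pair in text.split()]


def slice_midline(ring, n_samples=3):
    """Approximate the midline of a roughly horizontal polygon.

    For n_samples evenly spaced x positions inside the polygon's x-extent,
    intersect a vertical line with every polygon edge and return the point
    halfway between the lowest and highest intersection.
    """
    ring = list(ring)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring if needed
    xs = [x for x, _ in ring]
    x_min, x_max = min(xs), max(xs)
    edges = list(zip(ring, ring[1:]))

    midline = []
    for i in range(n_samples):
        # Sample at slice centres so no slice sits exactly on the left/right end
        x = x_min + (x_max - x_min) * (i + 0.5) / n_samples
        ys = []
        for (x1, y1), (x2, y2) in edges:
            if x1 == x2:
                continue  # vertical edge, parallel to the slice
            if min(x1, x2) <= x <= max(x1, x2):
                t = (x - x1) / (x2 - x1)
                ys.append(y1 + t * (y2 - y1))
        if ys:
            midline.append((x, (min(ys) + max(ys)) / 2))
    return midline


# Toy rectangle (made-up data), 10 wide and 4 high
toy = parse_points("0,0 10,0 10,4 0,4")
print(slice_midline(toy, n_samples=5))
```

The coordinate string from the question can be passed to parse_points unchanged. With n_samples=3 the result has the same three-point shape as the baseline shown in the question, although the y values will sit in the middle of the line rather than at its bottom.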