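I have 4 KML files, each containing multiple polygons. I want to parse the KML files, extract the data, and store it in my database. From my research, installing pyKML seemed to be the best way to parse KML files.
One of my KML files looks like this: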
<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2" xmlns:gx="http://www.google.com/kml/ext/2.2" xmlns:kml="http://www.opengis.net/kml/2.2" xmlns:atom="http://www.w3.org/2005/Atom">
<Document>
<name>RecAreaPolygons.TAB</name>
<Schema name="RecAreaPolygons" id="S_RecAreaPolygons_SSSS">
<SimpleField type="string" name="RecAreaName"><displayName><b>RecAreaName</b></displayName>
</SimpleField>
<SimpleField type="string" name="RecAreaCategory"><displayName><b>RecAreaCategory</b></displayName>
</SimpleField>
<SimpleField type="string" name="Province"><displayName><b>Province</b></displayName>
</SimpleField>
<SimpleField type="string" name="Comments"><displayName><b>Comments</b></displayName>
</SimpleField>
</Schema>
<Style id="style1">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<color>ff00ff00</color>
</PolyStyle>
</Style>
<Style id="falseColor">
<BalloonStyle>
<text><![CDATA[<table border="0">
<tr><td><b>RecAreaName</b></td><td>$[RecAreaPolygons/RecAreaName]</td></tr>
<tr><td><b>RecAreaCategory</b></td><td>$[RecAreaPolygons/RecAreaCategory]</td></tr>
<tr><td><b>Province</b></td><td>$[RecAreaPolygons/Province]</td></tr>
<tr><td><b>Comments</b></td><td>$[RecAreaPolygons/Comments]</td></tr>
</table>
]]></text>
</BalloonStyle>
<PolyStyle>
<colorMode>random</colorMode>
</PolyStyle>
</Style>
<Folder id="layer 0">
<name>RecAreaPolygons</name>
<Placemark>
<name>Whistler</name>
<styleUrl>#falseColor</styleUrl>
<Style id="inline">
<IconStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</IconStyle>
<LineStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</LineStyle>
<PolyStyle>
<color>ff0000ff</color>
<colorMode>normal</colorMode>
</PolyStyle>
</Style>
<ExtendedData>
<SchemaData schemaUrl="#S_RecAreaPolygons_SSSS">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="RecAreaCategory">World Class</SimpleData>
<SimpleData name="Province">BC</SimpleData>
<SimpleData name="Comments"></SimpleData>
</SchemaData>
</ExtendedData>
<Polygon>
<outerBoundaryIs>
<LinearRing>
<coordinates>
-123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
</coordinates>
</LinearRing>
</outerBoundaryIs>
</Polygon>
</Placemark>
//MULTIPLE OTHER PLACEMARKS
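As mentioned, my attempt was to install pyKML. After installing it, I ran the following code, with the goal of getting the data into a dataframe: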
from pykml import parser

with open('RecAreaPolygons.kml', 'rb') as f:
    s = f.read()
root = parser.fromstring(s)
print(root.Document.Folder.Placemark.Polygon.outerBoundaryIs.LinearRing.coordinates)
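This prints the coordinates of the first placemark, but how do I get the rest of the coordinates and add them to a dataframe iteratively?
I would like my output to look like this: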
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC -123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0 -122.955094,50.045827,0 -122.909104,50.05565599999998,0 -122.869599,50.07871399999998,0 -122.835991,50.10895600000001,0 -122.826557,50.152805,0 -122.78496,50.26872300000001,0 -122.923014,50.26576299999998,0 -122.939174,50.18569200000002,0 -122.979858,50.17057199999998,0 -123.012877,50.151293,0 -123.050613,50.12483200000001,0 -123.053561,50.104419,0 -123.052382,50.094969,0
1 The rest of the entries
2
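You can iterate over the placemarks, collecting each one's attributes and geometry in a list, and then create a dataframe from that list.
If the KML has multiple folders, you will need to iterate over the folders first and then over the placemarks inside each folder.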
from pykml import parser
import pandas as pd

with open('RecAreaPolygons.kml', 'r', encoding="utf-8") as f:
    root = parser.parse(f).getroot()

places = []
# Iterating over .Placemark yields every Placemark in the first Folder
for place in root.Document.Folder.Placemark:
    # Map each SimpleData "name" attribute to its text (None when the element is empty)
    data = {item.get("name"): item.text for item in
            place.ExtendedData.SchemaData.SimpleData}
    coords = place.Polygon.outerBoundaryIs.LinearRing.coordinates.text.strip()
    data["Coordinates"] = coords
    places.append(data)

df = pd.DataFrame(places)
print(df)
Output:
RecAreaName RecAreaCategory Province Comments Coordinates
0 Whistler World Class BC None -123.052382,50.094969,0, -123.050613,50.07531...
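For the multi-folder case mentioned above, here is a minimal sketch of the nested loop. To keep it runnable without extra installs, it swaps pyKML for the standard library's xml.etree.ElementTree; with pyKML the same folder-then-placemark structure should apply. The two-folder SAMPLE document, the placemark_rows helper, and the extra "Folder" column are all made up for illustration and are not part of the original file.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, as declared on the <kml> root element
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical two-folder document, far smaller than the real file
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder><name>A</name>
<Placemark><name>Whistler</name>
<ExtendedData><SchemaData schemaUrl="#S">
<SimpleData name="RecAreaName">Whistler</SimpleData>
<SimpleData name="Province">BC</SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing><coordinates>
-123.05,50.09,0 -123.05,50.07,0 -123.05,50.09,0
</coordinates></LinearRing></outerBoundaryIs></Polygon>
</Placemark>
</Folder>
<Folder><name>B</name>
<Placemark><name>Example</name>
<ExtendedData><SchemaData schemaUrl="#S">
<SimpleData name="RecAreaName">Example</SimpleData>
<SimpleData name="Province">AB</SimpleData>
</SchemaData></ExtendedData>
<Polygon><outerBoundaryIs><LinearRing><coordinates>
-115.5,51.1,0 -115.4,51.2,0 -115.5,51.1,0
</coordinates></LinearRing></outerBoundaryIs></Polygon>
</Placemark>
</Folder>
</Document>
</kml>"""


def placemark_rows(kml_text):
    """Return one dict per Placemark, visiting every Folder in the document."""
    root = ET.fromstring(kml_text.encode("utf-8"))
    rows = []
    # ".//" finds folders at any depth, so nested folders are visited too
    for folder in root.iterfind(".//kml:Folder", KML_NS):
        # Only direct children, so a placemark is never counted twice
        for place in folder.iterfind("kml:Placemark", KML_NS):
            row = {item.get("name"): item.text
                   for item in place.iterfind(".//kml:SimpleData", KML_NS)}
            row["Folder"] = folder.findtext("kml:name", default="", namespaces=KML_NS)
            coords = place.find(".//kml:coordinates", KML_NS)
            has_text = coords is not None and coords.text
            row["Coordinates"] = coords.text.strip() if has_text else None
            rows.append(row)
    return rows


rows = placemark_rows(SAMPLE)
for row in rows:
    print(row["Folder"], row["RecAreaName"], row["Province"])
```

The resulting list of dicts can be passed to pd.DataFrame(rows) in the same way as the places list above.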
If you want the coordinates as a list instead, append
.split(' ')
after the strip() call in the assignment to coords inside the loop.
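If numeric values are more useful than a list of strings, the splitting step can be taken one step further. The sketch below is only an illustration: parse_coordinates is a hypothetical helper name, and it assumes each token follows the KML lon,lat[,alt] order, defaulting a missing altitude to 0.0.

```python
def parse_coordinates(text):
    """Turn a KML coordinates string into a list of (lon, lat, alt) float tuples."""
    points = []
    # split() with no argument handles spaces, tabs and newlines alike
    for token in text.split():
        parts = [float(value) for value in token.split(",")]
        lon, lat = parts[0], parts[1]
        # Altitude is optional in KML; assume 0.0 when it is absent
        alt = parts[2] if len(parts) > 2 else 0.0
        points.append((lon, lat, alt))
    return points


# First three vertices of the Whistler polygon from the question
raw = "\n-123.052382,50.094969,0 -123.050613,50.07531199999999,0 -123.029976,50.05263099999998,0\n"
points = parse_coordinates(raw)
print(points[0])  # (-123.052382, 50.094969, 0.0)
```

Calling split() without an argument also avoids empty strings when the coordinates are separated by newlines or repeated spaces, which split(' ') would leave in the list.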