I have these two dataframes:
transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).T.reset_index() #transposed
They look like this:
I tried to merge the two dataframes, but I get the error shown further down.
Merge:
epsg_jkt = 5330
transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
transjakarta = gpd.GeoDataFrame(transjakarta)
transjakarta.crs = transjakarta_lines.crs
transjakarta_planar = transjakarta.to_crs(epsg=epsg_jkt)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[225], line 3
1 # gabungkan data keduanya
2 transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
----> 3 transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
4 transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
6 # convert kembali ke geodataframe
File c:\Program Files\Python313\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
4789 def apply(
4790 self,
4791 func: AggFuncType,
(...)
4796 **kwargs,
4797 ) -> DataFrame | Series:
4798 """
4799 Invoke function on values of Series.
4800
(...)
4915 dtype: float64
4916 """
4917 return SeriesApply(
4918 self,
4919 func,
4920 convert_dtype=convert_dtype,
4921 by_row=by_row,
4922 args=args,
4923 kwargs=kwargs,
-> 4924 ).apply()
File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
1424 return self.apply_compat()
1426 # self.func is Callable
-> 1427 return self.apply_standard()
File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
1501 # row-wise access
1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
1503 # we need to give `na_action="ignore"` for categorical data.
1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
1505 # Categorical (GH51645).
1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
1508 mapper=curried, na_action=action, convert=self.convert_dtype
1509 )
1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
1512 # GH#43986 Need to do list(mapped) in order to get treated as nested
1513 # See also GH#25959 regarding EA support
1514 return obj._constructor_expanddim(list(mapped), index=obj.index)
File c:\Program Files\Python313\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
918 if isinstance(arr, ExtensionArray):
919 return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
File c:\Program Files\Python313\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
1741 values = arr.astype(object, copy=False)
1742 if na_action is None:
-> 1743 return lib.map_infer(values, mapper, convert=convert)
1744 else:
1745 return lib.map_infer_mask(
1746 values, mapper, mask=isna(values).view(np.uint8), convert=convert
1747 )
File lib.pyx:2972, in pandas._libs.lib.map_infer()
ValueError: invalid literal for int() with base 10: 'Rata-rata Harlan'
If I do not transpose transjakarta_data and try apply(int), I get this error instead:
pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx",
index_col=0).reset_index().index.apply(int)
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[255], line 1
----> 1 pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index().index.apply(int)
AttributeError: 'RangeIndex' object has no attribute 'apply'
How can I fix this?
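I just checked the data myself. It looks like you are converting the column named "index", rather than the index itself, into the new koridor column. Transposing and then resetting the index fills that column with the labels {Rata-rata Harlan, rata-rata Weekdata, Rata-rata Weekend}, which is why int() fails on 'Rata-rata Harlan'. If you want the data aligned on koridor across its 13 rows (one row per koridor), skip the transpose and merge on that column: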
import pandas as pd
import geopandas as gpd
transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index() # not transposed: one row per koridor
transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
transjakarta = pd.merge(transjakarta_lines, transjakarta_data, on = 'koridor')
transjakarta
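To show why the transposed frame cannot be converted, here is a small self-contained sketch that needs no download. The frames are made up: the names ridership and lines, the second column label, and all the numbers are assumptions for illustration. Only the label 'Rata-rata Harlan' is taken from your traceback, and I am assuming the Excel index is named koridor.

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet: one row per koridor,
# one column per average. Values are invented for illustration.
ridership = pd.DataFrame(
    {
        "Rata-rata Harlan": [100.0, 80.0, 60.0],
        "Rata-rata Weekend": [70.0, 50.0, 40.0],
    },
    index=pd.Index([1, 2, 3], name="koridor"),
)

# Transposing first moves the column headers into the index, so after
# reset_index() the "index" column holds text labels, not koridor numbers.
transposed = ridership.T.reset_index()
labels = transposed["index"].tolist()

# Without the transpose, reset_index() turns the koridor index into a column.
untransposed = ridership.reset_index()

# Hypothetical stand-in for the GeoJSON attributes (geometry omitted),
# with koridor stored as text and converted the same way as in the question.
lines = pd.DataFrame({"koridor": ["1", "2", "3"], "name": ["A", "B", "C"]})
lines["koridor"] = lines["koridor"].apply(int)

# Both frames now share an integer koridor column, so the merge lines up.
merged = lines.merge(untransposed, on="koridor")
print(labels)
print(merged)
```

If the koridor column in your real file has a different name, pass left_on and right_on to the merge instead of on.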
Please correct me if I have misunderstood what your code is meant to do.
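On the second error: after reset_index() the .index attribute is a fresh RangeIndex, and Index objects have no apply method. Use .map (or .astype) on an index. To find which values in a column block an int conversion, pd.to_numeric with errors="coerce" is one option. A minimal sketch with made-up data follows; the names frame and labels are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index holds numbers stored as text.
frame = pd.DataFrame({"value": [10, 20, 30]}, index=["1", "2", "3"])

# An Index has no .apply(); .map() is the element-wise equivalent.
has_apply = hasattr(frame.index, "apply")
as_int = frame.index.map(int)

# On a Series, errors="coerce" turns text that is not a number into NaN,
# so the labels that would make int() fail are easy to list.
labels = pd.Series(["1", "2", "Rata-rata Harlan"])
numeric = pd.to_numeric(labels, errors="coerce")
bad_labels = labels[numeric.isna()].tolist()

print(has_apply, list(as_int), bad_labels)
```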