DataFrame column not found


I have these two dataframes:

transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).T.reset_index() #transposed

I tried to merge these two dataframes, but the merge raises the error shown below.
Merge code:

epsg_jkt = 5330

transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
transjakarta = pd.merge(transjakarta_lines, transjakarta_data)

transjakarta = gpd.GeoDataFrame(transjakarta)

transjakarta.crs = transjakarta_lines.crs

transjakarta_planar = transjakarta.to_crs(epsg=epsg_jkt)

Error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[225], line 3
      1 # gabungkan data keduanya
      2 transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
----> 3 transjakarta_data['koridor'] = transjakarta_data['index'].apply(int)
      4 transjakarta = pd.merge(transjakarta_lines, transjakarta_data)
      6 # convert kembali ke geodataframe

File c:\Program Files\Python313\Lib\site-packages\pandas\core\series.py:4924, in Series.apply(self, func, convert_dtype, args, by_row, **kwargs)
   4789 def apply(
   4790     self,
   4791     func: AggFuncType,
   (...)
   4796     **kwargs,
   4797 ) -> DataFrame | Series:
   4798     """
   4799     Invoke function on values of Series.
   4800 
   (...)
   4915     dtype: float64
   4916     """
   4917     return SeriesApply(
   4918         self,
   4919         func,
   4920         convert_dtype=convert_dtype,
   4921         by_row=by_row,
   4922         args=args,
   4923         kwargs=kwargs,
-> 4924     ).apply()

File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1427, in SeriesApply.apply(self)
   1424     return self.apply_compat()
   1426 # self.func is Callable
-> 1427 return self.apply_standard()

File c:\Program Files\Python313\Lib\site-packages\pandas\core\apply.py:1507, in SeriesApply.apply_standard(self)
   1501 # row-wise access
   1502 # apply doesn't have a `na_action` keyword and for backward compat reasons
   1503 # we need to give `na_action="ignore"` for categorical data.
   1504 # TODO: remove the `na_action="ignore"` when that default has been changed in
   1505 #  Categorical (GH51645).
   1506 action = "ignore" if isinstance(obj.dtype, CategoricalDtype) else None
-> 1507 mapped = obj._map_values(
   1508     mapper=curried, na_action=action, convert=self.convert_dtype
   1509 )
   1511 if len(mapped) and isinstance(mapped[0], ABCSeries):
   1512     # GH#43986 Need to do list(mapped) in order to get treated as nested
   1513     #  See also GH#25959 regarding EA support
   1514     return obj._constructor_expanddim(list(mapped), index=obj.index)

File c:\Program Files\Python313\Lib\site-packages\pandas\core\base.py:921, in IndexOpsMixin._map_values(self, mapper, na_action, convert)
    918 if isinstance(arr, ExtensionArray):
    919     return arr.map(mapper, na_action=na_action)
--> 921 return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)

File c:\Program Files\Python313\Lib\site-packages\pandas\core\algorithms.py:1743, in map_array(arr, mapper, na_action, convert)
   1741 values = arr.astype(object, copy=False)
   1742 if na_action is None:
-> 1743     return lib.map_infer(values, mapper, convert=convert)
   1744 else:
   1745     return lib.map_infer_mask(
   1746         values, mapper, mask=isna(values).view(np.uint8), convert=convert
   1747     )

File lib.pyx:2972, in pandas._libs.lib.map_infer()

ValueError: invalid literal for int() with base 10: 'Rata-rata Harlan'

If I don't transpose transjakarta_data and instead try apply(int), I get the following error:

pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", 
              index_col=0).reset_index().index.apply(int)

Error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[255], line 1
----> 1 pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index().index.apply(int)

AttributeError: 'RangeIndex' object has no attribute 'apply'

How can I solve this?

python dataframe geospatial
1 Answer

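I just checked the data myself. It looks like you may be trying to convert the column named "index", rather than the index itself, into a new column called koridor. Transposing and resetting the index also brings in the labels {Rata-rata Harlan, rata-rata Weekdata, Rata-rata Weekend}, which int() cannot parse and which complicate the process further. If you want the data aligned by "Koridor" across the 13 rows (each row is one Koridor), here is the code: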

import pandas as pd
import geopandas as gpd

transjakarta_lines = gpd.read_file('https://raw.githubusercontent.com/lokalhangatt/stackoverlow/refs/heads/main/dataviz_day13/transjakarta_lines.geojson')
# Read the sheet without transposing; reset_index() turns the index back into a regular column
transjakarta_data = pd.read_excel("https://github.com/lokalhangatt/stackoverlow/raw/refs/heads/main/dataviz_day13/TJ_Agustus_2020.xlsx", index_col=0).reset_index()
transjakarta_lines['koridor'] = transjakarta_lines['koridor'].apply(int)
# Merge explicitly on the shared key column
transjakarta = pd.merge(transjakarta_lines, transjakarta_data, on = 'koridor')
transjakarta
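If you do need the transposed layout, one option is to coerce the labels instead of calling int() on every value. The sketch below is not run against the real spreadsheet: the small frames, the column names penumpang and nama, and all values are invented stand-ins, and it assumes the column being converted mixes corridor numbers with summary labels. It uses pd.to_numeric with errors="coerce" so that non-numeric labels become NaN rather than raising ValueError, then drops those rows before merging.

```python
import pandas as pd

# Hypothetical stand-in for the transposed sheet: the 'index' column mixes
# corridor numbers with a summary label (all values invented for illustration).
data = pd.DataFrame({
    "index": [1, 2, 3, "Rata-rata Harlan"],
    "penumpang": [100, 200, 300, 200],
})

# Non-numeric labels become NaN instead of raising ValueError.
koridor = pd.to_numeric(data["index"], errors="coerce")

# Keep only the rows that carry a real corridor number, then cast to int.
mask = koridor.notna()
clean = data[mask].copy()
clean["koridor"] = koridor[mask].astype(int)

# Hypothetical stand-in for the line geometries (plain DataFrame here).
lines = pd.DataFrame({"koridor": [1, 2, 3], "nama": ["A", "B", "C"]})

# indicator=True adds a '_merge' column that flags keys without a match.
merged = lines.merge(clean, on="koridor", how="left", indicator=True)
print(merged)
```

Checking the _merge column after the join is a cheap way to confirm that every corridor found a partner row before you continue with the CRS conversion.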

If I have misunderstood the intent of your code, please correct me.
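On the second error: reset_index().index is the new RangeIndex of the frame, not a column, and Index objects have no apply method. A minimal sketch with an invented frame (the column names and values are assumptions, not the real data) shows the two usual alternatives: Index.map for the index, and selecting the column as a Series when a column is what you actually want to convert.

```python
import pandas as pd

# Invented example frame; corridor numbers stored as strings.
df = pd.DataFrame({"koridor": ["1", "2", "3"], "penumpang": [100, 200, 300]})

# df.index is a RangeIndex: it offers map() and astype(), but not apply().
print(hasattr(df.index, "apply"))
as_int_index = df.index.map(int)

# To convert a column, select it by name (a Series) and cast it.
df["koridor"] = df["koridor"].astype(int)
print(df.dtypes)
```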
