Pandas合并错误TypeError:'int'和'str'实例之间不支持'>'

问题描述 投票:4回答:1

我有一个包含几个表的数据集,每个表都有国家,年份和一些指标。我已将所有excel表转换为csv文件,然后将它们合并到一个表中。

问题是我有一些表拒绝合并,并出现以下消息TypeError: '>' not supported between instances of 'int' and 'str'我尽我所能,但没有运气,仍然出现相同的错误!此外,我尝试了数百个不同的文件,但仍有数十个文件面临这个问题。

对于示例文件file17.csvfile35.csv(如果有人需要重复它)。这是我使用的代码:

# To load the first file
import pandas as pd
filename1 = 'file17.csv'
df1 = pd.read_csv(filename1, encoding='cp1252', low_memory=False)
df1.set_index(['Country', 'Year'], inplace=True)
df1.dropna(axis=0, how='all', inplace=True)
df1.head()

出>>>

+-------------+------+--------+--------+
|             |      | ind500 | ind356 |
| Country     | Year |        |        |
| Afghanistan | 1800 | 603.0  | NaN    |
|             | 1801 | 603.0  | NaN    |
|             | 1802 | 603.0  | NaN    |
|             | 1803 | 603.0  | NaN    |
|             | 1804 | 603.0  | NaN    |
+-------------+------+--------+--------+

在>>>

# To load the second file
filename2 = 'file35.csv'
df2 = pd.read_csv(filename2, encoding='cp1252', low_memory=False)
df2.set_index(['Country', 'Year'], inplace=True)
df2.dropna(axis=0, how='all', inplace=True)
df2.head()

出>>>

enter image description here

# To merge the two dataframes
gross_df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
gross_df.dropna(axis=0, how='all', inplace=True)
print (gross_df.shape)
gross_df.to_csv('merged.csv')

重要提示:我注意到在所有成功的文件中,列名称以升序显示,即ind001ind009ind012,因为它们是自动排序的。而有错误的文件有一个或多个列有错误排列的列,如ind500,然后是第一个表中的in356,同样适用于提供的第二个样本。

请注意,两个dataframes指示了两个索引(Countryyear

错误

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in safe_sort(values, labels, na_sentinel, assume_unique)
    480         try:
--> 481             sorter = values.argsort()
    482             ordered = values.take(sorter)

TypeError: '>' not supported between instances of 'int' and 'str'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-11-960b2698de60> in <module>()
----> 1 gross_df = pd.merge(df1, df2, left_index=True, right_index=True, how='outer', sort=False)
      2 gross_df.dropna(axis=0, how='all', inplace=True)
      3 print (gross_df.shape)
      4 gross_df.to_csv('merged.csv')

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator)
     52                          right_index=right_index, sort=sort, suffixes=suffixes,
     53                          copy=copy, indicator=indicator)
---> 54     return op.get_result()
     55 
     56 

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in get_result(self)
    567                 self.left, self.right)
    568 
--> 569         join_index, left_indexer, right_indexer = self._get_join_info()
    570 
    571         ldata, rdata = self.left._data, self.right._data

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _get_join_info(self)
    720             join_index, left_indexer, right_indexer = \
    721                 left_ax.join(right_ax, how=self.how, return_indexers=True,
--> 722                              sort=self.sort)
    723         elif self.right_index and self.how == 'left':
    724             join_index, left_indexer, right_indexer = \

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\indexes\base.py in join(self, other, how, level, return_indexers, sort)
   2995             else:
   2996                 return self._join_non_unique(other, how=how,
-> 2997                                              return_indexers=return_indexers)
   2998         elif self.is_monotonic and other.is_monotonic:
   2999             try:

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\indexes\base.py in _join_non_unique(self, other, how, return_indexers)
   3076         left_idx, right_idx = _get_join_indexers([self.values],
   3077                                                  [other._values], how=how,
-> 3078                                                  sort=True)
   3079 
   3080         left_idx = _ensure_platform_int(left_idx)

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _get_join_indexers(left_keys, right_keys, sort, how, **kwargs)
    980 
    981     # get left & right join labels and num. of levels at each location
--> 982     llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
    983 
    984     # get flat i8 keys from label lists

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _factorize_keys(lk, rk, sort)
   1409     if sort:
   1410         uniques = rizer.uniques.to_array()
-> 1411         llab, rlab = _sort_labels(uniques, llab, rlab)
   1412 
   1413     # NA group

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\reshape\merge.py in _sort_labels(uniques, left, right)
   1435     labels = np.concatenate([left, right])
   1436 
-> 1437     _, new_labels = algos.safe_sort(uniques, labels, na_sentinel=-1)
   1438     new_labels = _ensure_int64(new_labels)
   1439     new_left, new_right = new_labels[:l], new_labels[l:]

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in safe_sort(values, labels, na_sentinel, assume_unique)
    483         except TypeError:
    484             # try this anyway
--> 485             ordered = sort_mixed(values)
    486 
    487     # labels:

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\pandas\core\algorithms.py in sort_mixed(values)
    469         str_pos = np.array([isinstance(x, string_types) for x in values],
    470                            dtype=bool)
--> 471         nums = np.sort(values[~str_pos])
    472         strs = np.sort(values[str_pos])
    473         return _ensure_object(np.concatenate([nums, strs]))

C:\ProgramData\Anaconda2\envs\conda_python3\lib\site-packages\numpy\core\fromnumeric.py in sort(a, axis, kind, order)
    820     else:
    821         a = asanyarray(a).copy(order="K")
--> 822     a.sort(axis=axis, kind=kind, order=order)
    823     return a
    824 

TypeError: '>' not supported between instances of 'int' and 'str'
python pandas csv merge typeerror
1个回答
2
投票

此错误表示合并DF中的索引具有不同的dtypes

演示 - 如何将字符串索引级别转换为int

In [183]: df
Out[183]:
              0         1         2         3
bar 1 -0.205037  0.762509  0.816608 -1.057907
    2  1.249104  0.338777 -0.982084  0.329330
baz 1  0.845695 -0.996365  0.548100 -0.113733
    2  1.247092 -2.674061 -0.071993 -0.734242
foo 1 -1.233825 -0.195377 -0.240303  1.168055
    2 -0.108942 -0.615612 -1.299512  0.908641
qux 1  0.844421  0.251425 -0.506877  1.307800
    2  0.038580  0.045072 -0.262974  0.629804

In [184]: df.index
Out[184]:
MultiIndex(levels=[['bar', 'baz', 'foo', 'qux'], ['1', '2']],
           labels=[[0, 0, 1, 1, 2, 2, 3, 3], [0, 1, 0, 1, 0, 1, 0, 1]])

In [185]: df.index.get_level_values(1)
Out[185]: Index(['1', '2', '1', '2', '1', '2', '1', '2'], dtype='object')

In [187]: df.index = df.index.set_levels(df.index.get_level_values(1) \
                       .map(lambda x: pd.to_numeric(x, errors='coerce')), level=1)

结果:

In [189]: df.index.get_level_values(1)
Out[189]: Int64Index([1, 2, 1, 2, 1, 2, 1, 2], dtype='int64')

更新:试试这个:

In [247]: d1 = pd.read_csv('https://docs.google.com/uc?id=1jUsbr5pw6sUMvewI4fmbpssroG4RZ7LE&export=download', index_col=[0,1])

In [248]: d2 = pd.read_csv('https://docs.google.com/uc?id=1Ufx6pvnSC6zQdTAj05ObmV027fA4-Mr3&export=download', index_col=[0,1])

In [249]: d2 = d2[pd.to_numeric(d2.index.get_level_values(1), errors='coerce').notna()]

In [250]: d2.index = d2.index.set_levels(d2.index.get_level_values(1).map(lambda x: pd.to_numeric(x, errors='coerce')), level=1)

In [251]: d1.reset_index().merge(d2.reset_index(), on=['Country','Year'], how='outer').set_index(['Country','Year'])
Out[251]:
                            ind500  ind356  ind475  ind476        ind456
Country               Year
Afghanistan           1800   603.0     NaN     NaN     NaN           NaN
                      1801   603.0     NaN     NaN     NaN           NaN
                      1802   603.0     NaN     NaN     NaN           NaN
                      1803   603.0     NaN     NaN     NaN           NaN
                      1804   603.0     NaN     NaN     NaN           NaN
                      1805   603.0     NaN     NaN     NaN           NaN
                      1806   603.0     NaN     NaN     NaN           NaN
                      1807   603.0     NaN     NaN     NaN           NaN
                      1808   603.0     NaN     NaN     NaN           NaN
                      1809   603.0     NaN     NaN     NaN           NaN
...                            ...     ...     ...     ...           ...
Bahamas, The          1967     NaN     NaN     NaN     NaN  18381.131314
Gambia, The           1967     NaN     NaN     NaN     NaN    937.355288
Korea, Dem. Rep.      1967     NaN     NaN     NaN     NaN   1428.689253
Lao PDR               1967     NaN     NaN     NaN     NaN   1412.359955
Netherlands Antilles  1967     NaN     NaN     NaN     NaN  14076.731352
Russian Federation    1967     NaN     NaN     NaN     NaN  11794.726437
Serbia and Montenegro 1967     NaN     NaN     NaN     NaN   2987.080489
Syrian Arab Republic  1967     NaN     NaN     NaN     NaN   2015.913906
Yemen, Rep.           1967     NaN     NaN     NaN     NaN   1075.693355
Bahamas, The          1968     NaN     NaN     NaN     NaN  18712.082830

[46607 rows x 5 columns]
© www.soinside.com 2019 - 2024. All rights reserved.