Why can't concat handle None in a categorical column when the DataFrame could hold it in the first place?


I have two DataFrames with object dtype columns, and they concatenate without any problem.

Code

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', None]})
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': [None, None]})

print(">>>>>>>>>>>>>>> Original DFs")
print(df1)
print(df2)

print(">>>>>>>>>>>>>>> Original DTypes")
print(df1.dtypes)
print(df2.dtypes)

print(">>>>>>>>>>>>>>> Concatenation 1 - No Warning")
print(pd.concat([df1, df2]))

Corresponding output

>>>>>>>>>>>>>>> Original DFs
    A     B
0  A0    B0
1  A1  None
    A     B
0  A4  None
1  A5  None
>>>>>>>>>>>>>>> Original DTypes
A    object
B    object
dtype: object
A    object
B    object
dtype: object
>>>>>>>>>>>>>>> Concatenation 1 - No Warning
    A     B
0  A0    B0
1  A1  None
0  A4  None
1  A5  None

But if I do the same thing with categorical columns, I get a FutureWarning.

Code with categorical dtypes

print(">>>>>>>>>>>>>>> Categorical DTypes")
df1 = df1.astype('category')
df2 = df2.astype('category')
print(df1.dtypes)
print(df2.dtypes)

print(">>>>>>>>>>>>>>> Concatenation 2 - Gives warning")
print(pd.concat([df1, df2]))

Corresponding output

>>>>>>>>>>>>>>> Categorical DTypes
A    category
B    category
dtype: object
A    category
B    category
dtype: object
>>>>>>>>>>>>>>> Concatenation 2 - Gives warning
bla.py:37: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  print(pd.concat([df1, df2]))
    A    B
0  A0   B0
1  A1  NaN
0  A4  NaN
1  A5  NaN

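df1 already contains a NaN in column B and that causes no problem, but when I concatenate df2, whose column B is entirely NaN, I get the warning. It suggests removing such entries altogether. Why is that? Why does concatenation seem to have a problem with NaNs?

Here is the full code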

import pandas as pd


def bla():
    '''The main function, which can also be called from other scripts as an API'''

    df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', None]})
    df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': [None, None]})

    print(">>>>>>>>>>>>>>> Original DFs")
    print(df1)
    print(df2)

    print(">>>>>>>>>>>>>>> Original DTypes")
    print(df1.dtypes)
    print(df2.dtypes)

    print(">>>>>>>>>>>>>>> Concatenation 1 - No Warning")
    print(pd.concat([df1, df2]))

    print(">>>>>>>>>>>>>>> Categorical DTypes")
    df1 = df1.astype('category')
    df2 = df2.astype('category')
    print(df1.dtypes)
    print(df2.dtypes)

    print(">>>>>>>>>>>>>>> Concatenation 2 - Gives warning")
    print(pd.concat([df1, df2]))


if __name__ == '__main__':
    bla()
python pandas dataframe categorical-data
1 Answer

This behavior was first described in issue 43507 (September 2021) and was later reverted (issue 47372, June 2022).

In the first case, both B columns have the object dtype, so the common dtype after concat stays the same.
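The categorical case differs because each categorical column carries its own set of categories as part of its dtype. A minimal sketch of that difference follows, reusing the frames from the question. Passing `dtype=object` explicitly is my own addition, made only to mirror the object dtypes shown in the question's output regardless of pandas version, and the variable names are illustrative:

```python
import pandas as pd

# dtype=object is an assumption added here to reproduce the question's dtypes.
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", None]}, dtype=object)
df2 = pd.DataFrame({"A": ["A4", "A5"], "B": [None, None]}, dtype=object)

# Both B columns share the object dtype, so concat has nothing to reconcile.
same_object_dtype = df1["B"].dtype == df2["B"].dtype

cat1 = df1.astype("category")
cat2 = df2.astype("category")

# After the conversion, the categories are inferred per column.
b1_categories = list(cat1["B"].cat.categories)  # taken from df1's values
b2_categories = list(cat2["B"].cat.categories)  # the all-None column has none
same_category_dtype = cat1["B"].dtype == cat2["B"].dtype

print(same_object_dtype, b1_categories, b2_categories, same_category_dtype)
```

Because the two categorical dtypes are not equal, concat has to decide on a result dtype, and that is the point at which the all-NA column matters.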

To illustrate how an all-NA column is treated, combining a datetime64 column with an all-NaN float column gives a datetime64 result:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
res = pd.concat([df1, df2])

         bar
0 2013-01-01
1        NaT  # NaN was converted to NaT

This used to cast the column to object; now it raises a warning instead. I believe this behavior is expected to change in pandas 3.0.
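One possible way around this, offered as a sketch rather than an official recommendation, is to concatenate while the columns still have their original dtypes (the question shows that this step produces no warning) and convert to category only once, on the combined frame. The name `combined` and the use of `ignore_index=True` are my own choices:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", None]})
df2 = pd.DataFrame({"A": ["A4", "A5"], "B": [None, None]})

# Concatenate first, so no categorical dtypes have to be reconciled.
combined = pd.concat([df1, df2], ignore_index=True)

# Convert once; each column's categories are inferred from all rows together.
combined = combined.astype("category")

print(combined.dtypes)
print(list(combined["B"].cat.categories))
```

This also leaves each column with a single set of categories covering both inputs, which is usually what you want from a combined categorical column.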
