bool and boolean Dtype in Pandas


What is the rationale behind the `bool` and `boolean` Dtypes in Pandas?

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'col1': [True, False, False]}, dtype='bool')
print("df1")
print(df1)
print(df1.info())
print()

df2 = pd.DataFrame({'col1': [True, False, None]}, dtype='bool')
print("df2")
print(df2)
print(df2.info())
print()

df3 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='bool')
print("df3")
print(df3)
print(df3.info())
print()

df4 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='bool')
print("df4")
print(df4)
print(df4.info())
print()

df5 = pd.DataFrame({'col1': [True, False, False]}, dtype='boolean')
print("df5")
print(df5)
print(df5.info())
print()

df6 = pd.DataFrame({'col1': [True, False, None]}, dtype='boolean')
print("df6")
print(df6)
print(df6.info())
print()

df7 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='boolean')
print("df7")
print(df7)
print(df7.info())
print()

df8 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='boolean')
print("df8")
print(df8)
print(df8.info())

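Why are `None` and `np.nan` handled differently by the `bool` and `boolean` Dtypes? What is the rationale behind this? With `bool`, both `None` and `np.nan` are treated as non-null: `None` becomes `False` and `np.nan` becomes `True`. With `boolean`, however, both are treated as null values and shown as `<NA>`.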

    df1
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df2
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df3
        col1
    0   True
    1  False
    2   True
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    3 non-null      bool 
    dtypes: bool(1)
    memory usage: 135.0 bytes
    None
    
    df4
        col1
    0   True
    1  False
    2  False
    3   True
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   col1    4 non-null      bool 
    dtypes: bool(1)
    memory usage: 136.0 bytes
    None
    
    df5
        col1
    0   True
    1  False
    2  False
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    3 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df6
        col1
    0   True
    1  False
    2   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df7
        col1
    0   True
    1  False
    2   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 138.0 bytes
    None
    
    df8
        col1
    0   True
    1  False
    2   <NA>
    3   <NA>
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 4 entries, 0 to 3
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype  
    ---  ------  --------------  -----  
     0   col1    2 non-null      boolean
    dtypes: boolean(1)
    memory usage: 140.0 bytes
    None
1 Answer
  • None vs np.nan
> The following values are considered false:
>     - None
>     - False
>     - zero of any numeric type, for example, 0, 0L, 0.0, 0j.
>     - any empty sequence, for example, '', (), [].
>     - any empty mapping, for example, {}.
>     - instances of user-defined classes, if the class defines a __nonzero__() or __len__() method, when that method returns the integer zero or bool value False.
>
> All other values are considered
> true — so objects of many types are always true.

Source: https://docs.python.org/2/library/stdtypes.html#truth-value-testing

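So, by these conventions, None is False and np.nan is True: np.nan is a nonzero float, so it does not appear in the list of false values. The bool dtype has no way to mark a value as missing, so each entry is converted by its truth value.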
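The truth-value rule can be checked directly. This is a minimal sketch, not a description of the pandas internals: it assumes that casting an object array to NumPy's bool applies the same truth-value test to each element, which matches the df4 output in the question. The variable names are illustrative only.

```python
import numpy as np

# Plain Python truth-value testing
none_truth = bool(None)    # None is in the list of false values
nan_truth = bool(np.nan)   # NaN is a nonzero float, so it is truthy

# Casting an object array to NumPy bool tests each element the same way
obj_values = np.array([True, False, None, np.nan], dtype=object)
as_bool = obj_values.astype(bool)

print(none_truth, nan_truth)
print(as_bool.tolist())
```

The resulting array holds only True and False, which is why info() reports every row as non-null for the bool columns.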

  • bool vs boolean

The boolean dtype implements Kleene logic (sometimes called three-valued logic).

Source: https://pandas.pydata.org/docs/user_guide/boolean.html

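For example, True | NA gives True: NA could be either True or False, and in both cases the OR operation (|) yields True because at least one operand is True. Likewise, False | NA gives NA, because we do not know whether a True is present.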
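The Kleene rules can be tried out with pd.NA, first on scalars and then on a small boolean Series. This is a short sketch; the variable names are illustrative and the Series is made up for the example rather than taken from the question.

```python
import pandas as pd

# Scalar Kleene logic with pd.NA
or_true = True | pd.NA      # known result: at least one operand is True
or_false = False | pd.NA    # unknown: depends on the missing value
and_false = False & pd.NA   # known result: one operand is already False
and_true = True & pd.NA     # unknown: depends on the missing value

# The same rules apply element-wise to a nullable boolean Series
s = pd.Series([True, False, None], dtype="boolean")
missing_mask = s.isna()     # None is stored as a missing value
or_with_true = s | True     # every element becomes True
or_with_false = s | False   # the missing element stays missing

print(or_true, or_false, and_false, and_true)
print(or_with_false)
```

Because the boolean dtype stores missing values separately from the data, None and np.nan both end up as `<NA>` instead of being forced to True or False.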
