What is the rationale behind the bool and boolean dtypes in pandas?
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'col1': [True, False, False]}, dtype='bool')
print("df1")
print(df1)
print(df1.info())
print()
df2 = pd.DataFrame({'col1': [True, False, None]}, dtype='bool')
print("df2")
print(df2)
print(df2.info())
print()
df3 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='bool')
print("df3")
print(df3)
print(df3.info())
print()
df4 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='bool')
print("df4")
print(df4)
print(df4.info())
print()
df5 = pd.DataFrame({'col1': [True, False, False]}, dtype='boolean')
print("df5")
print(df5)
print(df5.info())
print()
df6 = pd.DataFrame({'col1': [True, False, None]}, dtype='boolean')
print("df6")
print(df6)
print(df6.info())
print()
df7 = pd.DataFrame({'col1': [True, False, np.nan]}, dtype='boolean')
print("df7")
print(df7)
print(df7.info())
print()
df8 = pd.DataFrame({'col1': [True, False, None, np.nan]}, dtype='boolean')
print("df8")
print(df8)
print(df8.info())
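Why are None and np.nan handled differently by the bool and boolean dtypes? What is the rationale behind this?
With bool, both None and np.nan are treated as non-null: None becomes False and np.nan becomes True.
With boolean, however, both are treated as missing values and shown as <NA>.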
df1
col1
0 True
1 False
2 False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df2
col1
0 True
1 False
2 False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df3
col1
0 True
1 False
2 True
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null bool
dtypes: bool(1)
memory usage: 135.0 bytes
None
df4
col1
0 True
1 False
2 False
3 True
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 4 non-null bool
dtypes: bool(1)
memory usage: 136.0 bytes
None
df5
col1
0 True
1 False
2 False
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 3 non-null boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df6
col1
0 True
1 False
2 <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 2 non-null boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df7
col1
0 True
1 False
2 <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 2 non-null boolean
dtypes: boolean(1)
memory usage: 138.0 bytes
None
df8
col1
0 True
1 False
2 <NA>
3 <NA>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 1 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 col1 2 non-null boolean
dtypes: boolean(1)
memory usage: 140.0 bytes
None
> The following values are considered false:
> - None
> - False
> - zero of any numeric type, for example, 0, 0L, 0.0, 0j.
> - any empty sequence, for example, '', (), [].
> - any empty mapping, for example, {}.
> - instances of user-defined classes, if the class defines a __nonzero__() or __len__() method, when that method returns the integer zero or bool value False.
>
> All other values are considered true, so objects of many types are always true.
Source: https://docs.python.org/2/library/stdtypes.html#truth-value-testing
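So, by these conventions, None is False and np.nan is True: None is explicitly listed as false, whereas np.nan is a float that is not zero and therefore falls under "all other values".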
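A minimal sketch of that truthiness rule, using only plain Python and NumPy. It assumes that the cast to bool applies Python's truth-value test element by element, which is consistent with the df2 to df4 output in the question; the variable names are illustrative only.

```python
import numpy as np

# Python's truth-value test: None is falsy, NaN is a nonzero float and thus truthy.
none_as_bool = bool(None)
nan_as_bool = bool(np.nan)
print(none_as_bool, nan_as_bool)

# Casting an object array to bool applies the same test to each element,
# so the missing-value markers silently turn into ordinary booleans.
values = np.array([True, False, None, np.nan], dtype=object)
as_bool = values.astype(bool)
print(as_bool)
```

Because a NumPy bool array has no slot for a missing value, every element has to end up as either True or False, which is why the information that a value was missing is lost.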
The boolean dtype implements Kleene logic (sometimes called three-valued logic).
Source: https://pandas.pydata.org/docs/user_guide/boolean.html
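For example, True | NA gives True: NA could be either True or False, and in both cases the OR operation (|) yields True because at least one operand is True. Likewise, False | NA gives NA, because we cannot know whether either operand is True.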
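The Kleene rules can be checked directly with pd.NA and with arrays of the nullable boolean dtype. This is a small sketch rather than an exhaustive truth table; the names left, right, or_result and and_result are chosen here for illustration.

```python
import pandas as pd

# Scalar cases: the result is only NA when the unknown operand could change it.
true_or_na = True | pd.NA     # True, whatever NA stands for
false_or_na = False | pd.NA   # NA, the outcome depends on the unknown value
false_and_na = False & pd.NA  # False, whatever NA stands for
true_and_na = True & pd.NA    # NA, the outcome depends on the unknown value
print(true_or_na, false_or_na, false_and_na, true_and_na)

# The same logic applies element-wise to nullable boolean arrays.
left = pd.array([True, False, True, False], dtype="boolean")
right = pd.array([pd.NA, pd.NA, pd.NA, pd.NA], dtype="boolean")
or_result = left | right
and_result = left & right
print(or_result)
print(and_result)
```

In the OR result only the positions where left is False stay missing, and in the AND result only the positions where left is True stay missing.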