import pandas as pd
import numpy as np
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','','Steve','Tom','Jack',
'Lee','David','','Betina','Andres']),
'Age':pd.Series([25,np.nan,25,23,30,29,23,'NULL',40,30,51,46]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8,3.78,2.98,4.80,4.10,3.65])
}
#Create a DataFrame
df = pd.DataFrame(d)
summary = df.describe(include='all').T
print(summary)
How can I create two columns that hold total_duplicate_value_count and total_null_value_count, and then add them to the existing summary DataFrame?
Expected output:
column_name  total_null_value_count  total_duplicate_value_count  count  ...
Name         2                       1                            12     ...
Age          2                       3                            12     ...
Rating       0                       0                            12     ...
First append the null counts from isna().sum() as a new row and transpose, then add a new column holding the difference between the count and unique rows as the duplicate count.
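desc = df.describe(include='all')
# DataFrame.append was removed in pandas 2.0, so pd.concat adds the null counts as a new row
nulls = df.isna().sum().rename('total_null_value_count').to_frame().T
summary = pd.concat([desc, nulls]).T.assign(total_duplicate_count=desc.loc['count'] - desc.loc['unique'])
print(summary)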
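Note that isna() does not treat '' or 'NULL' as missing, and describe() reports unique as NaN for numeric columns, so the snippet above will not reproduce the expected output for every column. Below is a minimal sketch of one way around that, not the answer's own method. It assumes that '' and 'NULL' are placeholders for missing values (inferred from the expected output) and that the empty slot in Age is np.nan. The names cleaned, null_count, dup_count and extra are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', '', 'Steve', 'Tom', 'Jack',
             'Lee', 'David', '', 'Betina', 'Andres'],
    # Assumption: the empty slot in the question's Age list is a missing value.
    'Age': [25, np.nan, 25, 23, 30, 29, 23, 'NULL', 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Assumption: '' and 'NULL' stand for missing values, so mask them to NaN first.
cleaned = df.mask(df.isin(['', 'NULL']))

# Nulls per column after normalising the placeholders.
null_count = cleaned.isna().sum()

# Duplicates per column: non-null values minus distinct non-null values.
# nunique() ignores NaN by default and works for numeric columns too.
dup_count = cleaned.notna().sum() - cleaned.nunique()

extra = pd.DataFrame({
    'total_null_value_count': null_count,
    'total_duplicate_value_count': dup_count,
})

# Put the two new columns in front of the usual describe() summary.
summary = extra.join(df.describe(include='all').T)
print(summary)
```

With the sample data this gives 2 nulls and 1 duplicate for Name, 2 and 3 for Age, and 0 and 0 for Rating. The count column still comes from describe() on the original frame, so it reflects pandas' own notion of non-null values rather than the normalised one.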