Handling Missing Values in Pandas: Detection and Deletion
2. Detecting Missing Values
DataFrame.isna()
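```python
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'age': [5, 6, np.nan],
...                    'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
...                    'name': ['Alfred', 'Batman', ''],
...                    'toy': [None, 'Batmobile', 'Joker']})
>>> df
   age       born    name        toy
0  5.0        NaT  Alfred       None
1  6.0 1939-05-27  Batman  Batmobile
2  NaN 1940-04-25              Joker
>>> df.isna()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
>>> ser = pd.Series([5, 6, np.nan])
>>> ser.isna()
0    False
1    False
2     True
dtype: bool
>>> # For a DataFrame we usually care more about how many values are missing in each column:
>>> # summing the boolean mask counts them (True counts as 1)
>>> df.isna().sum()
age     1
born    1
name    0
toy     1
dtype: int64
```

isna() treats NaN, None and NaT as missing. The empty string '' in row 2 of the name column is an ordinary string value, so it is not counted as missing.

DataFrame.isnull()

isnull() is an alias of isna() and returns the same mask:

```python
>>> df.isnull()
     age   born   name    toy
0  False   True  False   True
1  False  False  False  False
2   True  False  False  False
>>> # count the missing values in a single column
>>> df['age'].isnull().sum()
1
```

DataFrame.notna()

notna() is the inverse of isna(): True marks a value that is present.

```python
>>> df.notna()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
```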
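Because notna() is the element-wise inverse of isna(), it is convenient for boolean indexing. A minimal sketch, rebuilding the example DataFrame so it runs on its own; the names `is_inverse` and `with_age` are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})

# notna() gives exactly the negation of isna()
is_inverse = df.notna().equals(~df.isna())

# Boolean indexing: keep only the rows whose 'age' value is present
with_age = df[df['age'].notna()]

print(is_inverse)               # True
print(with_age.index.tolist())  # [0, 1]
```

Filtering with a mask like this is handy when only one column matters; dropna(), covered below, does the same kind of row selection with more options.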
DataFrame.notnull()
notnull() is an alias of notna():

```python
>>> df.notnull()
     age   born  name    toy
0   True  False  True  False
1   True   True  True   True
2  False   True  True   True
```
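A small sketch that confirms the alias relationship on the example data and also counts missing values per row instead of per column (sum with axis=1), which shows in advance which rows a later dropna() would affect. The variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})

# The alias pairs return identical masks
aliases_match = df.isnull().equals(df.isna()) and df.notnull().equals(df.notna())

# Missing cells in each row (sum across the columns with axis=1)
missing_per_row = df.isnull().sum(axis=1)

# Number of rows that contain at least one missing value
rows_with_missing = int(df.isnull().any(axis=1).sum())

print(aliases_match)             # True
print(missing_per_row.tolist())  # [2, 0, 1]
print(rows_with_missing)         # 2
```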
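DataFrame.info()

df.info() also reports the number of non-null values in each column (the Non-Null Count column), which gives a quick overview of missing data:

```python
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   age     2 non-null      float64
 1   born    2 non-null      datetime64[ns]
 2   name    3 non-null      object
 3   toy     2 non-null      object
dtypes: datetime64[ns](1), float64(1), object(2)
memory usage: 224.0+ bytes
```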
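When the counts are needed as data rather than printed text, DataFrame.count() returns the same numbers as the Non-Null Count column, and subtracting them from the row count gives the missing counts. In this example the name value in row 2 is an empty string, which pandas does not treat as missing; if your data uses '' as a placeholder, one option (an assumption about your data, not something the original example requires) is to convert it to NaN with replace() before counting:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 6, np.nan],
                   'born': [pd.NaT, pd.Timestamp('1939-05-27'), pd.Timestamp('1940-04-25')],
                   'name': ['Alfred', 'Batman', ''],
                   'toy': [None, 'Batmobile', 'Joker']})

# Same numbers as the "Non-Null Count" column of df.info()
non_null = df.count()
# Missing cells per column = number of rows minus non-null cells
missing = len(df) - non_null

# Optional: treat empty strings as missing, then recount
missing_with_blanks = df.replace('', np.nan).isna().sum()

print(non_null.tolist())             # [2, 2, 3, 2]
print(missing.tolist())              # [1, 1, 0, 1]
print(missing_with_blanks.tolist())  # [1, 1, 1, 1]
```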
3. Deleting Missing Values
DataFrame.dropna
```python
DataFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
```
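To make the how, thresh and subset semantics concrete, here is a sketch that reproduces the row-dropping (axis=0) cases with plain boolean masks built from notna(), and checks each one against dropna() on the example DataFrame used below. The mask formulation only illustrates the behaviour; it is not a claim about how pandas implements dropna internally:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# Number of non-missing cells in each row: [1, 3, 2]
non_null_per_row = df.notna().sum(axis=1)
n_cols = df.shape[1]

checks = {
    # how='any' (default): keep only rows with no missing cell
    "any": df[non_null_per_row == n_cols].equals(df.dropna()),
    # how='all': drop a row only when every cell is missing
    "all": df[non_null_per_row > 0].equals(df.dropna(how='all')),
    # thresh=2: keep rows with at least 2 non-missing cells
    "thresh": df[non_null_per_row >= 2].equals(df.dropna(thresh=2)),
    # subset: only the listed columns decide whether a row is dropped
    "subset": df[df[['name', 'born']].notna().all(axis=1)].equals(
        df.dropna(subset=['name', 'born'])),
}
print(checks)  # every value should be True
```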
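```python
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
...                    "toy": [np.nan, 'Batmobile', 'Bullwhip'],
...                    "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})
>>> df
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> # drop rows that contain at least one missing value
>>> df.dropna()
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>> # drop columns that contain missing values: pass axis='columns'
>>> df.dropna(axis='columns')
       name
0    Alfred
1    Batman
2  Catwoman
>>> # drop a row only if all of its values are missing (no such row here, so nothing is dropped)
>>> df.dropna(how='all')
       name        toy       born
0    Alfred        NaN        NaT
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> # keep only rows with at least 2 non-missing values
>>> df.dropna(thresh=2)
       name        toy       born
1    Batman  Batmobile 1940-04-25
2  Catwoman   Bullwhip        NaT
>>> # look for missing values only in the listed columns
>>> df.dropna(subset=['name', 'born'])
     name        toy       born
1  Batman  Batmobile 1940-04-25
>>> # modify df in place; the call itself returns None
>>> df.dropna(inplace=True)
>>> df
     name        toy       born
1  Batman  Batmobile 1940-04-25
```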
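A common pitfall with inplace=True is that the call returns None, so writing df = df.dropna(inplace=True) replaces the DataFrame with None. A minimal sketch of the two styles, using a copy so the original stays intact (the names `work`, `returned` and `cleaned` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"), pd.NaT]})

# inplace=True mutates the object and returns None
work = df.copy()
returned = work.dropna(inplace=True)

# Without inplace, dropna returns a new DataFrame and leaves df unchanged
cleaned = df.dropna()

print(returned)                # None
print(work.index.tolist())     # [1]
print(cleaned.index.tolist())  # [1]
print(len(df))                 # 3, the original is untouched
```

Reassigning the returned DataFrame (df = df.dropna()) is the more common idiom, and it also fits naturally into method chains.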
