数据分析使用pandas,在聚合groupby的情况下,经常使用agg方法来计算多个统计值,如极值、平均值、分位值等,在计算对象是单列、多列条件下,如何使用。

数据准备

使用鸢尾花数据,转换为pandas dataframe使用。

from sklearn import datasets
import pandas as pd

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df.loc[:, 'target'] = iris.target
df.loc[:, 'target_name'] = iris.target_names[iris.target]

df.head()
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target target_name
0 5.1 3.5 1.4 0.2 0 setosa
1 4.9 3.0 1.4 0.2 0 setosa
2 4.7 3.2 1.3 0.2 0 setosa
3 4.6 3.1 1.5 0.2 0 setosa
4 5.0 3.6 1.4 0.2 0 setosa

对单列数据聚合处理

result_df = df.groupby(['target_name'], as_index=False)['sepal length (cm)'].agg(
    Mean = 'mean',
    Std = 'std',
    Min = 'min',
    TenQuantile= lambda x: x.quantile(0.1)
)

result_df
target_name Mean Std Min TenQuantile
0 setosa 5.006 0.352490 4.3 4.59
1 versicolor 5.936 0.516171 4.9 5.38
2 virginica 6.588 0.635880 4.9 5.80

对多列数据聚合处理

result_df = df.groupby(['target_name'], as_index=False).agg(
    sl_mean = ('sepal length (cm)', 'mean'), 
    sl_std = ('sepal length (cm)', 'std'), 
    sl_min = ('sepal length (cm)', 'min'), 
    sl_tenquantile = ('sepal length (cm)', lambda x: x.quantile(0.1)),
    sw_mean = ('sepal width (cm)', 'mean'), 
    sw_std = ('sepal width (cm)', 'std'), 
    sw_min = ('sepal width (cm)', 'min'), 
    sw_tenquantile = ('sepal width (cm)', lambda x: x.quantile(0.1)),
)

result_df
target_name sl_mean sl_std sl_min sl_tenquantile sw_mean sw_std sw_min sw_tenquantile
0 setosa 5.006 0.352490 4.3 4.59 3.428 0.379064 2.3 3.00
1 versicolor 5.936 0.516171 4.9 5.38 2.770 0.313798 2.0 2.30
2 virginica 6.588 0.635880 4.9 5.80 2.974 0.322497 2.2 2.59

总结及存在问题

如果不聚合直接计算统计值,可以借助构建dictionary再转换为dataframe,方法如下:

result_dict = {
    'Mean': df.iloc[:, :-2].mean(),
    'Std': df.iloc[:, :-2].std(),
    'Min': df.iloc[:, :-2].min(),
    '10% Quantile': df.iloc[:, :-2].quantile(0.1),
}

result_df = pd.DataFrame(result_dict)

result_df
Mean Std Min 10% Quantile
sepal length (cm) 5.843333 0.828066 4.3 4.8
sepal width (cm) 3.057333 0.435866 2.0 2.5
petal length (cm) 3.758000 1.765298 1.0 1.4
petal width (cm) 1.199333 0.762238 0.1 0.2

版本记录

2023-12-24,初稿
2024-01-19,增加另外一种方式,更简明