pivot_table

pandas.pivot_table(data, values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All')

The function pandas.pivot_table can be used to create spreadsheet-style pivot tables.

It takes a number of arguments:

  • data: A DataFrame object

  • values: a column or a list of columns to aggregate

  • index: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.

  • columns: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.

  • aggfunc: function to use for aggregation, defaulting to numpy.mean

  • fill_value : scalar, default None
    Value to replace missing values with

它的使用至少需要三个字段:index,columns,values;

crosstab

pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, dropna=True, normalize=False)

Compute a simple cross-tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.

It takes a number of arguments

  • index: array-like, values to group by in the rows

  • columns: array-like, values to group by in the columns

  • values: array-like, optional, array of values to aggregate according to the factors

  • aggfunc: function, optional, If no values array is passed, computes a frequency table

  • rownames: sequence, default None, must match number of row arrays passed

  • colnames: sequence, default None, if passed, must match number of column arrays passed

  • margins: boolean, default False, Add row/column margins (subtotals)

  • normalize: boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False. Normalize by dividing all values by the sum of values.

Any Series passed will have their name attributes used unless row or column names for the cross-tabulation are specified

注意:此处index, columns, values均为numpy.array的格式。
统计频次。

groupby

DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, **kwargs) groupby

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.

  • by : mapping function / list of functions, dict, Series, or tuple list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

  • axis : int, default 0

  • level : int, level name, or sequence of such, default None

    If the axis is a MultiIndex (hierarchical), group by a particular level or levels

  • as_index : boolean, default True

    For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQL-style” grouped output

  • sort : boolean, default True

    Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.

  • group_keys : boolean, default True
    When calling apply, add group keys to index to identify pieces

  • squeeze : boolean, default False

    reduce the dimensionality of the return type if possible, otherwise return a consistent type

import pandas as pd
import numpy as np
df_x=pd.DataFrame({'type':['x','y','z','x','z'],'A': [1, 2, 2, 2, 2], 'B': [3, 3, 4, 4, 4],'C': [1, 1, np.nan, 1, 1]})
df_x
A B C type
0 1 3 1.0 x
1 2 3 1.0 y
2 2 4 NaN z
3 2 4 1.0 x
4 2 4 1.0 z
#注意fill_value的用法
pd.pivot_table(df_x,index=['type'],columns=['B'],values=['A'],aggfunc=np.sum,fill_value=0)
A
B 3 4
type
x 1 2
y 2 0
z 0 4
#统计值出现次数的方法:
pd.pivot_table(df_x,index=['type'],columns=['B'],values=['A'],aggfunc=len,fill_value=0)
A
B 3 4
type
x 1 1
y 1 0
z 0 2
pd.crosstab(df_x.type,df_x.B)
B 3 4
type
x 1 1
y 1 0
z 0 2
# crosstab can also be passed a third Series and an aggregation function (aggfunc) 
#that will be applied to the values of the third Series within each group defined by the first two Series:

pd.crosstab(df_x.type,df_x.B,values=df_x.A,aggfunc=np.sum)
B 3 4
type
x 1.0 2.0
y 2.0 NaN
z NaN 4.0
df_x.groupby(['type']).get_group('x')
A B C type
0 1 3 1.0 x
3 2 4 1.0 x
df_x.groupby(['type']).get_group('x').sum()
A        3
B        7
C        2
type    xx
dtype: object