pandas:groupby,pivot_table,crosstable比较
pivot_table
The function pandas.pivot_table can be used to create spreadsheetstyle pivot tables.
It takes a number of arguments:

data: A DataFrame object

values: a column or a list of columns to aggregate

index: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.

columns: a column, Grouper, array which has the same length as data, or list of them. Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.

aggfunc: function to use for aggregation, defaulting to numpy.mean

fill_value : scalar, default None
Value to replace missing values with
它的使用至少需要三个字段：index,columns,values;
crosstab
Compute a simple crosstabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.
It takes a number of arguments

index: arraylike, values to group by in the rows

columns: arraylike, values to group by in the columns

values: arraylike, optional, array of values to aggregate according to the factors

aggfunc: function, optional, If no values array is passed, computes a frequency table

rownames: sequence, default None, must match number of row arrays passed

colnames: sequence, default None, if passed, must match number of column arrays passed

margins: boolean, default False, Add row/column margins (subtotals)

normalize: boolean, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False. Normalize by dividing all values by the sum of values.
Any Series passed will have their name attributes used unless row or column names for the crosstabulation are specified
注意：此处index, columns, values均为numpy.array的格式。
统计频次。
groupby
Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.

by : mapping function / list of functions, dict, Series, or tuple list of column names. Called on each element of the object index to determine the groups. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups

axis : int, default 0

level : int, level name, or sequence of such, default None
If the axis is a MultiIndex (hierarchical), group by a particular level or levels

as_index : boolean, default True
For aggregated output, return object with group labels as the index. Only relevant for DataFrame input. as_index=False is effectively “SQLstyle” grouped output

sort : boolean, default True
Sort group keys. Get better performance by turning this off. Note this does not influence the order of observations within each group. groupby preserves the order of rows within each group.

group_keys : boolean, default True
When calling apply, add group keys to index to identify pieces

squeeze : boolean, default False
reduce the dimensionality of the return type if possible, otherwise return a consistent type
import pandas as pd
import numpy as np
df_x=pd.DataFrame({'type':['x','y','z','x','z'],'A': [1, 2, 2, 2, 2], 'B': [3, 3, 4, 4, 4],'C': [1, 1, np.nan, 1, 1]})
df_x
A  B  C  type  

0  1  3  1.0  x 
1  2  3  1.0  y 
2  2  4  NaN  z 
3  2  4  1.0  x 
4  2  4  1.0  z 
#注意fill_value的用法
pd.pivot_table(df_x,index=['type'],columns=['B'],values=['A'],aggfunc=np.sum,fill_value=0)
A  

B  3  4 
type  
x  1  2 
y  2  0 
z  0  4 
#统计值出现次数的方法：
pd.pivot_table(df_x,index=['type'],columns=['B'],values=['A'],aggfunc=len,fill_value=0)
A  

B  3  4 
type  
x  1  1 
y  1  0 
z  0  2 
pd.crosstab(df_x.type,df_x.B)
B  3  4 

type  
x  1  1 
y  1  0 
z  0  2 
# crosstab can also be passed a third Series and an aggregation function (aggfunc)
#that will be applied to the values of the third Series within each group defined by the first two Series:
pd.crosstab(df_x.type,df_x.B,values=df_x.A,aggfunc=np.sum)
B  3  4 

type  
x  1.0  2.0 
y  2.0  NaN 
z  NaN  4.0 
df_x.groupby(['type']).get_group('x')
A  B  C  type  

0  1  3  1.0  x 
3  2  4  1.0  x 
df_x.groupby(['type']).get_group('x').sum()
A 3
B 7
C 2
type xx
dtype: object