One of my favourite tools in Pandas is
agg for aggregation (it's a worse version of
Unfortunately it can be difficult to work with for custom aggregates, like nth largest value.
If your aggregate is parameterised, like quantile, you potentially have to define a function for every parameter you use.
A neat trick is to use a class to capture the parameters, making it much easier to try out variations.
Suppose you have some data on avocado prices containing the
year and the
price in a dataframe
If you want to calculate the 25th percentile of price you could run
If you wanted to calculate the median of price per year you could run
But what if you wanted to calculate the 25th percentile of price per year?
You could define a function
percentile25, but defining all those functions gets annoying and slow if you calculate lots of percentiles.
You could define a function that takes a percentile and returns a percentile function, but these inner functions create confusing stack traces and can't be pickled.
A better solution is to use a class, that can act just like a function using the
This one works on Pandas Dataframes and Series:
class Quantile: def __init__(self, q): self.q = q def __call__(self, x): return x.quantile(self.q) # Or using numpy # return np.quantile(x.dropna(), self.q)
Then to calculate the quartiles of price per year you could run
( df .groupby('year') .agg(price_p25 = ('price', Quantile(0.25)), price_p50 = ('price', Quantile(0.50)), price_p75 = ('price', Quantile(0.75))) )