# Aggregating Quantiles with Pandas

One of my favourite tools in Pandas is `agg`

for aggregation (it’s a worse version of `dplyr`

s `summarise`

). Unfortunately it can be difficult to work with for custom aggregates, like nth largest value. If your aggregate is parameterised, like quantile, you potentially have to define a function for every parameter you use. A neat trick is to use a class to capture the parameters, making it much easier to try out variations.

Suppose you have some data on avocado prices containing the `year`

and the `price`

in a dataframe `df`

. If you want to calculate the 25th percentile of price you could run `df.price.quantile(0.25)`

. If you wanted to calculate the median of price per year you could run `df.groupby('year').agg(med_price=('price', 'median'))`

. But what if you wanted to calculate the 25th percentile of price per year?

You could define a function `percentile25`

, but defining all those functions gets annoying and slow if you calculate lots of percentiles. You could define a function that takes a percentile and returns a percentile function, but these inner functions create confusing stack traces and can’t be pickled.

A better solution is to use a class, that can act just like a function using the `__call__`

parameter. This one works on Pandas Dataframes and Series:

```
class Quantile:
def __init__(self, q):
self.q = q
def __call__(self, x):
return x.quantile(self.q)
# Or using numpy
# return np.quantile(x.dropna(), self.q)
```

Then to calculate the quartiles of price per year you could run

```
(
df
.groupby('year')
.agg(price_p25 = ('price', Quantile(0.25)),
price_p50 = ('price', Quantile(0.50)),
price_p75 = ('price', Quantile(0.75)))
)
```