Cartesian Product in R and Python


May 14, 2020

You’ve got a couple of groups and you want to get every possible combination of them. This is called the Cartesian Product of the groups. There are standard ways of doing this in R and Python.

Python: List Comprehensions

Concretely we’ve got (in Python notation) the vectors x = [1, 2, 3] and y = [4, 5] and we want to get all possible pairs: [(1, 4), (2, 4), (3, 4), (1, 5), (2, 5), (3, 5)]`. The “pythonic” way to do this is with a list comprehension:

[(x_, y_) for x_ in x for y_ in y]

Another possibility is to use itertools.product which is especially useful for a large number of lists.

R: Expand.grid

In R we can use expand.grid to get a data.frame of all pairs:

expand.grid(x=x, y=y)

In this expression the x and y to the left of the = sign are the names of the columns in the dataframe. I find this really useful when creating plots of functions with ggplot2 to try every possible combination of parameters. You can also do this manually using rep; for example:

data.frame(x=rep(x, length(y)), y=rep(y, each=length(x)))

Python: More Complex List Comprehensions

What if we have a slightly harder problem: there’s another vector z = [6, 7] and we want to take every aligned pair from y and z and combine it with every possible x. So the output should be [(1, 4, 6), (2, 4, 6), (3, 4, 6), (1, 5, 7), (2, 5, 7), (3, 5, 7)]. This is straightforward with list comprehensions by combining y and z with zip:

[(x_, y_, z_) for x_ in x for y_, z_ in zip(y, z)]

This is one of the strengths of Python list comprehensions, it’s easy to extend with different variables and with functions acting on those variables.

R: tidyr expand

I don’t know how to do this harder task in R with expand.grid, and so I would have to fallback to the long way with rep. This would be

data.frame(x=rep(x, length(y)), y=rep(y, each=length(x)), z=rep(z, each=length(x)))

This gets quite tedious to write!

However there are neat ways to do this with the tidyr package, and in particular with the expand function. You can solve it like this:

expand(data.frame(y=y, z=z), x, nesting(y, z)

This gets all combinations of x, y, and z, providing that the pairs y and z are in the data.frame from the first argument.

Note that expand is not referentially transparent, and the variables rely on their names in the data frame (as is typical of tidyverse functions). For example expand(data.frame(y=z, z=y), x, nesting(y, z) will reverse the order of the last two columns.