Normalising Salary


December 24, 2020

Salary ranges come in many forms; how can we convert them to a common form? A first approximation is to annualise them, although this ignores the difference between full-time, part-time, and temporary work.
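As a rough sketch of what annualising could look like: the multipliers below are my own illustrative assumptions (full-time work, 8-hour days, 5-day weeks, 52 weeks a year), and `PERIODS_PER_YEAR` and `annualise` are hypothetical names, not taken from the notebook.

```python
# Hypothetical multipliers: assume full-time work with 8-hour days,
# 5-day weeks and 52 working weeks per year. Adjust for your market.
PERIODS_PER_YEAR = {
    "hour": 8 * 5 * 52,
    "day": 5 * 52,
    "week": 52,
    "month": 12,
    "year": 1,
}


def annualise(amount: float, period: str) -> float:
    """Convert an amount paid per `period` into an approximate annual amount."""
    return amount * PERIODS_PER_YEAR[period]


print(annualise(50, "hour"))   # 50 * 2080 = 104000
print(annualise(500, "day"))   # 500 * 260 = 130000
```

Because the same multiplier is applied to every job, a part-time or short contract role paid by the hour ends up looking like a full-time annual salary, which is exactly the limitation mentioned above.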

The other question is which end of the range to pick for jobs with a very large range. I started with the minimum because the maximum is often an aspirational number (especially in commission sales roles).

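The way I approached this was to look at the data, choose a plausible range of amounts for each pay period, drop anything outside that range, and annualise the rest.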

For example, by looking at the data I can see that for Australian jobs annual salaries should be above $10,000. Daily salaries are above $100 and hourly salaries are below $200; between $100 and $200 it’s ambiguous whether an amount is daily or hourly, depending on the kind of role, but below $100 it’s unambiguously hourly. This approach could be applied to different markets and currencies I’m less familiar with.
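The band check can be sketched roughly as follows. The $10,000, $100 and $200 thresholds are the ones quoted above; the text gives no other bounds, so they are left open (`None`). `VALID_RANGE` and `in_band` are hypothetical names for illustration only.

```python
# (lower, upper) bounds in AUD for each pay period. Only the three figures
# from the text are used; None means no bound is assumed on that side.
VALID_RANGE = {
    "year": (10_000, None),
    "day": (100, None),
    "hour": (None, 200),
}


def in_band(amount: float, period: str) -> bool:
    """Return True if `amount` is plausible for the stated pay period."""
    low, high = VALID_RANGE[period]
    above_low = low is None or amount >= low
    below_high = high is None or amount <= high
    return above_low and below_high


# Keep only the (amount, period) pairs that fall inside their band.
salaries = [(50, "hour"), (250, "hour"), (150, "day"), (5_000, "year")]
kept = [(amount, period) for amount, period in salaries if in_band(amount, period)]
print(kept)
```

Note that an amount of 150 passes the check as either a daily or an hourly rate, which is the ambiguous zone described above; this sketch relies on the period stated in the ad rather than trying to infer it.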

I used a test-driven development (TDD) approach to parsing salaries, which let me improve the parser step by step; the tests caught some regressions I would otherwise have introduced.
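To give a flavour of the test-first workflow, here is a toy example. `parse_salary` is a hypothetical, deliberately simple parser written for this illustration (it is not the one in the notebook), and the sample strings are invented. It takes the first dollar amount, which is the minimum when a range is written low to high.

```python
import re

# Map the words we recognise to a canonical period name.
PERIOD_NAMES = {"hour": "hour", "day": "day", "week": "week",
                "month": "month", "year": "year", "annum": "year"}


def parse_salary(text: str):
    """Return (first amount, period), or None if either part is missing."""
    amount = re.search(r"\$\s*(\d[\d,]*(?:\.\d+)?)\s*(k\b)?", text, re.IGNORECASE)
    period = re.search(r"\b(hour|day|week|month|year|annum)\b", text, re.IGNORECASE)
    if amount is None or period is None:
        return None
    value = float(amount.group(1).replace(",", ""))
    if amount.group(2):  # a trailing "k" means thousands, e.g. "$80k"
        value *= 1000
    return value, PERIOD_NAMES[period.group(1).lower()]


def test_parse_salary():
    # Each new format gets a test first; earlier cases guard against regressions.
    assert parse_salary("$50 - $60 per hour") == (50.0, "hour")
    assert parse_salary("$80k - $100k per annum") == (80_000.0, "year")
    assert parse_salary("$80,000 per year") == (80_000.0, "year")
    assert parse_salary("Competitive salary") is None


test_parse_salary()
```

Whenever the regular expressions are changed to handle a new format, rerunning the test function shows straight away whether an older format has stopped parsing.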

After removing the out-of-band results and annualising, I got a reasonable result:

Distribution of annualised salary

I undoubtedly removed some results that are valid, or that could be corrected, but this was an effective way of getting a lot of the valid data with a little work. I’ve got a Notebook showing the approach (raw).