# Generalizing the concept of median

Ram Rachum, March 2010. Back to homepage.

In a recent evening my mind started to wander and I was thinking about the concepts of average and median. I was starting to think that these are two animals of the same family, that they are in fact the same function just with different parameters. That evening I did some calculations with Wolfram Mathematica and realized I was right. I managed to generalize the concept of median, and out of that I developed a few concepts that may be useful for scientists and programmers: an n-dimensional median, a weighted median, and what I like to call the medimean.

A reference implementation of these concepts is available here, as are the examples shown in this article.

One thing I want to note before I start: I'm not claiming to have “invented” anything here. Most of the things here are known to statisticians. To be frank, I don't care much whether what I developed is “new” or not— The important thing is that these concepts are presented here in a clear, friendly manner.

## A quick refresher about averages and medians

Let's say we are analyzing the salaries of a group of 10 people who work in a small factory. There are nine employees, and these are their salaries, in ascending order: • `18,000`
• `20,000`
• `22,000`
• `24,000`
• `26,000`
• `28,000`
• `30,000`
• `50,000`
• `80,000`

We see that most of the employees are in the `18,000``30,000` range, but there are a couple that earn a lot more, `50,000``80,000`, which are presumably the managers.

We can say that the average salary is `33,111.10`. This average is some sort of attempt to summarize all the salaries in one concise number, instead of having to think about a list of 9 different salaries. As if to say, “the employees here make `33,111.10`, give or take.” Or, “the average employee makes `33,111.10`.” As you probably know, the average is calculated by adding all the numbers and dividing by how many numbers there are.

This concept of “average” is not suitable for every problem. For instance, in our salaries example it won't be suitable. If you will go to the factory where all the 9 employees are busy working, and say “the average employee in this factory makes `33,111.10`,” you are likely to stir up some anger (and possibly a fistfight), since 7 out of the 9 employees earn a salary lower than this “average employee,” and would probably wish very much to make as much money as this fictional person does.

What happened is that the two bosses, who earn a lot more than the other guys, have skewed the average in their direction. Even though there are only two of them, their salaries are big enough to move the average upwards considerably.

For analyzing these kinds of systems, the median is a better tool. The median is defined as “the number that half of the members are bigger than, and the other half are smaller than.” (If there is a whole segment of numbers that qualify, the average of them is taken to be the median.)

Said more simply, the median is the value of the middle member. If you have an odd number of items, to say `2⋅n+1`, than the value of member `n` is your median. If you have an even number of members, to say `2⋅n`, then you take the average of member `n` and member `n+1`.

In the factory example, the median salary would be `26,000`. This is a value that is closer to the salaries that most of the employees make. If you'd go to the factory and say, “the median employee here makes `26,000`,” you are less likely to be beat up. (Though you may have to spend some time explaining what a median is. Good thing you took this refresher.)

## Finding more workable definitions for average and median

In the refresher we gave the usual definitions for average and median. We could see that they are calculated in different ways. Now I am going to redefine both of them, so their new definitions will be more comparable to each other. (And of course, still equivalent to the old ones.)

`Median(S) = argxmin(∑s∈S│s-x│1)`

`Average(S) = argxmin(∑s∈S│s-x│2)`

(The meaning of `argxmin(Whatever(x))` is "The number `x` such that `Whatever(x)` is as small as it could be.")

Now I will try to explain these definitions.

When we are looking for a median/average/whatever, we want a point that is fairly close to all the members in the sequence. In other words, a point for which the sum of distances from each member is the lowest. This holds true for both the median and the average. The critical difference between them is how we define distance.

When we take the average, we actually care about the Euclidean distance squared. (Even though we may not know about it when we calculate the average in the normal, simple way.) When we take the median, we care about the Euclidean distance itself, not squared.

What does it mean, in the case of average, that we care about the distance squared? It means that big distances have a lot more effect on the final number than small distances. Which in turn means that far-away members, like our two bosses, have a big influence on the average. So the average goes a considerable amount in their direction.

In the median though, we care about the distance itself, not squared. So far-away members cease to have the big influence they had on the average. As a result we get a number that is closer to the common employee. Here are the definitions repeated:

`Median(S) = argxmin(∑s∈S│s-x│1)`

`Average(S) = argxmin(∑s∈S│s-x│2)`

You can see that they are the same, except for that exponent in the middle. For the median the exponent is `1`, and for the average it is `2`. (This is the whole business of distance`^1` vs. distance`^2` we talked about.) So average and median are actually the same animal, just with a different parameter!

## 1. The medimean This leads us into the first “invention”: The medimean. The word medimean is a fusion of median and mean, (which is another word for average). This is the definition of the medimean:

`Medimean(S, i) = argxmin(∑s∈S│s-x│i)`

The medimean takes a parameter `i` which we can call “the distance exponent”. When given a distance exponent of `1`, the medimean produces the median. When given a distance exponent of `2`, the medimean produces the average.

So the medimean is a continuous and natural transition between the median and the average for a given sequence of members:

What can the medimean be useful for? It may be useful when you want some sort of compromise between the average and the median. For example, if you are interested in something like a median, but which moves just a little bit in the direction of far-away elements, if there are any. In that case you may want to use a `1.1`-medimean for your analysis. ## 2. The n-dimensional median

The median is generally thought to be a tool that is only relevant in 1-dimensional analysis. This is because the conventional definition of median is based on ordering of numbers, and n-dimensional numbers have no ordering. But since our re-definition is not based on ordering, we could use it for n-dimensional spaces as well!

Here's our definition for median again:

`Median(S) = argxmin(∑s∈S│s-x│1)`

We can use this definition as-is on n-dimensional vectors.

Of course, we can combine this with the previous generalization, and use an n-dimensional medimean. Here's the definition of medimean again:

`Medimean(S, i) = argxmin(∑s∈S│s-x│i)`

We can see that this definition can be used as-is with vectors instead of scalars.

Here is a nice plot showing the medimean of points in a 2-dimensional world: (I would have liked to demonstrate this with 3 dimensions, but the plot would get too complex.)

And now, a final super-duper generalization:

## 3. The weighted n-dimensional medimean

We all know there are weighted averages. Now it's time for weighted medians. Here is the weighted n-dimensional medimean:

`Medimean(S, w, i) = argxmin(∑s∈Sw(s)⋅│s-x│i)`

We are passing in a weight function `w` that gives a weight for each member.

And here's a plot to demonstrate. A dot's size signifies its weight: ## Reference implementation

Here is a reference implementation in Mathematica: medimean.nb

Keep in mind that this implementation is not efficient. It's just for demonstration. If you want to contribute an efficient implementation, mail it to me and I'll post it here. Also, if you wish to provide an implementation of these concepts in your favorite language, mail these too and I'll be happy to post them.

There's a little gotcha here. For the medimean to really touch the median at `i=1`, there needs to be either (a) an odd number of members (b) a very big number of members. Otherwise it will only get fairly close to the median and not touch it completely. This gotcha happens because of the average taken when there are two “middle members” in a sequence with an even number of members.