Projects related to cartographic theory and techniques.

"Which map should I use?"
Comparing cartograms and choropleth maps.

I've been a fan of value-by-area cartograms ever since I discovered worldmapper, which has a cartogram for just about every world statistic you can imagine (e.g., "Molluscs at Risk"). However, people who care about this sort of thing are skeptical about the usefulness of cartograms as a visualization tool (see this scholarly source).

One widely-used alternative is the choropleth map, which recolors or reshades areas based on whatever metric it aims to visualize. Below, a choropleth map and a cartogram:

There are a number of papers out there that take a single dataset, make a cartogram and a choropleth map from it, and ask participants to make judgements about the underlying data after seeing one of the maps. From this type of survey, these papers tend to make sweeping declarations about the value of cartograms and choropleths.

However, while both visualization types have their drawbacks, those drawbacks don't always affect the map's efficacy in the same way. Choropleths are problematic because brightness and shading are difficult visual marks for our eyes to read values from. Cartograms are problematic because as the data become highly variable, it's easy to lose any sense of the geography (thereby negating any reason to visualize the data with a map).

Thus, these simple "which is better" surveys don't really capture the whole picture. I decided to (try to) develop a slightly more nuanced understanding of when to use which type of projection. First I made some simple maps (like the ones above) and then asked a bunch of people some really simple questions about these maps. Then I wrote it up in a long and obtuse manner. You can find the paper here.

For those who (quite understandably) don't want to read that, the result was that at low levels of data variation, people were really good at understanding choropleths but only decent at understanding cartograms. However, at high data variation, people were really bad at understanding choropleths, but still understood cartograms about as well as they did before.

The paper lays out the exact mathematical limit that makes one projection preferable to the other, but the idea can be applied without it - if you have a highly-variant dataset, a cartogram is a better method; otherwise, a choropleth will suit your data better.
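As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes "data variation" is measured by the coefficient of variation, and the cutoff `CV_THRESHOLD` is a made-up placeholder, not the limit from the paper. The function names are hypothetical too.

```python
from statistics import mean, pstdev

# Hypothetical cutoff, chosen only for illustration.
# The actual limit derived in the paper is NOT reproduced here.
CV_THRESHOLD = 1.0


def coefficient_of_variation(values):
    """Population standard deviation divided by the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(values) / abs(m)


def suggest_map_type(values, threshold=CV_THRESHOLD):
    """Suggest a cartogram for highly variable data, a choropleth otherwise."""
    if coefficient_of_variation(values) > threshold:
        return "cartogram"
    return "choropleth"


if __name__ == "__main__":
    print(suggest_map_type([10, 11, 9, 10]))   # values close together
    print(suggest_map_type([1, 1, 1, 1000]))   # one extreme outlier
```

Any other measure of spread could be swapped in; the point is only that the choice of map depends on how variable the data are.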

If you're looking for a little more mathematical rigor than this summary, but slightly less obtuse language than the paper, here's a more moderate poster about the research.

Oh, and here is an easier-to-read PDF.

"How are cartograms made?"
A look at the Gastner-Newman Algorithm.

What is a cartogram?

The term cartogram is sometimes used fairly generally to describe any map that displays some trend or metric on geographical coordinates. However, most of the time the term refers to value-by-area cartograms, where each country or state is sized by some metric other than its land area.

The most common example is this map, with the countries sized by population:

The algorithm used to create this was developed by Michael Gastner and Mark Newman, and described in their paper, "Density Equalizing Map Projections." The process was then popularized by the site, the source for the map above.

Here's a quick conceptual understanding of cartograms and their purpose. Take a plane with coordinates x,y. You also have a metric z that varies over that plane, so it can be written as z(x,y). z thus has a certain value at every infinitesimally small area element dx dy. It is also possible to have a discrete z across x,y, with one value per region.

One way that you represent these distributions in two-dimensional space is with color. However, instead of using color, you can resize each single dx,dy unit based on the value of z in that unit.
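For the discrete case, the resizing idea can be sketched in a few lines of Python. This is not the Gastner-Newman algorithm: it only computes how much each region would have to be scaled (about its own center) for its area to be proportional to its value, which is what a simple non-contiguous cartogram does. The function name and the sample regions are hypothetical.

```python
import math


def linear_scale_factors(areas, values):
    """Per-region linear scale factors that make area proportional to value.

    The total area of the map is preserved. Because area grows with the
    square of a linear scale factor, each factor is a square root.
    """
    total_area = sum(areas.values())
    total_value = sum(values.values())
    factors = {}
    for name, area in areas.items():
        target_area = total_area * values[name] / total_value
        factors[name] = math.sqrt(target_area / area)
    return factors


if __name__ == "__main__":
    # Two made-up regions of equal land area but unequal value.
    areas = {"A": 100.0, "B": 100.0}
    values = {"A": 1.0, "B": 3.0}
    for name, factor in linear_scale_factors(areas, values).items():
        print(name, round(factor, 3))
```

Scaling regions independently like this breaks shared borders, which is exactly the problem a contiguous method has to solve.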

The mechanism for doing this is not simple, but it does have powerful implications in information visualization. Here's a computationally 'lite' version of what the Gastner-Newman algorithm does to create cartograms. This also illustrates one of the applications of cartograms - the election results below are more meaningful when the states are sized by electoral vote count.
For the more mathematically inclined, here's a slideshow about the process for creating these cartograms.
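
For reference, the core of the diffusion-based method in the Gastner-Newman paper can be condensed into three equations. This is a summary rather than a derivation: ρ(r, t) is the density of the mapped metric (the z from earlier) at position r = (x, y), and the diffusion constant is taken to be 1.

```latex
\frac{\partial \rho}{\partial t} = \nabla^2 \rho, \qquad
\mathbf{v}(\mathbf{r}, t) = -\frac{\nabla \rho}{\rho}, \qquad
\mathbf{r}(t) = \mathbf{r}(0) + \int_0^t \mathbf{v}\big(\mathbf{r}(t'), t'\big)\, dt'
```

The density is allowed to diffuse until it is uniform, and every point on the map is carried along with the flow; where each point ends up as t grows large gives its position on the cartogram.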

"How do we fill space?"
It's not as simple as you think.

The goal of the project: to write a space-filling algorithm using hexagons of different sizes. While it was an engaging project in itself, the motivation behind it was perhaps more interesting.

Waldo Tobler (1963) stated that "a basic truism of geography is that the incidence of phenomena differs from place to place on the surface of the earth." Cartograms are one way to show the uneven distribution of a phenomenon on this surface.

There are many criteria that can guide the creation of successful cartograms. Simply, though, good cartograms must be:
  • not unnecessarily complex
  • accessible to the viewer
  • able to display a large quantity of information
  • interesting to the eye
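However, there's a natural tension between displaying a large quantity of information (the third criterion) and avoiding unnecessary complexity (the first), and to a certain extent accessibility (the second). So, the goal is to strike a balance between these criteria.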

In a slightly different vein, consider the geodemographic theory that people behave in accordance with their environment not as it actually is, but as they believe it to be. Areas with greater diversity have more variable data for many of the metrics we might care to visualize. So, the goal is to create a visualization that is capable of showing more detail in highly diverse areas while reducing detail in more homogeneous regions.

Now, how might we determine broadly where those diverse areas are? There's a lot of science behind this, but coastal areas tend to have higher diversity than non-coastal areas. They also tend to have higher population density, meaning lots of data in small spaces of the map. So, we want a map projection that simplifies non-coastal areas (reducing map complexity) since the data there is less variant, but allows us to see coastal areas in more depth. Consider the simple example at left.

Here, the second map more closely mimics the original map, and it uses fewer squares. However, squares are pretty boring. Hexagons are one of the most space-efficient shapes, and can stack more easily than squares. So, I decided to write a space-filling algorithm using hexagons.

However, it turns out that the things that make hexagons so effective (for example, 120° angles) also make them very difficult to work with. The first problem I encountered was how to define the space in a way I could work with easily. The best way that I thought to do this was to define the shape by a number of vertices, and then connect those vertices with lines. I started working with the shape of Australia because it was small enough to not be overly complicated but detailed enough to still provide interesting results.

The next task was to find the largest possible space inside the shape in which I could place a hexagon. I first defined the coordinates for the center of the shape as the midrange of the array of x-coordinates and the midrange of the array of y-coordinates of the shape. The reason I used midrange and not mean or median is that the vertices are not spaced evenly around the continent, so the mean and median would be biased towards areas with more detail (more vertices). Once I defined the center of the shape, I constructed lines from it to each of the vertices and found the shortest line. I then defined the length of that line as the radius (center to vertex) of the central hexagon.
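
Here is a minimal Python sketch of that step, assuming the outline is stored as a list of (x, y) vertex tuples. The function names and the sample outline are hypothetical, not taken from the downloadable code.

```python
import math


def midrange_center(vertices):
    """Center of the shape: the midrange of the x's and the midrange of the y's."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def initial_radius(vertices, center):
    """Length of the shortest line from the center to any vertex."""
    cx, cy = center
    return min(math.hypot(x - cx, y - cy) for x, y in vertices)


if __name__ == "__main__":
    # A square outline with extra vertices clustered along its right side.
    outline = [(0, 0), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (0, 4)]
    center = midrange_center(outline)
    print(center)                           # midrange ignores the clustering
    print(sum(x for x, _ in outline) / 7)   # the mean x is pulled to the right
    print(initial_radius(outline, center))
```

In this example the midrange puts the center in the middle of the square, while the mean of the x-coordinates drifts toward the side with more vertices.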

After defining the radius of the hexagon, I drew the hexagon as a series of points defined by the radius and the center (n and x,y).
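
A sketch of how those points can be generated in Python, assuming a regular hexagon whose first vertex points along the positive x-axis. The function name and the rotation parameter are hypothetical.

```python
import math


def hexagon_points(center, radius, rotation_deg=0.0):
    """Return the six vertices of a regular hexagon.

    `radius` is the center-to-vertex distance. With rotation_deg=0 the
    hexagon has vertices on the left and right and flat edges on the
    top and bottom.
    """
    cx, cy = center
    points = []
    for k in range(6):
        angle = math.radians(rotation_deg + 60 * k)
        points.append((cx + radius * math.cos(angle),
                       cy + radius * math.sin(angle)))
    return points


if __name__ == "__main__":
    for x, y in hexagon_points((0.0, 0.0), 2.0):
        print(round(x, 3), round(y, 3))
```

Consecutive vertices are 60° apart around the center, which is what produces the 120° interior angles.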

However, this is clearly not the largest possible hexagon that can fit in this space. I then defined an expandpolygon() function which incrementally increased the size of the hexagon and moved the center accordingly. I found the distance between the shape's boundary and the top, bottom, left, and right edges of the hexagon, found the smallest and largest of those distances, and then expanded the polygon a little bit accordingly. (Originally I had a problem with overshooting the edges, but then I added parameters for when the polygon could expand.) This problem would have been simpler if the side with the least room was always opposite the side with the most room, but this was not always true and I had to account for all combinations of the most and least room. For example, if the most room to expand was on the right, the function moves the center of the hexagon to the right by one and increases the radius enough that the left side of the hexagon stays in the same place.
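
The geometry of a single expansion step can be sketched as follows. This is an illustration, not the original expandpolygon(): it assumes a hexagon with vertices on the left and right and flat edges on the top and bottom, and it leaves out the boundary checks that stop the hexagon from overshooting the shape. The name expand_step and its parameters are hypothetical.

```python
import math


def expand_step(center, radius, direction, step=1.0):
    """Move the center `step` toward `direction` and grow the radius just
    enough that the opposite side of the hexagon stays where it was.

    Returns the new (center, radius).
    """
    cx, cy = center
    # Left and right extremes are vertices, `radius` away from the center,
    # so the radius grows by exactly the distance the center moves.
    if direction == "right":
        return (cx + step, cy), radius + step
    if direction == "left":
        return (cx - step, cy), radius + step
    # Top and bottom extremes are flat edges, radius * sqrt(3) / 2 away
    # from the center, so the radius has to grow by 2 * step / sqrt(3).
    growth = 2 * step / math.sqrt(3)
    if direction == "up":
        return (cx, cy + step), radius + growth
    if direction == "down":
        return (cx, cy - step), radius + growth
    raise ValueError(f"unknown direction: {direction!r}")


if __name__ == "__main__":
    center, radius = expand_step((0.0, 0.0), 2.0, "right")
    print(center, radius)
    print(center[0] - radius)  # left vertex is still at x = -2
```

Repeating a step like this toward whichever side has the most room, and stopping once no side has room left, gives the incremental growth described above.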

The next task, placing more hexagons, was different. Instead of using only the original coordinates as boundaries, each new shape had to fit where there weren't already hexagons. So, I had to take a portion of the array defining the outside shape and use the sides of the existing hexagons as the remaining boundaries. From there, the function to expand the hexagon was similar.

If desired, the code that corresponds with the above explanation can be downloaded here. To someone who doesn't know what they're doing, it's long and impressive. To someone who does, it's probably nonsensical and unwieldy. I wouldn't know - I belong to the former group. Parts of it are commented for your viewing pleasure. Other parts are not. Sorry.