## Statistics in the Triad, Part VIIIb: Binning, or Where the Data Are Actually Concentrated

The first section of Part VIIIa, ‘Data distribution in a triad’, listed some quantitative methods for comparing story data in triads: confidence regions, point-counting of clusters, and smooth contouring. The last of these uses kernel density estimation (KDE) to calculate a probability density function (PDF) for the data. Behind this statistical alphabet soup is an elegant way of showing where the data are concentrated, and one that is also visually appealing because the contours are continuous.
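To make the KDE-PDF idea concrete, here is a minimal sketch in Python (standard library only). It is not the method used in Part VIIIa or in ggtern. It assumes each story dot is a composition (a, b, c), projects it onto flat Cartesian coordinates, and sums an isotropic Gaussian kernel with a fixed, hand-picked bandwidth `h`. The names `ternary_to_xy` and `gaussian_kde` are hypothetical, and the sample stories are invented.

```python
import math

SQRT3_2 = math.sqrt(3) / 2

def ternary_to_xy(a, b, c):
    """Project a composition (a, b, c) onto 2-D Cartesian coordinates.

    Vertex A maps to (0, 0), B to (1, 0), and C to (0.5, sqrt(3)/2).
    """
    total = a + b + c
    a, b, c = a / total, b / total, c / total
    return b + c / 2, c * SQRT3_2

def gaussian_kde(points, h):
    """Return a function estimating the PDF at (x, y) from ternary points.

    Assumption: a fixed-bandwidth isotropic Gaussian kernel, with h in the
    units of a unit-edge triangle (not tuned or data-driven).
    """
    xy = [ternary_to_xy(*p) for p in points]
    norm = 1.0 / (len(xy) * 2 * math.pi * h * h)

    def pdf(x, y):
        return norm * sum(
            math.exp(-((x - xi) ** 2 + (y - yi) ** 2) / (2 * h * h))
            for xi, yi in xy
        )

    return pdf

# Hypothetical story data: a small cluster near A plus one dot near C.
stories = [(0.6, 0.3, 0.1), (0.55, 0.35, 0.1), (0.1, 0.1, 0.8)]
pdf = gaussian_kde(stories, h=0.05)
print(pdf(*ternary_to_xy(0.58, 0.32, 0.1)) > pdf(*ternary_to_xy(0.3, 0.3, 0.4)))  # True
```

Because this kernel is a plain 2-D Gaussian, a dot placed at a vertex spreads part of its probability mass outside the triangle, and the sketch makes no correction for that. This is a generic boundary effect of naive KDE, not necessarily the mechanism discussed in Part VIIIa.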

Unfortunately, a storyteller in a SenseMaker project can intentionally place a story dot at the vertices or along the opposite legs of a triad (a relatively infrequent location for data in most ternary plots). An unexpected consequence of this freedom is that the subsequent mathematical steps in analyzing story data appear to introduce distortion in the KDE-PDF contours for such near-vertex, near-zero data. Part VIIIa describes this problem in considerable detail. Fortunately, readers who don’t care about the details can read the rest of this post on its own.

### Turning a triad into a histogram

The discrete, discontinuous alternative to drawing smooth, continuous KDE-PDF contours is simply to put the story dots in bins and count them. The easiest way to do that is to superimpose a regular grid on the triad. In other words, we can turn a ternary plot into a two-dimensional histogram, and if we add color-coding, we have a triangular heat map.
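One way to do this binning is sketched below in Python, as an illustration rather than ggtern’s actual code. It assumes each story dot is a composition (a, b, c) and that the grid has n divisions per leg, which splits the triad into n² small triangles: n(n+1)/2 pointing up and n(n-1)/2 pointing down. Flooring n·a, n·b, and n·c identifies the cell, because the three floors sum to n - 1 inside an upward cell and to n - 2 inside a downward one. The names `tri_bin` and `tri_histogram` are hypothetical, and sending dots that sit exactly on a grid node into an upward cell is an arbitrary tie-breaking choice.

```python
import math
from collections import Counter

def tri_bin(a, b, c, n):
    """Return the triangular bin ('up' or 'down', i, j) holding (a, b, c).

    The triad is split into n**2 cells by lines parallel to each leg at
    spacing 1/n. i and j are the floors of n*a and n*b; the third index k
    is implied (n-1-i-j for 'up' cells, n-2-i-j for 'down' cells).
    """
    total = a + b + c
    # Scale to grid units; rounding suppresses float noise such as 1.9999999.
    scaled = [round(x / total * n, 9) for x in (a, b, c)]
    idx = [math.floor(s) for s in scaled]
    if sum(idx) == n:
        # Exactly on a grid node (e.g., a vertex of the triad): assign the
        # dot to an adjacent upward cell by lowering its largest index.
        idx[idx.index(max(idx))] -= 1
    kind = "up" if sum(idx) == n - 1 else "down"
    return kind, idx[0], idx[1]

def tri_histogram(points, n):
    """Count story dots per triangular bin (empty bins are simply absent)."""
    return Counter(tri_bin(a, b, c, n) for a, b, c in points)

# Hypothetical story data on a nominal 20% grid (n = 5).
stories = [(1, 0, 0), (0.9, 0.05, 0.05), (1/3, 1/3, 1/3), (0, 0.5, 0.5)]
counts = tri_histogram(stories, n=5)
print(counts[("up", 4, 0)])  # 2
```

With this rule, a dot placed exactly at a vertex, such as (1, 0, 0), lands in the corner cell instead of falling on a bin boundary, which matters for SenseMaker data where storytellers deliberately use the vertices.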

ggtern, the R package written by Nicholas Hamilton for ternary plots and widely used in SenseMaker projects, now includes hexagonal and triangular bins as of its latest version (2.2.2, partly supported by QED Insight). Here are examples from the ggtern website:

This implementation has several noteworthy features:

• customizable color-coding for the binning scale;
• specification of the bin size along each leg, e.g., n = 5 corresponds to a nominal 20% grid (though n has a different meaning for tribins than for hexbins; see the examples above and the discussion at the ggtern link); and
• ability to display a calculated scalar value (e.g., mean age of respondents) for each bin (not shown above).
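The third feature, a calculated scalar per bin, amounts to a grouped average. The Python sketch below is illustrative, not ggtern’s implementation. It assumes a compact, floor-based triangular binning helper (hypothetically named `tri_bin`) and a list of per-story values such as respondent ages; `bin_means` is likewise a hypothetical name, and bins with no stories are simply absent from the result rather than shown as empty.

```python
import math
from collections import defaultdict

def tri_bin(a, b, c, n):
    """Hypothetical helper: ('up'|'down', i, j) triangular bin of (a, b, c)."""
    total = a + b + c
    # Floors of the scaled coordinates sum to n-1 (up cell) or n-2 (down cell).
    idx = [math.floor(round(x / total * n, 9)) for x in (a, b, c)]
    if sum(idx) == n:  # grid node or triad vertex: push into an 'up' cell
        idx[idx.index(max(idx))] -= 1
    kind = "up" if sum(idx) == n - 1 else "down"
    return kind, idx[0], idx[1]

def bin_means(points, values, n):
    """Mean of a per-story scalar (e.g., respondent age) in each occupied bin."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (a, b, c), v in zip(points, values):
        key = tri_bin(a, b, c, n)
        sums[key] += v
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# Hypothetical stories and ages.
points = [(1, 0, 0), (0.9, 0.05, 0.05), (0, 0.5, 0.5)]
ages = [30, 50, 41]
print(bin_means(points, ages, n=5))  # {('up', 4, 0): 40.0, ('up', 0, 2): 41.0}
```

Any other per-bin summary (a median, a proportion answering "yes") would follow the same pattern: group by bin, then reduce.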

Nicholas eloquently captured the value of these additions for SenseMaker projects (and others):

> There are some subtle differences which give some added functionality, and together these will provide an additional level of richness to ternary diagrams produced with ggtern, when the data-set is perhaps significantly large and points themselves start to lose their meaning from visual clutter.