Statistics in the Triad, Part VI: The Story as Unit of Observation

If you had asked me a year ago to identify the primary unit of observation in a SenseMaker project, I would have said, without much hesitation, “It’s the story, of course.” When I started writing Part IV in this series on Confidence Regions, however, I had to revisit that question. I knew what was typically collected from participants: stories, signifiers (dyads, triads), and multi-choice questions including demographics. And I certainly knew both the preparatory work that Laurie did, for example, in designing prompting questions and signifiers, and her subsequent analysis.

Unit of observation, however, has a somewhat technical meaning, in both the natural and social sciences. Here is an excerpt from the (already) brief entry in Wikipedia (linked above):

In statistics, a unit of observation is the unit described by the data that one analyzes…. A study may have a differing unit of observation and unit of analysis: for example, in community research, the research design may collect data at the individual level of observation but the level of analysis might be at the neighborhood level, drawing conclusions on neighborhood characteristics from data collected from individuals. Together, the unit of observation and the level of analysis define the population of a research enterprise.

How should we view stories, signifiers, data points in triads, and even the study participants themselves through these observation-colored glasses? I’m going to answer this question by looking again at the geological precedents that I discussed in the prior Confidence Regions post. Then I’ll come back to that opening question.

Point counting as data collection

The importance of the distinction between the levels of observation and analysis was very evident in the paper by Weltje (2002) from which I took a geological example to illustrate right and wrong ways to estimate confidence regions in a triad. The primary data in his example came from a method called “point counting.”

Sawed block from Llano Uplift, Texas; section from Little Sitkin Is., Alaska

A thin slice cut from a hand specimen (left, above) is mounted on a glass slide (right) and examined under a microscope with a precision stage that allows the thin section to be moved in fixed increments. As the slide is viewed grid-wise in both x- and y-traverses, the mineral grains that successively appear under the cross-hairs in the center of the field of view are identified and recorded. Depending on the size of the individual grains, several hundred such points would be counted.

The resulting data – n_1 grains of mineral 1; n_2 of mineral 2; and so on for typically 6 or 8 abundant minerals (or identifiable rock fragments), plus a catch-all “other” – are categorical, that is, they fall into mutually exclusive descriptive bins without regard to any natural ordering.[1] These numbers can be converted to compositional data by dividing each individual count n_i by the sum of all counts, and multiplying by 100 to get percentages. In order to show them in a triad, as in Weltje’s figures, any three can be re-normalized so that they alone sum to 100%.
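For readers who think more easily in code, here is a minimal sketch of that arithmetic in Python. The mineral names and counts are hypothetical, invented for illustration rather than taken from Weltje's paper, and the function names are my own.

```python
def to_percentages(counts):
    """Convert raw point counts to percentages that sum to 100."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one grain must be counted")
    return {name: 100.0 * n / total for name, n in counts.items()}


def ternary_subcomposition(counts, components):
    """Re-normalize three chosen components so they alone sum to 100%."""
    if len(components) != 3:
        raise ValueError("a triad needs exactly three components")
    return to_percentages({name: counts[name] for name in components})


# Hypothetical point counts for one thin section (assumed values).
counts = {"quartz": 150, "feldspar": 90, "lithics": 60, "other": 100}

# Composition over every counted category, including "other".
percentages = to_percentages(counts)

# The three parts chosen for the triad, re-normalized to 100%.
triad = ternary_subcomposition(counts, ["quartz", "feldspar", "lithics"])
```

Note that the same grain counts give quartz a different percentage in the full composition than in the triad, because the denominator changes when "other" is left out.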

Grain vs. composition as unit of observation

As in the quote (above) from Wikipedia, there are several levels at which a geologist might analyze such data. Weltje treats two of these levels. His Model A is “the grain as unit of observation.” At this level we can ask how uncertain the composition of the hand specimen is. Given reasonable and testable assumptions of specimen homogeneity and stochastic independence of adjacent grains, point counting can be viewed as a form of Bernoulli sampling. As Weltje discussed in mathematical detail, this allows calculation of a confidence region for the composition of a single hand specimen from a multinomial generalization of the common chi-squared test. (A more accessible discussion is given by Xu et al. (2010), using data from experimental biology.)
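To give a feel for what such a calculation involves, the sketch below uses the familiar large-sample Pearson chi-squared statistic to ask whether a candidate composition falls inside an approximate 95% region around the counts from one thin section. This is a textbook approximation under the same independence assumptions, not necessarily the exact procedure Weltje derives, and the counts are hypothetical.

```python
import math


def pearson_statistic(counts, candidate):
    """Pearson chi-squared distance between observed counts and a candidate composition.

    counts    -- observed grain counts, one per category
    candidate -- hypothesized proportions (positive, summing to 1)
    """
    n = sum(counts)
    return sum((obs - n * p) ** 2 / (n * p) for obs, p in zip(counts, candidate))


def in_confidence_region(counts, candidate, alpha=0.05):
    """True if the candidate lies inside the approximate (1 - alpha) region.

    With three categories the statistic has 2 degrees of freedom, and the
    chi-squared quantile for 2 degrees of freedom is exactly -2 * ln(alpha).
    """
    if len(counts) != 3 or len(candidate) != 3:
        raise ValueError("this sketch handles three-part compositions only")
    critical = -2.0 * math.log(alpha)  # about 5.99 for alpha = 0.05
    return pearson_statistic(counts, candidate) <= critical


# Hypothetical counts: 300 grains in a single thin section (assumed values).
counts = [150, 90, 60]

near = in_confidence_region(counts, [0.48, 0.31, 0.21])  # close to 50/30/20
far = in_confidence_region(counts, [0.40, 0.35, 0.25])   # well away from it
```

The set of all candidate compositions that pass this check traces out the roughly elliptical region drawn around a single point in the ternary. Counting more grains shrinks it.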

Hence, multiple hand specimens can be compared in a ternary plot – each with its own confidence ellipse – and homogeneity assessed at various spatial scales, such as serial sawed slices, specimens along a single roadcut, or regional-scale sampling of a mappable unit (“formation”). Thus, Weltje’s grain-as-unit-of-observation Model A displays a confidence region around each data point in the ternary. In other words, the data from all observed grains in a single thin section are aggregated into a single point, which is then the focus of statistical analysis.

His Model B is “the composition as unit of observation.” This is the level of data analysis that I illustrated in Part IV of this series, with the two-panel figure comparing the non-rigorous “hexagonal fields of variation” and the formal confidence ellipses for both the total population and the geometric mean of the population. In this model, the data from all observed compositions for all hand specimens are aggregated into a single point (the mean), which is then the focus of statistical analysis.
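The single point in Model B is a geometric rather than an arithmetic mean, which can be sketched as follows. The specimen compositions are hypothetical, and the sketch stops at the mean itself. The confidence ellipses shown in Part IV need additional log-ratio machinery that is not attempted here.

```python
import math


def closure(parts, total=100.0):
    """Rescale positive parts so they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def compositional_center(compositions):
    """Closed geometric mean of several compositions (all parts must be > 0)."""
    k = len(compositions[0])
    n = len(compositions)
    # Geometric mean of each component, computed via the mean of the logs.
    geo_means = [
        math.exp(sum(math.log(comp[i]) for comp in compositions) / n)
        for i in range(k)
    ]
    # The component-wise geometric means no longer sum to 100, so re-close them.
    return closure(geo_means)


# Hypothetical ternary compositions (percent) for four hand specimens.
specimens = [
    [50.0, 30.0, 20.0],
    [55.0, 25.0, 20.0],
    [45.0, 35.0, 20.0],
    [60.0, 20.0, 20.0],
]

center = compositional_center(specimens)
```

Here each hand specimen contributes exactly one composition, regardless of how many grains were counted to obtain it. That is what it means for the composition, rather than the grain, to be the unit of observation.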

Collecting and counting in a SenseMaker project

Imagine a criminology grad student who would like to interview witnesses of a particular event and ask them to recount what they saw and to signify their stories. With a prompting question and signifier designed to elicit a normative response, that would be a way to test the “homogeneity” — consistency, accuracy, veracity — of the witnesses and the student’s methodology. This is a scenario in which the first model (A) looks superficially useful, one in which the student would like to calculate a confidence ellipse for each witness (data point) in a triad. But it is inconsistent with the kind of undirected discovery for which SenseMaker is designed. More importantly, there would be no categorical data, no “point counts” to use in the chi-squared calculations, because the witnesses’ marks in the triad would be compositional from the outset. So, instead of trying to force Bernoulli sampling and a multinomial distribution on some unsuspecting data, I’m going to proceed by semantic rather than mathematical manipulation.

The perfect pair: a story and a hand specimen

Here are some commonalities or pairings in the geological vs. sensemaking approaches to displaying results in a triad, arranged from coarsest to finest:
• an outcrop/layer/formation = a cohort – both are regionally definable;
• a hand specimen = a story – both are collected in the field;
• a composition = a signifier – both appear in a triad, by calculation or touchscreen; and
• a grain = a word – each is embedded in its specimen/story.

There are also pairings of agents that guide or define each of the above:
• geological processes (for example, sediment deposition or volcanic eruption) = the project participants who wrote the stories; and
• the field geologist = the analyst or practitioner.

In the geology era of my life — the left-hand side of each of these pairings — I have stood at a lot of outcrops; collected a lot of specimens; measured and plotted a lot of chemical compositions; and looked at a lot of mineral grains and probed them with not only visible light but also various charged-particle beams (yet more chemical compositions!). Occasionally, I needed to do some kind of calibration or cross-check of data or a method, and those times looked like Weltje’s model A, the grain as unit of observation. But the bulk of the time I was working with the composition as unit of observation.

That’s why I already knew about triads, when Laurie first asked me about them, and why the mathematical underpinning of her work was familiar. Even with different terminology and unfamiliar subject matter, I could generally follow what she was doing: objectives and cohorts were defined; prompting questions and signifiers designed; stories and responses collected; results analyzed and plotted; and occasionally stories were themed by a client’s subject-matter experts.

So what (is/should be the unit of observation in sensemaking)?

Remember that this is yet another post in a series on Statistics in the Triad. That would suggest I should trot out some quasi-formal answer along the lines of the opening quote from Wikipedia. Instead, having watched Laurie work with clients, I suspect that the correct answer is “all of the above,” that the unit of observation is whatever helps those clients listen to and understand their constituents, formulate probes, and take a best guess at next steps.

Looking now at the boldface bullets (above), however, one thing stands out by its absence – the individual word in the participants’ stories. Nowhere are the words in a story examined the way the grains in a rock may be examined.[2] Even the aforementioned story theming, which certainly requires reading stories, is more like a geologist looking at a hand specimen. An experienced eye can look at the rocks in the two photos above and think “granite” without needing to count anything, just as a reader can scan a printed story and grasp the content without having to deal (consciously) with each individual word.

Of course if you saw the handwritten text in the second photo (above) and anticipated “… stormy night,” you were probably surprised or puzzled or amused, or all three, when you got to “avocado” instead.[3] If so, thank you for illustrating my point: except in aberrant situations, sensemaking doesn’t require attention to words in a way that would make them a unit of observation.

Not requiring attention, however, doesn’t mean that we shouldn’t be giving it. I’ve said to Laurie for years that I am amazed at how much potential information is being left on the table with unexplored story texts, what I would now express as failing to consider the word as unit of observation. On the other hand, I completely take her point that there is only so much the community of practitioners can do and that people with the requisite analytical skills may just not (yet) be part of it.[4]

Full disclosure: I certainly don’t possess those skills. In fact, about all I can do is rattle off names like “latent semantic analysis” (see “natural language processing”) and “support vector machine” (see “machine learning”).[5] They are examples of tools that a knowledgeable person might apply to examining the words, looking at the story “grains” under a sensemaking microscope, if you will. Whatever the tool, the words should be added to the list of units of observation. They surely have much to tell when we ask the right questions.
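At its very simplest, treating the word as unit of observation could begin with the textual equivalent of point counting, as in the sketch below. The story text is invented for illustration, and a raw tally like this falls far short of latent semantic analysis or any of the other tools named above. It only shows that words can be counted and normalized the same way grains are.

```python
import re
from collections import Counter


def word_counts(story):
    """Tally lower-cased words in one story, as grains are tallied in a thin section."""
    return Counter(re.findall(r"[a-z']+", story.lower()))


def word_percentages(story):
    """Turn the tallies into a word 'composition' that sums to 100%."""
    counts = word_counts(story)
    total = sum(counts.values())
    return {word: 100.0 * n / total for word, n in counts.items()}


# Hypothetical story text (assumed, not from any real project).
story = "The manager listened. The team felt heard, and the team stayed."

counts = word_counts(story)
composition = word_percentages(story)
```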

Meanwhile in the rest of the world…

It is a truism that the big-S websites — searching, shopping, socializing — not only look at the words we type, but they monetize them. This may be for their own benefit, as in “people who bought this also bought…”; or it may be by selling to others what they learn about us from analyzing our words.

Such analyses can give insight across a scale that a SenseMaker project could never encompass. Facebook has how many hundred million active users?… But the result is also exactly what SenseMaker was designed to avoid — someone else’s compilation of a third-person, this-might-be-what-they-meant inference and representation. Instead, practitioners help clients receive a message from their audience that is a first-person, here-is-what-we-think declaration and contextualization.

The logical extension of the foregoing is to combine these two approaches. I mean much more than simply doing textual analysis of stories collected in a project. My fantasy is that an entity that falls outside of the big-S websites, yet appreciates and has access to their data, would recognize the symbiotic value of combining narrative and counting. I’m thinking of a social-media analytics company like Crimson Hexagon. OK, let’s be honest — I’m thinking precisely of Crimson Hexagon.

A search of their website on “surveys” gives 10 hits, almost all of which mention the word only once, and all of which are dismissive. That’s OK. It may be a form of mission bias for them, but I’m dismissive of surveys as well. Anyone who understands SenseMaker probably is. What I envision instead is a project in which Crimson Hexagon chooses a sample population representative of social-media participants who commented on one of their clients. That population would become the target for a SenseMaker project, and they would tell stories about their experience with the client and signify them.

The resulting whole — combining narrative and counting — would surely be much greater than the sum of its parts. After all, if you want to be really sure that your broad-scale analytical results are accurate, you should test them, right? And what better way — arguably the only way — than to ask some of the participants to tell you directly, precisely, in-context, with no intervening inference? And on the flip side, if you are confident that a small sample of a population has told you precisely what they think, how much more valuable could it be if you could confidently extend that picture to a vast number whose stories and signifiers you could never hope to gather?


Weltje, G.J. (2002) Quantitative analysis of detrital modes: statistically rigorous confidence regions in ternary diagrams and their use in sedimentary petrology. Earth-Science Reviews, v. 57, p. 211-253.

Xu, B., Feng, X., and Burdine, R.D. (2010) Categorical data analysis in experimental biology. Developmental Biology, v. 348, p. 3-11.


  1. One of the most commonplace examples of this scheme is a recalcitrant child partitioning mixed vegetables into separate piles on the plate and counting in particular the number of distasteful items, say lima beans, that must be eaten (or otherwise disposed of). This also offers a learning opportunity, probably seldom realized, of how the closure constraint can produce an alarming increase in the percentage of beans as the more palatable carrots and corn are eaten.
  2. At one level, this is unfair, because the grains must be counted in order to calculate the rock compositions that are then plotted in a ternary diagram, whereas the words are irrelevant when the storyteller places a finger on a touchscreen to indicate the “composition” of a signifier in a triad.
  3. Although I think I thought of this word swap on my own, I can’t exclude the possibility that I ran across mention of Daniel Gilbert’s use of it in Stumbling on Happiness (2006, p. 7, Vintage Books).
  4. Amazingly enough, in the three weeks since I wrote the post on Confidence Regions, Laurie has learned of three practitioners (two clients, one fellow analyst) who have small groups of students working with them on analyzing story texts. Check back here to see if they are willing to be publicly identified at this time.
  5. These names may not be helpful if you don’t already know what they mean. Kind of like the self-referential “cat (see feline), feline (see cat)” that you might encounter if you didn’t already know about the small, furry, carnivorous, mammalian house pet.

The QED brain trust designing signifiers