The Quest for the “Number 12”

Posted on May 2, 2015




For a while I was a panel member of a University Ethics Committee (I know, how cool is that!).


Our specific panel dealt with studies at the fringes of social sciences, medicine/epidemiology and psychology, and as a result many of the studies we had to deal with were qualitative in design. Now, this blog post is not going to slowly (or more rapidly) disintegrate into a rant about qualitative science… although I have dabbled in its arts, I simply do not know enough about it to contribute something useful. Instead, I’d like to talk about the number 12…
The source of all knowledge, Wikipedia, tells me (and by extension you) a couple of interesting facts about twelve (link):

- It is the largest number with a single-morpheme name in English, and a group of twelve things is called a duodecad.
- It is apparently often used as a sales unit in trade (a dozen).
- It is a superfactorial, i.e. the product of the first three factorials (1! × 2! × 3! = 12).
- It is the atomic number of magnesium, and the human body has twelve cranial nerves.
- Force 12 on the Beaufort wind force scale corresponds to the maximum wind speed of a hurricane.
- The 12th moon of Jupiter is Lysithea.
- The Western zodiac has twelve signs, as does the Chinese zodiac.
- In antiquity there were numerous magical/religious uses of twelve: the biblical Jacob had twelve sons, who were the progenitors of the Twelve Tribes of Israel; the New Testament describes twelve apostles of Jesus; there are twelve Imams, legitimate successors of the Islamic prophet Muhammad; the sun god Surya has twelve names; and of course there are twelve days of Christmas!
- In fact, there are twelve F keys on your keyboard.

There are many more things, so have a look if you’re interested (all you need to do now is steer any pub conversation to the number 12 and you’re sorted… you’re welcome!). Anyway, so twelve is an important number, it seems.

And indeed, during these university ethics committee meetings, when I asked about the sample size of the qualitative studies, the reply was often something along the lines of “A minimum of twelve is needed…”

Yes, I know!! Twelve again!

It struck me as a pretty large coincidence that all these studies just happened to have such similar statistics that the calculations would all result in twelve. Nor do I expect that the sample size of a qualitative study is based on the number of apostles Jesus had, or that the scientists counted the F keys on their keyboards in an attempt to get some clues about the sample size for their studies. It clearly wasn’t just a made-up number either, because a made-up number wouldn’t have been the same for everyone; it would much more likely have been ten, or maybe fifteen or twenty. Unfortunately, many of the research studies were represented by students, and as a result the general replies I got to the question “Why?“ were variations on “That’s what my supervisor said“, “That’s what we (unspecified) always do“ and “Other papers did this as well“. So that was not very helpful…

I still wanted to know, and I would imagine so would others. Not to mention, if you are writing your next research proposal or finishing up your PhD thesis, I would say it is pretty useful to be able to defend the numbers. So I went on a quest (again, you are welcome). A bit like the hobbits in Lord of the Rings, but without the ring… and without the hobbits, orcs, Sauron or a volcano. Pretty much without excitement, really, coming to reflect on this quest.

…it turned out not to be as easy as I had hoped, and with PubMed not giving me a straightforward answer I had to turn to the blogosphere. Luckily, the majority of the blogs and Q&A websites acknowledged that the minimal number of participants could not be easily set and differed depending on a great many factors. So that was quite reassuring. Nonetheless, the number 12 kept popping up regularly. Many clicks further, I finally managed to track down the paper on which it was all based, and which was the only one quoted (if someone actually quoted anything) to defend the 12. The end of my quest was near…
In 2006 a paper was published by Drs Guest, Bunce and Johnson in the journal Field Methods with the hopeful title “How Many Interviews Are Enough?: An Experiment with Data Saturation and Variability” (link). And not just that! The following sentence was included in the abstract:

“…they found that saturation occurred within the first twelve interviews…”

I found it! The holy grail of qualitative sample size calculations!


So for the benefit of future applications to medical ethics committees, we’ll have a look at what actually happened. The rationale for the study was a great premise: the authors were getting a bit fed up with the use of theoretical saturation to determine sample size in qualitative work without any guidance available on what this might mean numerically. In fact, they reviewed 24 textbooks and seven databases, but found nothing useful on the matter. Recommendations ranged from six to 36, but without any justification for the numbers. So they decided to use data they had at their disposal to get a useful number. Or, in their own words, to obtain a “general yardstick“.
The study they used examined perceptions of social desirability bias and accuracy of self-reported behaviour in the context of reproductive health research. In other words, they wanted to investigate the impact in such studies (of, for example, HIV) of participants giving socially desirable answers rather than the truth. I suppose it’s important to know that they used semi-structured, open-ended interviews to see how women talk about sex and their perceptions of self-report accuracy. They needed people in the highest risk group for HIV so decided to recruit sex workers in two large cities.

Oh yes, did I mention the groups of women lived in Nigeria or Ghana……?

For those “in the know” about these things, the interviews were then analyzed using thematic analysis. The authors identified the minimal sample size required as the point where theoretical saturation was reached. That’s a bit academic, but it essentially means the point at which new interviews no longer add to the knowledge already gained from the previous interviews; or, in the authors’ words, “…the point in data collection and analysis when new information produces little or no change to the codebook.” To do this, they documented the progression of theme identification (with a theme being, roughly, a subject mentioned by participants; the list of themes, i.e. the codebook, changes whenever new interviewees say something the previous ones didn’t report) after each set of six interviews, for a total of ten rounds, or 60 interviews.

A total of 109 content-driven codes (I suspect this is a theme) were identified, each mentioned in at least one interview, and of these, 80 (73%) were identified after the first six interviews. After the next round (12 interviews) 92% of all codes were identified, and after round 3, 97%. Inclusion of the interviews from the second country (i.e. Nigeria) did not change much. Additional analyses of changes in the codes were performed, but I won’t go into those here; essentially, the conclusions did not change. So what do we know from this study? The main conclusion of the authors is that after 12 interviews (do you see what is happening… there it is!) 92% of the total number of codes for Ghana, and 88% of the codes for Ghana and Nigeria combined, were identified, with new themes occurring only infrequently after that. So there you go. It took me a bit of searching, but I found the basis for the number 12. And it’s done in quite a neat way (not involving the number of F keys on your keyboard at all). If you weren’t aware of this before, you now know why you put the number 12 in your grant application…
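If you like to see an idea in code: the batching logic described above is easy to sketch. This is my own illustrative toy, not the authors’ analysis, and the interview codebooks below are entirely hypothetical; only the mechanics (count the cumulative number of unique codes after each batch of interviews, and watch the increments shrink) mirror the approach the paper describes.

```python
# Illustrative sketch only: hypothetical data, not the study's.
def saturation_by_batch(interviews, batch_size=6):
    """For each batch of interviews, report the cumulative number of
    unique codes seen so far (the size of the growing codebook)."""
    seen = set()
    cumulative = []
    for start in range(0, len(interviews), batch_size):
        for codes in interviews[start:start + batch_size]:
            seen.update(codes)
        cumulative.append(len(seen))
    return cumulative

# Hypothetical codebooks for 12 interviews (each set = codes in one interview)
interviews = [
    {"stigma", "trust"}, {"trust", "privacy"}, {"stigma", "income"},
    {"privacy"}, {"income", "risk"}, {"trust", "risk"},        # batch 1
    {"stigma"}, {"risk", "disclosure"}, {"trust"},
    {"privacy", "stigma"}, {"income"}, {"disclosure"},         # batch 2
]

print(saturation_by_batch(interviews))  # → [5, 6]: most codes appear early
```

Note that, by construction, the answer can only ever be a multiple of `batch_size`, which is exactly the quibble about batches of six versus batches of five later in this post.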

…there is a small issue though. Is your study on social desirability bias in reproductive health research by any chance? Oh, is the study in a group of sex workers from, say, the cities of Ibadan or greater Accra? If not, you may have an issue…
…or, as pointed out by the authors, “it is hard to say how generalizable our findings might be“ in this “homogeneous population“ with “fairly narrow objectives“. The authors warn against assuming that six or twelve interviews are always enough. I find it quite intriguing that the number 12 is the result of the authors’ choice to run the interviews in batches of six. Had they used batches of five, I strongly suspect the minimal sample size would have been 10.
Much easier to remember that!!
Anyway, this unfortunately implies it isn’t quite as simple as we might have hoped, and we need to engage our brains when designing a new study. Also, “my supervisor said that would be ok“ and “we always do this“ really are not good enough (they never were, by the way, just to point that out). To finish with a quote from the paper: the authors state they would prefer you not to just copy these numbers for “quick and dirty research“…



…Myself, I am now looking forward to the next research application with a sample size of 12…