Noise: A Flaw in Human Judgment by Daniel Kahneman, Olivier Sibony, and Cass Sunstein tells a damning story about how much variability there is in the assessments we make as experts in our fields. The authors draw examples from across professional judgment - medicine, legal cases, laboratory assessments, hiring, forecasting, grading term papers, and more. This noise has serious implications in all of these arenas - false positives and false negatives cost time, money, and lives. And while people often think that these variations might “balance out,” the costs certainly do not.
The authors do an extensive review of the kinds of noise found across the research in this arena. They also highlight that while bias (consistent error in one direction) is also costly, it is not the focus of the book. Something they highlight very early: reducing error due to noise and reducing error due to bias have the same statistical effect on the overall error in a system. And just as important - noise is expensive and unfair, just like bias. So what kinds of noise did they see? And can we do anything about it?
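To make that statistical claim concrete, here is a minimal sketch of my own (not taken from the book, and with invented numbers): if you measure overall error as mean squared error, it decomposes into bias squared plus noise squared, so trimming a point of bias and trimming a point of noise pay off identically.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 100.0   # the "correct" answer for a single case
bias = 4.0           # systematic shift shared by all judgments
noise_sd = 4.0       # judgment-to-judgment variability

# Simulate many judgments of the same case.
judgments = true_value + bias + rng.normal(0.0, noise_sd, size=100_000)
errors = judgments - true_value

# Mean squared error splits (up to sampling error) into bias^2 + noise^2.
print(f"MSE            : {np.mean(errors ** 2):.2f}")
print(f"bias^2         : {np.mean(errors) ** 2:.2f}")
print(f"noise^2 (var)  : {np.var(errors):.2f}")
# The two terms enter the sum symmetrically, so cutting bias from 4 to 3
# reduces total error by exactly as much as cutting noise from 4 to 3.
```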
The authors identified three types of noise in our judgments. 1) Level noise: we have different “severity” scales, so that in a given situation different assessors are more or less “strict” in their readings. Anyone who deals with the courts is familiar with “lenient” judges or “harsh” judges; there is little agreement on what 7 out of 10 actually means. 2) Pattern noise: across a set of cases, individuals differ in their pattern of judgment, meaning that they might be more or less forgiving in some scenarios than other judges are. There is disagreement on the relative ranking of cases, driven by the particular judge’s worldview or other factors, and this pattern noise is stable over time. 3) Occasion noise: one of the strangest types of noise is the variation in a single person’s judgments of the same cases from one time to another. This isn’t because there is additional information; it is entirely based on the occasion, and the examples are both sad and funny. Maybe your favorite team won (or lost), so you are more (or less) forgiving. Assessments might differ depending on whether people are hungry or just starting their day.
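One rough way to see how these components can be teased apart (my own illustration, with simulated data rather than anything from the book): build a judges-by-cases-by-occasions array of ratings, then attribute variance to judge averages (level noise), judge-by-case disagreement (stable pattern noise), and rating-to-rating wobble for the same judge and case (occasion noise).

```python
import numpy as np

rng = np.random.default_rng(1)
n_judges, n_cases, n_occasions = 20, 30, 5

# Invented effect sizes, purely for illustration.
case_effect    = rng.normal(0, 2.0, size=(1, n_cases, 1))        # real differences between cases
level_effect   = rng.normal(0, 1.0, size=(n_judges, 1, 1))       # harsh vs. lenient judges
pattern_effect = rng.normal(0, 0.8, size=(n_judges, n_cases, 1)) # judge-specific reactions to cases
occasion_wobble = rng.normal(0, 0.5, size=(n_judges, n_cases, n_occasions))

ratings = 5.0 + case_effect + level_effect + pattern_effect + occasion_wobble

# Level noise: spread of each judge's overall average severity.
level_var = ratings.mean(axis=(1, 2)).var()

# Occasion noise: same judge, same case, different occasions.
occasion_var = ratings.var(axis=2).mean()

# Stable pattern noise: judge-by-case disagreement left after removing
# case effects and judge level effects (a two-way interaction residual).
per_cell = ratings.mean(axis=2)
residual = (per_cell
            - per_cell.mean(axis=0, keepdims=True)   # remove case effects
            - per_cell.mean(axis=1, keepdims=True)   # remove judge level effects
            + per_cell.mean())
pattern_var = residual.var()

print(f"level noise^2    ~ {level_var:.2f}")
print(f"pattern noise^2  ~ {pattern_var:.2f}")
print(f"occasion noise^2 ~ {occasion_var:.2f}")
```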
Throughout the discussion of the types of noise, the authors bring up a number of internal mental models that reinforce the noise in our judgments. For example, we look for patterns in the data and “the story,” and when we find a pattern that satisfies our brains, we stick with it, seeing only the elements that confirm our first impression. Just as important, different people will pick up different elements of the story and see different patterns. Or take experts in their field: they have developed a skill and a reputation for making their assessments, they have been rewarded for “being right,” and they see little reason to assume their assessments are off - so they must be right.
“Is it really that bad?” This is a question that arises in reading the book. Sure, we all make little mistakes or adjustments to our assessments. But surely, as experts, we are pretty good at what we do! The authors make it pretty clear from all of the research they cite and summarize that this is not the case. In some instances, agreement between experts is barely better than chance, and even when there is agreement, the level of correlation is quite low from a statistical perspective. In many cases, the extent of the noise isn’t obvious until something tragic makes it a topic of conversation, or until a concerned organization runs some form of noise audit - a controlled test of experts across cases in their arena (there is more detail on this in the book).
But even without an audit, why don’t we do anything about it? Isn’t it obvious that there is a problem? It isn’t, and this seems to be a primary reason the authors put this book out into the world. They do provide some suggestions and guidance on how to reduce this kind of error, and they even talk about the blowback in situations where error reduction has arguably been taken too far. The range of noise-mitigation ideas includes: 1) judicious use of algorithms that make a “mechanical” calculation and assessment based on the data - these have no noise, though there may be bias depending on how the algorithms are designed (or trained, in the case of machine learning); 2) educating people on what the various levels of intensity mean and how they should be used, to gain consistency; 3) understanding what the “outside view” or nominal case would be and then using judgment to adjust from there; 4) raising awareness of the likelihood of occasion noise; 5) breaking more complex assessments into independent tasks, to limit confirmation bias from coloring one’s assessment; and 6) aggregating judgments from multiple, independent experts. The authors call this collection of techniques decision hygiene, by analogy with physical hygiene: good practices that reduce the likelihood of noise creeping into our judgments.
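That last practice, averaging several independent judgments, has a simple statistical payoff: averaging n independent, equally noisy judgments shrinks the noise by roughly the square root of n. A small sketch of my own, with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
true_value, noise_sd = 50.0, 10.0
n_trials = 20_000

for n_experts in (1, 4, 16):
    # Each trial: average the judgments of n independent experts on the same case.
    judgments = true_value + rng.normal(0, noise_sd, size=(n_trials, n_experts))
    averaged = judgments.mean(axis=1)
    # The spread of the averaged judgment shrinks roughly as noise_sd / sqrt(n).
    print(f"{n_experts:2d} experts: sd of averaged judgment ~ {averaged.std():.2f}")
```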
I even saw some examples in action where well-intentioned interventions might end up causing more noise because they don’t consider the impact on the people who have to work with them every day. One suggestion for reducing noise is to establish more rules to remove the need for human judgment. These work great until the context for the rule gets lost and either the rule is applied blindly or people find workarounds because it doesn’t make sense. The other direction is to set standards - but standards by their nature invite human judgment, and the book makes it clear that this will generate more noise. The authors suggest that these issues can be mitigated on both sides: for rules, by adding clarity and context - maybe even spelling out where they stop applying - and for standards, by adding examples and clarity so that they generate less noise.
This example about rules and standards reflects back across the whole book. While there is noise everywhere in our judgments, it is not a simple snap of the fingers to fix. I really liked that the book brings this into stark relief. And I suspect I am going to start wondering about this more and more in the coming days.