Let's eat some vegetables together, y'all!
Anytime you find yourself saying "X is great today. What caused that?" and your investigation involves gathering memory-based data from people who know X is great today (people who benefited from X or were involved in making X what it is today), then your investigation can be shaded or spoiled by Halo Effects.
As I was reading the subheading for this MIT Sloan Management Review article, my potential Halo Effect proximity alert was going off: "Top Performers Have a Superpower: Happiness. A large-scale study found that well-being predicts outstanding job performance."
The researchers measured job performance by measuring the percentage of their sample that received awards based on their job performance. So far so good: if the standards for giving the awards are somewhat objective and consistent, then we have a somewhat objective and consistent measurement tool for the thing we're trying to measure.
Here's where the researchers had a choice that risked involving the Halo Effect: do they build a sample of just the award-winners and then interview them to search for patterns? Or do they take a different approach?
If they built a sample of just the award-winners and then interviewed them, they would very likely get garbage results. They would be gathering memory-based data from people who won an award, benefited from that award, and were involved in winning it, so those people's recollections of what they did that might have caused them to win will be suspect.
The researchers did not take that approach. They selected a sample and administered a happiness assessment to the entire sample. Then, 5 years later, they looked at which members of that sample had received a job performance award (and what kind of award).
The method they used is not going to be spoiled by the Halo Effect. It might be spoiled by other things, but not by the Halo Effect.
Here's a drawing I made showing what's going on with the Halo Effect:
This shows a simple system where we have defined what we think is a superior/desirable result. Then we have found every example we can find that exhibits superior/desirable results. That's our sample.
Then, we've studied just this sample and found that members of the sample either do A or B, but they never do both. (Like one flip of a coin -- you get heads or tails but not both.) Our measurement finds a lot of A. Twice as much A as B, in fact! That's yooge! Seems like A pretty clearly causes high performance!
Because of how we've sampled and measured things here, we do not notice that in the entire population, the following is happening:
- There are 9 members of the population doing A.
- There are 4 members of the population doing B.
- Across the entire population, there's at most a 4 out of 13 chance that A causes (or is correlated with; we won't get into correlation/causation here) what we have defined as superior/desirable.
- Across the entire population, there's at most a 2 out of 13 chance that B causes what we have defined as superior/desirable.
- Across the entire population, there is a 5 out of 13 chance that A fails to cause what we have defined as superior/desirable.
- Across the entire population, there is a 2 out of 13 chance that B fails to cause what we have defined as superior/desirable.
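The arithmetic in the bullets above is easy to check. Here's a minimal sketch in Python using the made-up counts from this radically simplified system (9 members doing A, 4 doing B, with 4 A-doers and 2 B-doers achieving the superior/desirable result):

```python
from fractions import Fraction

# Made-up counts from the radically simplified system in the drawing.
population = 13  # total members of the system
a_total = 9      # members doing A
b_total = 4      # members doing B
a_success = 4    # A-doers who achieved the superior/desirable result
b_success = 2    # B-doers who achieved the superior/desirable result

# Chance, across the entire population, that each behavior
# coincides with a superior/desirable result...
print(Fraction(a_success, population))  # 4/13
print(Fraction(b_success, population))  # 2/13

# ...and the chance that each behavior fails to produce one.
print(Fraction(a_total - a_success, population))  # 5/13
print(Fraction(b_total - b_success, population))  # 2/13
```

Nothing fancy is happening here; it's just the same four fractions, computed from the raw counts instead of asserted.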
So why is there twice as much A as B within our superior/desirable sample? And why, across the entire population, is A more than twice as likely to fail to cause a superior/desirable result?
If we try to understand why, we might do one of two things. One of these things is cheaper and faster than the other:
- The cheaper faster method: ask the members of our sample if they did A or B.
- The more expensive method: ask all members of the entire population if they did A or B.
If A/B are subjective in nature and admitting to doing A is more socially desirable, we have another problem on our hands because it will create cognitive dissonance for people to admit to doing B. (Shorthand: A = socially Awesome behavior, B = socially Bad behavior)
To cook up a whimsical example, let's imagine that A is being really encouraging and supportive of your employees, while B is being very harsh and demanding of your employees. In this simple system, being very encouraging and supportive of your employees is more than twice as likely to fail to result in superior/desirable results! Yes, those who achieve superior/desirable results are twice as likely to say they were encouraging and supportive of their employees. Nothing to dispute there. But what kind of understanding about the value of being encouraging and supportive of employees would we walk away with if we talked to all 13 members of this system? Wouldn't being encouraging/supportive of employees seem like a more risky way to achieve superior/desirable results? (We'd also wonder what would happen if half of companies tried B rather than 4 out of 13.)
Again, this is a radically simplified system. What if, in reality, some of the members of the system did A or B and also threw in a little C but didn't think C was important enough to talk about when we asked them what they did? Or what if some other factor in the system -- a factor that is invisible to both us the researchers and the members of the system -- rewards those who do C?
Or what if, in reality, all 13 members of the system did both A and B, but when we asked them what they did, most of them had an easier time remembering A (you'll recall that A is socially Awesome behavior) and therefore bucketed themselves in the A bucket instead of the B bucket? Or what if we had asked the question in a way that forced them to choose between A and B when in reality they do both A and B?
The most accurate and honest things our researchers could say after they've done this hypothetical study on this radically simplified system are:
- When we look at the 6 best-performing companies, we find that 4 of them are encouraging and supportive of their employees.
- When we look at all companies, we find that there's another approach that seems more likely to make your company into a top performer: being harsh and demanding of your employees. Only 4 out of 13 companies tried this approach, but 2 of those 4 became top performers, while 5 of the 9 companies that were encouraging and supportive of their employees failed to become top performers.
- We have done our best to be impartial researchers here, but we are also humans who want to live in a world where employees are encouraged and supported. We'd like to see more companies pursue this approach, but we have to admit that our study finds the encouraging/supportive approach to be the more risky approach.
Again, reality is way more complex than this radically simplified system. In reality, as our hypothetical researchers report their findings they would also humbly gesture at the many potential factors their study does not consider and suggest how future research efforts might build upon their study's findings and these other factors.
The study the MIT Sloan article references (https://link.springer.com/article/10.1007/s10902-021-00441-x) seems to avoid these problems:
- It's using a more objective measure of superior/desirable results.
- It's sampling the entire population, not just the members of the group who achieved superior/desirable results.
- It measured the input before the result was known rather than asking those with superior results to use their memory to describe the input they think produced the now-known superior results.
I'm sure it's not a perfect piece of research, but it's a nice way to reflect on how the Halo Effect functions. It's also not a model for the kind of small-scale research that I advocate because it's longer and more expensive than we can usually manage.
- The most-cited Halo Effect study is worth a read: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/92158/TheHaloEffect.pdf
- As is the Wikipedia article: https://en.wikipedia.org/wiki/Halo_effect
- As is this book: https://amazon.com/dp/B000NY128M