
How to Achieve Minimum Necessary Rigor

Excessive rigor can impede progress in research in applied settings.

Key points

  • Applied research, such as system evaluations, does not require the same amount of rigor as academic research.
  • Excessive rigor can actually be a drawback, driving up expense and adding delays.
  • Excessive research rigor may discourage sponsors from conducting any evaluation of their systems or programs.
  • Applied researchers can simplify the designs and the statistics they use to be more effective.

This essay is a collaboration with Robert Hoffman and is based on a report by Klein et al. (2021) that was prepared for the Defense Advanced Research Projects Agency.

This essay is about conducting research in applied settings: e.g., government-sponsored projects to determine if a specific product, such as an Artificial Intelligence system, is cost-effective. It is not about performing experiments in order to publish the results in journals. The nature of experiments, and the criteria for rigor, are different in these two contexts.

Rigor is critical for getting research published; lack of rigor is often grounds for having a manuscript rejected by the journal editor or the reviewers. The more rigor, the better.

But in applied settings, where the research is judged on the basis of how well it answers the question—typically a question about the value of the treatment or the technology—you want an appropriate degree of rigor. More rigor is not necessarily better. It can even be worse.

More rigor can be worse if it creates additional expense and delayed outcomes. The delayed outcome issue can be seen as the Consumer Reports syndrome: If you can read about it in CR, you can no longer purchase it because that brand or version has been discontinued. The time needed for Consumer Reports to carefully conduct its tests has made the results obsolete.

Excessive rigor can lead to another problem—a tendency to overcontrol the variables and make the tasks more artificial and context-free. Researchers can more easily add rigor to toy problems and laboratory tasks than to realistic types of tasks. The trap here is stripping the context away until you get findings that don’t apply to the sponsor’s needs.

We might refer to these problems as “rigor mortis.”

Rigor mortis problems may actually be discouraging government sponsors from conducting evaluations of new technologies, and that is unfortunate because fielding untested systems can create all kinds of risks.

I was once sitting in a meeting reviewing a new military mission, and someone stated that the program would need to include an evaluation component. Another, more senior, person responded that the military no longer seemed very enthusiastic about research and experiments. This statement came as a surprise. I asked why and was told it was because of many experiences where the research was too expensive, took too long, and provided answers that were obsolete by the time they arrived.

In short, the rigor mortis problem.

Here are some suggestions to achieve Minimum Necessary Rigor (MNR):

Minimum Necessary Rigor means fewer control conditions and more straightforward statistics, in order to reduce or eliminate excessive rigor, expense, and time. And while we are cautioning about the problems of excessive rigor, we are aware of the problem of insufficient rigor—guidance that cannot be trusted or has to be so heavily qualified as to be useless.

1. Stop calling the research “experiments.”

The term "experiment" carries a lot of baggage and raises expectations of increased rigor. Instead, call them "pilot studies." The research team can claim that experiments can be conducted later on if warranted. But in the meantime, a streamlined project can get underway. Another option is to call these efforts "Discovery/Exploratory Experiments."

2. Place the emphasis on discoveries and learning.

Avoid large-scale events, which are not conducive to learning. Conduct small-scale pilot studies rather than large, complex factorial experiments. Pilot studies can be conducted with as few as 10 participants and can be redesigned on the fly as you learn what works and what doesn't.

3. Use natural tasks.

There is usually pressure to resort to artificial tasks because they are easiest to design and present, but the whole idea of MNR is to use tasks that preserve the context and complexities of the operational environment.

4. Keep the design simple.

For precision, you may want to run a number of groups that each receive some aspect of your independent variable in order to tease out what is really working. Resist this temptation. For a pilot study, it’s best to smoosh all the interventions together and try everything at once. If you get convincing results, you can go back and run more careful investigations later; if you don’t get convincing results, it’s time to re-assess.

5. Keep the control groups down.

You want your control groups to be fair, but don’t overdo it.

6. Try to use actual workers.

Who are you going to run in your pilot study? Please don’t answer “college students” or, even worse, “Mechanical Turkers.” If possible, try to recruit participants who are currently doing the work or highly similar work.

7. Keep the number of measures down.

Resist the temptation to measure everything that moves. Yes, you have that capability, but then you’ll be saddling yourself with the burden of analyzing all those noisy data. A deeper problem is the mindless tactic: “Well, I will collect everything, so I don’t have to give any thought to what I really need.” Instead, try to imagine what measures are going to be sensitive to the effects that you expect.

8. Don’t over-complicate the statistics.

The purpose of statistics is to communicate what actually happened in your pilot study, so try not to use statistics that are too opaque or complicated or difficult to explain to the sponsors.
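To make this concrete, here is a minimal sketch of the kind of transparent statistics that suit a small pilot study: group means, the raw difference, and a simple effect size (Cohen's d) that can be explained to a sponsor in one sentence. The numbers are made-up illustration values, not results from any actual evaluation.

```python
# Transparent statistics for a small pilot study (illustrative data only).
from statistics import mean, stdev

# Hypothetical task-completion times (minutes) for two small groups.
baseline = [12.1, 14.3, 11.8, 13.5, 12.9]   # current way of working
with_tool = [9.4, 10.2, 8.8, 11.0, 9.7]     # using the new system

# The headline number a sponsor cares about: raw improvement.
diff = mean(baseline) - mean(with_tool)

# Cohen's d: improvement expressed in pooled standard deviations,
# an effect size that is easy to explain without complex models.
n1, n2 = len(baseline), len(with_tool)
pooled_sd = (((n1 - 1) * stdev(baseline) ** 2 +
              (n2 - 1) * stdev(with_tool) ** 2) / (n1 + n2 - 2)) ** 0.5
cohens_d = diff / pooled_sd

print(f"Mean improvement: {diff:.1f} minutes (d = {cohens_d:.2f})")
```

A plain summary like "the tool saved about three minutes per task, a large effect" communicates more to a sponsor than a table of model coefficients would.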

Good luck with your evaluation projects and pilot studies and with making useful and timely discoveries.

References

Note: This material is approved for public release. Distribution is unlimited. This material is based on research sponsored by the Air Force Research Lab (AFRL) under agreement number FA8650-17-2-7711. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL or the U.S. Government.

Klein, G., Jalaeian, M., Hoffman, R.R., Mueller, S.T., Clancey, W.J. (2021). Requirements for the empirical assessment of Human-AI work systems: A contribution to AI measurement science. Technical Report, DARPA Explainable AI Program.
