The problem with the external validity of impact evaluations has not gone away. Quite the opposite: policymakers in many different sectors have access to an increasing amount of evidence of “what works in a particular context”, and they are often led to believe that policy design can be based on evidence of what worked in other contexts. However, we also know that the same policy can yield different results in different contexts.
This idea of simple ‘transportability’ of evidence from one context to another has partially resulted from following the example set by the medical sciences (where it doesn’t always work either!). Having said that, the medical sciences aren’t really to blame here. Some of the challenges arise because the researchers who generate the evidence and the policymakers responsible for policy design are often interested in different aspects of external validity. However, this isn’t always as explicit as we think it is. Researchers are tasked with addressing the issue of generalisability, which focuses on the extrapolation of study findings beyond the study sample to another population. In other words, researchers attempt to answer the question of whether an impact evaluation’s findings are likely to hold in general in other contexts. Policymakers, however, are tasked with addressing the issue of applicability: whether evaluation results from elsewhere continue to hold in one specific destination context. And it gets even more complicated than that! The applicability question is relevant both to scaling up a policy within the same target population and to applying the evidence to a different target population.
The questions about scale-up and transportability differ beyond simple semantics. Given the more limited evidence base of what works in the field of innovation, entrepreneurship and growth compared to some other fields, the transportability of a relatively scarce evidence base is particularly relevant. How do you predict whether a policy that showed positive effects in one context will continue to do so in a new implementation context? This was a topic that we explored with colleagues from J-PAL at the IGL2018 conference held in Boston, US, and one which we will certainly continue to explore at this year’s IGL global conference, held in Berlin, Germany, from 21 to 23 May.
It’s no secret that the issue of generalisability has received more attention in the academic literature and other discussions than applicability. Nevertheless, we have a long way to go in this sector before debates around external validity reach the level of those in public health. Having said that, nothing (illegal?) stops us – researchers and policymakers, but also those working on the ground – from working collaboratively to find ways to make policy design and adoption much more responsive to the specifics of destination contexts. This requires a better understanding of the policy’s theory of change (including mechanisms), but also of contextual assumptions and characteristics, in order to predict and spot external validity failures. This can be achieved through further research on how evidence-based policy decisions are made, where people’s beliefs fit into this, and what sort of institutional structures facilitate finding the best fit between the existing evidence and the local context. Furthermore, a lot could be gained from working collaboratively to develop and test empirical tools that help to draw judgements about “will it work here?”