One of the interesting features of randomised experiments is that the researchers or evaluators are usually involved in the process right from the start, when a new programme is first being designed or planned. After all, it’s not possible to randomly allocate participants to a treatment or control group in retrospect. And we find that involving evaluators from the start has some incidental benefits. Implementers have often found that evaluators add a useful perspective to their discussions about the logic of the intervention and the details of delivery. Importantly, evaluators will also be thinking about what data will be needed to measure success.
Despite the increasing use of randomised experiments, it is still common for specialists to be brought in to design an evaluation after a policy or programme is already underway. Evaluators in these situations may find that there is no baseline data available that can be used to assess progress in key indicators (never mind whether any of that progress can be attributed to the intervention). If so, they may have no choice but to rely on participants’ recall, asking them to report key indicators from some period before the programme began so that these can be compared with their situation now. For example, participants in a business-support programme might be asked to estimate how much their turnover has changed since a notional baseline period – and possibly then asked how much of that change is attributable to the support they received. Nobody would expect retrospective data like this to be completely accurate, but it’s usually assumed to be better than nothing.
Perhaps we shouldn’t be too quick to make that judgement. In a recent paper (open-access version here, brief summary here), Simone Lombardini, Cecilia Poggi and I found that people struggle to accurately recall even quite basic information about their living standards. When individuals were asked to recall information about their household assets and their sources of income from a baseline period six years earlier, their responses were much more closely associated with their current situation at the time of the survey than with what their situation had actually been at baseline. If we had relied on this recalled data to assess the impact of the programme, we would have mistakenly concluded that it had little or no impact. Our study was carried out in quite a different context from the one where most of IGL’s work is focused: among women in rural communities in Ethiopia. But the findings are consistent with some other studies, also carried out in developing countries. In the absence of evidence that retrospective baseline data is much more reliable in other contexts, these results should at least make us hesitate before relying on it for conclusions about a programme’s impact.
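To see why this matters for impact estimates, here is a minimal simulation sketch (in Python, with purely illustrative numbers that are not drawn from our study). The idea is simply that if recalled baselines are partly anchored on respondents’ current situation, the apparent before-after change shrinks towards zero, and the recalled values end up more closely correlated with the present than with the true baseline – exactly the pattern described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Illustrative values only (not taken from the study): each person has a
# true baseline outcome and a current outcome that includes the programme's effect.
true_baseline = rng.normal(loc=10.0, scale=2.0, size=n)
true_impact = 3.0                            # assumed true programme effect
current = true_baseline + true_impact + rng.normal(scale=2.0, size=n)

# Recalled "baseline" partly anchored on the respondent's current situation
# rather than on the true baseline, plus recall noise.
anchor = 0.8                                 # assumed weight placed on current status
recalled_baseline = (anchor * current + (1 - anchor) * true_baseline
                     + rng.normal(scale=1.0, size=n))

# With a genuine baseline, the average before-after change recovers the true impact.
print("change vs true baseline:    ", (current - true_baseline).mean())

# With a recalled baseline anchored on the present, the apparent change
# shrinks towards zero, understating the programme's impact.
print("change vs recalled baseline:", (current - recalled_baseline).mean())

# Recall is also more strongly correlated with current status than with the truth.
print("corr(recall, current):      ", np.corrcoef(recalled_baseline, current)[0, 1])
print("corr(recall, true baseline):", np.corrcoef(recalled_baseline, true_baseline)[0, 1])
```

With these assumed parameters, the estimated change based on recall is only a fraction of the true impact, which is the sense in which a retrospective baseline can be not just noisy but systematically misleading.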
This serves as another illustration that learning about the effectiveness and impacts of interventions is really difficult, and that sensible-sounding shortcuts might lead to results that are not just less accurate, but potentially highly misleading. At the end of the day, there is no substitute for a well-designed experiment and a robust approach to collecting data.