9  Empirical impact evaluation

Key points

•  Empirical impact evaluation seeks to find out whether a policy caused a particular outcome to occur. It requires both a measure of the outcome and a means of estimating what would have happened without the policy, usually using a comparison group.

•  Empirical impact evaluation is not feasible for every policy, especially if there is no comparison group. It may also be constrained if data on the things that need to be measured are unavailable or too noisy.

•  Impact evaluations cannot be guaranteed to produce the correct answer. There is always some risk of concluding that a programme works when it does not, or that it is ineffective when it has a real impact. To some extent the risks can be mitigated by careful design of the research, and sufficient investment in data collection, but they also depend on, among other factors, the actual size of the impact.

•  The comparison group may have different outcomes from the policy group because of the way it was selected, rather than because of the policy itself, making comparison "unfair". This problem is known as selection bias.

•  Research designs seek to control the composition of the comparison group so that selection bias can either be avoided or taken into account. Randomness plays a central role here, but this does not always mean a randomised control trial. Sometimes "natural" randomness already present in the system being studied can be used instead.

•  The analysis of evaluation data requires an "identification strategy" to isolate the policy effect from competing influences. This involves modelling the sources of selection bias either directly (for example, by regression) or indirectly (for example, by estimating their effects with respect to trends over time).

•  Reporting of an evaluation should distinguish between descriptive statistics on the outcomes and true impact evaluation, which takes potential non-policy causes for observed changes into account. The former cannot answer the question of whether the policy caused the observed changes to occur, but the latter can.
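
The points above on selection bias and randomisation can be illustrated with a small simulation. The sketch below is not drawn from any real evaluation: the population, the unobserved "trait", the true impact of 1.0 and the function name simulate are all assumptions made for illustration. It compares two ways of forming a comparison group for the same policy: letting people with a higher trait opt in, and allocating people at random.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed policy impact, chosen purely for illustration


def simulate(n=20000, seed=0):
    """Return (self_selected_estimate, randomised_estimate) of the impact."""
    rng = random.Random(seed)
    # Hypothetical unobserved trait (for example, motivation) that raises
    # outcomes whether or not the person receives the policy.
    trait = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome each person would have WITHOUT the policy.
    baseline = [10 + 2 * t + rng.gauss(0, 1) for t in trait]

    def difference_in_means(in_policy_group):
        # Policy group outcomes include the true effect; comparison group
        # outcomes are the baseline only.
        policy = [b + TRUE_EFFECT
                  for b, flag in zip(baseline, in_policy_group) if flag]
        comparison = [b for b, flag in zip(baseline, in_policy_group)
                      if not flag]
        return statistics.mean(policy) - statistics.mean(comparison)

    # Self-selection: only people with an above-average trait take part.
    self_selected = [t > 0 for t in trait]
    # Random allocation: a coin toss decides who takes part.
    randomised = [rng.random() < 0.5 for _ in range(n)]

    return difference_in_means(self_selected), difference_in_means(randomised)


if __name__ == "__main__":
    naive, fair = simulate()
    print(f"true impact:              {TRUE_EFFECT:.2f}")
    print(f"self-selected comparison: {naive:.2f}")
    print(f"randomised comparison:    {fair:.2f}")
```

Under these assumptions, the self-selected comparison overstates the impact because the two groups already differ in the trait, whereas random allocation balances the trait across groups and the difference in means should fall close to the true impact. Both figures are simple differences in means. What makes the second one a credible impact estimate is how the comparison group was formed, not the calculation itself.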

More Information