9.1 This chapter focuses on impact evaluations which provide a quantitative measure of the extent to which any observed changes in an outcome of interest were caused by the policy. This kind of evaluation attempts to estimate the counterfactual - that is, what would have happened to the outcome of interest had the policy not taken place - by controlling for other factors which might have caused the observed outcome to occur. The outcomes can be selected to answer a range of questions, from whether the policy achieved its ultimate objectives, to whether other, intermediate outcomes were affected, which might indicate how and why such changes occurred. (The latter questions are also discussed in the context of process evaluation in Chapter 8).
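The idea of the counterfactual can be illustrated with a minimal sketch. The code below (not part of the Magenta Book's guidance; all names and data are invented for illustration) estimates a policy's impact as the difference between the average outcome of a group exposed to the policy and that of a comparison group standing in for the counterfactual. As the chapter goes on to explain, such an estimate is only credible as a causal measure if other factors affecting the outcome have been controlled for.

```python
# Illustrative sketch only: the simplest possible impact estimate, the
# difference in mean outcomes between a treated group and a comparison
# group that proxies the counterfactual. All data are hypothetical.

def estimated_impact(treated_outcomes, comparison_outcomes):
    """Naive impact estimate: mean(treated) - mean(comparison).

    This is only a valid causal estimate if the comparison group is a
    credible counterfactual, i.e. other factors affecting the outcome
    are balanced across the two groups.
    """
    mean_treated = sum(treated_outcomes) / len(treated_outcomes)
    mean_comparison = sum(comparison_outcomes) / len(comparison_outcomes)
    return mean_treated - mean_comparison

# Hypothetical outcome data (e.g. test scores) for areas with and
# without the policy:
treated = [68, 72, 70, 75]
comparison = [65, 67, 66, 70]

print(estimated_impact(treated, comparison))  # prints 4.25
```

In practice the comparison group rarely provides the counterfactual by accident; the research designs discussed later in this chapter are largely concerned with constructing one that does.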
9.2 The scope of this chapter is confined to empirical methods which isolate the effect of the policy from other factors affecting the outcome of interest through the statistical analysis of newly collected or existing data. It does not, therefore, consider those types of impact evaluation which attribute changes in an outcome to the policy (or aspect of it) through reference to theory or existing evidence (this is discussed in Chapter 6).1
9.3 The formulation and analysis of the research designs used in impact evaluation require a solid grounding in statistics, and often expertise in a range of specialised techniques. The supplementary guidance provides a more detailed explanation and technical treatment. This chapter is therefore more concerned with the concepts, rather than the mechanics, of impact evaluation. To present these concepts it makes reference in places to particular research designs and statistical techniques, and as such is slightly more technical than the rest of the Magenta Book. But this is not a "how-to" guide to those techniques; rather, it seeks to explain carefully the underlying issues that arise in impact evaluation and what the techniques can and cannot do to address them. It should be useful both to analysts seeking to advise their policy colleagues on setting up evaluations and to those responsible for managing externally commissioned research as critical customers.
9.4 This chapter begins by considering what is required to conduct an impact evaluation, why it is sometimes problematic, and under what circumstances it is feasible. The next section builds on Chapter 3 and looks at the fundamental principles behind designing policies for evaluation, and how they may be applied. The important issue of "noise" is then considered. A section on data analysis follows, built around the notion of an identification strategy. The different ways in which research designs attempt to address selection bias are discussed, and some of the things that can go wrong are considered, along with advice on detecting and correcting for them where possible. Finally, there is a section on "constrained designs", including guidance on reporting results when the evidence falls short of what would be regarded as acceptable for a full impact evaluation.
_________________________________________________________________________
1 The rest of this chapter uses "impact evaluation" to mean empirical impact evaluation as defined in 9.2.