10.3 An important task in all evaluations is to bring together the evidence collected from different parts of the evaluation to present a complete account. What are the answers to the original research questions? Do the results support each other, or are there apparent contradictions? In a small-scale evaluation this may be a fairly straightforward task, but in others, with many separate studies (such as process and impact evaluations) carried out over a number of years, it can be substantial. It is important to ensure that sufficient time and resources are allocated to this part of the evaluation.
10.4 When drawing together quantitative and qualitative evaluation evidence, it is important to consider whether the answers to different questions are consistent. A process evaluation might, for example, find that a policy was only weakly implemented, yet the impact study might show that it still had a significant effect. The different parts of the evaluation will need to be used to examine the original logic model (see Chapter 5 for further detail on the use of logic models).
10.5 Ideally, all the steps in the model are found to work as anticipated: a programme is implemented as intended; participants change their behaviour as predicted; and the desired outcomes are observed. Where this occurs, the overall consistency of the various evaluation findings increases our confidence in them. There may, however, be occasions where some steps cannot be fully validated: for example, all the processes are seen to have worked as expected, but there is only weak evidence of overall impact. In such a case, confirming the earlier steps in the logic model lends greater confidence that the observed impacts are genuine.
10.6 In some cases, however, this does not happen and the logic model breaks down. This can occur at a relatively early stage in the model. For example, suppose that the evaluation of a training programme for unemployed people finds no significant impact, and that a large proportion of participants drop out before completing the training. Other parts of the evaluation can then be used to explore why this happened: for example, qualitative studies of participants examining why they did or did not complete the course, or more detailed analysis of quantitative data to identify which factors are statistically associated with completing the training.
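As an illustrative sketch only (not drawn from any actual evaluation), the Python fragment below shows one common way of identifying factors associated with completion: a logistic regression of a completion indicator on participant characteristics. All variable names and the simulated data are hypothetical.

```python
# Illustrative sketch: exploring which factors are associated with course
# completion using logistic regression. All variables and data are simulated
# and hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500

# Simulated participant-level data standing in for real evaluation microdata.
data = pd.DataFrame({
    "age": rng.integers(18, 60, n),
    "prior_qualifications": rng.integers(0, 2, n),  # 1 = has prior qualifications
    "travel_time": rng.normal(30, 10, n),           # minutes to training venue
})

# Completion is made to depend loosely on the covariates, purely for illustration.
index = -1.0 + 0.8 * data["prior_qualifications"] - 0.03 * data["travel_time"]
data["completed"] = rng.binomial(1, 1 / (1 + np.exp(-index)))

# Logistic regression: the coefficients indicate which factors are associated
# with the odds of completing the course.
result = smf.logit("completed ~ age + prior_qualifications + travel_time",
                   data=data).fit(disp=False)
print(result.summary())
```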
10.7 Sometimes the break in the logic model comes at a later stage: a policy is fully implemented as intended but does not have the desired impact. For example, a programme is designed to help move unemployed people into work by encouraging them to search more actively for jobs, based on previous evidence suggesting that this will result in faster movement into work. The evaluation shows that people participate in the programme and intensify their job search, but that there is no impact on employment. Again, other parts of the evaluation may suggest explanations for this: for example, there may be evidence that the current state of the labour market reduces the programme's effectiveness, or that the programme only works for certain subgroups of individuals.
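Where subgroup effects are suspected, one simple way to test them is to interact the treatment indicator with the subgroup indicator. The sketch below is purely illustrative, using hypothetical variable names and simulated data rather than results from any real programme.

```python
# Illustrative sketch: testing whether a programme effect differs between
# subgroups by including a treatment-by-subgroup interaction term.
# All variables and data are simulated and hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1000

data = pd.DataFrame({
    "treated": rng.integers(0, 2, n),               # 1 = participated in programme
    "long_term_unemployed": rng.integers(0, 2, n),  # 1 = unemployed over 12 months
})

# Simulated outcome: the programme helps only those not long-term unemployed.
p = 0.3 + 0.1 * data["treated"] * (1 - data["long_term_unemployed"])
data["employed"] = rng.binomial(1, p)

# Linear probability model with an interaction term: a significant interaction
# coefficient suggests the effect varies by subgroup. Any such finding is a new
# hypothesis to be tested further, not a robust result in its own right.
result = smf.ols("employed ~ treated * long_term_unemployed", data=data).fit()
print(result.summary())
```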
10.8 It is extremely important to note that these conclusions are not robust findings in their own right, but new hypotheses which will need further testing. (A good treatment of the iterative process of refining hypotheses in this way is given in Pawson and Tilley's book on realistic evaluation.)1
10.9 It is useful to capture and document these emerging hypotheses as changes to, or refinements of, the original logic model, being careful to distinguish between those parts which are clearly supported by evidence and those which require further testing.
10.10 It is highly advisable to set down in advance the intended strategy for reconciling different estimates of impact. For example, the Pathways to Work evaluation2 collected data on a cohort who joined the pilots early in their operation, in addition to a cohort who joined after the programme had been operating for six months, by which time it was expected that initial teething troubles would have been addressed. It was always made explicit that the results from the later, "preferred", cohort would be used. It is important to set this out early on because otherwise it can be difficult to avoid accusations of choosing evidence to support a prior viewpoint.
10.11 There are no hard and fast rules for this process of drawing data together, and many analysts will already have experience of synthesising data. For those wishing to learn more there are textbooks on the topic, for example Cooper and Hedges (1994).3 It is worth noting that there are separate considerations for quantitative and qualitative data.
________________________________________________________________________
1 Pawson and Tilley (1997), Realistic Evaluation - see Chapter 5 in particular
2 Pathways to Work for new and repeat incapacity benefits claimants: Evaluation synthesis report, Research Report No 525, National Institute of Economic and Social Research on behalf of the Department for Work and Pensions, 2008. http://www.dwp.gov.uk/
3 Cooper and Hedges (Eds) (1994), The Handbook of Research Synthesis, New York: Russell Sage Foundation