What modifications might we make and why?

3.7  Controlling policy allocation - which individuals or areas receive which interventions, and when - can play a key role in successful impact evaluation by affecting whether there is a meaningful comparison group. Public policy interventions tend naturally to be allocated in ways which conflict with good impact evaluation, but there are some minor adjustments which can be made to policy allocation which can dramatically improve the feasibility of obtaining meaningful estimates of impact. A simple explanation of some of these adjustments is provided in Box 3.A.

3.8  At first glance, accommodating evaluation in these ways might appear to require compromising on policy effectiveness. There might be concerns that planning research designs will delay the launch of a policy. Not targeting the subjects in most "need" is sometimes claimed to limit the benefits recipients might gain. Holding back a comparison group of unaffected individuals is similarly claimed to limit the numbers able to benefit. But there are strong counter-arguments to each of these points which should be recognised.

Box 3.A: What policy adjustments can improve evaluation chances? Some examples

Pilots

For interventions that are innovative, experimental or otherwise associated with a high degree of uncertainty, piloting is a recommended and often used way to introduce the policy. (A detailed review of pilots has been published by the Cabinet Office).1 This allows the policy to be tried out and information collected before full-scale resources are committed. In terms of generating a comparison group, piloting works because not every potential subject is exposed to the policy immediately. However, there is still likely to be a temptation on the part of those owning or delivering the pilot to allocate the intervention to those deemed most in need or otherwise deserving of it, leading to the same 'apples and pears' problem as was described in paragraph 3.5. Piloting should therefore be combined with one of the other allocation mechanisms described below.

Randomisation and randomised controlled trials

How should the policy be allocated to pilot areas, or to individuals or institutions within those areas? The method offering the strongest measure of policy impact is randomisation, often in a form known as a randomised controlled trial (RCT). In an RCT, the allocation of individuals, groups or local areas to receive the intervention is determined by lottery or some other purely random mechanism. Carefully conducted, an RCT provides the clearest evidence of whether an intervention has had an effect. RCTs should therefore be near the top of the list of potential allocation mechanisms, especially for policies that are experimental in nature. However, it is often claimed that RCTs are not appropriate or possible for a variety of operational, logical or ethical reasons. Indeed, there is a range of factors which can make randomisation difficult to implement. For instance, it is not likely to be suitable for assessing the impact of changes in universal policies. (For example, it would not be feasible to change the law on the legal blood alcohol limit for a random selection of drivers.)
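As an illustration only, the lottery described above can be sketched in a few lines of Python. The area names, the group size and the seed are hypothetical and are not drawn from any real pilot. The seed is fixed so that the allocation can be reproduced and audited later.

```python
import random


def allocate_by_lottery(units, n_treated, seed):
    """Randomly split units into an intervention group and a comparison group."""
    rng = random.Random(seed)  # fixed seed: the draw can be re-run and checked
    shuffled = list(units)
    rng.shuffle(shuffled)
    intervention = sorted(shuffled[:n_treated])
    comparison = sorted(shuffled[n_treated:])
    return intervention, comparison


# Hypothetical pilot areas (illustrative names only)
areas = [f"area_{i:02d}" for i in range(1, 11)]
intervention, comparison = allocate_by_lottery(areas, n_treated=5, seed=2024)
print("Intervention:", intervention)
print("Comparison:  ", comparison)
```

Because chance alone decides which group a unit falls into, the two groups should be similar on average in both observed and unobserved characteristics, which is what makes the later comparison meaningful.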

Phased introduction and intermittent operation

A variant of randomised allocation is phased introduction, whereby all participants in the pilot receive the intervention, but sequentially over some period of time. The periods of time when some participants have received the intervention and others have not can then serve to generate a comparison group (though you still need to control in some way for other factors ongoing during the time delay). It is still preferable to use randomisation to determine the order in which participants receive the intervention, to avoid a situation where "the most deserving" or "most prepared" receive it first - this might be considered more acceptable within a pilot in which all participants are planned to receive the intervention eventually. Obviously, phased introduction need not be limited to pilots and can also be used for the roll-out of general (e.g. national) policies.

A further variant of the phased introduction approach might be termed intermittent operation, where interventions that are short term in nature are applied in bursts. This approach is only likely to be suitable for particular types of intervention which are appropriately flexible (advertising campaigns might be one example).
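A minimal sketch of a phased introduction with a randomised order is given below. It assumes a hypothetical set of nine sites rolled out over three phases; the names, the number of phases and the seed are illustrative only. At any phase, the sites still waiting can act as the comparison group for those already receiving the intervention.

```python
import random


def phased_schedule(units, n_phases, seed):
    """Randomly order units, then deal them into phases of near-equal size."""
    rng = random.Random(seed)
    order = list(units)
    rng.shuffle(order)
    # Round-robin dealing: phase sizes differ by at most one unit
    return {
        phase: sorted(order[phase - 1::n_phases])
        for phase in range(1, n_phases + 1)
    }


def groups_at(schedule, current_phase):
    """Return (units already treated, units still waiting) at a given phase."""
    treated = [u for p, us in schedule.items() if p <= current_phase for u in us]
    waiting = [u for p, us in schedule.items() if p > current_phase for u in us]
    return treated, waiting


# Hypothetical sites (illustrative names only)
sites = [f"site_{i}" for i in range(1, 10)]
schedule = phased_schedule(sites, n_phases=3, seed=7)
treated, waiting = groups_at(schedule, current_phase=1)
print("Treated in phase 1:", treated)
print("Still waiting:     ", waiting)
```

The comparison group shrinks with each phase and disappears once the final phase starts, so the outcomes of interest need to be measurable within the waiting period.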

Objective allocation rules

Where policies are targeted towards individuals, institutions or areas that have the greatest need (for example, prolific offenders, "failing" schools or deprived neighbourhoods), evaluation can be made much stronger (and the policy more transparent) by employing objective allocation rules (e.g. scoring systems or funding formulae) to determine who receives the policy. These policies can be evaluated effectively if these rules are well documented and applied. One approach is to assign a score to each offender, school, and so on, based on their level of need, so that those above a certain score then receive the policy, and those below do not. Comparison might then be made between subjects who received similar scores but who were just above and just below the threshold, or perhaps between those just in scope of a policy and those just out of scope.2 Waiting lists are an administrative approach to allocation which can combine the features of phased introduction and objective allocation rules (e.g. a scoring system to assess needs and hence treatment priority).
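The threshold comparison described above can be sketched as follows. This is a deliberately naive illustration: the scores, outcomes, threshold and bandwidth are all invented, and a real analysis would normally also model how the outcome varies with the score on each side of the threshold rather than simply comparing means.

```python
from statistics import mean


def threshold_comparison(records, threshold, bandwidth):
    """Difference in mean outcome between subjects just above and just below
    the allocation threshold. Each record is (need_score, outcome); subjects
    scoring at or above the threshold are assumed to receive the policy."""
    just_above = [y for s, y in records if threshold <= s < threshold + bandwidth]
    just_below = [y for s, y in records if threshold - bandwidth <= s < threshold]
    if not just_above or not just_below:
        raise ValueError("no subjects on one side of the threshold within the bandwidth")
    return mean(just_above) - mean(just_below)


# Invented (need_score, outcome) pairs, for illustration only
records = [
    (42, 9.0), (47, 11.0), (48, 12.0), (49, 10.0),    # below threshold: no policy
    (50, 14.0), (51, 15.0), (52, 13.0), (58, 20.0),   # at or above: policy received
]
estimate = threshold_comparison(records, threshold=50, bandwidth=3)
print("Estimated difference near the threshold:", estimate)
```

Subjects far from the threshold (scores 42 and 58 here) are excluded because they are likely to differ in need as well as in policy exposure. A narrower bandwidth makes the two groups more alike but leaves fewer subjects to compare.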

Measures of relative effectiveness

If a policy must be introduced everywhere simultaneously then it will not always be possible to obtain an estimate of the full policy impact. However, some modifications might allow an estimate to be made of the impact on effectiveness of changes in the level or intensity of policy exposure - that is, of one extent of implementation relative to another. In these cases, the level of exposure which a subject receives needs to be decided in a way similar to the approaches discussed here (e.g. randomly, or through a scoring system), to ensure that exposure is not tailored by the policy maker to match the needs of the intervention target or participant.

3.9  As regards the timing of policy launches, avoiding delays can simply be a question of sound project management - including preparing for the evaluation in parallel with the other activities necessary to set up the policy. Moreover, many of the allocation mechanisms described in Box 3.A could be said to represent rather minor modifications of practice which do not imply significant policy delays. Good impact evaluation can be compatible with quick policy timescales, so long as it is considered early enough in the development process.

3.10  It is sometimes claimed that adjusting implementation will reduce effectiveness, or that random allocation raises ethical concerns because the policy would not be delivered to those most in need. However, at least for policies where there is a reasonable degree of uncertainty about outcomes or value for money, one of the principal reasons for undertaking an impact evaluation is to determine whether an intervention is effective or offers value for money at all. In these situations, it does not follow that temporarily restricting implementation or using random allocation will necessarily reduce policy effectiveness. It could just as easily be the case that overall effectiveness might actually increase, by avoiding resources being wasted subsequently on policies which do not work or do not offer good value for money.

3.11  Even when a policy is implemented initially in a restricted way (for instance, in the form of a pilot or phased introduction), it might still be targeted at those subjects deemed most in need, rather than through a less discretionary, more random process. This might be in an attempt to "appease" any persistent concerns about limiting effectiveness. However, if so, it should be recognised that there will be negative consequences for the eventual evaluation. Not only will it be made more difficult to achieve reliable results (for the "apples and pears" reason described in paragraph 3.5), but any results which are obtained will relate to the recipients of the restricted policy only, and will not be readily applicable to those areas or individuals which would come under a more widely rolled-out policy. This will make extrapolation more difficult.

3.12  It is clear that impact evaluation has certain special requirements. Often these can be met by taking some relatively simple steps during policy development. The risks discussed in paragraph 3.8 should be recognised, therefore, but not exaggerated or used as a routine excuse to avoid undertaking robust evaluation. Nevertheless, there might be occasions where there is pressure to implement a policy as quickly as possible, in a quite specific way, with little thought given to the implications for any subsequent evaluation. If this is the case, it is better for decisions to be made only once the implementation options have been identified and their implications for evaluation and evidence considered. In some cases, pressure to implement might simply reflect a lack of recognition of the negative consequences for the evaluation, or the ease with which evaluation needs can be accommodated.




_______________________________________________________________________

1  Trying it out - the role of "pilots" in policy-making, Cabinet Office, 2003

2  For example, in the Department for Work and Pensions' evaluation of the New Deal for Young People, those included in the policy scope (people aged 18-24) were compared with those out of scope (people aged 25-49) using a difference-in-differences approach. See Findings from the Macro evaluation of the New Deal for young people, Department for Work and Pensions, 2002 http://www.dwp.gov.uk/