Adjusting for factors affecting allocation

9.44 So, if "adjusting for" some set of factors is appropriate, how in practice is this adjustment performed? Essentially there are two strategies:

controlling for them - the relevant factors are entered as explanatory variables in the regression model. If the policy effect remains significant in this expanded model, it is interpreted as a causal effect of the policy; or

matching - the factors are used in a technique such as propensity score matching (PSM) to select subsets of the treated and untreated individuals that may be regarded as equivalent (in the sense defined above). A simple comparison between the matched groups might then be made, as it would be for an RCT. Box 9.I provides an example of an evaluation using propensity score matching.

9.45 When deciding which strategy to use, the first point to note is that in terms of addressing selection bias, they are equivalent. The choice therefore rests on other features of the data rather than on the assumptions being made about what drives exposure to the policy. A brief description is provided in Box 9.H.

Box 9.H: Control using regression

Control using regression is simple to implement, provides an estimate based on all the data, and allows the effects of relevant factors to be estimated individually. But regression models have to assume that the underlying relationship between variables has a particular shape, or "functional form" (in simple cases, just a straight line).

Departures from these assumptions turn out to be particularly problematic when the same factor strongly affects both exposure and outcome, as, unfortunately, tends to be the case in quasi-experimental studies. A further issue is that the regression model will be based in part on individuals whose likelihood of participating is extremely low, and whose outcomes may bear little relationship to those of individuals who do actually participate. Matching designs have the advantage that they do not require any functional form assumption, but they have their own difficulties.

For instance, depending on the success of matching they may involve discarding a significant portion of the data - especially if the targeting of the policy is such that the untreated contain few good matches for the treated. Matching can also be more complicated to implement.

The issues are technical and for a more detailed discussion of these points the reader is referred to Bryson et al.17
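The contrast can be made concrete with a small sketch. The example below uses entirely hypothetical data in which a single observed factor x drives both selection into the policy and the outcome: the naive treated-versus-untreated comparison is badly biased, while entering x in the regression recovers the true effect - though only because the assumed linear functional form happens to be correct, which is exactly the caveat discussed in Box 9.H.

```python
# Illustrative sketch of "controlling for" a confounder with regression.
# Hypothetical data: x (say, prior qualifications) drives both selection
# into the programme and the outcome, so the naive difference in means
# is biased; regressing the outcome on treatment AND x recovers the
# true effect.

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = s / M[i][i]
    return beta

def ols(rows, y):
    """Least squares via the normal equations X'X b = X'y."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Units with covariate x = 0..9; those with x >= 5 select into treatment.
x = list(range(10))
t = [1 if xi >= 5 else 0 for xi in x]
TRUE_EFFECT = 2.0
y = [TRUE_EFFECT * ti + 1.5 * xi for ti, xi in zip(t, x)]  # noise-free, for clarity

naive = (sum(yi for yi, ti in zip(y, t) if ti) / sum(t)
         - sum(yi for yi, ti in zip(y, t) if not ti) / (len(t) - sum(t)))
_, effect, _ = ols([[1.0, ti, xi] for ti, xi in zip(t, x)], y)

print(f"naive difference: {naive:.1f}")   # 9.5 - badly biased
print(f"adjusted effect:  {effect:.1f}")  # 2.0 - the true effect
```

Note that in this hypothetical dataset no untreated individual shares an x value with any treated one, so matching would find no good matches at all; the regression succeeds only by extrapolating its assumed straight-line relationship - precisely the trade-off described in the box above.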

Box 9.I: An example of an evaluation using propensity score matching

New Deal for Lone Parents Evaluation (Department for Work and Pensions)

New Deal for Lone Parents (NDLP) is targeted at lone parents on Income Support (IS). It tries to place job-ready lone parents into paid work and to prepare lone parents not currently in the market for work for entry to the labour market. NDLP was subject to a rigorous evaluation, one component of which was to estimate the counterfactual (i.e. what would have happened in the absence of the programme) and hence the additional benefit of the programme. However, there were a number of challenges in meeting this aim:

a matched area comparison was not possible because the programme was implemented in all areas of the UK;

all members of the target group were invited to join NDLP, so there was no opportunity to select a control group from individuals who had not been invited; and

due to the relatively low take-up of NDLP, the maximum possible effect on aggregate numbers on Income Support was small, so that a time series approach to the impact assessment was not feasible.

Propensity Score Matching was chosen because it allowed a comparison sample to be drawn from lone parents who had chosen not to participate in the programme. Participants and the comparison sample were matched on their "propensity score" - the probability of participating conditional on all the factors that affect both participation and outcomes.18 A key issue in implementing this approach was that the motivation of individuals was well known to be linked to both participation and outcomes, and that failure to control for this would almost certainly bias the results. This was addressed by explicitly collecting baseline data on motivation and attitudes through a carefully designed survey.
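The matching step just described can be sketched as follows. All scores and outcomes below are invented for illustration and bear no relation to the actual NDLP data; in practice the propensity scores would come from a model (such as a logistic regression) of participation on the matching variables.

```python
# Illustrative sketch of the matching step: each participant is paired
# with the non-participant whose (already estimated) propensity score is
# closest, and outcomes are then compared across the matched groups.
# All scores and outcomes here are hypothetical.

participants = [      # (propensity score, entered work?)
    (0.81, 1), (0.74, 1), (0.66, 0), (0.58, 1),
]
non_participants = [
    (0.80, 1), (0.70, 0), (0.65, 0), (0.55, 0), (0.30, 0), (0.10, 0),
]

def nearest_neighbour_match(treated, pool):
    """One-to-one matching with replacement on the propensity score."""
    pairs = []
    for score, outcome in treated:
        match = min(pool, key=lambda c: abs(c[0] - score))
        pairs.append(((score, outcome), match))
    return pairs

pairs = nearest_neighbour_match(participants, non_participants)
rate_t = sum(p[0][1] for p in pairs) / len(pairs)
rate_c = sum(p[1][1] for p in pairs) / len(pairs)
print(f"matched participants in work:     {rate_t:.0%}")
print(f"matched non-participants in work: {rate_c:.0%}")
print(f"estimated impact:                 {rate_t - rate_c:+.0%}")
```

Note how the poorest matches in the pool (scores 0.30 and 0.10) are simply never used - the discarding of data mentioned in Box 9.H.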

A stratified sample of approximately 70,000 lone parents was selected from Income Support records using data from August and October 2000. The sample was restricted to those who, at the time of selection, had not participated in the programme. Administrative systems were used to identify those who subsequently participated, and these formed the sample of "participants"; the rest were categorised as non-participants. The sample of participants was then matched to a comparison sample of "non-participants", using a combination of administrative and survey data, including that on attitudes.

NDLP appears to have had a large positive impact on entries into work. After six months, 43 per cent of participants had entered full-time or part-time work, compared to 19 per cent of matched non-participants. This suggests that 24 per cent of lone parent participants found work who would not otherwise have done so.

Similar effects were observed when looking at the exit rate from Income Support; NDLP appears to dramatically increase the rate at which lone parents leave benefit.

There is no evidence to suggest that NDLP jobs are not sustainable: on the whole, participants left jobs less quickly than non-participants (12 per cent of participants left work of 16 hours or more per week within six months, compared with 14 per cent of matched non-participants). The full report is available online,19 as is a subsequent, more detailed technical assessment of the results.

9.46 It is important to realise that both the matching and controlling approaches depend on the assumption that all sources of selection bias have been captured in the data available to the researcher. If there is "selection on unobservables" - that is, if other, unknown, factors affect the probability of treatment - then, regardless of how elaborate the modelling procedure, it is simply not possible to tell how much, if any, of the estimated policy effect is real and how much is due to unmodelled selection bias. A common example of selection on unobservables is the motivation of participants in voluntary schemes, discussed earlier. A second example is personal knowledge of the candidate (for example, by a teacher, social worker, probation officer, etc.) which might affect that professional's decision to put the candidate forward for intervention. Where this is the case, an alternative approach that does not depend on identifying all the individual sources of selection bias may be stronger.
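A small sketch illustrates why selection on unobservables is so damaging. Here a hypothetical unobserved motivation factor m drives both participation and outcomes; adjusting for the observed factor x - whether by regression or by matching within its values - still yields a badly biased estimate, and nothing in the observed data reveals this.

```python
# Illustrative sketch of "selection on unobservables": motivation m drives
# both participation and the outcome, but only x is observed. Comparing
# within x-strata (the logic behind both regression control and matching
# on x) still over-states the effect. All numbers are hypothetical.

TRUE_EFFECT = 2.0
units = [(x, m) for x in (0, 1) for m in (0, 1)]   # m is unobserved
t = [m for _, m in units]                          # only the motivated join
y = [TRUE_EFFECT * ti + 3 * x + 4 * m for (x, m), ti in zip(units, t)]

# Adjust for the observed x by comparing treated and untreated within
# each x-stratum, then average the stratum differences:
diffs = []
for stratum in (0, 1):
    yt = [yi for (x, _), ti, yi in zip(units, t, y) if x == stratum and ti]
    yc = [yi for (x, _), ti, yi in zip(units, t, y) if x == stratum and not ti]
    diffs.append(sum(yt) / len(yt) - sum(yc) / len(yc))
adjusted = sum(diffs) / len(diffs)

print(f"true effect:       {TRUE_EFFECT}")
print(f"adjusted estimate: {adjusted}")  # 6.0: still carries the bias from m
```

The adjusted estimate absorbs the full effect of the unobserved motivation factor, and no comparison of the observed data would flag the problem - hence the need for baseline motivation data in the NDLP evaluation, or for the alternative designs discussed below.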



_____________________________________________________________________

17 The use of propensity score matching in the evaluation of active labour market policies, Bryson, Dorsett and Purdon, Department for Work and Pensions Working Paper No. 4 (2002). http://www.dwp.gov.uk/

18 Many studies tend to match on whatever observable characteristics are available, whether or not these are the actual factors affecting participation and outcomes. In many situations those factors are either unobservable or simply not known, and so matching on available proxies rests on additional assumptions.

19 Evaluation of the New Deal for Lone Parents: technical report for the quantitative survey, Phillips, Pickering, Lessof, Purdon and Hales, DWP Working Age Report 146 (2003). http://www.dwp.gov.uk/