5c Assess possible outcome evaluation designs - Examples 

Below is a series of examples of how possible outcome evaluation designs can be assessed in evaluation plans:

Example 1. Performance Based Research Fund (PBRF)

Performance Based Research Fund (PBRF) Evaluation Plan (a nation-wide academic research output assessment system). The analysis of outcome evaluation designs is shown in Table 5 on pages 29-33 of the evaluation plan.

Example 2. Community Central 

Community Central Evaluation Plan (Community Central is an internet-based networking platform for the community sector). The analysis is set out below:

Experimental design

NOT FEASIBLE. It is not feasible to randomly assign groups of sector users to using or not using Community Central in the full roll-out of the system, as one could not stop the control group from using electronic networking in their work. The system will be piloted, but not with an experimental design, because of the expense of using an experimental design in a pilot.

Regression-discontinuity design

NOT FEASIBLE. This design creates an intervention group by selecting those in most need of an intervention, giving them the intervention and comparing their outcomes to those less in need. It is not appropriate or feasible in this case.

Time-series analysis design

NOT FEASIBLE. There are no good regular collections of data on networking amongst the relevant sectors which would provide a sufficiently long and detailed data series to allow identification of the impact on networking within those sectors of the introduction of Community Central at a particular point in time.

Constructed matched comparison group design

NOT FEASIBLE. There is no group which is sufficiently similar to those who will be using Community Central, which will not be using electronic networking, and which could be used as a comparison group. This is because most people in most sectors are increasingly making use of electronic networking. An international comparison with another country would also not be feasible because all similar countries are moving to use electronic networking in the relevant sectors.

Exhaustive causal identification and elimination design

NOT FEASIBLE. This design would rely on a robust measure of increased sector networking and then would try to identify all of the possibilities for why this may have occurred rather than Community Central having caused it. Then the role of these other factors would be systematically examined and, if eliminated, the conclusion would be drawn that Community Central had caused the change. There is no real external measure of networking in the sector apart from the usage results from Community Central from which stakeholders will be able to draw their own conclusions about the level of networking occurring on Community Central.

Expert judgement design

FEASIBLE, HOWEVER NOT AFFORDABLE WITHIN EVALUATION BUDGET. This design, which asks an expert whether in their judgement there is improved networking in the relevant sectors, is unlikely to add much information over and above the usage measures which the system will provide and from which stakeholders can draw their own conclusions about the level of networking occurring on Community Central.

Key informant design

APPROPRIATE, FEASIBLE AND AFFORDABLE. WILL BE DONE. This is the approach which will be used to answer this question. An electronic questionnaire will be circulated to groups of users within Community Central. These will include general users and administrative users, the latter being in a better position to comment on the use of the system overall.

Example 3. A national new building regulatory regime

An evaluation plan for a national new building regulatory regime. The new building regulatory regime was introduced as a consequence of the failure (due to leaking) of a number of buildings under the previous national regulatory regime.

Experimental design

NOT FEASIBLE. This design would set up a comparison between a group which receives the intervention and a group (ideally randomly selected from the same pool) which does not. For ethical, political, legal and design compromise reasons it is not possible to implement the interventions in one or more localities while other localities (serving as a control group) do not have the interventions. Apart from anything else, statutory regulation could not be imposed on only part of the country. In addition, there is a major design compromise problem: given the practical and political importance of having a high standard of new building work, it is likely that compensatory rivalry would reduce any difference in outcomes between the intervention and control groups. Compensatory rivalry is where the control locality also implements the interventions which are being evaluated because it also wants to achieve the outcomes, which are as important to it as to the locality receiving the intervention.

Regression-discontinuity design

NOT FEASIBLE. This design would graph those localities which could potentially receive the intervention on a measurable continuum (e.g. the quality of buildings in the locality). The intervention would then only be applied to those localities below a certain cut-off level. Any effect should show as an upwards shift of the graph at the cut-off point. In theory it would be possible to rank local authorities in order of the quality of their new building work and, if resources for the intervention were limited, it would be ethical to intervene only in those with the worst new building work and hence establish a regression-discontinuity design. However, the political, legal and design compromise problems (as described for the experimental design above) mean that a regression-discontinuity design is not feasible.
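To make the idea of an "upwards shift at the cut-off point" concrete, the following is a minimal sketch of how such a shift could be estimated. It is an illustration only and is not taken from the evaluation plan: the localities, quality scores, cut-off value, effect size and function names (fit_line, rd_jump) are all hypothetical, and the data are made up and noise-free so that the logic is easy to follow.

```python
# Illustrative regression-discontinuity sketch on made-up, noise-free data.
# All names and numbers are hypothetical, not from the evaluation plan.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rd_jump(scores, outcomes, cutoff):
    """Estimate the outcome shift at the cut-off for units below it.

    Fits one line to units below the cut-off (which received the
    intervention) and one to units at or above it, then compares the
    two fitted values at the cut-off itself.
    """
    below = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    a_lo, b_lo = fit_line(*zip(*below))
    a_hi, b_hi = fit_line(*zip(*above))
    return (a_lo + b_lo * cutoff) - (a_hi + b_hi * cutoff)

# Hypothetical localities scored 20..79 on a baseline quality measure.
CUTOFF = 50
ASSUMED_EFFECT = 5.0
scores = list(range(20, 80))
# Later outcome tracks baseline quality, plus a boost where the
# intervention was applied (localities below the cut-off).
outcomes = [0.8 * s + 10 + (ASSUMED_EFFECT if s < CUTOFF else 0.0)
            for s in scores]

estimated_jump = rd_jump(scores, outcomes, CUTOFF)
print(round(estimated_jump, 2))
```

With real data the two fitted lines would be noisy, and the estimate would only be credible if localities had no way of influencing which side of the cut-off they fell on.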

Time-series design

NOT FEASIBLE. This design tracks a measure of an outcome a large number of times (say 30) and then looks to see if there is a clear change at the point in time when the intervention was introduced. This design would be possible if multiple measures of new building quality were available over a lengthy (say 20 year) time series which could then continue to be tracked over the course of the intervention. However, this design has the design compromise problem that there is another major factor, which can be termed the 'crystallization of liability', occurring at the same time as the introduction of the new building regulatory regime. The crystallization of liability is a consequence of all the stakeholders now becoming aware of the liability they can be exposed to due to the failure of many buildings and the attendant liability claims which have arisen from them. It should be noted that this crystallization does not, of course, mean that any available time series data cannot be used as a way of tracking the not-necessarily attributable indicator of quality of new building work over time. It is just that any such time series analysis would be silent on the question of attribution of change to the new building regulatory regime.
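The attribution problem described above can be illustrated with a small sketch. It is not drawn from the evaluation plan: the 30 time points, the trend, the two effect sizes and the function names (fit_line, level_shift) are all hypothetical, and the data are made up and noise-free. The sketch projects the pre-intervention trend forward and measures how far the later observations sit above it.

```python
# Illustrative interrupted time-series sketch on made-up, noise-free data.
# All names and numbers are hypothetical, not from the evaluation plan.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def level_shift(times, values, start):
    """Average gap between post-intervention observations and the
    pre-intervention trend projected forward."""
    pre = [(t, v) for t, v in zip(times, values) if t < start]
    post = [(t, v) for t, v in zip(times, values) if t >= start]
    a, b = fit_line(*zip(*pre))
    gaps = [v - (a + b * t) for t, v in post]
    return sum(gaps) / len(gaps)

# 30 hypothetical measurements of new building quality.
times = list(range(30))
START = 20               # period in which the new regime begins
REGIME_EFFECT = 3.0      # assumed effect of the regulatory regime
LIABILITY_EFFECT = 1.0   # assumed effect of crystallization of liability
# Both factors take effect at the same point in time.
values = [50 + 0.5 * t
          + ((REGIME_EFFECT + LIABILITY_EFFECT) if t >= START else 0.0)
          for t in times]

estimated_shift = level_shift(times, values, START)
print(round(estimated_shift, 2))
```

The estimate recovers only the combined shift. Any split of that total between the two factors would produce exactly the same series, which is why the analysis is silent on attribution.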

Constructed matched comparison group design

NOT FEASIBLE. This design would attempt to locate a group which is matched to the intervention group on all important variables apart from not receiving the intervention. This would require the construction (identification) of a comparison group not subject to a change in its regulatory regime, ideally over the same time period as the intervention. Since the new building regulatory regime is a national intervention such a comparison group will not be able to be located within the country in question. It is theoretically possible that one or more comparison groups could be constructed from other countries or regions within other countries. However discussions so far with experts in the area have concluded that it is virtually impossible for a country or region to be identified which could be used in a way that meets the assumptions of this design. These assumptions are: that the initial regulatory regime in the other country was the same; that the conditions new buildings are exposed to in the other country are similar; that the authorities in the other country do not respond to new building quality issues by changing the regulatory regime themselves; and that there are sufficient valid and reliable ways of measuring new building quality in both countries before and after the intervention. It should be noted that while some of these assumptions may be met in regard to some overseas countries, all of them would need to be met for a particular country to provide an appropriate comparison group.

Causal identification and elimination design

LOW FEASIBILITY. This design works by first identifying that there has been a change in observed outcomes and then undertaking a detailed analysis of all of the possible causes of that change and eliminating all causes other than the intervention. In some cases it is possible to develop a detailed list of possible causes of observed outcomes and then to use a 'forensic' type process (just as a detective does) to identify what is most likely to have created the observed effect. This goes far beyond just accumulating evidence as to why it may be possible to explain the observed outcome by way of the intervention and requires that the alternative explanations be eliminated as having caused the outcome. This may not be possible in this case due to the concurrent crystallization of liability, discussed above, which occurred in the same timeframe as the intervention. It is likely that this cause is significantly intertwined with the intervention in being responsible for any change that occurs in new building practice and that it will be impossible to disaggregate the effect of the intervention from the effect of crystallization of liability. A feasibility study should be undertaken to confirm whether this design is in fact not feasible.

Expert judgement design

HIGH FEASIBILITY. This design consists of asking a subject expert(s) to analyze a situation in a way that makes sense to them and to assess whether on balance they accept the hypothesis that the intervention may have caused the outcome. One or more well regarded and appropriate independent expert(s) in building regulation (presumably from overseas in order to ensure independence) could be asked to visit the country and to assess whether they believe that any change in the new building outcomes is a result of the new building regulatory regime. This would be based on their professional judgement and they would take into account what data they believe they require in order to make their judgement. Their report would spell out the basis on which they made their judgement. This approach is highly feasible but provides a significantly lower level of certainty than all of the other outcome evaluation designs described above. If this design is used then the evaluation question being answered should always be clearly identified as: In the opinion of an independent expert(s), has the new building regulatory regime led to an improvement in building outcomes? There are obvious linkages between this design and the causal identification and elimination design above, and the feasibility study for that design should also look in detail at the possibilities for the expert judgement design.

Key informant judgement design

HIGH FEASIBILITY. This design consists of asking key informants (people who have access by virtue of their position to knowledge about what has occurred regarding the intervention) to analyze a situation in a way that makes sense to them and to assess whether on balance they accept the hypothesis that the intervention may have caused the outcome. A selection of stakeholder key informants could be interviewed face-to-face, and their opinions regarding what outcomes can be attributed to the new building regime could be summarized and analyzed in order to draw general conclusions about the effect of the intervention. This could be linked with the expert judgement and causal identification and elimination designs described above.

 Creative Commons Copyright Dr Paul Duignan 2007-2010