From Ignorance to Evidence? The Use of Programme Evaluation in 
Conservation: Evidence from a Delphi Survey of Conservation Experts.  
 
Hannah Fay Curzon
1 
and Andreas Kontoleon
2
*  
 
 
Abstract  
 
Persistent gaps in the evidence base regarding the performance of conservation policies has 
put pressure on the conservation policy field to adopt ‘best practice’ programme evaluation 
methods. These are methods that account for the counterfactual and are able to attribute 
causality between a conservation policy and specific observable environmental and social 
impacts. Despite this pressure, use of such methods continues to be rare. This paper uses 
the Delphi technique to provide the first systematic assessment of the reasons behind the 
apparent hesitation of conservation practitioners to adopt rigorous policy impact evaluation 
methods. The Delphi study consisted of two online questionnaires conducted on 
conservation policy experts. The results presented confirm that the use of rigorous impact 
evaluation methods in conservation is still very limited but this, crucially, is not because 
conservationists are ignorant of these methods or their advantages. In fact, considerable 
effort is being made to develop and improve evidence standards but these efforts have 
largely been thwarted by large financial and time related constraints that mean even 
elementary evaluations are hard to achieve. The results from this Delphi study allow us to 
provide more realistic recommendations on how impact evaluation studies can be more 
widely embraced and implemented in conservation practice.  
 
1,2 
Department of Land Economy, University of Cambridge, 19 Silver Street, Cambridge, CB3 
9EP  
*Corresponding Author: Prof Andreas Kontoleon (E-mail: ak219@cam.ac.uk; Tel: +44 1223 
339773) 
 
 
1. Introduction 
 
Conservation practitioners and policy-makers need credible information regarding the 
performance of conservation interventions in order to ensure that scarce funds are not 
wasted on ineffective policies (Sutherland et al., 2004; Stem et al., 2005; Botrill, 
Hockings & Possingham, 2011). There have been numerous calls for the conservation 
policy field to adopt 'best practice' or 'rigorous' programme evaluation methods (e.g. 
Ferraro & Pattanayak, 2006; Ferraro, 2009). These methods focus on the use of 
experimental and quasi-experimental evaluation designs that can be used to credibly 
measure ‘counterfactual’ outcomes. It is argued that establishing this counterfactual is 
critical to being able to unambiguously isolate the impacts of policy interventions so as 
to get an unbiased estimate of a programme’s performance (Berry et al., 2012). 
 
Despite these calls, there are still large gaps in the hard fact evidence base regarding the 
performance of conservation policies. Several reviews have documented the paucity of 
formal evaluations studies on conservation policies using experimental and quasi- 
experimental methods (e.g., Pattanayak, Wunder & Ferraro, 2010; Blackman, 2012; 
Miteva et al., 2012; Adhikari & Agrawal, 2013; Roe, Greig-gran & Mohammed, 2013; 
Zheng et al., 2013; Alcorn, 2014; Cowling, 2014; Samii et al., 2014). This body of work 
has found that though monitoring and evaluation data (which only documents trends 
and changes in variables) is abundant and routinely collected, formal evaluation studies 
(which identify the causal links between a policy and specific conservation outcomes) 
are highly scarce 
 
Although the inherent financial, temporal, logistical, and sometimes ethical, challenges 
of conducting rigorous evaluations have been discussed in the literature, it is still 
conjectured that one of the main reasons for the limited use of policy evaluation methods 
is not through a lack of opportunity and resources but, instead, due to a lack of 
awareness, understanding and appreciation of the need for counterfactual thinking within 
the conservation policy field (e.g., Ferraro 2006, 2009, 2012). Such assertions are, 
however, largely unsupported by any kind of formative assessment of the rationale 
behind conservation evaluation decisions in practice and thus risk being inaccurate and 
out-of-date. Arguably, in order to obtain a more comprehensive understanding of the 
underlying reasons for the documented gaps in the evidence base, it is necessary to draw 
on the knowledge and experience of the actual decision-makers and practitioners 
working in the conservation policy field. The present study aims to fill this research gap 
by being the first to systematically ascertain information from experts working in 
conservation as to their stance with respect to the usefulness, practicality, desirability 
and prospects of using formal policy evaluation methods. For this purpose, our study 
uses the Delphi technique, an iterative survey-based research method, which allows for a 
systematic assessment of the conservation sector’s actual knowledge, appreciation, and 
experience with such methods. As a result, our study will be able to more critically 
evaluate the commonly made assertions found in several past reviews that the 
conservation sector is averse to impact evaluation. Lastly, the study will provide policy 
relevant information on how to more rigorously determine the needs, opportunities and 
barriers to using 'best practice' methods to evaluate the impact of conservation 
interventions. These findings could significantly contribute to improving our 
understanding of the conservation sector's approach to evaluation and how far 
conservation organisations represented in this study are thinking counterfactually, thus 
providing a more accurate and informed assessment of the real reasons for the gaps in 
the evidence base. 
 
This paper proceeds as follows: Section 2 provides some of the common critical 
assertions found in review literature on the paucity of impact evaluation work in the 
conservation field. This is followed by the rationale for this study and the specific 
research questions we address. Section 3 outlines the research methodology as applied 
in the Delphi study. The results of the study are then presented in Section 4 and are 
discussed and summarised in Section 5. The survey instruments that were used appear 
as Supplementary Materials (Appendices SM1-SM4). More details specifically on the 
methods used can also be found in a SM1 (Technical Annex). 
 
 
 
2. Impact evaluation in conservation policy 
 
2.1. The impact evaluation revolution in science 
 
Programme evaluation is fundamentally a process of making inferences about an 
unobserved counterfactual outcome, i.e., what would have happened in the absence of 
the intervention, programme or policy. (Ferraro & Pattanayak, 2006). Without this 
'counterfactual analysis' it is impossible to know how far impacts are the result of the 
intervention and not due to other confounding factors or biases (White, 2006; Khandker, 
Koolwal & Samad, 2010). However, as the counterfactual cannot be observed, the main 
challenge of impact evaluation is to find or construct an appropriate counterfactual in 
the light of the missing data. 
 
Two common approaches to evaluation that have been used in the conservation policy 
field are before-after and with-without comparisons, i.e., comparisons of outcomes before 
and after an intervention and comparisons of outcomes in areas with and without 
exposure to the intervention. As before-after comparisons do not control for other time 
varying factors, and with-without comparisons do not control for selection bias, both 
methods lead to biased estimates of impacts (Khandker, Koolwal & Samad, 2010). 
More rigorous approaches that can be used to solve the problem of selection bias and 
establish a credible counterfactual broadly fall into two categories (Khandker, Koowal 
& Samad, 2010). The first relies on data obtained from randomised controlled 
evaluations or trials (i.e. RCTs) which randomly assign study subjects into treatment 
and control groups. The data is collected before and after the policy leading to the so- 
called Before-After-Control-Impact (or BACI) design which is widely regarded as the 
'gold-standard' in programme evaluation (Frondel & Schmidt, 2005; Duflo, Glennerster 
& Kremer, 2008; Greenstone & Gayer, 2009). By randomly allocating treatment and 
control groups across eligible sample units, units that do not receive the treatment will 
be a valid comparison group for those that did since there should be no systematic 
differences between their characteristics (Rossi & Freeman, 1993). 
 
When randomisation of the treatment is not possible, the second-best option is to rely on 
observational data of two samples of subjects, one that has been exposed to a policy (or 
treatment) and others that have not. Then practitioners use quasi-experimental statistical 
methods (such as propensity score matching and difference-in difference estimation) to 
create comparison groups that are valid under a set of underlying assumptions about the 
nature of potential selection bias in programme targeting and participation (Khandker, 
Koowal & Samad, 2010). While these econometric methods are well-developed and 
firmly grounded in theory and statistics, the identifying assumptions are not always 
directly testable, and the validity of any particular study depends instead on how 
convincing the assumptions appear (Duflo, Glennerster & Kremer, 2008). 
 
The call for the use of formal impact evaluation methods that address the issue of the 
counterfactual is part of a broader movement towards evidence-based policy making 
(Gertler et al., 2011) that was first experienced in medicine in the second half of the 
twentieth century (Pullin et al., 2004). The resulting paradigm shift from ‘experience- 
based’ to 'evidence-based’ practice that emphasized the use of clinical experiments and 
systematic reviews (Pullin & Knight, 2001; Stevens et al., 2001) completely 
revolutionised medical practice. This 'effectiveness revolution' became the archetypal 
method for evaluation and primary research and spread to other social policy fields such 
as public health, education and international development who started to build 
randomised evaluations into their programmes recognising the need for convincing and 
comprehensive evidence that could be used to inform policy making and improve the 
allocation on government resources ((Pullin & Knight, 2004; Pullin et al., 2004; Gertler 
et al., 2011). 
 
2.2. Impact evaluation in conservation policy 
 
In contrast, the field of conservation policy did not experience the same 'effectiveness 
revolution' and even by the beginning of the twenty-first century the evaluation of 
conservation programmes continued to be rare (Kleiman et al., 2000). One of the main 
conclusions stemming from a global review of the evidence base known as the 
'Millennium Ecosystem Assessment,' was that '[f]ew well-designed empirical analyses 
assess even the most common biodiversity conservation measures' (MEA, 2005, p.122). 
Indeed, it was widely acknowledged at the time that conservation was still largely an 
experience-based practice that depended on intuition and anecdote to guide the design 
of conservation investments as opposed to empirical evaluations (Kleiman et al., 2000; 
Pullin & Knight, 2001; Salafsky et al., 2002; Salafsky & Margoluis, 2003; Pullin et al., 
2004; Sutherland et al., 2004). While these studies advocated the need for evidence- 
based conservation, interest in impact evaluation per se did not emerge in the 
conservation policy field until the mid to late 2000s (Frondel & Schmidt, 2005; Ferraro 
& Pattanak, 2006; Ferraro, 2009; Greenstone & Gayer, 2009; Pattanayak, Wunder & 
Ferraro, 2010). As a result, the amount of literature on environmental impact evaluation 
is still limited. 
 
Ferraro and Pattanayak's 2006 paper was one of the first to call for rigorous empirical 
evaluation of conservation polices. The authors argued that while conservation projects 
had increasingly focused on 'monitoring and evaluation' since the 1990s, 'rigorous 
measurement of the counterfactual in the conservation literature was non-existent' 
(Ferraro & Pattanayak, 2006, p.483) which had not only left conservation policy lagging 
behind most other policy fields but had also created a large gap in the evidence base 
regarding the effectiveness of even the most common conservation interventions 
(Ferraro & Pattanayak, 2006). The authors argued that: 
If any progress is to be made in stemming the global decline of biodiversity, the 
field of conservation policy research must adopt state-of-the-art program 
evaluation methods to determine what works and when. 
(Ferraro & Pattanayak, 2006, p.482) 
Particular emphasis was placed on the need for more experimental and quasi- 
experimental evaluations in the conservation sector on the basis that nearly all 
environmental programmes have hidden confounders which means non-rigorous 
evaluation approaches will, in most cases, lead to biased estimates of programme 
effectiveness. While the authors recognised the methodological challenges to using 
these approaches, they argued that there were still 'substantial opportunities [in the 
conservation policy field] to elucidate causal relationships through experimental and 
quasi-experimental designs' (Ferraro, 2009, p.76). In the same year Greenstone and 
Gayer's (2009) paper also stressed the need for policy makers to place greater emphasis 
on credible empirical approaches. Again, randomised evaluations were recognised as the 
ideal way to achieve this but the paper also demonstrated the validity of quasi- 
experiments as an appealing alternative. 
 
Since then, subsequent review papers on the environmental and social effectiveness of 
conservation policies have all reached a similar conclusion, namely that evaluation 
studies that construct credible counterfactuals are scarce and that there is a reluctance 
and hesitancy to undertake such exercises within policy circles (e.g., Blackman & 
Rivera, 2010; Blackman, 2012; Miteva et al., 2012; Roe, Grieg-gran & Mohammed, 
2013; Samii et al., 2014; Fisher et al., 2014; Baylis et al., 2015; McKinnon et al., 2015). 
In addition to highlighting the need for more rigorous evaluations, the review studies 
mentioned above have tried to identify and characterise some of the difficulties and 
potential barriers to implementing experimental and quasi-experimental designs in an 
attempt to provide some explanation for the limited use of these methods. Some of the 
barriers mentioned include missing baselines, long time-lags between intervention and 
impacts, complex spill-over effects, ethical considerations, lack of funding for 
evaluations and lack of time to update evaluation best-practice guidelines. Whilst these 
barriers to formal evaluation are recognised as being particularly pervasive in the 
conservation policy field, the ‘real’ reasons behind the limited amount of credible 
studies is yet to be formally assessed and thus remains contested. For example, Ferraro 
and his colleague’s argument is that the limited use of rigorous evaluation in 
conservation is due to lack of awareness and understanding of state-of-the-art 
programme evaluation methods, and a lack of appreciation for the biases in standard 
evaluation techniques (Ferraro & Pattanayak, 2006; Ferraro, 2009). According to 
Ferraro: 
Environmental scientists and practitioners often assume that evaluation is simply 
an act of taking a careful look at the monitoring data. If the indicator improves, 
a program is deemed to be “working.” If the indicator worsens, one infers the 
program is “failing.” 
(Ferraro, 2009, p.77) 
Yet, such assertions are largely unsubstantiated by any kind of formative assessment of 
the rationale behind conservation evaluation decisions in practice and the apparent 
hesitancy in using formal impact evaluation methods. Understanding the reasons for the 
gaps in the evidence base requires examining the merits, need for, and challenges of 
impact evaluation from the perspective of the policy-makers and the conservation 
practitioners on the ground. Relying only on reviewing existing and accessible impact 
evaluation studies (either published in journals or in grey literature) cannot adequately 
shed light as to how the conservation sector views these methods nor (more 
importantly) why they have not been espoused to the same degree observed in other 
social policy areas (such as development aid and health care). Indeed, more recent 
studies suggest that current understanding of programme evaluation in conservation 
may now be out-of-date, again, supporting the need for a more pragmatic assessment of 
the evaluation process. For instance, a recent study by McKinnon and her colleagues 
(2015) argues are that ‘CNGOs [conservation non-government organisations] are 
increasingly engaged with impact evaluation’ (p. 3) and that ‘investment in producing 
and commissioning impact evaluations among CNGOs is therefore growing…’ (p. 3) 
but that ‘little attention has been given to the organisational arrangements and processes 
by which these evaluations occur’ (p. 2), again supporting the need for further 
exploration in this area. 
 
2.3. Aims and research questions 
 
The aim of this paper is to address this gap in the systematic assessment of the use of 
programme evaluation approaches in conservation by employing an expert-panel based 
research method which allows us to assess past criticisms of the conservation policy 
field. Obtaining a more comprehensive view of the situation necessitates seeking 
perspectives of policy-makers and practitioners working in the conservation sector who 
have first-hand experience and knowledge of the fields' evaluation aims and techniques. 
For this reason, this study will use a panel survey of experts working in the conservation 
policy field to address the study's research questions. Specifically, this study will 
investigate the following research questions: 
 
 
RQ1.  How important are experimental and quasi-experimental evaluation methods? 
RQ2.  What are regarded as 'best practice' evaluation methods? 
RQ3.  To what extent are conservation organisations using experimental and quasi- 
experimental evaluation methods? 
RQ4.  What are the most significant reasons for the limited use of experimental and quasi-
experimental methods? 
RQ5.  What efforts are being made to improve evaluation standards? 
 
Addressing these questions will enable us to provide an ‘insiders’ perspective of how far 
the sample of conservation organisations represented in this study are actually 
embracing state-of-the-art evaluation methods. By drawing on the expert knowledge of 
individuals working on large-scale conservation projects, we expect the insights from 
this study to be germane to other organisations in the conservation sector. 
 
3. Methods 
 
3.1. The Delphi method 
 
We employ the Delphi method which is a survey-based research method that is able to 
ascertain the opinions of a purposively selected panel of experts (Hasson, Keeney & 
McKenna, 2000). Using a series of iterative questionnaires, the Delphi method 
facilitates structured communication between the experts effectively allowing the group, 
as a whole, to deal with a complex problem (Linstone & Turnoff, 1975) and to ultimately 
reach a consensus or convergence in opinion (Angus et al., 2003). The Delphi method has 
been employed in numerous disciplines including planning, social policy, nursing and 
information systems research, but also more recently in conservation and natural 
resource management (e.g., Hess & King, 2002; Oliver, 2002; MacMillan & Marshall, 
2006; Geneletti, 2008; Orsi et al., 2011). 
 
 
3.2. Selection of the expert panel 
 
Unlike traditional surveys, a Delphi survey requires a sample of qualified experts that 
have a deep understanding of the issues. Subsequently, rigorous selection of the panel is 
one of the most critical requirements of any Delphi study (Okoli & Pawlowski, 2004). 
The experts involved in this Delphi study were primarily identified in three ways: 
through involvement in projects run by the Cambridge Conservation Initiative (CCI) (a 
unique collaboration between the University of Cambridge and many of the largest 
biodiversity conservation organisations in the world (including the World Conservation 
Monitoring Centre (WCMC-UNEP) and Fauna and Flora International (FFI); through 
membership of BIOECON (a network of social scientists and policymakers working on 
conservation policy); and finally, by browsing staff profiles on the websites of national 
and international conservation NGOs and government agencies such as the World 
Wildlife Foundation (WWF), The Rainforest Alliance, The Department for 
Environment, Food and Rural Affairs (DEFRA), The Royal Society for the Protection of 
Birds (RSPB) and Natural England to name some examples. As there was a need for all 
of the panellists to speak English, organisations were predominately located/operating 
in the UK, Europe or North America. 
 
These sources initially produced a list of approximately 1,600 potential candidates. To 
be selected for the expert panel participants had to work for a conservation organisation 
as a policy advisor (designing and producing conservation policy interventions) or, have 
experience working as a conservation practitioner (implementing and evaluating 
conservation policy interventions on the ground). Conservation researchers working 
purely in an academic capacity, i.e., not involved with the actual design and 
implementation of large-scale conservation policies, were excluded from the Delphi 
panel selection. 
 
Using web searches to obtain biographical information, approximately 300 individuals 
were identified as meeting the necessary criteria and for which the necessary contact 
information could be sourced. Individuals were then sorted in order of preference based 
on their level of experience and expertise. The most preferred candidates were 
individuals in more senior positions, such as programme managers or officers, and 
individuals working in monitoring and evaluation (M&E) divisions or those known to 
have specialist knowledge in conservation effectiveness or evaluation based on their 
career history. Finally, with an expected 10% response rate we invited the top 200 
experts to participate in the Delphi study in order to obtain a sample of a minimum of 
20 participants which is in accord with Delphi method best practice guidelines (see 
SM1. for details). 
 
3.3. Structure of the Delphi process 
 
The Delphi study took place in June 2014 and consisted of two online questionnaires 
each taking approximately 20 minutes to complete. In our case, an a priori decision was 
made to have two rounds of questions due to the time available and to avoid the risk of 
sample fatigue. The questionnaires were created using the online survey software 
'Qualtrics.' Participants had eight days and nine days (with a week in between) to 
respond to the first and second questionnaire, respectively. To enhance response rates 
multiple follow-up email reminders were sent. 
 
 
To better address the study's research questions, the first round (R1) of questions was 
structured into two parts (see SM3.). Questions in part one of R1 were designed to 
address research questions one and two, i.e., the importance of rigorous evaluation 
methods as well as the panel's perspective on what they considered to be 'best practice' 
in conservation evaluation. Questions in part two of R1 were designed to address 
research questions three, four and five, i.e., how far different evaluation methods are 
actually being used in practice, the panel's perspective on what they considered to be the 
most significant barriers to using rigorous evaluation methods and how far attempts 
were being made to improve evaluation and evidence standards in conservation. The 
first questionnaire was accompanied by a two-page document which introduced 
respondents to the concept of the 'counterfactual' and outlined different approaches to 
evaluating conservation interventions (see SM2.). Particular attention was given to the 
definition of experiments and quasi-experiments in comparison to simple 'before-after' 
or 'with-without' approaches. 
 
Preparation for the second questionnaire (round two or R2) was devised based on the 
responses from R1 and was designed to provide a more detailed judgement on the issues 
therein. Following standard Delphi method best-practice guidelines we largely re- 
iterated the questions asked in R1 but now also included additional options for the 
experts to choose from based on the answers provided in R1 (see R2 survey in SM2.). 
This allows panellists the opportunity to re-evaluate their opinions in light of the new 
options and information available. In accordance with the Delphi methodology, the 
panellists were provided with the results from R1 to aid the re-assessment procedure 
(Angus et al., 2003). See SM1-SM4 for detailed information and justification regarding 
the structure of Delphi survey, the differences between the two rounds of questions and 
copies of the survey instruments used in R1 and R2 of the study. 
 
3.4. Composition of the Delphi panel 
 
Table 1 summarises the composition of the Delphi Panel. A total of 45 experts agreed to 
participate in the study and completed the first round of questions. While the initial 
response rate was relatively modest (24 %) the number of responses achieved was well 
above the minimum target of the 20 participants needed for the study. In contrast, there 
was a much higher response rate to R2 of the study (80 %) as only nine experts dropped 
out of the study leaving 36 participants (Table 1). For both R1 and R2 of the study there 
was a relatively even split between the number of males and females serving on the 
panel (Table 1). On average panellists had worked in conservation for 11 - 20 years, 
indicating a relatively high level of experience. A total of 24 conservation organisations 
were represented by the panel members in R1 decreasing only slightly in number to 21 
in R2 (Table 1). These organisations included a mixture of national and international 
NGOs, UK government agencies, major conservation research institutes and 
international development organisations (Table 1). The majority of the panel were found 
to be trained conservation scientists, environmental economists or experienced 
programme officers. 
 
[INSERT TABLE 1] 
4. Results 
 
4.1. Delphi survey: round one 
 
Table 2 provides the summary statistics of the responses to seven attitudinal questions 
posed to panellists in R1 of the study. There was found to be a consensus (75 % or more 
of the panel agreed) of opinion amongst the panellists for six of these questions which 
were subsequently omitted from R2 of the study. 
 
[INSERT TABLE 2] 
 
Unsurprisingly, nearly all panel members agreed that evaluations (in general) are 
essential to building the evidence base. More interestingly, it was also found that a very 
high percentage of panel members also agreed that the use of experimental and quasi- 
experimental evaluation methods was particularly important. Yet, over three quarters of 
the panel had the sense that when it comes to evaluating the success of its interventions, 
conservation is most likely still lagging behind other policy fields. It is not therefore 
surprising that the vast majority of the panel members agreed that attempts to measure 
the outcomes and impacts of conservation interventions using programme evaluation 
methods were only made occasionally, agreeing also that of these evaluations only 
some, as opposed to most or all, used experimental or quasi-experimental designs. That 
said, the vast majority of the panel also believed that there had been at least a slight 
increase in the use of these methods in conservation over the last few years (Table 2). In 
contrast, there was found to be less agreement amongst the panel as regards to whether 
or not conservation organisations are working hard to improve evaluation standards In 
this case, only 58 % of the panel agreed with this statement. One panel member 
commented that they would have answered the question differently had the questions 
asked them to consider just their own organisation and not conservation organisations 
generally. Subsequently, a re-phrased version of this question was included in R2 of the 
questionnaire for re-assessment by the panel. 
 
In order to assess whether or not there is any consensus amongst the sample of conser- 
vation organisations represented by the panel as regards the best standard of evidence, 
panel members were also asked to choose which evaluation design, from a list of four 
options, came closest to what they considered to be 'best practice' in terms of: (a) desir- 
ability and; (b) feasibility. Overall, there was found to be no consensus amongst the 
panellists who provided a wide range of answers for both questions. 
 
As no consensus was reached these 'best practice' questions were repeated in R2. 
Differently to R1, four additional options were included which were devised based on 
the 19 suggestions made by the panel as these represented totally different approaches 
(see Figure 1). This time panellists had the option to choose one or two answers. The 
questions were also rephrased to emphasise the distinction between what methods were 
most desirable in theory and what methods are actually most commonly undertaken in 
practice within the conservation community. A further modification was that there was 
no distinction made between methods using 'quantitative' and 'qualitative' data as this 
distinction, which was made in R1, appeared to have clouded consensus regarding the 
preferred method. 
  
[INSERT FIGURE 1] 
 
While there was again found to be no overall consensus in R2, panellists appeared to 
react to the information from other experts as there was less disparity between the 
group’s answers suggesting there was some convergence in opinion. For instance, in 
terms of desirability, the three most suggested answers provided by the panel were 
roughly evenly split between an 'experimental design,' (33 %), a 'BACI experimental 
design' (39 %) and a 'BACI-quasi-experimental design,' (39 %) (Figure 1). One panel- 
lists qualified their answer by stating: 
While seeking to codify best practices in conservation impact evaluation is 
important, we need  to  recognise  that  appropriate  evaluation  design  is 
context dependent, shaped by the level of uncertainty involved in the 
intervention, the data available, and the needs of decision-makers. 
Another panellist added: 
 
The problem is not about deciding what the bars are and which bar is necessary 
to evaluate intervention effectiveness. Most of the program leaders get it. The 
challenges sit with the reality of implementing even some of the lower standard 
evaluation designs. In many circumstances they are simply not feasible. 
 
In other words, the three favoured designs were all either experimental or quasi- 
experimental suggesting that these were definitely preferred by the majority of the panel 
compared to the other non-experimental options. The opposite was found in terms of 
what methods were said to be used most commonly in practice. This time the three most 
suggested answers were a 'simple before-after design' (61 %), a 'simple with-without 
design' (42 %), or 'other,' suggested by 25 % of the panel (Figure 1). Many of the 
panellists that selected 'other' specified that simple before-after designs were most used 
in practice but that they most commonly had 'small' samples of treatment groups. 
 
In order to assess whether or not there is any consensus regarding the reasons for the 
limited use of rigorous impact evaluation methods, panel members were asked to select 
what they considered to be the five most significant barriers to implementing these 
methods from a list of 14 options. Panellists also had the option to specify an alternative 
suggestion. The five barriers most suggested by the panel in R1 were 'lack of funding', 
'availability of a baseline', 'time constraint,' 'lack of forward planning' and the 
'availability of suitable control group,' (see Figure 2). This question was repeated in R2 of 
the study to try and build consensus. Building on R1, the panel had a choice of four 
additional options that were added to the list based on the alternative suggestions provided 
by experts in R1. Despite the additional options, the five most suggested barriers 
remained the same (Figure 2). However, this time there was found to be some 
consensus in opinion amongst the panel as 78 % of the experts concurred that 'lack of 
funding' and 'time constraint' were two of the most significant barriers. For instance, one 
panellist stated that: 
Funding is so short-term and funder requirements/interest so inconsistent, that it 
is basically impossible to develop consistent monitoring programs and to 
maintain consistent strategies through time. 
Another panellist commented that: 
 
The available funds do not even permit even the most basic level of 
monitoring. 
 
[INSERT FIGURE 2] 
 
4.2. Delphi survey: round two 
 
R2 of the Delphi study included several new questions. These questions were based on 
the general comments and feedback provided by the experts in R1. They were designed 
to provide a deeper insight into evaluation practices within the sample of conservation 
organisations represented by the expert panel. 
 
In order to better understand reasons for gaps in the evidence base, the panel were 
presented with a series of plausible explanations (devised from comments in R1) and 
then asked to score how far they agreed or disagreed with each explanation using a five- 
point Likert-scale (Table 3). There was found to be a clear consensus in opinion amongst 
the panellists for two of the four explanations with exactly three quarters of the experts 
agreeing that 'gaps in the evidence base have less to do with the nature of the field and 
more to do with a lack of incentives and/or funding/resources,' and that gaps 'can 
mainly be attributed to a lack of funding and/or resources and not because impact 
evaluation is not valued in the conservation policy field'. A high percentage of experts 
also agreed that 'a lack of incentive to disseminate findings' (64 %) and 'a lack of an 
accepted standard' (69 %) were also valid explanations for gaps in the evidence base. 
For example, one panellist stated that: 
In most conservation organisations there is little time or incentive for staff to 
write up findings for journals; our data and analyses usually go no further than 
donor reports and project institutional databases, rather than reaching the 
scientific literature. 
 
[INSERT TABLE 3] 
 
Table 4 summarises the aggregate results from three final attitude questions posed to the 
panel in R2. It was found that there was a strong consensus regarding how important it 
was 'to develop an accepted standard for the design and implementation of conservation 
evaluations,' with 85 % of the expert panel concurring that this development would be 
very important or at least quite important (Table 4). In contrast, there was found to be 
less agreement (only 61%) amongst the panel when asked to what extent they agreed 
that formal impact evaluation methods are unsuitable for evaluating all types of 
conservation policies. 
 
[INSERT TABLE 4] 
 
Finally, panellists were directly asked for their opinion on whether sufficient effort was 
being made in their organisation to improve programme evaluation standards. 
Encouragingly, 50 %, of the panel member answered 'Yes,' (Table 4). 
 
 
5. Discussion and policy recommendations 
 
The first point of inquiry of this study was to more systematically investigate to what 
extent rigorous evaluation methods are being used within conservations organisations. 
In line with existing arguments in the literature, the results from the Delphi study 
confirmed that the majority of evaluations in the sample of conservation organisations 
studied still use a simple before-after or with-without design (Figure 1). However, while 
the results show that the use of more rigorous evaluation methods is still insufficient in 
the conservation policy field, it does appear to be less limited within our sample than 
previously suggested. In fact, the common view held by the panel is that there has been 
an increase in the application of these methods in recent years and several panellists 
were aware of some organisations that are already regularly taking a more rigorous 
approach. For instance, many of the panel drew attention to a significant programme of 
work administered by conservation organisations (such as the RSPB, Natural England, 
CIFOR and WWF) that use BACI experiments and quasi-experimental methods to 
evaluate the environmental and social impacts of conservation interventions. This body of 
work on account of being either unpublished or in grey literature has often not been 
picked up by past critical reviews. Although not all of these studies can strictly be said 
to be perfect examples of impact evaluations, they do demonstrate that these 
organisations are making an attempt to construct reasonably credible counterfactuals. 
That said, given that the same few examples were provided by the panel, it is apparent 
that these organisations are still the exception and not the rule. 
 
While the results of this study corroborate previous claims that the use of experimental 
and quasi-experimental methods is still limited, they do not, however, support assertions 
that this is largely because conservationists are ignorant and unappreciative of rigorous 
methods. In contrast, the results support that the panel were not only highly aware of 
these methods and the need for more credible evidence but also recognised their 
importance when it came to drawing reliable inferences about the causal effects of 
conservation interventions (Table 2). What is more, far from being ignorant of 
experimental and quasi-experimental evaluation methods, the consensus reached 
amongst the panel was that these methods are, at least in theory, considered to be the 
benchmark, or 'best practice' for conservation evaluation (Figure 1). 
 
In line with Roe, Greig-gran & Mohammed’s (2013) arguments, the results reveal that 
there is a considerable gap between what methods and design considerations are 
considered to be 'best practice,' and what methods are actually feasible to implement in 
reality. While there is far from a simple explanation for why this implementation gap 
persists, as discussed above it is clearly not because impact evaluation in not valued in 
the conservation policy field (Table 3). Instead, the results from this study show that 
there are in fact a number of pervasive barriers to implementing experiments and quasi- 
experiments on the ground. In particular, many of the experts felt that the use of 
rigorous evaluation methods would remain limited without more staff and programme 
consistency and substantially higher levels of funding to implement evaluations over 
longer time periods. 
 
The 'crisis management nature' of the conservation policy field (Pullin et al., 2013) was 
also reflected in many of the comments left by the panel who argued that the forward 
planning required for rigorous evaluations is often just too impractical in the face of 
short-term funding and thus short-lived opportunities for action which are urgently 
needed. For the same reasons, our study also showed that there is a lack of incentive for 
conservation organisations to disseminate their findings. 
 
Whilst a lack of data sharing is not a barrier to evaluation per se, the vast majority of the 
panel agreed that it was likely to be one of the reasons for the gaps in the evidence base 
and is therefore an area in need of improvement (Table 3). What is more, the finding 
that much of the data in conservation does not reach the scientific literature supports the 
theory that the reliance on desk reviews of the scientific literature in order to assess 
approaches to evaluation in conservation is unlikely to be a fair or accurate assessment 
of what is happening in reality and, thus, the results may be a gross underestimate of the 
progress in evaluation that is actually taking place. For instance, a review of the 
Rainforest Alliance's unpublished impact studies, which was conducted as an extension 
to this study, found that there was mounting evidence of counterfactual thinking within 
the conservation organisation as many of their more recent evaluations had at least 
attempted to construct reasonably credible counterfactuals using matching  methods (e.g., 
Paschall & Seville, 2012 and Hughell & Newsom, 2013). Unfortunately, this progress 
appears to have been overlooked by some of the critical literature which has largely 
focused on material published in more academic oriented sources or only readily available 
reports. 
 
Further, many experts agreed that experimental and quasi-experimental methods should 
be prescribed in a more targeted manner. These sentiments are synonymous with 
arguments recently made by Pullin and his colleagues (2013), which stressed that while 
evaluation was important, it was also time consuming and costly and therefore needed 
to be justified. Furthermore, Mascia and others (2014), as well as Roe, Greig-gran & 
Mohammed (2013), have suggested experimental and quasi-experimental methods are 
best employed selectively where additional rigour is required to inform major 
programme decisions. 
 
Finally, one of the key contributions of this study relates to the methods used. Whilst 
there has been much discussion on standards of evaluation methods used by the 
conservation sector, the reasons provided for the current trends and attitudes towards 
these methods have largely been based on personal experience and anecdotal evidence. 
In contrast, this study is the first to employ the Delphi technique to provide a more 
systematic assessment of conservation experts’ actual knowledge, appreciation, as well 
as their experience with such methods. Overall, despite its limitations, the Delphi 
method proved to work well. Throughout the study, experts were observed to react to 
information from other experts and a considerable amount of convergence was observed 
to produce a clear consensus on a number of issues. Further the sampling frame adopted 
(including over 1600 conservation experts) as well as the actual sample size of 
respondents (n=45) are considerably above the minimum requirements for achieving 
robust findings (Hasson et al., 2000). Lastly, respondent bias was minimised as experts 
were rigorously selected to ensure that a wide spectrum of organisations and expertise 
were represented (Table 1). That said, it is important not to over generalise the findings 
of this study; the results are only representative of the conservation organisations in the 
sampling frame and cannot be considered to be necessarily representative of the 
conservation sector at large or of a specific geographical context. 
 
The insights gained from of our Delphi study allow us to draw several policy 
recommendations with respect to the use of impact evaluation methods in conservation. 
Firstly, our analysis suggests that the focus of research should move away from 
codifying  best  practice  evaluation  methods  and  instead  focus  on  developing  and 
improving minimum standards. As such, more emphasis should be placed on getting the 
basics of evaluation right. Indeed, there was a strong consensus amongst the panel that it 
was particularly important to first develop an accepted standard for the design and 
implementation of conservation evaluations (Table 3) as the current lack of an accepted 
standard was considered to be another factor contributing to gaps in the evidence base 
(Table 4). The standards would include the requirement that baseline survey data (before 
the intervention) on the environmental and livelihood impacts of conservation policies 
are more routinely collected from both potentially treated and comparable untreated (or 
control) villages. By agreeing to such basic minimum standards of policy evaluation, 
policy organisations position themselves more favourably in order to undertake 
evaluation in the future (when many of the impacts of their policies will be more readily 
observable). 
 
Secondly, it should be acknowledged that not all conservation policies can or should be 
subjected to large-scale rigorous policy impact evaluation. Policy agencies on their own 
will unlikely have the capacity and know-how to independently design and implement 
such studies. Aiming to undertake a plethora or of ill-designed evaluation studies will 
not provide valuable information and will constitute a waste of time and money. Instead 
what is needed is the emergence of a selected critical mass of carefully designed and 
executed evaluation studies (including RCTs and framed field experiments) that will 
produce unbiased estimates of impacts of conservation policies across different 
geographical and institutional contexts. This will enable researchers to undertake meta- 
analyses of this type of unbiased evidence that will produce more generalisable 
findings. 
 
Thirdly, undertaking such detailed and rigorous impact evaluation studies requires 
considerable time commitment (often over several years) as well as funds and resources 
not available to conservation organisations. Hence for this purpose it is imperative that 
NGOs collaborate with academics and gain access to additional resources to complete 
such studies. Research grant agencies should more proactively support and facilitate 
such collaborative research projects. A paradigm example of such a collaboration is that 
between the Royal Society for the Protection of Birds (RSPB) and academics from 
Wageningen and Cambridge universities, whom, between 2010-2015, undertook a series 
of comprehensive and rigorous policy impact evaluation studies (including randomised 
control trials) on the environmental and social impacts of conservation policies that aim 
to preserve the Gola Forest Nature Reserve in  Sierra  Leone  (see  project  link here: 
http://www.conservation.cam.ac.uk/collaboration/framework-assessing-livelihood- 
impacts-forest-conservation-programmes). It is imperative that these types of projects 
are emulated and funded by research grant agencies. 
 
 Fourth, the funders of conservation projects themselves also need to change their 
priorities and adopt a culture in which conservation evaluation is given as much 
importance as conservation action. This change in attitude is essential for providing the 
incentive to conservation practitioners to undertake or, at a minimum, engage with 
impact evaluation studies (Stravinsky et al., 2000; Kapos et al., 2008). For example, 
national and international policy organisations as well as private market stakeholders 
that are involved in the funding of Reducing Emissions from Deforestation and Forest 
Degradation (REDD) projects or Payments for Ecosystem Services (PES) programmes 
should embrace the importance and necessity of impact evaluation by providing 
adequate time and resources to undertake such studies and disseminate the results 
obtained. 
 
6. Conclusion 
 
The aim of this study was to more formally and systematically assess the importance 
and use of rigorous evaluation methods in the conservation policy field by conducting a 
Delphi survey of conservation experts with real experience in the conservation sector. 
Using a Delphi technique proved to be an effective way of synthesising expert 
knowledge to produce a coherent and comprehensive picture of the rationale behind 
conservation evaluation decisions in practice and, thus, has provided important insights 
into each of the study's five research questions. 
 
In general, the results confirm that the use of experimental and quasi-experimental 
evaluation methods in conservation is still very limited but this, crucially, is not because 
conservationists are ignorant of these methods or do not recognise them as being superior 
to non-experimental methods. In fact, considerable effort is being made to develop 
and improve evidence standards but these efforts have largely been thwarted by large 
financial and time related constraints that mean even elementary evaluations are hard to 
achieve. Impact evaluation is clearly not a panacea and will not always be what is 
needed. Certainly, incessant calls for increasingly rigorous evaluations are likely both 
quixotic and unproductive. Instead, this study recommends that there should be less 
focus in the literature on codifying best practice and more focus on finding ways to 
effectively raise minimum standards on a tight budget. Further, state-of-the-art impact 
evaluations should be aimed for a small selected number of case studies. 
 
As the Delphi study has proved to be an effective communication device in this area, a 
way forward from this study could be to widen the scope of this Delphi study to 
incorporate the views of the academic community by adding another panel comprised of 
conservationists working purely in research and academia. This way a discussion 
between the practitioners and researchers could be facilitated in an attempt to identify 
common views, share knowledge and seek ever more efficient means to a common end. 
Whatever the method employed, explaining, and thus addressing, the gaps in the 
evidence base will require academics to put their prejudices aside, open up paths of 
communication with the conservation sector and, crucially, undertake more research that 
draws on the expert knowledge and experience of those on the front line of conservation 
practice. 
 
 Supplementary Materials 
 
Supplementary Materials related to this article are attached as Appendixes SM1 to SM4 
(and will be made available on the JEMA’s website). These include a Technical Annex 
detailing the methods (Appendix SM1), the briefing document sent to the panellists 
(Appendix SM2), and copies of the survey instruments used for round one and round 
two of the survey (Appendix SM3 and Appendix SM4, respectively). The authors are 
solely responsible for the content of these materials. Queries (other than absence of 
material) should be directed to the corresponding author. 
 
Acknowledgements 
 
We would like to extend particular thanks to the 45 panellists who participated in our 
Delphi study for their engagement and expertise, as well as three anonymous reviewers 
for their insightful comments on an earlier draft of this article which greatly improved 
the manuscript. We would also like to thank Dr. Valerie Kapos for testing the survey 
instruments and providing useful feedback. 
References 
 
Adhikari, B., Agrawal, A., 2013. Understanding the Social and Ecological Outcomes 
of PES Projects: A Review and an Analysis. Conservat. Soc. 11, 359-374. 
http://dx.doi.org/10.4103/0972-4923.125748 
 
Alcorn, J.B., 2014. Lessons learned from Community Forestry in Latin America and 
their relevance for REDD+. USAID-supported Forest Carbon, Markets and 
Communities (FCMC) Program, EUA. http://www. fcmcglobal. 
org/documents/CF_Latin_America. pdf. Accessed 31 March 2016. 
 
Angus, A., Hodge, I., McNally, S., Sutton, M., 2003. The setting of standards 
for agricultural nitrogen emissions: a case study of the Delphi technique. J. 
Environ. Manag. 69, 323-337.  
http://dx.doi.org/10.1016/j.jenvman.2003.09.006 
 
Baylis, K., Honey-Rosés, J., Börner, J., Corbera, E., Ezzine-de-Blas, D., Ferraro, P.J., 
Lapeyre, R., Persson, U.M., Pfaff, A., Wunder, S., 2015. Mainstreaming Impact 
Evaluation in Nature Conservation. Conserv. Lett. 9, 58-64. http://dx.doi.org/ 
10.1111/conl.12180 
 
Berry, M., Cashore, B., Clay, J., Fernandez, M., Lebel, L., Lyon, T., Mallet, P., 2012. 
Toward sustainability: The roles and limitations of certification. Resolve, Inc. 
Washington, DC: 
 
Blackman, A., Rivera, J.E., 2010. The evidence base for environmental and 
socioeconomic impacts of ‘sustainable’certification. Resources for the Future, 
Washington, DC. 
https://www.researchgate.net/profile/Jorge_Rivera10/publication/46456069_The_Evid
ence_Base_for_Environmental_and_Socioeconomic_Impacts_of_Sustainable_Certific
ati on/links/0c9605299683547c30000000.pdf Accessed 11 April 2016. 
 
Blackman, A. (2012). Expost evaluation of forest conservation policies using 
remote sensing data: An introduction and practical guide. EfD, Discussion Paper 
Series 12-05. http://www.rff.org/RFF/Documents/EfD-DP-12-05.pdf Accessed 
May 19 2015. 
 
Bottrill, M.C., Hockings, M., Possingham, H.P., 2011. In pursuit of knowledge: 
addressing barriers to effective conservation evaluation. Ecol. Soc. 16, 14 
http://www.ecologyandsociety.org/vol16/iss2/art14/ 
 
Cowling, R.M., 2014. Let's Get Serious About Human Behavior and Conservation. 
Conserv. Lett. 7, 147-148. http://dx.doi.org/10.1111/conl.12106 
 
Duflo, E., Glennerster, R., Kremer, M., 2008. Using randomization in 
development economics research: A toolkit, in: Schultz, T. and Strauss, J.  
(Eds.), Handbook of Development Economics., 4. New York, pp. 3895-3962. 
 Ferraro, P.J., Pattanayak, S.K., 2006. Money for nothing? A call for empirical 
evaluation of biodiversity conservation investments. PLoS Biol. 4, e105. 
http://dx.doi.org/10.1371/journal.pbio.0040105 
 
Ferraro, P.J., 2009. Counterfactual thinking and impact evaluation in 
environmental policy, in: Birnbaum, M. & Mickwitz, P., (Eds), Environmental 
Program and Policy Evaluation: Addressing Methodological Challenges. New 
Directions for Evaluation 122, pp. 75-84. 
 
Fisher, B., Balmford, A., Ferraro, P.J., Glew, L., Mascia, M., Naidoo, R., Ricketts, 
T.H., 2014. Moving Rio Forward and Avoiding 10 More Years with Little Evidence for 
Effective Conservation Policy. Conserv. Biol. 28, 880-882. http://dx.doi.org/ 
10.1111/cobi.12221 
 
Frondel, M., Schmidt, C.M., 2005. Evaluating environmental programs: The 
perspective of modern evaluation research. Ecol. Econ. 55, 515-526. 
http://dx.doi.org/10.1016/j.ecolecon.2004.12.013 
 
Geneletti, D., 2008. Incorporating biodiversity assets in spatial planning: 
Methodological proposal and development of a planning support system. 
Landscape Urban Plan. 84, 252-265. 
http://dx.doi.org/10.1016/j.landurbplan.2007.08.005 
 
Gertler, P.J., Martinez, S., Premand, P., Rawlings, L.B., Vermeersch, C.M., 2011. 
Impact Evaluation in Practice. World Bank Publications. 
 
Greenstone, M and Gayer, T., 2009 Quasi-Experimental and Experimental 
Approaches to Environmental Economics. J. Environ. Econ. Manag. 57, 21-44. 
http://dx.doi.org/10.1016/j.jeem.2008.02.004 
 
Hasson, F., Keeney, S., McKenna, H., 2000. Research guidelines for the Delphi 
survey technique. J. of Adv. Nurs. 32, 1008-1015. http://dx.doi.org/10.1046/j.1365- 
2648.2000.t01-1-01567.x 
 
Hess, G.R., King, T.J., 2002. Planning open spaces for wildlife: I. Selecting 
focal species using a Delphi survey approach. Landscape Urban Plan. 58, 25-
40. http://dx.doi.org/10.1016/S0169-2046(01)00230-4 
 
Hughell, D. & Newsom, D., 2013. Evaluating the results of our work: impacts 
of Rainforest Alliance certification on coffee farms in Colombia. Cenicafe, 
Colombia. http://www.rainforest- 
alliance.org/sites/default/files/publication/pdfcenicafe_singles_0.pdf. Accessed 13 July 
2014. 
 
Kapos, V., Balmford, A., Aveling, R., Bubb, P., Carey, P., Entwistle, A., Hopkins, J., 
Mulliken, T., Safford, R., Stattersfield, A., Walpole, M., Manica, A., 2008. Calibrating 
conservation: new tools for measuring success. Conserv. Lett. 1, 155-164. 
http://dx.doi.org/10.1111/j.1755-263X.2008.00025.x 
 
Khandker, S.R., Koolwal, G.B., Samad, H.A., 2010. Handbook on Impact Evaluation: 
Quantitative Methods and Practices. World Bank Publications. 
 
Kleiman, D.G., Reading, R.P., Miller, B.J., Clark, T.W., Scott, J.M., Robinson, J., 
Wallace, R.L., Cabin, R.J., Felleman, F., 2000. Improving the Evaluation of 
Conservation Programs. Conserv. Biol. 14, 356-365. http://dx.doi.org/ 
10.1046/j.1523-1739.2000.98553.x 
 
Linstone, H.A., Turoff, M., 1975. The Delphi Method: Techniques and Applications. 
Addison-Wesley Reading, MA. 
 
MEA (Millenium Ecosystem Assessment), 2005. Ecosystems and Human Well-being. 
Island Press Washington, DC. 
 
MacMillan, D.C., Marshall, K., 2006. The Delphi process – an expert-based approach 
to ecological modelling in data-poor environments. Animal. Conserv. 9, 11-19. 
http://dx.doi.org/10.1111/j.1469-1795.2005.00001.x 
 
Mascia, M.B., Pailler, S., Thieme, M.L., Rowe, A., Bottrill, M.C., Danielsen, F., 
Geldmann, J., Naidoo, R., Pullin, A.S., Burgess, N.D., 2014. Commonalities and 
complementarities among approaches to conservation monitoring and evaluation. 
Biol. Conserv. 169, 258-267. http://dx.doi.org/10.1016/j.biocon.2013.11.017 
 
McKinnon, M.C., Mascia, M.B., Yang, W., Turner, W.R., Bonham, C., 2015. Impact 
evaluation to communicate and improve conservation non-governmental 
organization performance: the case of Conservation International. Philos. T. Roy. 
Soc. B. 370. http://dx.doi.org/10.1098/rstb.2014.0282 
 
Miteva, D.A., Pattanayak, S.K., Ferraro, P.J., 2012. Evaluation of biodiversity policy 
instruments: what works and what doesn’t? Oxf. Rev. Econ. Pol. 28, 69-92. 
http://dx.doi.org/10.1093/oxrep/grs009 
 
Okoli, C., Pawlowski, S.D., 2004. The Delphi method as a research tool: an example, 
design considerations and applications. Inform. Manag. 42, 15-29. 
http://dx.doi.org/10.1016/j.im.2003.11.002 
 
Oliver, I., 2002. An expert panel-based approach to the assessment of vegetation 
condition within the context of biodiversity conservation: Stage 1: the identification of 
condition indicators. Ecol. Indic. 2, 223-237. http://dx.doi.org/10.1016/S1470- 
160X(02)00025-0 
 
Orsi, F., Geneletti, D., Newton, A.C., 2011. Towards a common set of criteria and 
indicators to identify forest restoration priorities: An expert panel-based approach. 
Ecol. Indic. 11, 337-347. http://dx.doi.org/10.1016/j.ecolind.2010.06.001 
 Paschall, M. & Seville, D. 2012. Certified cocoa: scaling up farmer participation in 
West Africa. New Business Models for Sustainable Trading Relationships: IIED, 
CIAT, Rainforest Alliance, CRS & Sustainable Food Lab. 
http://pubs.iied.org/pdfs/16034IIED.pdf . Accessed 7 July 2014. 
 
Pattanayak, S.K., Wunder, S., Ferraro, P.J., 2010. Show me the money: do payments 
supply environmental services in developing countries? Rev. Environ. Econ. Policy. 4, 
254-274. http://dx.doi.org/10.1093/reep/req006 
 
Pullin, A.S., Knight, T.M., 2001. Effectiveness in conservation practice: pointers from 
medicine and public health. Conserv. Biol.  15, 50-54. 
http://dx.doi.org/10.1111/j.1523- 1739.2001.99499.x 
 
Pullin, A.S., Knight, T.M., Stone, D.A., Charman, K., 2004. Do conservation managers 
use scientific evidence to support their decision-making? Biol. Conserv. 119, 245-
252. http://dx.doi.org/10.1016/j.biocon.2003.11.007 
 
Pullin, A.S., Sutherland, W., Gardner, T., Kapos, V., Fa, J.E., 2013. Conservation 
priorities: identifying need, taking action and evaluating success, in: Macdonald, 
D.W., Willis, K. (Eds), Key Topics in Conservation Biology 2, John Wiley 
Publishing, pp. 3- 22. 
 
Roe, D., Grieg-Gran, M., Mohammed, E.Y., 2013. Assessing the social impacts 
of conservation policies: rigour versus practicality. IIED Briefing Papers. 
International Institute for Environment and Development, London. 
http://pubs.iied.org/pdfs/17172IIED.pdf. Accessed 11 April 2016. 
 
Rossi, P.H., Lipsey, M.W., Freeman, H.E., 2003. Evaluation: A systematic 
approach. Sage publications. 
 
Salafsky, N., Margoluis, R., Redford, K.H., Robinson, J.G., 2002. Improving the 
practice of conservation: a conceptual framework and research agenda for 
conservation science. Conserv. Biol. 16, 1469-1479. 
http://dx.doi.org/10.1046/j.1523- 1739.2002.01232.x 
 
Salafsky, N., Margoluis, R., 2003. What conservation can learn from other fields 
about monitoring and evaluation. BioSci. 53, 120-122. 
http://dx.doi.org/10.1641/0006- 3568(2003)053[0120:wcclfo]2.0.co;2 
 
Samii, C., Lisiecki, M., Kulkarni, P., Paler, L., Chavis, L., 2014. Effects of 
payments for environmental services (PES) on deforestation and poverty in low 
and middle in- come countries: a systematic review. Cambell. Syst. Rev. 11. 
http//dx.doi.org/10.4073/csr.2014.11 
 
Stem, C., Margoluis, R., Salafsky, N., Brown, M., 2005. Monitoring and evaluation 
in conservation: a review of trends and approaches. Conserv. Biol. 19, 295-309. 
http://dx.doi.org/ 10.1111/j.1523-1739.2005.00594.x 
 
Stevens, A., Abrams, K., Brazier, J., Fitzpatrick, R., Lilford, R., 2001. The 
Advanced Handbook of Methods in Evidence Based Healthcare. Sage, London. 
 
Stravinsky, I., 2000. Writing the wrongs: developing a safe-fail culture in 
conservation. Conserv. Biol.  14, 1567-1568. 
 
Sutherland, W.J., Pullin, A.S., Dolman, P.M., Knight, T.M., 2004. The need for 
evidence-based conservation. Trends. Ecol. Evolut. 19, 305-308. 
http://dx.doi.org/10.1016/j.tree.2004.03.018 
 
White, H., 2006. Impact evaluation: the experience of the Independent  Evaluation 
Group of the World Bank. University Library of Munich, Germany. 
https://mpra.ub.uni- muenchen.de/1111/1/MPRA_paper_1111.pdf . Accessed 11 
April 2016. 
 
Zheng, H., Robinson, B.E., Liang, Y.-C., Polasky, S., Ma, D.-C., Wang, F.-C., 
Ruckelshaus, M., Ouyang, Z.-Y., Daily, G.C., 2013. Benefits, costs, and livelihood 
implications of a regional payment for ecosystem service program. Proc. Natl. Acad. 
Sci. USA. 110, 16681-16686. 
  
Tables and Figures 
 
Table 1. Composition of the Delphi Panel in R1 and R2 
 
 R1 R2 
No. Experts Invited 200 45 
No. Responses 45 36 
Response Rate (%) 24
a
 80 
Responses Female (%) 45 47 
Responses Male (%) 
 
No. Organisations Represented 
55 
24
b
 
53 
21
c
 
a
adjusted to account for 5 % of emails bouncing back, i.e.,190 sent and received in total. 
b
including: WWF-US, UNEP-WCMC, WCS, UNESCO, RSPB, FFI, CCI, The Natural Capital Initiative, 
John Muir Trust, Bioversity International, Rainforest Alliance, Endangered Wildlife Trust, Natural 
England, DEFRA, FERA, FAO, OECD, FC, IUCN, IIED, ICL, DICE, Institute for Forest and 
Environmental Policy, Cambridge Forum for Sustainability and the Environment. 
c
less: OECD, John Muir Trust and Bioversity International. 
  
Table 2. Aggregate scores and responses for seven attitude questions posed to the Delphi panel in R1. 
 
Research 
Question 
Being 
Addressed 
Question 
(presented verbatim) 
Median Mean S.D n Percentage of 
Respondents 
RQ1 Efforts to evaluate the effectiveness of conservation interventions are essential 
to building the evidence base of what works and when. 1 = Strongly Agree, 2 = 
Agree, 3 = Neither/Nor, 4 = Disagree, 5 = Strongly Disagree 
1 1.24 0.48 45 98 % Strongly Agree or 
Agree 
RQ1 How important is the use of experimental and quasi-experimental programme 
evaluation methods for drawing reliable inferences about the causal effects of 
conservation interventions? 1 = Very, 2 = Quite, 3 = Neither/Nor, 4 = Not 
Important, 5 = Completely Inappropriate 
2 1.76 0.61 45 95 % rated Very Important 
or Quite Important 
RQ3 When it comes to evaluating the success of its interventions, the field of 
ecosystem and biodiversity conservation lags behind most other policy fields. 1 
= Definitely yes, 2 = Probably yes, 3 = Probably not, 4 = Definitely not 
2 2.04 0.77 45 77 % rated Definitely or 
Probably yes 
RQ3 How often are attempts made to measure the outcomes and impacts of 
conservation interventions using programme evaluation methods? 1 =Always, 2 
= Usually, 3 = Occasionally, 4 = Never 
3 2.82 0.45 44 77 % said Occasionally 
RQ3 On average what proportion of these evaluation studies would you say use 
experimental or quasi-experimental designs? 1 = All, 2 = Most, 3 = Some, 4 = 
None 
3 3.05 0.45 40 80 % said Some 
RQ3 How would you best describe the general trend in the use of experimental and 
quasi-experimental evaluation methods in the conservation policy field over the 
last few years? 1 = Dramatic Increase, 2 = Moderate Increase, 3 = Slight 
Increase, 4 = No Change, 5 = Slight Decline, 6 = Moderate Decline, 7 = 
Dramatic Decline 
3 2.69 0.83 39 77 % said Moderate 
Increase or Slight Increase 
RQ5 In general, conservation organisations are working hard to improve their 
programme evaluation standards in an attempt to strengthen the credibility of 
the evidence base. 1 = Strongly Agree, 2 = Agree, 3 = Neither/Nor, 4 = 
Disagree, 5 = Strongly Disagree 
2 2.40 0.75 45 58 % Strongly Agree or 
Agree 
 
 
 
 
 
 
27 
  
Table 3. Potential explanations for gaps in the evidence base: the results presented here inform RQ4 (from R2 survey) 
 
To what extent do you agree or disagree with each of the following explanations for any gaps in the evidence base regarding the impacts of conservation 
interventions? 
 
1 = Strongly Agree, 2 = Agree, 3 = Neither/Nor, 4 = Disagree, 5 = Strongly Disagree 
Statement (presented verbatim) Median Mean S.D n Percentage of Respondents 
 
Agreeing or Strongly Agreeing 
Gaps in the evidence base have less to do 
with the nature of the field and more to do 
with a lack of incentives and/or 
funding/resources. 
2 2.14 0.83 36 75 % 
Gaps in the evidence base can mainly be 
attributed to a lack of funding and/or 
resources and not because impact evaluation 
is not valued in the conservation policy 
field. 
2 2.17 0.56 36 75 % 
Gaps in the evidence base can partly be 
explained by a lack of incentive to 
disseminate findings: writing-up results for 
journals in not a priority. 
2 2.47 0.94 36 64 % 
Gaps in the evidence base can partly be 
explained by the lack of an accepted 
standard for the design and implementation 
of impact evaluations in the conservation 
  policy field.   
2.5 2.67 1.01 36 69 % 
 
 
 
 
 
 
 
 
 
 
 
 
28 
  
Table 4. Attitudes towards impact evaluation (from R2 survey) 
 
Research 
Questions 
Being 
Addressed 
Question (presented verbatim) Median Mean S.D n Percentage of 
Respondents 
RQ2 Experimental (randomised evaluations) and quasi- 
experimental (statistical matching) methods are not 
suitable for evaluating all conservation interventions and 
should only be used in certain circumstances. 1 = 
Strongly Agree, 2 = Agree, 3 = Neither/Nor, 4 = 
Disagree, 5 = Strongly Disagree 
2 2.47 1.00 36 61 % Strongly Agree or 
RQ4 In your opinion, how important or unimportant is it to 
develop an accepted standard for the design and 
implementation of conservation evaluations? 1 = Very, 2 
= Quite, 3 = Neither/Nor, 4 = Not Important, 5 = 
Completely Inappropriate 
2 2.06 0.79 36 83 % rated Very 
Important or Quite 
Important 
RQ5 With regard to your own organisation, do you think 
sufficient effort is being made to develop or improve 
programme evaluation standards in an attempt to 
strengthen the credibility of the evidence base? 1 = Yes, 
2 = No, 3 = Don't Know 
1.5 1.78 0.87 36 50 % said Yes 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
29 
. 
P
er
ce
n
ta
g
e 
o
f 
R
es
p
o
n
d
en
ts
 (
%
) 
100 
 
90 
 
80 
 
70 
 
60 
 
50 
 
40 
 
30 
 
20 
 
10 
 
0 
Theory (n = 36) Practice (n = 36) 
 
Evaluation Design 
 
Experimental design: Evidence from from a large-scale field experiment (i.e., fully 
randomised with-without comparison) 
 
Quasi-experimental design: Evidence from a large sample of statistically matched treatment 
and control groups (i.e., matched with-without comparison) 
 
BACI experimental design: Evidence from a large sample of randomised treatment and 
control groups compared before and after the conservation intervention (i.e., randomised 
with-without comparison with baseline data) 
BACI quasi-experimental design: Evidence from a large sample of statistically matched 
treatment and control groups compared before and after the intervention (i.e., matched with- 
without comparison with baseline data) 
Standard BACI design: Evidence from a large sample of similar, but NOT statistically 
matched treatment and control groups compared before and after the conservation 
intervention 
Simple with-without design: Evidence from a comparison of outcomes for a large sample of 
unmatched treatment and control groups (no baseline data) 
 
Simple before-after design: Evidence from a comparison of outcomes before and after the 
intervention for a large sample of treatment groups (no comparison group) 
 
I reject the idea of 'Best Practice;' there is no ideal way to evaluate the effectiveness of 
conservation interventions (Theory): Other (Practice) 
 
 
Figure 1. Comparison of the evaluation design/s that were chosen by the panel in R2 when asked 
what they considered to be the ideal evaluation design in theory and the most commonly used 
design in practice. Panellists could choose one or two answers from the eight options available for 
each question. The results presented here inform RQ2 and RQ3 
 
30 
P
er
ce
n
ta
g
e 
o
f 
R
es
p
o
n
d
en
ts
 (
%
) 
 
100 
 
90 
 
80 
 
70 
 
60 
 
50 
 
40 
 
30 
 
20 
 
10 
 
0 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Lack of funding Availability of 
baseline data 
 
R1 (n = 45) 
R2 (n = 36) 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Time constraint Lack of forward 
planning 
 
Barriers 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Availability of a 
suitable control 
group 
Figure 2. The five most suggested barriers to implementing experimental and quasi- 
experimental evaluation methods in the conservation policy field as rated by the Panellists in R1 
and then again in R2. The results presented here inform RQ4. 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
31