Improving the Professionalism of Evaluation
May 31, 2005
Prepared for the Centre of Excellence in Evaluation
Treasury Board Secretariat
by T. K. Gussman Associates Inc.
Overview and Summary of Findings
This paper considers developments and trends in program evaluation against the context of evolving public service management. Expenditure review, new management frameworks and a need for greater public sector accountability have created challenges for Internal Audit and program evaluation in the federal sector. For program evaluation, that challenge involves improving the image and credibility of the function and, ultimately, ensuring greater use of evaluations by senior management.
For years, many organizations have housed the two functions together, and more often than not the head of the combined unit has an audit background. Most departments have a joint Audit and Evaluation Committee to review and approve the results of accountability studies. Part of the solution lies in creating an evaluation identity separate from Internal Audit.
A first step would be to separate the administration of the two functions. The second step would be to develop and promulgate appropriate professional standards and training for evaluators to ensure that their work served the purposes for which it was intended.
In parallel with such organizational evolution would be the development of a system to ensure that individuals engaged in the practice of evaluation had sufficient skills and education to carry out the work to the established standards.
Some of the key findings of the study are:
The climate of public sector accountability has been changing and continues to change, and the nature of program evaluation is changing with it. The term "evaluation" is often perceived loosely and needs to be strengthened or clarified in its application. Studies of the health of the evaluation function carried out by the CEE have concluded that insufficient attention has been devoted to cost-effectiveness in a majority of evaluation studies, and that greater focus on the value-for-money aspect of evaluation will be necessary in the new era of results and accountability.
The Internal Audit function is being strengthened and serious consideration is being given to requiring certification for program auditors for employment in federal departments. The changes in the audit area provide both a signal and an incentive for the evaluation community to find a means for strengthening evaluation shops and ensuring professional standards.
As management needs become more complex, critics point to a lack of consistency and methodological rigour within the evaluation function. Much of the criticism comes from within the evaluation community itself. A lack of standards has led to a situation where "evaluation" is conducted by a wide variety of individuals with a social science or statistics background. There is a consensus that action is required to create professional competency criteria for evaluators.
Evaluation involves the rigorous application of research methods, statistical methods, analytical techniques and listening skills. Also critical are sound judgment and effective communication skills. Because evaluation is not a profession, individuals who engage in the activity should demonstrate a certain skill set and the ability to identify and apply the appropriate tool(s) to any specific situation.
It is almost impossible to regulate entry to the field. At the same time, it is highly important that there be standards to which "evaluators" can be held. While there is overall agreement on such standards for evaluation, the debate over whether and how those standards could be enforced has continued for close to two decades without resolution.
Licensing and certification of individuals involve arduous processes and appear to raise the spectre of legal challenges. "Credentialing" is a looser form of certification, and this approach leads into the identification of core competencies.
A reasonable solution appears to come through the growth of university-based programs offering a certificate in evaluation. The path of least resistance in finding a solution would be to arrange for partnership agreements between the federal government and such institutions. Identified universities in all regions of the country could be accredited to train entry-level evaluators and to test their students to ensure that they demonstrate the key competencies for an evaluator. An initial step would be to reconcile the TBS profiles with those recommended in the recent academic evaluation literature. There would be no direct enforcement on the part of the federal government, but the use of such programs could be encouraged through the terms and conditions of contracting (such as through mandatory requirements in a request for proposals).
The accreditation of post-secondary institutions to train students to meet the core competency test would support the effort to separate evaluation units in the federal government from their commingling with audit units. With separately defined criteria and standards for evaluators and auditors, there would be less reason to continue such administrative arrangements. Separate management can be justified by the presence of professional standards and training needs and how these relate to accountability. This would help to reinforce the evaluation identity and justify the function as an important and powerful management tool.
Public sector program evaluation is in a state of flux. As the function evolves, it does so against the backdrop of recent studies and reviews suggesting a need to raise the profile of evaluation in the federal sector. The Office of the Auditor General noted in 1993 and 1996 that the quality of federal departmental evaluation studies must improve. This was confirmed recently in a 2004 study of the Quality of Evaluations conducted by Treasury Board's Centre of Excellence in Evaluation (CEE). As well, CEE consultations indicate frustration among many evaluation groups, who see themselves as under-funded or under-utilized.
Two issues underlie this concern: the future of the evaluation function in the federal context and the credibility of the function itself. Both issues can be tied into the perceived degree of "professionalism" for program evaluation. This touches upon evaluators, evaluation units and evaluation managers.
Possible responses to these issues can be found in ongoing efforts to modernize the evaluation function and, at the same time, to make it more professional. The evaluation community is paying close attention to changes taking place within the Internal Audit community, where greater attention has been focused upon Internal Audit as a management accountability tool in light of recent and ongoing public inquiries. As a partial response, the Treasury Board Secretariat recently renewed the role of the Office of the Comptroller General. One of its key priorities is to strengthen the Internal Audit function in the federal government. Activities being considered to accomplish this goal include revising the Internal Audit Policy and practices; clearer definition of roles and responsibilities for senior managers and Comptrollers; standardizing Internal Audit methodologies and tools; providing guidance on best practices and training; and possibly requiring certification standards for some internal departmental auditors. Similar considerations are also of interest to the evaluation community.
In response to all of the above developments, the Senior Advisory Committee established a sub-committee to look at the role and positioning of evaluation in the federal government. This report on improving the professionalism of evaluation deals with one of three themes developed by the Senior Advisory Committee.
Objective of this study
This study investigates alternative approaches to increasing the professionalism of evaluation in the federal government. It is anticipated that increased professionalism, however achieved, would lead to increased utilization of evaluation by program managers and deputy heads. The areas investigated include the quality of evaluations, lessons learned in strengthening Internal Audit practices, the utilization of evaluation groups and alternative approaches to ensuring basic credentials or levels of competency for professional evaluators and other groups. The objective of this research has not been to develop certification standards, statements of values and ethics or training plans.
As the study evolved, the evidence pointed strongly to increasing the focus of the study on what the minimum acceptable set of standards for an "evaluator" ought to be, how best to create assurance that evaluators met those standards of practice, who ought to deliver such training and the nature of continuous learning to ensure that standards were maintained throughout public sector evaluation units and in the outside community.
Approach to the research
An extensive body of literature, both academic and public sector, was reviewed for the project. Using themes from the literature, discussions took place with federal evaluation managers and other government officials in Ottawa, selected academics in Canada and the US and practitioners active in the Canadian Evaluation Society.
The purpose of these discussions was to explore what the respondents believed were the 'professional qualities' of evaluation, as well as the concepts of accreditation, certification, credentialing and core competencies to determine the model that best matches the anticipated needs of the Canadian public service with the product/services that the evaluation community is prepared to provide.
In short, the information sought was for the purpose of equilibrating future supply and demand in the market for professional evaluation services.
1. The Current State of Evaluation
Over the past few years, a number of studies have been undertaken by the CEE to assess the state of program evaluation in the federal sector. Particular emphasis was placed on the quality of evaluations and the extent to which they have been utilized, the human and financial resources involved and the stated or perceived needs of the evaluation community (see footnote 1).
CEE's February 2005 report on the health of the evaluation function in the Government of Canada notes that total evaluation expenditure in the federal sector was $54.8 million, which averaged only 0.16% of departmental expenditures. Of particular concern was the fact that only 11 of 47 small agencies had evaluation budgets, totalling just $1 million, and that 60% of that amount related to just two small agencies.
The study found that evaluation quality, while improving, was still not as high as could be expected. Quality was measured on several criteria such as methodological rigour, the use of executive summaries and appropriate attention to issues of cost-effectiveness. Even with improvement noted since the previous review, CEE found almost one-quarter of the evaluation reports examined to be inadequate.
Perhaps of greater concern, the study found that only 80% of evaluation units can meet their departmental demands for evaluation and only 53% are able to produce or assist in developing Results-based Management and Accountability Frameworks (RMAFs). Moreover, there is a reported shortfall in both the funding available for evaluation purposes and the personnel required to manage and carry out evaluations.
In response to these concerns and broader Government comptrollership and accountability initiatives, the CEE reorganized and implemented several measures to strengthen the evaluation function and process. In addition to supporting the sharing of effective practices among departments, the CEE now reviews all evaluations submitted to the Treasury Board in support of funding requests and requires that all evaluation plans be reported to Parliament through departmental Reports on Plans and Priorities. As well, completed evaluations must now be reported in Departmental Performance Reports. These changes are designed to contribute to greater transparency of results on program effectiveness in the federal sector.
2. The utilization of evaluation groups
According to recent research, evaluation groups are not involved to the extent one would expect for this function. For example, the CEE found that only 28% of heads of evaluation units sign off on program-led reviews, many of which do not follow formal evaluation protocols or standards. Only 25% of evaluation heads sign off on program-led evaluations, and a smaller number of evaluation units are involved in advising on program design or developing performance measurement frameworks for departmental programs (see footnote 2). The CEE study expresses surprise that "…only 10% of evaluation units are involved in implementing and monitoring departmental Management Accountability Frameworks (MAFs)."
Of more concern, the same study found that evaluation work led by program areas often falls outside the purview of departmental evaluation committees. The evaluation committees of departments contacted reviewed only 15% of evaluation reports produced by programs and only 20% of the management action plans stemming from program-led evaluations. Only 15% were involved in monitoring the follow-up to such action plans.
While the above results are based on a survey of 40 federal departments and large agencies, the findings are sufficiently strong to indicate that something more is needed to ensure that evaluations are utilized by senior management and throughout program areas. The CEE has embarked on several initiatives to promote and strengthen evaluation units, including its recently updated Learning Strategy.
3. Maintaining Standards
The February 2005 report of the Auditor General of Canada notes the importance of maintaining evaluation standards in the work carried out by Canada's foundations:
The foundations set their own terms of reference for evaluations required by funding agreements. In our view, consistent application of evaluation standards is needed to assess whether foundations have met the major government objectives set for them. Departments follow the standards set out in the Treasury Board's evaluation policy. Comparable standards could be used by foundations. This is a new element in our accountability framework for foundations (Appendix C).
The Auditor General's report notes that some foundations have asked officials in sponsoring departments to comment on evaluation documents such as draft terms of reference. It recommends that, in new or amended funding agreements, sponsoring departments seek to ensure that evaluations commissioned by foundations meet recognized evaluation standards. The Government concurred that the use of recognized evaluation standards was very important, but clarified that there was no suggestion such standards were not being followed, since the Auditor General did not examine the evaluations or related documents commissioned by foundations.
The standards for policy evaluation in Canada are well documented See footnote 3 . The first set of standards for program evaluation was published in 1984, followed by standards for personnel and then student evaluation. In 1994 the program evaluation standards were revised and they are now under review again. The Canadian standards have been used in other jurisdictions. It is of interest to note that the African Evaluation Association has recently adapted its own standards using the Canadian model, and that the Canadian Evaluation Society has assisted in those efforts.
Recent efforts to strengthen the role of the CEE in reviewing plans and results as well as overall monitoring and guidance will contribute to the maintenance and enforcement of those standards. The gap lies in the evaluation community itself, where there is growing evidence of the need for ensuring adherence to minimum professional standards among evaluation practitioners.
4. Lessons Learned from Other Areas
Lessons Learned in Strengthening Internal Audit Practices
Parallel to the CEE, the Treasury Board Secretariat has a Centre for Excellence in Internal Audit (CEIA), which offers both products and services to federal departments and agencies to support Internal Audit activity. Through its website, the Centre shares tools, best practices in Internal Audit and risk-based planning. In many ways the activities and resources of the CEIA are similar to those of the CEE. Internal Audit engagements differ from program evaluations, both in intent and approach, although at the broadest level both functions relate to accountability. One fundamental difference lies in the absence of any international body or standards governing program evaluation. The Internal Audit function in the Government of Canada, by contrast, relies on standards approved internationally by the Institute of Internal Auditors (IIA), a worldwide professional organization for Internal Auditing. The IIA provides systematic, disciplined guidance for the conduct of audit engagements, supporting an approach to evaluate and improve the effectiveness of risk management, control, and governance processes (see footnote 4).
The IIA Board of Directors approved a set of professional standards and a Code of Ethics in June 1999. These are incorporated by reference into the federal Policy on Internal Audit. The IIA standards are to be used by auditors carrying out assignments in the federal government so long as those standards do not conflict with the federal Internal Audit policy or any other TBS policies or guidelines. Having internationally recognized standards as a reference point enhances the credibility of Internal Audit; the absence of such standards, and of a governing body, marks a fundamental difference between the evaluation and audit functions.
That credibility comes in part through the commonality of approach among auditors and the ability of that community to agree on standardized methodologies for audit engagements. In many senses, evaluation is less amenable to standardization, given the many variants possible in both its purpose and its conduct.
The Treasury Board "Audit Guide" illustrates various areas of practice and policy where Internal Audit and evaluation have similarities, at least in terms of the governance structure. Departmental committees are required for both functions, although many departments have tended to combine their oversight at the senior management level into "Audit and Evaluation" review committees. For the most part, this has been done for administrative simplicity. A priori, there are no compelling factors, other than the common basis of accountability, to house the two together.
One area where the approaches diverge, and where evaluation may be able to build upon lessons learned from the Internal Audit experience, is that of quality assurance and continuous improvement (see footnote 5).
The federal audit function utilizes five IIA quality assurance standards. IIA Standard 1300 (Quality Assurance and Improvement Program) specifies that the chief audit executive develop and maintain a quality assurance and improvement program that covers all aspects of the Internal Audit activity and continuously monitors its effectiveness. All aspects of the program should be designed to help the Internal Auditing activity add value and improve the organization's operations and to provide assurance that the Internal Audit activity is in conformity with the Standards and the Code of Ethics. Aside from the reliance on international standards, much of this approach is consistent with the current mandate of the CEE regarding program evaluation.
IIA Standards 1310, 1311 and 1312 relate to a process for monitoring the quality program, including ongoing performance reviews and self-assessment. Standard 1311 covers internal assessments and Standard 1312 covers external assessments.
IIA Standard 2340 deals with engagement supervision such that proper supervision is provided to ensure that objectives are achieved, quality is assured, and that staff is developed.
The guide notes that the first and most important level of quality assurance is the due professional care exercised by the Internal Auditor and the supervisory review conducted of the Internal Auditor's work throughout all phases of the engagement by more senior members of the Internal Audit group. The supervisory review must encompass the planning, conduct, and reporting phases. The second level of quality assurance performed by many audit shops is an independent internal review to assess the quality and adequacy of the work performed, in accordance with TBS, IIA and department or agency policies and standards. This will involve a thorough examination of working papers and the audit report by a professional who did not conduct the audit. A third level of quality assurance involves a formal comprehensive review of effectiveness and compliance with relevant standards every five years. This usually involves liaison with the IIA but may be carried out internally with external validation.
Another development within the Internal Audit community bears examination. The IIA has developed a specialty certification for public sector auditors, known as the Certified Government Auditing Professional (CGAP). This is a specialty IIA certification designed for and by public-sector Internal Auditing practitioners. The exam tests a candidate's knowledge of the unique features of public sector Internal Auditing, including fund accounting, grants, legislative oversight and confidentiality rights, among other areas. The broad scope emphasizes the Internal Auditor's role in strengthening accountability to the public and improving government services (see footnote 6).
In addition, the IIA offers a Certified Internal Auditor (CIA) designation. This is currently the only globally accepted certification for Internal Auditors and remains the standard by which individuals demonstrate their competency and professionalism in the Internal Auditing field.
The Office of the Comptroller General (OCG) is in the process of requiring such designations for the Senior Head of Audit (and Evaluation) in federal departments within two years of appointment.
The following information comes from the CGAP Exam Topic Outline:
Domain III - Government Auditing Skills and Techniques (20-25%)
- A. Management Concepts and Techniques (A)
- B. Performance Measurement (P)
- C. Program Evaluation (A)
- D. Quantitative Methods (e.g., statistical methods and analytical review) (P)
- E. Qualitative Methods (e.g., questionnaires, interviews, and flow charts) (P)
- F. Methods for the Identification and Investigation of Integrity Violations (P)
- G. Research/Data Collection Techniques (P)
- H. Analytical Skills (P)
(P) Indicates that candidates must demonstrate proficiency (thorough understanding; ability to apply concepts)
(A) Indicates that candidates must exhibit awareness (knowledge of terminology and fundamentals)
Elements B, C, D, E, G and H have a direct bearing on program evaluation. One option would be to encourage the evaluation community to acquire such a designation, but the requirement for audit experience and the focus on auditing practice would suggest that this is not highly viable.
The clear message to the evaluation community from this is that attention must be paid to such changes on the audit side and that the time has come to find a means to ensure that evaluators somehow meet minimum professional standards. Some observers fear that inaction could lead to the eventual imposition of certified auditors in supervisory roles over evaluation studies. A major challenge on the evaluation side relates to the absence of internationally recognized criteria or a governing body to support enforcement.
Evaluation has often been perceived as the "sister of audit", particularly in the regions. Far too often, evaluators have arrived in a regional office to commence an investigation and been greeted with "so you are the auditors." The recent changes within the Internal Audit community signal both a challenge and an opportunity for the evaluation community to seek an effective and enforceable way to ensure similar standards, quality assurance measures and professional qualifications within evaluation units. This will help to establish an identity for federal program evaluators and support improvements in the overall quality of federal evaluation work.
Lessons Learned from Other Jurisdictions
Reports presented at international conferences on the subject of improving public sector accountability contain themes that in some areas reinforce what has been done in Canada and in other cases may offer options to be explored. For the most part, Canada appears to have a policy evaluation system no less sophisticated than those of other major nations. In particular, there are striking similarities between the Australian and Canadian efforts to enhance accountability of public sector programs through strengthening the evaluation function.
The Auditor General of Australia presented a paper on results-based management advances in that country, and reported a growing emphasis on evaluation to strengthen accountability. The Australian approach parallels recent moves to strengthen the evaluation function in Canada such as in requirements for three-year plans, a mandatory inclusion of evaluation arrangements in policy submissions to Cabinet and the publication of completed evaluation reports.
Australia's federal budget reform initiatives of the 1980s and 1990s shaped the evolution of an integrated framework of accrual budgeting, accounting and reporting under an "outcomes and outputs" framework. This system was fully implemented for the 1999-2000 budget and has been in place since then. Performance measures were developed for the outcomes and outputs framework, with reporting under four Key Result Areas to the Australian National Audit Office. This bears some resemblance to recent changes within the Canadian federal system, whereby the Management Accountability Framework requires departments to measure and improve upon organizational performance in ten key areas (see footnote 7). As well, the increased focus on budget management has some parallels to Canada's Expenditure Review Committee, created in 2003 to keep a close watch on all federal spending, with a view to ensuring accountability and that tax revenues are spent to achieve results.
The Government of the United Kingdom adopted policies in 1999 and 2000 to commit to evidence-based policy making. This requires that the best available evidence from statistics, research, evaluations and systematic consultation be utilized in policy development. Along with the UK strategy for public spending and taxation, it is a major driver for high quality policy evaluation. The UK has shown leadership through the use of advanced methodological approaches to policy evaluation. Beyond the traditional tools employed in conducting impact evaluations to assess outcomes (what has been labelled "summative evaluation" in Canada), research designs now include: randomized control trials; regression discontinuity designs; single group pre-and-post test designs; interrupted time series designs; and regulatory impact assessments. Of these approaches, only the last has been commonly employed in the Canadian context. While none of these experimental designs is new, they may all merit consideration in the context of evaluating public policy in Canada (see footnote 8).
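The logic of the simplest of these designs, the randomized control trial, can be illustrated with a short sketch. The Python snippet below simulates outcomes for treated and control groups and estimates the program's impact as a difference in group means; all data and figures are invented for illustration and do not describe any actual program.

```python
import random

def difference_in_means(treated, control):
    """Estimate the average treatment effect as the difference in group means.

    Valid for a randomized control trial: random assignment makes the two
    groups comparable apart from the treatment itself."""
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical simulated outcomes (all numbers are illustrative only).
random.seed(1)
control = [random.gauss(50, 10) for _ in range(1000)]  # no intervention
treated = [random.gauss(55, 10) for _ in range(1000)]  # program shifts the mean by ~5

effect = difference_in_means(treated, control)
print(f"Estimated program effect: {effect:.1f}")
```

The other designs listed above (regression discontinuity, interrupted time series) substitute different comparison strategies for random assignment, but the estimation logic remains a comparison of outcomes with and without the program.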
The UK approach to implementation evaluation (until recently labelled "formative" in Canada) has relied on qualitative evaluation methods to assess the most effective means for implementing policy. These methods employ in-depth interviews and case studies – tools that are in common use in the Canadian context. In fact, the UK Cabinet Office has published a framework for using qualitative research and evaluation as a quality assurance measure.
Resource allocation in the UK is now linked to performance. "Service delivery units" are set performance targets that are evaluated regularly using the methods noted above. Programs that meet or exceed their performance targets are rewarded with financial resources in future budgets. UK officials have pointed out the need to avoid target setting as an end in itself. As well, it has been noted that this approach fails to identify unanticipated outcomes and can lead to goal or policy "displacement." Setting targets is seen as useful, but this should be done with a conscious appreciation for the needs, values and demands of the users of public services. This echoes some of the philosophy underlying Results for Canadians.
France has also undergone reform in the evaluation of public policy. In response to criticism of the inter-ministerial system of evaluation (known to have long time frames, arbitrary selection of subjects and poor results utilization) the National Council of Evaluation, created in 1998, was charged to work jointly with the French Planning Office in evaluating public policies. The Council proposes an annual evaluation plan to the Prime Minister and provides advice on methodology for evaluations undertaken by the State or local authorities and their public institutions. As in other jurisdictions, evaluation in France has taken on a stronger position with respect to budgetary and strategic planning processes. As reported by French officials, evaluative capacity is being diffused, indicating an increased interest in the activity from many sectors.
In the US, growing interest in performance measurement, spurred by the Government Performance and Results Act of 1993, led to the development of the Program Assessment Rating Tool (PART), a systematic method to assess performance across the US federal government. The 1993 legislation established requirements for federal agencies to undertake strategic planning with goals and performance measures. That alone turned out to be less than what decision-makers needed to prioritize federal spending or recommend management and legislative reforms. The PART relies on a variety of sources to document whether programs have demonstrated appropriate results. One of those inputs is evaluation. The PART assessments are carried out under the direction of officials from the Office of Management and Budget (OMB), which is found within the Executive Office of the President.
The PART has four major sections to address 25-30 questions about the program under review. These sections are: (I) Program Purpose and Design; (II) Strategic Planning; (III) Program Management; and (IV) Program Results/Accountability. Seven variants of this instrument have been developed to cover the following federal program areas: Direct Federal; Competitive Grant; Block Formula Grant; Regulatory Based; Capital Assets and Services Acquisition; Credit; and Research and Development.
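The section structure lends itself to a weighted scoring sketch. The section weights and scores below are hypothetical illustrations chosen for this example, not OMB's actual values; the point is only to show how per-section answers can roll up into a single program rating.

```python
# Hypothetical sketch of section-weighted scoring in the spirit of the PART.
# Weights and scores are illustrative only, not OMB's actual values.
SECTION_WEIGHTS = {
    "Program Purpose and Design": 0.20,
    "Strategic Planning": 0.10,
    "Program Management": 0.20,
    "Program Results/Accountability": 0.50,  # results weighted most heavily
}

def overall_score(section_scores):
    """Combine per-section scores (0-100) into a weighted overall score."""
    return sum(SECTION_WEIGHTS[name] * score
               for name, score in section_scores.items())

scores = {
    "Program Purpose and Design": 90,
    "Strategic Planning": 70,
    "Program Management": 80,
    "Program Results/Accountability": 40,  # weak demonstrated results
}
print(overall_score(scores))
```

Under a weighting of this kind, a program that scores well on purpose and management but poorly on demonstrated results still receives a low overall rating, which is consistent with the "results not demonstrated" outcomes reported below.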
Applying this tool to about 400 programs, representing roughly 40% of the 2004 and 2005 budgets, 40% of the programs were found to be effective or moderately effective. Another 40% were determined not to have demonstrated results, and the remaining 25% were rated adequate or ineffective (these figures add to more than 100% due to rounding). The concept of "program" can mean anything from an entire department's program to one of hundreds of separate line items in the Budget, and its definition is not consistent across agencies.
Although the OMB has the "final pen" on performance reviews, there is heavy reliance on evaluations and reviews carried out by federal departments. In fact, one question in the PART asks if independent evaluations have been carried out. Officials responsible for the PART state that the challenge remains to develop accessible and workable performance measures. There are some similarities between the philosophical underpinnings of the PART and the UK links between program performance and future budgets. Given the state of maturity in each system, it appears that the UK is further along the spectrum while the US approach continues to be refined.
These lessons from other jurisdictions demonstrate that evaluation continues to play a key role in measuring results and that credible evaluations can play a prominent role in supporting budget decisions.
5. Towards a More Professional Evaluation Function in Canada
Three observations can be made after reviewing the status of evaluation in Canada along with recent experiences in Internal Audit and in other nations. First, there are many tools available to the evaluator and the challenge is to have the knowledge and training to be able to identify when different tools are required and how to apply them. Second, evaluation per se in Canada is in need of a distinct identity to ensure its survival. This has implications for administrative arrangements within government departments and agencies and for the 'marketing' of evaluation services to program clients. Third, our system requires a means of ensuring that practitioners of evaluation possess the skills and knowledge necessary to carry out credible and useful evaluations to support decision-making and budget allocation processes.
5.1 Improving the rigour of the function: the choice of evaluation instruments
Lessons learned from other jurisdictions point to the growing use of more sophisticated and technically complex instruments. The example of the PART in the US illustrates how evaluation results feed into the performance measurement continuum. The UK reports the application of methodological rigour in conducting evaluations.
Problems with Current Evaluation Tools
Overall, it is important to apply known evaluation approaches in a more rigorous fashion. It is widely agreed that numerous evaluation designs and related methods already exist. See footnote 9.
However, the current set of evaluation standards within the Government of Canada assumes that one size fits all. Evaluations generally exhibit the following weaknesses:
- Little use of quantitative methods (see ERIC)
- Over-reliance on interviews with stakeholders, and a lack of comparison groups or baseline data (see quality review)
- Failure to address the tough questions, with few recommendations on expenditure management (see ERIC, quality review)
- A preference among program managers for formative evaluations, which focus on improvements to programs and do not address substantive questions such as "what difference did this program make?" Such evaluations do not enable a fundamental discussion of the rationale for the program.
- Confusion between formative and summative evaluations. Formative evaluations are submitted to senior executives and TBS for decision-making when the real need is for a summative evaluation. There is a need to distinguish better between the two and when each ought to be used.
- Over-scheduling of implementation evaluations. There is a role for evaluations that examine program implementation, but these should be used judiciously because the real need is for summative evaluations. The expectation put forth in some RMAFs that a program will have both a formative and a summative evaluation places too heavy a workload on the program, its stakeholders and departmental evaluation functions.
"One-size fits all" means that evaluations are often too rigorous for the specific needs. This involves two aspects. First, TBS applies the same standards and criteria to all evaluations. As a
result, evaluation units often conduct reviews in the absence of standards or protocols. Second, Evaluation Units cannot keep up with the demand for evaluations, which, given the required rigour, are
resource intensive. There is a need for more targeted tools with associated standards so that evaluation managers can match tools to need. This will also increase the quality of evaluations by
establishing a set of standards. Moreover, it should better reflect the resources available. The table on the following page presents some options for consideration.
Evaluators need to make use of a better range of evaluation designs and methods:
Basic and Traditional Techniques
- Sampling of program managers and program beneficiaries
- Interviews and file/document reviews
- Case studies of individual experiences
- Comparative cost analyses
Evaluation Design/Advanced Techniques (see footnote 10)
- Randomized Controlled Trials
- Direct Controlled Trials
- Non-Experimental Direct Analysis
- Non-Experimental Indirect Analysis
- Regulatory Impact Evaluations
Proposed Example of a Suite of Evaluation Tools

Results-Capacity Check (RCC)
- Focus: State of performance information collection (relevance, validity, reliability and systems) and its use for decision-making and accountability
- Timeframe: 2 weeks
- Independence: Evaluation Unit
- Design and methods: Template, interviews, data review

Implementation Evaluation
- Focus: A specific aspect of program implementation (e.g., management framework, governance, efficiency) or the broad range of implementation issues, including governance, management, performance measurement systems and alternative delivery mechanisms
- Timeframe: 3-4 months (specific aspect) to 4-6 months (broad range)
- Independence: Evaluation Unit or program management
- Design and methods: Interviews, document and literature reviews; the full range of evaluation designs and methods as appropriate

Impact Evaluations
- Focus: What difference the program made: relevance, effectiveness, value for money, impacts, alternatives
- Timeframe: 8-12 months
- Independence: Evaluation Unit
- Design and methods: Full range of evaluation designs and methods as appropriate

Policy Evaluations
- Focus: Higher level in the PAA; how a suite of programs works together and where redundancies exist; future-oriented in terms of lessons learned and policy directions
- Timeframe: 6-8 months
- Independence: Evaluation Unit
- Design and methods: Secondary analysis augmented with some primary analysis, statistical analysis, societal measures, etc.

Value-for-Money Assessments
- Focus: Maturing model for assessing the cost-effectiveness of a program; links costs with administration, direct outputs and direct impacts (direct and indirect costing models)
- Timeframe: 2 weeks per year over a 3-year period
- Independence: Evaluation Unit
- Design and methods: Template, some interviews, costing analysis, document review and literature review

Management Reviews
- Focus: Strategic and management issues
- Timeframe: 3-4 months
- Independence: Evaluation Unit or program management
- Design and methods: Primary research, with some secondary analysis

Issues: to be developed
Standards and protocols: to be developed
It is important at all times to recognize that public policy evaluation is distinct from evaluation in education and health applications. Some of these tools and experimental designs have been applied widely in the latter fields but face greater limitations in public policy settings. There is general agreement that the instrument of choice will depend on the objectives. In the federal evaluation context, unless the focus is on grants and contributions (G&C), most evaluation tends to deal with management practices, making it more difficult for regulatory programs to identify particular clients, let alone determine impact. Such cases limit the experimental design options.
One of the pitfalls identified in CEE studies and corroborated by the various experts consulted throughout this research was the perceived over-reliance on the opinions of respondents. Some evaluators have tended to base their own judgments on the opinions of a large group of experts interviewed during an evaluation study. Evaluation managers underscore the importance of the evaluator interpreting, and not merely summarizing, information and opinions encountered in a study. A number of federal evaluation units have borrowed approaches from the Internal Audit community in order to validate evaluation findings.
At the same time, evaluators must be cautious in their choice of complex methodologies in situations where these are not appropriate. There are situations where evaluations will benefit from the application of more rigorous methods, but the challenge in designing a good evaluation is to select the approaches that will yield the most useful information for management. Just as technology for its own sake does not necessarily solve a problem, so must the professional evaluator understand the choices available and be able to select the methods of investigation best suited to the data and context of the program/issue being evaluated.
- Introduce a suite of evaluation tools
- Expand the use of current methods
- Develop a higher level of training (Masters level) in evaluation methods
- Develop protocols for each type of method to ensure that issues such as neutrality/independence, objectivity and quality assurance are considered and improved
5.2 Enhancing the Evaluation Identity
Towards an Identity: Integrating evaluation into the organizational culture
To ensure that evaluation is understood and sought by senior management, evaluators and evaluation units need an identity – something that distinguishes them from other analytical areas and something that makes them unique in the organizational fabric.
One identifying factor emerges from the response to a traditional criticism. Various observers have pointed to over-reliance among evaluators on interview findings to form their conclusions. Interviews are one key line of evidence that lead the evaluator to form perceptions. But it is critical to dig behind those perceptions to find the reality. The evaluator's recommendations, based on critical thinking, can address how to close the gap between perceptions and reality. This critical thinking is a distinguishing factor from Internal Audit and other forms of inquiry. It is one of the aspects that make evaluation different. In that sense, it becomes part of the identity of program evaluators.
In recent years, the focus of public sector accountability has centred on the achievements of results, particularly with transfer payments programs. Program officials often contact evaluation heads when they want to learn about Program Activity Architecture, "results" and "indicators." By helping them understand these concepts, the evaluation head begins to build a relationship with them. The whole time, the object is to get the program people into a new mindset about accountability and transparency and understanding their vulnerability – that they must be able to respond to different kinds of questions at different levels of the organization. They must be shown how evaluation can help them understand what they are doing and help them to adjust.
The evaluation function still has to prove its value added to senior management. There is a general appreciation of the function but not necessarily an understanding of what it can do. Many departments continue to rely on an integrated Audit and Evaluation Plan. If the products and value added of both audit and evaluation can be clarified, heads of evaluation will be better positioned to demonstrate how evaluation can be applied. The more practical we make evaluation, the more it will be desired as a tool by program managers. This will help to make evaluation part of the corporate mentality.
Program managers often seek an evaluation to use as evidence to convince their superiors that they have done things right. This is why the credibility of evaluators is so critical. They are sent in to talk with managers who may be two levels above them in the organization, so they have to bring their intellectual skill sets to the table. By demonstrating quality in the work, the image of evaluation is enhanced. At all levels of the organization, evaluators must be able to convince officials that there are thematic links from the evaluation work to the broader policy picture. This underscores the importance of training and development for evaluators.
What is the value added from Evaluation?
The Treasury Board Secretariat (2004) elaborates four contributions that can be made by a strong evaluation function in the Canadian context:
- Important government management practices, such as stewardship, the Management Accountability Framework (MAF) and, more recently, the ongoing expenditure review exercise, require a rigorous evaluation function;
- Evaluation results provide the fact-based evidence that can support executive committees and central agencies in priority setting and resource reallocation;
- Used strategically, sound evaluations can offer the insight and detailed information required to support improved management practice and achieve results; and
- There is no other discipline in government that provides, in one place, an analytical and advisory capacity on both results and accountability. See footnote 11.
Evaluation is distinct from Internal Audit through its focus on tracking actual performance to support objective assessments of results achieved. Internal Audit, on the other hand, supports decision making by providing assurance on an agency's risk management strategy and management control framework. Both functions serve a purpose in monitoring and improving management and service delivery. Both are important along the accountability continuum.
Where Internal Audit provides assurance that management controls are in place, evaluation can be more closely aligned to support budget processes as illustrated through recent developments in the UK, Australia and US.
In that context, evaluation in the Canadian government is a valuable tool in the expenditure review process instituted in December 2003. The Expenditure Review Committee (ERC) oversees all federal spending. The ERC mandate is to ensure that government spending remains under control, is accountable, is closely aligned with the priorities of Canadians, and that every tax dollar is invested with care to achieve results for Canadians. The ERC is examining about $150 billion in federal spending using seven criteria:
- Is the public interest being served?
- Should there be a role for Government in the program area?
- Is it appropriate for the program to remain in the federal sphere?
- Could part or all of the activities be transferred to the private or voluntary sectors?
- Is there Value-for-Money to Canadian taxpayers?
- Can the delivery be made more efficient?
- Are these programs or activities affordable and what could be eliminated?
What can evaluation contribute to expenditure review?
Value for Money Assessment is the main approach of the expenditure review, but the focus has not been on an absolute assessment of whether or not a program is delivering value for money. The key question has centred on how any particular program delivers value relative to other programs. If all programs could be assigned rankings, they could be assessed relative to the pot of money available. In expenditure review, economics is not the only consideration, so value for money cannot be the only criterion. Political and social policy may still dictate that money should be spent for a particular purpose, even if a value for money assessment were negative. In expenditure review, value for money does not simply follow an accounting approach.
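The relative-ranking idea described above can be sketched as a simple prioritization: score each program on value for money, rank the programs, and fund down the list until the available pot is exhausted. The program names, costs and scores below are invented for illustration; this is a sketch of the ranking logic only, not of how the ERC actually allocates funds.

```python
# Illustrative sketch of relative value-for-money ranking against a fixed
# budget ("the pot"). All programs, costs and scores are hypothetical.

from typing import NamedTuple

class Program(NamedTuple):
    name: str
    cost: float       # annual cost, $M (hypothetical)
    vfm_score: float  # relative value-for-money score, higher is better

def prioritize(programs: list[Program], budget: float) -> list[str]:
    """Fund programs in descending value-for-money order until the pot runs out."""
    funded = []
    remaining = budget
    for p in sorted(programs, key=lambda p: p.vfm_score, reverse=True):
        if p.cost <= remaining:
            funded.append(p.name)
            remaining -= p.cost
    return funded

programs = [
    Program("Program A", cost=40.0, vfm_score=0.9),
    Program("Program B", cost=70.0, vfm_score=0.7),
    Program("Program C", cost=30.0, vfm_score=0.5),
    Program("Program D", cost=25.0, vfm_score=0.3),
]
print(prioritize(programs, budget=100.0))  # ['Program A', 'Program C', 'Program D']
```

As the text notes, value for money cannot be the only criterion: political and social policy considerations can override a purely economic ranking, so a mechanical ordering like this could at most inform, never determine, expenditure decisions.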
In that sense, evaluation can serve a useful purpose. In the view of senior officials, Expenditure Review must answer four key questions:
- How is this spending linked to the Government's priorities?
- Is there a way to cut the cost of delivery?
- What is the program's overhead cost?
- Are any horizontal links possible?
Evaluation can help to provide a judgment on these four questions for a wide range of programs. This would go well beyond current departmental evaluation plans but cover a narrower range of inquiry. If the evidence suggests a need for further study, ERC officials would support more extensive reviews. But, in many cases, the short list of questions may be sufficient to indicate where programs are performing well or poorly without a protracted investigation, and enable decisions about continuing or winding down spending in certain areas. This does not suggest that major evaluations will be eliminated, but it may focus such efforts on a smaller number of programs where such a level of detail is deemed to be appropriate.
In the world of expenditure review, evaluators would have to be experts in policy and understand the program being studied. This is not dissimilar from current requirements for policy evaluation.
5.3 Options for Improving the Credibility of Evaluation Practitioners
Evaluators working in the public service most often have a degree at the Masters level, but a small minority have gone through formal training in evaluation methods per se during their university studies. In the US, the bulk of academic training designed for evaluation has been in the areas of industrial psychology and education. It is likely that the majority of practitioners in Canada have been trained on the job, both in government and in the consulting community. This may be sufficient to ensure that evaluators bring appropriate skills to their assignments, but it does not allow any standardization for the purpose of assessing the level of their abilities. Most often, evaluators are identified on the basis of their experience.
The debate about standards and essential competencies has gone on for years without resolution. The CES commissioned a study about four years ago to identify the core knowledge components an evaluator should be expected to possess. The concept behind this was that only individuals who met certain standards should be allowed to head an evaluation study. The team could comprise any other professionals with different research skills, but the team leader would be recognized as an "evaluation expert." This professional status would ensure to potential clients that the individual leading the evaluation would be accountable to certain standards.
The Treasury Board Secretariat has published standards for the conduct of evaluations in Canada. These cover evaluation planning and issues; competency; objectivity and integrity; consultation and advice; measurement and analysis; and reporting. Detailed guidance is provided for each area. See footnote 12. Similarly, the AEA revised and approved (in July 2004) its Guiding Principles for Evaluators. Broadly, these cover systematic inquiry; competence; integrity/honesty; respect for people; and responsibilities for general and public welfare. See footnote 13. What emerges from these overarching sets of standards and principles is that evaluators must possess certain technical skills, be versed in research methods, be good listeners and communicators, display honesty and integrity both personally and with respect to the evaluation process, and ensure that their work balances the needs of their clients with the public interest.
This suggests that "evaluation" is not strictly a profession. It involves professional skills and ethics, but it requires something beyond. Evaluators must possess a certain mindset and be able to draw on different skills and techniques, as they deem appropriate to any specific assignment. Evaluation, in the public policy context, goes beyond research and analysis – it involves a blend of research, interpretation and critical thinking.
Alternative approaches to ensuring levels of professional competence
Although professional standards for evaluators and evaluation are generally agreed upon in the government, academic and consulting communities, unanswered questions remain in the areas of testing and enforcement of those standards. Various options have been the subject of debate for more than a decade, and still there is no overall agreement. The principal options are licensing, accreditation, certification and credentialing.
Full Accreditation and Licensing
These options are similar and have the least appeal. Such an approach is the most onerous, both in terms of the requirements to become licensed or accredited and from the perspective of establishing and administering a system. To issue licences would require national acceptance of standards, standardized testing and a licensing authority to test evaluators, collect fees and issue licences. Given the diverse nature of evaluation and evaluators, such an option would appear to involve administrative complexities and a need for ongoing negotiations with provincial governments and professional associations. Moreover, such a system of licensure opens the door to legal challenges. It would take only one court challenge from an individual denied official status to derail the process. Given that the goal is to create greater certainty about the professionalism of evaluation work, pursuit of such regimes would appear to create potential pitfalls and costs that far outweigh the expected benefits.
Perspectives on Certification
The Government of Canada has invested heavily in certifying professional groups. In 1998 the effort began with two communities, procurement and materiel management. The driver in both cases was a lack of confidence and trust. More recently, the Office of the Comptroller General is in the process of developing a certification program for Internal Auditors. In this case, the impetus relates more to a desire to establish minimum professional standards.
The Professional Development and Certification Program
The Professional Development and Certification Program emerged as a key human resource renewal initiative in support of Modern Comptrollership, Human Resources Modernization and the new Policy for Continuous Learning in the Public Service. It also supports the TBS management agenda and commitments outlined in Results for Canadians: A Management Framework for the Government of Canada, Human Resource Modernization and the Management Accountability Framework.
The Program has two components, (1) Professional Development and (2) Certification, and is designed to provide employees in the Procurement, Materiel Management and Real Property Community with the learning tools to help acquire the skills, knowledge and expertise required to meet evolving and complex business needs, government priorities and management initiatives. It is expected to enhance the professionalism and value-added contribution of this community in the delivery of programs and services to Canadians and in the organizations in which they are employed See footnote 14 .
In designing the program, officials were clear that, with a professional development element in place, not everybody would need to be certified. The potential population is 6,000, including real property people, with a large range of responsibilities and degrees of application in the field. The designers looked for what was common in the process and developed a competency profile in cooperation with the community. The elements of this approach included linking the competency profile directly into the business process to identify commonalities, and a web-based self-assessment tool (to help people take stock of where they are and to work in cooperation with their managers to develop a learning plan). See footnote 15.
After identifying high priority topics, a curriculum was developed with three new modules: Regulatory Regime and Policy (how government works and where they fit); Business Processes (what does it mean to be in these fields? - a specific 3-day course for each area); Life-cycle Asset Management (how this fits into government processes - being better at risk management, contingency planning...). The competency profile was turned into a standard with the help of the Canadian General Standards Board and targeted for publication by March 31, 2005.
The next step is to develop the requirement for a certification and assessment tool. This is being done with the Psychological Centre for three levels. Exams are being designed to test knowledge, and a certification body is needed to review records of achievement regarding experience, check references and administer the exams.
The expectation is that the training process for procurement will increase the average educational level of people in the field. Not everyone will have to be certified; only those in critical positions. A review body is needed in addition to the certification body, and TBS is pushing for the public service management school as its preferred option. There will also be a need for a dispute resolution process.
There are several outstanding questions such as who will maintain the database and the length of time a certification will remain valid. As well, there are questions about who would have the authority to revoke certification and how to deal with grievances or legal challenges.
What does this imply for the evaluation community? The process has taken six years already. If such an avenue is pursued for evaluation, it may be possible to do it in less time because there are higher levels of education in the evaluation community and less variance among the skill sets. At the same time, evaluation assignments can also vary widely. There is far more standardization in life-cycle asset management.
The process for certifying Internal Auditors is expected to be less complex. That community already has standards and institutions. Evaluation, even with standards, has more grey areas.
Other Approaches to Certification
Membership in Associations: Self-Assignment
The Canadian Association of Management Consultants (CAMC) offers a Certified Management Consultant (CMC) designation, but this is not required for membership in the association. Currently, the CAMC certification process is under review. About 2,500 people (78% of the total membership) carry this designation, which is recognized in over 40 countries.
Canada has a national certification board and provincial institutes that are registered legal entities. Candidates receive their designation from a provincial board. No province mandates the requirement for designation to carry on the practice of management consulting. Since 1998 the national organization has been setting the standards with input from the provincial institutes.
The CMC designation comprises three components. The first is external: equivalencies in finance, human resources, information technology, marketing, operations and strategic planning. Most entrants to the program have an MBA; this factor alone aligns the CMC concept more closely with the evaluation situation than the government procurement case. The second, the "association component," involves two courses administered with a comprehensive exam (an 8-hour sit-down case study). The third depends on experience, requiring at least three years of 1,200 hours of annual effort in the business of management consulting (including business development).
As well, there is a mentoring requirement. Candidates must be sponsored by at least two people who carry the designation.
Some CAMC members declare "program evaluation" as an area of practice, but the association does nothing to regulate it. Nor is there any interest in certifying evaluators. Testing is limited to overall skills to consult professionally, but not in subject areas or applications. Policing can be achieved through client feedback. It would be unusual for someone to go through all this rigour to be certified and then go out and sell services where they are not experienced. When such incidents occur (albeit rarely) provincial disciplinary committees deal with them.
The CAMC has a Code of Conduct that appears on the association's website. Members must certify each year that they are adhering to the Code.
Credentialing and Standardised Competencies
A certification process can verify the technical skills but does not necessarily prove that a candidate has the intellectual skills. Many critics of certification suggest that the focus ought to be instead on good training. The Essential Skills Series from CES and new programs from CEE may offer some potential in that direction. There is consensus that "old time" evaluators have not kept in touch with the evolution of the evaluation function.
In any case, the various experiences confirm the general conclusion from the extensive academic literature available in this debate that less complex solutions may better serve the interests of the evaluation community. If there is one area of agreement among the officials consulted, it is that we can define core skills and competencies for evaluators. The literature and debate in this area centre on what core competencies ought to underlie the basic credentials an evaluator ought to bring to the job.
The issue of credentials or standardized competencies has been under debate for a long time. This study examined the history of the debate and sought the opinions of major proponents of the alternatives. Following a study of the literature in this area, interviews took place with academics and practitioners in both Canada and the US to obtain their most recent perspectives on the options.
Underlying the discussion is one fundamental concept. Evaluators, unlike Internal Auditors, come from diverse backgrounds, generally falling within the social sciences and education. Not all of their educational programs teach the principles of program evaluation per se, and many of these individuals learn evaluation "on the job."
The US General Accounting Office developed a "credentialing" training program that was extensive but has since been abandoned due to the high cost of maintaining the program. See footnote 16. Many experts in the US favour credentialing for public sector evaluators. In Canada, we have recognized guiding principles and standards for evaluation from the Treasury Board and the CES. The key elements of these are generally consistent with those published in the US.
As discussed above, the underlying question in determining the extent and depth of credentials centres on the question of what makes a person an "evaluator." It is not methodology courses or sampling courses per se. Evaluation covers a range of skills including an understanding of theories, models, history, needs assessments, cost/benefit analysis, using evaluation plans and more. Most training programs touch upon some of these but not necessarily all.
What are the core competencies of an evaluator?
There is general agreement within government and the academic/practitioner communities about the core competencies of an "evaluator," but the issue has long been left at the 'agreement' level due to the lack of an enforcement mechanism. As noted above, one underlying reason is the absence of an international governing body or standards. In recent years, US researchers have integrated the essential competencies for evaluators by performing a "crosswalk" of the essential competencies identified in the US and Canada. See footnote 17. Stevahn, King and Ghere provide a rationale for having evaluator competencies and develop a taxonomy of essential competencies. To the extent possible, the authors have attempted to relate the competencies to the various activities that evaluators carry out to achieve standards that will constitute sound evaluations. In that sense, they have described the competencies in behavioural language. This new research builds on earlier work by the same authors in 2001, at which time they had compared their essential evaluator competencies with standards and principles endorsed by the major North American evaluation associations and determined a need to make those competencies more comprehensive.
The authors concede that their proposed taxonomy needs to be validated through widespread endorsement by professionals in the field.
Each of the six proposed areas contains a compilation of specific skills. They are rank-ordered from the most specific to the most general, and each has a list of specific competencies. A full matrix of the competencies is reproduced in Annex 2. In summary, the six categories are:
- Professional practice – knowing the standards; being involved in practice; ethics; honest evaluation conduct and reporting; respecting clients, respondents and other stakeholders
- Systematic inquiry – relates more to technical skills
- Situational analysis – understanding the political environment, etc.
- Management skills – the nuts and bolts of managing evaluation projects
- Reflective practice – being able to step back and understand needs for growth; engaging in professional development
- Interpersonal skills
The Treasury Board Competency Profile for Public Service Evaluation Professionals (http://www.tbs-sct.gc.ca/cee/stud_etud/capa-pote05-eng.asp) is equally comprehensive and, in a sense, goes further in attempting to specify the competencies for junior, intermediate and senior level evaluators.
It is generally agreed that a means of ensuring basic competence would help maintain the credibility of evaluation groups within the public service, particularly as Internal Audit groups strengthen their identity and credentials. There is a pervasive sense among a wide majority of academics (at least via the literature) that licensing or certifying individuals as "evaluators" may create at least as many problems as it solves. There appears to be much greater support among officials surveyed for certifying institutions that would be delegated the responsibility to determine who had the appropriate competencies or credentials.
University programs can play an important role in ensuring that social science graduates demonstrate at least the core competencies determined for an entry-level position in the public service. The same programs can be made available part-time to individuals from the public sector or consulting community to ensure that their skill sets match the competency grid for the work they are required to undertake. In a sense, the academic sector has been proactive in designing certificate programs in evaluation. Although the US academic community has retained its focus on psycho-educational applications, the nature of basic evaluation training is of relevance in other public policy settings.
Most American universities offering Ph.D. and Masters degrees in evaluation do so in conjunction with another discipline (e.g., applied psychology). Some of these institutions also offer professional development workshop series or summer institutes in evaluation, usually with a certificate awarded upon completion. A typical full-time program aims to develop applied researchers, some of whom will end up as academics, consultants or government employees. Universities such as Claremont Graduate University in California have educated a large number of students in evaluation through such programs.
The part-time certificate programs are designed to help "accidental evaluators": individuals without prior experience in evaluation who find themselves in jobs requiring knowledge and skills in the theory and practice of evaluation. The vast majority of these candidates already hold a graduate degree when they enrol. Some of these programs are delivered through distance learning, and students may be invited to attend in person for summer events.
Among US universities that offer training programs in evaluation, there is support for the formation of a voluntary body that would accredit programs. Individuals who successfully completed these programs could be "credentialed," though all of this would remain on a voluntary basis. Such a system could provide quality assurance to government clients without the need for a government-level examination. Ultimately, the market would screen out people who do not meet the standards.
Proponents of a voluntary system see two stages of development. Initially, anyone who wants to be accredited to offer programs could come to the voluntary certifying body. In the longer term, the concept of a national board (such as teachers in the US) could develop, but this could be at least 15-20 years in the future.
Merits of a voluntary accreditation body
A number of part-time programs have emerged in the Canadian academic sector in the past few years and these certification programs are growing in number.
Most recently, the University of Ottawa is planning to offer a joint certificate program in Education and Social Sciences commencing in the fall of 2005. The two-year part-time curriculum will cover methods/practice; theory and interpretation issues; practicum; elective courses; and a synthesis paper of publishable quality.
The day is not far off when there will be a sufficient number of such programs to warrant the formation of a "certification network" in Canada. Participating schools could form a voluntary body, with the federal government and the CES acting as advisors. Agreement could be sought that the programs would train and test candidates so that graduates met the competency profiles endorsed by the advisory organizations.
Such a network of academic institutions holds great promise. They have been proactive in designing and offering evaluation courses and the timing is good to begin discussions on delegating the responsibility to test for competencies at the university level. Most likely, agreements would be required to have a joint CEE-CES committee, in an advisory capacity, "approve" the tests that these university programs would administer to attest that individuals met the required competencies.
Many legal and administrative problems can be removed from Government if agreement can be reached to designate academic institutions to certify that individuals meet acceptable standards and display the necessary levels of competence as specified for their jobs. At the same time, the transition to such a regime would create a gap, in that the bulk of current practitioners have not received such formal training. In a voluntary system, and without an enforcement body, consideration would be required for people already working in the field.
Some current evaluation professionals would want to attend part-time courses, either through the CEE or CES or in one of the emerging university programs. Given that such a system would likely be voluntary, consideration should be given to "grandparenting" current evaluators who applied for status. If the CES were to charge an annual fee (perhaps $200-300) above its typical membership fees, a fund could be built up to subsidize training for interested individuals wishing to enhance their evaluation skills in particular areas. The future of any such system will depend on the strength of the underlying training network and the promotion of continuous learning.
There will be adjustment costs and growing pains, because some people will be certified as having appropriate competencies when they may not merit such a status. Even a full certification system would have some flaws and likely misclassify people. But after a period of transition, perhaps eight to ten years, many of the older evaluators will have retired. The next generation will have much more standardized training and will have benefited from the emergence of programs that train and test the key skills and competencies. The critical factor in following such a path is the avoidance of the legal pitfalls and administrative costs associated with a full certification or accreditation system. These problems will not emerge in a system that attests to competence and skills.
After the transition period, more stringent requirements could be imposed for the next generation. But, as always, the system ought to be voluntary. As noted several times, the market will serve as the most efficient filter.
There is support among the federal evaluation community for the view that the "competencies" route is the most appealing and that basic screening and testing should be left to institutions. This may boost recruitment efforts in the public service. At the same time, the knowledge that a candidate displays the core competencies may not translate immediately into job performance. In order to determine how a junior evaluator performs on the job and how that individual thinks in a problem-solving situation, it would be useful to pair recruits with experienced evaluators to support the learning process. This approach is now followed at Health Canada and other departments and yields two-way benefits: new graduates come to the job with modern quantitative skills and often bring new energy and a fresh perspective to the work, which can benefit the more experienced evaluators.
Summary of Options for Improving the Professionalism of Evaluators
Licensing
- A licensing authority tests candidates to national standards and issues a licence giving the holder a legal right to perform certain duties.
- Requires national standards; negotiations with provinces; setting up a licensing agency; collection of fees; renewals; a redress mechanism.
- Likely implementation range: 8-10 years.
- Adherence to stringent standards would ensure evaluators had the desired skills.

Certification
- Granting of a written or printed statement testifying to an individual's qualifications to perform certain duties.
- Would require standard written and oral tests, specified educational requirements and agreement on a certification body.
- Likely implementation range: 5-7 years.

Credentialing
- A letter or certificate given to an individual stating that the person has the right to exercise a certain position or authority.
- TBS and the CES could agree on minimum standards of experience and training, and the society could issue letters attesting that individuals have the necessary credentials.
- Likely implementation range: 3-5 years.

Competencies
- Agreement on common skill sets that demonstrate an individual's competence to carry out evaluation work. These areas reflect the knowledge, skills, experience and attitudes that enable a researcher to function as an effective evaluator.
- TBS, the CEE and the AES have published competency profiles. Recent academic research has reconciled these and could support short-term action.
- Likely implementation range: 18 months-2 years.

Accreditation of Schools
- A recognized professional group (or institution) evaluates an instructional program against agreed standards. That program then certifies that its graduates meet the standards.
- There is an emerging opportunity to delegate responsibility to universities that are now training students in basic evaluation skills. Evaluator certificate courses are growing in number, and it would be most efficient to enter agreements with universities in all regions to train candidates to the standardized core competencies (at a junior level). It would also be possible to have the academic institutions offer part-time training at advanced levels of evaluation.
- Likely implementation range: 2-3 years.
- (Same cons as for competencies above.)
Towards a Learning Culture
Regardless of the path chosen to create standards for the practice of evaluation, an important element to support ongoing success will be the availability and assurance of continuous learning. TBS has taken on a prominent role in promoting the use of evaluation as a tool. There can be significant benefits from a focus on the coordination of training programs and promoting a continuous learning culture.
The Treasury Board Secretariat has documented the public service commitment to encourage and support employee efforts to improve and enhance their professional qualifications and accreditation through formal education, subject to their organizational mission and operational requirements.
Commitment 6 of the Treasury Board Policy for Continuous Learning is that "employees should be supported in their efforts to enhance their academic or professional qualifications or credentials. Encouraging employees to develop and enhance their professional qualifications and abilities, or pursue further accreditation in their field, will require a practical approach. It may involve partnering with recognized universities or colleges in specialized areas of study, or an expansion of the types of accreditation that are recognized as professional qualifications. However this commitment is approached, it must be supportive of training, development and learning in individual areas of specialization, while also respecting the requirements of the organization in fulfilling its mission."
The CEE has demonstrated its commitment to continuous learning in evaluation through its Learning Strategy and ongoing cooperative efforts with the Canadian Evaluation Society's National Capital Chapter (CES-NCC), one of 12 regional not-for-profit professional chapters of the Canadian Evaluation Society (see footnote 18). The CES Essential Skills Series is linked closely with the CEE Development Program for Evaluators. As well, the CEE takes an active role in all CES conferences and workshops.
The CEE has not benefited from the full support of federal departments in the development and delivery of its Learning Strategy (see footnote 19). That strategy, revised in February 2005, has been developed over several iterations to strengthen the evaluation function throughout the Government of Canada by adhering to the following principles:
- Focus on critical or core capacity requirements;
- Respect the diversity within the evaluation community;
- Build on experience and apply lessons learned;
- Recognize different learning needs and preferences; and
- Apply a long-term sustainable approach.
The Strategy serves practitioners, managers and users of program evaluations. More consultation may be needed to ensure that Heads of Evaluation, who often send staff away on formal training programs, commit to supporting the training made available by the CEE in cooperation with its delivery partners (see footnote 20). Further consultation with Heads of Evaluation Units may also be required to engage them fully in shaping the curriculum in core knowledge areas and specialized skills.
Some Advanced Evaluation Techniques
Randomized Controlled Trials (RCTs) are studies that measure an intervention's effect by randomly assigning individuals (or other units, such as schools or police precincts) to an intervention group that receives the intervention and a control group that does not. At some point following the intervention, measurements are taken to establish the difference between the two groups. Because the control group simulates what would have happened without the intervention, the difference in outcomes between the groups is said to demonstrate the impact one would expect from the intervention more generally. The US has applied the RCT approach in areas including education, health, nutrition, and anti-poverty programs. In one case, the approach was used with taxpayers to evaluate the effectiveness of various tax compliance strategies. One key benefit of this approach is the possibility of using data already being collected for other purposes.
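The arithmetic behind an RCT impact estimate is simply a difference in group means. The sketch below simulates this with made-up data; the effect size, outcome distribution and function name are illustrative assumptions, not drawn from any program cited in this paper:

```python
import random
import statistics

def simulate_rct(n=1000, true_effect=2.0, seed=42):
    """Randomly assign units to intervention or control groups,
    then estimate impact as the difference in mean outcomes."""
    rng = random.Random(seed)
    treatment, control = [], []
    for _ in range(n):
        baseline = rng.gauss(10.0, 3.0)   # outcome absent any intervention
        if rng.random() < 0.5:            # random assignment
            treatment.append(baseline + true_effect)
        else:
            control.append(baseline)
    # The control group stands in for the counterfactual, so the
    # difference in group means estimates the program's impact.
    return statistics.mean(treatment) - statistics.mean(control)

print(simulate_rct())  # an estimate close to the true effect of 2.0
```

Because assignment is random, the estimate converges on the true effect as the number of units grows; in a real evaluation the true effect is, of course, unknown.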
There are many programs for which it would not be possible to conduct an RCT. To carry out an RCT, there must be a possibility of selecting randomized intervention and control groups—those who will receive a program intervention and those who will not (or will receive a different intervention). For practical, legal, and ethical reasons, this may not always be possible in a public policy setting. Obviously, where the program provides a public good like national security, clean air or other basic benefits, no control group is possible.
Direct Controlled Trials are studies where various factors that might influence test results are directly controllable to such a degree that potentially undesirable or external influences are eliminated as significant uncertainties in the outcome of the trial. Such trials are most often possible in technology or engineering programs. The OMB cites the example of a newly developed weapon with a test plan that measures the performance of the new weapon under a hostile or adverse environment that simulates a battlefield situation. The performance of the weapon will be measured and analyzed using appropriate statistical and other analytic tools, and the results of that analysis will be compared to the pre-existing but demanding test performance thresholds. In such a case, this evaluation can provide the full measure of rigour needed for evaluation of the development program and for use in acquisition decisions.
Quasi-Experimental evaluations, like randomized controlled trials, compare the results of a federal activity with the results that would have occurred without the intervention. For example, for a welfare program, the comparison may be between an intervention group that receives the benefits of a program and a comparison group that does not. However, the comparison group is not randomly assigned. Instead, it is formed based on the judgment of the evaluator as to how to minimize any differences between the two groups, or it may be a pre-existing group. Quasi-experimental evaluations are often called "comparison group studies." Under certain circumstances, well-matched comparison group studies can approach the rigour of randomized controlled trials and should be considered if random assignment is not feasible or appropriate. However, use of comparison group studies does increase the risk of misleading results because of the difficulty of eliminating bias in the selection of the comparison group. Awareness of this risk is crucial to the design of such evaluations.
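A minimal sketch of forming a comparison group by matching appears below. The data and the single matching covariate are hypothetical; real comparison-group studies would match on many characteristics at once:

```python
def matched_comparison(treated, untreated):
    """Match each treated unit to the untreated unit with the closest
    covariate value (with replacement), then compare mean outcomes.
    Each unit is a (covariate, outcome) pair."""
    matched = [min(untreated, key=lambda u: abs(u[0] - t[0])) for t in treated]

    def mean_outcome(group):
        return sum(outcome for _, outcome in group) / len(group)

    # Selection bias remains a risk: differences not captured by the
    # matching covariate can still contaminate this estimate.
    return mean_outcome(treated) - mean_outcome(matched)

# Hypothetical units: (age, outcome score)
treated = [(25, 14.0), (30, 15.0), (35, 16.0)]
untreated = [(24, 12.2), (31, 13.1), (36, 13.9), (50, 18.0)]
print(round(matched_comparison(treated, untreated), 2))  # 1.93
```

Note how the poorly matched unit (age 50) is never selected; the evaluator's judgment in choosing matching variables drives the credibility of the result.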
Non-Experimental Direct Analysis involves evaluations that examine only the subject (e.g., a group) receiving the program intervention; there is no comparison subject. A common example, the "pre-post study," examines an intervention group alone, comparing outcomes before and after program benefits are received. "Longitudinal studies," which also examine changes over time and relate those changes back to the original condition of the intervention group, are another example. This class of non-experimental tools and methods includes some of the more traditional approaches, such as correlation analyses, surveys, questionnaires, participant observation studies, implementation studies, peer reviews, and case studies.
The concern with this class of evaluations is that they often lack rigour and may lead to false conclusions if used to measure program effectiveness. Therefore, their use may be most effective in situations where the object is to examine how or why a program is effective, or to provide useful feedback for program management.
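The pre-post design described above reduces to a single-group difference over time. A minimal sketch with hypothetical scores shows both the calculation and its central weakness:

```python
def pre_post_change(records):
    """Mean change in outcome from before to after an intervention for a
    single group. With no comparison group, the change cannot be
    attributed to the program alone: maturation, other events or
    measurement effects may explain part or all of it."""
    return sum(after - before for before, after in records) / len(records)

# Hypothetical (before, after) scores for one intervention group.
scores = [(10, 13), (8, 9), (12, 14)]
print(pre_post_change(scores))  # 2.0
```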
Non-Experimental Indirect Analysis is often useful in cases where research results are so preliminary in the near-term or so predominantly long-term in nature that a review by a panel of independent experts may be the most appropriate form of assessment. The use of such surrogate analysis must be justified for a specific program based on the lack of viable alternative evaluations that would provide for more meaningful conclusions. The OMB notes that, in some cases, such a review may be the best type of assessment available.
Other advanced techniques include Matched Comparison Designs and Interrupted Time Series Designs. These are discussed in more detail in P.T. Davies, "Policy Evaluation in the United Kingdom," 2004.
1. Professional practice
1.1 Applies professional evaluation standards
1.2 Acts ethically and strives for integrity and honesty in conducting evaluations
1.3 Conveys personal evaluation approaches and skills to potential clients
1.4 Respects clients, respondents, program participants, and other stakeholders
1.5 Considers the general and public welfare in evaluation practice
1.6 Contributes to the knowledge base of evaluation

2. Systematic inquiry
2.1 Understands the knowledge base of evaluation (terms, concepts, theories, assumptions)
2.2 Knowledgeable about quantitative methods
2.3 Knowledgeable about qualitative methods
2.4 Knowledgeable about mixed methods
2.5 Conducts literature reviews
2.6 Specifies program theory
2.7 Frames evaluation questions
2.8 Develops evaluation designs
2.9 Identifies data sources
2.11 Assesses validity of data
2.12 Assesses reliability of data
2.17 Provides rationales for decisions throughout the evaluation
2.18 Reports evaluation procedures and results
2.19 Notes strengths and limitations of the evaluation

3. Situational analysis
3.1 Describes the program
3.2 Determines program evaluability
3.3 Identifies the interests of relevant stakeholders
3.4 Serves the information needs of intended users
3.6 Examines the organizational context of the evaluation
3.7 Analyzes the political considerations relevant to the evaluation
3.8 Attends to issues of evaluation use
3.9 Attends to issues of organizational change
3.10 Respects the uniqueness of the evaluation site and client
3.11 Remains open to input from others
3.12 Modifies the study as needed

4. Management skills
4.1 Responds to requests for proposals
4.2 Negotiates with clients before the evaluation begins
4.3 Writes formal agreements
4.4 Communicates with clients throughout the evaluation process
4.5 Budgets an evaluation
4.6 Justifies cost given information needs
4.7 Identifies needed resources for evaluation, such as information, expertise, personnel, instruments
4.8 Uses appropriate technology
4.9 Supervises others involved in conducting the evaluation
4.10 Trains others involved in conducting the evaluation
4.11 Conducts the evaluation in a nondisruptive manner
4.12 Presents work in a timely manner

5. Reflective practice
5.1 Aware of self as an evaluator (knowledge, skills, dispositions)
5.2 Reflects on personal evaluation practice (competencies and areas for growth)
5.3 Pursues professional development in evaluation
5.4 Pursues professional development in relevant content areas
5.5 Builds professional relationships to enhance evaluation practice

6. Interpersonal skills
6.1 Uses written communication skills
6.2 Uses verbal/listening communication skills
6.3 Uses negotiation skills
6.4 Uses conflict resolution skills
6.5 Facilitates constructive interpersonal interaction (teamwork, group facilitation, processing)
6.6 Demonstrates cross-cultural competence
- American Evaluation Association, Guiding Principles for Evaluators, American Journal of Evaluation 26 (1), March 2005, pp. 5-7.
- Altschuld, J. W., Developing an evaluation program: Challenges in the teaching of Evaluation. Evaluation and Program Planning 18, pp. 259-265, 1995.
- Altschuld, James W., The Case for a Voluntary System for Credentialing Evaluators, American Journal of Evaluation, 20(3), 1999, pp. 507-517.
- Altschuld, J. W., The certification of evaluators: Highlights from a report submitted to the Board of Directors of the American Evaluation Association. American Journal of Evaluation, 20, 481-493, 1999.
- Barrett, Pat (Auditor General of Australia). "Results Based Management and Performance Reporting – An Australian Perspective," Address to the UN Results Based Management Seminar, Geneva, October, 2004.
- Canadian Evaluation Society, Essential skills series. http://www.evaluationcanada.ca, created in 1999.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, Achieving Excellence in Evaluation – A Learning Strategy, revised February 2005, RDIMS 265116.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Capturing Evaluation Findings: Evaluation Review Information Component (ERIC)". October 2004.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "CEE Monitoring Strategy, October 27, 2004".
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Centre for Excellence for Evaluation: 2003-04 to 2004-05." September 2004, RDIMS #247843.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, Community Development Strategy, March 13, 2002.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Evaluation Function in the Government of Canada," Draft, July 6, 2004.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Interim Evaluation of the Treasury Board Evaluation Policy." January 2003.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Preparing and Using Results-Based Management and Accountability Frameworks." January 2005.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Report on Consultations," by Peter Hadwen Consulting INC. March 2004.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Report on Effective Evaluation Practices," 2004.
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Review of the Quality of Evaluation across Departments and Agencies." Final Report, October 2004
- Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, "Synopsis of the Demographic Profiles of the Internal Audit and Evaluation Communities in the Canadian Federal Public Service."
- Centre of Excellence for Evaluation, Treasury Board Secretariat, "The Health of the Evaluation Function in the Government of Canada" (draft). Report for Fiscal Year 2004-05. February 2005.
- Davies, Philip Thomas, "Policy Evaluation in the United Kingdom" (Prime Minister's Strategy Unit, Cabinet Office), 2004.
- Ghere, G., J. A. King, L. Stevahn, and J. Minnema, "Linking Effective Professional Development and Program Evaluator Competencies," manuscript submitted for publication in the American Journal of Evaluation, November 1994.
- Hunt, Terry Dale. "Policy Evaluation System for the Government of Canada" (Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat), 2004.
- Jones, Steven C. and Blaine R. Worthen, AEA Members' Opinions Concerning Evaluator Certification, American Journal of Evaluation, 20(3), 1999, pp. 495-506.
- King, J. A., L. Stevahn, G. Ghere and J. Minnema, Toward a Taxonomy of Essential Evaluator Competencies, American Journal of Evaluation, 22(2), 2001, pp. 229-247.
- Kingsbury, Nancy and Terry E. Hedrick. Evaluator Training in a Government Setting, in The Preparation of Professional Evaluators: Issues Perspectives and Programs, American Evaluation Association, New Directions for Program Evaluation, Number 62, Summer 1994, pp. 61-70.
- Le Bouler, Stéphane, "Survey of Evaluation in France" (Commissariat Général du Plan), year?
- Long, Bud and Natalie Kishchuk. Professional Certification –A report to the National Council of the Canadian Evaluation Society on the experience of other organizations, October 1997.
- Love, Arnold, "Should Evaluators Be Certified?" in The Preparation of Professional Evaluators: Issues Perspectives and Programs, American Evaluation Association, New Directions for Program Evaluation, Number 62, Summer 1994, pp. 29-40.
- Lyon, Randolph Matthew, "The U.S. Program Assessment Rating Tool (PART)," Office of Management and Budget, United States Government, (year)?
- Office of Management and Budget, United States Government, Rating the Performance of Federal Programs, The Budget For Fiscal Year 2004, pp. 47-53.
- Office of Management and Budget, United States Government, Program Assessment Rating Tool (PART), http://www.whitehouse.gov/omb/part/
- Office of Management and Budget, United States Government, Examples of Performance Measures, http://www.whitehouse.gov/omb/part/performance_measure_examples.html
- Office of Management and Budget, United States Government, "What Constitutes Strong Evidence of a Program's Effectiveness?" http://www.whitehouse.gov/omb/part/2004_program_eval.pdf (PDF version 212 kb)
- Patton, Michael Quinn. "The Evaluator's Responsibility for Utilization," in Evaluation Practice 9 (2), 1988, pp. 5-24.
- Public Works and Government Services Canada, Government-Wide Review of Procurement, Concepts for Discussion, http://.pwgsc.gc.ca/prtf/text/concept_doc-e.html
- Smith, M. F., Should AEA Begin a Process for Restricting Membership in the Profession of Evaluation? American Journal of Evaluation, 20(3), 1999, pp. 521-531.
- Stevahn, Laurie, Jean A. King, Gail Ghere and Jane Minnema. Establishing Essential Competencies for Program Evaluators. Manuscript accepted for publication in the American Journal of Evaluation, March 2005.
- Treasury Board of Canada Secretariat, Evaluation Policy, April 1, 2001.
- Treasury Board of Canada Secretariat, A Context for Understanding, Interpreting, and Using the Competency Profile for the Federal Public Service Evaluation Community, http://www.tbs-sct.gc.ca/cee/stud_etud/context-eng.asp
- Treasury Board of Canada Secretariat, A Guide to Planning, Conducting, and Reporting on Internal Auditing Assurance Engagements in the Federal Government of Canada (April 2004), section 7, "Quality Assurance and Continuous Improvement," pp. 36-38.
- Treasury Board of Canada Secretariat, Building Community Capacity – Competency Profile for Federal Public Service Evaluation Professionals: http://www.tbs-sct.gc.ca/cee/stud-etud/capa-pote05-eng.asp
- Treasury Board of Canada Secretariat, Results Reporting Capacity Check: http://www.tbs-sct.gc.ca/cee/pubs/rrcc-dcrr-eng.asp
- Weiss, Carol H. "Evaluation for Decisions: Is Anybody There? Does Anybody Care?" in Evaluation Practice 9 (1), 1988, pp.5-19.
- Weiss, Carol H. "If Program Decisions Hinged Only on Information: A Response to Patton," in Evaluation Practice 9 (3), 1988, pp.15-28.
- Worthen, Blaine R., Critical Challenges Confronting Certification of Evaluators, American Journal of Evaluation, 20(3), 1999, pp. 533-555.
- Worthen, Blaine R. Is Evaluation a Mature Profession That Warrants the Preparation of Evaluation Professionals? In The Preparation of Professional Evaluators: Issues Perspectives and Programs, American Evaluation Association, New Directions for Program Evaluation, Number 62, Summer 1994, pp. 3-15.
- Zorzi, Rochelle, Martha McGuire and Burt Perrin. Evaluation Benefits, Outputs and Knowledge Elements. Canadian Evaluation Society Project in Support of Advocacy and Professional Development, October 2002.
Footnote 1: Two of the more recent studies are "Review of the Quality of Evaluation across Departments and Agencies," Final Report, October 2004, and "The Health of the Evaluation Function in the Government of Canada" (draft), Report for Fiscal Year 2004-05, February 2005.
Footnote 2: CEE, The Health of the Evaluation Function…, op. cit., p. 7.
Footnote 3: See Policy Evaluation System for the Government of Canada, T. D. Hunt, Treasury Board Secretariat, 2004.
Footnote 4: A Guide to Planning, Conducting, and Reporting on Internal Auditing Assurance Engagements in the Federal Government of Canada, April 2004, on the TBS website.
Footnote 5: Ibid., pp. 36-38.
Footnote 7: Under the MAF, departments must examine, improve and report on the following areas: Governance and Strategic Direction; Public Service Values; Policy and Programs; People; Citizen-focused Service; Risk Management; Stewardship; Accountability; Results and Performance; and Learning, Innovation and Change Management (from Hunt, 2004).
Footnote 8: The details of these approaches can be found in "Policy Evaluation in the United Kingdom" by Philip Thomas Davies (manuscript), 2004.
Footnote 9: One forum for keeping abreast of new (and old) instruments and how they are being applied is an Internet discussion group hosted by the American Evaluation Association (AEA). This forum, known as EvalTalk, can be joined via the AEA website; archives are available at http://bama.ua.edu/archives/evaltalk.html
Footnote 10: Annex 1 summarizes a discussion from the OMB website regarding some of the advanced techniques and their applicability.
Footnote 11: Hunt, Terry Dale. "Policy Evaluation System for the Government of Canada" (Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat), 2004, p. 7.
Footnote 12: Evaluation Function in the Government of Canada, July 2004, Appendix 2, section 5, Evaluation Standards and Criteria: http://www.tbs-sct.gc.ca/cee/pubs/func-fonc-eng.asp#s5
Footnote 13: American Journal of Evaluation, Vol. 26, No. 1, March 2005, pp. 5-7.
Footnote 14: Full details on the program can be found at the Procurement, Materiel Management and Real Property Communities Management Office: http://www.tbs-sct.gc.ca/pd-pp/
Footnote 15: Courses are available through PWGSC, the public service management school, the Materiel Management Institute, the Real Property Institute, and other departments that may offer certain training.
Footnote 16: The GAO experience on the preparation of professional evaluators is documented in Nancy Kingsbury, "Evaluator Training in a Government Setting," New Directions for Program Evaluation, No. 62, 1994.
Footnote 17: Stevahn, Laurie, Jean A. King, Gail Ghere and Jane Minnema, "Establishing Essential Competencies for Program Evaluators," manuscript accepted for publication in the American Journal of Evaluation, March 2005. This research integrates the key elements from the CES and AEA competencies. Their work makes reference to the TBS Evaluation Policy.
Footnote 18: CEE March 2005 Community Newsletter, Number 10.
Footnote 19: Centre of Excellence for Evaluation, Treasury Board of Canada Secretariat, Achieving Excellence in Evaluation – A Learning Strategy, revised February 2005, RDIMS 265116.
Footnote 20: Through Memoranda of Understanding, the following organizations help to deliver the various programs: the Canada School of Public Service, Statistics Canada and the Canadian Evaluation Society (as well as its National Capital Region chapter).