
Program Evaluation Methods



Appendix 1 - SURVEY RESEARCH

Section 4.5 of this publication discussed the use of surveys as a data collection method in evaluation, and gave references for further information and more detail. Indeed, the design of surveys should typically involve people with expertise in the field. Because surveys are so frequently used in evaluation, this appendix is included to give a more detailed overview of the major factors to consider in designing a survey. This appendix is not, however, a substitute for consultation with experts in the field.

Three basic elements are involved in survey research: designing the sample, selecting the survey method and developing the measuring instrument. Each element, along with its major problem areas, is briefly discussed below.

1.1 Sampling

When it is not possible or efficient to survey an entire population concerned with a program, a sampling procedure must be used. The scope and the nature of the sampling procedure should be geared to three specific requirements:

The need for the findings to be generalized to the appropriately defined population

Whenever conclusions are made about a whole population based on a sample survey, the evaluator must be sure that findings from the survey can be generalized to the population of interest. If such a need exists, a probability sample (as opposed to a non-probability sample) is usually required. Evaluators must be very alert to the possibility of statistical biases. A statistical bias usually occurs when a non-probability sample is treated as a probability sample and inappropriate inferences are drawn from it. Statistical bias is often the result of an inappropriate or careless use of probability sampling procedures.

The need for minimum precision requirements

The precision and the confidence level required in the survey must be stated. Statistical theory can provide estimates of sampling error for various sample sizes, that is, the precision of the estimates. The sample size should therefore be a function of the required level of precision. Evaluators should be more concerned with precision than with sample size alone. It is worth noting at this stage that there are different sample size formulas for different sampling procedures and different types of measurements (estimates), including the magnitude of a characteristic of the population and the proportion of the population in some category. It is not uncommon to find that one has used the wrong formula to compute the minimum sample size required.
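As an illustration of how precision drives sample size, the following minimal sketch computes the minimum sample size for estimating a population proportion. It assumes simple random sampling, a 95 per cent confidence level (z = 1.96) and hypothetical margins of error; other sampling procedures and other types of estimate require different formulas.

```python
import math

def sample_size_proportion(margin_of_error, p=0.5, z=1.96, population=None):
    """Minimum sample size for estimating a proportion to within the given
    margin of error at roughly a 95% confidence level (z = 1.96), under
    simple random sampling. p = 0.5 gives the most conservative size."""
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer units.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

# Tighter precision, not a larger population, is what drives sample size up.
print(sample_size_proportion(0.05))                   # roughly 385
print(sample_size_proportion(0.03))                   # roughly 1068
print(sample_size_proportion(0.05, population=2000))  # roughly 323
```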

The need to keep sampling cost within budget constraints

Certain sampling procedures, such as stratified sampling and replicate design, have been developed to reduce both the sample size and the cost of actually performing measurements. Sophistication in sampling can be cost effective.
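As a simple illustration of stratified sampling mechanics, the sketch below allocates a fixed total sample across strata in proportion to their population shares. The regional strata, their sizes and the total sample size are hypothetical, and other allocation rules (for example, ones that also account for within-stratum variability) may be more cost effective in practice.

```python
def proportional_allocation(strata_sizes, total_sample):
    """Allocate a fixed total sample across strata in proportion to each
    stratum's share of the population, using largest-remainder rounding."""
    population = sum(strata_sizes.values())
    raw = {name: total_sample * size / population
           for name, size in strata_sizes.items()}
    alloc = {name: int(share) for name, share in raw.items()}
    # Give any units lost to rounding to the strata with the largest remainders.
    leftover = total_sample - sum(alloc.values())
    for name in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

# Hypothetical strata of program recipients, by region.
print(proportional_allocation({"Atlantic": 1200, "Central": 5400, "West": 3400}, 400))
# {'Atlantic': 48, 'Central': 216, 'West': 136}
```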

Once these three requirements are specified, the sampling process can be established. This involves six steps.

  • (i) Define the population. This definition must be specific and detailed, often including time, location and socio-economic characteristics. For example, the population might be all females, 18 years and over, living in Ontario, who participated in the program during the period November 15-30, 1982, and who are currently employed.
  • (ii) Specify the sampling frame. A sampling frame is a list of the elements of the population (such as names in a telephone book, an electoral list or a list of recipients on file). If a sampling frame does not exist, it may have to be created (partially or wholly) through a sampling strategy.
  • (iii) Specify the sampling unit. This is the unit used for sampling and might be, for example, a geographic area, a city block, a household or a firm.
  • (iv) Specify the sampling method. This is the method by which the sampling units are to be selected and might be systematic or stratified sampling, for example.
  • (v) Determine the sample size. Decide how many sampling units and what percentage of the population are to be sampled.
  • (vi) Select the sample.
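The six steps above can be illustrated with the following minimal sketch, which assumes a hypothetical sampling frame of client records and systematic sampling as the sampling method; the field names, frame size and sample size are illustrative only.

```python
import random

def systematic_sample(frame, sample_size):
    """Steps (iv) to (vi): choose every k-th unit from the frame after a
    random start, where k is the sampling interval."""
    k = len(frame) // sample_size      # sampling interval
    start = random.randrange(k)        # random starting point within the interval
    return [frame[i] for i in range(start, len(frame), k)][:sample_size]

# Steps (i) and (ii): the population definition determines the frame; here the
# frame is a hypothetical list of client records, one per sampling unit (step iii).
frame = [{"client_id": i, "region": "Ontario"} for i in range(1, 2501)]

# Step (v): the sample size follows from the precision requirements discussed above.
sample = systematic_sample(frame, sample_size=385)
print(len(sample))
```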

Non-sampling errors may occur at each stage of this process. For example, the population defined may not match the target population, or a sampling frame may not correspond exactly to the population. When these problems occur, resulting measurements or inferences can be biased and, hence, misleading. For example, suppose that a survey of fund recipients was part of the evaluation of an industrial assistance program. Suppose that the sampling frame of companies included only those receiving more than a certain amount of money. Clearly, any generalization of the results to the population of all recipients of funds would not be valid if based on a sample chosen from this frame.

Non-sampling errors may also occur during virtually all of the survey activities. Respondents may interpret survey questions differently, mistakes may be made in processing results, or there may be errors in the frame. Non-sampling errors can occur in both sample surveys and censuses, whereas sampling errors can occur only in sample surveys.

1.2 Survey Methods

Typically, the data collection technique characterizes the survey. The choice of the collection technique is extremely important for any survey that depends on individual responses. The three basic procedures are discussed below.

Telephone Interviewing

The first technique is list sampling: the interviewer starts with a sampling frame containing telephone numbers, chooses a unit from this frame, and conducts the interview over the telephone, either with a specific person at the number or with anyone at that number. A second technique is random digit dialling, where, as the name suggests, the interviewer dials a number generated by some probability-based dialling system, without knowing whether there is a live connection at that number or whether it belongs to a business, a hospital or a household. In practice, list sampling and random digit dialling are used together. For example, it is common practice to use random digit dialling to produce an initial list of random numbers; a random mechanism is then used to draw numbers from this list to produce a final set for the sample.
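The combined approach can be sketched as follows, assuming hypothetical ten-digit numbers built from a fixed set of area code and exchange prefixes; real random digit dialling systems are considerably more elaborate.

```python
import random

def random_digit_numbers(exchanges, count):
    """Random digit dialling: append four random digits to a known prefix;
    some generated numbers will not be working numbers."""
    return [random.choice(exchanges) + f"{random.randrange(10000):04d}"
            for _ in range(count)]

# Random digit dialling first produces an initial list of random numbers...
initial_list = random_digit_numbers(["613555", "819555"], count=500)

# ...and a random mechanism then draws the final set for the sample from it.
final_sample = random.sample(initial_list, k=100)
print(final_sample[:3])
```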

Personal Interviewing

There are three basic approaches to collecting data through interviewing. All three should be considered in personal interviewing. While all three are possible in telephone interviewing, it is extremely rare that either of the first two is the optimal approach. Each technique requires different types of preparation, conceptualization and instrumentation, and each has its advantages and disadvantages. The three alternatives are as follows:

  • the informal conversational interview;
  • the general interview guide interview; and
  • the standardized format interview.

Informal conversational interview

This technique relies entirely on spontaneous questions arising from the natural flow of a conversation, often as part of an ongoing observation of the activities of the program. During this kind of interview, the people being talked to may not even realize that they are being interviewed. The strength of this technique is that it allows the evaluator to respond to individual and situational differences. Questions can be personalized to establish in-depth, non-threatening communication with the individual interviewees. It is particularly useful when the evaluator is able to explore the program over a long period of time, so that later interviews build on information obtained in earlier interviews.

The weakness of the informal conversational interview is that it requires a great deal of time to collect systematic information, because it may take several conversations before a uniform set of questions has been covered. This type of interview is also more open to interviewer effects and biases, since it depends to a large extent on the skills of the individual interviewer.

Interview guide

An interview guide is a list of issues or questions to be raised during the interview. It is prepared to ensure the same basic material is covered in all interviews. The guide provides topics or subject areas within which the interviewer is free to probe to obtain more complete information about the particular subject. In other words, it is a framework within which the interviewer develops questions, sequences those questions and makes decisions about which information to pursue in greater depth.

The strength of the interview guide is that it ensures the interviewer uses limited time to the best advantage. It helps make interviewing more systematic and comprehensive by directing the issues to be discussed in the interview. It is especially useful in group interviews, where a guide keeps the discussion focused, but allows individual perspectives to be identified.

There are several potential deficiencies to the technique. Using the interview guide, the interviewer may still inadvertently omit important topics. Interviewer flexibility in sequencing and wording questions can greatly reduce the comparability of the responses. The process may also appear more threatening to the interviewee, whose perception of an interviewer also affects the validity and reliability of what is recorded.

Standardized format interview

When it is desirable to obtain strictly comparable information from each interviewee, a standardized format may be used, in which each person is asked essentially the same questions. Before the interviews begin, open-ended and closed-ended interview questions are written out exactly as they are to be asked. Any clarifications or elaborations are written into the interview itself, as are any possible probing questions.

The standardized interview minimizes interviewer bias by having the interviewer ask the same questions of each respondent. The interview is systematic and needs minimal interviewer judgement. This technique also makes data analysis easier, because responses to the same questions can be grouped and compared directly. Another benefit is that decision makers can review the exact instrument before the interviews take place. Also, the interviewer is highly focused, which usually reduces the duration of the interview.

The weakness of this technique is that it does not allow the interviewer to pursue issues that may emerge only in the course of the interview, although including open-ended questions reduces this problem somewhat. A standardized interview restricts the extent to which individual differences and circumstances can be taken into account.

Combinations

In evaluation studies, a combination of the interview guide and standardized techniques is often found to be the best approach. Thus, in most cases, a number of questions will be worded in a predetermined fashion, but the interviewer is given flexibility in probing and gauging when it is appropriate to explore subjects in greater depth. A standardized interview format is often used in the initial parts of each interview, with the interviewer being freer to pursue other general subjects of interest for the remainder of the interview.

Mail-out Survey

The third basic survey method is a survey mailed to the respondent, who is expected to complete and return it. To keep response rates high and analysis meaningful, most mail-out surveys consist primarily of closed-ended questions. The advantage of mail-out questionnaires is that they are a cheap method of obtaining broad coverage. The advantage of quantitative closed-ended questions is that data analysis is relatively simple. Responses can be directly compared and easily aggregated. The disadvantage is that respondents must fit their experience and views into predetermined categories. This can distort what respondents really mean by limiting their choices. To partially overcome these difficulties, open-ended questions are often added to mail-out surveys. This allows participants to clarify and amplify their responses.

One of the major difficulties with mail-out surveys is non-response. Non-response is also a problem with personal and telephone surveys, but it is much more problematic with mail-out surveys. Non-response can be caused by many factors, including unavailability of respondents or refusal to participate in the survey. Three strategies are often used to increase the response rate:

  • telephone prompting;
  • interviews with non-respondents; and
  • mail follow-ups.

In the first case, non-respondents are eventually telephoned and urged to complete the questionnaire.

The second strategy involves taking a sample of non-respondents, and completing the survey with them through a telephone or personal interview. Weighting the results from these interviews, so that they represent the non-respondent population as a whole, and then combining the results with the respondent population allows for unbiased generalizations to the overall population. For this technique to be valid, the non-respondents must be sampled scientifically.

The third case, the use of follow-up mailed questionnaires, is similar to the use of telephone calls, although usually less effective. After a certain period of time, questionnaires are again mailed to non-respondents with a request for completion.

Obviously, time and money constraints may not allow a further increase in the response rate. One must, therefore, account for the non-response as part of the process of drawing conclusions about the population surveyed from information collected about the sample.

Non-response causes an estimation bias because those who return the survey may differ in attitude or interest from those who do not. Non-response bias can be dealt with using several methods, such as the sub-sampling of non-respondents described above.
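As an illustration of that adjustment, the sketch below weights the estimate from respondents and the estimate from a scientifically drawn subsample of non-respondents by each group's share of the original sample; all figures are hypothetical.

```python
def nonresponse_adjusted_mean(n_respondents, respondent_mean,
                              n_nonrespondents, nonrespondent_subsample_mean):
    """Weight the respondent estimate and the estimate from a random
    subsample of non-respondents by each group's share of the original sample."""
    total = n_respondents + n_nonrespondents
    return (n_respondents * respondent_mean
            + n_nonrespondents * nonrespondent_subsample_mean) / total

# Hypothetical survey: 600 of 1,000 questionnaires returned; follow-up interviews
# with a random subsample of the 400 non-respondents yield a lower average, so
# the unadjusted respondent-only figure would overstate the population value.
print(nonresponse_adjusted_mean(600, 72.0, 400, 58.0))  # 66.4
```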

Survey of Objects (An Inventory)

The above survey methods apply to surveying people. As well as surveying individuals, one might wish to survey other entities, such as buildings, houses and articles. The same sampling principles used for individuals hold for other entities. The most important component of a survey is a trained surveyor. It is up to the surveyor to ensure that appropriate measurements are taken, recorded and reported without error. There is as much, if not more, chance of measurement bias in surveys of other entities as there is for interviewer bias in interview surveys.

As an example of such a survey, suppose that an industrial assistance program encourages companies to build energy-saving factory equipment. A study could be conducted to survey, scientifically, a sample of such equipment, measuring energy savings. It is clearly imperative to have well-trained surveyors, equipped to carry out the required measurements accurately.

1.3 Measurement Instruments

Data collection usually involves some kind of measurement. The quality of an evaluation ultimately rests on the quality of its measures. Adequate attention should be devoted to developing measurement instruments that will yield valid and reliable data. (For a perceptive treatment of questionnaire design see Bradburn, et al., 1979.) In survey research, the measuring instrument is a questionnaire, and questionnaire construction is an imperfect art. It has been estimated that the common range of potential error created by ambiguous questions may be 20 or 30 percentage points, and it can be much higher. A guide entitled Basic Questionnaire Design is available from Statistics Canada.

The process of designing a questionnaire consists of five steps:

Define the concepts that need measurement

Surprisingly, the most difficult task in questionnaire development is to specify exactly what information is to be collected. Identifying the relevant information usually requires the following:

  • a review of similar studies and possibly some exploratory research;
  • a clear understanding of which evaluation issues are to be addressed in the survey;
  • an understanding of the concepts being measured and of how this can best be done;
  • a statement of the hypothesis to be tested;
  • an understanding of how the answers will furnish evidence about the evaluation issues addressed; and
  • an appreciation of the level of validity and reliability needed to produce credible evidence.

Before moving to the next step, one must translate the evaluation research objectives into information requirements that a survey can capture.

Format the questions (or items to be measured) and specify the scales

Questions can be formatted in different ways (open-response vs. closed-response; single choice vs. multiple choice, and the like). The scaling format (assigning numbers to the possible answers) is also important because of its effect on the validity of the measurements.

Word the questions

This is essentially a communication task; one should phrase questions that are free from ambiguity and bias, and which take into account the backgrounds of the respondents. In many program areas, pre-tested questions or measurements exist that the evaluator might find useful. For example, the University of Michigan Survey Research Center has described various measurements of social psychological attitudes and assessed the strengths and weaknesses of each (Robinson and Shaver, 1973).

Decide the order of the questions and the layout of the questionnaire

Design a sequence that builds up interest while avoiding order bias, such as when the sequence of questions seems to lead inevitably to a predetermined conclusion.

Pre-test the questionnaire

A pre-test will detect ambiguous questions, poor wording and omissions. It should be done on a small sample of the population of interest (see Smith, 1975).

1.4 Estimating Survey Costs

To estimate costs, sub-divide the survey into several self-contained components. Then look at the cost of carrying out each component in house or of contracting it out. The cost per completed interview should be based on the costs of survey design, data collection, data editing, coding, transposition of raw data to machine-readable form, tabulation and data analysis.
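As a simple illustration, the sketch below divides hypothetical component costs by an assumed number of completed interviews; the components and figures are illustrative only.

```python
def cost_per_completed_interview(component_costs, completed_interviews):
    """Divide the total cost of all survey components by the number of
    completed interviews."""
    return sum(component_costs.values()) / completed_interviews

# Hypothetical cost breakdown for a mail-out survey with telephone follow-up.
costs = {
    "survey design": 12000,
    "data collection": 30000,
    "editing and coding": 8000,
    "data capture": 5000,
    "tabulation and analysis": 10000,
}
print(cost_per_completed_interview(costs, completed_interviews=650))  # 100.0
```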

Contracted-out surveys can be purchased either from the Special Survey Groups at Statistics Canada or from a commercial survey firm. Statistics Canada publishes a directory of survey research organizations and their specialized skills.

1.5 Strengths and Weaknesses

The discussion below focuses on the use in evaluations of the three approaches for surveying individuals. For a discussion of strengths and weaknesses of the various statistical aspects of surveying, see Smith, 1975, Chapter 8 and Galtung, 1967.

Personal Interviewing

Face-to-face interviewing arouses initial interest and increases the rate of participation. It enables the evaluator to ask complex questions that may require explanation or visual and mechanical aids. The method allows the interviewer to clarify answers. It is usually preferred when a large amount of in-depth information is needed from respondents. Also, it is highly flexible, since irrelevant questions can be skipped and other questions added. Interviewers can observe respondent characteristics and record them. Personal interviewing can be used when no sampling frame or lists of respondents can be established. On the other hand, personal interviews are time consuming, difficult to administer and control, and quite costly. They also lend themselves to interviewer bias and chatty bias, as when certain individuals tend to be more outspoken and their views stand out.

Telephone Interviewing

Telephone interviewing is a fast, economical and easy technique to administer and control, if it is conducted from a central location. The results of the interview can be directly input into a computer if the telephone operator has a direct link to a computer terminal, making the method very efficient.

Telephone interviewing is a particularly effective method for gaining access to hard-to-reach people, such as busy executives. On the limitation side, telephone interviewing makes it difficult to conduct long interviews, to ask complex questions or to use visual or mechanical aids. Because some people have unlisted phone numbers, or no phone at all, the method may create sampling bias. Non-response bias could be a problem, since the respondent can hang up the phone at any moment. Also, chatty bias can be a problem with telephone interviewing.

Mail-out Surveys

While the main advantage of mail surveys is low cost, the main disadvantage is the large number of variables that cannot be controlled because there is no interviewer, such as the identity of the respondent, whom the respondent consults for help in answering questions, the speed of response, the order in which questions are answered, and the respondent's understanding of the questions. However, for many types of questions, there is consistent evidence that mail surveys yield more accurate results than other survey methods. Mail surveys can provide breadth of coverage, and individuals are often more open in writing than they would be verbally. Unfortunately, if the boon of the mail survey is its low cost, the bane is non-response and the bias this may create. As well, mail surveys are time consuming (time for postage, handling and responding) and preclude interviewer probing and clarification.

Summary

As we have seen, there are pros and cons to each survey method. The following factors should be used to evaluate each method:

  • accuracy (absence of bias);
  • amount of data that can be collected;
  • flexibility (meaning the potential for using a variety of questioning techniques);
  • sample bias (meaning the ability to draw a representative sample);
  • non-response bias (meaning that reluctant respondents could be systematically different from those who do answer);
  • cost per completed interview;
  • speed of response; and
  • operational feasibility (meaning the ability to meet various operational constraints, such as cost and staffing).

Surveys of objects involve objective information that is usually more valid and credible than the opinions and perceptions of individuals. However, these too are subject to a wide range of errors, including sampling (Was an appropriate sample of objects taken?) and measurement error (Is the measuring instrument accurate and is the evaluator measuring it appropriately?).

Finally, the best designed survey may still produce useless data if implemented improperly. Interviewers must be properly trained. It is essential to set aside resources and time to train those who do interviewing and coding. The reliability and the validity of the results will be increased by minimizing the inconsistency among interviewers' (and coders') understanding of the questionnaire, their skills and their instructions.

Appendix 2 - GLOSSARY OF TERMS

Accuracy: The difference between a sample estimate and the results that can be obtained from a census. For unbiased estimates, precision and accuracy are synonymous.

Attribution: The estimation of the extent to which any results observed are caused by a program, meaning that the program has produced incremental effects.

Breadth: The scope of the measurement's coverage.

Case study: A data collection method that involves in-depth studies of specific cases or projects within a program. The method itself is made up of one or more data collection methods (such as interviews and file review).

Causal inference: The logical process used to draw conclusions from evidence concerning what has been produced or "caused" by a program. To say that a program produced or caused a certain result means that, if the program had not been there (or if it had been there in a different form or degree), then the observed result (or level of result) would not have occurred.

Chatty bias: The bias that occurs when certain individuals are more outspoken than others and their views stand out.

Comparison group: A group not exposed to a program or treatment. Also referred to as a control group.

Comprehensiveness: Full breadth and depth of coverage on the evaluation issues of interest.

Conclusion validity: The ability to generalize the conclusions about an existing program to other places, times or situations. Both internal and external validity issues must be addressed if such conclusions are to be reached.

Confidence level: A statement that the true value of a parameter for a population lies within a specified range of values with a certain level of probability.

Control group: In quasi-experimental designs, a group of subjects that receives all influences except the program in exactly the same fashion as the treatment group (the latter called, in some circumstances, the experimental or program group). Also referred to as a non-program group.

Cost-benefit analysis: An analysis that combines the benefits of a program with the costs of the program. The benefits and costs are transformed into monetary terms.

Cost-effectiveness analysis: An analysis that combines program costs and effects (impacts). However, the impacts do not have to be transformed into monetary benefits or costs.

Cross-sectional data: Data collected at the same time from various entities.

Data collection method: The way facts about a program and its outcomes are amassed. Data collection methods often used in program evaluations include literature search, file review, natural observations, surveys, expert opinion and case studies.

Depth: A measurement's degree of accuracy and detail.

Descriptive statistical analysis: Numbers and tabulations used to summarize and present quantitative information concisely.

Diffusion or imitation of treatment: Respondents in one group get the effect intended for the treatment (program) group. This is a threat to internal validity.

Direct analytic methods: Methods used to process data to provide evidence on the direct impacts or outcomes of a program.

Evaluation design: The logical model or conceptual framework used to arrive at conclusions about outcomes.

Evaluation strategy: The method used to gather evidence about one or more outcomes of a program. An evaluation strategy is made up of an evaluation design, a data collection method and an analysis technique.

Ex ante cost-benefit or cost-effectiveness analysis: A cost-benefit or cost-effectiveness analysis that does not estimate the actual benefits and costs of a program but that uses hypothesized before-the-fact costs and benefits. This type of analysis is used for planning purposes rather than for evaluation.

Ex post cost-benefit or cost-effectiveness analysis: A cost-benefit or cost-effectiveness analysis that takes place after a program has been in operation for some time and that is used to assess actual costs and actual benefits.

Experimental (or randomized) designs: Designs that try to ensure the initial equivalence of one or more control groups to a treatment group, by administratively creating the groups through random assignment, thereby ensuring their mathematical equivalence. Examples of experimental or randomized designs are randomized block designs, Latin square designs, fractional designs and the Solomon four-group design.

Expert opinion: A data collection method that involves using the perceptions and knowledge of experts in functional areas as indicators of program outcome.

External validity: The ability to generalize conclusions about a program to future or different conditions. Threats to external validity include selection and program interaction; setting and program interaction; and history and program interaction.

File review: A data collection method involving a review of program files. There are usually two types of program files: general program files and files on individual projects, clients or participants.

History: Events outside the program that affect the responses of those involved in the program.

History and program interaction: The conditions under which the program took place are not representative of future conditions. This is a threat to external validity.

Ideal evaluation design: The conceptual comparison of two or more situations that are identical except that in one case the program is operational. Only one group (the treatment group) receives the program; the other groups (the control groups) are subject to all pertinent influences except for the operation of the program, in exactly the same fashion as the treatment group. Outcomes are measured in exactly the same way for both groups and any differences can be attributed to the program.

Implicit design: A design with no formal control group and where measurement is made after exposure to the program.

Inferential statistical analysis: Statistical analysis using models to confirm relationships among variables of interest or to generalize findings to an overall population.

Informal conversational interview: An interviewing technique that relies on the natural flow of a conversation to generate spontaneous questions, often as part of an ongoing observation of the activities of a program.

Input-output model: An economic model that can be used to analyze mutual interdependencies between different parts of an economy. The model is a systematic construct outlining the flow of goods and services among producing and consuming sections of an economy.

Instrumentation: The effect of changing measuring instruments from one measurement to another, as when different interviewers are used. This is a threat to internal validity.

Interaction effect: The joint net effect of two (or more) variables affecting the outcome of a quasi-experiment.

Internal validity: The ability to assert that a program has caused measured results (to a certain degree), in the face of plausible potential alternative explanations. The most common threats to internal validity are history, maturation, mortality, selection bias, regression artefacts, diffusion or imitation of treatment, and testing.

Interview guide: A list of issues or questions to be raised in the course of an interview.

Interviewer bias: The influence of the interviewer on the interviewee. This may result from several factors, including the physical and psychological characteristics of the interviewer, which may affect the interviewees and cause differential responses among them.

List sampling: Usually in reference to telephone interviewing, a technique used to select a sample. The interviewer starts with a sampling frame containing telephone numbers, selects a unit from the frame and conducts an interview over the telephone either with a specific person at the number or with anyone at the number.

Literature search: A data collection method that involves an examination of research reports, published papers and books.

Longitudinal data: Data collected over a period of time, sometimes involving a stream of data for particular persons or entities over time.

Macro-economic model: A model of the interactions between the goods, labour and assets markets of an economy. The model is concerned with the level of outputs and prices based on the interactions between aggregate demand and supply.

Main effects: The separate independent effects of each experimental variable.

Matching: Dividing the population into "blocks" in terms of one or more variables (other than the program) that are expected to have an influence on the impact of the program.

Maturation: Changes in the outcomes that are a consequence of time rather than of the program, such as participant aging. This is a threat to internal validity.

Measuring devices or instruments: Devices that are used to collect data (such as questionnaires, interview guidelines and observation record forms).

Measurement validity: A measurement is valid to the extent that it represents what it is intended and presumed to represent. Valid measures have no systematic bias.

Micro-economic model: A model of the economic behaviour of individual buyers and sellers in a specific market and set of circumstances.

Monetary policy: Government action that influences the money supply and interest rates. May also take the form of a program.

Mortality: Treatment (or control) group participants dropping out of the program. It can undermine the comparability of the treatment and control groups and is a threat to internal validity.

Multiple lines of evidence: The use of several independent evaluation strategies to address the same evaluation issue, relying on different data sources, on different analytical methods, or on both.

Natural observation: A data collection method that involves on-site visits to locations where a program is operating. It directly assesses the setting of a program, its activities and individuals who participate in the activities.

Non-probability sampling: Sampling in which not every unit of the population has a calculable, non-zero probability of being selected in the sample.

Non-response: A situation in which information from sampling units is unavailable.

Non-response bias: Potential skewing because of non-response. The answers from sampling units that do produce information may differ on items of interest from the answers from the sampling units that do not reply.

Non-sampling error: The errors, other than those attributable to sampling, that arise during the course of almost all survey activities (even a complete census), such as respondents' different interpretation of questions, mistakes in processing results or errors in the sampling frame.

Objective data: Observations that do not involve personal feelings and are based on observable facts. Objective data can be quantitatively or qualitatively measured.

Objectivity: Evidence and conclusions that can be verified by someone other than the original authors.

Order bias: A skewing of results caused by the order in which questions are placed in a survey.

Outcome effectiveness issues: A class of evaluation issues concerned with the achievement of a program's objectives and the other impacts and effects of the program, intended or unintended.

Plausible hypotheses: Likely alternative explanations or ways of accounting for program results, meaning those involving influences other than the program.

Population: The set of units to which the results of a survey apply.

Primary data: Data collected by an evaluation team specifically for the evaluation study.

Probability sampling: The selection of units from a population based on the principle of randomization. Every unit of the population has a calculable (non-zero) probability of being selected.

Qualitative data: Observations that are categorical rather than numerical, and often involve attitudes, perceptions and intentions.

Quantitative data: Observations that are numerical.

Quasi-experimental design: Study structures that use comparison groups to draw causal inferences but do not use randomization to create the treatment and control groups. The treatment group is usually given rather than created by the evaluator. The control group is selected to match the treatment group as closely as possible so that inferences on the incremental impacts of the program can be made.

Random digit dialling: In telephone interviewing, a technique used to select a sample. The interviewer dials a number, according to some probability-based dialling system, not knowing whether it is a valid operating number or whether it is a business, hospital or household that is being called.

Randomization: Use of a probability scheme for choosing a sample. This can be done using random number tables, computers, dice, cards and so forth.

Regression artefacts: Pseudo-changes in program results occurring when persons or treatment units have been selected for the program on the basis of their extreme scores. Regression artefacts are a threat to internal validity.

Reliability: The extent to which a measurement, when repeatedly applied to a given situation, consistently produces the same results if the situation does not change between the applications. Reliability can refer to the stability of the measurement over time or to the consistency of the measurement from place to place.

Replicate sampling: A probability sampling technique that involves the selection of a number of independent samples from a population rather than one single sample. Each of the smaller samples is termed a replicate and is independently selected on the basis of the same sample design.

Sample size: The number of units to be sampled.

Sample size formula: An equation that varies with the type of estimate to be made, the desired precision of the sample and the sampling method, and which is used to determine the required minimum sample size.

Sampling error: The error attributed to sampling and measuring a portion of the population rather than carrying out a census under the same general conditions.

Sampling frame: A list of the elements of a survey population.

Sampling method: The method by which the sampling units are selected (such as systematic or stratified sampling).

Sampling unit: The unit used for sampling. The population should be divisible into a finite number of distinct, non-overlapping units, so that each member of the population belongs to only one sampling unit.

Secondary data: Data collected and recorded by another (usually earlier) person or organization, usually for different purposes than the current evaluation.

Selection and program interaction: The uncharacteristic responsiveness of program participants because they are aware of being in the program or being part of a survey. This interaction is a threat to internal and external validity.

Selection bias: When the treatment and control groups involved in the program are initially statistically unequal in terms of one or more of the factors of interest. This is a threat to internal validity.

Setting and program interaction: When the setting of the experimental or pilot project is not typical of the setting envisioned for the full-scale program. This interaction is a threat to external validity.

Standard deviation: A measure of the dispersion of a set of numerical measurements (on an "interval scale"), indicating how closely individual measurements cluster around the mean.

Standardized format interview: An interviewing technique that uses open-ended and closed-ended interview questions written out before the interview in exactly the way they are asked later.

Statistical analysis: The manipulation of numerical or categorical data to predict phenomena, to draw conclusions about relationships among variables or to generalize results.

Statistical model: A model that is normally based on previous research and permits transformation of a specific impact measure into another specific impact measure, one specific impact measure into a range of other impact measures, or a range of impact measures into a range of other impact measures.

Statistically significant effects: Effects that are observed and are unlikely to result solely from chance variation. These can be assessed through the use of statistical tests.

Stratified sampling: A probability sampling technique that divides a population into relatively homogeneous layers called strata, and selects appropriate samples independently in each of those layers.

Subjective data: Observations that involve personal feelings, attitudes and perceptions. Subjective data can be quantitatively or qualitatively measured.

Surveys: A data collection method that involves a planned effort to collect needed data from a sample (or a complete census) of the relevant population. The relevant population consists of people or entities affected by the program (or of similar people or entities).

Testing bias: Changes observed in a quasi-experiment that may be the result of excessive familiarity with the measuring instrument. This is a potential threat to internal validity.

Treatment group: In research design, the group of subjects that receives the program. Also referred to as the experimental or program group.

Appendix 3 - BIBLIOGRAPHY

Abt, C.G., ed. The Evaluation of Social Programs. Thousand Oaks: Sage Publications, 1976.

Alberta, Treasury Department. Measuring Performance: A Reference Guide. Edmonton: September 1996.

Alkin, M.C. A Guide for Evaluation Decision Makers. Thousand Oaks: Sage Publications, 1986.

Angelsen, Arild and Ussif Rashid Sumaila. Hard Methods for Soft Policies: Environmental and Social Cost-benefit Analysis. Bergen, Norway: Michelsen Institute, 1995.

Australia, Department of Finance. Handbook of Cost-benefit Analysis. Canberra: 1991.

Babbie, E.R. Survey Research Methods. Belmont: Wadsworth, 1973.

Baird, B.F. Managerial Decisions under Uncertainty. New York: Wiley Interscience, 1989.

Behn, R.D. and J.W. Vaupel. Quick Analysis for Busy Decision Makers. New York: Basic Books, 1982.

Belli, P. Guide to Economic Appraisal of Development Projects. Washington, D.C.: World Bank, 1996.

Bentkover, J.D., V.T. Covello and J. Mumpower. Benefits Assessment: The State of the Art. Dordrecht, Holland: D. Reidel Publishing Co., 1986.

Berk, Richard A. and Peter H. Rossi. Thinking About Program Evaluation. Thousand Oaks: Sage Publications, 1990.

Bickman L., ed. Using Program Theory in Program Evaluation. V. 33 of New Directions in Program Evaluation. San Francisco: Jossey-Bass, 1987.

Blalock, H.M., Jr. Measurement in the Social Sciences: Theories and Strategies. Chicago: Aldine, 1974.

Blalock, H.M., Jr., ed. Causal Models in the Social Sciences. Chicago: Aldine, 1971.

Boberg, Alice L. and Sheryl A. Morris-Khoo. "The Delphi Method: A Review of Methodology and an Application in the Evaluation of a Higher Education Program," Canadian Journal of Program Evaluation. V. 7, N. 1, April-May 1992, pp. 27-40.

Boruch, R.F. "Conducting Social Experiments," Evaluation Practice in Review. V. 34 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1987, pp. 45-66.

Boruch, R.F., et al. Reanalyzing Program Evaluations - Policies and Practices for Secondary Analysis for Social and Education Programs. San Francisco: Jossey-Bass, 1981.

Boruch, R.F. "On Common Contentions About Randomized Field Experiments." In Gene V. Glass, ed. Evaluation Studies Review Annual. Thousand Oaks: Sage Publications, 1976.

Bradburn, N.M. and S. Sudman. Improving Interview Methods and Questionnaire Design. San Francisco: Jossey-Bass, 1979.

Braverman, Mark T. and Jana Kay Slater. Advances in Survey Research. V. 70 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1996.

Buffa, E.S. and J.S. Dyer. Management Science Operations Research: Model Formulation and Solution Methods. New York: John Wiley and Sons, 1977.

Cabatoff, Kenneth A. "Getting On and Off the Policy Agenda: A Dualistic Theory of Program Evaluation Utilization," Canadian Journal of Program Evaluation. V. 11, N. 2, Autumn 1996, pp. 35-60.

Campbell, D. "Considering the Case Against Experimental Evaluations of Social Innovations," Administrative Science Quarterly. V. 15, N. 1, 1970, pp. 111-122.

Campbell, D.T. "Degrees of Freedom and the Case Study," Comparative Political Studies. V. 8, 1975, 178-193.

Campbell, D.T. and J.C. Stanley. Experimental and Quasi-experimental Designs for Research. Chicago: Rand-McNally, 1963.

Canadian Evaluation Society, Standards Development Committee. "Standards for Program Evaluation in Canada: A Discussion Paper," Canadian Journal of Program Evaluation. V. 7, N. 1, April-May 1992, pp. 157-170.

Caron, Daniel J. "Knowledge Required to Perform the Duties of an Evaluator," Canadian Journal of Program Evaluation. V. 8, N. 1, April-May 1993, pp. 59-78.

Casley, D.J. and K. Kumar. The Collection, Analysis and Use of Monitoring and Evaluation Data. Washington, D.C.: World Bank, 1989.

Chatterjee, S. and B. Price. Regression Analysis by Example, 2nd edition. New York: John Wiley and Sons, 1995.

Chelimsky, Eleanor, ed. Program Evaluation: Patterns and Directions. Washington: American Society for Public Administration, 1985.

Chelimsky, Eleanor and William R. Shadish, eds. Evaluation for the 21st Century: A Handbook. Thousand Oaks: Sage Publications, 1997.

Chen H.T. and P.H. Rossi. "Evaluating with Sense: The Theory-driven Approach," Evaluation Review. V. 7, 1983, pp. 283-302.

Chen, Huey-Tsyh. Theory-driven Evaluations. Thousand Oaks: Sage Publications, 1990.

Chenery, H. and P. Clark. Inter-industry Economics. New York: John Wiley and Sons, 1959.

Ciarlo, J., ed. Utilizing Evaluation. Thousand Oaks: Sage Publications, 1984.

Clemen, R.T. Making Hard Decisions. Duxbury Press, 1991, sections 1-3.

Cook, T.D. and D.T. Campbell. Quasi-experimentation: Designs and Analysis Issues for Field Settings. Chicago: Rand-McNally, 1979.

Cook, T.D. and C.S. Reichardt, eds. Qualitative and Quantitative Methods in Evaluation Research. Thousand Oaks: Sage Publications, 1979.

Cordray D.S., "Quasi-Experimental Analysis: A Mixture of Methods and Judgement." In Trochim, W.M.K., ed. Advances in Quasi-experimental Design and Analysis. V. 31 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986, pp. 9-27.

Datta L. and R. Perloff. Improving Evaluations. Thousand Oaks: Sage Publications, 1979, Section II.

Delbecq, A.L., et al. Group Techniques in Program Planning: A Guide to the Nominal Group and Delphi Processes. Glenview: Scott, Foresman, 1975.

Dexter, L.A. Elite and Specialized Interviewing. Evanston, IL: Northwestern University Press, 1970.

Duncan, O.D. Introduction to Structural Equation Models. New York: Academic Press, 1975.

Eaton, Frank. "Measuring Program Effects in the Presence of Selection Bias: The Evolution of Practice," Canadian Journal of Program Evaluation. V. 9, N. 2, October-November 1994, pp. 57-70.

Favaro, Paul, Marie Billinger. "A Comprehensive Evaluation Model for Organizational Development," Canadian Journal of Program Evaluation. V. 8, N. 2, October-November 1993, pp. 45-60.

Fienberg, S. The Analysis of Cross-classified Categorical Data, 2nd edition. Cambridge, MA: MIT, 1980.

Fitzgibbon, C.T. and L.L. Morris. Evaluator's Kit, 2nd edition. Thousand Oaks: Sage Publications, 1988.

Fowler, Floyd J. Improving Survey Questions: Design and Evaluation. Thousand Oaks: Sage Publications, 1995.

Fox, J. Linear Statistical Models and Related Methods, with Applications to Social Research. New York: Wiley, 1984.

Gauthier, B., ed. Recherche Sociale: de la Problématique à la Collecte des Données. Montreal: Les Presses de l'Université du Québec, 1984.

Gliksman, Louis, et al. "Responders vs. Non-responders to a Mail Survey: Are They Different?" Canadian Journal of Program Evaluation. V. 7, N. 2, October-November 1992, pp. 131-138.

Globerson, Aryé, et al. You Can't Manage What You Don't Measure: Control and Evaluation in Organizations. Brookfield: Gower Publications, 1991.

Goldberger, A.S. and O.D. Duncan. Structural Equation Models in the Social Sciences. New York: Seminar Press, 1973.

Goldman, Francis and Edith Brashares. "Performance and Accountability: Budget Reform in New Zealand," Public Budgeting and Finance. V. 11, N. 4, Winter 1991, pp. 75-85.

Goode, W.J. and Paul K. Hatt. Methods in Social Research. New York: McGraw-Hill, 1952, Chapter 9.

Gordon, R.A. Economic Instability and Growth: The American Record. Harper & Row, 1974.

Guba, E.G. "Naturalistic Evaluation." in Cordray, D.S., et al., eds. Evaluation Practice in Review. V.V. 34 of New Directors for Program Evaluation. San Francisco: Jossey-Bass, 1987.

Guba, E.G. and Y.S. Lincoln. Effective Evaluation: Improving the Usefulness of Evaluation Results through Responsive and Naturalistic Approaches. San Francisco: Jossey-Bass, 1981.

Hanley, J.A. "Appropriate Uses of Multivariate Analysis," Annual Review of Public Health. Palo Alto: Annual Reviews Inc., 1983, pp. 155-180.

Hanushek, E.A. and J.E. Jackson. Statistical Methods for Social Scientists. New York: Academic Press, 1977.

Harberger, A.C. Project Evaluation: Collected Papers. Chicago: Markham Publishing Co., 1973.

Heilbroner, R.L. and Thurow, L.C. Economics Explained. Toronto: Simon and Schuster Inc., 1987.

Heise, D.R. Causal Analysis. New York: Wiley, 1975.

Henderson, J. and R. Quandt. Micro-economic Theory. New York: McGraw-Hill, 1961.

Hoaglin, D.C., et al. Data for Decisions. Cambridge, MA.: Abt Books, 1982.

Hudson, Joe, et al., eds. Action-oriented Evaluation in Organizations: Canadian Practices. Toronto: Wall and Emerson, 1992.

Huff, D. How to Lie with Statistics. Penguin, 1973.

Jolliffe, R.F. Common Sense Statistics for Economists and Others. Routledge and Kegan Paul, 1974.

Jorjani, Hamid. "The Holistic Perspective in the Evaluation of Public Programs: A Conceptual Framework," Canadian Journal of Program Evaluation. V. 9, N. 2, October-November 1994, pp. 71-92.

Katz, W.A. Introduction to Reference Work: Reference Services and Reference Processes, Volume II. New York: McGraw-Hill, 1982, Chapter 4.

Kenny, D.A. Correlation and Causality. Toronto: John Wiley and Sons, 1979.

Kerlinger, F.N. Behavioural Research: A Conceptual Approach. New York: Holt, Rinehart and Winston, 1979.

Kidder, L.H. and M. Fine. "Qualitative and Quantitative Methods: When Stories Converge." In Multiple Methods in Program Evaluation. V. 35 of New Directions in Program Evaluation. San Francisco: Jossey-Bass, 1987.

Kish, L. Survey Sampling. New York: Wiley, 1965.

Krause, Daniel Robert. Effective Program Evaluation: An Introduction. Chicago: Nelson-Hall, 1996.

Krueger, R.A. Focus Groups: A Practical Guide for Applied Research. Thousand Oaks: Sage Publications, 1988.

Leeuw, Frans L. "Performance Auditing and Policy Evaluation: Discussing Similarities and Dissimilarities," Canadian Journal of Program Evaluation. V. 7, N. 1, April-May 1992, pp. 53-68.

Leontief, W. Input-output Economics. New York: Oxford University Press, 1966.

Levine, M. "Investigative Reporting as a Research Method: An Analysis of Bernstein and Woodward's All The President's Men," American Psychologist. V. 35, 1980, pp. 626-638.

Love, Arnold J. Evaluation Methods Sourcebook II. Ottawa: Canadian Evaluation Society, 1995.

Mark, M.M. "Validity Typologies and the Logic and Practice of Quasi-experimentation." In Trochim, W.M.K., ed. Advances in Quasi-experimental Design and Analysis. V. 31 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986, pp. 47-66.

Martin, Lawrence L. and Peter M. Kettner. Measuring the Performance of Human Service Programs. Thousand Oaks: Sage Publications, 1996.

Martin, Michael O. and V.S. Mullis, eds. Quality Assurance in Data Collection. Chestnut Hill: Center for the Study of Testing, Evaluation, and Educational Policy, Boston College, 1996.

Maxwell, Joseph A. Qualitative Research Design: An Interactive Approach. Thousand Oaks: Sage Publications, 1996.

Mayne, John and Eduardo Zapico-Goñi. Monitoring Performance in the Public Sector: Future Directions From International Experience. New Brunswick, NJ: Transaction Publishers, 1996.

Mayne, John, et al., eds. Advancing Public Policy Evaluation: Learning from International Experiences. Amsterdam: North-Holland, 1992.

Mayne, John and R.S. Mayne, "Will Program Evaluation be Used in Formulating Policy?" In Atkinson, M. and Chandler, M., eds. The Politics of Canadian Public Policy. Toronto: University of Toronto Press, 1983.

Mayne, John. "In Defence of Program Evaluation," The Canadian Journal of Program Evaluation. V. 1, N. 2, 1986, pp. 97-102.

McClintock, C.C., et al. "Applying the Logic of Sample Surveys to Qualitative Case Studies: The Case Cluster Method." In Van Maanen, J., ed. Qualitative Methodology. Thousand Oaks: Sage Publications, 1979.

Mercer, Shawna L. and Vivek Goel. "Program Evaluation in the Absence of Goals: A Comprehensive Approach to the Evaluation of a Population-Based Breast Cancer Screening Program," Canadian Journal of Program Evaluation. V. 9, N. 1, April-May 1994, pp. 97-112.

Miles, M.B. and A.M. Huberman. Qualitative Data Analysis: A Sourcebook and New Methods. Thousand Oaks: Sage Publications, 1984.

Miller, J.C. III and B. Yandle. Benefit-cost Analyses of Social Regulation. Washington: American Enterprise Institute, 1979.

Moore, M.H. Creating Public Value: Strategic Management in Government. Boston: Harvard University Press, 1995.

Morris, C.N. and J.E. Rolph. Introduction to Data Analysis and Statistical Inference. Englewood Cliffs, NJ: Prentice Hall, 1981.

Mueller, J.H. Statistical Reasoning in Sociology. Boston: Houghton Mifflin, 1977.

Nachmias, C. and D. Nachmias. Research Methods in the Social Sciences. New York: St. Martin's Press, 1981, Chapter 7.

Nelson, R., M.J. Peck and E. Kalachek. Technology, Economic Growth and Public Policy. Washington, D.C.: Brookings Institution, 1967.

Nutt, P.C. and R.W. Backoff. Strategic Management of Public and Third Sector Organizations. San Francisco: Jossey-Bass, 1992.

O'Brecht, Michael. "Stakeholder Pressures and Organizational Structure," Canadian Journal of Program Evaluation. V. 7, N. 2, October-November 1992, pp. 139-147.

Office of the Auditor General of Canada. Bulletin 84-7, Photographs and Other Visual Aids.

Office of the Auditor General of Canada. "Choosing and Applying the Right Evidence-gathering Techniques in Value-for-money Audits," Benefit-cost Analysis. Ottawa: 1994, Appendix 5.Okun, A. The Political Economy of Prosperity. Norton, 1970.

Paquet, Gilles and Robert Shepherd. The Program Review Process: A Deconstruction. Ottawa: Faculty of Administration, University of Ottawa, 1996.

Patton, M.Q. Qualitative Evaluation Methods. Thousand Oaks: Sage Publications, 1980.

Patton, M.Q. Creative Evaluation, 2nd edition. Thousand Oaks: Sage Publications, 1986.

Patton, M.Q. Practical Evaluation. Thousand Oaks: Sage Publications, 1982.

Patton, M.Q. Utilization-focused Evaluation, 2nd edition. Thousand Oaks: Sage Publications, 1986.

Pearsol, J.A., ed. "Justifying Conclusions in Naturalistic Evaluations," Evaluation and Program Planning. V. 10, N. 4, 1987, pp. 307-358.

Perret, Bernard. "Le contexte français de l'évaluation: Approche comparative," Canadian Journal of Program Evaluation. V. 9, N. 2, October-November 1994, pp. 93-114.

Peters, Guy B. and Donald J. Savoie, Canadian Centre for Management Development. Governance in a Changing Environment. Montreal and Kingston: McGill-Queen's University Press, 1993.

Polkinghorn, R.S. Micro-theory and Economic Choices. Richard Irwin Inc., 1979.

Posavac, Emil J. and Raymond G. Carey. Program Evaluation: Methods and Case Studies, 5th edition. Upper Saddle River, NJ.: Prentice Hall, 1997.

Pressman, J.L. and A. Wildavsky. Implementation. Los Angeles: UCLA Press, 1973.

Ragsdale, C.T. Spreadsheet Modelling and Decision Analysis. Cambridge, MA: Course Technology Inc., 1995.

Reavy, Pat, et al. "Evaluation as Management Support: The Role of the Evaluator," Canadian Journal of Program Evaluation. V. 8, N. 2, October-November 1993, pp. 95-104.

Rindskopf D. "New Developments in Selection Modeling for Quasi-Experimentation." In Trochim, W.M.K., ed. Advances in Quasi-experimental Design and Analysis. V. 31 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986, pp. 79-89.

Rist, Ray C., ed. Program Evaluation and the Management of the Government. New Brunswick, NJ: Transaction Publishers, 1990.

Robinson, J.P. and P.R. Shaver. Measurement of Social Psychological Attitudes. Ann Arbor: Survey Research Center, University of Michigan, 1973.

Rossi, P.H. and H.E. Freeman. Evaluation: A Systematic Approach, 2nd edition. Thousand Oaks: Sage Publications, 1989.

Rossi, P.H., J.D. Wright and A.B. Anderson, eds. Handbook of Survey Research. Orlando: Academic Press, 1985.

Rush, Brian and Alan Ogborne. "Program Logic Models: Expanding their Role and Structure for Program Planning and Evaluation," Canadian Journal of Program Evaluation. V. 6, N. 2, October-November 1991, pp. 95-106.

Rutman, L. and John Mayne. "Institutionalization of Program Evaluation in Canada: The Federal Level." In Patton, M.Q., ed. Culture and Evaluation. V. 25 of New Directions in Program Evaluation. San Francisco: Jossey-Bass, 1985.

Ryan, Allan G. and Caroline Krentz. "All Pulling Together: Working Toward a Successful Evaluation," Canadian Journal of Program Evaluation. V. 9, N. 2, October-November 1994, pp. 131-150.

Ryan, Brenda and Elizabeth Townsend. "Criteria Mapping," Canadian Journal of Program Evaluation, V. 4, N. 2, October-November 1989, pp. 47-58.

Samuelson, P. Foundations of Economic Analysis. Cambridge, MA: Harvard University Press, 1947.

Sang, H.K. Project Evaluation. New York: Wilson Press, 1988.

Sassone, P.G. and W.A. Schaffer. Cost-benefit Analysis: A Handbook. New York: Academic Press, 1978.

Schick, Allen. The Spirit of Reform: Managing the New Zealand State. Report commissioned by the New Zealand Treasury and the State Services Commission, 1996.

Schmid, A.A. Benefit-cost Analysis: A Political Economy Approach. Boulder: Westview Press, 1989.

Seidle, Leslie. Rethinking the Delivery of Public Services to Citizens. Montreal: The Institute for Research on Public Policy (IRPP), 1995.

Self, P. Econocrats and the Policy Process: The Politics and Philosophy of Cost-benefit Analysis. London: Macmillan, 1975.

Shadish, William R., et al. Foundations of Program Evaluation: Theories of Practice. Thousand Oaks: Sage Publications, 1991.

Shea, Michael P. and John H. Lewko. "Use of a Stakeholder Advisory Group to Facilitate the Utilization of Evaluation Results," Canadian Journal of Program Evaluation. V. 10, N. 1, April-May 1995, pp. 159-162.

Shea, Michael P. and Shelagh M.J. Towson. "Extent of Evaluation Activity and Evaluation Utilization of CES Members," Canadian Journal of Program Evaluation. V. 8, N. 1, April-May 1993, pp. 79-88.

Silk, L. The Economists. New York: Avon Books, 1976.

Simon, H. "Causation." In Sills, D.L., ed. International Encyclopedia of the Social Sciences, V. 2. New York: Macmillan, 1968, pp. 350-355.

Skaburskis, Andrejs and Fredrick C. Collignon. "Cost-effectiveness Analysis of Vocational Rehabilitation Services," Canadian Journal of Program Evaluation. V. 6, N. 2, October-November 1991, pp. 1-24.

Skelton, Ian. "Sensitivity Analysis in Multi-criteria Decision Aids: A Demonstration of Child Care Need Assessment," Canadian Journal of Program Evaluation. V. 8, N. 1, April-May 1993, pp. 103-116.

Sprent, P. Statistics in Action. Penguin, 1977.

Statistics Canada. A Compendium of Methods for Error Evaluation in Censuses and Surveys. Ottawa: 1978, Catalogue 13-564E.

Statistics Canada. Quality Guidelines, 2nd edition. Ottawa: 1987.

Statistics Canada. The Input-output Structures of the Canadian Economy 1961-81. Ottawa: April 1989, Catalogue 15-201E.

Stolzenberg, R.M. and K.C. Land. "Causal Modeling and Survey Research." In Rossi, P.H., et al., eds. Handbook of Survey Research. Orlando: Academic Press, 1983, pp. 613-675.

Stouthamer-Loeber, Magda, and Welmoet Bok van Kammen. Data Collection and Management: A Practical Guide. Thousand Oaks: Sage Publications, 1995.

Suchman, E.A. Evaluative Research: Principles and Practice in Public Service and Social Action Programs. New York: Russell Sage, 1967.

Sugden, R. and A. Williams. The Principles of Practical Cost-benefit Analysis. Oxford: Oxford University Press, 1978.

Tellier, Luc-Normand. Méthodes d'évaluation des projets publics. Sainte-Foy: Presses de l'Université du Québec, 1994, 1995.

Thomas, Paul G. The Politics and Management of Performance Measurement and Service Standards. Winnipeg: St. John's College, University of Manitoba, 1996.

Thompson, M. Benefit-cost Analysis for Program Evaluation. Thousand Oaks: Sage Publications, 1980.

Thurston, W.E. "Decision-making Theory and the Evaluator," Canadian Journal of Program Evaluation. V. 5, N. 2, October-November 1990, pp. 29-46.

Treasury Board of Canada, Secretariat. Benefit-cost Analysis Guide. Ottawa: 1997.

Treasury Board of Canada, Secretariat. Federal Program Evaluation: A Compendium of Evaluation Utilization. Ottawa: 1991.

Treasury Board of Canada, Secretariat. Getting Government Right: Improving Results Measurement and Accountability - Annual Report to Parliament by the President of the Treasury Board. Ottawa: October 1996.

Treasury Board of Canada, Secretariat. A Guide to Quality Management. Ottawa: October 1992.

Treasury Board of Canada, Secretariat. Guides to Quality Services: Quality Services - An Overview. Ottawa: October 1995; Guide I - Client Consultation. Ottawa: October 1995; Guide II - Measuring Client Satisfaction. Ottawa: October 1995; Guide III - Working with Unions. Ottawa: October 1995; Guide IV - A Supportive Learning Environment. Ottawa: October 1995; Guide V - Recognition. Ottawa: October 1995; Guide VI - Employee Surveys. Ottawa: October 1995; Guide VII - Service Standards. Ottawa: October 1995; Guide VIII - Benchmarking and Best Practices. Ottawa: October 1995; Guide IX - Communications. Ottawa: October 1995; Guide X - Benchmarking and Best Practices. Ottawa: March 1996; Guide XI - Effective Complaint Management. Ottawa: June 1996; Guide XII - Who is the Client? - A Discussion. Ottawa: July 1996; Guide XIII - Manager's Guide for Implementing. Ottawa: September 1996.

Treasury Board of Canada, Secretariat. Into the 90s: Government Program Evaluation Perspectives. Ottawa: 1991.

Treasury Board of Canada, Secretariat. Measuring Client Satisfaction: Developing and Implementing Good Client Satisfaction Measurement and Monitoring Practices. Ottawa: October 1991.

Treasury Board of Canada, Secretariat. Quality and Affordable Services for Canadians: Establishing Service Standards in the Federal Government - An Overview. Ottawa: December 1994.

Treasury Board of Canada, Secretariat. "Review, Internal Audit and Evaluation," Treasury Board Manual. Ottawa: 1994.

Treasury Board of Canada, Secretariat. Service Standards: A Guide to the Initiative. Ottawa: February 1995.

Treasury Board of Canada, Secretariat. Strengthening Government Review - Annual Report to Parliament by the President of the Treasury Board. Ottawa: October 1995.

Treasury Board of Canada, Secretariat. Working Standards for the Evaluation of Programs in Federal Departments and Agencies. Ottawa: July 1989.

Trochim, W.M.K., ed. Advances in Quasi-experimental Design and Analysis. V. 31 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986.

Uhl, Norman and Carolyn Wentzel. "Evaluating a Three-day Exercise to Obtain Convergence of Opinion," Canadian Journal of Program Evaluation. V. 10, N. 1, April-May 1995, pp. 151-158.

Van Maanen, J., ed. Qualitative Methodology. Thousand Oaks: Sage Publications, 1983.

Van Pelt, M. and R. Timmer. Cost-benefit Analysis for Non-Economists. Netherlands Economic Institute, 1992.

Warwick, D.P. and C.A. Lininger. The Survey Sample: Theory and Practice. New York: McGraw-Hill, 1975.

Watson, D.S. Price Theory in Action. Boston: Houghton Mifflin, 1970.

Watson, Kenneth. "Selecting and Ranking Issues in Program Evaluations and Value-for-money Audits," Canadian Journal of Program Evaluation. V. 5, N. 2, October-November 1990, pp. 15-28.

Watson, Kenneth. "The Social Discount Rate," Canadian Journal of Program Evaluation. V. 7, N. 1, April-May 1992, pp. 99-118.

Webb, E.J., et al. Nonreactive Measures in the Social Sciences, 2nd edition. Boston: Houghton Mifflin, 1981.

Weisberg, Herbert F., Jon A. Krosnick and Bruce D. Bowen, eds. An Introduction to Survey Research, Polling, and Data Analysis. Thousand Oaks: Sage Publications, 1996.

Weisler, Carl E. U.S. General Accounting Office. Review Topics in Evaluation: What Do You Mean by Secondary Analysis?

Williams, D.D., ed. Naturalistic Evaluation. V. 30 of New Directions for Program Evaluation. San Francisco: Jossey-Bass, 1986.

World Bank, Economic Development Institute. The Economics of Project Analysis: A Practitioner's Guide. Washington, D.C.: 1991.

Wye, Christopher G. and Richard C. Sonnichsen, eds. Evaluation in the Federal Government: Changes, Trends and Opportunities. San Francisco: Jossey-Bass, 1992.

Yates, Brian T. Analyzing Costs, Procedures, Processes, and Outcomes in Human Services. Thousand Oaks: Sage Publications, 1996.

Yin, R. The Case Study as a Rigorous Research Method. Thousand Oaks: Sage Publications, 1986.

Zanakis, S.H., et al. "A Review of Program Evaluation and Fund Allocation Methods within the Service and Government," Socio-economic Planning Sciences. V. 29, N. 1, March 1995, pp. 59-79.

Zúñiga, Ricardo. L'évaluation dans l'action : choix de buts et choix de procédures. Montreal: Librairie de l'Université de Montréal, 1992.

Appendix 4 - ADDITIONAL REFERENCES

Administrative Science Quarterly

American Sociological Review

The Canadian Journal of Program Evaluation, official journal of the Canadian Evaluation Society

Canadian Public Administration

Canadian Public Policy

Evaluation and Program Planning

Evaluation Practice, formerly Evaluation Quarterly

Evaluation Review

Human Organization

International Review of Administrative Sciences

Journal of the American Statistical Association

Journal of Policy Analysis and Management

Management Science

New Directions for Program Evaluation, official journal of the American Evaluation Association

Optimum

Policy Sciences

Psychological Bulletin

Public Administration

Public Administration Review

The Public Interest

Public Policy

Survey Methodology Journal

In addition, evaluation-related journals exist for specific program sectors, such as health services, education, social services and criminal justice.


