Types of Data and Indicators

Program-based versus Population-based Measures

Program-based data consist of information available from program sources (e.g., administrative records, client records, service statistics) or information that can be obtained from on-site collection (e.g., observation, client-provider interaction, client exit interviews, mystery client surveys, surveys with participants before and after a program, particularly a behavior change program). Routine health information systems are nonetheless the primary source of program-based data. Although some program-based data correspond to a limited network of clinics providing a specialized service, “program-based” also can refer to programs that are national in scope.

Program-based information is very important for understanding the performance of programs and the type of output they achieve (e.g., number of visits per month to a clinic, number of tetanus shots administered to pregnant women). However, program-based data do not reflect the extent of coverage of these programs unless one estimates a denominator for the catchment area that converts these program statistics into an estimated rate. Moreover, data from program participants are potentially biased (i.e., they do not necessarily reflect the situation of the general population) because of selectivity: persons who opt to participate in programs are often different from the population at large. NGOs tend to evaluate using program data alone, because they do not aspire to national coverage, or to coverage of the population at large even in the defined area in which they work.
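The conversion of a program statistic into a coverage rate can be sketched as follows. This is a minimal illustration, not a prescribed method: the function name, the parameters, and all figures are hypothetical, and the denominator is assumed to come from a catchment population multiplied by an estimated share of that population eligible for the service.

```python
def estimate_coverage(service_count, catchment_population, eligible_fraction):
    """Convert a program count into an estimated coverage rate.

    service_count: clients served, taken from program records (numerator).
    catchment_population: estimated total population of the catchment area.
    eligible_fraction: assumed share of that population eligible for the
        service (e.g., women expected to be pregnant in the period).
    """
    if catchment_population <= 0 or not 0 < eligible_fraction <= 1:
        raise ValueError("population must be positive and fraction in (0, 1]")
    eligible_population = catchment_population * eligible_fraction
    return service_count / eligible_population


# Hypothetical figures: 1,200 pregnant women received tetanus shots in a
# catchment area of 100,000 people, of whom an assumed 4% are eligible.
coverage = estimate_coverage(1200, 100_000, 0.04)
print(f"Estimated coverage: {coverage:.0%}")
```

The estimate is only as good as the denominator: an error in the assumed eligible share carries through proportionally to the rate.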

In contrast, governmental programs designed to have national coverage are evaluated in terms of their effect on the general public. The term “population-based” can also refer to a smaller geographic region (e.g., the catchment area for a demonstration project, such as a district), provided the data are drawn from a representative sample of the population. The primary sources of population-based data for reproductive health activities are the Demographic and Health Surveys (DHS) and the Reproductive Health Surveys (RHS). In addition, data are available in selected countries from national-level surveys funded by the host government or by other international agencies (e.g., the India Fertility Survey). The DHS and RHS surveys are particularly useful for measuring demand factors. The DHS service provision assessment (SPA) modules measure factors in the supply environment, but are not as widely applied as the household surveys.

Input, Process, Output, and Outcome

Prior to 2000, input, process, and output measures were commonly classified as program-based, in contrast to outcome, which was classified as population-based. This approach is useful in evaluating national programs, such as a national family planning (FP) program. However, it is somewhat less useful (especially the term “outcome”) for the evaluation of specific functional areas (such as training, logistics, and behavior change communication). For example, one objective of training programs is usually enhanced quality of care in the service delivery environment. Although the collective efforts of training will contribute to outcomes at the national level (e.g., increased contraceptive prevalence, increased number of women delivering with a skilled attendant), the most direct and measurable effect of training is improved service quality. In this sense, the desired outcome for a series of training events is quality of care in a specific network of facilities. These results are not population-based, yet they represent the appropriate endpoint in measuring and evaluating training programs.

Inputs are human and financial resources, physical facilities, equipment, and operational policies that enable program activities to be implemented.

Process refers to the multiple activities carried out to achieve the objectives of the program and includes both what is done and how well it is done.

Although a high level of input is generally reflected in satisfactory program implementation, it is theoretically possible to have a high level of input but a poorly delivered program (for example, if a high-level administrator opposed to FP were successful in blocking service delivery in facilities under his/her control). Conversely, there are countless real-life examples around the world where program staff, with highly inadequate resources, strive, nonetheless, to do the best work they can under the circumstances.

Output refers to the results of these efforts at the program level. Although program managers at the field level are interested in national trends that show the fruits of their efforts (e.g., contraceptive prevalence for FP, prevalence of breastfeeding for breastfeeding promotion), they tend to limit the evaluation of their own activities to program-based measures, especially measures of output. Two types of output are service output (which measures the adequacy of the service delivery system) and service utilization (which measures the extent to which clients use the services).

Outcome generally refers to results of programs measurable at the population level. The evaluation of outcome measures the effect that the program has had on the general population in the given catchment area (such as all women of reproductive age in a given country). It is important to distinguish between two kinds of outcome: intermediate and long-term. Intermediate outcomes tend to refer to specific behaviors or practices on the part of the intended audience (such as contraceptive use, breastfeeding, condom use, or consumption of micronutrient supplements) that will affect the desired long-term outcome (of reducing mortality, morbidity, or fertility). The long-term outcome refers to the anticipated results of the program (a change in morbidity, mortality, or fertility). However, the long-term outcome is almost always subject to the influence of non-program factors, including socio-economic conditions and the status of women in a given country.
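One way to keep these levels straight when assembling an indicator list is to tag each indicator with its level. The sketch below is purely illustrative: the class names, the `usually_population_based` helper, and the level assigned to each example are assumptions made for this example, and the helper encodes only the general tendency described above, not a fixed rule (outcomes for functional areas such as training may be measured at the facility level).

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    INPUT = "input"
    PROCESS = "process"
    OUTPUT = "output"
    INTERMEDIATE_OUTCOME = "intermediate outcome"
    LONG_TERM_OUTCOME = "long-term outcome"


@dataclass(frozen=True)
class Indicator:
    name: str
    level: Level

    @property
    def usually_population_based(self) -> bool:
        # General tendency only: outcomes are typically measured at the
        # population level; inputs, processes, and outputs at program level.
        return self.level in (Level.INTERMEDIATE_OUTCOME,
                              Level.LONG_TERM_OUTCOME)


# Illustrative tagging of examples mentioned in the text.
indicators = [
    Indicator("clinic visits per month", Level.OUTPUT),
    Indicator("contraceptive prevalence", Level.INTERMEDIATE_OUTCOME),
    Indicator("fertility rate", Level.LONG_TERM_OUTCOME),
]

population_based = [i.name for i in indicators if i.usually_population_based]
print(population_based)
```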

Whereas evaluators often report the findings from a program evaluation for the population as a whole, when possible, they should disaggregate the results by regions or subgroups, particularly by gender/sex, age, and some measure of socio-economic status. For example, a program that achieves results by providing services to privileged urban residents may achieve less than an equivalent program that reaches the urban or rural poor. Similarly, where data are available and the sample is sufficiently large, evaluators may wish to disaggregate data by ethnic, geographic, and other relevant sociodemographic factors. The reduction of inequalities among subgroups (e.g., more equitable use by women and men; more equitable use by unmarried youth and married women) and regions may be a programmatic objective above and beyond the improvement of the average among the general population. Evaluating this type of objective requires disaggregation and measurement by subgroups.
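Disaggregation of this kind can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the respondent records, the field names (`residence`, `uses_method`), and the helper `rate_by_group` are all hypothetical, and the calculation is unweighted, whereas real survey analysis would normally apply sampling weights.

```python
from collections import defaultdict


def rate_by_group(records, group_key, outcome_key):
    """Return {group: share of records in that group with a true outcome}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical, unweighted survey respondents.
respondents = [
    {"residence": "urban", "uses_method": True},
    {"residence": "urban", "uses_method": True},
    {"residence": "urban", "uses_method": True},
    {"residence": "urban", "uses_method": False},
    {"residence": "rural", "uses_method": True},
    {"residence": "rural", "uses_method": False},
    {"residence": "rural", "uses_method": False},
    {"residence": "rural", "uses_method": False},
]

overall = sum(r["uses_method"] for r in respondents) / len(respondents)
by_residence = rate_by_group(respondents, "residence", "uses_method")
gap = by_residence["urban"] - by_residence["rural"]

print(f"overall: {overall:.0%}, by residence: {by_residence}, gap: {gap:.0%}")
```

In this made-up sample the overall rate conceals a large difference between urban and rural respondents, which is why an equity objective cannot be evaluated from the average alone.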

Reproductive health interventions generally address deeply entrenched practices that will only change over an extended period of time. The unprecedented drop in fertility rates worldwide in the late 20th century occurred with record speed, yet it took 20-30 years to accomplish. From a program perspective, it is often impractical to report annually on long-term outcome indicators, even if those outcomes “really matter” (e.g., changes in mortality and fertility). Instead, program evaluation tends to focus on intermediate outcomes (also called effects) that are more directly linked to program effort and expected to change in a shorter period of time.

Quantitative versus Qualitative Indicators

Whereas quantitative research has dominated the field of health and social science research in the past, qualitative research gained wide acceptance during the 1990s, to the point that the latter has become an integral part of program evaluation, especially in relation to process evaluation (e.g., to measure client satisfaction or participant reactions to the program). Focus groups, in-depth interviews, observation, and interviews with key informants constitute the most commonly used qualitative methodologies. Although some researchers have attempted to quantify the results of qualitative techniques, the most appropriate and useful analyses capture the main ideas of respondents through narrative text rather than through percentages and other statistics.

Qualitative methods in program evaluation complement quantitative techniques and quantifiable indicators and are particularly useful in four areas:

Conducting needs assessments or formative research (to learn more about the local situation before designing the program);
Understanding the local terminology for a given subject (prior to finalizing quantitative data collection instruments);
Evaluating process (documenting the dynamics of how a program works, as well as its strengths and weaknesses); and
Developing a clearer understanding of the results obtained from a quantitative instrument (e.g., the attitudes, beliefs, and values that underlie a given finding).

By contrast, quantification is essential for measuring results and impact.