Description of the LSMS Household Surveys

The following is an excerpt from:
Grosh, M. & Glewwe, P. A Guide to Living Standards Surveys and Their Data Sets. LSMS Working Paper #120, The World Bank, 1995 (updated on March 1, 1996).


What is an LSMS Survey?

Two characteristics distinguish LSMS surveys: (i) multi-topic questionnaires designed to study multiple aspects of household welfare and behavior and (ii) extensive quality control features. In the past, a survey was considered to be an LSMS survey if it was carried out with technical assistance from staff in the relevant division at the World Bank, but, as the number of surveys grows and as the role of the division changes, this is no longer a reliable indicator of how well a survey fits the LSMS prototype. This document looks at each characteristic in turn, and examines variations in the design and content of existing LSMS-type surveys.

Multi-Topic Questionnaires
The main objective of LSMS surveys is to collect household data that can be used to assess household welfare, to understand household behavior, and to evaluate the effect of various government policies on the living conditions of the population. Accordingly, LSMS surveys collect data on many dimensions of household well-being, including consumption, income, savings, employment, health, education, fertility, nutrition, housing and migration (see Table 1).

Three different kinds of questionnaires are normally used: the household questionnaire, which collects detailed information on the household members; the community characteristics questionnaire, in which key community leaders and groups are asked about community infrastructure; and the price questionnaire, in which market vendors are asked about prices. A fourth type, the school or health facility questionnaire, is sometimes used as well.

Household Questionnaire. Because welfare is measured by consumption in most LSMS research on poverty, measurement of consumption is strongly emphasized in the questionnaires. There are detailed questions on cash expenditures, on the value of food items grown at home or received as gifts and on the ownership of housing and durable goods (for example, cars, televisions, bicycles and sewing machines) to make it possible to assign them a use rental value.

A wide range of income information is also collected. For individuals in formal sector jobs, most surveys contain detailed questions about wages, bonuses and various forms of in-kind compensation. Information is usually sought on secondary as well as principal jobs. At the household level, lengthy agriculture and small enterprise modules are designed to yield estimates of net household income from these activities. Other sources of miscellaneous income, such as the receipt of private transfers (for example, child support or remittances from abroad), public transfers (in cash or in kind), lottery winnings and interest income, are recorded as well.

Collecting data on a variety of household characteristics (including those on health, education, fertility and migration) from the same households makes it possible to analyze the important relationships among different aspects that make up the quality of life, such as the impact of parents' education on child nutrition or the effect of health status on employment. The sectoral modules collect such information. However, they are shorter and provide less detail on any one topic than a single-topic survey would.

Community Questionnaires. In order to limit the length of the household questionnaire, information on local conditions that are common to all households in the area is gathered in the community questionnaire. These questionnaires are typically used only in rural areas, where local communities are easier to define than in urban areas. The information covered by the questionnaire usually includes the location and quality of nearby health facilities and schools, the condition of local infrastructure such as roads, the sources of fuel and water, the availability of electricity, means of communication and agricultural conditions and practices.

Price Questionnaires. In countries where prices vary considerably among regions, it is important to gather information on the prices that households actually face. Thus, in most LSMS surveys, questionnaires have been developed to compile information on the prices of commonly purchased goods.

Special Facility Questionnaires. Sometimes very detailed information on schools or health clinics is desired. When this is the case, special facility questionnaires may be developed to supplement or replace those sections of the community questionnaire.

Extensive Quality Control Procedures
In order to minimize errors and delays in data processing, LSMS surveys are implemented using procedures that resolve most inconsistencies in the data before they reach the central statistical office. Here, we will highlight those elements that are distinctive in LSMS surveys, as opposed to those that LSMS surveys share with other good household surveys.

Questionnaire Format. Several features of the questionnaire help to minimize interviewer error. For example, the questionnaire makes extensive use of screening questions so that the skip pattern is automatic, requiring virtually no decisionmaking by the interviewer. All of the questions are written out exactly as they are to be asked. Moreover, suggested questions for further probing are printed on the questionnaire for consumption items, crops and durable goods. Together, these features reduce the conceptual skills required of the interviewers and the potential for variation among them, and also save time as the interviewer does not have to pause to consider how to phrase each question or how to follow the skip pattern.

Other features eliminate a number of steps (and thus the opportunity for error and delay) in the processing of data. Almost all potential responses to each question are marked on the questionnaire with a numbered code, and the interviewer writes only the response code on the questionnaire. Further, the household questionnaire is designed so that the data can be entered into the computer straight from the completed questionnaire, thus eliminating the additional step of transcribing codes onto data entry sheets.
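The screening questions, automatic skip pattern and numbered response codes described above can be pictured with a small sketch. The question identifiers, wording, codes and skip targets below are all invented for illustration and are not taken from any actual LSMS questionnaire; the point is only that the next question follows mechanically from the code written down, so the interviewer makes no routing decision.

```python
# Hypothetical questionnaire fragment: question ids, wording, codes and skips
# are invented for illustration and are not taken from any LSMS instrument.
QUESTIONS = {
    "Q1": {
        "text": "Did you work for pay during the past 7 days?",
        "codes": {1: "Yes", 2: "No"},
        "next": {1: "Q2", 2: "Q4"},   # a "No" skips the job questions
    },
    "Q2": {
        "text": "What kind of employer did you work for?",
        "codes": {1: "Government", 2: "Private firm", 3: "Self-employed"},
        "next": {1: "Q3", 2: "Q3", 3: "Q3"},
    },
    "Q3": {
        "text": "Did you receive any payment in kind?",
        "codes": {1: "Yes", 2: "No"},
        "next": {1: "Q4", 2: "Q4"},
    },
    "Q4": {
        "text": "Did you look for work during the past 7 days?",
        "codes": {1: "Yes", 2: "No"},
        "next": {1: None, 2: None},   # None marks the end of the fragment
    },
}


def administer(answers, start="Q1"):
    """Walk the skip pattern; return (question id, code) pairs in the order asked."""
    recorded = []
    current = start
    while current is not None:
        question = QUESTIONS[current]
        code = answers[current]
        if code not in question["codes"]:
            raise ValueError(f"{current}: {code} is not a printed response code")
        recorded.append((current, code))
        current = question["next"][code]
    return recorded
```

Calling `administer({"Q1": 2, "Q4": 1})` records only Q1 and Q4, because the "No" code at the screening question routes past the job questions. Only numeric codes are recorded, which is what allows data entry straight from the completed form.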

An important element in the design of the LSMS questionnaire is that changes can be made to the questionnaire quickly and easily, either in response to the field test or over the years as policy needs change. The questionnaires are produced on microcomputers using special formatting packages. This also simplifies translations as the verbal parts can be overwritten in the local language, leaving the skip codes, the response codes and the general format intact.

Organization of Fieldwork. Fieldwork and data entry are highly decentralized in full-fledged LSMS surveys. The core work is performed by a team consisting of a supervisor, two interviewers, an anthropometrist, a data entry operator and a driver. The team is based in a regional office equipped with a personal computer for data entry. The data entry operator works only at the field office, while the other members of the team travel between the field sites and the office. Teams are supervised and supported by a national survey directorate, consisting of the survey director and assistants responsible for field operations and data management.

As the surveys are carried out in two rounds (which allows for a two-week waiting period between completing the first and second halves of the household questionnaire), it is possible to check the data from the first round for consistency before the second round of questioning is administered in the field. Thus, any inconsistencies detected from the first round of interviews can then be cleared up directly with respondents during the second round of the interviews.

The standard fieldwork plan is as follows:

  1. During round one, which takes a week in each village, two interviewers each administer the household questionnaire to eight households, while the supervisor administers the community and price questionnaires.
  2. The supervisor personally observes and evaluates one interview per interviewer during this week, discussing improvements with the interviewer and recording the results on a form to be sent to the national office.
  3. Following round one in the field, the half-completed questionnaires are taken to the field office, where the data are recorded on computer diskettes by the data entry operator. This takes about one week.
  4. The data entry program prints out the data recorded for each household, highlighting any errors or inconsistencies. The supervisor then reviews, circling on the original questionnaires the questions that must be repeated by interviewers during the second round.
  5. During round two of the interview, the team returns to the field to complete the second half of the questionnaire and to correct errors found in round one.
  6. This is followed by field supervision, data entry and quality control of data entry, as in round one. Errors detected after round two are corrected only if they are data entry errors. Thus, supervision in the field is especially critical during the second round, when there is no subsequent opportunity to capture missing information or to correct field errors.
  7. In the final step, the diskettes of data are sent from the field office to the national office to be reviewed by the data management specialist and consolidated with data from the other field teams. Throughout this cycle, staff from the national office make unannounced visits to field offices and to interviewing sites to observe the efforts of the team members and evaluate their performance.
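The correction rule running through steps 4 to 6 can be summarized in a few lines. This is a minimal sketch of the decision logic only; the names `ErrorSource` and `resolve` and the wording of the returned actions are illustrative assumptions, not part of any LSMS software.

```python
from enum import Enum


class ErrorSource(Enum):
    DATA_ENTRY = "data entry"   # keyed value differs from the paper questionnaire
    FIELD = "field"             # questionable value is on the questionnaire itself


def resolve(round_number: int, source: ErrorSource) -> str:
    """Return the action the standard plan implies for one flagged value."""
    if round_number not in (1, 2):
        raise ValueError("the standard plan has exactly two rounds")
    if source is ErrorSource.DATA_ENTRY:
        # Data entry errors can be fixed at the field office after either round.
        return "operator corrects the keyed value"
    if round_number == 1:
        # Field errors from round one can still be checked with the respondent.
        return "supervisor circles the question; interviewer repeats it in round two"
    # After round two there is no further visit to the household.
    return "left as recorded; no further visit is scheduled"
```

The asymmetry between the two rounds is the reason the plan stresses field supervision during round two: a field error found at that stage has no remedy.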

Sample Size. The number of field teams is kept small so that it is feasible to supervise them closely. LSMS surveys tend to use small samples, often on the order of 1,600 to 3,200 households and rarely more than 5,000 households. Although larger samples would have smaller sampling error, survey designers judged that the accompanying increase in non-sampling errors would more than offset the gain. Having a small number of teams also helps to keep the cost of supplying them with vehicles and computers within bounds.
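The sampling side of this trade-off can be made concrete with a rough calculation. The sketch below assumes simple random sampling and an estimated proportion of 0.5, both of which are simplifying assumptions: the surveys described here select households within villages, so actual sampling errors would differ. It also assumes that every sampled village yields the standard workload of two interviewers covering eight households each.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p under simple random sampling of n units."""
    return math.sqrt(p * (1.0 - p) / n)


def villages_needed(households: int, per_village: int = 16) -> int:
    """Villages to visit if each yields 2 interviewers x 8 households."""
    return math.ceil(households / per_village)


# Compare the sample sizes mentioned in the text.
for n in (1600, 3200, 5000):
    se = standard_error(0.5, n)
    print(f"n={n}: standard error {se:.4f}, villages {villages_needed(n)}")
```

Under these assumptions, doubling the sample from 1,600 to 3,200 households doubles the number of villages to visit from 100 to 200 but lowers the standard error only from about 0.0125 to about 0.0088, since sampling error falls with the square root of the sample size.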

Data Management. The LSMS surveys use personal computers in the field, where all the stages of data collection, data entry and editing are carried out. This dramatically reduces the length of time between when the fieldwork ends and when the data become available for analysis. It also improves the quality of the data. The data entry programs that have been used for LSMS surveys have each been custom designed. This was a major innovation at the time of the first survey in 1985. The use of commercially available packages for this purpose has now become widespread, though the thoroughness of the checks in the full-fledged LSMS surveys is probably well above average even today.

As the data are keyed in, they are first submitted to a set of range checks. Numeric variables are constrained to lie between minimum and maximum values, qualitative variables can only have defined valid codes and chronological variables must contain valid dates. When all of the data from a single questionnaire have been recorded, consistency checks are run on data from different parts of the questionnaire. When values are out of the allowed range or are inconsistent with other variable values, the computer gives audible and visual signals to the operator. A printout is then made of all the data for each household in a format similar to that of the questionnaire. Missing data and errors in the skip pattern appear circled in black, and a list of specific inconsistencies between different sections of the household questionnaire is produced.
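The two layers of checking described above, range checks on single values and consistency checks across sections, might look roughly like the following. This is a minimal sketch, not the custom data entry programs the surveys used; every variable name, limit and code list is a hypothetical chosen for illustration.

```python
from datetime import date

# Hypothetical variable names, limits and codes, chosen only for illustration.
NUMERIC_RANGES = {"age_years": (0, 110), "hours_worked_week": (0, 112)}
VALID_CODES = {"sex": {1, 2}, "worked_last_week": {1, 2}}
DATE_FIELDS = ("interview_date",)


def range_checks(record):
    """Flag values outside their numeric range, code list or the calendar."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        value = record.get(name)
        if value is None:
            continue  # skipped answers are judged by the consistency checks
        if not low <= value <= high:
            problems.append(f"{name}: {value} outside {low}-{high}")
    for name, codes in VALID_CODES.items():
        if record.get(name) not in codes:
            problems.append(f"{name}: {record.get(name)!r} is not a valid code")
    for name in DATE_FIELDS:
        try:
            date(*record[name])  # expects a (year, month, day) tuple
        except (KeyError, TypeError, ValueError):
            problems.append(f"{name}: not a valid date")
    return problems


def consistency_checks(record):
    """Flag answers that contradict each other across questions."""
    problems = []
    worked = record.get("worked_last_week")
    hours = record.get("hours_worked_week")
    if worked == 2 and hours not in (None, 0):
        problems.append("hours reported by a respondent who did not work")
    if worked == 1 and hours is None:
        problems.append("hours missing for a respondent who worked")
    return problems
```

A record with an interview date of `(1985, 2, 30)` fails the date check because that day does not exist, while a record coded as "did not work" that still carries 40 hours passes every range check and is caught only by the consistency check.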

If the error is due to a typographical mistake, the data entry operator corrects it immediately. If the questionable value is on the original questionnaire, it is referred back to the supervisor and interviewer. An additional set of consistency checks was developed for the anthropometric module, which automatically compares survey data on individuals' age, height and weight with standard reference tables from the World Health Organization. The data entry program then produces a list of those individuals with seemingly erroneous measurements so that they can be remeasured during the second round.
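One plausible way to implement such a comparison is to express each measurement as a distance from the reference median for the person's age and to list anyone beyond a cut-off. The sketch below illustrates that idea only. The reference rows are invented placeholders, not World Health Organization values, and the cut-off `Z_LIMIT` and the function names are likewise assumptions.

```python
# Placeholder reference rows keyed by age in months: (median height cm, sd).
# These numbers are invented for the sketch and are NOT WHO reference values.
HEIGHT_REFERENCE = {12: (75.0, 3.0), 24: (87.0, 3.5), 36: (95.0, 4.0)}
Z_LIMIT = 4.0  # hypothetical cut-off for a "seemingly erroneous" measurement


def height_z_score(age_months, height_cm):
    """Distance of a height from the reference median, in standard deviations."""
    median, sd = HEIGHT_REFERENCE[age_months]
    return (height_cm - median) / sd


def remeasure_list(children):
    """Return ids of children whose height is implausible for their age."""
    flagged = []
    for child_id, age_months, height_cm in children:
        if abs(height_z_score(age_months, height_cm)) > Z_LIMIT:
            flagged.append(child_id)
    return flagged
```

With these placeholder rows, a 24-month-old recorded at 57.0 cm (perhaps a transposition of 75.0) lies far below the median and would appear on the list for remeasurement in round two, while children close to the median would not.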

Resulting Data Quality. When all of these procedures are scrupulously followed, data quality can be very high, as shown by evidence on some dimensions of data quality for the Côte d'Ivoire and Peru surveys. These data sets were subjected to data entry checks and corrections in the field as explained above, but were not subjected to any further "cleaning" in the central office. Missing data in both surveys are extremely rare. Among the 55,843 persons in the three surveys, for example, only 46 persons have a missing age. There are also very few missing modules. At least one height and weight measurement was obtained for almost 90 percent of all household members.

Turnaround Times. The LSMS is noted for the short turnaround time between the end of data collection and the availability of data for analysis. Theoretically, this is a matter of only a week or two, and in several countries basic abstracts have been completed within two to six months of the end of fieldwork. This speed has contributed markedly to the relevance of the data to policymaking. The quick turnaround between the completion of fieldwork and the availability of data for analysis is largely due to the pre-coding in the questionnaire, the extensive quality control during the fieldwork, and the decentralized, concurrent data entry.

The length of time between the decision to carry out a survey and the time when data are available, however, is almost always much longer. Depending on the starting point of the country's survey infrastructure (notably the adequacy of the sample frame, availability of equipment and general adequacy of management), six to eighteen months of preparation may be required before the survey is fielded. When the full LSMS field procedures are used, data collection itself takes place over a full year (though a preliminary analysis is sometimes done with the first six months of data). Thus, the whole process, from the first idea to the full abstract of results, can take two to three years.

Revised 09/05/97