The LSMS team provides the following selected programs as a guide to the use of LSMS data. This text and the program files are all contained in the zip file deaton.zip. The book that the programs accompany is available for purchase from World Bank Publications. To understand and use the selected programs here, and to have access to the full set of programs, you will need a copy of the book.
Deaton.zip contains the STATA code used to produce the results in "The Analysis of Household Surveys: A Microeconometric Approach to Development Policy", by Angus Deaton. This book, published for the World Bank by The Johns Hopkins University Press and scheduled for release in 1997, is about the analysis of household survey data from developing countries and about how such data can be used to cast light on a range of policy issues. Data from several different developing countries are used for illustration and one of the aims of the book is to bring together the relevant statistical and economic methods that are useful for building the bridge between data and policy.
These programs are not intended to be self-contained; they cannot be run as they stand without modification and, at the very least, the addition of the user's own dataset. The idea is not to supply a software package that will replicate the analyses in the text, but to provide the code to serve as a template for the user's own analysis. Code has not been provided for straightforward cases such as tabulations or standard regressions. An attempt has been made to add enough comments to the code to make it broadly comprehensible and to aid those who wish to translate it into languages other than STATA; in most cases, the translation should not be difficult. Apart from some later addition of comments, the code given here is the code that was actually used to produce the results in the text. A good deal of care has been taken to check the code, but no guarantee is given that it is error free, let alone that it will automatically work on different data sets and different machines.
[NOTE: File names for the accompanying Stata programs are given in brackets below.]
There are three examples for Chapter 1. The first is for the saving, expenditure, and income tabulation for Thailand in Table 1.3, the second for the alternative estimates of standard errors for mean PCE in Pakistan in Table 1.4, and the third for the bootstrap estimates of standard errors, also for Pakistan, discussed on pages 57-58.
Example 1.1. The program used to calculate Table 1.3 takes data on income, expenditure, and weights from the file thaicy.dta, creates a measure of saving, sorts by income and by expenditure, and creates deciles for each, taking the weights into account. Income, expenditure, and saving are then summarized by decile.
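The decile construction can be sketched in a few lines. The following is a minimal Python illustration, not the STATA program in the archive; the function names and the small dataset are hypothetical, and the rule for observations that straddle a decile boundary (each observation goes wholly to the decile in which its cumulative weight share falls) is an assumption.

```python
def weighted_deciles(values, weights):
    """Return the weighted decile (1 to 10) of each observation."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    deciles = [0] * len(values)
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        share = cumulative / total
        # The small tolerance keeps a share of exactly 0.1 in decile 1, and so on.
        deciles[i] = min(10, int(share * 10 - 1e-12) + 1)
    return deciles


def weighted_mean_by_group(values, weights, groups):
    """Weighted mean of values within each group, keyed by group label."""
    sums, weight_sums = {}, {}
    for v, w, g in zip(values, weights, groups):
        sums[g] = sums.get(g, 0.0) + w * v
        weight_sums[g] = weight_sums.get(g, 0.0) + w
    return {g: sums[g] / weight_sums[g] for g in sorted(sums)}


# Made-up data for ten households (not taken from thaicy.dta).
income = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
expenditure = [120, 190, 280, 350, 450, 500, 600, 650, 700, 750]
weights = [1.0] * 10
saving = [y - c for y, c in zip(income, expenditure)]

income_decile = weighted_deciles(income, weights)
expenditure_decile = weighted_deciles(expenditure, weights)
saving_by_income_decile = weighted_mean_by_group(saving, weights, income_decile)
saving_by_expenditure_decile = weighted_mean_by_group(saving, weights, expenditure_decile)
```

Sorting once by income and once by expenditure gives two different decile assignments, which is why saving is summarized separately under each.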
Example 1.2. The program takes data from the file consdata.dta, which contains information on total expenditure, household size, and survey design variables, all taken from the expenditure files of the Pakistan Integrated Household Survey (PIHS). It then computes weighted and unweighted means, together with various correct and incorrect variances and standard errors. This program should not be used in preference to the complex survey design procedures that became available in Version 5.0 of STATA.
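To make the contrast between "correct" and "incorrect" standard errors concrete, here is a minimal Python sketch, not the STATA program itself. It assumes the standard linearization formula for a weighted mean under a stratified, clustered design with at least two clusters per stratum; the variable names and the eight made-up observations are hypothetical.

```python
import math
from collections import defaultdict


def weighted_mean(y, w):
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def naive_se(y):
    """Standard error of the unweighted mean that ignores the survey design."""
    n = len(y)
    m = sum(y) / n
    variance = sum((yi - m) ** 2 for yi in y) / (n - 1)
    return math.sqrt(variance / n)


def design_se(y, w, stratum, cluster):
    """Linearized standard error of the weighted mean, using cluster totals within strata."""
    total_weight = sum(w)
    m = weighted_mean(y, w)
    cluster_totals = defaultdict(float)
    for yi, wi, s, c in zip(y, w, stratum, cluster):
        # Each observation's linearized contribution to the weighted mean.
        cluster_totals[(s, c)] += wi * (yi - m) / total_weight
    by_stratum = defaultdict(list)
    for (s, _c), z in cluster_totals.items():
        by_stratum[s].append(z)
    variance = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)  # number of clusters in the stratum, assumed to be at least 2
        z_bar = sum(zs) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in zs)
    return math.sqrt(variance)


# Made-up data: two strata, two clusters each, two households per cluster.
y = [10, 12, 20, 22, 30, 32, 40, 42]
w = [1, 1, 1, 1, 1, 1, 1, 1]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
cluster = [1, 1, 2, 2, 3, 3, 4, 4]

mean_pce = weighted_mean(y, w)
se_ignoring_design = naive_se(y)
se_with_design = design_se(y, w, stratum, cluster)
```

With one household per cluster, a single stratum, and equal weights, `design_se` reduces to `naive_se`, which is a useful check on any implementation.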
Example 1.3. The program uses the same data as in Example 1.2 to construct bootstrap samples with the same stratified and clustered structure as the original. In each bootstrap replication, weighted and unweighted means and medians are calculated and saved in a dataset. The standard deviations over the bootstrap replications are used to indicate the sampling variability of the original estimates.
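A bootstrap that respects the design resamples whole clusters, with replacement, within each stratum. The sketch below is a minimal Python illustration under that assumption, restricted to the weighted mean; it is not the STATA program in the archive, and the function name, the number of replications, the seed, and the data are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def bootstrap_se_weighted_mean(y, w, stratum, cluster, reps=200, seed=12345):
    """Standard deviation of the weighted mean over cluster-level bootstrap replications."""
    rng = random.Random(seed)
    members = defaultdict(list)  # (stratum, cluster) -> observation indexes
    for i, key in enumerate(zip(stratum, cluster)):
        members[key].append(i)
    clusters_in_stratum = defaultdict(list)
    for key in members:
        clusters_in_stratum[key[0]].append(key)

    estimates = []
    for _ in range(reps):
        drawn = []
        for keys in clusters_in_stratum.values():
            # Draw as many clusters as the stratum originally had, with replacement.
            for _draw in range(len(keys)):
                drawn.extend(members[rng.choice(keys)])
        numerator = sum(w[i] * y[i] for i in drawn)
        denominator = sum(w[i] for i in drawn)
        estimates.append(numerator / denominator)
    return statistics.stdev(estimates)


# Made-up data: two strata, two clusters each, two households per cluster.
y = [10, 12, 20, 22, 30, 32, 40, 42]
w = [1, 1, 1, 1, 1, 1, 1, 1]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
cluster = [1, 1, 2, 2, 3, 3, 4, 4]

boot_se = bootstrap_se_weighted_mean(y, w, stratum, cluster)
```

Medians or unweighted statistics can be bootstrapped the same way by replacing the two lines that compute the numerator and denominator.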
There are two examples for Chapter 2. The first concerns the heteroskedastic censored regression model and the second, the decomposition of data from a time series of cross-sections into age, cohort, and year effects.
Example 2.1. The heteroskedastic censored regression model is discussed on pp. 85-90. I give the code for the Monte Carlo experiment comparing Powell's censored LAD estimator with OLS and MLE Tobit, the results of which are discussed on p. 90.
Example 2.2. This example presents the code for the decomposition of earnings in Taiwan (China) into age, cohort, and year effects; the methods are discussed on pp. 116-27, and the results are shown in Figure 2.5.

Figure 2.5
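Age, cohort, and year effects cannot be separated without a restriction, because year equals cohort plus age. The text above does not spell out the restriction used, so the following Python sketch rests on an assumption: the normalization in which the year effects sum to zero and are orthogonal to a linear time trend, imposed by replacing the year dummies d_t with d*_t = d_t - [(t-1) d_2 - (t-2) d_1] for t = 3, ..., T. The function names are hypothetical, and the regression itself is omitted.

```python
def transformed_year_dummies(year_index, n_years):
    """Return d*_3, ..., d*_T for one observation; year_index runs from 1 to n_years."""
    d1 = 1 if year_index == 1 else 0
    d2 = 1 if year_index == 2 else 0
    row = []
    for t in range(3, n_years + 1):
        dt = 1 if year_index == t else 0
        row.append(dt - ((t - 1) * d2 - (t - 2) * d1))
    return row


def implied_year_effects(coefficients):
    """Recover all T year effects from the coefficients on d*_3, ..., d*_T."""
    n_years = len(coefficients) + 2
    effects = []
    for s in range(1, n_years + 1):
        row = transformed_year_dummies(s, n_years)
        effects.append(sum(a * x for a, x in zip(coefficients, row)))
    return effects


# Made-up coefficients for a five-year panel of cross-sections.
year_effects = implied_year_effects([0.5, -0.2, 0.1])
sum_of_effects = sum(year_effects)
trend_component = sum(t * e for t, e in enumerate(year_effects, start=1))
```

Under this normalization any linear trend in the data is attributed to age and cohort effects, and the year effects capture only fluctuations around it.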
The examples provide code for the summary measures of living standards, inequality, and poverty in Côte d'Ivoire and South Africa shown in Tables 3.1-3.4 and Figures 3.5 and 3.6. Figures 3.7-3.12 were calculated using the functions provided by STATA and do not require special coding. The GAUSS code for Figures 3.13 and 3.14 is given as Example 3.7. I have not provided the code for the Thai rice calculations; Figures 3.15 and 3.16 are calculated in the same way as Figures 3.13 and 3.14. The nonparametric regressions shown in Figures 3.17 and 3.18 are kernel regressions, a method that has been superseded by the locally weighted regressions shown in Figure 3.20, the code for which is given in Example 3.8.

Figure 3.5

Figure 3.6

Figure 3.13

Figure 3.14

Figure 3.20
Example 3.1. The code uses data on household expenditures, prices, and weights to calculate means for households and individuals, as well as their standard errors, taking into account the survey design.
Example 3.2. The code calculates each of the inequality measures in Table 3.2, with and without weights.
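The measures in Table 3.2 are not listed here, so as an illustration only, the following Python sketch computes one widely used measure, the Gini coefficient, with and without weights. It uses the mean-difference formula G = sum_i sum_j w_i w_j |x_i - x_j| / (2 W^2 mu); the function name and data are hypothetical, and the double loop is meant for clarity rather than speed.

```python
def gini(x, weights=None):
    """Gini coefficient from the weighted mean absolute difference."""
    if weights is None:
        weights = [1.0] * len(x)
    total_weight = sum(weights)
    mean = sum(w * xi for xi, w in zip(x, weights)) / total_weight
    double_sum = 0.0
    for xi, wi in zip(x, weights):
        for xj, wj in zip(x, weights):
            double_sum += wi * wj * abs(xi - xj)
    return double_sum / (2.0 * total_weight ** 2 * mean)


# Made-up per capita expenditures; weights might be household size times the sampling weight.
pce = [1.0, 3.0]
gini_unweighted = gini(pce)
gini_weighted = gini(pce, [3.0, 1.0])
```

Giving a weight of 3 to the first household is equivalent to listing it three times, so the weighted result matches the unweighted Gini of [1, 1, 1, 3].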
Example 3.3. The code calculates the four poverty measures, together with their bootstrapped standard errors.
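The four poverty measures are not named here. As a hedged illustration, the Python sketch below computes the Foster-Greer-Thorbecke family, P_alpha = sum_i w_i 1(x_i < z) ((z - x_i)/z)^alpha / sum_i w_i, which gives the headcount ratio (alpha = 0), the poverty gap (alpha = 1), and the squared poverty gap (alpha = 2). Treating these as the measures in question is an assumption, and the names and data are hypothetical; the bootstrap step is omitted.

```python
def fgt(x, z, alpha, weights=None):
    """Foster-Greer-Thorbecke poverty measure for poverty line z."""
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for xi, wi in zip(x, weights):
        if xi < z:  # only those strictly below the line count as poor
            total += wi * ((z - xi) / z) ** alpha
    return total / sum(weights)


# Made-up per capita expenditures and poverty line.
pce = [50.0, 100.0, 150.0, 200.0]
poverty_line = 100.0
headcount = fgt(pce, poverty_line, 0)
poverty_gap = fgt(pce, poverty_line, 1)
squared_gap = fgt(pce, poverty_line, 2)
```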
Example 3.4. The code is similar to that for Côte d'Ivoire but is adapted to the South African data.
Example 3.5. The code shows how to calculate the cumulative fractions of population and of PCE and uses the results to draw the standard Lorenz curves.
Example 3.6. This example adapts the code for the standard Lorenz curves so as to plot the distance between the Lorenz curve and the 45-degree line.
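The two Lorenz calculations can be sketched together. This is a minimal Python illustration with hypothetical names and made-up data, not the STATA code: it sorts by PCE, accumulates weight shares and weighted PCE shares, and then takes the vertical distance p - L(p) from the 45-degree line. Plotting is left out.

```python
def lorenz_points(pce, weights):
    """Cumulative population shares p and cumulative PCE shares L, both starting at 0."""
    order = sorted(range(len(pce)), key=lambda i: pce[i])
    total_weight = sum(weights)
    total_pce = sum(w * x for x, w in zip(pce, weights))
    p, lorenz = [0.0], [0.0]
    cum_weight = cum_pce = 0.0
    for i in order:
        cum_weight += weights[i]
        cum_pce += weights[i] * pce[i]
        p.append(cum_weight / total_weight)
        lorenz.append(cum_pce / total_pce)
    return p, lorenz


def distance_from_diagonal(p, lorenz):
    """Vertical distance between the 45-degree line and the Lorenz curve at each point."""
    return [pi - li for pi, li in zip(p, lorenz)]


# Made-up per capita expenditures with equal weights.
p, lorenz = lorenz_points([1.0, 1.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0])
gap = distance_from_diagonal(p, lorenz)
```

Plotting the gap rather than the curve itself makes it easier to compare two distributions whose Lorenz curves lie close together.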
Example 3.7. The code for the contours and netmaps is given in GAUSS because these facilities do not exist in STATA. I have also calculated the kernel estimates in GAUSS, though it would also have been possible to reach this stage in STATA.
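The kernel step that feeds a contour plot or netmap is a bivariate density estimate evaluated over a grid. The Python sketch below shows one simple version under stated assumptions: a product Gaussian kernel with separate bandwidths hx and hy chosen by the user. The kernel and bandwidths used for the figures are not given here, and all names and data are hypothetical. The drawing itself is not shown.

```python
import math


def kernel_density_2d(x, y, gx, gy, hx, hy, weights=None):
    """Bivariate kernel density estimate at the point (gx, gy)."""
    if weights is None:
        weights = [1.0] * len(x)
    total = 0.0
    for xi, yi, wi in zip(x, y, weights):
        u = (gx - xi) / hx
        v = (gy - yi) / hy
        total += wi * math.exp(-0.5 * (u * u + v * v))
    return total / (sum(weights) * 2.0 * math.pi * hx * hy)


def density_grid(x, y, grid_x, grid_y, hx, hy, weights=None):
    """Matrix of density estimates; rows follow grid_y and columns follow grid_x."""
    return [[kernel_density_2d(x, y, gx, gy, hx, hy, weights) for gx in grid_x]
            for gy in grid_y]


# Made-up data, for example pairs of log PCE in two periods.
x = [0.0, 1.0, 2.0]
y = [0.5, 1.0, 2.5]
grid = density_grid(x, y, [0.0, 1.0, 2.0], [0.0, 1.0, 2.0], hx=1.0, hy=1.0)
```

The resulting matrix is what a contouring or surface-plotting routine would take as input.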
Example 3.8. Figure 3.20 shows plots of expected actual and potential pension receipts against the logarithm of household per capita income. These are calculated using locally weighted regressions.
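A locally weighted regression fits a weighted least-squares line around each evaluation point and reports the fitted value there. The Python sketch below is a minimal illustration, assuming a quartic (biweight) kernel and a user-chosen bandwidth h; the kernel and bandwidth used for Figure 3.20 are not stated here, and the names and data are hypothetical.

```python
def quartic_kernel(u):
    """Quartic (biweight) kernel, zero outside the interval (-1, 1)."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def local_linear(x, y, x0, h):
    """Return the fitted level and slope of a kernel-weighted linear fit at x0."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - x0
        k = quartic_kernel(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yi
        t1 += k * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("too few distinct points within the bandwidth")
    level = (s2 * t0 - s1 * t1) / det
    slope = (s0 * t1 - s1 * t0) / det
    return level, slope


# Made-up data lying exactly on the line y = 2 + 3x.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 8.0, 11.0, 14.0]
level, slope = local_linear(x, y, x0=1.5, h=2.5)
```

Evaluating `local_linear` over a grid of x0 values traces out the regression function, and the slope gives an estimate of its derivative at each point.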
The programs below were used to generate Figures 4.4 and 4.5 in the text. The code for the contour maps in Figures 4.2 and 4.3 is a straightforward application of Example 3.7 above. The results in Section 4.2 are obtained from straightforward regression analysis, and the code is not given here.

Figure 4.4

Figure 4.5
There are three sections of code which, for logical transparency, are presented in the reverse of the order in which they should be run. The first section calculates the estimated regression function and its derivatives and then picks up the bootstrap replications from the second and third sections of the code to construct Figures 4.4 and 4.5.
I provide the code for calculating the system of demand equations, including the own- and cross-price elasticities, for completing the system, and for calculating the symmetry-constrained estimates. The code here is for the Maharashtran case; except for minor details (the number and names of the goods, and the definition of the other variables), the Pakistani code is the same. There are four separate programs. The first, allindia.do, estimates the demand system. Appended to it is a program, mkmats.do, that calculates the commutation and selection matrices required for the symmetry-constrained estimates and that also contains procedures for making the "vec" of a matrix and for reversing the operation. The code bootall.do bootstraps the procedure in order to obtain measures of sampling variability. Finally, the program policy.do calculates the efficiency and equity components of the cost-benefit ratios for price reform and was used to give the results in Tables 5.12 and 5.13.
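For readers unfamiliar with the matrix tools mentioned above, the following Python sketch shows the "vec" operator (stacking the columns of a matrix), its inverse, and the commutation matrix K, defined by K vec(A) = vec(A') for any m x n matrix A. It is an illustration with hypothetical function names, not a translation of mkmats.do, and it does not cover the selection matrix, whose exact definition is not given here.

```python
def vec(matrix):
    """Stack the columns of a matrix (a list of rows) into one list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(cols) for i in range(rows)]


def unvec(vector, rows, cols):
    """Reverse vec: rebuild a rows x cols matrix from its stacked columns."""
    return [[vector[j * rows + i] for j in range(cols)] for i in range(rows)]


def commutation_matrix(m, n):
    """Return the mn x mn matrix K with K vec(A) = vec(A') for any m x n matrix A."""
    size = m * n
    k = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            # A[i][j] sits at position j*m + i in vec(A) and at i*n + j in vec(A').
            k[i * n + j][j * m + i] = 1
    return k


def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


a = [[1, 2, 3],
     [4, 5, 6]]
a_transposed = [list(column) for column in zip(*a)]
k = commutation_matrix(2, 3)
permuted = mat_vec(k, vec(a))
```

Symmetry of a square matrix B can then be written as a linear restriction on vec(B), namely (I - K) vec(B) = 0, which is what makes the commutation matrix useful for symmetry-constrained estimation.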
None of the calculations in Chapter 6 requires new or non-standard coding. The decompositions of the logarithm of income, the logarithm of consumption, and the saving ratio into age and cohort effects in Figures 6.5 and 6.6, as well as the age effects in inequality in Figure 6.11, were calculated in the same way as in Example 2.2 above.

Figure 6.5

Figure 6.6