Format of LSMS Data

LSMS data are distributed in ASCII, SAS portable, SPSS and Stata formats. Not all data sets are available in all formats. These formats were chosen to ensure accessibility to much of the social science research community. The data are distributed in ASCII format because nearly all software can read ASCII data. Typically the ASCII data files contain the variable names in the first line, are delimited by a space or a comma, and have a non-fixed layout (that is, the variables are not located by column position). The only disadvantage of the ASCII files is that the data are described only by an eight-character variable name; they do not come with longer variable labels.
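As an illustration of the ASCII layout described above, the following is a minimal Python sketch, not an official LSMS tool. It assumes the first line holds the variable names and that fields are separated either by commas or by runs of spaces. The function name read_lsms_ascii and the sample variable names (hhid, age, sex) are hypothetical.

```python
import csv

def read_lsms_ascii(text):
    """Parse ASCII data whose first line holds the variable names.

    Assumption: fields are separated by commas or by whitespace,
    and no field contains an embedded space.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    if "," in lines[0]:
        # Comma-delimited layout
        rows = [[cell.strip() for cell in row] for row in csv.reader(lines)]
    else:
        # Space-delimited layout (non-fixed columns)
        rows = [ln.split() for ln in lines]
    names, data = rows[0], rows[1:]
    # Values are kept as strings; convert types as needed for analysis
    return [dict(zip(names, row)) for row in data]

# Hypothetical sample data, for illustration only
sample = "hhid age sex\n101 34 1\n102 27 2\n"
records = read_lsms_ascii(sample)
```

Because the files carry only short variable names, the survey documentation is still needed to interpret each column.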

The SAS portable files have been written with the XCOPY procedure and are meant to be usable on any platform (e.g. IBM mainframe, PC, Unix) supported by SAS. (See the SAS home page for more information, including an explanation of the SAS transport facility.) Once imported to the desired SAS format, the data files are ready for use and typically include variable labels (descriptions of the variable with up to forty characters). These labels can be quite helpful to the analyst who is somewhat unfamiliar with LSMS data.

Selected country data sets are available in SPSS.  The files typically include variable labels and once decompressed are immediately ready for use.  The files have been saved in SPSS for Windows format (see the SPSS web site for more information).

The Stata files also typically include variable labels (up to thirty characters), and once decompressed are immediately ready for use.  The version of Stata that is used depends on when the data were collected.  Early files were written to version 2.1. Currently, files are being written to version 7.  The earlier versions are sometimes not compatible with current versions of Stata.  If there are difficulties in reading the files (after they have been fully decompressed; see below), it may be necessary to convert the files using conversion software.  (See the Stata home page for more information.)

Instructions for decompressing the files:

Most of the data sets have been compressed TWO TIMES.  Each data file has been compressed individually, and those compressed files have then been compressed together into one large file.

To access the data you will need to decompress the data in two stages. The first stage is to decompress the one large data file called either ****DAT.ZIP, ****SSP.ZIP, ****SAV.ZIP, or ****DTA.ZIP, where **** signifies the country/year, DAT signifies ASCII, SSP signifies SAS portable, SAV signifies SPSS, and DTA signifies Stata. For example, the file containing all of the Tanzania HRDS data in Stata format is called TANZDTA.ZIP. Decompressing this file will not require much additional disk space, as the result is simply a set of individually compressed files.

The second stage is to decompress the individually compressed data files. The file extensions for the individually compressed files will be zss, zav, zdt, or zda, depending on whether the data are in SAS portable, SPSS, Stata, or ASCII format, respectively.
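The naming conventions for the two stages can be summarized in a short Python sketch. The mappings come from the text above; the function names describe_outer and describe_inner, and the inner file name S1A.zdt, are hypothetical and used only for illustration.

```python
from pathlib import Path

# Three-letter suffix of the large archive name -> data format
OUTER_SUFFIXES = {"DAT": "ASCII", "SSP": "SAS portable", "SAV": "SPSS", "DTA": "Stata"}
# Extension of each individually compressed file -> data format
INNER_EXTENSIONS = {"zda": "ASCII", "zss": "SAS portable", "zav": "SPSS", "zdt": "Stata"}

def describe_outer(name):
    """Split a name such as TANZDTA.ZIP into (country/year code, format)."""
    path = Path(name)
    stem = path.stem.upper()
    prefix, suffix = stem[:-3], stem[-3:]
    if path.suffix.lower() != ".zip" or not prefix or suffix not in OUTER_SUFFIXES:
        raise ValueError(f"not a recognized archive name: {name}")
    return prefix, OUTER_SUFFIXES[suffix]

def describe_inner(name):
    """Return the data format implied by an inner file's extension, or None."""
    ext = Path(name).suffix.lower().lstrip(".")
    return INNER_EXTENSIONS.get(ext)

outer_info = describe_outer("TANZDTA.ZIP")
inner_info = describe_inner("S1A.zdt")  # hypothetical inner file name
```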

(The purpose of compressing the files twice is to reduce the potential for disk space problems. If the files were only compressed once into the one large file, decompressing this would often require up to 50 MB of free disk space.)

Any decompression software can be used to decompress the files.  Simply tell the software the extension of the file that you wish to decompress.
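For users who prefer to script the two stages, the following Python sketch uses the standard zipfile module. It assumes that the individually compressed files (zss, zav, zdt, zda) are ordinary ZIP-format archives under a different extension; the text above does not confirm this, and files that fail the ZIP check are simply skipped. The function name extract_two_stage and the destination folder name are hypothetical.

```python
import io
import zipfile
from pathlib import Path

# Extensions of the individually compressed files
INNER_EXTENSIONS = {".zss", ".zav", ".zdt", ".zda"}

def extract_two_stage(outer_zip, dest_dir):
    """Open the large archive, then decompress each inner archive into dest_dir.

    Assumption: inner files are ZIP-format archives with a renamed extension.
    Returns the names of the data files that were extracted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(outer_zip) as outer:
        for member in outer.namelist():
            if Path(member).suffix.lower() not in INNER_EXTENSIONS:
                continue
            # Stage 1: read one inner archive into memory, one at a time
            inner_bytes = io.BytesIO(outer.read(member))
            if not zipfile.is_zipfile(inner_bytes):
                continue  # not ZIP-format after all; handle with other software
            # Stage 2: decompress the inner archive to disk
            with zipfile.ZipFile(inner_bytes) as inner:
                inner.extractall(dest)
                extracted.extend(inner.namelist())
    return extracted

# Example call (requires the downloaded archive to be present):
# extract_two_stage("TANZDTA.ZIP", "tanzania_stata")
```

Only one inner archive is held in memory at a time, so the compressed copies never need to be written to disk; the decompressed data files still need their full size in free space.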

More Information on the SAS portable files:

For users of the SAS portable files, below are examples of the CMS and SAS code used to export (create the portable files) and import (create SAS mainframe data files from portable files). The examples are only intended to let users know how the export files were created. (The IMPORT section is an example of how the export files are imported using SAS for the CMS operating system.) You will need to determine how to import the portable files for the operating system and version of SAS you use.

EXPORT:
CMS FILEDEF sasout DISK filename SSP A (RECFM FB LRECL 80 BLKSIZE 8000 );
PROC XCOPY IN=SASDATA OUT=sasout EXPORT;
SELECT filename; run;


IMPORT:
CMS FILEDEF sasin DISK filename ssp A (RECFM FB LRECL 80 BLKSIZE 8000 );
PROC XCOPY IN=sasin OUT=SASDATA IMPORT; run;

Revised 08/15/06