[Program Overview] [Sample Results] [Download POVCAL]
POVCAL assists with routine poverty assessment work by using sound and accurate methods for calculating poverty and inequality measures. It requires a basic PC and any of the various types of grouped income distribution data typically available.
POVCAL is designed to be an easy-to-use and reliable tool for routine poverty assessment work. It uses sound and accurate methods for calculating poverty and inequality measures with only a basic PC and any of the various types of grouped distributional data typically available, often in published form.
If one has access to the relevant household level ("unit record") data, then there are accurate computational methods for estimating poverty and inequality measures directly from that data, using standard econometric/statistical packages. But one rarely has access to the data in this form. Or one might not have easy access to the sort of computing power (typically a mainframe) that is needed to process unit record data.
Distributional data are more typically available in grouped form, such as income shares of deciles of households ranked by per capita income. There are some subtle difficulties in using such data which do not arise with unit records. There are many ways to estimate poverty and inequality measures from such data. The commonly used interpolation methods can be quite unreliable. The approach we have adopted here uses parametric specifications of the underlying Lorenz curve, from which all desired measures can then be calculated. This has a number of advantages. The method is efficient, reliable, and accurate (at least for the particular specifications that we use in this program). It also facilitates certain simulations which can be of analytical interest, such as estimating how the poverty measures will respond to distributionally neutral growth (interpretable as an increase in the mean of the distribution, holding the Lorenz curve constant).
To implement this approach in a fully self-contained way for use on any PC, POVCAL has been programmed largely from first principles. The source code is in Microsoft FORTRAN 5.0 and was written by Shaohua Chen. POVCAL is designed to be a practical tool for Bank staff; however, it is not a commercial program with a flashy user interface.
The program will run on any IBM compatible PC, including the most basic model. The size of the data set it can handle will depend on the memory of your PC, but 640K should be ample for almost all applications. Its speed will depend entirely on the speed of your computer.
You will need to have your grouped distributional data set up in the way we describe in detail below (though the program allows this to take many forms, as discussed below), and you will need to know the poverty line. The program will estimate the Lorenz curve, Gini index, headcount index of poverty, poverty gap index, Foster-Greer-Thorbecke index, and the elasticities of these poverty measures with respect to the mean of the distribution and the Gini index. It does all this for two alternative specifications of the Lorenz curve: the General Quadratic (Villasenor and Arnold) and the Beta model (Kakwani). (We have found these to be better than the many alternatives in the literature.) It performs various checks on the results, and it tells you which specification is better for your data. The program is also set up to allow detailed poverty profiles to be readily constructed; you simply re-run the program for each sub-group in the poverty profile. (All the poverty measures are additively decomposable; the aggregate poverty measure is simply the population-share-weighted sum of the sub-group poverty measures as calculated by POVCAL.) It also allows assessments to be easily made of the sensitivity of the results to measurement assumptions, such as the choice of the poverty line.
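The additive decomposition mentioned above is simple arithmetic. The following Python sketch is not part of POVCAL; the function name and the urban/rural figures are hypothetical and only illustrate the population-share-weighted sum.

```python
def aggregate_poverty(subgroups):
    """Combine sub-group poverty measures into an aggregate measure.

    `subgroups` is a list of (population_share, poverty_measure) pairs,
    with shares expressed as proportions that sum to one.
    """
    total_share = sum(share for share, _ in subgroups)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("population shares must sum to one")
    # Population-share-weighted sum of the sub-group measures.
    return sum(share * measure for share, measure in subgroups)


# Hypothetical example: 25% urban with a headcount index of 0.20,
# 75% rural with a headcount index of 0.40.
national_headcount = aggregate_poverty([(0.25, 0.20), (0.75, 0.40)])
```

With these made-up figures the aggregate headcount index is about 0.35. The same calculation applies to the poverty gap and Foster-Greer-Thorbecke indices from separate POVCAL runs.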
It is probably convenient to first set up a sub-directory (POVCAL, say), and load all the files on the disk supplied into that sub-directory (with "COPY A:*.* C:" if the disk is in the A drive). Make sure that POVCAL.EXE and your input data file are in the same sub-directory. POVCAL can be run only from this sub-directory.
Your data file is assumed to comprise "records" and "sub-groups". The number of records is simply the number of class-intervals or fractiles in your data. For example, if your data are in the form of a table of decile income shares, then you will have 10 records. The number of sub-groups is the number of ways the underlying population has been divided up in presenting the distributions. For example, if it is divided up as "urban" and "rural", then you will have two sub-groups. National data comprise one sub-group.
Your data file must be set up in tabular form where each row corresponds to a record, and the columns correspond to the variables for the sub-groups; enter all variables for the first sub-group, followed by those for the second, etc. For each sub-group, the columns must be in the same order as the variables specified in the relevant option for your type of data, as explained below.
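To make the layout concrete, here is a small Python sketch that splits such a table into its sub-groups. It assumes whitespace-separated numeric columns, which this guide does not state explicitly. The function name, the sample figures, and the urban/rural labels are all hypothetical.

```python
def split_subgroups(text, n_subgroups, n_vars):
    """Split a tabular data set into one list of records per sub-group.

    Assumes (hypothetically) whitespace-separated columns. Each row is one
    record; columns hold all variables for sub-group 1, then sub-group 2, etc.
    """
    groups = [[] for _ in range(n_subgroups)]
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        values = [float(token) for token in line.split()]
        if len(values) != n_subgroups * n_vars:
            raise ValueError(f"line {line_no}: expected "
                             f"{n_subgroups * n_vars} columns, got {len(values)}")
        for g in range(n_subgroups):
            groups[g].append(tuple(values[g * n_vars:(g + 1) * n_vars]))
    return groups


# Made-up type 1 data (p, L) for two sub-groups and three records.
sample = """\
0.2 0.05 0.2 0.08
0.6 0.30 0.6 0.35
1.0 1.00 1.0 1.00
"""
urban, rural = split_subgroups(sample, n_subgroups=2, n_vars=2)
```

Here the first two columns belong to the first sub-group and the last two to the second, so each sub-group ends up with three (p, L) records.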
Distributional data can come in many forms. The program allows eight possibilities, which should accommodate all of the data found in practice. The program asks you to select one option. Each is defined by two or (sometimes) three variables. The options are described in the following Table.
Type 1: p=cumulative proportion of population (ranked by the poverty indicator, which we will call "income"), L=cumulative proportion of income held by that proportion of the population.
Type 2: q=proportion of population (as in p, but not cumulative), r=proportion of income (as in L, but not cumulative).
Type 3: p (as in 1), r (as in 2).
Type 4: q (as in 2), L (as in 1).
Type 5: f(x)=percentage of the population in a given class interval of incomes, X=the mean income of that class interval.
Type 6: upper bound of a class interval, f(x) (as in 5), X (as in 5).
Type 7: upper bound of a class interval, p (as in 1), X (as in 5).
Type 8: upper bound of a class interval, f(x) (as in 5).
NOTE: The program allows your data to either be expressed as a percentage or as a proportion.
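The data types above are different ways of writing the same information. The Python sketch below shows how type 2 and type 5 data map onto the cumulative (p, L) form of type 1. It is an illustration only, not POVCAL's own code. The function names and figures are hypothetical, and dividing by the column totals (so that percentages and proportions are treated alike) is an assumption of this sketch.

```python
from itertools import accumulate


def type2_to_type1(q, r):
    """Turn non-cumulative population and income shares into cumulative (p, L)."""
    q_total, r_total = sum(q), sum(r)
    p = [value / q_total for value in accumulate(q)]
    L = [value / r_total for value in accumulate(r)]
    return p, L


def type5_to_type1(f, x):
    """Turn class shares f(x) and class mean incomes X into cumulative (p, L)."""
    # Income held by each class is proportional to its share times its mean.
    class_income = [share * mean for share, mean in zip(f, x)]
    return type2_to_type1(f, class_income)


# Made-up type 5 data entered as percentages, poorest class first.
f = [50, 30, 20]
x = [10, 20, 50]
p, L = type5_to_type1(f, x)
overall_mean = sum(fi * xi for fi, xi in zip(f, x)) / sum(f)
```

For these figures p is 0.5, 0.8, 1.0 and the overall mean is 21. The records must already be ordered from poorest to richest for the cumulative sums to trace out a Lorenz curve.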
Data types 6, 7 and 8 include information on the upper bounds of each class interval. In this case, you will need to set an upper bound for the highest (richest) class interval, though the choice is arbitrary, and does not affect any of the calculations.
Data type 8 is potentially troublesome, though (thankfully) it does not appear to be common. This type does not include the mean of each class interval. There is no option but to make assumptions about where the mean lies within each class interval. Common practice is to use the mid-points. In our experience (by using data sets for which we do know the mean, but pretending that we do not, and trying alternative assumptions), this is generally fine for all but the lowest and highest class intervals. You will probably get better results with the following rule of thumb, which we have built into the program: i) The mean of the lowest (poorest) class interval is assumed to be 80% of the upper bound of that class interval. ii) The mean of the highest class interval is set at 30% above the lower bound of that class interval. iii) For all other class intervals, the mean is set at the midpoint. (The program re-writes your data file with these estimated means, so in subsequent runs you can treat it as a type 6 data set.) Needless to say, there will be some loss of accuracy relative to data sets for which you do know the means of each class interval. This rule of thumb still gave quite accurate results for the poverty measures in our experiments. If you want to try alternative assumptions, then use option 6, adjusting the (re-written) data file with your estimates of the means.
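The rule of thumb for type 8 data can be written out in a few lines. The Python sketch below follows the three rules as stated above. It is not the program's FORTRAN source, and the function name and class bounds are hypothetical.

```python
def type8_class_means(upper_bounds):
    """Estimate class means from class upper bounds, poorest class first.

    Applies the rule of thumb described in the text:
      lowest class  -> 80% of its upper bound
      highest class -> 30% above its lower bound
      other classes -> midpoint of the class interval
    """
    if len(upper_bounds) < 2:
        raise ValueError("need at least two class intervals")
    last = len(upper_bounds) - 1
    means = []
    for i, upper in enumerate(upper_bounds):
        if i == 0:
            means.append(0.8 * upper)
        else:
            lower = upper_bounds[i - 1]  # previous upper bound
            if i == last:
                # The (arbitrary) upper bound of the richest class is unused.
                means.append(1.3 * lower)
            else:
                means.append((lower + upper) / 2)
    return means


# Made-up class bounds; the final bound of 1000 is an arbitrary placeholder.
estimated_means = type8_class_means([100, 200, 300, 1000])
```

For these bounds the estimated means are roughly 80, 150, 250 and 390. Changing the arbitrary top bound leaves them unchanged, in line with the earlier remark that this choice does not affect the calculations.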
Also use type 8 when (although you do not know the means for the class intervals) you do have an estimate of the population mean. The program will prompt for that information, and choose a mean for the richest class interval consistent with your estimate of the overall mean. (Otherwise, it will assign means by the above rule of thumb.) Of course, this is also an arbitrary choice, and we would still recommend you test the sensitivity of your results to these assumptions.
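One way to read "consistent with your estimate of the overall mean" is that the share-weighted average of the class means must equal that estimate, which pins down the mean of the richest class once the others are fixed. The Python sketch below works through that arithmetic. It is an interpretation rather than the program's documented procedure, and the function name and figures are hypothetical.

```python
def top_class_mean(shares, lower_class_means, overall_mean):
    """Mean of the richest class implied by a known overall mean.

    `shares` holds the population share of every class (poorest first);
    `lower_class_means` holds the means of all classes except the richest.
    """
    if len(lower_class_means) != len(shares) - 1:
        raise ValueError("need a mean for every class except the richest")
    total_share = sum(shares)
    income_below = sum(f * m for f, m in zip(shares, lower_class_means))
    # Solve: overall_mean * total_share = income_below + shares[-1] * top_mean
    return (overall_mean * total_share - income_below) / shares[-1]


# Made-up example: shares in percent, means of the two poorer classes given.
implied_top_mean = top_class_mean([50, 30, 20], [80.0, 150.0], overall_mean=200.0)
```

With these figures the implied mean of the richest class is 575. A value below the lower bound of that class would signal that the assumed means and the overall mean are inconsistent.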
Type POVCAL and press ENTER.
The program will ask you for the following information:
1. The name of your ASCII data file.
2. The number of sub-groups.
3. The number of records.
4. The type of data you have (8 options).
5. The DOS name of your desired output file.
The program will then estimate the General Quadratic (GQ) Lorenz curve and give you a statistical summary of the results. After that it will ask you for the mean (if different from that estimated from your data; you may, for example, want to test for sensitivity to measurement error in the mean, assuming that the Lorenz curve is unaffected), and the poverty line, which must lie in the stipulated interval to give valid estimates.
The program will give you the Gini index and poverty measures for this Lorenz curve. It will also give you the elasticities of the three poverty measures with respect to the mean and the Gini index. (The latter calculation assumes that the Lorenz curve shifts equi-proportionately up or down at all points.)
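The link between a fitted Lorenz curve and these measures can be illustrated numerically. The Python sketch below uses general Lorenz-curve relations: the headcount index H solves L'(H) = z/mean, the poverty gap index is H - (mean/z) L(H), and the Gini index is one minus twice the area under the curve. It is not POVCAL's implementation and does not use the General Quadratic or Beta formulas. The curve L(p) = p^2, the poverty line, the mean, and all function names are hypothetical.

```python
import math


def headcount(dlorenz, z, mu, tol=1e-12):
    """Solve L'(H) = z / mu by bisection; assumes L' is increasing on [0, 1]."""
    target = z / mu
    lo, hi = 0.0, 1.0
    if dlorenz(lo) >= target:
        return 0.0
    if dlorenz(hi) <= target:
        return 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if dlorenz(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def poverty_gap(lorenz, h, z, mu):
    """Poverty gap index: H - (mu / z) * L(H)."""
    return h - (mu / z) * lorenz(h)


def gini(lorenz, n=1000):
    """Gini index as 1 - 2 * (area under L), using the trapezoid rule."""
    area = sum((lorenz(i / n) + lorenz((i + 1) / n)) / (2 * n) for i in range(n))
    return 1 - 2 * area


def headcount_elasticity_wrt_mean(dlorenz, z, mu, step=1e-4):
    """Finite-difference elasticity of H to the mean, Lorenz curve held fixed.

    Assumes 0 < H < 1 so that the logarithms are defined.
    """
    h0 = headcount(dlorenz, z, mu)
    h1 = headcount(dlorenz, z, mu * (1 + step))
    return (math.log(h1) - math.log(h0)) / math.log(1 + step)


# Hypothetical Lorenz curve L(p) = p**2 with poverty line 50 and mean 100.
lorenz = lambda p: p ** 2
dlorenz = lambda p: 2 * p
H = headcount(dlorenz, z=50, mu=100)
PG = poverty_gap(lorenz, H, z=50, mu=100)
G = gini(lorenz)
elasticity = headcount_elasticity_wrt_mean(dlorenz, z=50, mu=100)
```

For this curve the results are close to H = 0.25, PG = 0.125, a Gini index of 1/3, and an elasticity of -1. All four can be confirmed by hand, because L(p) = p^2 corresponds to incomes spread uniformly between zero and twice the mean.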
Next, all of the above will be repeated for the Beta Lorenz curve.
Then, the program will give you an assessment of which Lorenz curve (and corresponding estimates) you should prefer for your data.
Finally, the program will graph the fitted Lorenz curves, and their first and second derivatives.
Sometimes the choice is obvious, but other times some judgement is needed. The program "builds in" what we consider to be sound criteria for making such judgement.
The program will first check whether your fitted models satisfy the theoretical conditions for valid Lorenz curves. Four conditions should hold: i) its upper bound should be one, ii) its lower bound should be zero, iii) it should be strictly increasing throughout, and iv) its first derivative should be strictly increasing throughout (convex from below). Some of these conditions hold automatically for one or both specifications, while others have to be tested for your data. The program does this and reports the result.
You can check all these conditions yourself from the graphs for each Lorenz curve, and its first and second derivatives, both of which should be positive (above the bold middle line) throughout. (It does not matter if the second and third graphs look strange - jagged - as long as they are above the middle horizontal line.) This allows you to see precisely where any violations of the conditions for a valid Lorenz curve are happening, and to assess the extent of the problem.
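The four conditions can also be checked numerically on a grid of points. The Python sketch below does this with finite differences for any function L(p). It is an illustration only and may differ from how POVCAL tests the conditions. The function name, grid size, tolerance, and both example curves are hypothetical.

```python
def check_lorenz(lorenz, n=200, eps=1e-9):
    """Check the four Lorenz-curve conditions on an evenly spaced grid."""
    values = [lorenz(i / n) for i in range(n + 1)]
    # First differences stand in for the first derivative ...
    first = [b - a for a, b in zip(values, values[1:])]
    # ... and second differences for the second derivative.
    second = [b - a for a, b in zip(first, first[1:])]
    return {
        "lower_bound_zero": abs(values[0]) <= eps,
        "upper_bound_one": abs(values[-1] - 1.0) <= eps,
        "increasing": all(d > 0 for d in first),
        # The tolerance absorbs rounding noise, so this tests weak convexity.
        "convex": all(d >= -eps for d in second),
    }


valid = check_lorenz(lambda p: p ** 2)      # satisfies all four conditions
invalid = check_lorenz(lambda p: p ** 0.5)  # concave, so convexity fails
```

The second curve passes the boundary and monotonicity checks but fails convexity. On the graphs, that case would show up as a second derivative lying below the middle line.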
Reliable estimates of the headcount index of poverty are often possible as long as the Lorenz curve has positive first and second derivatives in a neighborhood of the headcount index. So if neither Lorenz curve is globally valid all is not lost, though one should be wary of the estimated Gini index and the other poverty measures.
Similarly, the first condition above (that the upper bound of the Lorenz curve is exactly one) is not essential; quite accurate estimates of the poverty measures are possible even when it fails.
Serious violations of the theoretical conditions are rare in our experience, and when they do happen it is typically because the primary data have been grouped badly. When only minor infringements occur, the estimates may still be quite good. You have to make this judgement yourself, aided by the above comments and the graphs given by POVCAL.
The program also assesses the goodness-of-fit of the Lorenz curves. This can be done in two ways: i) by comparing the sum of squared errors over the whole Lorenz curve, ii) by comparing the sum of squared errors over the part of your Lorenz curve up to the headcount index of poverty. Results are given for both, but the second comparison is more appropriate for poverty measurement, so that is the basis on which the program decides which Lorenz curve fits the data better.
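The two comparisons can be sketched as follows in Python. The sketch assumes the squared errors are measured at the observed (p, L) data points, which this guide does not spell out, and it uses a single headcount value for both curves. The data, the two candidate curves, and the function names are hypothetical.

```python
def sse(points, lorenz, p_max=1.0):
    """Sum of squared errors between data and a fitted curve for p <= p_max."""
    return sum((l - lorenz(p)) ** 2 for p, l in points if p <= p_max)


def better_fit(points, curves, headcount):
    """Name of the curve with the smaller SSE up to the headcount index."""
    return min(curves, key=lambda name: sse(points, curves[name], p_max=headcount))


# Made-up grouped data (p, L) and two made-up fitted curves.
points = [(0.2, 0.05), (0.4, 0.17), (0.6, 0.36), (0.8, 0.63), (1.0, 1.0)]
curves = {"A": lambda p: p ** 2, "B": lambda p: p ** 1.8}

sse_whole = {name: sse(points, curve) for name, curve in curves.items()}
sse_poor = {name: sse(points, curve, p_max=0.4) for name, curve in curves.items()}
preferred = better_fit(points, curves, headcount=0.4)
```

In this example curve A has the smaller error both over the whole curve and up to a headcount of 0.4, so it is preferred. The two criteria can disagree, which is why both are reported.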
Let us know if you have any problems, or any suggestions for improving the program -- schen@worldbank.org
The disk also includes a trial data set, in the file INDIA.DAT. This is the same data for rural India used in the example given in Gaurav Datt's paper (referenced below; see his Table 1). As you will see, it has 1 sub-group, 13 records, and it is a type 5 data set, entered as percentages. The poverty line is Rs 89. By comparing the set-up of INDIA.DAT with columns 2 and 3 in Table 1 of Datt's paper, then running the program with this data set and comparing the results to those reported in Table 4 of Datt's paper (for the General Quadratic Lorenz curve) you should get a good idea of what is going on. Your results should be the same as those in Datt's paper, allowing a little for rounding errors. To double check, we also include a file INDIA.OUT which is the output you should get using INDIA.DAT as the input. Looking through this file in advance will also show you what POVCAL does. POVCAL also produces graphs (below) which are only sent to the screen; you will need special software to print them.
[Figure: Graphic Results from POVCAL India Example. Panels: General Quadratic Lorenz; First Derivative of Each Lorenz Curve; Second Derivative of Each Lorenz Curve]
If you use the program "POVCALLC.EXE", you will also get an output data file, "LC.DAT". This file gives one hundred points on both estimated Lorenz curves; you should choose the one with the smaller SSE. "LC.DAT" has two columns: the first gives the cumulative percentage points of population, and the second gives the cumulative shares of income.