Cross Validated Asked by Warwick Masson on December 21, 2021
I need to compute the Gini coefficient on some population data arranged in income brackets, for example:
$0->$1000 : 10000 people
$1000->$10000: 50000 people
My problem is that the last bracket is unbounded, i.e., it is of the form:
$1000000 < : 500 people
Is there any way to calculate the Gini coefficient given this data?
A simple approach
The Pareto distribution may be used as a decent first-order approximation to the upper tail of an income distribution. Henson (1967) provides one such application (among other things) in estimating the mean of the truncated upper tail:
$$\overline{X} = X \left(\frac{V}{V-1}\right)$$
Where:
$\overline{X}$ is the mean of the truncated upper tail, i.e. the mean of a Pareto distribution with minimum $X$ and shape $V$ (finite only when $V > 1$).
$X$ is the lower limit of the truncated (open) income interval.
$V = \frac{c-d}{b-a}$
$a$ is the natural log of the lower limit of the preceding (closed) income interval.
$b = \ln(X)$
$c$ is the natural log of the sum of the frequencies of both the top (open) interval and the (closed) interval below it.
$d$ is the natural log of the frequency of the top (open) interval.
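The estimator above is short enough to sketch in code. The following is a minimal illustration, not Henson's own implementation: the function name `pareto_open_interval_mean` and the example bracket (100,000 to 1,000,000 holding 49,500 people) are hypothetical, since the question does not give the bracket just below the open one.

```python
import math


def pareto_open_interval_mean(open_lower, prev_lower, n_open, n_prev):
    """Estimate the mean of the open top bracket under a Pareto tail.

    open_lower: lower limit X of the open (top) interval
    prev_lower: lower limit of the closed interval just below it
    n_open:     frequency of the open interval
    n_prev:     frequency of the closed interval just below it
    Returns (estimated mean, estimated Pareto shape V).
    """
    a = math.log(prev_lower)
    b = math.log(open_lower)
    c = math.log(n_open + n_prev)
    d = math.log(n_open)
    v = (c - d) / (b - a)
    if v <= 1:
        # A Pareto distribution with shape <= 1 has no finite mean.
        raise ValueError("estimated shape V <= 1: tail mean is not finite")
    return open_lower * v / (v - 1), v


# Hypothetical numbers: only the 500 people above 1,000,000 come from the question.
mean, v = pareto_open_interval_mean(1_000_000, 100_000, 500, 49_500)
print(f"V = {v:.3f}, estimated top-bracket mean = {mean:,.0f}")
```

The estimated mean can then stand in for the open bracket when computing the Gini coefficient from grouped data. Note that the estimate is very sensitive when $V$ is close to 1.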
A little more sophistication
If you happen to have educational attainment data (specifically eighth grade or less, some high school, high school graduate, some college, college graduate, and post graduate), Angle (2004) developed an estimator for truncated right tails of income distributions (specifically in the context of the US Census' Current Population Survey) which is tailor-made for your purposes. As Angle writes:
The Salamander was devised as a solution to the problem of inferring the difference between the Gini concentration ratios of nonmetropolitan (nonmetro) annual wage and salary incomes and metro ones.
The Salamander estimator of the mean income in the top (open) income interval is:
$$\overline{X}_{t} = \sum_{i=1}^{I} w_{it} M_{it} \left[\frac{(1 - \omega_{i})}{\left(1 - \frac{4}{3}\omega_{i}\right)}\right]$$
Where:
$X$ is the lower-limit of the truncated (open) income interval.
$\omega_{i}$ is a parameter of Angle's (2002) inequality process (the theoretical loss from a loser won by a winner in a competitive economic encounter) linked specifically to educational attainment; Angle's 2004 paper provides estimates for each of these in a table on page 10.
$w_{it}$ is the proportion of the population at level $i$ of educational attainment at time $t$.
$M_{it} \approx \frac{3\alpha_{i} - 1}{3\lambda_{it}}$ is the median of a $\Gamma$ PDF.
$\alpha_{it} \approx \frac{1-\omega_{i}}{\omega_{i}}$ is the shape parameter of the income distribution for educational attainment $i$ at time $t$.
$\lambda_{it} \approx \frac{(1-\omega_{i})\sum_{i=1}^{I}\frac{w_{it}}{\omega_{i}}}{\overline{x}_{t}}$ is the scale parameter of the income distribution for educational attainment $i$ at time $t$. Unfortunately:
$\lambda_{it}$ cannot be estimated unless an estimate of $\overline{x}_{t}$ is available.
Angle (2004) is optimistic that $\overline{x}_{t}$ can be estimated, but goes into more detail than I wish to transcribe here.
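As a consistency check on the bracketed factor in the estimator, it can be read as a mean-to-median ratio. This assumes the Gamma mean is $\alpha_{i}/\lambda_{it}$ (that is, $\lambda_{it}$ acts as a rate, which is what the median approximation above implies) and substitutes $\alpha_{i} \approx (1-\omega_{i})/\omega_{i}$:

$$\frac{\alpha_{i}/\lambda_{it}}{M_{it}} \approx \frac{3\alpha_{i}}{3\alpha_{i} - 1} = \frac{3(1-\omega_{i})}{3(1-\omega_{i}) - \omega_{i}} = \frac{1-\omega_{i}}{1 - \frac{4}{3}\omega_{i}}$$

So each term $w_{it} M_{it}\left[\cdot\right]$ is the population share at education level $i$ times that level's approximate mean income.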
References
Angle, J. (2004). The Salamander: A Model of the Right Tail of the Wage Distribution Truncated by Topcoding [Working Paper]. Economic Research Service.
Angle, J., & Tolbert, C. M. (2004). Topcodes and the Great U-Turn in Nonmetro/Metro Wage and Salary Inequality (ERS Staff Paper No. 9904; pp. 1–28). Economic Research Service, Food and Rural Economics Division, USDA.
Henson, M. F. (1967). Trends in Income of Families and Persons in the United States: 1947 to 1960 (Technical Paper No. 17). U.S. Bureau of Census.
Answered by Alexis on December 21, 2021
You can put a lower bound on the Gini coefficient by assuming all 500 of the highest earners earned exactly $1000000. The upper bound is 1. Narrowing it further is probably impossible without more information: how would you know that a billionaire didn't move into the area? Fitting a distribution would be very dangerous, since fitted distributions are rather speculative in their upper tails. If you have a defensible absolute upper bound on income, you could get both an upper and a lower bound on the Gini coefficient.
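A minimal sketch of the lower-bound calculation, under stated assumptions: everyone inside a bracket earns the same amount, which makes the Lorenz curve piecewise linear so the trapezoid rule is exact, and which also understates inequality within brackets. The function name `gini_from_groups` and the closed-bracket means (midpoints) are hypothetical choices, not part of the question's data.

```python
def gini_from_groups(counts, means):
    """Gini coefficient when every person in group k earns exactly means[k]."""
    groups = sorted(zip(means, counts))  # order groups from poorest to richest
    total_n = sum(n for _, n in groups)
    total_income = sum(m * n for m, n in groups)
    cum_n = 0.0
    cum_income = 0.0
    area = 0.0  # area under the Lorenz curve
    for m, n in groups:
        prev_p = cum_n / total_n
        prev_l = cum_income / total_income
        cum_n += n
        cum_income += m * n
        p = cum_n / total_n
        l = cum_income / total_income
        area += (p - prev_p) * (l + prev_l) / 2  # trapezoid between Lorenz points
    return 1 - 2 * area


counts = [10_000, 50_000, 500]
closed_means = [500.0, 5_500.0]  # assumed bracket midpoints, not given in the question
# Top bracket pinned at its lower limit, as suggested above.
lower = gini_from_groups(counts, closed_means + [1_000_000.0])
print(f"Gini is at least {lower:.3f} under these assumptions")
```

With the other brackets held fixed, raising the assumed top-bracket income only pushes the Lorenz curve down, so any larger value for the top earners gives a larger Gini coefficient than this bound.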
Answered by AlaskaRon on December 21, 2021
The book by Handcock & Morris, "Relative Distribution Methods in the Social Sciences" (Springer), solves that problem in the following way, quoting:
" We have imputed values for these topcoded earnings in each year (about 0.5% of the cases) using a Pareto distribution. The mean of these imputed values is about 1.45 times the topcode; the value traditionally assigned to topcoded earnings. "
(They give no reference for that "traditional" use).
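For a rough sense of scale, the quoted factor can be turned into a tail shape. This assumes the imputed values behave like a single Pareto distribution starting at the topcode, with shape $V$, so that the mean is $V/(V-1)$ times the topcode; the quote itself does not give these details.

$$\frac{V}{V-1} = 1.45 \quad\Longrightarrow\quad V = \frac{1.45}{0.45} \approx 3.22$$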
A relevant paper you could have a look at: http://www.ce.utexas.edu/prof/bhat/ABSTRACTS/Imputing_a_continuous_income.pdf
Answered by kjetil b halvorsen on December 21, 2021