When calculating the Gini coefficient for the US, how should the portion of the population which has not filed a return be incorporated?

Question

The Gini coefficient $G$ is a commonly used measure of income distribution inequality, taking values from 0 (meaning every individual in the population has an identical income) to 1 (meaning a single individual in the population earns the entirety of the population's income... and violent revolution is likely imminent ;). $G$ is the ratio of the area between the 'line of income equality' (where income is distributed uniformly in the populace) and the Lorenz curve to the total area under the line of equality. The Lorenz curve describes the cumulative share of income (or wealth, other social resources, etc.) as a function of the cumulative portion of the population.
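To make the Lorenz-curve definition concrete, here is a minimal sketch in Python. It assumes one income figure per individual (not the binned counts that tax tables report), and the function name `gini` and the toy inputs are mine, purely for illustration.

```python
def gini(incomes):
    """Gini coefficient from individual incomes, via the Lorenz curve."""
    xs = sorted(incomes)          # Lorenz curve orders people from poorest to richest
    n = len(xs)
    total = sum(xs)
    cum = 0.0                     # cumulative income share so far
    area = 0.0                    # area under the Lorenz curve
    for x in xs:
        prev = cum
        cum += x / total
        # Trapezoid over one population step of width 1/n.
        area += (prev + cum) / (2 * n)
    # Area under the line of equality is 1/2, so G = (1/2 - area) / (1/2).
    return 1 - 2 * area


equal = gini([1, 1, 1, 1])        # everyone earns the same
one_earner = gini([0, 0, 0, 1])   # a single person earns everything
```

With a finite population of $n$ people, the 'one person earns everything' case gives $1 - 1/n$ rather than exactly 1, which approaches 1 as $n$ grows.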
In the US, and in US states, $G$ can be calculated using income tax return data provided by the IRS, for example from the IRS' Tax Year 2017: Historic Table 2 (SOI Bulletin). However, the tax return data make clear that the number of returns filed is (much) less than the population of the US.
This is to be expected, I suppose: we generally do not expect 2-year-olds to be earning income, or filing taxes on it, for example. On the other hand, those not filing returns are probably a heterogeneous group, and likely include: the jobless (i.e. employable, but not working, not actively searching for employment, and not earning a taxable income… many full-time college students, for example); possibly some wealthy folks with income sources that are entirely (or nearly entirely) not taxable (e.g., they own a lot of muni bonds, whose interest is generally exempt from federal income tax, etc.); the unemployable (e.g., the aforementioned toddler, people persistently lacking language, etc.); and possibly others.
How are individuals not filing returns typically accounted for in calculating $\boldsymbol{G}$ as a measure of income distribution inequality (assuming that we have reliable estimates of the population size)?
Are they:

Ignored? (I.e. is $G$ typically estimated as a measure of inequality among those filing taxes)?

Incorporated into the calculation of $G$ with zero assumed income?

Incorporated into the calculation of $G$ with some estimated mean income below the level required for filing taxes?

Incorporated into the calculation of $G$ with some other kind of estimated mean income?

Something else?
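To show why the choice among these options matters, here is a rough sketch in Python with entirely made-up numbers: six hypothetical filers, four hypothetical non-filers, and an assumed non-filer income of 8,000 for the 'below the filing threshold' option. None of these values come from IRS data, and the helper `gini` is just a plain Lorenz-curve calculation, not any agency's method.

```python
def gini(incomes):
    """Gini coefficient from individual incomes (trapezoid rule on the Lorenz curve)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    cum = area = 0.0
    for x in xs:
        prev = cum
        cum += x / total
        area += (prev + cum) / (2 * n)
    return 1 - 2 * area


# All figures below are hypothetical, chosen only for illustration.
filer_incomes = [20_000, 35_000, 50_000, 80_000, 150_000, 400_000]
n_nonfilers = 4
assumed_low_income = 8_000   # assumed mean income of non-filers, below every filer

# Ignore non-filers entirely.
g_filers_only = gini(filer_incomes)
# Include non-filers with zero assumed income.
g_zero = gini(filer_incomes + [0] * n_nonfilers)
# Include non-filers with a small assumed income.
g_low = gini(filer_incomes + [assumed_low_income] * n_nonfilers)

print(f"filers only: {g_filers_only:.3f}")
print(f"zero income: {g_zero:.3f}")
print(f"low income:  {g_low:.3f}")
```

With these toy numbers, ignoring non-filers gives the smallest $G$, assigning them zero income gives the largest, and assigning a small positive income lands in between. Adding $m$ zero-income people to $n$ filers with coefficient $G$ gives $(m + nG)/(n + m)$, so the zero-income option can only push $G$ up.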

Bonus question: If there is some means of incorporating the whole population into the estimate of $G$, is this for all ages, or only for some range, such as 18–62 years?
PS: One of the places where $G$ breaks down as a measure is when some people actually have negative incomes, which is certainly possible in the US today. It is probably OK to leave this nuance out of the answer to this question… unless it isn't. :)
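As a quick illustration of that breakdown, the sketch below uses toy numbers (not real data) and a plain Lorenz-curve calculation to show $G$ leaving the $[0, 1]$ range once a negative income is included.

```python
def gini(incomes):
    """Gini coefficient from individual incomes (trapezoid rule on the Lorenz curve)."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    cum = area = 0.0
    for x in xs:
        prev = cum
        cum += x / total
        area += (prev + cum) / (2 * n)
    return 1 - 2 * area


# Hypothetical incomes: one person has a net loss larger than most others' gains.
incomes_with_loss = [-5, 1, 2, 3]

# The Lorenz curve dips below zero here, so the area between it and the
# line of equality exceeds 1/2 and G comes out greater than 1.
g_with_loss = gini(incomes_with_loss)
print(g_with_loss)
```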
