Are the differences between sampling clusters and sampling strata conceptual, methodological, neither, or both?

Question

I am fuzzy on the distinctions between sampling strata and sampling clusters. Both seem to aim at designs that produce useful estimates of between/within group (stratum, cluster) variation and, in particular, both seem to be driven by homogeneity due to some shared group definition.
What are the methodological distinctions?
I would find answers to this part of my question most worthwhile if they explicitly address both (i) what stratified sampling and cluster sampling are intended to accomplish, and (ii) their similarities and distinctions.
What are the conceptual distinctions?
As I am an epidemiologist, I would find answers to this part of my question most worthwhile if couched in substantive theories of the concept of a population as a group of individuals sharing multiple overlapping contexts, with overlapping histories of those contexts. For example, what do cluster sampling and stratification each imply for the following?

Representation in the variables categories? (I.e. valid and reliable estimates.)
Characterization of inequities between variable categories.
Are the variable categories the targets of inference?
Questions of heterogeneity or homogeneity aside, what would preclude a categorical variable from being used?
What circumstances would lead a study designer to say "You know what? We need an additional variable to cluster sample/stratify on."

EDIT 7-20-2020: I feel all four answers to date address methodological concerns, and only one addresses the conceptual concerns (and that did so by saying they do not enter the distinctions). I will find answers addressing both the methodological and conceptual portions of my question most satisfying.

Nuclear03020704 · Answer

Other answers have been giving good and clear examples. I'd like to try a different wording for this.

Consider you are going to sample a city's population to know its average income.
Some of the things that will "stratify" your population:

Income level (high, medium, low)
Type of job (skilled labor, unskilled labor, etc.)
Education level (none, highschool, bachelor, master, autodidact, skill from experience, etc.)

Those things will "stratify" your population because you know that people with different income levels, types of job, or education levels will have different incomes, while people within the same income level, type of job, or education level will have more or less the same income.
In contrast, some things will not "stratify" your population but rather "cluster" it:

Neighborhood or city block

If you can assume that the neighborhoods in the city are not really different from one another, you can consider a neighborhood a "cluster" rather than a "stratum", since you don't believe different neighborhoods will have really different incomes.

In sampling methodology, strata are designed to make sure you include all the different parts of the population in your sample, i.e. you have all strata represented. In contrast, clusters are designed so that rather than picking samples from the ENTIRE population at random (which in real-life situations is expensive and more difficult), you can just pick a cluster at random and say "this cluster represents the population at a smaller scale".
To demonstrate why cluster sampling is easier and cheaper than sampling entirely at random, consider that you're sampling a city's population.
Sampling directly from the list of city residents means some of the sampled people will be really far away from each other. This makes the sampling harder and more expensive.
If you do cluster sampling, that is, you randomly choose neighborhoods/blocks and THEN sample from the residents lists of those neighborhoods, the sampled people will be easier to access because they're closer together. If the neighborhoods of the city are not that different from one another, you can safely say that the clusters you chose will still represent the entire city.
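The two steps just described (pick blocks, then pick residents within those blocks) can be sketched roughly as below. This is only an illustration: the city, the block names, and the function `two_stage_cluster_sample` are all made up for the example.

```python
import random

def two_stage_cluster_sample(residents_by_block, n_blocks, n_per_block, seed=0):
    """Stage 1: pick blocks at random. Stage 2: pick residents within each picked block."""
    rng = random.Random(seed)
    chosen_blocks = rng.sample(sorted(residents_by_block), n_blocks)
    sample = []
    for block in chosen_blocks:
        residents = residents_by_block[block]
        k = min(n_per_block, len(residents))  # never ask for more people than the block has
        sample.extend((block, person) for person in rng.sample(residents, k))
    return chosen_blocks, sample

# Hypothetical city: 20 blocks of 50 residents each.
city = {
    f"block_{b:02d}": [f"resident_{b:02d}_{i:02d}" for i in range(50)]
    for b in range(20)
}
blocks, sample = two_stage_cluster_sample(city, n_blocks=4, n_per_block=10)
print(blocks)       # the 4 blocks the interviewers actually have to visit
print(len(sample))  # 40 people, all living in those 4 blocks
```

Every sampled person lives in one of only four blocks, which is where the travel savings come from.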

StasK · Answer

Most U.S. health surveys (NHIS and its kiddo MEPS, NHANES, NSDUH) are stratified cluster surveys. The common representation of the public use data sets is a two-stage design with ~50 strata at the first stage of sampling (at which clusters are sampled), usually with two clusters per stratum, and people sampled at the second stage within clusters. This is kind of a sixth-grade-reading-level explanation of the science, if you like.
Why, and how, are these surveys stratified? Well, the health professionals know that people in different settings have different health care needs and health care outcomes. Urban is different from suburban, which is different from rural, so the level of urbanization / population density is a stratifying variable for these.
Why, and how, are these surveys clustered? Well, cluster samples are either a measure of desperation (there is no other way to reach the population), or simply a way to save on costs (in face-to-face surveys, you would rather pay interviewers to talk with people than to sit in the car / on the train / walk from one interview to the next... so the interviewers should have 5-10-15 minutes of travel rather than 2 hours of travel between appointments). In large scale U.S. health surveys, you have bits of both: there is no central listing of all people in the country (although one can lay their hands on the list of all addresses, sort of). In international surveys like the Demographic and Health Surveys, there may not be enough government data to set up data collection the way it is done in the U.S.; the best you may have to work with is the administrative division into provinces, districts, and cities/towns/villages within the latter, with at best rough estimates of population sizes. So you end up sampling those districts, and those settlements within districts, and then sending enumerators to count dwellings and then sampling from the lists thus created.
There are of course other situations where cluster samples make perfect sense -- namely when the populations are absolutely naturally organized in hierarchical way, like school districts / schools / classes-teachers / students. Clusters are defined by the social processes, not by the statistician's pen. In many of these hierarchical population surveys, there is also interest in data at each level of hierarchy, and in multilevel modeling of mediation of student-level variable effects by the teacher or principal-level variables.
Out of the questions posed by the OP, I can only answer this (others are qualitative research questions, not quantitative research ones):

What circumstances would lead a study designer to say "You know what? We need an additional variable to cluster sample/stratify on."

You can only stratify on a variable that is available on the sampling frame (sampling frame = list of entities that you take a sample from; this would be a list of districts in the example of the DHS surveys, or the list of all 80,000 Census tracts in the case of the United States for the large scale health surveys; this could also be an implicit list like the way to generate random phone numbers in random digit dialing, which is what is being done for BRFSS).
As for which variable to cluster on, it is either the natural hierarchy, or a cost-precision tradeoff: if your interviewers have a smaller area to cover, the population there is likely to be somewhat more homogeneous, so you don't learn as much from the same number of observations.
P.S. The distinction between clusters and strata is something a lot of people struggle with. You are not alone.
P.P.S. Contrary to what you may have heard, including some of the posted answers, in the U.S., you cannot stratify by a person's race/ethnicity, sex/gender, or age, not in the general population surveys, at least. If you have a list of hospital patients with these fields, then of course you can. But there is no general sampling frame (short of maybe the Census Bureau Master Address File) that would list a person's name, address, and these demographic characteristics. (The Nordic countries, however, have population registers where this information can be found; the conversations between Swedes and Americans at professional conferences sometimes go in parallel universes with little traction.) What does happen is that when you stratify by geography, and minorities are heavily segregated, you can select areas that are 90%+ Black/African American or 80%+ Hispanic, and that way you have a good way to predict how many people in those groups your sample will have at the end of the day.

astel · Answer

Stratified sampling is most efficient (in terms of variance of the estimate) when you have homogeneity WITHIN strata and heterogeneity BETWEEN strata. Think US states if your variable of interest were some social issue. Texans are very similar to each other but wildly different from New Yorkers (who are again similar to each other). If this is the case then stratified sampling can be more efficient than simple random sampling (SRS), since you require fewer samples to achieve a fully representative sample of your population.
In the case of a rare population (e.g. sexual minorities), if that population acts homogeneously with respect to the variable of interest and heterogeneously from people who do not belong to it, then your estimate can vary widely depending on whether or not members of this group end up in your sample. Stratifying on this group ensures that its members are in the sample, thus achieving less sampling variance for the same sample size.
Consider the case of estimating business revenue in a town with many small businesses and one Wal-Mart. Whether Wal-Mart is included in your sample will cause huge variations in your estimate. Stratifying on something such as number of employees, and perhaps putting Wal-Mart in its own stratum where the sampling percentage is 100% (a take-all stratum), will decrease the variance in your estimate.
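To see the Wal-Mart effect in numbers, here is a toy sketch. The town, the revenue figures, and the two estimator functions are invented for illustration; the only point is how much the SRS estimate of total revenue swings compared with a design that puts the giant in a take-all stratum.

```python
import random

rng = random.Random(7)

# Hypothetical town: 100 small businesses plus one giant (all revenues are made-up numbers).
small = [rng.uniform(0.5, 1.5) for _ in range(100)]
giant = 1000.0
true_total = sum(small) + giant

def srs_total(n=10):
    """Expansion estimator from a simple random sample of all 101 businesses."""
    frame = small + [giant]
    sample = rng.sample(frame, n)
    return len(frame) / n * sum(sample)

def take_all_total(n=10):
    """Giant is its own stratum sampled at 100%; the other n - 1 draws come from the small businesses."""
    sample = rng.sample(small, n - 1)
    return giant + len(small) / (n - 1) * sum(sample)

srs_estimates = [srs_total() for _ in range(500)]
take_all_estimates = [take_all_total() for _ in range(500)]

print(f"true total:     {true_total:.0f}")
print(f"SRS range:      {min(srs_estimates):.0f} to {max(srs_estimates):.0f}")
print(f"take-all range: {min(take_all_estimates):.0f} to {max(take_all_estimates):.0f}")
```

Both designs use 10 businesses per sample; the SRS estimate is either far too low or far too high depending on whether the giant was drawn, while the take-all design stays close to the true total.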
Conceptually, stratified sampling is all about decreasing the variance of your estimate. It allows either the same variance as SRS with fewer samples, or less variance for the same number of samples. What would preclude a variable from being used to stratify? If it had no effect on the variance of your estimate. That is, if it did not further increase the homogeneity within strata. For example, stratifying on eye colour if your variable of interest was student performance. It may not hurt your estimate, but it will needlessly increase the complexity of your survey design.
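A small simulation makes the variance argument concrete. Everything here is an assumption made for the sketch: a made-up population with two equal-sized strata that are tight within and far apart between, and the same total sample size of 20 for both designs.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: two strata, homogeneous within and very different between.
strata = {
    "A": [rng.gauss(10, 1) for _ in range(500)],
    "B": [rng.gauss(50, 1) for _ in range(500)],
}
population = strata["A"] + strata["B"]

def srs_mean(n):
    """Mean of a simple random sample of size n from the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_per_stratum):
    """Strata are equal-sized here, so the plain average of the stratum sample means is unbiased."""
    means = [statistics.mean(rng.sample(units, n_per_stratum)) for units in strata.values()]
    return statistics.mean(means)

reps = 2000
var_srs = statistics.variance([srs_mean(20) for _ in range(reps)])
var_strat = statistics.variance([stratified_mean(10) for _ in range(reps)])

print(f"variance of SRS mean:        {var_srs:.3f}")
print(f"variance of stratified mean: {var_strat:.3f}")
```

With strata this different, the SRS mean bounces around depending on how many A's versus B's happen to be drawn, while the stratified design fixes that split in advance.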
Cluster sampling is most efficient (again, efficiency in terms of variance) when you have heterogeneity WITHIN clusters and homogeneity BETWEEN clusters. Think schools in a particular state where the variable of interest is student height. Cluster sampling intends each cluster to essentially be a mini version of your population. The main benefits of this are practical.
For example, you don't require a complete frame, i.e. if you want to sample students but don't have the students' contact information, you can sample the schools instead and have them give the survey to all of the students. It also saves on the cost of actually administering the survey. If your survey must be completed in person then it can be expensive to drive around and survey people chosen randomly using SRS. If you sample clusters that are chosen with geographic proximity in mind, this becomes less expensive and can actually let you survey more people (which can lead to less variance than SRS).
Clusters are chosen less for their ability to reduce the variance of your estimate and more for their ability to aid survey administration and reduce costs. That said, beyond practical reasons, cluster sampling can have less variance than SRS with the same sample size if the intra-class correlation is negative.
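The role of the intra-class correlation can be sketched with the usual design effect approximation for equal-sized clusters, deff = 1 + (m - 1) * ICC, where m is the number of people sampled per cluster. The function names and the numbers plugged in below are illustrative assumptions, not taken from any particular survey.

```python
def design_effect(cluster_size, icc):
    """Approximate variance inflation relative to SRS, assuming equal-sized clusters."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n, cluster_size, icc):
    """Size of the SRS that would give the same precision as the clustered sample of n."""
    return n / design_effect(cluster_size, icc)

# Hypothetical survey: 1000 interviews, 20 per cluster, under three ICC scenarios.
for icc in (0.05, 0.0, -0.02):
    deff = design_effect(20, icc)
    n_eff = effective_sample_size(1000, 20, icc)
    print(f"ICC={icc:+.2f}  deff={deff:.2f}  effective n={n_eff:.0f}")
```

A positive ICC (people in a cluster resemble each other) gives deff above 1, so the clustered sample is worth fewer interviews than its nominal size; a negative ICC pushes deff below 1, which is the rare case where clustering beats SRS.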

Graham Wright · Answer

Here's how the terms are usually used in survey research.
Stratified sampling is when you take the entire sample frame and preemptively divide it into a number of "buckets" based on some criteria you already know. So if you are sampling people in the US and you already know their race you might divide the sample into white, black, Hispanic and other. These buckets are the "strata." Then instead of taking one big random sample from the entire population you take a random sample from each bucket. There are various benefits of doing this but the biggest is that, if you want, you can take a BIGGER % random sample from smaller buckets to ensure you have enough respondents from that group in your final sample. So if I drew a sample of 500 from each bucket I'm going to have way more Blacks, Hispanics and "others" in my sample than I would if I just drew a random sample from the whole population, which might be important if I want to make sure I have enough N for those subgroups. Of course I'll then need to calculate design weights to adjust for the bias I've intentionally introduced in my sample. But this is easy since I know exactly what sort of bias I've introduced.
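As a rough sketch of what "calculating design weights" means here: each respondent's weight is the inverse of their selection probability, i.e. the stratum's frame count divided by the number sampled from it. The frame counts and the outcome means below are made-up numbers used only to show the arithmetic.

```python
# Hypothetical frame counts per stratum (made-up numbers) and an equal draw of 500 from each.
frame_counts = {"white": 6000, "black": 1300, "hispanic": 1800, "other": 900}
sample_counts = {stratum: 500 for stratum in frame_counts}

# Design weight = inverse of the within-stratum selection probability n_h / N_h.
weights = {s: frame_counts[s] / sample_counts[s] for s in frame_counts}

# Hypothetical stratum-level sample means of some outcome.
sample_means = {"white": 52.0, "black": 47.0, "hispanic": 49.0, "other": 50.0}

# Ignoring the design treats every stratum as equally large.
unweighted = sum(sample_means.values()) / len(sample_means)

# The weighted mean restores each stratum to its share of the frame.
weighted = (
    sum(weights[s] * sample_counts[s] * sample_means[s] for s in frame_counts)
    / sum(weights[s] * sample_counts[s] for s in frame_counts)
)

print(weights)
print(unweighted, weighted)
```

The oversampled small strata get small weights and the undersampled large stratum gets a large one, so the weighted estimate reflects the population mix rather than the sample mix.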
Clusters, by contrast, are part of a "two stage" sampling design, where first you draw a random sample of clusters, and then you draw a random sample of observations within the sampled cluster. So if I wanted to study hospital patients I might start by first making a sample frame of all hospitals in the US. Then I would draw a random sample of hospitals. Then, within the hospitals I've sampled I draw a random sample of patients to study.
From a statistical perspective the key difference is that in stratified sampling there is just ONE stage of random sampling (a draw within every stratum), and everyone in the frame has a non-zero probability of selection. Of course people in some strata might have a higher probability of selection than others, but that's where the design weights come in.
In cluster sampling, you draw two random samples: one sample of clusters and another sample of people (in the sampled clusters). And in that second stage of sampling lots of people (those who are in non-sampled clusters) have a zero % chance of selection. This is when you might want to consider HLM/multilevel modeling to account for the fact that observations are nested within clusters that are themselves just a sample of the total population.
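One way to make the two stages concrete is to write out a person's overall selection probability as the product of the two stage probabilities. The sketch below assumes simple random sampling at both stages; the function name and the hospital counts are hypothetical.

```python
from fractions import Fraction

def inclusion_probability(n_clusters_sampled, n_clusters_total, n_sampled_in_cluster, cluster_size):
    """P(person selected) = P(their cluster is selected) * P(person selected | cluster selected)."""
    stage1 = Fraction(n_clusters_sampled, n_clusters_total)
    stage2 = Fraction(n_sampled_in_cluster, cluster_size)
    return stage1 * stage2

# Hypothetical design: 50 of 5000 hospitals, then 20 patients from a hospital with 400 patients.
p = inclusion_probability(50, 5000, 20, 400)
print(p)      # overall selection probability for a patient in that hospital
print(1 / p)  # the corresponding design weight
```

Before any hospital is drawn, every patient has a small but non-zero chance; it is only once the first stage is done that patients in the unsampled hospitals drop to zero.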
Addition: One conceptual motivation for cluster sampling is that it's often the only feasible way to get the sample you want. There is no one "list" of all hospital patients (or elementary school students) in a country that you can use to draw a random sample of. But there is a list of hospitals (or schools) you can use as a sample frame, and for each hospital chosen there is a list of patients within that hospital. So often it's the only feasible way of proceeding.

Huy Pham · Answer

As I understand it, cluster sampling is best when the population is homogeneous, the differences between the means of the clusters are small, and the variance within a cluster is large. The aim is to use the cluster as a proxy for the population as a whole. The benefit is practical. For example, it is easier to pick one or two schools and sample the students from those schools, rather than sample one or two students from many, many schools. So you might select a small number of schools through simple random sampling and then go to those schools and use simple random sampling to select students from them. This of course requires that the schools be basically the same as each other, and that each school have a wide enough selection of students to be representative of the whole population.
On the other hand, stratified sampling is best when the population is heterogeneous, there are large differences between the means of the strata, and the variance within a stratum is small. The aim is to make sure you do not miss out on the differences within your population. Leave it to random chance and simple random sampling and you might not sample small but important groups; for example, rural schools might be underrepresented. So you make sure that stratum is represented in the sample by creating a scheme that captures the stratification of the population. For example, you know your final sample will have to be 95% urban schools and 5% rural schools. Then you take a simple random sample within each stratum until you have the desired proportions to make up your final sample. If there is indeed wide variation within a population, stratified sampling should lead to more precise estimates compared to simple random sampling.
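The 95% urban / 5% rural scheme can be sketched as proportional allocation followed by simple random sampling inside each stratum. The frame of 1000 schools and the function `proportional_stratified_sample` are invented for this example.

```python
import random

def proportional_stratified_sample(frame_by_stratum, n, seed=0):
    """Allocate n across strata in proportion to their size, then SRS within each stratum."""
    rng = random.Random(seed)
    total = sum(len(units) for units in frame_by_stratum.values())
    sample = {}
    for stratum, units in frame_by_stratum.items():
        # Simple rounding; in general the rounded allocations may not sum exactly to n.
        n_h = round(n * len(units) / total)
        sample[stratum] = rng.sample(units, n_h)
    return sample

# Hypothetical frame: 950 urban schools and 50 rural schools.
schools = {
    "urban": [f"urban_{i}" for i in range(950)],
    "rural": [f"rural_{i}" for i in range(50)],
}
picked = proportional_stratified_sample(schools, n=100)
print({stratum: len(units) for stratum, units in picked.items()})
```

Rural schools are guaranteed their 5 slots here, whereas a plain SRS of 100 schools could easily end up with only one or two of them, or none.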
