# Probability sampling

Choosing the right sample for statistically significant results

How do you conduct an accurate national survey when there are 330 million people living in the U.S.?

It would be impossible to send a survey to every single person, but you can use probability sampling to get data that’s nearly as good even though it comes from a much smaller group.

Probability sampling uses statistical theory to randomly select a small group (a sample) from a larger population, and then predicts the likelihood that all their responses put together will match those of the overall population.

Probability sampling has two equally important requirements:

• Everyone in your population must have a non-zero chance of being selected (i.e. a chance you’ll send them a survey).
• You must know, specifically, what that chance of being selected is for each person.

Following these two rules will help you choose an appropriate sample that represents the overall population. With the right sample, your results will closely match what a survey of the whole population would have found, within a margin of error you can calculate.

## Give everyone a chance at being selected

You never want to knowingly exclude someone in your population from being selected into your sample. Watch out for times when particular groups might be unintentionally prevented from participating.

For example, let’s say you want to understand public opinion on an expansive new immigration law. Will you offer a Spanish language version of your survey? You should. If you don’t, you’ll likely miss a lot of first-generation Hispanic immigrants who aren’t comfortable answering questions in English but who have clear and consistent views on immigration. Your survey results won’t match up with true public opinion.

If you can’t give everyone in your population a chance at completing your survey, your sample will be non-representative and, therefore, biased.
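To see how excluding a group skews results, here is a small arithmetic sketch in Python. The group names, sizes, and support rates are invented purely for illustration and are not real survey data. It compares the true level of support in a made-up population with the level you would measure if one group had no chance of being selected.

```python
# Hypothetical population: these sizes and support rates are invented
# for illustration only, not taken from any real survey.
population = {
    # group: (number of people, share who support the law)
    "english_speakers": (900, 0.40),
    "spanish_only_speakers": (100, 0.90),
}


def support_rate(groups):
    """Overall share of supporters across the given groups."""
    total_people = sum(size for size, _ in groups.values())
    supporters = sum(size * rate for size, rate in groups.values())
    return supporters / total_people


# What a survey that reaches everyone would be estimating.
true_rate = support_rate(population)

# An English-only survey gives Spanish-only speakers a zero chance of selection,
# so it can only ever describe the people it covers.
covered = {
    name: group
    for name, group in population.items()
    if name != "spanish_only_speakers"
}
covered_rate = support_rate(covered)

coverage_bias = covered_rate - true_rate
print(f"true: {true_rate:.2f}, covered only: {covered_rate:.2f}, bias: {coverage_bias:+.2f}")
```

In this made-up case the gap does not shrink as the sample grows: a larger English-only sample simply measures the covered group more precisely.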

## Common probability sampling strategies

### Simple random sampling

In simple random sampling, all members of the population have an equal chance of being selected, and the selection is done randomly. As the name indicates, this is the simplest sampling strategy. It is unbiased on average, but any single draw can still over- or under-represent some groups purely by chance, and the smaller your sample, the more likely that is. Try using our sample size calculator to choose a sample size large enough for reliable results.
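As a minimal sketch of the idea, the Python snippet below draws a simple random sample from a made-up sample frame using the standard library. The frame contents, the sample size of 50, and the function name `simple_random_sample` are assumptions for illustration.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members; each has the same chance, n / len(frame)."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sample frame of 1,000 people.
frame = [f"person_{i}" for i in range(1000)]

sample = simple_random_sample(frame, 50, seed=42)

# Every person in the frame had the same, known chance of selection.
inclusion_probability = 50 / len(frame)
print(len(sample), inclusion_probability)
```

Because the selection chance is the same for everyone, both requirements of probability sampling are met: the chance is non-zero and it is known.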

### Stratified sampling

Many populations can be divided into smaller groups that don’t overlap but represent the entire population when put together. When sampling, we can take these groups (or strata) and draw a sample from each separately. It’s common to stratify by sex, age, or ethnicity, assigning different selection probabilities to different strata. As long as every member of the population belongs to exactly one stratum and all the strata are sampled, the probability design still holds.
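The sketch below shows one way this could look in Python, assuming three made-up age strata with different selection probabilities. The stratum names, sizes, and the helper `stratified_sample` are hypothetical. Each selected person carries a weight equal to the inverse of their selection probability, which is why the probability has to be known for everyone.

```python
import random


def stratified_sample(strata, sample_sizes, seed=None):
    """Draw a simple random sample inside each stratum.

    Returns (member, stratum, weight) tuples. The weight is
    stratum size / stratum sample size, i.e. 1 / selection probability:
    the number of population members each respondent stands in for.
    """
    rng = random.Random(seed)
    selected = []
    for name, members in strata.items():
        n = sample_sizes[name]
        weight = len(members) / n
        for member in rng.sample(members, n):
            selected.append((member, name, weight))
    return selected


# Hypothetical, non-overlapping strata that together cover the population.
strata = {
    "18-34": [f"young_{i}" for i in range(300)],
    "35-64": [f"middle_{i}" for i in range(500)],
    "65+": [f"older_{i}" for i in range(200)],
}

# Different strata get different selection probabilities (10%, 5%, 20%).
sample_sizes = {"18-34": 30, "35-64": 25, "65+": 40}

sample = stratified_sample(strata, sample_sizes, seed=7)

# The weights add back up to the population size.
total_weight = sum(weight for _, _, weight in sample)
print(len(sample), total_weight)
```

Weighting each response by these values when computing totals or averages compensates for the unequal selection probabilities.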

### Cluster sampling

Cluster sampling is most often used to save costs when surveying populations that are very spread out geographically. Instead of selecting people at random, different geographic areas (or clusters) are selected at random, and then some or all of the members of the selected clusters are surveyed.
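Here is a minimal Python sketch of the one-stage version, where everyone in each selected cluster is surveyed. The county names, cluster sizes, and the function `cluster_sample` are invented for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Pick whole clusters at random, then survey everyone in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical population: 10 counties with 20 residents each.
clusters = {
    f"county_{c}": [f"county_{c}_resident_{i}" for i in range(20)]
    for c in range(10)
}

chosen, members = cluster_sample(clusters, 3, seed=1)

# A person is surveyed only if their county is drawn, so each person's
# selection chance equals the chance that their cluster is selected.
inclusion_probability = 3 / len(clusters)
print(chosen, len(members), inclusion_probability)
```

A two-stage design would add a second random draw inside each chosen cluster, in which case a person's overall selection probability is the product of the two stages.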

## Steps in probability sampling

### 1. Determine your population of interest.

Think through all the people you’re interested in hearing from, but also be aware of anyone who should be deliberately excluded.

### 2. Find an appropriate sample frame.

Ideally, your frame should include all members of your population of interest (and no one who is not in your population of interest).

### 3. Determine your sampling strategy.

Do you want clusters and strata? Do you want all sample members to have equal probabilities of selection?

### 4. Select your sample and start surveying!

Depending on the population you’re trying to survey, you might have a hard time finding an appropriate sample frame. Even if you have a good frame, deciding on the best selection strategy will force you to make trade-offs between cost, representation, quality, and timeliness.

Getting people to respond to a true probability survey is difficult, because they are unlikely to be interested in the survey topic or to be compensated for the time and effort it takes to complete the survey.

Many of these problems can be solved with non-probability sampling, which (despite its name) still draws from probability and sampling theory to select an appropriate survey sample.

If you have unlimited resources or a small population of interest, probability sampling may not be necessary. But, in most cases, drawing a probability sample will save you time, money, and a lot of frustration. You usually can’t survey everyone, but you can always give everyone the chance to be surveyed; this is what probability sampling accomplishes.