The power of large samples in quantitative research
When conducting research through sample surveys and studies, one of the most important factors affecting the quality of our results is the size of the sample we collect. A large sample size provides substantial advantages over small samples when it comes to representing the population we’re studying.
Statistics relies on samples to make inferences about wider populations. But samples are prone to random error due to chance variations in who or what gets selected. With a small sample, these random fluctuations can skew results considerably from the reality of the full population.
However, as sample sizes increase, the influence of random error shrinks. The law of large numbers tells us that as more observations are added, the sample average converges to the true population value.
Essentially, the “signal” of the underlying population characteristics emerges more clearly from the “noise” of sampling variation.
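The law of large numbers is easy to see in a quick simulation. This sketch uses a made-up population (uniform values on [0, 100], so the true mean is 50) and shows the sample average settling toward the population value as n grows:

```python
import random

random.seed(42)

# Made-up population: uniform draws on [0, 100], whose true mean is 50.
TRUE_MEAN = 50.0

def sample_mean(n):
    """Average of n random draws from the population."""
    return sum(random.uniform(0, 100) for _ in range(n)) / n

# The estimate wanders for small n and settles near 50 as n grows.
for n in (10, 100, 1_000, 100_000):
    print(f"n = {n:>6}: sample mean = {sample_mean(n):.2f}")
```

Any population and any seed show the same pattern: individual small samples can land far from 50, but large samples rarely do.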
Some key benefits of large samples include:
- Estimates like means, proportions and regression coefficients from large samples are more accurate proxies for the actual population parameters. With smaller samples, the estimate may miss the mark by a wider margin.
Suppose a real estate agency wanted to estimate the average home value in a city. With a small sample of just 5 houses, the average price found could be way off from the true citywide average simply due to bad luck in selecting atypical homes. If one of those 5 houses happens to be a $1 million mansion, the sample average is skewed high.
But take a sample of 500 houses. The chances of a few outliers dragging the average significantly from the population mean are much smaller. With 500 data points, the average home value found would provide an extremely close proxy for the true average price across all residences in the city.
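A small simulation makes this concrete. The sketch below (all figures hypothetical) builds a synthetic city where most homes cluster around $300,000 but about 1% are million-dollar mansions, then compares how much 5-house and 500-house sample averages bounce around:

```python
import random
import statistics

random.seed(0)

# Hypothetical city: most homes priced around $300k with some spread,
# but roughly 1% are million-dollar mansions.
def random_home_price():
    if random.random() < 0.01:
        return random.uniform(1_000_000, 2_000_000)
    return random.gauss(300_000, 50_000)

population = [random_home_price() for _ in range(100_000)]
true_mean = statistics.mean(population)

def sample_estimate(n):
    """Average price in one random sample of n homes."""
    return statistics.mean(random.sample(population, n))

# How much do repeated estimates scatter around the true mean?
small = [sample_estimate(5) for _ in range(1000)]
large = [sample_estimate(500) for _ in range(1000)]

print(f"True citywide mean:       ${true_mean:,.0f}")
print(f"Spread of 5-home means:   ${statistics.stdev(small):,.0f}")
print(f"Spread of 500-home means: ${statistics.stdev(large):,.0f}")
```

The spread of the 500-home estimates comes out roughly ten times smaller than the 5-home spread, matching the 1/√n scaling of the standard error.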
- Statistical tests of theories and hypotheses are more powerful and reliable with large datasets, minimizing the chances of both false positives and false negatives.
Imagine a pharmaceutical company testing a new drug to see if it is effective for lowering cholesterol. They give half their test subjects the drug, and half a placebo. After 6 months, they compare the average cholesterol levels between the two groups.
With a small sample of only 10 people in each group, there is a high chance that random variation could skew the results:
A false positive: By chance, the drug group could include 2 people whose cholesterol dropped sharply for unrelated reasons. Now the drug looks effective, even though it really had no effect.
A false negative: Conversely, the drug group could be unlucky, with 2 people's cholesterol rising for random reasons. Now it seems the drug didn't work, when really it did.
But with a large sample of 500 people in each group, the chances of a few random outliers swinging the results decrease tremendously. The study now has the statistical power to reliably detect even small real effects of the drug, if they exist. And it minimizes the risk of concluding there is no effect when really there is one.
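This gain in statistical power can be sketched with a simulation. The numbers below (a true effect of 10 mg/dL, individual variation with SD 30 mg/dL, and a simple z-style test on the difference in group means) are illustrative assumptions, not real trial data:

```python
import random
import statistics

random.seed(1)

# Assumed, illustrative numbers: the drug truly lowers cholesterol by
# 10 mg/dL on average; individual results vary with SD 30 mg/dL.
TRUE_EFFECT = 10.0
SD = 30.0

def trial_detects_effect(n_per_group, z_crit=1.96):
    """Simulate one trial; return True if a simple z-style test on the
    difference in group means reaches significance."""
    placebo = [random.gauss(0.0, SD) for _ in range(n_per_group)]
    drug = [random.gauss(-TRUE_EFFECT, SD) for _ in range(n_per_group)]
    diff = statistics.mean(placebo) - statistics.mean(drug)
    se = (2 * SD**2 / n_per_group) ** 0.5  # SE of the difference in means
    return diff / se > z_crit

def power(n_per_group, trials=2000):
    """Fraction of simulated trials that detect the (real) effect."""
    hits = sum(trial_detects_effect(n_per_group) for _ in range(trials))
    return hits / trials

print(f"Power with 10 per group:  {power(10):.2f}")   # misses most of the time
print(f"Power with 500 per group: {power(500):.2f}")  # detects almost always
```

With these assumptions, the 10-per-group trial detects the real effect only a small fraction of the time, while the 500-per-group trial detects it almost always.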
- Margins of error around estimates like confidence intervals are significantly narrower with large n, giving more precise characterizations of population attributes.
Suppose a polling company wants to predict the outcome of the upcoming presidential election. They poll 500 random voters about who they intend to vote for.
Based on the sample results, they estimate that 45% of all voters will vote for Candidate A. But there is still some uncertainty: their statisticians calculate that, at a 95% confidence level, the true population percentage for Candidate A lies roughly between 40.6% and 49.4%.
Now suppose they had polled 5,000 voters instead of just 500. With the much larger sample, the confidence interval around the 45% estimate would shrink considerably, to roughly 43.6%–46.4%.
They are now much more certain about how close their sample estimate of 45% is to the real percentage that will be observed across all voters on Election Day. The larger sample allows them to characterize that population attribute (voting intention) in a more precise way.
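The shrinking margin of error follows directly from the standard normal-approximation formula for a proportion, z·√(p(1−p)/n). A minimal sketch:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a
    sample proportion, via the normal approximation."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.45  # sample share intending to vote for Candidate A
for n in (500, 5000):
    m = margin_of_error(p, n)
    print(f"n = {n}: 45% +/- {100 * m:.1f} points "
          f"({100 * (p - m):.1f}% to {100 * (p + m):.1f}%)")
```

Because the margin scales with 1/√n, polling ten times as many voters shrinks the interval only by a factor of √10 ≈ 3.2, which is why very tight polls require surprisingly large samples.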
In summary, investing the extra effort to obtain large, robust samples pays off with conclusions that can be trusted to closely mirror reality. The precision and reliability of research findings improve markedly with sample size. So for studies aiming to shed light on populations, bigger is better.