A while back, a colleague and I submitted a manuscript to a peer-reviewed journal. Our analysis was weighted to attenuate bias due to unequal probabilities of selection of sub-populations [1]. For transparency, we reported both the weighted and unweighted frequency distributions of the population by key variables.
After a thorough anonymous peer review, we received a report stating that the analysis weight we used was wrong because the weighted and unweighted frequencies differed substantially. Does a difference between weighted and unweighted frequencies render the assigned analysis weight invalid?
As noted above, if weighting serves to attenuate bias due to disproportionate sampling of sub-populations, then weighted and unweighted frequencies (and other estimates, such as means and regression coefficients) are expected to differ. The size of the difference depends on how disproportionate the sampling is.
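To make the link between disproportion and the size of the gap concrete, here is a minimal Python sketch (not Stata, which the post uses). It assumes a hypothetical population of 25 males and 75 females, and three hypothetical samples of 20 that increasingly over-represent males. Each sampled person gets a weight equal to the inverse of the within-sex selection probability. The function name `male_shares` is illustrative only.

```python
from fractions import Fraction

# Hypothetical population: 25 males and 75 females (25% male).
N_MALE, N_FEMALE = 25, 75


def male_shares(n_male, n_female):
    """Return (unweighted, weighted) male share for a sample stratified by sex.

    Each sampled person gets weight N_h / n_h, the inverse of the
    within-sex selection probability n_h / N_h.
    """
    unweighted = Fraction(n_male, n_male + n_female)
    w_male = Fraction(N_MALE, n_male)
    w_female = Fraction(N_FEMALE, n_female)
    weighted = (n_male * w_male) / (n_male * w_male + n_female * w_female)
    return unweighted, weighted


# Samples of 20: proportional, then increasingly male-heavy allocations.
for n_male, n_female in [(5, 15), (10, 10), (15, 5)]:
    unw, wtd = male_shares(n_male, n_female)
    gap = unw - wtd
    print(f"{n_male} males / {n_female} females: unweighted {float(unw):.0%}, "
          f"weighted {float(wtd):.0%}, gap {float(gap) * 100:.0f} points")
```

Under proportional allocation (5 males, 15 females) the two shares coincide. As males are oversampled more heavily, the unweighted share drifts further from the population value, while the weighted share stays at 25%.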
To illustrate, consider a population of 100 people made up of 25 males and 75 females; that is, males make up 25% of the population and females 75%. Suppose we draw a sample of 30 by selecting 15 males and 15 females at random (simple random sampling within each sex). Taken at face value, the sample suggests that males make up 50% of the population and females the remaining 50%. However, the probability of selection was 60% for males (15/25 × 100) but only 20% for females (15/75 × 100). Clearly, the sampling is disproportionate, and the analysis must account for this; otherwise, the results will be biased.
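The arithmetic of this example can be checked with a short Python sketch. It simply encodes the counts above (population sizes N_h and sample sizes n_h by sex) and compares each sex's selection probability, population share, and unweighted sample share. The helper names `selection_probabilities` and `shares` are hypothetical.

```python
from fractions import Fraction

# Counts from the example: population sizes N_h and sample sizes n_h by sex.
POPULATION = {"male": 25, "female": 75}
SAMPLE = {"male": 15, "female": 15}


def selection_probabilities(population, sample):
    """Selection probability within each group: n_h / N_h."""
    return {g: Fraction(sample[g], population[g]) for g in population}


def shares(counts):
    """Each group's share of the total count."""
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


probs = selection_probabilities(POPULATION, SAMPLE)
pop_shares = shares(POPULATION)
sample_shares = shares(SAMPLE)
for group in POPULATION:
    print(f"{group}: P(selected) = {float(probs[group]):.0%}, "
          f"population share = {float(pop_shares[group]):.0%}, "
          f"unweighted sample share = {float(sample_shares[group]):.0%}")
```

`Fraction` is used so that the probabilities (3/5 and 1/5) stay exact rather than picking up floating-point rounding.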
In this simplest scenario, the analysis weight is the inverse of each sex's selection probability. Accordingly, the (non-normalized) analysis weight is 1/0.6 ≈ 1.67 for males and 1/0.2 = 5 for females. If we declare the complex sample survey design (using svyset in Stata) and redo the analysis as a complex sample analysis, the weighted frequency of males is 25% and that of females is 75%. This is because the weighted counts are 15 × 5/3 = 25 males and 15 × 5 = 75 females, which reproduces the population distribution.
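The same weighting can be sketched at the record level, which is closer to how a weight variable is stored in a data set. The Python sketch below is an illustration under stated assumptions and does not reproduce Stata's `svyset`. It assumes one record per sampled person with only a sex variable. Each record is given the weight N_h / n_h, and the unweighted and weighted frequencies are then compared. The function names `attach_weights` and `frequencies` are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Population counts by sex, from the example.
POPULATION = {"male": 25, "female": 75}

# One record per sampled person (15 males, 15 females); only sex is kept.
sample_records = ["male"] * 15 + ["female"] * 15


def attach_weights(records, population):
    """Pair each record with weight N_h / n_h, the inverse of its selection probability."""
    n = Counter(records)
    return [(sex, Fraction(population[sex], n[sex])) for sex in records]


def frequencies(weighted_records, use_weights=True):
    """Share of each sex: sums weights, or counts records if use_weights is False."""
    totals = defaultdict(Fraction)
    for sex, w in weighted_records:
        totals[sex] += w if use_weights else 1
    grand = sum(totals.values())
    return {sex: t / grand for sex, t in totals.items()}


weighted_records = attach_weights(sample_records, POPULATION)
unweighted = frequencies(weighted_records, use_weights=False)
weighted = frequencies(weighted_records)
for sex in POPULATION:
    print(f"{sex}: unweighted {float(unweighted[sex]):.0%}, "
          f"weighted {float(weighted[sex]):.0%}")
```

With non-normalized weights, the weights sum to the population size (100 here). This sketch covers point estimates only. Standard errors under a complex design require design-based variance estimation, which is what declaring the design with svyset in Stata provides.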
Hence, it is not abnormal for weighted and unweighted frequencies to differ. When the sampling was disproportionate, a difference is exactly what one should expect from a properly computed weight variable.
Reference
- [1] Heeringa SG, West BT, Berglund PA. Applied Survey Data Analysis. Boca Raton, FL: Chapman & Hall/CRC; 2010.