In public health research, multivariable models
are often used to control confounders while investigating the effect of an
exposure on an outcome. To use a given model, certain assumptions must be
met. A model cannot be valid if its assumptions are fatally violated. Some
assumptions are checked before using the model. For example, one wouldn’t
choose a linear regression model for a nominal dependent variable because the
assumption of “a continuous dependent variable” is violated. Other assumptions
are checked post-hoc (for example, normality of residuals in linear regression
models). Assumptions are model-specific.
There is also a prerequisite that must be checked
for all models but that is seldom taught to students and rarely addressed
clearly in textbooks. A variable being controlled in a multivariable model should
be one that has been measured on all study units and hence for which all study
units have some meaningful value. A control variable should not be one that is
measured only on a sub-sample. For example, in a study among mothers, if there
is a variable about outcome of previous pregnancy, it applies only to mothers
who had at least one pregnancy in the past. It is not uncommon to find
researchers who enter such variables into multivariable models and end up with
models “behaving wildly”. This is because the variable has no value for the
study units to which it doesn’t apply, and all such units will be excluded
from the analysis. The analysis will then be limited to a sub-sample. As a
result, estimates will be biased, precision will be lost, as shown by overly
wide confidence intervals (sometimes the lower and upper bounds of the
confidence intervals cannot be estimated at all), and model goodness-of-fit
statistics cannot be determined.
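The exclusion described above can be illustrated with a small sketch in plain Python. The data, the variable names (age, parity, prev_outcome) and the helper complete_cases are all hypothetical. The helper only mimics the complete-case filtering that many statistical packages apply by default before fitting a model; it is not taken from any particular package.

```python
# Hypothetical study of six mothers. 'prev_outcome' (outcome of previous
# pregnancy) has no value (None) for mothers with no past pregnancy.
mothers = [
    {"id": 1, "age": 24, "parity": 0, "prev_outcome": None},
    {"id": 2, "age": 31, "parity": 2, "prev_outcome": "live birth"},
    {"id": 3, "age": 19, "parity": 0, "prev_outcome": None},
    {"id": 4, "age": 35, "parity": 3, "prev_outcome": "stillbirth"},
    {"id": 5, "age": 28, "parity": 1, "prev_outcome": "live birth"},
    {"id": 6, "age": 22, "parity": 0, "prev_outcome": None},
]


def complete_cases(records, variables):
    """Keep only records that have a value for every listed variable."""
    return [r for r in records if all(r.get(v) is not None for v in variables)]


# Variables the (hypothetical) multivariable model would use.
model_vars = ["age", "prev_outcome"]

analysed = complete_cases(mothers, model_vars)
print(f"{len(analysed)} of {len(mothers)} study units remain in the analysis")
```

In this toy data set, every mother with no previous pregnancy drops out, so the model would be fitted on a sub-sample even though the study was designed around all six mothers.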
A variable that applies only to a sub-sample can be
controlled for only if one is purposely doing the analysis on that sub-sample
for a sub-population inference. In that case, it should be well planned from the
outset, including ensuring the adequacy of the sample size for sub-population
inference.
In conclusion, while
working with multivariable models, it is necessary to make sure that all control
variables (also exposure variables, for that matter) apply to all study units
(unless one is doing an analysis of a sub-sample for sub-population inference).
Otherwise, results could lack both validity and precision.
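One way to make this check routine is to screen candidate variables before any model is fitted. The sketch below is only an illustration: the function name variables_not_applying_to_all, the data and the variable names are assumptions, and a value of None stands in for "does not apply or was not measured".

```python
def variables_not_applying_to_all(records, variables):
    """Return {variable: number of study units with no value} for every
    variable that lacks a value on at least one study unit."""
    gaps = {}
    for v in variables:
        n_missing = sum(1 for r in records if r.get(v) is None)
        if n_missing > 0:
            gaps[v] = n_missing
    return gaps


# Hypothetical study units and candidate control variables.
study = [
    {"age": 24, "residence": "urban", "prev_outcome": None},
    {"age": 31, "residence": "rural", "prev_outcome": "live birth"},
    {"age": 19, "residence": "urban", "prev_outcome": None},
    {"age": 35, "residence": "rural", "prev_outcome": "stillbirth"},
]
candidates = ["age", "residence", "prev_outcome"]

gaps = variables_not_applying_to_all(study, candidates)
for name, n in gaps.items():
    print(f"{name}: no value for {n} of {len(study)} study units")
```

Any variable flagged this way deserves a deliberate decision: leave it out of the model, or plan the analysis as a sub-sample analysis from the outset.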