When Data is Scarce: Navigating Multivariable Logistic Regression with Limited Events

As a medical doctor with a Master’s in Biostatistics, I’ve encountered a common problem in statistical analysis: dealing with limited events in multivariable logistic regression. In my current project, I’m working with a dataset of 1,000 cases but only 23 events, an event rate of 2.3%.

This raises questions about the reliability of my model and the predictors I’ve identified. Should I drop some predictors to avoid overfitting? How can I ensure the accuracy of my results?

The Problem with Limited Events

When the number of events is low, the model becomes fragile and prone to overfitting. This is especially true with stepwise selection, which tends to produce coefficients biased away from zero and confidence intervals that are too narrow. The conventional rule of thumb is to have at least 10 events per variable (EPV); with 23 events, that leaves room for only about two predictors. But what if that’s not possible?
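
The arithmetic behind the rule of thumb is simple enough to check before fitting anything. The snippet below is a minimal sketch: the function names and the five-parameter model are illustrative assumptions, and only the figure of 23 events comes from my data. Note that a categorical predictor with three levels counts as two parameters, not one.

```python
def max_parameters(n_events, epv_target=10):
    """Largest number of model parameters that keeps EPV at or above the target."""
    return n_events // epv_target


def events_per_variable(n_events, n_parameters):
    """Events per estimated parameter (the intercept is not counted)."""
    return n_events / n_parameters


n_events = 23  # figure from the post

# Parameters affordable under the 10-EPV rule of thumb.
print(max_parameters(n_events))  # 2

# EPV for a hypothetical model with 5 parameters (an assumption for illustration).
print(round(events_per_variable(n_events, 5), 1))  # 4.6
```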

Strategies for Dealing with Limited Events

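One approach is to use Firth’s penalized logistic regression, which reduces the small-sample bias of maximum likelihood estimates and still returns finite estimates when a predictor perfectly separates events from non-events. I tried this in my sensitivity analysis, and while it didn’t drastically change the results, it did provide a more robust estimate.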
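
To show what the Firth correction does mechanically, here is a minimal sketch in plain Python, not the code from my analysis. It iterates on the Firth-modified score, in which each observation's residual is adjusted by its leverage. The function names and the toy data (40 made-up cases in which all 4 exposed cases have the event) are assumptions for illustration; ordinary maximum likelihood has no finite slope estimate for such data. For real work, a dedicated implementation such as R's logistf package is the safer choice.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression.

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of 0/1 outcomes.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
             for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information: X' W X
        info = [[sum(wi * row[a] * row[b] for wi, row in zip(w, X))
                 for b in range(k)] for a in range(k)]
        inv = invert(info)
        # Leverage of each observation: h_i = w_i * x_i' (X'WX)^-1 x_i
        h = [wi * sum(row[a] * inv[a][b] * row[b]
                      for a in range(k) for b in range(k))
             for wi, row in zip(w, X)]
        # Firth-modified score: the residual is shifted by h_i * (0.5 - p_i)
        score = [sum((yi - pi + hi * (0.5 - pi)) * row[a]
                     for yi, pi, hi, row in zip(y, p, h, X))
                 for a in range(k)]
        step = [sum(inv[a][b] * score[b] for b in range(k)) for a in range(k)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Made-up toy data: 36 unexposed cases (3 events) and 4 exposed cases (4 events).
X = [[1.0, 0.0]] * 36 + [[1.0, 1.0]] * 4
y = [1] * 3 + [0] * 33 + [1] * 4

beta = firth_logistic(X, y)
print("intercept:", beta[0], "log odds ratio:", beta[1])
```

For a single binary predictor, the Firth estimate coincides with the log odds ratio obtained after adding 0.5 to every cell of the 2x2 table, which gives a convenient check on the sketch. It has no step-halving and no confidence intervals, both of which a production implementation would provide.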

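Another strategy is bootstrapping, but this can be problematic with limited events. In my case, bootstrapping gave me nonsensical estimates, especially for factor A, which is a known strong predictor. A likely explanation is that with so few events, some resamples contain too few events, or none at all at a given level of a predictor, for the model to be estimated stably.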
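
One way to see why the bootstrap struggles here is to look only at how many events each resample contains, before any model is fitted. The sketch below assumes nothing beyond the 1,000 cases and 23 events; the function name, the 2,000 resamples, the seed, and the 15-event cutoff are arbitrary choices for illustration. The event count per resample behaves roughly like a binomial draw with a standard deviation of about 4.7, so some resamples end up with far fewer events than the original data.

```python
import random

N_CASES, N_EVENTS = 1000, 23  # figures from the post
outcomes = [1] * N_EVENTS + [0] * (N_CASES - N_EVENTS)


def bootstrap_event_counts(outcomes, n_resamples=2000, seed=0):
    """Number of events in each bootstrap resample (cases drawn with replacement)."""
    rng = random.Random(seed)
    n = len(outcomes)
    return [sum(rng.choices(outcomes, k=n)) for _ in range(n_resamples)]


counts = bootstrap_event_counts(outcomes)
print("fewest events in a resample:", min(counts))
print("most events in a resample:  ", max(counts))
print("share of resamples with under 15 events:",
      sum(c < 15 for c in counts) / len(counts))
```

A resample with a dozen events supports even fewer predictors than the original 23 do, and a strong predictor can easily end up with no events at one of its levels, which is one plausible route to the extreme estimates I saw.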

Addressing the Conundrum

So, what can you do when faced with limited events in multivariable logistic regression? Here are a few suggestions:

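  • Use alternative models: Consider penalized or Bayesian logistic regression with weakly informative priors, which shrink unstable coefficients toward zero. Generalized additive models are sometimes suggested as well, but their added flexibility generally demands more events, not fewer.
  • Increase the sample size: If possible, collect more data. What matters for overfitting is the number of events, not the total number of cases.
  • Use data augmentation: Data augmentation techniques, such as synthetic data generation, are sometimes used to increase the sample size. Synthetic cases are derived from the observed data and add no new information about the outcome, so treat any apparent gain in robustness with caution.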
  • Consult with experts: Don’t be afraid to consult with statistical experts or collaborate with colleagues to get additional insights and perspectives.
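
To make the Bayesian suggestion in the list above a little more concrete, here is a minimal sketch of the simplest version: the posterior mode under independent Normal(0, prior_sd²) priors on the slopes, found by plain gradient ascent. This is only a point estimate, not a full posterior, and the function name, the prior standard deviations, the step size, and the made-up toy data are all assumptions for illustration. Real predictors would also need to be put on comparable scales before sharing one prior.

```python
import math


def map_logistic(X, y, prior_sd=2.5, step=0.5, n_iter=5000):
    """Posterior-mode logistic regression with Normal(0, prior_sd^2) priors
    on the slopes and a flat prior on the intercept (column 0 of X)."""
    k, n = len(X[0]), len(y)
    beta = [0.0] * k
    for _ in range(n_iter):
        # Gradient of the log-likelihood
        grad = [0.0] * k
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            for j in range(k):
                grad[j] += (yi - p) * row[j]
        # Gradient of the log-prior: pulls each slope toward zero
        for j in range(1, k):
            grad[j] -= beta[j] / prior_sd ** 2
        beta = [b + step * g / n for b, g in zip(beta, grad)]
    return beta


# Made-up toy data: 36 unexposed cases (3 events) and 4 exposed cases (4 events).
X = [[1.0, 0.0]] * 36 + [[1.0, 1.0]] * 4
y = [1] * 3 + [0] * 33 + [1] * 4

wide = map_logistic(X, y, prior_sd=2.5)
tight = map_logistic(X, y, prior_sd=1.0)
print("slope with prior SD 2.5:", wide[1])
print("slope with prior SD 1.0:", tight[1])
```

In this toy example the unpenalized slope is infinite because every exposed case has the event; the prior keeps it finite, and a tighter prior shrinks it further. The choice of prior scale therefore matters and should be justified from subject knowledge rather than tuned to the result.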

Final Thoughts

Dealing with limited events in multivariable logistic regression requires careful consideration and strategic planning. Keeping the number of predictors in proportion to the number of events, using penalized or Bayesian models, and collecting more events where possible can improve the robustness of your results and increase confidence in your predictions.

*Further reading: Multivariable Logistic Regression: A Review*
