Have you ever come across a research paper that left you wondering how the authors handled collinearity issues in their model? I recently stumbled upon a paper that caught my attention, and I’d love to dive into the details.
The paper mentioned that the crowding factors they included in their models had a modest effect on waiting room time and boarding time after controlling for time of day and day of week. But here’s the thing: crowding presumably rises and falls with the clock and the calendar, so wouldn’t adding those temporal confounders to the model introduce multicollinearity?
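In general, when we control for the effects of other variables, we’re trying to isolate the relationship between our variables of interest. However, if those control variables are highly correlated with each other, or with the predictors we actually care about, we get multicollinearity. The model can still fit and predict well, but the individual coefficient estimates become unstable and their standard errors inflate, which makes it hard to say how much of an effect belongs to which variable.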
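A common way to check for this is the variance inflation factor (VIF), which for each predictor equals 1 / (1 - R²) from regressing it on the other predictors. Here’s a minimal sketch using only NumPy. The data are simulated, and the variable names (`hour`, `weekend`, `occupancy`) are my own stand-ins, not anything taken from the paper.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations).

    Uses the identity that the VIFs are the diagonal of the inverse
    of the predictors' correlation matrix.
    """
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated data, purely for illustration (not from the paper).
rng = np.random.default_rng(0)
n = 500
hour = rng.uniform(0, 24, n)                    # hypothetical time-of-day control
weekend = rng.integers(0, 2, n).astype(float)   # hypothetical day-of-week control
occupancy = 0.8 * hour + rng.normal(0, 3, n)    # hypothetical crowding measure that tracks the hour

X = np.column_stack([hour, weekend, occupancy])
vifs = variance_inflation_factors(X)

for name, vif in zip(["hour", "weekend", "occupancy"], vifs):
    print(f"{name}: VIF = {vif:.2f}")
```

A VIF near 1 means a predictor is essentially unrelated to the others. Rules of thumb often flag values above 5 or 10, but those cutoffs are conventions rather than hard limits.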
So, how do we handle this issue? One approach is to use dimensionality reduction techniques, such as principal component analysis (PCA), which replace a set of correlated predictors with a smaller set of uncorrelated components, at some cost to interpretability. Another approach is to use regularization, such as L1 (lasso) or L2 (ridge) penalties, which shrink the coefficients and stabilize the estimates when predictors overlap.
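To make the regularization idea concrete, here’s a rough sketch of L2 (ridge) regression in closed form with NumPy. Everything in it is an assumption for illustration: the two predictors are simulated to be nearly identical, and the penalty `lam=10.0` is an arbitrary choice, not a recommendation.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form L2-penalized least squares on standardized predictors.

    lam=0 gives ordinary least squares; larger lam shrinks the coefficients.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(p), Xs.T @ yc)

# Simulated data, purely for illustration.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(size=n)

beta_ols = ridge_coefficients(X, y, lam=0.0)
beta_ridge = ridge_coefficients(X, y, lam=10.0)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

With two near-duplicate predictors, the unpenalized fit is free to split the effect between them almost arbitrarily, while the ridge penalty pulls the two coefficients toward each other. In practice you’d pick the penalty by cross-validation rather than by hand.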
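In the case of the paper I mentioned, they used quantile regression, which models a quantile of the outcome (such as the median) rather than the mean, so it’s more robust to outliers in the response and doesn’t assume normally distributed errors. That choice doesn’t do anything about multicollinearity, though. Collinearity is a property of the predictors, not of the estimator, so it’s still important to check for it and take steps to address it.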
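If the robustness claim feels abstract, this small sketch may help. Quantile regression minimizes the so-called pinball (check) loss. In the simplest case, with no predictors at all, minimizing it at q = 0.5 recovers the sample median. The waiting times below are made up, and `best_constant` is a helper I wrote for this example, not part of any library.

```python
import numpy as np

def pinball_loss(y, prediction, q):
    """Average pinball (check) loss of a constant prediction at quantile q."""
    residual = y - prediction
    return np.mean(np.maximum(q * residual, (q - 1) * residual))

def best_constant(y, q):
    """Try every observed value as the prediction; keep the lowest-loss one."""
    losses = [pinball_loss(y, c, q) for c in y]
    return y[int(np.argmin(losses))]

# Made-up waiting times in minutes, purely for illustration.
rng = np.random.default_rng(2)
waits = rng.exponential(scale=30.0, size=101)
waits_with_outlier = np.append(waits[:-1], 10_000.0)  # swap one value for an extreme one

median_fit = best_constant(waits, 0.5)
median_fit_outlier = best_constant(waits_with_outlier, 0.5)

mean_shift = abs(waits_with_outlier.mean() - waits.mean())
median_shift = abs(median_fit_outlier - median_fit)

print(f"Shift in mean:   {mean_shift:.2f}")
print(f"Shift in median: {median_shift:.2f}")
```

A single extreme value drags the mean a long way but barely moves the median fit. Notice that nothing here involves relationships among predictors, which is why the robustness doesn’t carry over to collinearity.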
What are your thoughts on handling collinearity issues in research papers? Do you have any favorite techniques or approaches that you’d like to share?