Navigating the Challenges of Analyzing Aggregated Data Across Years

Have you ever struggled with analyzing aggregated data across multiple years? I know I have. Recently, I came across a Reddit post that sparked my interest in this very topic. The author was trying to find the best approach to a seemingly simple research problem: they had 15 years of data, with a count of admitted patients showing certain symptoms for each year. The counts ranged from roughly 40 to 100, and a plot showed a slight U-shaped relationship between year and count.

The author fitted a negative binomial model to the count data to account for overdispersion. They also included a quadratic term for year, which improved the model fit: the quadratic term was statistically significant and positive, while the linear term was not, although it was close. To account for autocorrelation, they also tried glmmTMB, but the resulting models were virtually identical.
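To make that concrete, here is a minimal sketch of that kind of fit in R using MASS::glm.nb. The data frame yearly, its columns year and count, and the values themselves are hypothetical stand-ins that only mimic the rough shape described in the post (counts between roughly 40 and 100 with a slight U-shape), not the poster's actual data.

```r
library(MASS)   # glm.nb()

# Hypothetical stand-in data: 15 yearly counts with a slight U-shape,
# roughly in the 40-100 range described in the post
yearly <- data.frame(
  year  = 2008:2022,
  count = c(98, 84, 73, 62, 55, 49, 45, 42, 44, 49, 57, 66, 76, 88, 100)
)

# Centre year so the linear and quadratic terms are less collinear
yearly$year_c <- yearly$year - mean(yearly$year)

# Negative binomial regression with linear and quadratic year terms
fit_nb <- glm.nb(count ~ year_c + I(year_c^2), data = yearly)
summary(fit_nb)
```

Centring the year is just a convenience: the fitted curve is the same, but the linear and quadratic coefficients become easier to read.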

The question was: can you trust the results of a negative binomial regression with only 15 observations and so few degrees of freedom? Is it worth modeling at all, or is it better to just show the plot? Are there other models better suited to this scenario?
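One way to probe the "can you trust it with 15 points" question, rather than argue about it in the abstract, is a small parametric simulation: generate counts from a model with no quadratic trend and see how often the quadratic term nevertheless comes out significant. This is not something the original poster did, just a sketch of the idea, reusing the hypothetical yearly data and centred year from the example above.

```r
# Simulate from a null model (linear trend only) and check how often a
# spurious quadratic term reaches p < 0.05 with just 15 observations
set.seed(1)

fit_null <- glm.nb(count ~ year_c, data = yearly)
n_sim  <- 1000
p_quad <- numeric(n_sim)

for (i in seq_len(n_sim)) {
  sim <- yearly
  sim$count <- rnbinom(nrow(sim), mu = fitted(fit_null), size = fit_null$theta)
  refit <- suppressWarnings(glm.nb(count ~ year_c + I(year_c^2), data = sim))
  p_quad[i] <- summary(refit)$coefficients["I(year_c^2)", "Pr(>|z|)"]
}

# A proportion close to 0.05 suggests the quadratic test is roughly
# calibrated despite the small sample
mean(p_quad < 0.05)
```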

As I delved deeper into the problem, I realized how essential it is to consider the limitations of both the data and the models we use. With only 15 observations, it's crucial to be cautious when interpreting coefficients and p-values. It may also be worth exploring other models, such as generalized linear mixed models that can account for both autocorrelation and overdispersion.
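For the autocorrelation-plus-overdispersion angle, glmmTMB (which the poster already tried) can combine a negative binomial response with an AR(1) covariance structure over years. A sketch under the same hypothetical-data assumptions as above, using a single dummy grouping factor since there is only one series:

```r
library(glmmTMB)

# One series of yearly counts, so a single dummy group carries the AR(1) structure
yearly$series <- factor(1)
yearly$year_f <- factor(yearly$year)   # time index must be a factor for ar1()

fit_ar1 <- glmmTMB(
  count ~ year_c + I(year_c^2) + ar1(year_f + 0 | series),
  family = nbinom2,
  data   = yearly
)
summary(fit_ar1)
```

With only 15 points and a single series, this model is asking a lot of the data, so convergence warnings or an AR(1) parameter estimated near a boundary would not be surprising, which is consistent with the poster's finding that the glmmTMB fits were virtually identical to the simpler model.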

What do you think? Have you faced similar challenges in your research? How do you handle aggregated data across multiple years? Share your experiences and insights in the comments below.
