Hey there! I’m sure many of you have gone through the same struggle as I did when trying to connect the dots between statistical concepts and their real-world applications in machine learning. I recently delved deep into the world of statistics, covering topics like confidence intervals, uniform and normal distributions, and hypothesis testing. While these concepts are fascinating, I couldn’t help but wonder: what’s the actual practical use of these concepts in machine learning?
For instance, let’s say I have a numeric price feature that is skewed rather than normally distributed. The central limit theorem (CLT) tells me that if I repeatedly sample from this feature and average each sample, the distribution of those sample means will be approximately normal. But what’s the point of doing so? Those sample means aren’t my original values, and if I instead transformed the feature itself to look more normal (say, with a log transform), I’d be changing the actual values in the dataset and altering the input feature. So, what’s the real-world application of the normal distribution in machine learning?
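To make that concrete, here’s a minimal sketch of what the CLT actually describes. The “price” feature here is synthetic (a log-normal draw standing in for real data, with sample sizes and parameters chosen just for illustration): the raw prices stay skewed, while the means of repeated samples drawn from them come out roughly normal.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical skewed "price" feature: log-normal, so heavily right-skewed
prices = rng.lognormal(mean=3.0, sigma=0.8, size=10_000)

# What the CLT describes: the means of repeated samples (not the raw prices)
# tend toward a normal distribution as the sample size grows
sample_means = np.array([
    rng.choice(prices, size=50, replace=False).mean()
    for _ in range(2_000)
])

print(f"raw prices:   mean={prices.mean():.2f}  (still skewed, long right tail)")
print(f"sample means: mean={sample_means.mean():.2f}, std={sample_means.std():.2f}  (roughly bell-shaped)")
```

Nothing about the original feature changes here, which is exactly why I kept asking what the theorem buys me in practice.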
The same question applies to confidence intervals: how do we actually use them in machine learning? Are they just theoretical tools, or can they drive meaningful insights and better decision-making?
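As a preview of the kind of use case I’ll dig into, here’s one hedged sketch (synthetic data and scikit-learn’s LogisticRegression, with all numbers chosen purely for illustration) of a confidence interval showing up in everyday ML work: bootstrapping a test-set accuracy to see how much the reported number could move on a different test sample.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a real classification dataset
X, y = make_classification(n_samples=2_000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
correct = (model.predict(X_test) == y_test).astype(float)

# Bootstrap the test-set accuracy for a rough 95% confidence interval
boot_accs = [
    correct[rng.integers(0, len(correct), len(correct))].mean()
    for _ in range(2_000)
]
low, high = np.percentile(boot_accs, [2.5, 97.5])
print(f"accuracy = {correct.mean():.3f}, 95% CI ~ [{low:.3f}, {high:.3f}]")
```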
In this post, I’ll explore the practical use cases of distribution analysis in machine learning, and how we can move beyond the theoretical aspects of statistics to unlock real value from our data.