I recently completed my first data analysis project, and I’m excited to share my experience. As a beginner, I was thrilled to build a dashboard from scratch. The data came from Kaggle, and I was relieved to find that it was already sorted, with no null, blank, or duplicate values.
But I have to admit, I didn’t do any data cleaning in Microsoft SQL Server. I know, I know, it’s a crucial step in any data analysis project. I justified it to myself by thinking that the data was already clean, so why bother, right?
Looking back, I realize that was a rookie mistake. Even if the data appears clean, it’s still worth running some basic cleaning and validation checks, such as confirming data types, value ranges, and consistent category labels, before trusting it in a dashboard.
In this post, I’ll share what I learned from my experience and why data cleaning is essential, even when working with seemingly clean data.