Unsupervised Machine Learning for Data Cleaning: A Promising Approach?

Hey there, fellow data enthusiasts! I’m currently working on a large dataset that mixes labeled and unlabeled data. The goal is to distinguish between two distinct groups, but the dataset is noisy: some of the information is relevant and some is not. To tackle this, I’ve been thinking of applying unsupervised machine learning techniques to clean my data.

My idea is to use K-means clustering with k = 2 to split the data into two main clusters. That should let me roughly filter out redundant or irrelevant records and keep only the group I’m interested in. The approach seems to make sense, but I’d love to hear your thoughts on whether it actually works in practice.
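To make the idea concrete, here is a minimal sketch of what I have in mind, using scikit-learn. Everything about the data is an assumption for illustration: the two synthetic blobs stand in for my real dataset, and `labeled_idx` is a hypothetical handful of rows I already know belong to the group I want. The labeled rows are only used to decide which of the two clusters to keep.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption, for illustration only):
# two well-separated groups of 200 rows with 2 features each.
rng = np.random.default_rng(0)
group_of_interest = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
other_records = rng.normal(loc=8.0, scale=1.0, size=(200, 2))
X = np.vstack([group_of_interest, other_records])

# Hypothetical: indices of a few rows already labeled as the group of interest.
labeled_idx = np.arange(10)

# K-means is distance-based, so put the features on a comparable scale first.
X_scaled = StandardScaler().fit_transform(X)

# Split the data into two clusters.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X_scaled)

# Keep the cluster that most of the labeled rows landed in.
target_cluster = np.bincount(cluster_ids[labeled_idx], minlength=2).argmax()
keep_mask = cluster_ids == target_cluster
X_clean = X[keep_mask]

print(f"kept {keep_mask.sum()} of {len(X)} rows")
```

One thing this sketch glosses over: K-means will always hand back two clusters, whether or not they line up with "relevant" versus "irrelevant". On real data it is probably worth checking where the labeled rows fall before trusting the split.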

Have you ever used unsupervised ML for data cleaning? What were your results? Any tips or advice would be greatly appreciated!
