Don't Forget the Basics: Working with Small Data in Python

I recently had a humbling experience during a live coding assessment. I completely bombed it because I had forgotten how to work with small datasets using pure Python code. I’m used to working with large datasets and relying on libraries like Polars to do the heavy lifting. But in this assessment, only the default Python packages were available, and I was caught off guard.

I struggled to remember how to do basic transformations with dictionaries, try/except blocks, and for loops. It was a painful reminder that we can lose the basics in our pursuit of more advanced tools and techniques.
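To make that concrete, here is a minimal sketch of the kind of transformation I mean: summing values per key with a plain dictionary, a for loop, and a try/except to skip bad rows. The data, the field names, and the `total_by_city` helper are made up for illustration; this is not the actual assessment task.

```python
# Hypothetical input records; "city" and "amount" are illustrative field names.
rows = [
    {"city": "Oslo", "amount": "12.5"},
    {"city": "Lima", "amount": "n/a"},  # malformed value, should be skipped
    {"city": "Oslo", "amount": "7.5"},
    {"city": "Lima", "amount": "3"},
]


def total_by_city(records):
    """Sum amounts per city, counting rows that cannot be parsed."""
    totals = {}
    skipped = 0
    for record in records:
        try:
            amount = float(record["amount"])
        except (KeyError, ValueError):
            # Missing key or non-numeric text: count it and move on.
            skipped += 1
            continue
        city = record.get("city", "unknown")
        # dict.get with a default avoids a separate "is the key there?" check.
        totals[city] = totals.get(city, 0.0) + amount
    return totals, skipped


totals, skipped = total_by_city(rows)
print(totals, skipped)
```

Nothing here needs an import, which is exactly the situation I was in during the assessment.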

However, I did learn something valuable from this experience. When working with small datasets, the Python standard library can be incredibly performant. In fact, I found that using defaultdict was about 100 times faster than using Polars for a small dataset. This makes sense, likely because a library built for large data carries fixed overhead that a handful of rows never pays back, but it’s easy to forget how efficient the standard library can be when we’re used to working with big data.
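For reference, this is roughly what a defaultdict-based group-by looks like. It is a sketch under my own assumptions: the `orders` data and the `summarize` helper are hypothetical, and it is not the exact code I timed.

```python
from collections import defaultdict

# Hypothetical (category, item, quantity) records for illustration.
orders = [
    ("fruit", "apple", 3),
    ("veg", "carrot", 5),
    ("fruit", "pear", 2),
]


def summarize(records):
    """Group item names and sum quantities per category."""
    items = defaultdict(list)      # missing keys start as an empty list
    quantities = defaultdict(int)  # missing keys start at 0
    for category, name, qty in records:
        items[category].append(name)
        quantities[category] += qty
    # Convert back to plain dicts so later lookups of unknown keys raise KeyError.
    return dict(items), dict(quantities)


items_by_category, qty_by_category = summarize(orders)
print(items_by_category)
print(qty_by_category)
```

If you want to compare approaches on your own data, the standard library's `timeit` module is a reasonable way to measure it; the speedup you see will depend on the dataset and the operation.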

So, don’t forget to practice working with small data in pure Python. It may not be glamorous, but it’s an essential skill to have in your toolkit.
