Storing Data in S3 with DuckDB: A Good Alternative to Postgres?

Hey, fellow data enthusiasts! I came across an interesting question on Reddit that got me thinking about data storage and querying. The original poster was wondering if using S3 with DuckDB would be a bad idea instead of relying on Postgres. I’ll dive into the details and share my thoughts.

The scenario involves a web app where users upload data, and the app generates a summary table with 100k rows and 20 columns, displayed 10 rows at a time. Initially, the plan was to store the table in Postgres, but then the idea struck to store it as a Parquet file in S3 and use DuckDB to read only the required subsets. To the poster, this felt more intuitive than running a full database for what is essentially one lightweight table.

So, is this a reasonable approach, or are we missing something obvious? Let’s break it down.

The context is important here: the table values change based on user input, usually through whole-column replacements. Of the 20 columns, 15 are fixed, and the remaining columns (about 5) vary in number. Plus, this is an MVP with low traffic.

Using S3 with DuckDB can be a good alternative to Postgres in this scenario. With the Parquet file in S3, this table no longer needs a database server of its own, and DuckDB, which runs in-process inside the app, can query just the rows and columns each page needs. For an MVP with low traffic, that keeps the setup small and simple.

One potential benefit of this approach is that you can scale your storage and querying separately. With S3, you can store large amounts of data without worrying about database storage limits. And because DuckDB runs inside the app process, query capacity grows with the compute you give the app rather than with a database server.

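However, it’s essential to consider the trade-offs. Parquet files are immutable, so replacing a column means writing a new file rather than updating rows in place. S3 also gives you none of the transactions or constraints that Postgres provides, so you might need to implement additional logic to handle data consistency and integrity, for example when two updates arrive at the same time. And if the data grows well beyond 100k rows, you’ll need to ensure that it is properly partitioned and laid out for querying with DuckDB.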

In conclusion, using S3 with DuckDB can be a reasonable approach, especially for an MVP with low traffic. It’s essential to weigh the pros and cons and consider your specific use case before making a decision.
