Hey, what’s the best way to store data in Snowflake? I recently came across a data pipeline that serves multiple projects, mainly dealing with financial data, and I’m curious about the benefits of using Iceberg tables over traditional Snowflake tables.
The current setup has two ingestion paths: real-time ingestion through Kafka events and batch ingestion through files in S3. In both cases, the data lands in traditional Snowflake tables before being consumed by the end user or customer. Transformations run inside Snowflake, either in the trusted schema or directly on top of the raw schema tables.
Some architects are suggesting switching to Iceberg tables, an open table format. But I’m not entirely sure where Iceberg tables fit in this picture. What are the benefits of using Iceberg tables over traditional Snowflake tables? Are there any downsides to consider, especially when it comes to performance or data transformation?
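To make the comparison concrete, here is a rough sketch of how the two table definitions would differ, as far as I can tell from the Snowflake docs. It only builds the DDL strings in Python and does not connect to anything. The table name, columns, external volume, and base location are all hypothetical placeholders, not our real objects, and I’m assuming a Snowflake-managed Iceberg table (`CATALOG = 'SNOWFLAKE'`) rather than an externally managed catalog.

```python
# Hypothetical names throughout: TRADES_RAW, the column list, fin_ext_vol,
# and the base location are placeholders, not taken from the actual pipeline.

COLUMNS = "trade_id INT, symbol STRING, price NUMBER(18, 4)"


def native_table_ddl(name: str, columns: str) -> str:
    """Build DDL for a traditional Snowflake table (storage managed by Snowflake)."""
    return f"CREATE TABLE {name} ({columns});"


def iceberg_table_ddl(
    name: str, columns: str, external_volume: str, base_location: str
) -> str:
    """Build DDL for a Snowflake-managed Iceberg table.

    The data and metadata files live in our own cloud storage, which is
    referenced through an external volume that must already exist.
    """
    return (
        f"CREATE ICEBERG TABLE {name} ({columns})\n"
        f"  CATALOG = 'SNOWFLAKE'\n"
        f"  EXTERNAL_VOLUME = '{external_volume}'\n"
        f"  BASE_LOCATION = '{base_location}';"
    )


if __name__ == "__main__":
    print(native_table_ddl("TRADES_RAW", COLUMNS))
    print(iceberg_table_ddl("TRADES_RAW", COLUMNS, "fin_ext_vol", "raw/trades/"))
```

The practical difference this shows is where the files end up: the Iceberg version needs an external volume pointing at storage we own, while the traditional table keeps everything inside Snowflake.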
One advantage of traditional Snowflake tables is that their storage is highly compressed and therefore relatively cheap. So what additional benefits would we get by storing the data in Iceberg tables instead? I’d love to hear your thoughts and suggestions on this.
It would help to break down the use cases each option is suited to, along with its pros and cons. Should we stick with traditional Snowflake tables or make the switch to Iceberg tables?