Have you ever wanted to spin up your own open data lakehouse locally using open-source tools? I recently put together a hands-on walkthrough to show you how to do just that using Presto and Iceberg.
## The Goal: Keep it Simple and Reproducible
I wanted a setup that is simple, reproducible, and easy to test, so you can follow along and end up with a working lakehouse on your own machine.
## The Tech Stack
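The stack is fully open source: Presto as the query engine, Iceberg as the table format, MinIO for S3-compatible object storage, and OLake to move data from the source database into Iceberg. Together they give you a flexible lakehouse that you can scale up later.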
## The Process
The guide takes you through a step-by-step process of setting up your own open data lakehouse. From running containers to configuring the environment and querying Iceberg tables with Presto, I’ve got you covered.
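
To give a feel for the last step, here is a minimal sketch of querying Presto from Python over its REST interface (`POST /v1/statement`, then following `nextUri` until the results are complete). It is not taken from the guide: the coordinator address `localhost:8080`, the user `demo`, the catalog name `iceberg`, and the schema `default` are all assumptions, so swap in whatever your own setup uses.

```python
import json
import urllib.request

# Assumption: the Presto coordinator from your local setup listens here.
PRESTO_URL = "http://localhost:8080"


def build_statement_request(sql, user="demo", catalog="iceberg",
                            schema="default", base_url=PRESTO_URL):
    """Build the POST /v1/statement request used to submit a query.

    The user, catalog, and schema defaults are placeholders, not values
    from the guide.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/statement",
        data=sql.encode("utf-8"),
        headers={
            "X-Presto-User": user,
            "X-Presto-Catalog": catalog,
            "X-Presto-Schema": schema,
        },
        method="POST",
    )


def run_query(sql, user="demo", **kwargs):
    """Submit a query and follow nextUri links until all rows are collected."""
    rows = []
    request = build_statement_request(sql, user=user, **kwargs)
    with urllib.request.urlopen(request) as resp:
        page = json.load(resp)
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        if next_uri is None:
            return rows
        follow_up = urllib.request.Request(
            next_uri, headers={"X-Presto-User": user}
        )
        with urllib.request.urlopen(follow_up) as resp:
            page = json.load(resp)
```

With the containers running, something like `run_query("SHOW SCHEMAS")` should return the schemas visible in the assumed `iceberg` catalog. For anything beyond a quick check, a proper client library is probably the better choice.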
## What I Learned
One thing that stood out during the setup was how fast and cheap it was. I used a small dataset for the demo, so treat it as a starting point: load a larger dataset and run your own benchmarks to see how the stack performs under realistic conditions.
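
If you want to put numbers on that, a tiny timing harness is enough to get started. The sketch below is my own illustration rather than anything from the guide: `benchmark` is a hypothetical helper that times any zero-argument callable (for example, a function that submits one query to Presto), runs a warm-up pass first, and reports the minimum, median, and maximum wall-clock time.

```python
import statistics
import time


def benchmark(run, repeats=5, warmup=1):
    """Time a zero-argument callable and return summary stats in seconds.

    `run` is whatever you want to measure, e.g. a function that sends one
    query to Presto and waits for the full result.
    """
    # Warm-up runs are executed but not timed, so caches and connections
    # are already in place when measurement starts.
    for _ in range(warmup):
        run()

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)

    return {
        "runs": repeats,
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

The median is usually a steadier number to compare than the mean, since a single slow run on a laptop can skew an average quite a bit.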
## Flexibility is Key
The guide uses MySQL as the starting point, but you can easily plug in Postgres or other sources. This flexibility is what makes open-source tools so powerful.
## Take the Next Step
If you’ve been trying to build a lakehouse stack yourself, this guide is a good starting point. Check out the blog and let me know whether you’d like a detailed follow-up series comparing different query engines, or whether I should share my benchmarks in a later thread.
If you have Presto/Iceberg benchmarks of your own, please share them too.
**Read the full guide and watch the video walkthrough here:** [link](https://olake.io/blog/building-open-data-lakehouse-with-olake-presto)