Building a Database from an API with No Order Tracking Status: A Real-Time Conundrum

As a data engineer, I’ve faced a familiar problem: building a database from an API that lacks order tracking status. This missing piece can make it challenging to create a reliable and up-to-date database, especially when real-time data is required. So, how do you overcome this hurdle?

Let’s consider a scenario where you need to generate reports on the 1st day of the month, using data from the previous month and year. The API provides data like item names, revenue, quantity, and transaction IDs, but no order status tracking. How do you build a database that caters to both historical and current data?
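
To make the reporting requirement concrete, here is a minimal sketch of the date windows a report run on the 1st might need. It assumes "previous month and year" means the previous calendar month and the previous calendar year; the function names are hypothetical and not part of any API.

```python
from datetime import date, timedelta


def previous_month_range(run_date: date) -> tuple[date, date]:
    """Return the first and last day of the month before run_date."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous


def previous_year_range(run_date: date) -> tuple[date, date]:
    """Return Jan 1 and Dec 31 of the calendar year before run_date (assumed meaning)."""
    year = run_date.year - 1
    return date(year, 1, 1), date(year, 12, 31)


if __name__ == "__main__":
    run = date(2024, 3, 1)
    print(previous_month_range(run))
    print(previous_year_range(run))
```

If your reports compare against the same month one year earlier instead, only the second function would need to change.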

## The Challenge
The main issue is that the API gives no order status, so you can't tell in real time whether a transaction it returns is final. That makes it difficult to keep your reports accurate and consistent. You need a system that tolerates the missing status while still producing reliable data.

## Two Possible Approaches
There are two possible ways to tackle this problem:

### Approach A: ETL/ELT with Duplicate Checking
This approach runs the ETL/ELT process daily with the date argument set to the current date, then applies separate logic to identify and remove duplicates. It can work, but it may lead to data inconsistencies, and the extra deduplication step takes additional resources to run and maintain.
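
One way to handle the duplicate checking in Approach A is to key the table on the transaction ID and let the database overwrite repeats. The sketch below uses SQLite from the Python standard library purely for illustration; the table layout and the `order_date` column are assumptions, since the post only mentions item names, revenue, quantity, and transaction IDs.

```python
import sqlite3


def create_orders_table(conn: sqlite3.Connection) -> None:
    """Create a table keyed on transaction_id (schema is an assumption)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS orders (
            transaction_id TEXT PRIMARY KEY,
            item_name      TEXT NOT NULL,
            quantity       INTEGER NOT NULL,
            revenue        REAL NOT NULL,
            order_date     TEXT NOT NULL
        )
        """
    )


def upsert_orders(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Insert rows; a row with an existing transaction_id replaces the old one."""
    conn.executemany(
        """
        INSERT OR REPLACE INTO orders
            (transaction_id, item_name, quantity, revenue, order_date)
        VALUES
            (:transaction_id, :item_name, :quantity, :revenue, :order_date)
        """,
        rows,
    )
    conn.commit()
```

With this in place, re-running the daily load for the same date does not create duplicate rows. It still cannot tell you whether the latest copy of a transaction is the final one, which is the inconsistency risk noted above.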

### Approach B: Delayed ETL/ELT Orchestration
The alternative is to build a delay into the ETL/ELT orchestration: each run calls the API with a date argument 2-3 days in the past, then loads the result into the database. This approach seems more reliable, as it lets you retrieve the previous month’s data via API call and the previous year’s data from your existing database.
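
The delay in Approach B comes down to shifting the date argument before each API call. Here is a minimal sketch, assuming a 3-day delay and an API that accepts an ISO-formatted date; `DELAY_DAYS`, `extraction_date`, and `api_date_param` are hypothetical names, not part of any real API.

```python
from datetime import date, timedelta

# Assumed delay; the post suggests 2-3 days.
DELAY_DAYS = 3


def extraction_date(run_date: date, delay_days: int = DELAY_DAYS) -> date:
    """Return the business date to request when the job runs on run_date."""
    return run_date - timedelta(days=delay_days)


def api_date_param(run_date: date, delay_days: int = DELAY_DAYS) -> str:
    """Format the delayed date as an ISO string (assumed API date format)."""
    return extraction_date(run_date, delay_days).isoformat()


if __name__ == "__main__":
    print(api_date_param(date(2024, 3, 1)))
```

In an orchestrator, the run date would typically come from the scheduler rather than being hard-coded, so backfills and reruns request the same delayed date each time.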

## Industry Standard?
There isn’t a one-size-fits-all solution, but Approach B appears to be the safer and more reliable method. The delay gives recent records time to settle before they are loaded, which should make the data in your database more accurate and consistent. It also lets you serve historical data from your existing database instead of re-fetching it, reducing the risk of inconsistencies.

## Conclusion
Building a database from an API with no order tracking status requires careful consideration of the challenges and limitations. By choosing the right approach, you can create a reliable and efficient database that meets your reporting needs. Whether you opt for Approach A or B, it’s essential to weigh the pros and cons and tailor your solution to your specific use case.

*Further reading: Data Engineering Best Practices*
