When it comes to building efficient pipelines in Azure Data Factory, scalability is key. I recently came across a Reddit post from someone looking for advice on configuring a pipeline to copy data from one database to another. They had a working pipeline with a single Copy Data activity, but wanted to add two more tables without sacrificing scalability.
The original poster was considering creating a separate pipeline for each table or adding multiple Copy Data activities to the existing pipeline, but wanted to know if there was a better way to approach the problem.
One potential solution is to store the copy inputs (source and destination table names) in a single place, read them at runtime with a Lookup activity, and iterate over them with a ForEach activity that drives one parameterized Copy Data activity. Adding a new table then means adding a row or array entry rather than editing the pipeline, which is what allows the design to scale.
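Here is a minimal sketch of that pattern as ADF pipeline JSON. Everything named here is a placeholder: the control table dbo.CopyControl, the datasets ControlTableDataset, SourceTableDataset and SinkTableDataset, and the assumption that the source and sink datasets expose schemaName and tableName parameters.

```json
{
  "name": "CopyTablesPipeline",
  "properties": {
    "activities": [
      {
        "name": "GetTableList",
        "type": "Lookup",
        "typeProperties": {
          "source": {
            "type": "AzureSqlSource",
            "sqlReaderQuery": "SELECT SourceSchema, SourceTable, SinkSchema, SinkTable FROM dbo.CopyControl WHERE Enabled = 1"
          },
          "dataset": { "referenceName": "ControlTableDataset", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachTable",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetTableList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('GetTableList').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "CopyOneTable",
              "type": "Copy",
              "typeProperties": {
                "source": { "type": "AzureSqlSource" },
                "sink": { "type": "AzureSqlSink" }
              },
              "inputs": [
                {
                  "referenceName": "SourceTableDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "schemaName": { "value": "@item().SourceSchema", "type": "Expression" },
                    "tableName": { "value": "@item().SourceTable", "type": "Expression" }
                  }
                }
              ],
              "outputs": [
                {
                  "referenceName": "SinkTableDataset",
                  "type": "DatasetReference",
                  "parameters": {
                    "schemaName": { "value": "@item().SinkSchema", "type": "Expression" },
                    "tableName": { "value": "@item().SinkTable", "type": "Expression" }
                  }
                }
              ]
            }
          ]
        }
      }
    ]
  }
}
```

The Lookup returns the control rows, the ForEach iterates over them, and the single Copy activity inside the loop passes each row's values into the parameterized source and sink datasets.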
But where should this control list be stored? Should it be a global parameter in Azure Data Factory holding an array of objects, or an SQL table? This is where things get interesting.
Using a global parameter in Azure Data Factory is the quicker option: the array of objects lives inside the factory itself and can be referenced directly from the ForEach activity, with no extra infrastructure. The trade-off is that changing the list means editing and publishing the factory, which gets awkward as the list grows or changes often. An SQL table, on the other hand, can be updated with a simple INSERT and can carry extra metadata (an enabled flag, watermark columns, and so on), but it requires more setup and maintenance: a database to host it, a dataset, and a Lookup query.
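For the global parameter route, the value would simply be an array of objects, something like the following (schema and table names are illustrative):

```json
[
  { "SourceSchema": "dbo", "SourceTable": "Customers", "SinkSchema": "stage", "SinkTable": "Customers" },
  { "SourceSchema": "dbo", "SourceTable": "Orders",    "SinkSchema": "stage", "SinkTable": "Orders" },
  { "SourceSchema": "dbo", "SourceTable": "Products",  "SinkSchema": "stage", "SinkTable": "Products" }
]
```

Assuming the parameter is named TablesToCopy, the ForEach items expression becomes @pipeline().globalParameters.TablesToCopy instead of the Lookup output. The SQL-table route stores the same four columns as rows in a control table and keeps the Lookup activity from the sketch above.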
Ultimately, the best approach will depend on the specific requirements of your project. If you're looking for a scalable solution that can be easily reused in future projects, storing the copy data inputs in an SQL table may be the way to go.
Do you have any experience with copy data parameterization in Azure Data Factory? How do you approach scalability in your pipelines?