20 April 2021
By: Dennis Barhorst

What to consider with the Azure Data Factory


The Azure Data Factory

The Azure Data Factory is a useful resource in the Azure landscape: it is cheap, flexible, and fast. In my work as a BI consultant I often use this service as the basis for data integrations. But what should you consider when using it? In this blog I will share what I encountered while working with the Azure Data Factory.


SSIS (SQL Server Integration Services)

Let me start with the SSIS integration. This sounds like a very nice feature to have, because it lets you reuse your old on-premises SSIS packages in the cloud. However, enabling SSIS pipelines costs roughly €360 per month at the cheapest tier, regardless of whether you use them or not. That is quite expensive, especially since the Azure Data Factory itself is effectively the successor to SSIS: most SSIS pipelines are easy to rebuild as native Data Factory pipelines, which is a lot cheaper.


Copy data actions

A second important point concerns frequently executed Copy Data activities: think of data that changes every 5 minutes across a large number of tables. While developing an integration between D365 Business Central and Azure SQL using the Data Factory, we encountered the following:

The requirement was to update data every 5 minutes for 60 tables. As a team, we seemed to have solved this perfectly with the Data Factory, until we saw the final cost: €8,644 per month. These high costs were caused by a single activity, namely Copy Data. At €0.25 per hour, Copy Data is the most expensive activity in the Data Factory, but still not astronomically so. Yet when we do the math for our requirements, it looks like this:

60 * 0.0667 * €0.25 = €1.0005 per run
1.0005 * 12 = €12.006 per hour
12.006 * 24 = €288.144 per day
288.144 * 30 = €8,644.32 per month

Total number of runs per month: 51,840
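The calculation above can be reproduced in a few lines of Python. The prices and durations are the figures from this example, not official Microsoft rates:

```python
# Cost model for the scenario above: 60 tables, refreshed every 5 minutes,
# with each Copy Data activity billed as 0.0667 hours (4 minutes).
TABLES = 60
BILLED_HOURS_PER_COPY = 0.0667   # billed runtime per Copy Data activity
PRICE_PER_HOUR = 0.25            # € per activity-hour (rate in this example)
RUNS_PER_HOUR = 12               # one run every 5 minutes

cost_per_run = TABLES * BILLED_HOURS_PER_COPY * PRICE_PER_HOUR
cost_per_hour = cost_per_run * RUNS_PER_HOUR
cost_per_day = cost_per_hour * 24
cost_per_month = cost_per_day * 30

print(f"per run:   €{cost_per_run:.4f}")   # €1.0005
print(f"per month: €{cost_per_month:.2f}") # €8644.32
```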


In this calculation, 0.0667 is the billed runtime in hours and 0.25 is the price per hour. Microsoft charges the runtime rounded up to whole minutes: 0.0667 hours corresponds to 4 minutes, while the pipeline itself actually ran for less than a minute per table. The difference arises because there were several Copy Data activities in one pipeline, each billed for at least a full minute. For example, for a total actual runtime of 23 seconds, 240 seconds were charged. At these costs, the Copy Data activity was not a workable basis for our solution.
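This billing behaviour can be sketched as follows, assuming each Copy Data activity is billed for at least one whole minute. The split of the 23 seconds over four activities is an assumption for illustration; it reproduces the 240 billed seconds from the example:

```python
import math

def billed_seconds(actual_seconds: int) -> int:
    """Activity runtime is billed rounded up to whole minutes."""
    return math.ceil(actual_seconds / 60) * 60

# Assumed example: four Copy Data activities of a few seconds each,
# 23 seconds of actual work in total.
activities = [6, 5, 7, 5]
total_actual = sum(activities)                          # 23 seconds
total_billed = sum(billed_seconds(s) for s in activities)

print(f"actual: {total_actual}s, billed: {total_billed}s")  # actual: 23s, billed: 240s
```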


Our solution

With the workaround we implemented, the pipeline now calls a function app instead of running Copy Data activities. The function app copies the data, transforms it into the correct format, and writes it to Azure SQL. This change brought the Data Factory costs down from €8,644 to €32 per month; the function app itself costs only €0.11 per month.
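To give an impression of the pattern, here is a minimal sketch of such a function app body: map source records to the target schema, then bulk-insert them. The record fields, table name, and connection details are hypothetical, not the actual solution:

```python
# Hypothetical sketch: transform Business Central records into rows for an
# Azure SQL target table. Field names and the target schema are illustrative.
from typing import Iterable

def transform(records: Iterable[dict]) -> list[tuple]:
    """Map source records to rows matching the assumed target table."""
    return [(r["No"], r["Name"], float(r["Amount"])) for r in records]

rows = transform([{"No": "1000", "Name": "Adatum", "Amount": "12.50"}])

# In the real function app the rows would then be written with a bulk
# insert, for example via pyodbc:
# with pyodbc.connect(CONN_STR) as conn:
#     cur = conn.cursor()
#     cur.fast_executemany = True
#     cur.executemany("INSERT INTO dbo.Customer VALUES (?, ?, ?)", rows)
```

The Data Factory pipeline then only pays for the cheap activity that triggers the function app, not for per-table Copy Data activities.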


A great tool

So there are several things you should definitely pay attention to, but the Data Factory is a very nice tool that we use a lot. If you only need to refresh a little data, or refresh it infrequently, the Copy Data activity works fine. But as soon as you want to refresh many entities frequently via the Data Factory, it is worth checking whether you can have the actual movement of the data performed by a function app.


Would you like to know more about our Azure services? Check out our Azure page: