Azure Data Factory - Integration Runtime
Integration runtime types in Azure Data Factory:
1. Azure Integration Runtime
Usage: Typically used for moving data between cloud services.
Details: The Azure Integration Runtime is fully managed, serverless compute that Azure Data Factory provisions for you. It runs copy activities between cloud data stores and mapping data flows, and it dispatches transformation activities to cloud compute services. It is the natural choice when all the data stores involved are reachable over public endpoints, so no data has to be pulled out of a private network.
Suitability for the Scenario: Low. The Azure Integration Runtime cannot reach data stores inside a private network, such as an on-premises SQL Server behind a corporate firewall, so it is not typically used to move on-premises data to the cloud.
2. Self-hosted Integration Runtime
Usage: Ideal for private network scenarios, including transferring data from on-premises to the cloud securely.
Details: The Self-hosted Integration Runtime is software you install on a machine inside your private network, either an on-premises server or a VM in a virtual network. It connects outbound to Azure Data Factory, so the data stores it reads from do not have to be exposed to the internet, and it moves their data to cloud storage or other cloud-based data services.
Suitability for the Scenario: High. The Self-hosted Integration Runtime is commonly used to transfer data from on-premises SQL Server instances to cloud storage such as Azure Data Lake Storage, which makes it a good fit for the scenario.
3. Azure-SSIS Integration Runtime
Usage: Used specifically to execute SQL Server Integration Services (SSIS) packages in Azure.
Details: The Azure-SSIS Integration Runtime is a managed cluster of Azure VMs that runs SSIS packages natively. It is designed for existing ETL (extract, transform, load) processes that were built as SSIS packages and now need to run in Azure.
Suitability for the Scenario: Moderate. It fits if the transfer depends on existing SSIS packages; for a plain copy it adds a dedicated SSIS cluster that the job does not need.
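The comparison of the three runtimes can be condensed into a rough decision rule. The sketch below is a hypothetical helper, not part of any Azure SDK, and it deliberately ignores mixed setups (for example, SSIS packages that also need to reach on-premises data), where more than one runtime may be involved.

```python
from enum import Enum


class IRType(Enum):
    AZURE = "Azure Integration Runtime"
    SELF_HOSTED = "Self-hosted Integration Runtime"
    AZURE_SSIS = "Azure-SSIS Integration Runtime"


def choose_integration_runtime(source_in_private_network: bool,
                               runs_ssis_packages: bool) -> IRType:
    """Simplified rule of thumb based on the comparison above (hypothetical helper)."""
    # Existing SSIS packages need the dedicated SSIS runtime to execute.
    if runs_ssis_packages:
        return IRType.AZURE_SSIS
    # Stores behind a firewall, such as an on-premises SQL Server, can only
    # be reached by a runtime installed inside that private network.
    if source_in_private_network:
        return IRType.SELF_HOSTED
    # Cloud-to-cloud movement over public endpoints suits the managed runtime.
    return IRType.AZURE


# The scenario: on-premises SQL Server to Azure Data Lake Storage, no SSIS.
print(choose_integration_runtime(source_in_private_network=True,
                                 runs_ssis_packages=False).value)
```

For the on-premises scenario this prints "Self-hosted Integration Runtime".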
Runtime configuration overview:
Integration runtime status:
Status: Running
Type: Self-Hosted
Version: 4.5.5422.1
Running / Registered Node(s): 1/1
High Availability Enabled: False
Linked Count: 0
Queue Length: 0
Average Queue Duration: 0.00s
Integration runtime node details:
Name: Node1
Status: Running
Version: 4.4.7292.1
Available Memory: 7697MB
CPU Utilization: 5%
Network (In/Out): 1.11KBps/0.72KBps
Concurrent Jobs (Running/Limit): 2/12
Role: Dispatcher/Worker
Credential Status: In Sync
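The status output above is a set of "Key: Value" lines, so derived figures such as free job slots can be computed directly from it. This is a minimal sketch that assumes the text has been copied as-is from the monitoring view (with a colon after "Average Queue Duration"); the helper names are hypothetical. It only flags that the runtime summary and the node report different version strings; it does not decide which one is expected.

```python
# Status text copied from the monitoring output above.
IR_STATUS = """
Status: Running
Type: Self-Hosted
Version: 4.5.5422.1
Running / Registered Node(s): 1/1
High Availability Enabled: False
Linked Count: 0
Queue Length: 0
Average Queue Duration: 0.00s
"""

NODE_STATUS = """
Name: Node1
Status: Running
Version: 4.4.7292.1
Available Memory: 7697MB
CPU Utilization: 5%
Network (In/Out): 1.11KBps/0.72KBps
Concurrent Jobs (Running/Limit): 2/12
Role: Dispatcher/Worker
Credential Status: In Sync
"""


def parse_status(text: str) -> dict[str, str]:
    """Turn 'Key: Value' lines into a dict, splitting on the first colon only."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that carry no key/value pair
            fields[key.strip()] = value.strip()
    return fields


def ratio(value: str) -> tuple[int, int]:
    """Split an 'a/b' counter such as '2/12' into two integers."""
    left, right = value.split("/")
    return int(left), int(right)


ir = parse_status(IR_STATUS)
node = parse_status(NODE_STATUS)

running_jobs, job_limit = ratio(node["Concurrent Jobs (Running/Limit)"])
running_nodes, registered_nodes = ratio(ir["Running / Registered Node(s)"])
free_memory_mb = int(node["Available Memory"].removesuffix("MB"))
cpu_percent = float(node["CPU Utilization"].rstrip("%"))
# The runtime summary and the node report different version strings.
version_mismatch = ir["Version"] != node["Version"]

print(f"free job slots: {job_limit - running_jobs}/{job_limit}")  # free job slots: 10/12
print(f"version mismatch: {version_mismatch}")  # version mismatch: True
```

The code needs Python 3.9 or later for str.removesuffix and the built-in generic type hints.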
Important adjustable configuration parameters:
High Availability:
Changing: Enable High Availability by registering at least one more node, since it requires two or more nodes.
Effect: With multiple nodes, the runtime keeps working if one node fails. The trade-off is an extra machine to provision, patch, and pay for, and more moving parts to manage.
Running / Registered Node(s):
Changing: Register additional nodes; a self-hosted integration runtime supports up to four.
Effect: More nodes can increase the runtime's throughput and availability, but each node adds administrative overhead and the cost of another machine.
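The node-count and High Availability trade-offs above can be expressed as a small planning check. This is a sketch under stated assumptions: plan_nodes and NodePlan are hypothetical names, each node's concurrent-job limit is supplied by you, and summing those limits is only a rough ceiling on parallel work, since CPU, memory, and network still apply per host.

```python
from dataclasses import dataclass

# A self-hosted integration runtime can have at most four registered nodes.
MAX_NODES = 4


@dataclass(frozen=True)
class NodePlan:
    node_count: int
    high_availability: bool
    total_job_limit: int


def plan_nodes(job_limits: list[int]) -> NodePlan:
    """Summarise a planned node layout; job_limits holds each node's concurrent-job limit."""
    count = len(job_limits)
    if not 1 <= count <= MAX_NODES:
        raise ValueError(f"expected 1 to {MAX_NODES} nodes, got {count}")
    return NodePlan(
        node_count=count,
        # One node is a single point of failure; a second node removes it.
        high_availability=count >= 2,
        # Rough ceiling: each node contributes its own job slots.
        total_job_limit=sum(job_limits),
    )


print(plan_nodes([12]))      # current layout: Node1 only
print(plan_nodes([12, 12]))  # hypothetical second node with the same limit
```

Adding a second node with the same limit as Node1 both enables High Availability and doubles the nominal job ceiling from 12 to 24.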
Available Memory:
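Changing: Available memory is a property of the host machine, not an integration runtime setting; change it by resizing the VM or moving the node to a machine with more RAM.
Effect: More memory can improve performance for memory-intensive operations. Moving to a smaller machine, or running other memory-hungry applications on the same host, leaves less for the runtime and can degrade performance.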
CPU Utilization:
Changing: CPU utilization is a measured value, not a setting; it changes when you host the runtime on a machine with a different CPU configuration or change the workload it runs.
Effect: A more powerful CPU can improve performance, while a weaker one may reduce costs but also reduce performance.
Network (In/Out):
Changing: Altered by changing the network hardware, bandwidth, or configuration between the node, its data sources, and Azure.
Effect: More bandwidth can raise data transfer rates, while downgrades, or sharing the link with other applications, can lower them.
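Because memory, CPU, and network are host properties rather than runtime settings, a practical first step is to check which of them is actually constraining the node. The sketch below is illustrative only: the thresholds, the helper names, and the link_capacity_kb_s input are hypothetical and are not values published for Azure Data Factory.

```python
from dataclasses import dataclass

# Hypothetical alert thresholds for illustration; tune them for your workload.
MIN_FREE_MEMORY_MB = 1024
MAX_CPU_PERCENT = 80.0
MAX_LINK_SHARE = 0.8  # fraction of link capacity treated as "saturated"


@dataclass
class NodeMetrics:
    available_memory_mb: int
    cpu_percent: float
    network_in_kb_s: float   # kilobytes per second, as in "KBps"
    network_out_kb_s: float


def likely_bottlenecks(m: NodeMetrics, link_capacity_kb_s: float) -> list[str]:
    """Name the host resources that look constrained; link_capacity_kb_s is an assumed input."""
    issues = []
    if m.available_memory_mb < MIN_FREE_MEMORY_MB:
        issues.append("memory")
    if m.cpu_percent > MAX_CPU_PERCENT:
        issues.append("cpu")
    busiest = max(m.network_in_kb_s, m.network_out_kb_s)
    if busiest > MAX_LINK_SHARE * link_capacity_kb_s:
        issues.append("network")
    return issues


node1 = NodeMetrics(7697, 5.0, 1.11, 0.72)
print(likely_bottlenecks(node1, link_capacity_kb_s=10_000))  # []
```

With Node1's reported figures nothing is flagged, which suggests that upgrading the host's hardware would not speed up the current workload.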
Concurrent Jobs (Running/Limit):
Changing: Adjust the node's maximum number of concurrent jobs (12 on Node1 above).
Effect: Raising the limit can speed up data movement, provided the host's CPU, memory, and network have headroom. Lowering it slows data movement but can conserve host resources or avoid network congestion.
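The "if resources allow" caveat on the concurrent-job limit can be made concrete with simple arithmetic. This sketch assumes every queued job takes the same time, and it models host saturation with a hypothetical resource_cap parameter; real copy jobs vary in length, so treat the numbers as a trend, not a forecast.

```python
import math
from typing import Optional


def batch_rounds(pending_jobs: int, job_limit: int,
                 resource_cap: Optional[int] = None) -> int:
    """Rounds of equal-length jobs needed to clear a queue.

    resource_cap is a hypothetical number of jobs the host can run at once
    before CPU, memory, or network saturate; the effective limit is the
    smaller of the two.
    """
    effective = job_limit if resource_cap is None else min(job_limit, resource_cap)
    if effective < 1:
        raise ValueError("effective concurrency must be at least 1")
    return math.ceil(pending_jobs / effective)


for limit in (4, 12, 24):
    print(limit, batch_rounds(48, limit), batch_rounds(48, limit, resource_cap=12))
# 4 12 12
# 12 4 4
# 24 2 4
```

Raising the limit from 12 to 24 halves the rounds only when the host can sustain 24 jobs; if it saturates at 12, the higher limit buys nothing.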
Role: Dispatcher/Worker:
Changing: Adjust the roles of the nodes when the runtime has more than one node.
Effect: Can balance load and distribute work across nodes more effectively.
Dispatcher:
Role: The dispatcher primarily serves to manage and delegate tasks.
Functionality: It receives tasks from the Azure Data Factory and assigns them to available worker nodes for execution.
Consideration: It doesn’t execute the tasks but is pivotal for managing and distributing workload among workers efficiently.
Worker:
Role: The worker is essentially the execution agent.
Functionality: It executes the tasks assigned by the dispatcher. This involves actual data movement or transformation activities.
Consideration: Workers are fundamental for scale and parallel execution in a distributed integration runtime environment.
Dispatcher/Worker: A node with both roles delegates tasks to other nodes and also executes tasks itself, distributing the workload across all available nodes, including itself.
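The dispatcher and worker roles can be illustrated with a toy scheduler. This is a conceptual model only; Azure Data Factory's actual assignment logic is not described here and may differ. The dispatch function plays the dispatcher, each Node with is_worker=True plays a worker, and a node marked is_worker=False mimics a dispatcher-only node. All names, and the second node, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    job_limit: int
    is_worker: bool = True  # False models a dispatcher-only node
    running: list[str] = field(default_factory=list)

    def free_slots(self) -> int:
        # A node that does not execute work never offers slots.
        return self.job_limit - len(self.running) if self.is_worker else 0


def dispatch(tasks: list[str], nodes: list[Node]) -> list[str]:
    """Give each task to the node with the most free slots; return tasks left queued."""
    queued = []
    for task in tasks:
        target = max(nodes, key=lambda n: n.free_slots())  # first node wins ties
        if target.free_slots() > 0:
            target.running.append(task)
        else:
            queued.append(task)
    return queued


# Node1 is a dispatcher/worker, so it runs tasks as well as handing them out.
node1 = Node("Node1", job_limit=2)
node2 = Node("Node2", job_limit=2)  # hypothetical second worker
left = dispatch(["t1", "t2", "t3", "t4", "t5"], [node1, node2])
print(node1.running, node2.running, left)  # ['t1', 't3'] ['t2', 't4'] ['t5']
```

Greedy "most free slots" assignment keeps the load even across nodes; once every worker is full, further tasks wait in the queue, which is what the Queue Length and Average Queue Duration fields in the status output measure.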
If high availability and workload management are considerations:
Multiple nodes could be set up, where some act purely as dispatchers (managing and delegating tasks) and others as workers (executing tasks).