Scheduling and distributing the first processor - HxGN EAM - 12.1.0.1 - Reference - Hexagon

HxGN EAM Databridge Pro Technical Reference


Dataflow Studio is deployed on three nodes for reliability and scalability, but it is up to the flow designer or the user creating the flow to ensure that scalability and optimization are realized when a flow executes.

How the first processor in a process flow should be scheduled depends on how it accesses data.

| Data Access Scenario | Execution Setting | Distribution Setting |
| --- | --- | --- |
| Processor retrieves the same data each time it executes | Primary Node | Set connections to "Round robin" load balance strategy |
| Processor receives distinct records each time it executes | All Nodes | |

When a processor retrieves the same data each time it executes, schedule it to run on the primary node only. If it is allowed to run on multiple nodes simultaneously, each node could retrieve and process the same records, producing duplicates. Examples include running a query on a database or retrieving a set of records from storage such as an S3 bucket.

If the processor receives distinct records each time it executes, it can be scheduled to run on all nodes.
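The duplicate-record risk described above can be illustrated with a small simulation. This is a minimal sketch, not Dataflow Studio code: the node names, the record list, and the `run_source` and `duplicates` helpers are all hypothetical, and the source is assumed to return the same records to every node that queries it.

```python
from collections import Counter

NODES = ["node-1", "node-2", "node-3"]        # hypothetical node names
SOURCE_RECORDS = ["rec-A", "rec-B", "rec-C"]  # stand-in for a query result


def run_source(nodes_running):
    """Each node that runs the processor retrieves the full, identical record set."""
    retrieved = []
    for _node in nodes_running:
        retrieved.extend(SOURCE_RECORDS)
    return retrieved


def duplicates(records):
    """Return the records retrieved more than once, with their counts."""
    return {rec: n for rec, n in Counter(records).items() if n > 1}


all_nodes = run_source(NODES)         # Execution: All Nodes
primary_only = run_source(NODES[:1])  # Execution: Primary Node

print(len(all_nodes), duplicates(all_nodes))
print(len(primary_only), duplicates(primary_only))
```

With three nodes running the same retrieval, every record appears three times; with a single node, no record is repeated.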

To adjust the execution schedule of a processor, open the Configure Processor dialog. On the Scheduling tab, set the Execution value to Primary Node or All Nodes as appropriate, and then apply the change.

When a processor runs on the primary node only, the connections that follow it should also be set to distribute FlowFiles to all nodes so that the load is spread out.

To adjust the distribution, open the Configure Connection dialog. Select the Settings tab and set the Load Balance Strategy to "Round robin". Do this for all connections that follow the processor set to run on the primary node.
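As a rough illustration of what a round-robin strategy does in general, the sketch below hands items to nodes in turn. It is not the product's implementation: the `round_robin` function, the node names, and the FlowFile labels are assumptions made for this example.

```python
from itertools import cycle


def round_robin(flowfiles, nodes):
    """Assign each FlowFile to the next node in turn, wrapping around at the end."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


# Hypothetical FlowFiles produced by a processor running on the primary node
flowfiles = [f"flowfile-{i}" for i in range(1, 8)]
result = round_robin(flowfiles, ["node-1", "node-2", "node-3"])

for node, assigned in result.items():
    print(node, assigned)
```

Seven FlowFiles spread over three nodes end up as groups of three, two, and two, so no node carries the whole load.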