Partek Flow Documentation


Note: This tool requires the Flow REST API license feature.

Obtaining the Allocator

$ wget customer.partek.com/flow_worker_allocator

$ chmod 755 flow_worker_allocator

Configuring the Allocator

The configuration file flow.worker.allocator.json must exist in the home directory of the user running the allocator. If the file is not present, running the allocator writes an example configuration file:

$ ./flow_worker_allocator

The allocator will exit immediately so the example configuration file can be edited. This configuration file is documented in the Configuration File Format section below.

Note: The system’s temp folder (e.g. /tmp) must be writable and mounted with execute permissions in order for the allocator to run.
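The temp folder requirement can be checked before the first run. The following is a minimal sketch, not part of the allocator: the function name temp_dir_allows_exec is hypothetical, and it assumes a POSIX system with /bin/sh. It writes a tiny script into the temp folder and tries to execute it, which fails when the folder is not writable or is mounted noexec.

```python
import os
import subprocess
import tempfile


def temp_dir_allows_exec(tmp_dir=None):
    """Return True if tmp_dir is writable and files in it can be executed.

    Hypothetical helper for illustration; the allocator does not ship this.
    """
    tmp_dir = tmp_dir or tempfile.gettempdir()
    if not os.access(tmp_dir, os.W_OK):
        return False
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".sh")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        os.chmod(path, 0o700)
        try:
            # On a noexec mount this raises PermissionError (an OSError).
            return subprocess.run([path]).returncode == 0
        except OSError:
            return False
    finally:
        os.remove(path)


if __name__ == "__main__":
    print("temp folder usable:", temp_dir_allows_exec())
```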

Authenticating the allocator to Flow

After flow.worker.allocator.json has been configured, run the allocator:

$ ./flow_worker_allocator

You will be asked for your Flow administrator username and password. After successful authentication, a token is stored in ~/flow.auth.json and used for all subsequent authentication attempts. If this token is removed, the allocator will prompt for a Flow username and password again. The token only grants access to the Flow REST API; if compromised, it cannot be used to recover an account password.



Starting the allocator as a background process


$ nohup ./flow_worker_allocator 2>/dev/null &


The allocator takes no arguments. All configuration is stored in ~/flow.worker.allocator.json.



Stopping the allocator


$ killall flow_worker_allocator


If the allocator was run as a foreground process, Ctrl-C or SIGTERM will cause the process to exit.



Logging


The allocator writes daily rotated log files to ~/flow.worker.allocator.log. For verbose output, set DebugMode to 1 in the configuration file.



Allocation criteria and strategies


Once configured, the allocator polls the Flow server every CheckIntervalSec seconds to ask whether there is any pending work that could start immediately if another worker were added. If so, WorkerCounterCmd is used to query the job scheduler for the number of Flow workers already allocated. If that number is below the WorkerResourceLimit : MaxWorkers limit, one worker is allocated using WorkerAllocatorCmd.


It is recommended that WorkerResourceLimit : IdleShutdownMin be relatively short so that allocations are elastic: large workloads are able to allocate all resources available to them and quickly return those resources to the job scheduler when Flow workers are idle.
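The polling decision described in this section can be sketched as follows. This is an illustration of the documented behavior, not the allocator's actual code: poll_once and its three callables are hypothetical stand-ins for the Flow query, WorkerCounterCmd, and WorkerAllocatorCmd.

```python
def poll_once(flow_has_pending_work, count_workers, allocate_worker, max_workers):
    """Run one polling cycle; return True if a worker was requested.

    flow_has_pending_work: callable standing in for the Flow server query.
    count_workers: callable standing in for WorkerCounterCmd.
    allocate_worker: callable standing in for WorkerAllocatorCmd.
    """
    # Ask Flow first; the scheduler is only queried when work is waiting.
    if not flow_has_pending_work():
        return False
    # Never exceed WorkerResourceLimit : MaxWorkers.
    if count_workers() >= max_workers:
        return False
    # At most one worker is requested per polling cycle.
    allocate_worker()
    return True


if __name__ == "__main__":
    requested = []
    print(poll_once(lambda: True, lambda: 2, lambda: requested.append(1), 4))  # True
    print(poll_once(lambda: True, lambda: 4, lambda: requested.append(1), 4))  # False
```

Because only one worker is added per cycle, a burst of work ramps up over several CheckIntervalSec periods rather than all at once.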



Configuration file format


FlowAPITimeoutSec : integer

The length of time in seconds the allocator will wait for a response from the Flow server.


CheckIntervalSec : integer

The interval in seconds at which the allocator asks Flow whether a worker is needed.


InfrastructureWaitTillUpdateTimeSec : integer

The allocator communicates with your resource allocation infrastructure to see how many workers are running. In most cases, this infrastructure is a job scheduler (e.g. TORQUE, LSF, SGE), where there is a delay between requesting resources and the scheduler acknowledging the request. This parameter tells the allocator how long to wait for the job scheduler to update before making any further allocation decisions. Note: InfrastructureWaitTillUpdateTimeSec should be less than CheckIntervalSec.
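The timing constraint in the note above is easy to verify before starting the allocator. The sketch below is a hypothetical pre-flight check, not something the allocator provides; it assumes the two keys sit at the top level of the configuration and hold integers.

```python
def check_timing(config):
    """Return a list of problems with the two timing parameters.

    Hypothetical helper; assumes both keys are top-level integers.
    """
    problems = []
    wait = int(config["InfrastructureWaitTillUpdateTimeSec"])
    interval = int(config["CheckIntervalSec"])
    if wait >= interval:
        problems.append(
            "InfrastructureWaitTillUpdateTimeSec (%d) should be less than "
            "CheckIntervalSec (%d)" % (wait, interval)
        )
    return problems


if __name__ == "__main__":
    # Example values are illustrative only.
    print(check_timing({"CheckIntervalSec": 60,
                        "InfrastructureWaitTillUpdateTimeSec": 15}))  # []
```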


DebugMode : integer

If set to 1, the allocator will use verbose logging. This includes reporting on all allocation decisions.


FlowExternalServerURL : string

This is the URL used to log into Flow, e.g. http://flow.server.url:8080

This must be network accessible from the server running the allocator. If the allocator is running on the same server as the Flow server, then this URL is likely to be http://localhost:8080 


FlowServerWorkerConnectionHost : string

The DNS name or IP address of the Flow server from the worker node’s perspective. In most cases, workers launched by a job scheduler are on a private network. This means the name of the Flow server that the worker needs to connect to may be different from the one listed under FlowExternalServerURL.


FlowDaemonUser : string

The Linux user ID under which job allocation requests are made. This is used when communicating with a job scheduler in order to query for the number of running or pending queued jobs.


WorkerResourceLimit : JSON data

This defines resource limits for every allocated worker. These values are used by RunWorkerCMD and WorkerAllocatorCmd to inform the worker and job scheduler about resource requirements. The following are the resource limit types:


MaxWorkers : string

The maximum number of workers that will be allocated regardless of Flow resource demands. This should not be more than the licensed number of workers or more than the number of jobs the queue will accept for FlowDaemonUser.


MaxCores : string

This defines the maximum number of CPU cores that a worker will use. This should be consistent with your job queue limits.

 

MaxMemoryMB : string

This defines the maximum total amount of memory, in MB, that a worker will use. This should be consistent with your job queue limits.


RuntimeMin : string

Maximum lifetime of a worker, in minutes. This should be less than or equal to the runtime limits imposed by your job queue.


IdleShutdownMin : string

Flow workers doing no work for this number of minutes will auto-shutdown and release their resources back to the job scheduler.


RunWorkerCMD : JSON data

This is used to build the shell command used to start a Flow worker and is passed to your job scheduler. The parameters are as follows:


Type : string

The only supported method at this time is SHELL.


Binary : string

The full path to partekFlowRemoteWorker.sh


Options : JSON data

These options must be consistent with those defined in WorkerResourceLimit and FlowServerWorkerConnectionHost. Each option is appended to the command string in the order it is defined here. The keys 1 … n are merely placeholders, as is arg1. Keys labeled @self refer to fields in this JSON configuration file: their value, encoded as a simple array, gives the JSON key hierarchy to look up. In most cases, no changes are necessary here unless a type of worker limit is being added or removed.
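To illustrate how an @self array can be read as a key hierarchy, here is a minimal sketch. The function lookup_self and the example values (including the host name) are hypothetical; only the key names come from this page, and the allocator's real resolution logic may differ.

```python
from functools import reduce


def lookup_self(config, key_path):
    """Follow key_path (a list of keys) down through nested config dicts."""
    return reduce(lambda node, key: node[key], key_path, config)


# Illustrative configuration fragment; values are made up.
example_config = {
    "FlowServerWorkerConnectionHost": "flow.internal.example",
    "WorkerResourceLimit": {"MaxCores": "8", "MaxMemoryMB": "16000"},
}

if __name__ == "__main__":
    print(lookup_self(example_config, ["WorkerResourceLimit", "MaxCores"]))  # 8
    print(lookup_self(example_config, ["FlowServerWorkerConnectionHost"]))
```

A single-element array addresses a top-level field, while a longer array walks into nested sections such as WorkerResourceLimit.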


WorkerAllocatorCmd : JSON data

This is used to build the shell command that allocates a worker using your job scheduler. It requires modification based on your job scheduler, queue limits, and submission options.


Type : string

Defines the type of job scheduler. This is just a label and has no functional impact. Examples include SGE, TORQUE, LSF, SWARMKIT


Binary : string

The executable used to submit jobs. This must be in your path. Examples: bsub, qsub


Options : JSON data

Keys define the command line options (-x, -q, -M). The values can be strings, null, or @self to read configuration options from the JSON configuration file. The @self can contain the key “append” in order to append static strings to values.
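The following sketch shows one way such an Options block could be turned into a command line. The exact JSON shape is an assumption made for illustration: here an @self value is an object with an "@self" key holding the key path and an optional "append" key, and a null value means the flag takes no argument. The function names, the queue name, and the appended suffix are all hypothetical; consult the generated example configuration file for the real layout.

```python
def resolve_value(config, value):
    """Resolve one option value: None, a plain string, or an assumed @self object."""
    if value is None:
        return None
    if isinstance(value, dict) and "@self" in value:
        node = config
        for key in value["@self"]:  # walk the key hierarchy
            node = node[key]
        return str(node) + value.get("append", "")
    return str(value)


def build_command(config, binary, options):
    """Build an argv list, keeping options in the order they are defined."""
    argv = [binary]
    for flag, value in options.items():
        argv.append(flag)
        resolved = resolve_value(config, value)
        if resolved is not None:
            argv.append(resolved)
    return argv


# Illustrative values only; not a validated scheduler invocation.
example_config = {"WorkerResourceLimit": {"MaxMemoryMB": "16000"}}
example_options = {
    "-q": "flow_queue",
    "-x": None,
    "-M": {"@self": ["WorkerResourceLimit", "MaxMemoryMB"], "append": "MB"},
}

if __name__ == "__main__":
    print(build_command(example_config, "bsub", example_options))
    # ['bsub', '-q', 'flow_queue', '-x', '-M', '16000MB']
```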


WorkerCounterCmd : JSON data

This is used to build the command that asks your scheduler how many workers have been queued or are running. The output from this command is parsed according to the OutputParser definition.


Type : string

Defines the type of job scheduler. This is just a label and has no functional impact. Examples include SGE, TORQUE, LSF, SWARMKIT


Binary : string

The executable used to query submitted job information. This must be in your path. Examples: qstat, jobstat


Options : JSON data

Keys define the command line options (-x, -q, -M). The values can be strings, null, or @self to read configuration options from the JSON configuration file. The @self can contain the key “append” in order to append static strings to values.


OutputParser : JSON data

Currently, the only type is LineGrepCount, which returns the number of lines output by WorkerCounterCmd that contain the strings defined by LineGrepCount.
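The counting behavior can be pictured with a short sketch. This is an approximation under a stated assumption: a line is counted only if it contains every configured string. The function name and the sample scheduler output are made up and do not reflect any real qstat format.

```python
def line_grep_count(output, needles):
    """Count lines of output that contain all of the given strings."""
    return sum(
        1
        for line in output.splitlines()
        if all(needle in line for needle in needles)
    )


# Made-up scheduler output for illustration.
sample_output = "\n".join([
    "101 partekFlowRemoteWorker flowuser R",
    "102 partekFlowRemoteWorker flowuser Q",
    "103 alignment_job otheruser R",
])

if __name__ == "__main__":
    print(line_grep_count(sample_output, ["partekFlowRemoteWorker", "flowuser"]))  # 2
```

Choosing strings that match both running and queued workers, but nothing else submitted by FlowDaemonUser, keeps the count aligned with the MaxWorkers limit.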

 

 

Additional Assistance

If you need additional assistance, please visit our support page to submit a help ticket or find phone numbers for regional support.
