"AWSBatch" (Batch Computation Provider)

Details

AWS Batch is a container-based batch computation service that schedules jobs across managed or unmanaged pools of Amazon EC2 compute instances.
To configure the "AWSBatch" batch computation provider for use in the Wolfram Language, follow the instructions in the Set Up the AWS Batch Computation Provider workflow.
AWS Batch packs jobs into instances based on each job's vCPU count requirement. Multiple jobs can execute concurrently on a single EC2 instance, with each job running in a dedicated Docker container.
The "AWSBatch" batch computation provider stores input and output data for jobs in a user-specified Amazon S3 bucket.
The "AWSBatch" batch computation provider supports Linux-based jobs only.

Environment Properties

The following properties are supported in a RemoteBatchSubmissionEnvironment object for the "AWSBatch" provider:
  • "IOBucket"(required)name of the Amazon S3 bucket in which to store job input and output data
    "JobDefinition"(required)ARN for the AWS Batch job definition to use for submitted jobs
    "JobQueue"(required)ARN for the AWS Batch job queue to which jobs are submitted
    "ServiceObject"AutomaticServiceObject for the "AWS" service
  • The "IOBucket", "JobDefinition" and "JobQueue" properties are required to construct a valid RemoteBatchSubmissionEnvironment object.
    If the "ServiceObject" property is omitted (or is set to Automatic), a service object will be automatically constructed and connected with ServiceConnect["AWS"].
    If the "ServiceObject" property is set to "New", a new service connection will be created with ServiceConnect["AWS","New"], disregarding any saved connections or credentials.
    The Set Up the AWS Batch Computation Provider workflow provides instructions for creating all of the required environment resources in your AWS account using an automated AWS CloudFormation template.

Job Settings

    The following settings are supported by the RemoteProviderSettings option when using the "AWSBatch" provider:
  • "GPUCount"Inheritedinteger number of GPUs »
    "Memory"Automatic"InformationUnit" quantity or integer number of mebibytes »
    "VCPUCount"Inheritedinteger number of vCPUs »
  • The value Inherited for the "GPUCount", "Memory" and "VCPUCount" settings corresponds to the value in the AWS Batch job definition specified by the supplied RemoteBatchSubmissionEnvironment object.
    The "GPUCount" setting instructs AWS Batch to schedule the job to an instance that has, at minimum, the specified number of GPUs available and to make that number of GPUs available within the job container.
    The "Memory" setting instructs AWS Batch to reserve the specified amount of memory for the job. If the job exceeds this limit, it will be terminated. »
    With the default setting "Memory"->Automatic, the minimum memory limit is set to the larger of the value in the job definition and a default derived from the effective "VCPUCount" and "GPUCount" settings (a fixed number of mebibytes per vCPU and per GPU). »
    The value of the "VCPUCount" setting is used as a weight for the Linux kernel scheduler when allocating CPU time on the host instance to processes within the job container. It does not necessarily result in any fixed number of vCPUs being dedicated to the job container.
    Within a "Single"-type job submitted with RemoteBatchSubmit, functions such as ParallelEvaluate and ParallelMap will automatically launch a number of subkernels equivalent to the value of the "VCPUCount" setting. This behavior can be overridden by calling LaunchKernels with an explicit number of kernels.
    When using the "AWSBatch" provider, the value of the TimeConstraint option of RemoteBatchSubmit and RemoteBatchMapSubmit must be at least 60 seconds. The TimeConstraint option defaults to the "Execution timeout" value in the job definition being used.

Job Statuses

    The following are possible values of the "JobStatus" job property when using the "AWSBatch" provider, listed in the order through which a typical job will pass:
  • "Submitted"the submitted job has not yet been evaluated by the AWS Batch scheduler
    "Pending"the job is waiting for dependencies to be satisfied
    "Runnable"the job is waiting for compute resources to be available
    "Starting"the job has been scheduled to an instance and its container image is being downloaded
    "Running"the job's container has started
    "Succeeded"the job's execution has succeeded and its output has been uploaded
    "Failed"the job's execution has failed
  • A job in the "Running" state may be in the process of downloading input files, evaluating job code or uploading output data.
    An array job will remain in the "Pending" state until either all of its constituent child jobs have succeeded (at which point it will transition to the "Succeeded" state) or at least one child job has failed (at which point it will transition to the "Failed" state).
    More information about job states is available in the AWS Batch documentation.

Job Properties

    When using the "AWSBatch" provider, the following properties are available from "Single"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "JobExitCode"exit code returned by the kernel within the job container
    "JobLog"console logs from the job container
    "JobStatusReason"string describing the reason for which the job is in its current state
    "ProviderJobID"AWS-provided unique identifier for the job
  • When using the "AWSBatch" provider, the following properties are available from "Array"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "ChildJobExitCodes""JobExitCode" property of each array child job
    "ChildJobStatusReasons""JobStatusReason" property of each array child job
    "JobStatusReason"string describing the reason for which the array job is in its current state
    "ProviderJobID"AWS-provided unique identifier for the array job
  • When using the "AWSBatch" provider, the following properties are available from "ArrayChild"-type job objects, in addition to the standard properties supported by RemoteBatchJobObject:
  • "JobExitCode"exit code returned by the kernel within the job container
    "JobLog"console logs from the child job container
    "JobStatusReason"string describing the reason for which the child job is in its current state
    "ProviderJobID"AWS-provided unique identifier for the child job
  • The meanings of some possible values of the "JobExitCode" property are listed on the reference page for Exit.
    The value of the "JobStatusReason" property is supplied by AWS Batch and may be absent from a given job, depending on its current state.
    The value of the "JobLog" property is retrieved from Amazon CloudWatch Logs and will be absent until the job has reached the "Running" state.
    The value of the "ProviderJobID" property identifies a job to AWS Batch, in contrast to the "JobUUID" property, which identifies a job within the Wolfram System.
    AWS Batch expires and deletes a job's metadata 24 hours after the job reaches a completed state ("Succeeded" or "Failed"). »
    RemoteBatchJobObject expressions representing jobs that have expired cannot be queried for status and will not be listed by RemoteBatchJobs. Output data from expired jobs remains in Amazon S3, and remains accessible from the RemoteBatchJobObject, until it is manually deleted or expired by the bucket's lifecycle policy.

Examples


Basic Examples  (2)

    Create an "AWSBatch" RemoteBatchSubmissionEnvironment object, after configuring the "AWSBatch" batch computation provider as described in the Set Up the AWS Batch Computation Provider workflow:

    Submit a job using the created environment:
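    For instance, using the environment created above (the computation is arbitrary):

        job = RemoteBatchSubmit[env, Integrate[Sin[x]^4 Cos[x]^2, x]]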

    Query the job's status:
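    Assuming the job object from the previous step:

        job["JobStatus"]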

    Query the job's status again, after it has completed:
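    The same query, some time later:

        job["JobStatus"]
        (* e.g. "Succeeded" *)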

    Download the job's output:
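    The output of a completed "Single"-type job is available through the standard "EvaluationResult" property:

        job["EvaluationResult"]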

    Create an environment object using an explicitly specified "AWS" service object:
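    A sketch using an explicitly created connection; the other property values are placeholders as above:

        aws = ServiceConnect["AWS", "New"];
        env = RemoteBatchSubmissionEnvironment["AWSBatch", <|
           "IOBucket" -> "my-wolfram-batch-io",
           "JobDefinition" -> "arn:aws:batch:us-east-1:123456789012:job-definition/WolframJobDefinition:1",
           "JobQueue" -> "arn:aws:batch:us-east-1:123456789012:job-queue/WolframJobQueue",
           "ServiceObject" -> aws
          |>]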

Job Settings  (4)

    "GPUCount"  (1)

    Submit an array job to AWS Batch that uses GPU computation to perform inference with a pretrained neural net:
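    One possible shape for such a submission; the model and test images are illustrative, and the job definition is assumed to target a GPU-enabled compute environment:

        net = NetModel["ResNet-50 Trained on ImageNet Competition Data"];
        testImages = ExampleData[{"TestImage", #}] & /@ {"House", "Mandrill", "Peppers"};
        job = RemoteBatchMapSubmit[env, net, testImages,
          RemoteProviderSettings -> <|"GPUCount" -> 1|>]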

    Download the array job output:
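    The results of a completed array job are available through the standard "EvaluationResults" property:

        job["EvaluationResults"]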

    "Memory"  (2)

    The memory limit for a job can be adjusted with the "Memory" job setting:
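    For example, requesting an 8 GiB limit (the computation shown is arbitrary):

        job = RemoteBatchSubmit[env, Mean[RandomReal[1, 10^8]],
          RemoteProviderSettings -> <|"Memory" -> Quantity[8, "Gibibytes"]|>]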

    If not otherwise specified, the default memory limit is based on the configured vCPU and GPU counts:
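    For instance, a job submitted with only "VCPUCount" set receives a memory limit derived from that count, as described under Job Settings above; MemoryAvailable[] gives a rough indication of the memory available inside the job container:

        job = RemoteBatchSubmit[env, MemoryAvailable[],
          RemoteProviderSettings -> <|"VCPUCount" -> 8|>]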

    "VCPUCount"  (1)

    Instruct AWS Batch to allocate four vCPUs to a submitted job:
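    A sketch, using a parallel computation that will automatically use the allocated subkernels:

        job = RemoteBatchSubmit[env, ParallelTable[PrimeQ[2^p - 1], {p, 1, 1000}],
          RemoteProviderSettings -> <|"VCPUCount" -> 4|>]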

Job Properties  (2)

    "Single" Jobs  (1)

    Submit a batch job to AWS Batch using RemoteBatchSubmit:
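    For instance (the computation is arbitrary):

        job = RemoteBatchSubmit[env, N[Zeta[3], 10^4]]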

    Query the job's status along with the reason why the job transitioned to that state:
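    Assuming the job object from above:

        job["JobStatus"]
        job["JobStatusReason"]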

    Obtain the console output from the job container:
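    The log is retrieved from Amazon CloudWatch Logs once the job has reached the "Running" state:

        job["JobLog"]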

    "Array" and "ArrayChild" Jobs  (1)

    Submit an array job to AWS Batch using RemoteBatchMapSubmit:
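    For instance, testing a range of Mersenne exponents (the function and list are arbitrary):

        job = RemoteBatchMapSubmit[env, PrimeQ[2^# - 1] &, Range[40]]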

    Query the status of each child job along with the reason why each transitioned to its current state:
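    One way to do this, assuming the standard "ChildJobs" property for obtaining the child job objects:

        #["JobStatus"] & /@ job["ChildJobs"]
        job["ChildJobStatusReasons"]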

    Obtain a RemoteBatchJobObject expression representing the first child job:
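    Again assuming the "ChildJobs" property:

        child = First[job["ChildJobs"]]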

    Query the child job's status along with the reason why the child job transitioned to that state:
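    Using the child job object obtained above:

        child["JobStatus"]
        child["JobStatusReason"]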

    Obtain the console output from the child job container:
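    As with "Single"-type jobs:

        child["JobLog"]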

    Properties & Relations  (1)

    If a job was terminated with RemoteBatchJobAbort, the value of the "JobStatusReason" property will indicate that the termination request originated from the Wolfram Language:
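    A sketch; the long-running Pause is a stand-in for real work:

        job = RemoteBatchSubmit[env, Pause[3600]];
        RemoteBatchJobAbort[job];
        job["JobStatusReason"]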

Possible Issues  (2)

    AWS Batch terminates jobs that exceed their memory limit:
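    For instance, a job that allocates roughly 8 GB of reals under a 2048 MiB limit would be terminated (the values are illustrative):

        job = RemoteBatchSubmit[env, ConstantArray[0., 10^9],
           RemoteProviderSettings -> <|"Memory" -> 2048|>];
        job["JobStatus"]
        (* "Failed" *)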

    The memory limit for a job can be adjusted with the "Memory" job setting:
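    Resubmitting the same computation with a larger limit:

        job = RemoteBatchSubmit[env, ConstantArray[0., 10^9],
          RemoteProviderSettings -> <|"Memory" -> Quantity[16, "Gibibytes"]|>]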

    AWS Batch deletes a job's metadata 24 hours after the job reaches a completed state ("Succeeded" or "Failed"). This will cause metadata queries to return a missing value:
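    For example, querying a job whose metadata has expired (the exact form of the Missing value may vary):

        job["JobStatus"]
        (* Missing[...] *)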

    The job's output data will still be accessible until it is either manually deleted or automatically expired by the lifecycle policy on the submission environment's I/O bucket, if such a policy is configured:
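    The result can still be retrieved through the job object:

        job["EvaluationResult"]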