# Batch Processing Example

This example demonstrates deploying batch processing workloads, including scheduled (cron) jobs, continuous queue workers, and one-shot tasks.
## Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                        Batch Processing                         │
│                                                                 │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐            │
│  │  Scheduled  │   │    Queue    │   │  Periodic   │            │
│  │    Jobs     │   │   Workers   │   │    Jobs     │            │
│  └─────────────┘   └─────────────┘   └─────────────┘            │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐    │
│  │                      Message Queue                      │    │
│  │                       (RabbitMQ)                        │    │
│  └─────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
```
## Application Definition
```yaml
apiVersion: core.oam.dev/v1alpha2
kind: Application
metadata:
  name: batch-processing
  labels:
    environment: production
spec:
  components:
    # Continuous Queue Worker
    - name: queue-worker
      type: batchworkload.nomad.oam.dev
      properties:
        driver: docker
        image: worker/queue-processor:latest
        args:
          - --queue
          - default
          - --workers
          - "4"
        resources:
          cpu: 2000m
          memory: 2048Mi
        restartPolicy:
          attempts: 3
          interval: 5m
          delay: 30s
          mode: fail
      traits:
        - type: scaler
          properties:
            replicas: 5
            min: 2
            max: 10
        - type: migration
          properties:
            maxParallel: 1
            healthCheck: task_states
        - type: servicediscovery.nomad.oam.dev
          properties:
            serviceName: queue-worker
            tags:
              - worker
              - queue
            port: metrics

    # Image Processing Worker
    - name: image-processor
      type: batchworkload.nomad.oam.dev
      properties:
        driver: docker
        image: worker/image-processor:v2.0
        args:
          - process
          - --input
          - s3://images/input
          - --output
          - s3://images/output
          - --resize
        resources:
          cpu: 4000m
          memory: 4096Mi
        restartPolicy:
          attempts: 2
          interval: 10m
          delay: 1m
          mode: fail
      traits:
        - type: scaler
          properties:
            replicas: 2
            min: 1
            max: 4

    # Daily Report Job (Cron)
    - name: daily-report
      type: cron-task
      properties:
        image: jobs/daily-report:v1.5
        schedule: "0 2 * * *" # 2 AM daily
        args:
          - generate
          - --date
          - yesterday
          - --format
          - pdf
        resources:
          cpu: 1000m
          memory: 1024Mi

    # Hourly Analytics Job (Cron)
    - name: hourly-analytics
      type: cron-task
      properties:
        image: jobs/analytics:v2.0
        schedule: "0 * * * *" # Every hour
        args:
          - analyze
          - --window
          - 1h
          - --output
          - s3://analytics/hourly
        resources:
          cpu: 2000m
          memory: 2048Mi

    # Weekly Backup Job (Cron)
    - name: weekly-backup
      type: cron-task
      properties:
        image: jobs/backup:v1.0
        schedule: "0 3 * * 0" # Sunday 3 AM
        args:
          - backup
          - --all-databases
          - --compress
        resources:
          cpu: 2000m
          memory: 4096Mi
        volumes:
          - source: backup-config
            destination: /config
            readOnly: true
      traits:
        - type: volume
          properties:
            name: backup-config
            type: csi
            source: backup-config
            mountPath: /config

    # Data Import Job (One-shot)
    - name: data-import
      type: task
      properties:
        driver: docker
        image: jobs/data-import:v3.0
        args:
          - import
          - --source
          - s3://data/import
          - --target
          - postgres://prod-db:5432/analytics
          - --batch-size
          - "1000"
        resources:
          cpu: 4000m
          memory: 8192Mi
        restartPolicy:
          attempts: 1
          mode: fail

    # ML Training Job (Long-running batch)
    - name: ml-training
      type: batchworkload.nomad.oam.dev
      properties:
        driver: docker
        image: ml/trainer:v1.0
        args:
          - train
          - --model
          - recommendation
          - --epochs
          - "100"
          - --data
          - s3://ml-data/training
        resources:
          cpu: 8000m
          memory: 16384Mi
        restartPolicy:
          attempts: 0
  scopes:
    - scopeRef:
        kind: networkscope.nomad.oam.dev
        name: batch-network
      properties:
        networkMode: bridge
    - scopeRef:
        kind: nodepool.nomad.oam.dev
        name: batch-pool
      properties:
        poolName: batch-pool
        datacenter:
          - dc1
          - dc2
        nodeClass: batch
    - scopeRef:
        kind: namespace.nomad.oam.dev
        name: batch-ns
      properties:
        namespace: batch-processing
        quota: batch-quota
```
## Job Types
### Continuous Workers

Run continuously, processing items from a queue:
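A minimal sketch, condensed from the `queue-worker` component above. The `restartPolicy` keeps the worker alive across failures, which is what distinguishes a continuous worker from a one-shot task:

```yaml
- name: queue-worker
  type: batchworkload.nomad.oam.dev
  properties:
    driver: docker
    image: worker/queue-processor:latest
    args: ["--queue", "default", "--workers", "4"]
    restartPolicy:
      attempts: 3    # retry a failed worker up to 3 times per interval
      interval: 5m   # window over which attempts are counted
      mode: fail     # stop retrying once attempts are exhausted
```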
### Cron Jobs

Run on a schedule defined by a standard five-field cron expression:
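A minimal sketch, condensed from the `daily-report` component above. The cron fields are, in order, minute, hour, day of month, month, and day of week:

```yaml
- name: daily-report
  type: cron-task
  properties:
    image: jobs/daily-report:v1.5
    schedule: "0 2 * * *"   # minute=0, hour=2 → runs at 02:00 every day
    args: ["generate", "--date", "yesterday", "--format", "pdf"]
```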
### One-shot Jobs

Run once to completion, then exit:
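A minimal sketch, condensed from the `data-import` component above. Unlike a continuous worker, a one-shot job uses the `task` type and a restart policy that does not keep it running after completion:

```yaml
- name: data-import
  type: task
  properties:
    driver: docker
    image: jobs/data-import:v3.0
    args: ["import", "--source", "s3://data/import", "--batch-size", "1000"]
    restartPolicy:
      attempts: 1   # at most one retry on failure
      mode: fail    # then mark the job failed rather than restarting
```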
## Scaling

Workers can scale based on queue depth:
```yaml
traits:
  - type: scaler
    properties:
      replicas: 5
      min: 2
      max: 10
      scaleMetric: custom   # Custom metric from the queue
      scaleTarget: 100      # Scale up when queue depth > 100
```
## Resource Planning
| Job Type        | CPU   | Memory | Use Case      |
|-----------------|-------|--------|---------------|
| Queue Worker    | 2000m | 2Gi    | Continuous    |
| Image Processor | 4000m | 4Gi    | Burst         |
| ML Training     | 8000m | 16Gi   | GPU workloads |
| Data Import     | 4000m | 8Gi    | Large data    |