Configuration Reference

Comprehensive reference for all Aether configuration options.

For an introduction, see Configuration Guide.

Complete Configuration Schema

yaml

# Service endpoints
services:
  dimp:
    url: string                 # DIMP pseudonymization service URL
    bundle_split_threshold_mb: integer # Bundle size threshold (1-100, default: 10)
  csv_conversion:
    url: string                 # CSV conversion service URL (future)
  parquet_conversion:
    url: string                 # Parquet conversion service URL (future)
  torch:
    base_url: string            # TORCH FHIR server URL
    username: string            # TORCH username
    password: string            # TORCH password
    extraction_timeout_minutes: integer # Timeout for extractions (default: 30)
    polling_interval_seconds: integer # Initial poll interval (default: 5)
    max_polling_interval_seconds: integer # Max poll interval (default: 30)

# Pipeline configuration
pipeline:
  enabled_steps:
    - string                    # List of steps: torch, import, dimp, validation, csv_conversion, parquet_conversion

# Retry strategy
retry:
  max_attempts: integer         # Max retry attempts (1-10, default: 5)
  initial_backoff_ms: integer   # Initial backoff in milliseconds (default: 1000)
  max_backoff_ms: integer       # Maximum backoff in milliseconds (default: 30000)

# Job configuration
jobs_dir: string                # Directory for job state and data (default: ./jobs)

Service Options

DIMP Configuration

Key: services.dimpType: Object Required: Yes (if DIMP step enabled) Default: None

Configuration for DIMP de-identification service.

Nested Options:

url (String): DIMP service endpoint
bundle_split_threshold_mb (Integer): Auto-split large bundles (1-100 MB, default: 10)

yaml

services:
  dimp:
    url: "http://localhost:32861/fhir"
    bundle_split_threshold_mb: 10

For production:

yaml

services:
  dimp:
    url: "https://dimp.prod.healthcare.org/api/fhir"
    bundle_split_threshold_mb: 50

CSV Conversion URL

Key: services.csv_conversion_urlType: String (URL) Required: No Default: None Status: Placeholder for future feature

Endpoint for CSV conversion service.

yaml

services:
  csv_conversion_url: "http://localhost:9000/convert/csv"

Parquet Conversion URL

Key: services.parquet_conversion_urlType: String (URL) Required: No Default: None Status: Placeholder for future feature

Endpoint for Parquet conversion service.

yaml

services:
  parquet_conversion_url: "http://localhost:9000/convert/parquet"

TORCH Configuration

Key: services.torchType: Object Required: Yes (if TORCH step enabled) Default: None

TORCH FHIR server connection details.

Nested Options:

base_url (String): TORCH server URL
username (String): TORCH username
password (String): TORCH password

yaml

services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "researcher-name"
    password: "secure-password"

Security: Use environment variables for sensitive credentials:

bash

export TORCH_USERNAME="researcher"
export TORCH_PASSWORD="secret"

Then in config:

yaml

services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"

Pipeline Options

Enabled Steps

Key: pipeline.enabled_stepsType: Array of strings Required: Yes Default: None

List of pipeline steps to execute in order.

yaml

pipeline:
  enabled_steps:
    - torch      # Optional: Extract from TORCH
    - import     # Required: Import FHIR data
    - dimp       # Optional: Pseudonymization

Available Steps (must be in order):

torch - Extract from TORCH server
import - Parse and validate FHIR data
dimp - Pseudonymization via DIMP
validation - Data quality validation (placeholder)
csv_conversion - Convert to CSV (placeholder)
parquet_conversion - Convert to Parquet (placeholder)

Valid Sequences:

yaml

# Option A: Local files + DIMP
- import
- dimp

# Option B: TORCH + DIMP
- torch
- import
- dimp

# Option C: Full pipeline
- torch
- import
- dimp
- validation
- csv_conversion

Retry Options

Max Attempts

Key: retry.max_attemptsType: Integer Range: 1-10 Default: 5

Maximum number of automatic retry attempts for transient errors.

yaml

retry:
  max_attempts: 3  # Fewer retries for fast-fail

Higher values = more resilience but longer wait times.

Initial Backoff

Key: retry.initial_backoff_msType: Integer (milliseconds) Range: 100-5000 Default: 1000 (1 second)

Initial wait time before first retry.

yaml

retry:
  initial_backoff_ms: 500  # Start with 500ms

Max Backoff

Key: retry.max_backoff_msType: Integer (milliseconds) Range: 1000-60000 Default: 30000 (30 seconds)

Maximum wait time between retries.

yaml

retry:
  max_backoff_ms: 10000  # Cap at 10 seconds

Exponential Backoff Formula:

wait_time = min(initial * (2 ^ attempt), max_backoff)

Example with defaults:

Attempt 1: 1s
Attempt 2: 2s
Attempt 3: 4s
Attempt 4: 8s
Attempt 5: 16s
Attempt 6+: 30s (capped)

Job Options

Jobs Directory

Key: jobs.jobs_dirType: String (directory path) Default: ./jobs

Directory for storing job state and data.

yaml

jobs:
  jobs_dir: "./jobs"

For network storage:

yaml

jobs:
  jobs_dir: "/mnt/shared/aether/jobs"

Requirements:

Must be writable by Aether process
Sufficient disk space for processed data
Should be backed up regularly

Complete Example Configurations

Development Setup

yaml

# aether.yaml - local development
services:
  dimp_url: "http://localhost:8083/fhir"

pipeline:
  enabled_steps:
    - import
    - dimp

retry:
  max_attempts: 3
  initial_backoff_ms: 500
  max_backoff_ms: 5000

jobs:
  jobs_dir: "./jobs"

Production TORCH + DIMP

yaml

# aether.yaml - production
services:
  torch:
    base_url: "https://torch.prod.healthcare.org"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
  dimp_url: "https://dimp.prod.healthcare.org/api/fhir"

pipeline:
  enabled_steps:
    - torch
    - import
    - dimp

retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 30000

jobs:
  jobs_dir: "/data/aether/jobs"

Local Files Only

yaml

# aether.yaml - local processing
pipeline:
  enabled_steps:
    - import

jobs:
  jobs_dir: "./output"

High-Volume Processing

yaml

# aether.yaml - optimized for large datasets
services:
  dimp_url: "http://dimp-cluster:8083/fhir"

pipeline:
  enabled_steps:
    - import
    - dimp

retry:
  max_attempts: 3
  initial_backoff_ms: 2000
  max_backoff_ms: 10000

jobs:
  jobs_dir: "/mnt/fast-storage/jobs"

Environment Variable References

Configuration supports environment variable substitution:

yaml

services:
  torch:
    base_url: "${TORCH_BASE_URL}"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
  dimp_url: "${DIMP_URL}"

jobs:
  jobs_dir: "${AETHER_DATA_DIR}/jobs"

Set environment variables:

bash

export TORCH_BASE_URL="https://torch.hospital.org"
export TORCH_USERNAME="researcher"
export TORCH_PASSWORD="secret"
export DIMP_URL="http://localhost:8083/fhir"
export AETHER_DATA_DIR="/data/aether"

Configuration Validation

Aether validates configuration on startup:

bash

# Validate configuration without running
aether validate-config aether.yaml

Common Validation Errors:

Missing required services for enabled steps
Invalid directory paths
Invalid retry values
Malformed YAML

Troubleshooting

Config file not found

Error: configuration file not found: aether.yaml

Solution: Ensure aether.yaml exists in the working directory or specify path:

bash

aether pipeline start --config /etc/aether/config.yaml query.crtdl

Service unreachable

Error: DIMP service unreachable: connection refused

Solution: Verify service URL and connectivity:

bash

curl http://localhost:8083/fhir/

Validation failed

Error: configuration validation failed: DIMP URL required for enabled step 'dimp'

Solution: Add required service configuration:

yaml

services:
  dimp_url: "http://localhost:8083/fhir"

Next Steps

Configuration Guide - Configuration introduction
CLI Commands - Command reference
Getting Started - Quick start guide

Configuration Reference ​

Complete Configuration Schema ​

Service Options ​

DIMP Configuration ​

CSV Conversion URL ​

Parquet Conversion URL ​

TORCH Configuration ​

Pipeline Options ​

Enabled Steps ​

Retry Options ​

Max Attempts ​

Initial Backoff ​

Max Backoff ​

Job Options ​

Jobs Directory ​

Complete Example Configurations ​

Development Setup ​

Production TORCH + DIMP ​

Local Files Only ​

High-Volume Processing ​

Environment Variable References ​

Configuration Validation ​

Troubleshooting ​

Config file not found ​

Service unreachable ​

Validation failed ​

Next Steps ​

Configuration Reference

Complete Configuration Schema

Service Options

DIMP Configuration

CSV Conversion URL

Parquet Conversion URL

TORCH Configuration

Pipeline Options

Enabled Steps

Retry Options

Max Attempts

Initial Backoff

Max Backoff

Job Options

Jobs Directory

Complete Example Configurations

Development Setup

Production TORCH + DIMP

Local Files Only

High-Volume Processing

Environment Variable References

Configuration Validation

Troubleshooting

Config file not found

Service unreachable

Validation failed

Next Steps