Configuration Reference
Comprehensive reference for all Aether configuration options.
For an introduction, see Configuration Guide.
Complete Configuration Schema
# Service endpoints
services:
  dimp:
    url: string # DIMP pseudonymization service URL
    bundle_split_threshold_mb: integer # Bundle size threshold (1-100, default: 10)
  csv_conversion:
    url: string # CSV conversion service URL (future)
  parquet_conversion:
    url: string # Parquet conversion service URL (future)
  torch:
    base_url: string # TORCH FHIR server URL
    username: string # TORCH username
    password: string # TORCH password
    extraction_timeout_minutes: integer # Timeout for extractions (default: 30)
    polling_interval_seconds: integer # Initial poll interval (default: 5)
    max_polling_interval_seconds: integer # Max poll interval (default: 30)
# Pipeline configuration
pipeline:
  enabled_steps:
    - string # List of steps: torch, import, dimp, validation, csv_conversion, parquet_conversion
# Retry strategy
retry:
  max_attempts: integer # Max retry attempts (1-10, default: 5)
  initial_backoff_ms: integer # Initial backoff in milliseconds (default: 1000)
  max_backoff_ms: integer # Maximum backoff in milliseconds (default: 30000)
# Job configuration
jobs_dir: string # Directory for job state and data (default: ./jobs)
Service Options
DIMP Configuration
Key: services.dimp
Type: Object
Required: Yes (if DIMP step enabled)
Default: None
Configuration for the DIMP pseudonymization service.
Nested Options:
- url (String): DIMP service endpoint
- bundle_split_threshold_mb (Integer): Auto-split large bundles (1-100 MB, default: 10)
services:
  dimp:
    url: "http://localhost:32861/fhir"
    bundle_split_threshold_mb: 10
For production:
services:
  dimp:
    url: "https://dimp.prod.healthcare.org/api/fhir"
    bundle_split_threshold_mb: 50
CSV Conversion URL
Key: services.csv_conversion.url
Type: String (URL)
Required: No
Default: None
Status: Placeholder for future feature
Endpoint for the CSV conversion service.
services:
  csv_conversion:
    url: "http://localhost:9000/convert/csv"
Parquet Conversion URL
Key: services.parquet_conversion.url
Type: String (URL)
Required: No
Default: None
Status: Placeholder for future feature
Endpoint for the Parquet conversion service.
services:
  parquet_conversion:
    url: "http://localhost:9000/convert/parquet"
TORCH Configuration
Key: services.torch
Type: Object
Required: Yes (if TORCH step enabled)
Default: None
TORCH FHIR server connection details.
Nested Options:
- base_url (String): TORCH server URL
- username (String): TORCH username
- password (String): TORCH password
services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "researcher-name"
    password: "secure-password"
Security: Use environment variables for sensitive credentials:
export TORCH_USERNAME="researcher"
export TORCH_PASSWORD="secret"
Then in config:
services:
  torch:
    base_url: "https://torch.hospital.org"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
Pipeline Options
Enabled Steps
Key: pipeline.enabled_steps
Type: Array of strings
Required: Yes
Default: None
List of pipeline steps to execute in order.
pipeline:
  enabled_steps:
    - torch # Optional: Extract from TORCH
    - import # Required: Import FHIR data
    - dimp # Optional: Pseudonymization
Available Steps (must be in order):
- torch - Extract from TORCH server
- import - Parse and validate FHIR data
- dimp - Pseudonymization via DIMP
- validation - Data quality validation (placeholder)
- csv_conversion - Convert to CSV (placeholder)
- parquet_conversion - Convert to Parquet (placeholder)
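The ordering rule for the steps listed above can be illustrated with a short check. This is a minimal sketch, not Aether's own validation code: the function name check_enabled_steps and the error strings are hypothetical, and treating import as mandatory is inferred from the "Required" comment in the example above.

```python
# Canonical step order, copied from the list of available steps.
STEP_ORDER = [
    "torch",
    "import",
    "dimp",
    "validation",
    "csv_conversion",
    "parquet_conversion",
]


def check_enabled_steps(enabled_steps):
    """Return a list of problems found in enabled_steps (empty if none).

    Hypothetical helper for illustration only.
    """
    problems = []

    # Report every step name that is not in the documented list.
    for step in enabled_steps:
        if step not in STEP_ORDER:
            problems.append(f"unknown step: {step}")

    # The known steps must appear in the same relative order as STEP_ORDER.
    positions = [STEP_ORDER.index(s) for s in enabled_steps if s in STEP_ORDER]
    if positions != sorted(positions):
        problems.append("steps are out of order")

    # Assumption: the import step must always be present.
    if "import" not in enabled_steps:
        problems.append("missing required step: import")

    return problems


print(check_enabled_steps(["import", "dimp"]))  # []
print(check_enabled_steps(["dimp", "import"]))  # ['steps are out of order']
```

Steps may be skipped, but the ones that remain must keep their relative order, which is why the check compares positions rather than requiring the full list.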
Valid Sequences:
# Option A: Local files + DIMP
- import
- dimp
# Option B: TORCH + DIMP
- torch
- import
- dimp
# Option C: Full pipeline
- torch
- import
- dimp
- validation
- csv_conversion
Retry Options
Max Attempts
Key: retry.max_attempts
Type: Integer
Range: 1-10
Default: 5
Maximum number of automatic retry attempts for transient errors.
retry:
  max_attempts: 3 # Fewer retries for fast-fail
Higher values give more resilience to transient errors but lead to longer total wait times.
Initial Backoff
Key: retry.initial_backoff_ms
Type: Integer (milliseconds)
Range: 100-5000
Default: 1000 (1 second)
Initial wait time before first retry.
retry:
  initial_backoff_ms: 500 # Start with 500ms
Max Backoff
Key: retry.max_backoff_ms
Type: Integer (milliseconds)
Range: 1000-60000
Default: 30000 (30 seconds)
Maximum wait time between retries.
retry:
  max_backoff_ms: 10000 # Cap at 10 seconds
Exponential Backoff Formula:
wait_time = min(initial_backoff_ms * 2 ^ (attempt - 1), max_backoff_ms)
Example with defaults (attempts numbered from 1):
- Attempt 1: 1s
- Attempt 2: 2s
- Attempt 3: 4s
- Attempt 4: 8s
- Attempt 5: 16s
- Attempt 6+: 30s (capped)
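The schedule listed above can be reproduced with a few lines of Python. This is a sketch rather than Aether's implementation: the function name backoff_schedule_ms is hypothetical, and it assumes attempts are numbered from 1 so that the first wait equals initial_backoff_ms, as in the list above.

```python
def backoff_schedule_ms(max_attempts=5, initial_backoff_ms=1000, max_backoff_ms=30000):
    """Return the wait time in milliseconds for each attempt, numbered from 1.

    Hypothetical helper; defaults mirror the documented retry defaults.
    """
    return [
        # Double the wait on every attempt, then cap it at max_backoff_ms.
        min(initial_backoff_ms * 2 ** (attempt - 1), max_backoff_ms)
        for attempt in range(1, max_attempts + 1)
    ]


print(backoff_schedule_ms())                    # [1000, 2000, 4000, 8000, 16000]
print(backoff_schedule_ms(max_attempts=6)[-1])  # 30000
```

With the default settings the five waits add up to 31 seconds, which is a useful upper bound when choosing max_attempts for latency-sensitive runs.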
Job Options
Jobs Directory
Key: jobs.jobs_dir
Type: String (directory path)
Default: ./jobs
Directory for storing job state and data.
jobs:
  jobs_dir: "./jobs"
For network storage:
jobs:
  jobs_dir: "/mnt/shared/aether/jobs"
Requirements:
- Must be writable by the Aether process
- Must have sufficient disk space for processed data
- Should be backed up regularly
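The first two requirements can be checked before a run with a small script. The sketch below is not part of Aether: check_jobs_dir is a hypothetical helper, and the 1 GiB free-space threshold is an arbitrary placeholder that should be sized to the data being processed.

```python
import os
import shutil


def check_jobs_dir(path, min_free_bytes=1024**3):
    """Return a list of problems with a jobs directory (empty if it looks usable).

    Hypothetical helper; min_free_bytes is an assumed threshold, not an Aether setting.
    """
    if not os.path.isdir(path):
        return [f"not a directory: {path}"]

    problems = []

    # Creating files in a directory needs both write and execute permission.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"not writable: {path}")

    # Free space on the filesystem that holds the directory.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free, want at least {min_free_bytes}")

    return problems


if __name__ == "__main__":
    for problem in check_jobs_dir("./jobs"):
        print(problem)
```

Run it as the same user that runs Aether, since the permission check reflects the calling process.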
Complete Example Configurations
Development Setup
# aether.yaml - local development
services:
  dimp:
    url: "http://localhost:8083/fhir"
pipeline:
  enabled_steps:
    - import
    - dimp
retry:
  max_attempts: 3
  initial_backoff_ms: 500
  max_backoff_ms: 5000
jobs:
  jobs_dir: "./jobs"
Production TORCH + DIMP
# aether.yaml - production
services:
  torch:
    base_url: "https://torch.prod.healthcare.org"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
  dimp:
    url: "https://dimp.prod.healthcare.org/api/fhir"
pipeline:
  enabled_steps:
    - torch
    - import
    - dimp
retry:
  max_attempts: 5
  initial_backoff_ms: 1000
  max_backoff_ms: 30000
jobs:
  jobs_dir: "/data/aether/jobs"
Local Files Only
# aether.yaml - local processing
pipeline:
  enabled_steps:
    - import
jobs:
  jobs_dir: "./output"
High-Volume Processing
# aether.yaml - optimized for large datasets
services:
  dimp:
    url: "http://dimp-cluster:8083/fhir"
pipeline:
  enabled_steps:
    - import
    - dimp
retry:
  max_attempts: 3
  initial_backoff_ms: 2000
  max_backoff_ms: 10000
jobs:
  jobs_dir: "/mnt/fast-storage/jobs"
Environment Variable References
Configuration supports environment variable substitution:
services:
  torch:
    base_url: "${TORCH_BASE_URL}"
    username: "${TORCH_USERNAME}"
    password: "${TORCH_PASSWORD}"
  dimp:
    url: "${DIMP_URL}"
jobs:
  jobs_dir: "${AETHER_DATA_DIR}/jobs"
Set environment variables:
export TORCH_BASE_URL="https://torch.hospital.org"
export TORCH_USERNAME="researcher"
export TORCH_PASSWORD="secret"
export DIMP_URL="http://localhost:8083/fhir"
export AETHER_DATA_DIR="/data/aether"
Configuration Validation
Aether validates configuration on startup:
# Validate configuration without running
aether validate-config aether.yaml
Common Validation Errors:
- Missing required services for enabled steps
- Invalid directory paths
- Invalid retry values
- Malformed YAML
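Two of these error classes, missing services and invalid retry values, can be illustrated on an already-parsed configuration. The sketch below is an approximation and not Aether's validator: validate_config and its messages are hypothetical, it assumes the nested services.dimp.url and services.torch.base_url keys from the schema, and it uses the ranges documented under Retry Options.

```python
# Documented ranges for the retry settings: (minimum, maximum).
RETRY_RANGES = {
    "max_attempts": (1, 10),
    "initial_backoff_ms": (100, 5000),
    "max_backoff_ms": (1000, 60000),
}


def validate_config(config):
    """Collect validation errors for a parsed configuration dictionary.

    Hypothetical helper for illustration only.
    """
    errors = []
    services = config.get("services") or {}
    steps = (config.get("pipeline") or {}).get("enabled_steps") or []

    # Missing required services for enabled steps.
    if "dimp" in steps and not (services.get("dimp") or {}).get("url"):
        errors.append("services.dimp.url is required for enabled step 'dimp'")
    if "torch" in steps and not (services.get("torch") or {}).get("base_url"):
        errors.append("services.torch.base_url is required for enabled step 'torch'")

    # Invalid retry values: must be integers inside the documented range.
    retry = config.get("retry") or {}
    for key, (low, high) in RETRY_RANGES.items():
        if key not in retry:
            continue  # an omitted key falls back to its default
        value = retry[key]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or not low <= value <= high:
            errors.append(f"retry.{key} must be an integer between {low} and {high}")

    return errors


config = {
    "pipeline": {"enabled_steps": ["import", "dimp"]},
    "retry": {"max_attempts": 11},
}
for error in validate_config(config):
    print(error)
```

Collecting all errors before reporting, rather than stopping at the first one, lets a user fix a configuration file in a single pass.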
Troubleshooting
Config file not found
Error: configuration file not found: aether.yaml
Solution: Ensure aether.yaml exists in the working directory, or specify the path with --config:
aether pipeline start --config /etc/aether/config.yaml query.crtdl
Service unreachable
Error: DIMP service unreachable: connection refused
Solution: Verify the service URL and connectivity:
curl http://localhost:8083/fhir/
Validation failed
Error: configuration validation failed: DIMP URL required for enabled step 'dimp'
Solution: Add the required service configuration:
services:
  dimp:
    url: "http://localhost:8083/fhir"
Next Steps
- Configuration Guide - Configuration introduction
- CLI Commands - Command reference
- Getting Started - Quick start guide