Optimizing Horizon Worker Limits to Prevent Out-of-Memory Errors
Introduction
The Reimpact/platform project experienced production crashes due to out-of-memory (OOM) errors. These errors stemmed from the Horizon queue worker consuming excessive memory, particularly on smaller instance sizes.
The Challenge
The default configuration for supervisor-2, the Horizon supervisor that manages these workers, allowed up to 10 worker processes, each permitted to use up to 512 MB of RAM. In the worst case that is 10 × 512 MB = 5 GB. On a 4 GB instance this exceeds the available memory even before the operating system and other services are counted, which triggered OOM kills.
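The worst-case arithmetic above can be checked with a short script. This is a minimal sketch: the function name worstCaseFootprintMb is hypothetical, and it assumes every worker reaches its limit at the same time. Queue workers typically check their memory between jobs, so a single job can briefly exceed the limit, and the estimate ignores memory used by the operating system and other services.

```php
<?php

declare(strict_types=1);

/**
 * Worst-case memory footprint in MB if every worker reaches its limit at once.
 * Hypothetical helper for illustration; not part of Horizon.
 */
function worstCaseFootprintMb(int $maxProcesses, int $memoryLimitMb): int
{
    return $maxProcesses * $memoryLimitMb;
}

$instanceMb = 4 * 1024; // the 4 GB instance from the text, in MB

// Original defaults described above: 10 processes at 512 MB each.
$footprintMb = worstCaseFootprintMb(10, 512);

printf(
    "worst case: %d MB, instance: %d MB, fits: %s\n",
    $footprintMb,
    $instanceMb,
    $footprintMb <= $instanceMb ? 'yes' : 'no'
);
// prints "worst case: 5120 MB, instance: 4096 MB, fits: no"
```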
The Solution
To mitigate these issues, the default worker limits were reduced so that the worst-case footprint fits on smaller instances. The limits are also read from environment variables, so they can be tuned per server without a code change or a full redeployment; Horizon only needs to be restarted to pick up new values. This makes it possible to adapt the worker configuration to different server sizes and workload demands.
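// Example of how to set worker limits via environment variables (config/horizon.php)
'supervisor-2' => [
    // Upper bound on worker processes for this supervisor
    'maxProcesses' => (int) env('HORIZON_SUPERVISOR_MAX_PROCESSES', 5),
    // Per-worker memory limit in MB; Horizon's supervisor option for this is 'memory'
    'memory' => (int) env('HORIZON_SUPERVISOR_MEMORY_LIMIT', 256),
],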
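For context, the sketch below shows where such a supervisor entry usually sits in config/horizon.php. It is an illustration under assumptions, not the project's actual file: the production environment key and the connection, queue, and balance values are placeholders. Horizon's per-worker memory option is named memory, and the (int) casts are there because values read from the environment arrive as strings.

```php
<?php

// config/horizon.php (excerpt, illustrative)
return [
    'environments' => [
        'production' => [ // assumed environment name
            'supervisor-2' => [
                'connection' => 'redis',   // assumption: not stated in the write-up
                'queue' => ['default'],    // assumption: not stated in the write-up
                'balance' => 'auto',       // assumption: not stated in the write-up
                // Upper bound on worker processes, overridable per server
                'maxProcesses' => (int) env('HORIZON_SUPERVISOR_MAX_PROCESSES', 5),
                // Per-worker memory limit in MB, overridable per server
                'memory' => (int) env('HORIZON_SUPERVISOR_MEMORY_LIMIT', 256),
            ],
        ],
    ],
];
```

To raise the limits on a larger server, set HORIZON_SUPERVISOR_MAX_PROCESSES and HORIZON_SUPERVISOR_MEMORY_LIMIT in that server's environment. If the configuration is cached, rebuild it with php artisan config:cache. Then run php artisan horizon:terminate so the process monitor restarts Horizon with the new values.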
Key Decisions
- Reduced Default Worker Limits: The default number of worker processes and the per-worker memory limit were lowered so that the worst-case footprint fits on smaller instances.
- Environment Variable Overrides: Both limits can be overridden through environment variables, so larger servers can raise them without a code change.
Results
- Eliminated production crashes caused by OOM errors related to Horizon workers.
- Improved system stability and reliability, particularly on smaller instance sizes.
- Enabled easier tuning of worker configurations based on server resources and workload demands.
Lessons Learned
When configuring background workers, it's crucial to consider the resource constraints of the underlying infrastructure. Hardcoding resource limits can lead to instability when deployed to environments with varying resources. Employing environment variables for configuration allows for greater flexibility and adaptability.
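One way to apply this lesson is to derive the worker count from a memory budget instead of picking it by hand. The sketch below is hypothetical: maxWorkersForBudget is not a Horizon function, and reserving half of a 4 GB instance for workers is an assumed policy chosen only for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Largest number of workers whose worst-case footprint fits within the budget.
 * Hypothetical helper for illustration; not part of Horizon.
 */
function maxWorkersForBudget(int $budgetMb, int $perWorkerMb): int
{
    if ($perWorkerMb <= 0) {
        throw new InvalidArgumentException('perWorkerMb must be positive');
    }

    // Integer division rounds down, so the result never exceeds the budget.
    return max(0, intdiv($budgetMb, $perWorkerMb));
}

// Assumption: half of a 4 GB instance is reserved for queue workers.
$budgetMb = intdiv(4 * 1024, 2); // 2048 MB

echo maxWorkersForBudget($budgetMb, 256), PHP_EOL; // prints 8
echo maxWorkersForBudget($budgetMb, 512), PHP_EOL; // prints 4
```

The result could then be supplied through the environment variable that controls the process limit, which keeps the sizing decision tied to each server rather than to the codebase.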