Improving Job Reliability with Time-Based Retries in Landing
In the landing project, we've been working on the reliability of background jobs, especially those affected by rate limiting and throttling exceptions. Previously, each job had a fixed number of retry attempts, but throttled jobs could use up those attempts before they ever ran to completion.
The Problem
When jobs are released back to the queue due to rate limiting or throttling middleware, each release counted towards the total number of allowed attempts. In scenarios where a job was frequently rate-limited, it could exhaust all its attempts before even having a chance to complete successfully. For instance, a job configured with $tries = 15 could fail prematurely due to repeated throttle releases.
The Solution
To address this, we switched from a fixed number of attempts to a time-based retry mechanism using retryUntil().
Instead of specifying a fixed number of retries, we now define a time window during which the job can be retried. Releases caused by rate limiting no longer fail the job on their own: a job that keeps getting released is marked as failed only once the window has passed. We've set the window to 2 hours.
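To make the difference concrete, here is a toy model in plain PHP. It is a simplified sketch, not how the queue worker is actually implemented: the function names, the 60-second release delay, and the figure of 20 throttled releases are all assumptions chosen for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Toy model of count-based retries (hypothetical helper, not framework code).
 * The job is throttled and released for its first $throttledReleases attempts,
 * and every release still counts as an attempt.
 */
function runWithMaxTries(int $throttledReleases, int $maxTries): string
{
    for ($attempt = 1; $attempt <= $maxTries; $attempt++) {
        if ($attempt > $throttledReleases) {
            return 'completed'; // first attempt that is not throttled
        }
        // Throttled: released back to the queue, attempt still counted.
    }

    return 'failed'; // attempt budget used up by releases alone
}

/**
 * Toy model of time-based retries (hypothetical helper, not framework code).
 * Attempts are unlimited; only the elapsed time since dispatch matters.
 */
function runWithDeadline(int $throttledReleases, int $releaseDelaySeconds, int $windowSeconds): string
{
    $elapsed = 0; // seconds since the job was dispatched
    $attempt = 1;

    while ($elapsed <= $windowSeconds) {
        if ($attempt > $throttledReleases) {
            return 'completed';
        }
        $elapsed += $releaseDelaySeconds; // wait out the release delay
        $attempt++;
    }

    return 'failed'; // retry window has passed
}

// A job that is throttled 20 times, with releases 60 seconds apart.
echo runWithMaxTries(20, 15), PHP_EOL;           // failed
echo runWithDeadline(20, 60, 2 * 3600), PHP_EOL; // completed
```

With a limit of 15 attempts the job fails without ever getting past the throttle, while the same job finishes about 20 minutes into a 2-hour window.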
Here's an example of how you might implement this using PHP:
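    use Carbon\Carbon;
    use Illuminate\Contracts\Queue\ShouldQueue;

    class ExampleJob implements ShouldQueue
    {
        // Stop retrying two hours from now, regardless of the attempt count.
        public function retryUntil(): Carbon
        {
            return now()->addHours(2);
        }

        public function handle(): void
        {
            // Job logic here
        }
    }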
In this example, retryUntil() returns a Carbon instance marking the point in time after which the job will no longer be retried. Until then, a release caused by rate limiting or a transient error simply puts the job back on the queue, however many attempts it has already used.
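One side effect of a time window is that a job which genuinely throws on every attempt could keep retrying for the full 2 hours. A possible way to guard against that, sketched below, is to pair retryUntil() with Laravel's $maxExceptions property and the RateLimited job middleware. The class name SyncLandingPage, the limiter name 'landing-api', and the limit of 3 exceptions are assumptions for illustration, not the landing project's actual configuration.

```php
<?php

use Carbon\Carbon;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\Middleware\RateLimited;

// Hypothetical job name, used only for this sketch.
class SyncLandingPage implements ShouldQueue
{
    use InteractsWithQueue, Queueable;

    // Assumed value: fail after 3 unhandled exceptions,
    // even if the retry window is still open.
    public $maxExceptions = 3;

    public function middleware(): array
    {
        // 'landing-api' is a hypothetical limiter name; it would have to be
        // registered with RateLimiter::for() in a service provider.
        return [new RateLimited('landing-api')];
    }

    // Throttle releases may continue until this deadline.
    public function retryUntil(): Carbon
    {
        return now()->addHours(2);
    }

    public function handle(): void
    {
        // Job logic here
    }
}
```

The intent of this design is that throttle releases are bounded only by the deadline, while real failures are still capped by a small exception count.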
Key Takeaway
When dealing with jobs that might be subject to rate limiting or throttling, consider a time-based retry window with retryUntil() instead of a fixed number of attempts. That way, releases caused by throttling do not use up the job's retry budget, and the job fails only if it cannot complete within the window.