CronJob Scheduler
Distributed task scheduling with Temporal.io workflow orchestration
Background
A distributed job scheduler that orchestrates recurring batch operations, including VIP recalculation, report generation, cache warming, and data cleanup. It must guarantee exactly-once execution across multiple pods and support timezone-aware scheduling.
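To make the timezone requirement concrete, here is a minimal sketch in Go of computing the next daily run for a wall-clock time in a named zone. The helper names `nextDailyRun` and `mustParse`, the 03:00 run time, and the zone are assumptions for illustration; the original system's scheduling code is not shown in this document.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embedded zone database, used only if the system has none
)

// nextDailyRun returns the first instant strictly after `after` at which the
// wall clock in zone tz reads hour:minute. Hypothetical helper for illustration.
func nextDailyRun(after time.Time, tz string, hour, minute int) (time.Time, error) {
	loc, err := time.LoadLocation(tz)
	if err != nil {
		return time.Time{}, fmt.Errorf("load zone %q: %w", tz, err)
	}
	local := after.In(loc)
	next := time.Date(local.Year(), local.Month(), local.Day(), hour, minute, 0, 0, loc)
	if !next.After(after) {
		// time.Date normalizes out-of-range days, so Day()+1 rolls over
		// month and year boundaries correctly.
		next = time.Date(local.Year(), local.Month(), local.Day()+1, hour, minute, 0, 0, loc)
	}
	return next, nil
}

// mustParse parses an RFC 3339 timestamp and panics on malformed input.
func mustParse(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	// The day before US daylight saving time began in 2024.
	after := mustParse("2024-03-09T12:00:00Z")
	next, err := nextDailyRun(after, "America/New_York", 3, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(next.UTC().Format(time.RFC3339)) // 2024-03-10T07:00:00Z
}
```

Because the calculation works on the local calendar date rather than adding 24 hours, the run stays at 03:00 local time across the daylight saving transition, even though that particular day is only 23 hours long.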
Architecture
Configurable cron definitions → Temporal.io workflow engine → activity workers (go-zero) → distributed lock (Redis) → batch operations → MySQL/Redis/Kafka side effects. Health monitoring watches workflow execution and alerts on failure.
Key Implementations
Temporal.io Workflow Orchestration
Each batch job is modeled as a Temporal workflow with retry policies, timeouts, and visibility into execution history.
Why: Temporal provides durable execution guarantees, automatic retries, and a built-in UI for monitoring long-running batch operations.
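A Temporal retry policy is configured with an initial interval, a backoff coefficient, a maximum interval, and a maximum number of attempts. The sketch below models how those four settings turn into wait times between attempts. It uses only the Go standard library: `retryPolicy` is a local illustrative type rather than the SDK's struct, and the values are assumptions, not the project's actual configuration.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryPolicy mirrors the settings a Temporal retry policy exposes.
// It is a local type for illustration, not the SDK's own struct.
type retryPolicy struct {
	InitialInterval    time.Duration
	BackoffCoefficient float64
	MaximumInterval    time.Duration
	MaximumAttempts    int // total attempts, including the first one
}

// backoffSchedule lists the wait before each retry: the interval grows
// geometrically and is capped at MaximumInterval.
// Simplification: Temporal treats MaximumAttempts == 0 as unlimited, while
// this sketch returns no retries for any value below 2.
func backoffSchedule(p retryPolicy) []time.Duration {
	var waits []time.Duration
	for retry := 0; retry < p.MaximumAttempts-1; retry++ {
		d := float64(p.InitialInterval) * math.Pow(p.BackoffCoefficient, float64(retry))
		if d > float64(p.MaximumInterval) {
			d = float64(p.MaximumInterval)
		}
		waits = append(waits, time.Duration(d))
	}
	return waits
}

func main() {
	p := retryPolicy{
		InitialInterval:    time.Second,
		BackoffCoefficient: 2,
		MaximumInterval:    10 * time.Second,
		MaximumAttempts:    6,
	}
	fmt.Println(backoffSchedule(p)) // [1s 2s 4s 8s 10s]
}
```

Capping the interval keeps a long-running batch job from waiting arbitrarily long between attempts, while the attempt limit bounds how long a persistently failing job can go unnoticed.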
Distributed Lock-Protected Execution
Critical batch jobs acquire a Redis distributed lock before execution to prevent concurrent runs across multiple scheduler pods.
Why: Financial operations such as VIP recalculation and settlement must run exactly once per schedule; the lock prevents two pods from processing the same batch concurrently.
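The locking pattern can be sketched as follows, assuming the common Redis approach of `SET key token NX PX ttl` to acquire and a token-checked delete to release. To stay runnable without a server, the sketch replaces Redis with an in-memory map; `lockStore`, `fakeClock`, the key name, and the pod tokens are all hypothetical and do not come from the original codebase.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type lockEntry struct {
	token   string
	expires time.Time
}

// lockStore is an in-memory stand-in for Redis so the sketch runs locally.
type lockStore struct {
	mu      sync.Mutex
	now     func() time.Time // injectable clock so expiry can be exercised
	entries map[string]lockEntry
}

func newLockStore(now func() time.Time) *lockStore {
	return &lockStore{now: now, entries: map[string]lockEntry{}}
}

// tryAcquire mimics `SET key token NX PX ttl`: it succeeds only if the key is
// absent or the previous holder's TTL has elapsed.
func (s *lockStore) tryAcquire(key, token string, ttl time.Duration) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := s.now()
	if e, ok := s.entries[key]; ok && now.Before(e.expires) {
		return false
	}
	s.entries[key] = lockEntry{token: token, expires: now.Add(ttl)}
	return true
}

// release deletes the key only if the caller's token still owns it, so a pod
// whose lock already expired cannot remove another pod's lock.
func (s *lockStore) release(key, token string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if e, ok := s.entries[key]; ok && e.token == token {
		delete(s.entries, key)
		return true
	}
	return false
}

// fakeClock is a manually advanced clock for the demo (not goroutine-safe).
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time          { return c.t }
func (c *fakeClock) advance(d time.Duration) { c.t = c.t.Add(d) }

func main() {
	clock := &fakeClock{t: time.Date(2024, 1, 1, 2, 0, 0, 0, time.UTC)}
	store := newLockStore(clock.now)
	const key = "lock:vip-recalc" // hypothetical key name

	fmt.Println(store.tryAcquire(key, "pod-a", 30*time.Second)) // true: key was free
	fmt.Println(store.tryAcquire(key, "pod-b", 30*time.Second)) // false: pod-a holds it
	clock.advance(31 * time.Second)
	fmt.Println(store.tryAcquire(key, "pod-b", 30*time.Second)) // true: pod-a's TTL expired
	fmt.Println(store.release(key, "pod-a"))                    // false: token no longer matches
	fmt.Println(store.release(key, "pod-b"))                    // true
}
```

The third call shows the main caveat of a TTL-based lock: if a job outlives its TTL, a second pod can acquire the lock while the first is still working. The TTL should therefore comfortably exceed the job's expected runtime, or the batch operation itself should be idempotent.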
Health Monitoring and Alerting
Monitors workflow execution states and triggers alerts when a job fails, times out, or misses its scheduled window.
Why: Silent batch job failures can compound into data inconsistencies; proactive alerting ensures issues are caught before they cascade.
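The three alert conditions can be expressed as a small classification function. The sketch below is one possible shape, not the project's monitoring code: the `jobRun` type, the status names, the job names, and the five-minute grace period are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

type runStatus string

const (
	statusPending   runStatus = "pending" // scheduled but not yet started
	statusRunning   runStatus = "running"
	statusCompleted runStatus = "completed"
	statusFailed    runStatus = "failed"
	statusTimedOut  runStatus = "timed_out"
)

// jobRun is a hypothetical snapshot of one scheduled execution.
type jobRun struct {
	Job         string
	ScheduledAt time.Time
	Status      runStatus
}

// alertFor returns an alert message when the run failed, timed out, or is
// still pending after its grace period; it returns "" when no alert is due.
func alertFor(r jobRun, now time.Time, grace time.Duration) string {
	switch r.Status {
	case statusFailed:
		return fmt.Sprintf("job %s failed", r.Job)
	case statusTimedOut:
		return fmt.Sprintf("job %s timed out", r.Job)
	case statusPending:
		if now.After(r.ScheduledAt.Add(grace)) {
			return fmt.Sprintf("job %s missed its scheduled window", r.Job)
		}
	}
	return ""
}

func main() {
	scheduled := time.Date(2024, 1, 1, 2, 0, 0, 0, time.UTC)
	now := scheduled.Add(10 * time.Minute)
	runs := []jobRun{
		{Job: "vip-recalc", ScheduledAt: scheduled, Status: statusFailed},
		{Job: "report-gen", ScheduledAt: scheduled, Status: statusCompleted},
		{Job: "cache-warm", ScheduledAt: scheduled, Status: statusPending},
	}
	for _, r := range runs {
		if msg := alertFor(r, now, 5*time.Minute); msg != "" {
			fmt.Println("ALERT:", msg)
		}
	}
}
```

Treating a run that never started as an alert condition matters because such a run leaves no failure event behind; without the grace-period check it would go unnoticed.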
Technical Decisions
| Decision | Chosen | Alternative | Reason |
|---|---|---|---|
| Job orchestration engine | Temporal.io | Custom cron with goroutines | Temporal provides durable execution, automatic retries, and workflow visibility out of the box, eliminating the need to build these guarantees manually. |
| Concurrency control | Redis distributed lock + Temporal single-worker queue | Database-level locking | Layering Redis locks with Temporal's task queue semantics provides defense-in-depth against duplicate execution across pods. |