Skip to content

Zart is in active development — breaking API changes may occur despite our best efforts to keep contracts stable.

Features

Zart provides everything you need to build reliable, long-running workflows in Rust. Here is a detailed look at each feature.


Every step’s result is serialized and written to the database before the workflow advances. On any restart — planned or unplanned — completed steps are replayed from storage.

How it works:

  1. Worker calls ctx.execute_step(MyStep { ... }).
  2. Zart checks zart_steps for a row with (execution_id, step_name).
  3. Hit: deserializes and returns the stored value. Step logic is never called.
  4. Miss: runs step.run(), serializes the result, inserts the row, returns the value.

What this means for you:

  • A payment is never charged twice, even if your process crashes after the charge but before the receipt.
  • Database inserts, API calls, and emails become safe to “retry” — the retry only runs incomplete work.
  • No idempotency logic needed inside the step logic for most cases.
// This can crash anywhere — Zart recovers from the last completed step
ctx.execute_step(StepA).await?;
ctx.execute_step(StepB).await?; // <-- crash here
ctx.execute_step(StepC).await?; // resumes here

Configure retry behaviour at both the task level and the individual step level.

Task-level retries via fn max_retries():

fn max_retries(&self) -> usize { 3 }

Step-level retries via ZartStep::retry_config():

struct ChargeStep { card: String, amount: i64 }
#[async_trait]
impl ZartStep for ChargeStep {
type Output = ChargeResult;
fn step_name(&self) -> Cow<'static, str> { Cow::Borrowed("charge-card") }
fn retry_config(&self) -> Option<RetryConfig> {
Some(RetryConfig::exponential(3, Duration::from_secs(2)))
}
async fn run(&self, _ctx: StepContext) -> Result<Self::Output, StepError> {
stripe.charge(&self.card, self.amount).await
}
}
// Usage:
let result = ctx.execute_step(ChargeStep { card, amount }).await?;

RetryConfig variants:

ConfigBehaviour
RetryConfig::none()No retries — fail immediately
RetryConfig::fixed(n, delay)n retries, constant delay between each
RetryConfig::exponential(n, base)n retries, delay doubles each attempt

Jitter is applied automatically to exponential backoff to prevent thundering herd.


Fan out work to multiple concurrent sub-tasks using ctx.schedule_step() and ctx.wait_all().

Each scheduled step becomes an independent task in the scheduler — it can run on a different worker node. wait_all durably suspends the parent until all children complete.

let h1 = ctx.schedule_step(EmailStep);
let h2 = ctx.schedule_step(BillingStep);
let h3 = ctx.schedule_step(ProvisionStep);
// Parent suspends here — no thread is blocked
ctx.wait_all(vec![h1, h2, h3]).await?;

If the parent process restarts while waiting, the children continue running independently and the parent resumes when they’re done.

See Parallel Steps for full documentation.


Pause a workflow without occupying a thread or blocking a worker.

The execution is suspended and re-queued after the duration expires. Zero threads blocked. Each sleep requires a unique, stable name for replay.

ctx.sleep("approval-wait", Duration::from_secs(3600)).await?; // wait 1 hour

Sleep until an absolute point in time.

let next_monday = compute_next_monday();
ctx.sleep_until("wait-for-monday", next_monday).await?;

Suspend until an external signal arrives.

let payload: Approval = ctx.wait_for_event(
"hr-approval",
Some(Duration::from_secs(5 * 86400)),
).await?;

See Wait for Event for full documentation.


Zart prevents duplicate executions at the scheduling layer.

Execution IDs: every execution has a unique ID. Scheduling with an explicit ID that already exists is a no-op — the existing execution continues.

// Safe to call multiple times — only one execution is created
scheduler.schedule_with_id(
"checkout",
&format!("checkout-order-{order_id}"),
CheckoutData { order_id },
).await?;

Step-level idempotency: within an execution, step names act as idempotency keys. A step that has already completed will never re-run its closure.

External call idempotency: use ctx.execution_id() to derive stable keys for external APIs:

let idempotency_key = format!("stripe-charge-{}", ctx.execution_id());
stripe.charge_idempotently(&card, amount, &idempotency_key).await?;

Multiple worker instances can run against the same database with zero configuration. Zart uses SELECT … FOR UPDATE SKIP LOCKED to claim tasks — each execution is processed by exactly one worker at a time.

Worker A ─── polls ──→ claims exec-001, exec-003
Worker B ─── polls ──→ claims exec-002, exec-004
Worker C ─── polls ──→ claims exec-005 (skips locked rows)

Workers can be added or removed at any time. There is no leader election, no ZooKeeper, no coordination service.

let worker = Worker::new(scheduler, registry, WorkerConfig {
poll_interval: Duration::from_secs(5),
max_tasks_per_poll: 10,
max_concurrent_tasks: 16, // tokio tasks per worker
shutdown_timeout: Duration::from_secs(30),
});

All workflow state is persisted to PostgreSQL via the zart-postgres crate. No extra infrastructure — just a database you already have.

[dependencies]
zart-postgres = "0.1"

External systems can deliver signals to waiting workflows via offer_event.

Rust:

scheduler.offer_event(execution_id, "payment-received", payload).await?;

HTTP:

POST /api/v1/events/{execution_id}/{event_name}

Events can carry arbitrary JSON payloads. The waiting wait_for_event call deserializes the payload into the expected type automatically.

Events time out gracefully — if no event arrives within the specified window, wait_for_event returns Err(TaskError::EventTimeout), allowing the workflow to take a fallback path.


Every execution and every step attempt is tracked in the database.

Execution status:

  • pending — scheduled, not yet started
  • running — actively being processed
  • waiting — durably paused (sleep or event wait)
  • completed — finished successfully
  • failed — exhausted retries

Step records (zart_steps table): stores the step name, attempt count, result JSON, and per-attempt error messages for debugging.

Query execution status via the Rust API:

let exec = scheduler.get_execution("exec-abc-123").await?;
println!("Status: {:?}", exec.status);
println!("Steps: {}", exec.completed_steps);

Long-running steps — external API calls, data pipelines, video processing — can take minutes or even hours. Without heartbeats, Zart’s orphan recovery would detect these tasks as “stuck” and reset them back to scheduled, causing duplicate execution.

How it works:

When a worker picks up a task, it locks the row with locked_at. A background heartbeat task runs in parallel with the handler, periodically updating locked_at to NOW(). When the handler returns (success or failure), the heartbeat is cancelled.

Worker picks up task → locked_at = T
Heartbeat loop: every 100s → UPDATE locked_at = NOW()
↓ (handler runs for 20 minutes — external API call, data pipeline, etc.)
Heartbeat continues renewing...
Handler returns → heartbeat cancelled → result persisted

Configuration:

let worker = Worker::new(scheduler, registry, WorkerConfig {
orphan_timeout: Duration::from_secs(300), // 5 minutes
heartbeat_interval: None, // auto: orphan_timeout / 3
// ...
});
SettingDefaultEffect
heartbeat_interval: Noneorphan_timeout / 3Auto-computed — 2 retries before orphan reclaim
heartbeat_interval: Some(60s)Custom 60-second interval
heartbeat_interval: Some(0)Heartbeat disabled entirely

Failure modes:

ScenarioBehavior
Single heartbeat DB errorLogged, retried on next interval
Persistent DB failure (3+ misses)Orphan recovery reclaims task; handler result is discarded
Handler panicsCancellation token fires, heartbeat stops cleanly
Worker graceful shutdownCancellation cascades to heartbeat

Observability:

Heartbeat activity is tracked via Prometheus metrics:

  • zart_task_heartbeat_renewals_total — counter with success / failed / not_found status labels
  • zart_heartbeat_active — gauge showing currently running heartbeat loops

No schema changes required — the heartbeat only updates the existing locked_at column.