Handler execution model & blocking contract¶
This documents how SIPhon runs Python script handlers, what may block, and the
elasticity / backpressure / liveness guarantees. It is the user-facing companion
to the source doc-comments in
src/script/py_executor.rs and
src/script/async_pool.rs.
The two handler pools¶
Every inbound SIP message that reaches a script handler runs on one of two pools of OS threads, each with a persistently-attached free-threaded Python interpreter (the persistent attach avoids per-handler mimalloc heap churn / a heap leak):
| Pool | Runs | Size config | Default |
|---|---|---|---|
Sync executor (PyExecutor) |
sync @proxy.on_request, @proxy.on_reply, @registrar.on_change, @rtpengine.on_dtmf, timers, … |
script.sync_pool_size / script.sync_pool_max |
core max(8, 2×CPUs), max max(32, 4×core) |
Async driver pool (AsyncPool) |
async def handlers + their asyncio.create_task work |
script.async_pool_size |
CPUs |
The sync pool is elastic¶
The sync pool starts at sync_pool_size (the core, always-on workers) and a
background grower adds workers on demand — up to sync_pool_max — whenever the
job queue has more work than the idle workers can take. It never shrinks:
workers are never reaped, which is exactly what keeps the persistent
free-threaded-CPython attach from leaking (reaping a persistently-attached
thread orphans ~2 MB of heap). Growth-on-demand restores the burst headroom that
blocking handlers need; never-reaping keeps the no-leak property.
Why elastic: an earlier change moved inbound dispatch off tokio's elastic
spawn_blockingpool (which grew threads on demand) onto a fixed pool to stop the heap leak — but that removed the burst valve. A blocking handler pins a worker for the whole call, so on a small box a couple of concurrent blocking REGISTERs exhausted the fixed pool and wedged the engine with no recovery. The elastic pool is the proper fix: it grows like the oldspawn_blockingpool but never reaps, so it neither wedges nor leaks. The regression is locked down bypool_grows_under_blocking_loadinpy_executor.rs.
The queue feeding the pool is bounded (script.executor_queue_capacity,
default 1024): once the pool is at its thread cap and the queue is full, new
jobs are shed.
The blocking contract — what script authors must know¶
A handler may call Rust APIs that block the worker thread on I/O:
auth.require_digest with the HTTP/Diameter backend,
proxy.send_request(wait_for_response=True), cache.fetch, diameter.*,
RTPEngine control, DNS/TLS connect during relay(), etc. While a handler
blocks, it occupies one pool worker.
The pool grows to absorb concurrent blocking handlers up to sync_pool_max, so
short blocking bursts are fine. But sustained blocking beyond the cap still
queues, and the maximum sustainable rate of a blocking handler is roughly:
Design accordingly:
- Cache hot lookups. For HTTP digest auth, set
auth.http.cache_ttl_secsso a registration storm for the same subscribers reuses a cached HA1 instead of making a blocking fetch per REGISTER — the pool then rarely needs to grow. - Fire-and-forget slow side-effects. Do contact-change notifications, CDR
posts, webhooks, etc. with
asyncio.create_task(...)from anasynchandler — don't block the SIP path on them. Ahttpx.Clientis not safe to share across threads. - Size for your backends and your memory. Raise
sync_pool_maxfor many slow blocking backends; lower it on memory-constrained NFs (peak memory ≈sync_pool_max × ~2 MB).
Blocking calls must release the interpreter (free-threaded GC safety)¶
On free-threaded CPython (3.14t) the cyclic GC performs a stop-the-world:
it pauses every thread that is attached to the interpreter at a safe point. A
thread that performs a blocking call (an HTTP/Diameter auth fetch, a DNS lookup,
…) while still attached can never reach that safe point, so for the duration
of that block every other handler that allocates cyclic garbage — which Python
does constantly — stalls behind the GC. This is verified: a thread blocking
while attached hangs a concurrent gc.collect() for the whole block; detached,
it returns immediately. Under blocking-heavy load (auth/Diameter storms) the
result is periodic engine-wide latency spikes, each lasting as long as the
blocking call — and it bites even at low concurrency (one blocked-while-attached
handler plus one GC trigger).
The worst case has been seen on cpus ≈ 1 nodes (a single-worker runtime,
where available_parallelism() reports 1), where the engine wedged outright
rather than just stalling. The exact escalation from a transient stall to a
permanent wedge on such nodes is not reproduced here, so treat it as an observed
correlation; the deadlock-aware watchdog (below) is the recovery backstop for
it. Either way the fix is the same.
The fix, and the rule for any blocking Rust-side work, is to release the
interpreter for the blocking window — in pyo3 terms,
Python::attach(|py| py.detach(|| block_in_place(block_on(future)))). siphon's
built-in blocking APIs (e.g. the HTTP digest-auth backend) do this internally;
the deadlock-aware watchdog above is the backstop for any path that doesn't.
This GC hazard is specific to Rust-side blocking calls, which hold the attach
for their whole duration. A blocking call made from Python in a handler — a
synchronous httpx/urllib/requests request in @registrar.on_change, say —
does not stall the GC: CPython releases the interpreter around the blocking
socket syscall, so the handler is detached for the wait. It still pins a pool
worker for the duration, though, so prefer asyncio.create_task(...) for slow
side-effects (throughput, not safety). Both paths are exercised by
run-tests.sh --http-auth: one scenario drives a REGISTER storm through the
blocking HA1-fetch (Rust) path, another through a blocking on_change notify
(Python) path, each on a single-worker (cpus: 0.5) siphon; the engine must
complete the registrations.
Backpressure & liveness guarantees¶
Beyond elasticity, the pool is defended on two more fronts so a misbehaving handler degrades gracefully instead of taking the node down silently:
- Bounded queue + load-shed. When the pool is at its cap and the queue is
full, new jobs are dropped (the SIP client retransmits) rather than growing
memory without bound. Counted by
siphon_pyexec_jobs_shed_total. - Liveness watchdog / fail-fast. A dedicated thread (immune to any lock a
wedged handler holds) aborts the process when there is work pending (a
handler in-flight or jobs queued) yet zero completions for
script.handler_stall_abort_secs(default 30 s;0disables). The trigger is independent of pool fill, so it catches a low-concurrency deadlock (a handful of handlers stuck on a lock/await while the pool sits far below its cap) as well as full saturation — an earlier "at the thread cap + fully busy" condition could never see the former, since the pool never grew to the cap. A healthy pool advances completions every tick; a genuinely idle pool has no pending work — neither trips. Aborting is deliberate: a hung-but-alive SIP engine never recovers on its own, so arestart: always/ systemd policy never fires — the abort turns an indefinite outage into a seconds-long restart and leaves a core for post-mortem.
Metrics (/metrics)¶
| Metric | Meaning |
|---|---|
siphon_pyexec_pool_size |
live worker threads (grows core→max under load) |
siphon_pyexec_pool_max |
configured thread ceiling |
siphon_pyexec_inflight |
handlers currently executing |
siphon_pyexec_queue_depth |
handler jobs waiting in the queue |
siphon_pyexec_jobs_completed_total |
completed handler jobs |
siphon_pyexec_jobs_shed_total |
jobs dropped because the queue was full |
siphon_auth_ha1_cache_hits_total |
HTTP-auth lookups served from cache |
Alert on: a sustained rate(siphon_pyexec_jobs_shed_total) > 0, or
siphon_pyexec_pool_size == siphon_pyexec_pool_max with
siphon_pyexec_inflight == siphon_pyexec_pool_size held for minutes — both mean
the pool is fully grown and saturated, approaching the watchdog's abort condition.