Monitoring & observability¶

You can see what SIPhon is doing four ways: Prometheus metrics (built-in + your own), the admin API, Call Detail Records, and full SIP tracing to Homer. None of them block the call path.

Prometheus metrics¶

Enable the endpoint:

metrics:
  prometheus:
    listen: "0.0.0.0:9090"
    path: "/metrics"

SIPhon exports built-in gauges/counters; the ones worth alerting on:

Signal	Alert when	Why
`siphon_memory_allocated_bytes`	`rate(...[30m]) > 0` at flat call rate	A real memory leak
`siphon_pyexec_jobs_shed_total`	sustained `rate() > 0`	Handler pool saturated → SIP retransmits
`siphon_pyexec_pool_size` vs `_pool_max`	pinned equal + all busy for minutes	Pool fully grown and saturated
`siphon_proxy_dialog_sessions`	grows under flat completed-call load	Dialog state not draining
`siphon_rtpengine_instances_up`	drops below your engine count	An RTPEngine is unhealthy

See Handler execution model for the pool internals.

Your own metrics¶

The metrics namespace adds counters, gauges, and histograms that appear on the same /metrics endpoint:

from siphon import metrics

calls = metrics.counter("calls_total", "Calls processed", labels=["direction", "result"])
active = metrics.gauge("calls_active", "Active calls", labels=["direction"])
setup  = metrics.histogram("call_setup_seconds", "INVITE→200 latency",
                           buckets=[0.1, 0.25, 0.5, 1, 2.5, 5])

calls.labels(direction="outbound", result="ok").inc()
active.labels(direction="outbound").inc()      # ... .dec() when it ends
setup.observe(0.342)

Admin API — health, readiness, registrations¶

A separate HTTP port for probes and runtime inspection:

admin:
  listen: "0.0.0.0:9091"

Endpoint	Use
`GET /admin/health`	liveness — `200` while the process is alive (survives drain)
`GET /admin/ready`	readiness — `200`, or `503` while draining (SIGTERM)
`GET /admin/stats`	uptime + active registration count
`GET /admin/registrations[/{aor}]`	inspect bindings
`DELETE /admin/registrations/{aor}`	force-unregister

Point Kubernetes liveness at /admin/health and readiness at /admin/ready so a draining pod leaves rotation cleanly — see Deployment & operations.

Call Detail Records¶

cdr:
  enabled: true
  backend: http              # file | http | syslog
  http:
    url: "https://collector.example.com/v1/cdr"
    auth_header: "Bearer tok123"

CDRs are written asynchronously (a bounded channel, never blocks a call) with the call's timing, parties, transport, disconnect initiator, and response code. Add your own fields from a script:

from siphon import cdr
cdr.write(request, extra={"billing_id": "B-12345", "account": "ACC-789"})

Full SIP tracing → Homer¶

Stream every SIP message to a Homer / heplify-server collector over HEP — invaluable for debugging call flows:

tracing:
  hep:
    endpoint: "127.0.0.1:9060"
    version: 3
    transport: udp           # udp | tcp | tls
    agent_id: "siphon-sbc"   # per-role name so nodes appear separately in Homer

Putting it together¶

A solid baseline: scrape /metrics with Prometheus + alert on the table above; probe /admin/health + /admin/ready from your orchestrator; ship CDRs to your billing collector; and point HEP at Homer for call-flow forensics. For the production alert set and capacity guidance, see Deployment & operations.