WeTest.Athens 2026 has ended
Tuesday May 12, 2026 11:30 - 12:15 EEST
Modern software systems rely on large graphs of interdependent microservices maintained by multiple engineering teams. Nightly integration tests frequently expose failures in these systems, yet determining which team should investigate first remains a largely manual and error-prone process. Misrouted incidents lead to cross-team ping-pong, increased MTTR, operational friction, and wasted engineering time.


This paper presents MANTHOS (Microservice Alert Navigation & Triage Handoff Optimization Service), a probabilistic, cost-aware incident routing engine designed to automatically assign integration test failures to the most appropriate team. MANTHOS combines (i) a service dependency graph, (ii) diffusion-based prior estimation for likely upstream faults, and (iii) multiple operational evidence signals, including stack traces, recent code changes, service flakiness, and optional deployment or performance anomalies. It aggregates service-level probabilities into team-level responsibility and selects a recommended assignee using an expected time-to-recovery (TTR) optimization model that incorporates team responsiveness, current load, ownership alignment, and a synergy matrix capturing historical cross-team coordination patterns.
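
The pipeline described above can be illustrated with a minimal Python sketch. It is not the MANTHOS implementation: the abstract does not specify the diffusion rule, the evidence model, or the TTR formula, so everything here is an assumption made for illustration. The function names, the `alpha` and `steps` parameters, the multiplicative evidence weights, and the simplified cost model (a per-team `fix_time` standing in for responsiveness and load, and a pairwise `handoff_cost` standing in for the synergy matrix) are all hypothetical.

```python
from collections import defaultdict


def diffusion_prior(deps, failing, alpha=0.5, steps=3):
    """Spread probability mass from the failing service to its upstream
    dependencies. `deps` maps a service to the services it calls.
    ASSUMPTION: each hop keeps (1 - alpha) of the mass and splits the
    rest equally among upstream services; the real rule may differ."""
    mass = {failing: 1.0}
    prior = defaultdict(float)
    for _ in range(steps):
        next_mass = defaultdict(float)
        for svc, m in mass.items():
            upstream = deps.get(svc, [])
            if not upstream:
                prior[svc] += m  # no dependencies: all mass stays here
                continue
            prior[svc] += (1 - alpha) * m
            for up in upstream:
                next_mass[up] += alpha * m / len(upstream)
        mass = next_mass
    for svc, m in mass.items():
        prior[svc] += m  # mass still in flight settles where it is
    return dict(prior)


def posterior(prior, evidence):
    """Reweight the prior by per-service evidence and renormalize.
    ASSUMPTION: evidence is a positive multiplicative weight (default 1.0)."""
    scores = {s: p * evidence.get(s, 1.0) for s, p in prior.items()}
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def team_probabilities(service_probs, owner):
    """Sum service-level probabilities into team-level responsibility."""
    teams = defaultdict(float)
    for svc, p in service_probs.items():
        teams[owner[svc]] += p
    return dict(teams)


def recommend(team_probs, fix_time, handoff_cost):
    """Pick the assignee with the lowest expected time to recovery.
    ASSUMPTION: if the assignee is not the responsible team, the cost is
    one handoff plus the responsible team's fix time."""
    def expected_ttr(assignee):
        total = 0.0
        for team, p in team_probs.items():
            if team == assignee:
                total += p * fix_time[assignee]
            else:
                total += p * (handoff_cost[(assignee, team)] + fix_time[team])
        return total
    return min(team_probs, key=expected_ttr)
```

With hypothetical inputs, the four steps chain as `recommend(team_probabilities(posterior(diffusion_prior(deps, "checkout"), evidence), owner), fix_time, handoff_cost)`. Note that the cheapest expected outcome is not always the most probable culprit: a slightly less likely team can be the better first assignee when its handoffs are cheap, which is the point of making the routing cost-aware.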


We evaluate MANTHOS on historical integration failures and demonstrate significant reductions in misassignment rates, unnecessary handoffs, and median TTR. The system integrates easily into CI pipelines, Slack/Jira workflows, and SRE tooling, providing transparent, explainable recommendations. Our results show that probabilistic inference combined with lightweight operational signals can materially improve dependability practices in complex service organizations. MANTHOS offers a structured, data-driven alternative to manual triage and serves as a practical foundation for next-generation AIOps tooling.
Speakers

Thanos Tsiamis

Software Engineer in Test, Agile Actors
Thanos Tsiamis is a software engineer with a strong interest in scalable systems, developer tooling, and problem-solving through clean design. He enjoys turning challenging requirements into elegant, reliable solutions.
MC3.2 Megaro Mousikis, Athens, Greece
  Talk
