AW Dev Rethought

AI in Production: AI Systems Need SLAs Too — But Different Ones


Introduction:

Service Level Agreements (SLAs) have long been used to define reliability expectations for software systems. Organisations commonly measure uptime, latency, and error rates to ensure systems meet operational requirements.

However, AI systems introduce a new challenge. A model can have excellent uptime, fast response times, and healthy infrastructure metrics while still producing poor results for users.

This means traditional SLAs alone are no longer sufficient. AI systems require a different way of thinking about reliability, performance, and operational success.


Traditional SLAs Focus on System Availability:

Most software SLAs are designed around infrastructure and application behaviour. Teams monitor metrics such as uptime percentages, API response times, throughput, and failure rates.

These metrics work well because traditional software is generally deterministic. If requests succeed and responses are returned correctly, the system is usually considered healthy.

For AI systems, however, operational availability does not necessarily indicate useful or trustworthy behaviour.


A Model Can Be Available and Still Be Failing:

An AI service may respond successfully to every request while generating inaccurate, misleading, or low-quality outputs. From an infrastructure perspective, the service appears healthy.

Users, however, experience a different reality. The system may technically meet its uptime target while failing to deliver meaningful value.

This creates a gap between operational success and user success that traditional SLAs fail to capture.


Output Quality Becomes a Reliability Metric:

Unlike conventional applications, AI systems must be evaluated based on the quality of their outputs. Accuracy, relevance, consistency, and usefulness directly influence user trust.

If quality degrades, the system becomes less valuable regardless of infrastructure performance. Users care about outcomes more than response codes.

This means AI reliability must include quality-related indicators alongside traditional operational metrics.
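As a rough illustration, a quality indicator can be computed next to availability from the same request log. The sketch below assumes each logged request carries a quality score between 0 and 1 from some evaluator (a human rater, a rubric, or an automated judge). The RequestRecord type, its field names, and the 0.7 pass threshold are hypothetical choices for the example, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    status_code: int      # HTTP status returned by the service
    quality_score: float  # hypothetical 0..1 score from an evaluator


def availability(records):
    """Share of requests that did not fail with a server error (5xx)."""
    if not records:
        raise ValueError("no records to evaluate")
    served = sum(1 for r in records if r.status_code < 500)
    return served / len(records)


def quality_pass_rate(records, min_score=0.7):
    """Share of served responses whose score meets the assumed threshold."""
    served = [r for r in records if r.status_code < 500]
    if not served:
        return 0.0
    good = sum(1 for r in served if r.quality_score >= min_score)
    return good / len(served)


# Every request succeeds, yet four in ten answers score poorly.
records = [RequestRecord(200, 0.9)] * 6 + [RequestRecord(200, 0.3)] * 4

print(f"availability:      {availability(records):.0%}")
print(f"quality pass rate: {quality_pass_rate(records):.0%}")
```

With this sample data the service reports full availability while only six in ten answers pass the quality bar, which is exactly the kind of failure an uptime-only SLA would miss.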


Latency Has Different Business Implications:

Latency is important for all software systems, but it affects AI products differently. A slight increase in response time for an AI assistant, recommendation engine, or summarisation feature can significantly impact user experience.

Users often perceive AI interactions as conversational and expect responses within reasonable timeframes. Slow responses reduce engagement and confidence in the system.

For AI systems, latency is not just a technical metric — it is a product experience metric.
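Since a handful of slow responses can spoil the experience even when the average looks fine, a latency objective is often stated as a percentile rather than a mean. Below is a minimal sketch using the nearest-rank method. The 2000 ms budget, the 95th percentile choice, and the sample data are made-up values for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 0 and 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # Multiply before dividing so integer inputs stay exact.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_objective_met(latencies_ms, pct=95, budget_ms=2000):
    """True if the chosen percentile is within the assumed budget."""
    return percentile(latencies_ms, pct) <= budget_ms


# 90 fast responses and 10 very slow ones.
latencies_ms = [400] * 90 + [3000] * 10
mean_ms = sum(latencies_ms) / len(latencies_ms)

print("mean:", mean_ms)
print("p50: ", percentile(latencies_ms, 50))
print("p95: ", percentile(latencies_ms, 95))
print("objective met:", latency_objective_met(latencies_ms))
```

In this sample the mean and median both sit well under the budget, but the 95th percentile does not, so one in ten users waits several seconds and the objective fails.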


Consistency Matters More Than Many Teams Expect:

Traditional software generally produces predictable outputs for identical inputs. AI systems behave differently because outputs may vary depending on prompts, context, model updates, or inference conditions.

Users often interpret inconsistency as unreliability. Two different answers to the same question can reduce trust even if both responses are technically acceptable.

Consistency therefore becomes an operational concern in its own right, one that rarely appears in traditional SLAs.
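One simple way to put a number on consistency is to send the same prompt several times and measure how similar the answers are to each other. The sketch below averages pairwise text similarity using difflib from the Python standard library. Lexical similarity is only a crude proxy (two answers can be worded differently and still agree), so treat the score as a starting signal, not a verdict. The function name and sample answers are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(responses):
    """Mean pairwise similarity (0..1) across answers to the same prompt."""
    if len(responses) < 2:
        return 1.0  # nothing to disagree with
    pairs = list(combinations(responses, 2))
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)


stable = ["Refunds take 5 days."] * 3
unstable = [
    "Refunds take 5 days.",
    "Refunds take 30 days.",
    "We do not offer refunds.",
]

print(f"stable:   {consistency_score(stable):.2f}")
print(f"unstable: {consistency_score(unstable):.2f}")
```

A production setup would more likely compare meaning than characters, for example with an embedding model or a grading rubric, but the shape of the metric stays the same.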


Hallucinations Create a Unique Reliability Problem:

Most failures in traditional software are visible. APIs return errors, services crash, or transactions fail.

AI failures can be much harder to detect because hallucinations often appear plausible. The system responds confidently while providing incorrect information.

An AI system may technically satisfy all traditional SLAs while simultaneously generating outputs that create business risk.


User Trust Is a Performance Indicator:

For AI products, trust becomes a measurable operational outcome. If users repeatedly verify outputs manually, ignore recommendations, or stop relying on AI-generated results, the system is effectively underperforming.

Trust erosion often happens gradually. Infrastructure metrics may remain healthy while user confidence declines over time.

This makes user behaviour an important signal for AI system health.
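A behavioural signal like this can be tracked with very little machinery. The sketch below assumes the product logs what users do with each AI output, using the hypothetical labels "accepted", "edited", and "ignored", and flags a sustained drop in the acceptance rate. The labels, the weekly grouping, and the 10-point drop threshold are all assumptions for the example.

```python
def acceptance_rate(week):
    """week maps an outcome label to a count, e.g. {"accepted": 80}."""
    total = sum(week.values())
    return week.get("accepted", 0) / total if total else 0.0


def trust_is_eroding(weeks, max_drop=0.10):
    """True if acceptance fell by more than max_drop from first to last week."""
    if len(weeks) < 2:
        return False
    rates = [acceptance_rate(w) for w in weeks]
    return rates[0] - rates[-1] > max_drop


weeks = [
    {"accepted": 80, "edited": 15, "ignored": 5},
    {"accepted": 70, "edited": 20, "ignored": 10},
    {"accepted": 60, "edited": 25, "ignored": 15},
]

for i, week in enumerate(weeks, start=1):
    print(f"week {i}: acceptance {acceptance_rate(week):.0%}")
print("trust eroding:", trust_is_eroding(weeks))
```

Comparing only the first and last week is deliberately naive. A real dashboard would smooth over noise, but even this version surfaces a decline that uptime graphs never show.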


Model Drift Requires Ongoing Monitoring:

Traditional applications generally behave consistently after deployment. AI models, by contrast, are trained on a snapshot of data, so their performance shifts as data patterns, user behaviour, and business environments change.

A model that performs well today may degrade gradually over time. This degradation often occurs without triggering conventional monitoring systems.

AI SLAs must therefore include mechanisms for tracking performance drift and output quality continuously.
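One common way to track drift is the Population Stability Index (PSI), which compares how a value is distributed across bins in a baseline window versus a recent window. The sketch below applies it to quality scores, though the same function works for input features or model confidence. The bin edges, the sample data, and the 0.2 alert level (a widely quoted rule of thumb, not a standard) are assumptions.

```python
import math
from bisect import bisect_right


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin; out-of-range values go to the end bins."""
    if not values:
        raise ValueError("no values to bin")
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
    # eps keeps empty bins from causing log(0) or division by zero.
    return [max(c / len(values), eps) for c in counts]


def psi(baseline, current, edges):
    """Population Stability Index between two samples over shared bins."""
    base = bin_fractions(baseline, edges)
    curr = bin_fractions(current, edges)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


edges = [0.0, 0.33, 0.66, 1.0]
baseline_scores = [0.1] * 10 + [0.5] * 30 + [0.9] * 60  # mostly high scores
current_scores = [0.1] * 40 + [0.5] * 30 + [0.9] * 30   # many more low scores

drift = psi(baseline_scores, current_scores, edges)
print(f"PSI = {drift:.3f}, alert = {drift > 0.2}")
```

Running a check like this on a schedule, against a fixed baseline window, turns gradual degradation into something an alert can fire on.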


Cost Efficiency Is Part of Reliability:

AI systems introduce operational costs that scale differently from traditional software. Model inference, token consumption, retrieval operations, and GPU utilisation can increase rapidly as adoption grows.

A feature may be technically successful but economically unsustainable. Reliability includes maintaining predictable operational costs while delivering acceptable performance.

Cost-aware SLAs become increasingly important for production AI systems.
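To make this concrete, cost can be tracked per request and, more tellingly, per useful answer. The sketch below assumes token-based pricing. The function names, the per-1,000-token prices, and the traffic figures are invented for the example and do not reflect any real provider's rates.

```python
def request_cost_usd(input_tokens, output_tokens,
                     usd_per_1k_input, usd_per_1k_output):
    """Cost of one request under assumed per-1,000-token prices."""
    return ((input_tokens / 1000) * usd_per_1k_input
            + (output_tokens / 1000) * usd_per_1k_output)


def cost_per_useful_answer(total_cost_usd, useful_answers):
    """Spend divided by the answers users actually accepted."""
    if useful_answers == 0:
        return float("inf")
    return total_cost_usd / useful_answers


# Hypothetical traffic: 100 requests of 2000 input and 500 output tokens.
requests = [(2000, 500)] * 100
total = sum(request_cost_usd(i, o, 0.5, 1.5) for i, o in requests)

print("cost per request:      ", total / len(requests))
print("cost per useful answer:", cost_per_useful_answer(total, 50))
```

If only half of the answers are accepted, the cost per useful answer is double the cost per request, which is the figure that decides whether the feature is sustainable.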


Human Escalation Paths Need Measurement:

Many production AI systems rely on human review for edge cases, compliance checks, or quality assurance. These human-in-the-loop processes are part of the overall system architecture.

Organisations should monitor escalation rates, review delays, and intervention frequency. Rising human intervention often signals declining model effectiveness.

Operational success depends on both the AI system and the workflows surrounding it.
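These human-in-the-loop signals can be summarised from a simple log of escalations. The sketch below assumes each escalation records how long the review took and whether the reviewer overrode the model. The Escalation type, its fields, and the sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Escalation:
    review_delay_min: float  # minutes between escalation and human decision
    overridden: bool         # True if the reviewer changed the model's output


def escalation_summary(total_requests, escalations):
    """Escalation rate, median review delay, and override rate."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    delays = [e.review_delay_min for e in escalations]
    overrides = sum(1 for e in escalations if e.overridden)
    return {
        "escalation_rate": len(escalations) / total_requests,
        "median_review_delay_min": median(delays) if delays else 0.0,
        "override_rate": overrides / len(escalations) if escalations else 0.0,
    }


escalations = [
    Escalation(10, True),
    Escalation(30, False),
    Escalation(20, True),
    Escalation(60, False),
]
summary = escalation_summary(200, escalations)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Watching these three numbers over time shows whether reviewers are being asked to step in more often, are taking longer to respond, or are disagreeing with the model more.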


AI SLAs Must Be Multi-Dimensional:

Traditional SLAs focus primarily on infrastructure reliability. AI systems require a broader approach that includes output quality, consistency, latency, trust, drift, and operational cost.

No single metric can accurately represent AI system health. Multiple indicators must work together to provide meaningful visibility.

Successful AI operations depend on understanding that reliability extends beyond uptime.
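A multi-dimensional SLA can be expressed as a list of objectives that are checked together. The sketch below is one possible shape for that check. The Objective type, the metric names, and every target value are illustrative assumptions, and real targets would come from the product's own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, observed):
        if self.higher_is_better:
            return observed >= self.target
        return observed <= self.target


def breached(objectives, observed):
    """Names of objectives that are missed or have no measurement."""
    failures = []
    for obj in objectives:
        value = observed.get(obj.name)
        # A missing measurement counts as a breach, not a silent pass.
        if value is None or not obj.is_met(value):
            failures.append(obj.name)
    return failures


objectives = [
    Objective("availability", 0.999),
    Objective("quality_pass_rate", 0.90),
    Objective("p95_latency_ms", 2000, higher_is_better=False),
    Objective("consistency_score", 0.80),
    Objective("cost_per_request_usd", 0.02, higher_is_better=False),
]

observed = {
    "availability": 0.9995,
    "quality_pass_rate": 0.84,
    "p95_latency_ms": 1800,
    "consistency_score": 0.86,
    "cost_per_request_usd": 0.031,
}

print("breached objectives:", breached(objectives, observed))
```

In this example availability and latency are healthy, yet the SLA is still breached on quality and cost, so a single green uptime indicator would hide both problems.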


Conclusion:

AI systems need SLAs just as traditional systems do, but the metrics must reflect the realities of AI behaviour. Availability and latency remain important, yet they represent only part of the picture.

Reliable AI systems are measured not only by whether they respond, but by whether they provide consistent, useful, trustworthy, and economically sustainable outcomes. As AI adoption grows, organisations will increasingly need SLAs designed specifically for the unique characteristics of production AI.

