[Stat cards: Active API Keys, Total Keys, Requests Today, Avg Latency (ms)]
[Charts: Daily Requests (30 days), By Task; plus a Live Activity feed]
API Keys
| Key | Owner | Tier | Rate Limit | Status | Created | Last Used | Actions |
|---|---|---|---|---|---|---|---|
[Charts: Requests Over Time, Requests by Task]
Usage by API Key (30 days)
| Key Prefix | Owner | Requests |
|---|---|---|
Model Performance
| Task | F1 Macro | Accuracy | Training Time |
|---|---|---|---|
[Stat cards: Total Predictions, Avg Latency (ms), P95 Latency (ms), Avg Confidence (prediction certainty), Peak Throughput (requests/hour)]
Latency Distribution
[Percentile cards: P50, P95, P99, Min, Max, all in ms]
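These are plain percentiles over the logged per-request latencies; a quick sketch of how they are derived (the sample values below are placeholders, not real measurements):

```python
import numpy as np

# Placeholder sample; in practice, pull per-request latencies (ms)
# from the request logs below.
latencies_ms = np.array([12.4, 13.1, 13.8, 14.2, 15.0, 19.7, 48.3])

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"P50={p50:.1f} ms  P95={p95:.1f} ms  P99={p99:.1f} ms  "
      f"min={latencies_ms.min():.1f} ms  max={latencies_ms.max():.1f} ms")
```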
[Charts: Confidence Distribution, Hourly Throughput (24h), Per-Task Confidence]
[Cost cards: Cost / 1K Requests (at current capacity), Avg Latency (ms per prediction), Monthly Infrastructure, Monthly Capacity (predictions/month)]
Cost per 1K Requests at Scale
The GPU cost is fixed, so the more requests you serve, the cheaper each prediction becomes.
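As a back-of-the-envelope sketch of that scaling (the monthly cost figure below is an illustrative assumption, not actual IRIS pricing):

```python
# Cost per 1K requests under a fixed monthly GPU cost.
# MONTHLY_GPU_COST is an illustrative assumption, not an IRIS figure.
MONTHLY_GPU_COST = 600.0  # USD/month, fixed regardless of volume

def cost_per_1k(requests_per_month: int) -> float:
    """Spread the fixed cost over the actual request volume."""
    return MONTHLY_GPU_COST / (requests_per_month / 1_000)

for volume in (100_000, 1_000_000, 10_000_000):
    print(f"{volume:>12,} req/month -> ${cost_per_1k(volume):.4f} per 1K")
```

With these example numbers, 100K requests/month works out to $6.00 per 1K while 10M drops to $0.06, which is the shape of the curve the chart above plots.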
SLM vs LLM API — When to Use What
Use IRIS SLM when:
- You need real-time classification (<50ms)
- Data cannot leave your network (compliance, PII)
- You're making high-volume classification calls
- You need predictable latency with no cold starts
- You want zero vendor dependency
- The task is well-defined classification (not open-ended generation)
Use LLM APIs when:
- You need open-ended text generation
- The task requires reasoning or multi-step logic
- You have low volume (<10K/month) and minimal latency needs
- The classification categories change frequently
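A minimal routing sketch of the decision above, assuming a local IRIS endpoint at `http://localhost:8000/predict`; the URL, task names, and response shape are illustrative assumptions, not the documented API:

```python
import requests

IRIS_URL = "http://localhost:8000/predict"    # assumed local SLM endpoint
SLM_TASKS = {"sentiment", "intent", "topic"}  # illustrative fixed task set

def classify(text: str, task: str) -> dict:
    """Send well-defined, latency-sensitive classification to the local SLM.

    Open-ended generation or multi-step reasoning should go to an LLM API
    instead; that fallback is out of scope for this sketch.
    """
    if task not in SLM_TASKS:
        raise ValueError(f"{task!r} is not a fixed classification task")
    # Data stays on-network and latency is predictable (no cold starts).
    resp = requests.post(IRIS_URL, json={"task": task, "text": text},
                         timeout=0.5)  # sub-second budget for real-time use
    resp.raise_for_status()
    return resp.json()  # assumed shape: {"label": ..., "confidence": ...}
```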
Per-Task Inference Metrics
| Task | Requests | Avg Latency (ms) | Min (ms) | Max (ms) | Avg Confidence |
|---|---|---|---|---|---|
Request Logs
| Time (IST) | API Key | Task | Label | Confidence | Latency | Input Size | Batch | IP |
|---|---|---|---|---|---|---|---|---|
Use the search filters above to query request logs.
Inline Model Tester
[Status cards: Service State, Models Validated, Uptime Since, GPU]
Maintenance Mode
Maintenance mode blocks all prediction traffic and shows a branded maintenance page to end users.
The admin panel and health probes remain accessible. Use this during model updates, retraining, or scheduled downtime.
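A sketch of how such a gate can be wired up, assuming a FastAPI-style service; the flag, exempt paths, and page markup are illustrative, not the actual IRIS implementation:

```python
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse

app = FastAPI()
maintenance_on = False          # toggled from the admin panel (assumption)
EXEMPT = ("/admin", "/health")  # panel and probes stay reachable

@app.middleware("http")
async def maintenance_gate(request: Request, call_next):
    if maintenance_on and not request.url.path.startswith(EXEMPT):
        # Prediction traffic gets the branded page with a 503 so callers
        # and load balancers know the outage is deliberate and temporary.
        return HTMLResponse("<h1>IRIS is down for maintenance</h1>",
                            status_code=503)
    return await call_next(request)
```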
Model Reload
Hot-reload models from disk. Use this after retraining to pick up new weights without restarting the service.
Models are re-validated with warm-up predictions after reload.
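The reload pattern itself is load, warm up, then swap; a minimal sketch assuming PyTorch checkpoints on disk (`warm_up_input` is a hypothetical helper):

```python
import torch

# Live registry read by the prediction path; task name -> model.
models: dict[str, torch.nn.Module] = {}

def warm_up_input(task: str) -> torch.Tensor:
    # Hypothetical helper: build a representative dummy batch for the task.
    return torch.zeros(1, 128, dtype=torch.long, device="cuda")

def reload_model(task: str, weights_path: str) -> None:
    """Load new weights, validate with a warm-up prediction, then swap."""
    # Assumes the checkpoint stores the full model object, not a state_dict.
    candidate = torch.load(weights_path, map_location="cuda",
                           weights_only=False)
    candidate.eval()
    with torch.no_grad():
        candidate(warm_up_input(task))  # raises here if the new weights are broken
    models[task] = candidate  # atomic swap: no restart, no dropped requests
```

Swapping only after the warm-up prediction succeeds means a corrupt checkpoint never replaces a known-good model in the serving path.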
Model Status
| Task | Device | Version | Size | Warm-up | Latency | F1 |
|---|---|---|---|---|---|---|