[Stat cards: Active API Keys, Total Keys, Requests Today, Avg Latency (ms)]
[Charts: Daily Requests (30 days), By Task; plus a Live Activity feed]
API Keys
| Key | Owner | Tier | Rate Limit | Status | Created | Last Used | Actions |
|---|---|---|---|---|---|---|---|
[Charts: Requests Over Time, Requests by Task]
Usage by API Key (30 days)
| Key Prefix | Owner | Requests |
|---|---|---|
Model Performance
| Task | F1 Macro | Accuracy | Training Time |
|---|---|---|---|
[Stat cards: Total Predictions, Avg Latency (ms), P95 Latency (ms), Avg Confidence (prediction certainty), Peak Throughput (requests/hour)]
Latency Distribution
[Percentile cards: P50, P95, P99, Min, Max, all in ms]
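These are plain percentiles over the logged per-request latencies; a quick sketch of how they are derived (the sample values below are placeholders, not real measurements):

```python
import numpy as np

# Placeholder sample; in practice, pull per-request latencies (ms)
# from the request logs below.
latencies_ms = np.array([12.4, 13.1, 13.8, 14.2, 15.0, 19.7, 48.3])

p50, p95, p99 = np.percentile(latencies_ms, [50, 95, 99])
print(f"P50={p50:.1f} ms  P95={p95:.1f} ms  P99={p99:.1f} ms  "
      f"min={latencies_ms.min():.1f} ms  max={latencies_ms.max():.1f} ms")
```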
[Charts: Confidence Distribution, Hourly Throughput (24h), Per-Task Confidence]
[Cost cards: Cost / 1K Requests (at current capacity), Avg Latency (ms per prediction), Monthly Infrastructure, Monthly Capacity (predictions/month)]
Cost per 1K Requests at Scale
The GPU cost is fixed, so the more requests you serve, the cheaper each prediction becomes.
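As a back-of-the-envelope sketch of that scaling (the monthly cost figure below is an illustrative assumption, not actual IRIS pricing):

```python
# Cost per 1K requests under a fixed monthly GPU cost.
# MONTHLY_GPU_COST is an illustrative assumption, not an IRIS figure.
MONTHLY_GPU_COST = 600.0  # USD/month, fixed regardless of volume

def cost_per_1k(requests_per_month: int) -> float:
    """Spread the fixed cost over the actual request volume."""
    return MONTHLY_GPU_COST / (requests_per_month / 1_000)

for volume in (100_000, 1_000_000, 10_000_000):
    print(f"{volume:>12,} req/month -> ${cost_per_1k(volume):.4f} per 1K")
```

With these example numbers, 100K requests/month works out to $6.00 per 1K while 10M drops to $0.06, which is the shape of the curve the chart above plots.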
SLM vs LLM API — When to Use What
Use IRIS SLM when:
- You need real-time classification (<50ms)
- Data cannot leave your network (compliance, PII)
- You're making high-volume classification calls
- You need predictable latency with no cold starts
- You want zero vendor dependency
- The task is well-defined classification (not open-ended generation)
Use LLM APIs when:
- You need open-ended text generation
- The task requires reasoning or multi-step logic
- You have low volume (<10K/month) and minimal latency needs
- The classification categories change frequently
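A minimal routing sketch of the decision above, assuming a local IRIS endpoint at `http://localhost:8000/predict`; the URL, task names, and response shape are illustrative assumptions, not the documented API:

```python
import requests

IRIS_URL = "http://localhost:8000/predict"    # assumed local SLM endpoint
SLM_TASKS = {"sentiment", "intent", "topic"}  # illustrative fixed task set

def classify(text: str, task: str) -> dict:
    """Send well-defined, latency-sensitive classification to the local SLM.

    Open-ended generation or multi-step reasoning should go to an LLM API
    instead; that fallback is out of scope for this sketch.
    """
    if task not in SLM_TASKS:
        raise ValueError(f"{task!r} is not a fixed classification task")
    # Data stays on-network and latency is predictable (no cold starts).
    resp = requests.post(IRIS_URL, json={"task": task, "text": text},
                         timeout=0.5)  # sub-second budget for real-time use
    resp.raise_for_status()
    return resp.json()  # assumed shape: {"label": ..., "confidence": ...}
```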
Per-Task Inference Metrics
| Task | Requests | Avg Latency (ms) | Min (ms) | Max (ms) | Avg Confidence |
|---|---|---|---|---|---|
Request Logs
| Time (IST) | API Key | Task | Label | Confidence | Latency | Input Size | Batch | IP |
|---|---|---|---|---|---|---|---|---|
Use the search filters above to query request logs.
Inline Model Tester
[Status cards: Service State, Models Validated, Uptime Since, GPU]
Maintenance Mode
Maintenance mode blocks all prediction traffic and shows a branded maintenance page to end users.
The admin panel and health probes remain accessible. Use this during model updates, retraining, or scheduled downtime.
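A sketch of how such a gate can be wired up, assuming a FastAPI-style service; the flag, exempt paths, and page markup are illustrative, not the actual IRIS implementation:

```python
from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse

app = FastAPI()
maintenance_on = False          # toggled from the admin panel (assumption)
EXEMPT = ("/admin", "/health")  # panel and probes stay reachable

@app.middleware("http")
async def maintenance_gate(request: Request, call_next):
    if maintenance_on and not request.url.path.startswith(EXEMPT):
        # Prediction traffic gets the branded page with a 503 so callers
        # and load balancers know the outage is deliberate and temporary.
        return HTMLResponse("<h1>IRIS is down for maintenance</h1>",
                            status_code=503)
    return await call_next(request)
```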
Model Reload
Hot-reload models from disk. Use this after retraining to pick up new weights without restarting the service.
Models are re-validated with warm-up predictions after reload.
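The reload pattern itself is load, warm up, then swap; a minimal sketch assuming PyTorch checkpoints on disk (`warm_up_input` is a hypothetical helper):

```python
import torch

# Live registry read by the prediction path; task name -> model.
models: dict[str, torch.nn.Module] = {}

def warm_up_input(task: str) -> torch.Tensor:
    # Hypothetical helper: build a representative dummy batch for the task.
    return torch.zeros(1, 128, dtype=torch.long, device="cuda")

def reload_model(task: str, weights_path: str) -> None:
    """Load new weights, validate with a warm-up prediction, then swap."""
    # Assumes the checkpoint stores the full model object, not a state_dict.
    candidate = torch.load(weights_path, map_location="cuda",
                           weights_only=False)
    candidate.eval()
    with torch.no_grad():
        candidate(warm_up_input(task))  # raises here if the new weights are broken
    models[task] = candidate  # atomic swap: no restart, no dropped requests
```

Swapping only after the warm-up prediction succeeds means a corrupt checkpoint never replaces a known-good model in the serving path.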
Model Status
| Task | Device | Version | Size | Warm-up | Latency | F1 |
|---|---|---|---|---|---|---|