📈 Prometheus metrics
LiteLLM exposes a /metrics endpoint for Prometheus to poll.
Quick Start
If you're using the LiteLLM CLI with litellm --config proxy_config.yaml, you need to pip install prometheus_client==0.20.0. This is already pre-installed in the LiteLLM Docker image.
Add this to your proxy config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o

litellm_settings:
  callbacks: ["prometheus"]
Start the proxy
litellm --config config.yaml --debug
Test Request
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "what llm are you"
    }
  ]
}'
View Metrics on /metrics

Visit http://localhost:4000/metrics

http://localhost:4000/metrics  # <proxy_base_url>/metrics
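For reference, a Prometheus scrape config that polls this endpoint could look like the sketch below. The job name and 30s scrape interval are illustrative assumptions; point the target at wherever your proxy is reachable.

# prometheus.yml (sketch)
scrape_configs:
  - job_name: "litellm-proxy"        # illustrative job name
    scrape_interval: 30s
    metrics_path: /metrics
    static_configs:
      - targets: ["localhost:4000"]  # adjust to your proxy host:port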
Virtual Keys, Teams, Internal Users
Use this for tracking usage per user, key, team, etc.
Metric Name | Description |
---|---|
litellm_spend_metric | Total Spend, per "user", "key", "model", "team", "end-user" |
litellm_total_tokens | input + output tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
litellm_input_tokens | input tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
litellm_output_tokens | output tokens per "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
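For example, you can aggregate these counters by any of the labels above to see where spend and usage are going. Below is a sketch of a Prometheus recording rule for per-team spend over the trailing 24 hours; the rule and group names are arbitrary choices, and it assumes the counter is scraped as litellm_spend_metric_total (see the _created vs. _total FAQ at the bottom of this page).

groups:
  - name: litellm-spend                       # illustrative group name
    rules:
      - record: litellm:team_spend_24h        # arbitrary recording-rule name
        expr: sum by (team, team_alias) (increase(litellm_spend_metric_total[1d]))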
Team - Budget
Metric Name | Description |
---|---|
litellm_team_max_budget_metric | Max Budget for Team Labels: "team_id", "team_alias" |
litellm_remaining_team_budget_metric | Remaining Budget for Team (A team created on LiteLLM) Labels: "team_id", "team_alias" |
litellm_team_budget_remaining_hours_metric | Hours before the team budget is reset Labels: "team_id", "team_alias" |
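These gauges can back a simple alert on teams that are close to exhausting their budget. Below is a sketch: the alert name, 15m hold, severity label, and 10-unit threshold are assumptions, and the threshold is in whatever units you track spend in.

groups:
  - name: litellm-team-budget                 # illustrative group name
    rules:
      - alert: LiteLLMTeamBudgetLow           # illustrative alert name
        expr: litellm_remaining_team_budget_metric < 10   # assumed threshold in your spend units
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Team {{ $labels.team_alias }} is close to its LiteLLM budget"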
Virtual Key - Budget
Metric Name | Description |
---|---|
litellm_api_key_max_budget_metric | Max Budget for API Key Labels: "hashed_api_key", "api_key_alias" |
litellm_remaining_api_key_budget_metric | Remaining Budget for API Key (A key Created on LiteLLM) Labels: "hashed_api_key", "api_key_alias" |
litellm_api_key_budget_remaining_hours_metric | Hours before the API Key budget is reset Labels: "hashed_api_key", "api_key_alias" |
Virtual Key - Rate Limit
Metric Name | Description |
---|---|
litellm_remaining_api_key_requests_for_model | Remaining Requests for a LiteLLM virtual API key, only if a model-specific rate limit (rpm) has been set for that virtual key. Labels: "hashed_api_key", "api_key_alias", "model" |
litellm_remaining_api_key_tokens_for_model | Remaining Tokens for a LiteLLM virtual API key, only if a model-specific token limit (tpm) has been set for that virtual key. Labels: "hashed_api_key", "api_key_alias", "model" |
Initialize Budget Metrics on Startup
If you want LiteLLM to emit budget metrics for all keys and teams, irrespective of whether they are receiving requests, set prometheus_initialize_budget_metrics to true in the config.yaml.
How this works:
- If prometheus_initialize_budget_metrics is set to true, LiteLLM runs a cron job every 5 minutes that reads all keys and teams from the database.
- It then emits the budget metrics for each key and team.
- This is used to populate the budget metrics on the /metrics endpoint.
litellm_settings:
  callbacks: ["prometheus"]
  prometheus_initialize_budget_metrics: true
Proxy Level Tracking Metrics
Use this to track overall LiteLLM Proxy usage.
- Track actual traffic rate to the proxy
- Number of client-side requests and failures for requests made to the proxy
Metric Name | Description |
---|---|
litellm_proxy_failed_requests_metric | Total number of failed responses from proxy - the client did not get a success response from litellm proxy. Labels: "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "exception_status", "exception_class" |
litellm_proxy_total_requests_metric | Total number of requests made to the proxy server - track number of client side requests. Labels: "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "status_code" |
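These two counters can be combined into a client-facing error-rate query. Below is a sketch of a recording rule, assuming the counters are scraped with the _total suffix described in the FAQ at the bottom of this page; the rule name, 5m window, and grouping label are arbitrary choices.

groups:
  - name: litellm-proxy-traffic               # illustrative group name
    rules:
      - record: litellm:proxy_error_rate_5m   # arbitrary recording-rule name
        expr: |
          sum by (requested_model) (rate(litellm_proxy_failed_requests_metric_total[5m]))
            /
          sum by (requested_model) (rate(litellm_proxy_total_requests_metric_total[5m]))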
LLM Provider Metrics
Use this for LLM API error monitoring and for tracking remaining rate limits and token limits.
Labels Tracked
Label | Description |
---|---|
litellm_model_name | The name of the LLM model used by LiteLLM |
requested_model | The model sent in the request |
model_id | The model_id of the deployment. Autogenerated by LiteLLM, each deployment has a unique model_id |
api_base | The API Base of the deployment |
api_provider | The LLM API provider. Example: azure, openai, vertex_ai |
hashed_api_key | The hashed api key of the request |
api_key_alias | The alias of the api key used |
team | The team of the request |
team_alias | The alias of the team used |
exception_status | The status of the exception, if any |
exception_class | The class of the exception, if any |
Success and Failure
Metric Name | Description |
---|---|
litellm_deployment_success_responses | Total number of successful LLM API calls for deployment. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |
litellm_deployment_failure_responses | Total number of failed LLM API calls for a specific LLM deployment. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |
litellm_deployment_total_requests | Total number of LLM API calls for deployment - success + failure. Labels: "requested_model", "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |
Remaining Requests and Tokens
Metric Name | Description |
---|---|
litellm_remaining_requests_metric | Track x-ratelimit-remaining-requests returned from LLM API Deployment. Labels: "model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias" |
litellm_remaining_tokens | Track x-ratelimit-remaining-tokens returned from LLM API Deployment. Labels: "model_group", "api_provider", "api_base", "litellm_model_name", "hashed_api_key", "api_key_alias" |
Deployment State
Metric Name | Description |
---|---|
litellm_deployment_state | The state of the deployment: 0 = healthy, 1 = partial outage, 2 = complete outage. Labels: "litellm_model_name", "model_id", "api_base", "api_provider" |
litellm_deployment_latency_per_output_token | Latency per output token for deployment. Labels: "litellm_model_name", "model_id", "api_base", "api_provider", "hashed_api_key", "api_key_alias", "team", "team_alias" |
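Since litellm_deployment_state is a gauge with a small fixed value set, it lends itself to a simple availability alert. Below is a sketch; the alert name, 5m hold, and severity label are assumptions.

groups:
  - name: litellm-deployment-state            # illustrative group name
    rules:
      - alert: LiteLLMDeploymentOutage        # illustrative alert name
        expr: litellm_deployment_state == 2   # 2 = complete outage, per the table above
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Deployment {{ $labels.litellm_model_name }} ({{ $labels.api_base }}) is in complete outage"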
Fallback (Failover) Metrics
Metric Name | Description |
---|---|
litellm_deployment_cooled_down | Number of times a deployment has been cooled down by LiteLLM load balancing logic. Labels: "litellm_model_name", "model_id", "api_base", "api_provider", "exception_status" |
litellm_deployment_successful_fallbacks | Number of successful fallback requests from primary model -> fallback model. Labels: "requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |
litellm_deployment_failed_fallbacks | Number of failed fallback requests from primary model -> fallback model. Labels: "requested_model", "fallback_model", "hashed_api_key", "api_key_alias", "team", "team_alias", "exception_status", "exception_class" |
Request Latency Metrics
Metric Name | Description |
---|---|
litellm_request_total_latency_metric | Total latency (seconds) for a request to LiteLLM Proxy Server - tracked for labels "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
litellm_overhead_latency_metric | Latency overhead (seconds) added by LiteLLM processing - tracked for labels "end_user", "hashed_api_key", "api_key_alias", "requested_model", "team", "team_alias", "user", "model" |
litellm_llm_api_latency_metric | Latency (seconds) for just the LLM API call - tracked for labels "model", "hashed_api_key", "api_key_alias", "team", "team_alias", "requested_model", "end_user", "user" |
litellm_llm_api_time_to_first_token_metric | Time to first token for LLM API call - tracked for labels "model", "hashed_api_key", "api_key_alias", "team", "team_alias" [Note: only emitted for streaming requests] |
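If these latency metrics are exposed as Prometheus histograms (i.e. they carry _bucket / _sum / _count series; check your /metrics output to confirm), you can derive percentiles with histogram_quantile. Below is a sketch of a p95 recording rule; the rule name, 5m window, and grouping labels are arbitrary choices.

groups:
  - name: litellm-latency                          # illustrative group name
    rules:
      - record: litellm:request_latency_p95_5m     # arbitrary recording-rule name
        expr: |
          histogram_quantile(
            0.95,
            sum by (le, requested_model) (rate(litellm_request_total_latency_metric_bucket[5m]))
          )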
Tracking end_user on Prometheus
By default, LiteLLM does not track end_user on Prometheus. This is done to reduce the cardinality of the metrics emitted by the LiteLLM Proxy.

If you want to track end_user on Prometheus, you can do the following:
litellm_settings:
  callbacks: ["prometheus"]
  enable_end_user_cost_tracking_prometheus_only: true
[BETA] Custom Metrics
Track custom metrics on Prometheus for all the events mentioned above.
- Define the custom metrics in the config.yaml
model_list:
  - model_name: openai/gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  callbacks: ["prometheus"]
  custom_prometheus_metadata_labels: ["metadata.foo", "metadata.bar"]
- Make a request with the custom metadata labels
curl -L -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <LITELLM_API_KEY>' \
-d '{
  "model": "openai/gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is in this image?"
        }
      ]
    }
  ],
  "max_tokens": 300,
  "metadata": {
    "foo": "hello world"
  }
}'
- Check your /metrics endpoint for the custom metrics
... "metadata_foo": "hello world" ...
Configuring Metrics and Labels
You can selectively enable specific metrics and control which labels are included to optimize performance and reduce cardinality.
Enable Specific Metrics and Labels
Configure which metrics to emit by specifying them in prometheus_metrics_config. Each configuration group needs a group name (for organization) and a list of metrics to enable. You can optionally include a list of include_labels to filter the labels for those metrics.
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o

litellm_settings:
  callbacks: ["prometheus"]
  prometheus_metrics_config:
    # High-cardinality metrics with minimal labels
    - group: "proxy_metrics"
      metrics:
        - "litellm_proxy_total_requests_metric"
        - "litellm_proxy_failed_requests_metric"
      include_labels:
        - "hashed_api_key"
        - "requested_model"
        - "model_group"
When LiteLLM starts up, if your metrics were configured correctly, you should see confirmation of the loaded metrics configuration in your container logs.
Filter Labels Per Metric
Control which labels are included for each metric to reduce cardinality:
litellm_settings:
  callbacks: ["prometheus"]
  prometheus_metrics_config:
    - group: "spend_and_tokens"
      metrics:
        - "litellm_spend_metric"
        - "litellm_total_tokens"
      include_labels:
        - "model"
        - "team"
        - "hashed_api_key"
    - group: "request_tracking"
      metrics:
        - "litellm_proxy_total_requests_metric"
      include_labels:
        - "status_code"
        - "requested_model"
Advanced Configuration
You can create multiple configuration groups with different label sets:
litellm_settings:
  callbacks: ["prometheus"]
  prometheus_metrics_config:
    # High-cardinality metrics with minimal labels
    - group: "deployment_health"
      metrics:
        - "litellm_deployment_success_responses"
        - "litellm_deployment_failure_responses"
      include_labels:
        - "api_provider"
        - "requested_model"
    # Budget metrics with full label set
    - group: "budget_tracking"
      metrics:
        - "litellm_spend_metric"
        - "litellm_remaining_team_budget_metric"
      include_labels:
        - "team"
        - "team_alias"
        - "hashed_api_key"
        - "api_key_alias"
        - "model"
        - "end_user"
    # Latency metrics with performance-focused labels
    - group: "performance"
      metrics:
        - "litellm_request_total_latency_metric"
        - "litellm_llm_api_latency_metric"
      include_labels:
        - "model"
        - "api_provider"
        - "requested_model"
Configuration Structure:
- group: A descriptive name for organizing related metrics
- metrics: List of metric names to include in this group
- include_labels: (Optional) List of labels to include for these metrics
Default Behavior: If no prometheus_metrics_config is specified, all metrics are enabled with their default labels (backward compatible).
Monitor System Health
To monitor the health of LiteLLM-adjacent services (Redis / Postgres), do the following:
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: gpt-4o

litellm_settings:
  service_callback: ["prometheus_system"]
Metric Name | Description |
---|---|
litellm_redis_latency | Histogram latency for Redis calls |
litellm_redis_fails | Number of failed Redis calls |
litellm_self_latency | Histogram latency for a successful LiteLLM API call |
DB Transaction Queue Health Metrics
Use these metrics to monitor the health of the DB transaction queue, e.g. monitoring the size of the in-memory and Redis buffers.
Metric Name | Description | Storage Type |
---|---|---|
litellm_pod_lock_manager_size | Indicates which pod has the lock to write updates to the database. | Redis |
litellm_in_memory_daily_spend_update_queue_size | Number of items in the in-memory daily spend update queue. These are the aggregate spend logs for each user. | In-Memory |
litellm_redis_daily_spend_update_queue_size | Number of items in the Redis daily spend update queue. These are the aggregate spend logs for each user. | Redis |
litellm_in_memory_spend_update_queue_size | In-memory aggregate spend values for keys, users, teams, team members, etc. | In-Memory |
litellm_redis_spend_update_queue_size | Redis aggregate spend values for keys, users, teams, etc. | Redis |
🔥 LiteLLM Maintained Grafana Dashboards
Link to Grafana Dashboards maintained by LiteLLM
https://github.com/BerriAI/litellm/tree/main/cookbook/litellm_proxy_server/grafana_dashboard
Deprecated Metrics
Metric Name | Description |
---|---|
litellm_llm_api_failed_requests_metric | Deprecated. Use litellm_proxy_failed_requests_metric instead. |
litellm_requests_metric | Deprecated. Use litellm_proxy_total_requests_metric instead. |
Add authentication on /metrics endpoint
By default, the /metrics endpoint is unauthenticated.

You can opt into running LiteLLM authentication on the /metrics endpoint by setting the following in the config:
litellm_settings:
  require_auth_for_metrics_endpoint: true
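With this enabled, Prometheus must send a valid LiteLLM key when scraping. Below is a sketch of a scrape job using Prometheus' standard authorization block; the key shown is a placeholder, and the job name and target are illustrative assumptions.

scrape_configs:
  - job_name: "litellm-proxy"          # illustrative job name
    metrics_path: /metrics
    authorization:
      type: Bearer
      credentials: "sk-..."            # placeholder; supply a valid LiteLLM key (or use credentials_file)
    static_configs:
      - targets: ["localhost:4000"]    # adjust to your proxy host:port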
FAQ
What are _created vs. _total metrics?
- _created metrics are created when the proxy starts
- _total metrics are incremented for each request

You should consume the _total metrics for your counting purposes.