Metrics
Elasticsearch is a search and analytics engine. Monitoring its metrics helps maintain performance, stability, and reliability. This page lists the essential Elasticsearch metrics in PDS so that administrators can tune performance, troubleshoot issues, and keep the Elasticsearch cluster running smoothly.
For Elasticsearch deployments, the data service metrics are exposed on port 9114.
Access metrics
Below is a step-by-step guide on how to access Elasticsearch metrics for PDS deployments:
- Identify the Elasticsearch pod running in your namespace:
  kubectl get pods -n <your-namespace>
  Look for the pod name that corresponds to your Elasticsearch instance or its sidecar exporter.
- Port-forward from your local machine's port 9114 to the pod's port 9114:
  kubectl port-forward -n <your-namespace> <elasticsearch-pod-name> 9114:9114
- Open a browser or use curl to go to http://localhost:9114/metrics. You should see text-based Prometheus metrics output specific to Elasticsearch.
- Check for the service exposing the Elasticsearch exporter, for example <release-name>-elasticsearch-exporter:
  kubectl get svc -n <your-namespace>
- Access the metrics through the service:
  - If the service type is NodePort, note the <nodeport> value and open:
    http://<node-ip>:<nodeport>/metrics
  - If the service type is LoadBalancer, note the <loadbalancer-ip> value and open:
    http://<loadbalancer-ip>:9114/metrics
- Verify metrics:
  - Using curl:
    curl http://<host>:9114/metrics
    Replace <host> with localhost (if using port-forward), <node-ip> (NodePort), or <loadbalancer-ip> (LoadBalancer). For NodePort, also replace 9114 with <nodeport>.
  - Prometheus UI: In Prometheus, navigate to the Expression browser and search for metrics beginning with elasticsearch_ or similar Elasticsearch-related prefixes to confirm they are being scraped.
  - Grafana or other dashboards: If you have Grafana connected to Prometheus, open your dashboard. Check that Elasticsearch metrics (those starting with elasticsearch_) are being ingested and displayed.
- Ensure that any NetworkPolicies or firewall rules allow inbound traffic on port 9114 if you plan to expose it externally.
- Metrics naming conventions can vary depending on the Elasticsearch exporter version. Generally, look for prefixes like elasticsearch_.
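The curl check in the steps above returns every series the exporter exposes, which can be long. The following is a minimal sketch of a filter that reduces that output to the distinct metric names with a given prefix. It assumes a POSIX shell with grep, sed, and sort available; the function name es_metric_names is hypothetical and is not part of PDS or the exporter.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct metric names that start with a
# given prefix, reading Prometheus text-format output on stdin.
# The prefix defaults to elasticsearch_.
es_metric_names() {
    prefix="${1:-elasticsearch_}"
    # 1. Drop comment lines (# HELP / # TYPE).
    # 2. Strip everything from the first "{" or space (labels and value).
    # 3. Keep names starting with the prefix, sorted and de-duplicated.
    grep -v '^#' \
        | sed -e 's/[{ ].*$//' \
        | grep "^${prefix}" \
        | sort -u
}
```

With the port-forward from the earlier step still running, one possible invocation is: curl -s http://localhost:9114/metrics | es_metric_names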
Elasticsearch metrics
Metric name | Description |
---|---|
elasticsearch_breakers_estimated_size_bytes | Estimated size in bytes of breaker |
elasticsearch_breakers_limit_size_bytes | Limit size in bytes for breaker |
elasticsearch_breakers_tripped | Number of times the breaker has tripped |
elasticsearch_cluster_health_active_primary_shards | The number of primary shards in your cluster. This is an aggregate total across all indices. |
elasticsearch_cluster_health_active_shards | Aggregate total of all shards across all indices, which includes replica shards. |
elasticsearch_cluster_health_delayed_unassigned_shards | Shards delayed to reduce reallocation overhead |
elasticsearch_cluster_health_initializing_shards | Count of shards that are being freshly created. |
elasticsearch_cluster_health_number_of_data_nodes | Number of data nodes in the cluster. |
elasticsearch_cluster_health_number_of_in_flight_fetch | The number of ongoing shard info requests. |
elasticsearch_cluster_health_number_of_nodes | Number of nodes in the cluster. |
elasticsearch_cluster_health_number_of_pending_tasks | Cluster level changes which have not yet been executed |
elasticsearch_cluster_health_task_max_waiting_in_queue_millis | Maximum time in milliseconds that a task has been waiting in the queue. |
elasticsearch_cluster_health_relocating_shards | The number of shards that are currently moving from one node to another node. |
elasticsearch_cluster_health_status | Whether all primary and replica shards are allocated. |
elasticsearch_cluster_health_timed_out | Number of cluster health checks that timed out |
elasticsearch_cluster_health_unassigned_shards | The number of shards that exist in the cluster state, but cannot be found in the cluster itself. |
elasticsearch_clustersettings_stats_max_shards_per_node | Current maximum number of shards per node setting. |
elasticsearch_clustersettings_allocation_threshold_enabled | Is disk allocation decider enabled. |
elasticsearch_clustersettings_allocation_watermark_flood_stage_bytes | Flood stage watermark in bytes. |
elasticsearch_clustersettings_allocation_watermark_high_bytes | High watermark for disk usage in bytes. |
elasticsearch_clustersettings_allocation_watermark_low_bytes | Low watermark for disk usage in bytes. |
elasticsearch_clustersettings_allocation_watermark_flood_stage_ratio | Flood stage watermark as a ratio. |
elasticsearch_clustersettings_allocation_watermark_high_ratio | High watermark for disk usage as a ratio. |
elasticsearch_clustersettings_allocation_watermark_low_ratio | Low watermark for disk usage as a ratio. |
elasticsearch_filesystem_data_available_bytes | Available space on block device in bytes |
elasticsearch_filesystem_data_free_bytes | Free space on block device in bytes |
elasticsearch_filesystem_data_size_bytes | Size of block device in bytes |
elasticsearch_filesystem_io_stats_device_operations_count | Count of disk operations |
elasticsearch_filesystem_io_stats_device_read_operations_count | Count of disk read operations |
elasticsearch_filesystem_io_stats_device_write_operations_count | Count of disk write operations |
elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | Total kilobytes read from disk |
elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | Total kilobytes written to disk |
elasticsearch_indices_active_queries | The number of currently active queries |
elasticsearch_indices_docs | Count of documents on this node |
elasticsearch_indices_docs_deleted | Count of deleted documents on this node |
elasticsearch_indices_deleted_docs_primary | Count of deleted documents with only primary shards |
elasticsearch_indices_docs_primary | Count of documents with only primary shards on all nodes |
elasticsearch_indices_docs_total | Count of documents with shards on all nodes |
elasticsearch_indices_fielddata_evictions | Evictions from field data |
elasticsearch_indices_fielddata_memory_size_bytes | Field data cache memory usage in bytes |
elasticsearch_indices_filter_cache_evictions | Evictions from filter cache |
elasticsearch_indices_filter_cache_memory_size_bytes | Filter cache memory usage in bytes |
elasticsearch_indices_flush_time_seconds | Cumulative flush time in seconds |
elasticsearch_indices_flush_total | Total flushes |
elasticsearch_indices_get_exists_time_seconds | Total time spent on get operations where the document existed, in seconds |
elasticsearch_indices_get_exists_total | Total get operations where the document existed |
elasticsearch_indices_get_missing_time_seconds | Total time spent on get operations where the document was missing, in seconds |
elasticsearch_indices_get_missing_total | Total get operations where the document was missing |
elasticsearch_indices_get_time_seconds | Total time spent on get operations in seconds |
elasticsearch_indices_get_total | Total get operations |
elasticsearch_indices_indexing_delete_time_seconds_total | Total time spent on indexing delete operations in seconds |
elasticsearch_indices_indexing_delete_total | Total indexing deletes |
elasticsearch_indices_index_current | The number of documents currently being indexed to an index |
elasticsearch_indices_indexing_index_time_seconds_total | Cumulative index time in seconds |
elasticsearch_indices_indexing_index_total | Total index calls |
elasticsearch_indices_mappings_stats_fields | Count of fields currently mapped by index |
elasticsearch_indices_mappings_stats_json_parse_failures_total | Number of errors while parsing JSON |
elasticsearch_indices_mappings_stats_scrapes_total | Current total Elasticsearch Indices Mappings scrapes |
elasticsearch_indices_mappings_stats_up | Was the last scrape of the Elasticsearch Indices Mappings endpoint successful |
elasticsearch_indices_merges_docs_total | Cumulative docs merged |
elasticsearch_indices_merges_total | Total merges |
elasticsearch_indices_merges_total_size_bytes_total | Total merge size in bytes |
elasticsearch_indices_merges_total_time_seconds_total | Total time spent merging in seconds |
elasticsearch_indices_query_cache_cache_total | Count of query cache |
elasticsearch_indices_query_cache_cache_size | Size of query cache |
elasticsearch_indices_query_cache_count | Count of query cache hit/miss |
elasticsearch_indices_query_cache_evictions | Evictions from query cache |
elasticsearch_indices_query_cache_memory_size_bytes | Query cache memory usage in bytes |
elasticsearch_indices_query_cache_total | Size of query cache total |
elasticsearch_indices_refresh_time_seconds_total | Total time spent refreshing in seconds |
elasticsearch_indices_refresh_total | Total refreshes |
elasticsearch_indices_request_cache_count | Count of request cache hit/miss |
elasticsearch_indices_request_cache_evictions | Evictions from request cache |
elasticsearch_indices_request_cache_memory_size_bytes | Request cache memory usage in bytes |
elasticsearch_indices_search_fetch_time_seconds | Total search fetch time in seconds |
elasticsearch_indices_search_fetch_total | Total number of fetches |
elasticsearch_indices_search_query_time_seconds | Total search query time in seconds |
elasticsearch_indices_search_query_total | Total number of queries |
elasticsearch_indices_segments_count | Count of index segments on this node |
elasticsearch_indices_segments_memory_bytes | Current memory size of segments in bytes |
elasticsearch_indices_settings_creation_timestamp_seconds | Timestamp of the index creation in seconds |
elasticsearch_indices_settings_stats_read_only_indices | Count of indices that have read_only_allow_delete=true |
elasticsearch_indices_settings_total_fields | Index setting value for index.mapping.total_fields.limit (total allowable mapped fields in an index) |
elasticsearch_indices_settings_replicas | Index setting value for index.number_of_replicas |
elasticsearch_indices_shards_docs | Count of documents on this shard |
elasticsearch_indices_shards_docs_deleted | Count of deleted documents on each shard |
elasticsearch_indices_store_size_bytes | Current size of stored index data in bytes |
elasticsearch_indices_store_size_bytes_primary | Current size of stored index data in bytes with only primary shards on all nodes |
elasticsearch_indices_store_size_bytes_total | Current size of stored index data in bytes with all shards on all nodes |
elasticsearch_indices_store_throttle_time_seconds_total | Throttle time for index store in seconds |
elasticsearch_indices_translog_operations | Total translog operations |
elasticsearch_indices_translog_size_in_bytes | Total translog size in bytes |
elasticsearch_indices_warmer_time_seconds_total | Total warmer time in seconds |
elasticsearch_indices_warmer_total | Total warmer count |
elasticsearch_jvm_gc_collection_seconds_count | Count of JVM GC runs |
elasticsearch_jvm_gc_collection_seconds_sum | GC run time in seconds |
elasticsearch_jvm_memory_committed_bytes | JVM memory currently committed by area |
elasticsearch_jvm_memory_max_bytes | JVM memory max |
elasticsearch_jvm_memory_used_bytes | JVM memory currently used by area |
elasticsearch_jvm_memory_pool_used_bytes | JVM memory currently used by pool |
elasticsearch_jvm_memory_pool_max_bytes | JVM memory max by pool |
elasticsearch_jvm_memory_pool_peak_used_bytes | JVM memory peak used by pool |
elasticsearch_jvm_memory_pool_peak_max_bytes | JVM memory peak max by pool |
elasticsearch_os_cpu_percent | Percent CPU used by the OS |
elasticsearch_os_load1 | Short-term (1-minute) load average |
elasticsearch_os_load5 | Mid-term (5-minute) load average |
elasticsearch_os_load15 | Long-term (15-minute) load average |
elasticsearch_process_cpu_percent | Percent CPU used by process |
elasticsearch_process_cpu_seconds_total | Process CPU time in seconds |
elasticsearch_process_mem_resident_size_bytes | Resident memory in use by process in bytes |
elasticsearch_process_mem_share_size_bytes | Shared memory in use by process in bytes |
elasticsearch_process_mem_virtual_size_bytes | Total virtual memory used in bytes |
elasticsearch_process_open_files_count | Open file descriptors |
elasticsearch_snapshot_stats_number_of_snapshots | Total number of snapshots |
elasticsearch_snapshot_stats_oldest_snapshot_timestamp | Oldest snapshot timestamp |
elasticsearch_snapshot_stats_snapshot_start_time_timestamp | Last snapshot start timestamp |
elasticsearch_snapshot_stats_latest_snapshot_timestamp_seconds | Timestamp of the latest SUCCESS or PARTIAL snapshot |
elasticsearch_snapshot_stats_snapshot_end_time_timestamp | Last snapshot end timestamp |
elasticsearch_snapshot_stats_snapshot_number_of_failures | Last snapshot number of failures |
elasticsearch_snapshot_stats_snapshot_number_of_indices | Last snapshot number of indices |
elasticsearch_snapshot_stats_snapshot_failed_shards | Last snapshot failed shards |
elasticsearch_snapshot_stats_snapshot_successful_shards | Last snapshot successful shards |
elasticsearch_snapshot_stats_snapshot_total_shard | Last snapshot total shards |
elasticsearch_thread_pool_active_count | Thread Pool threads active |
elasticsearch_thread_pool_completed_count | Thread Pool operations completed |
elasticsearch_thread_pool_largest_count | Thread Pool largest threads count |
elasticsearch_thread_pool_queue_count | Thread Pool operations queued |
elasticsearch_thread_pool_rejected_count | Thread Pool operations rejected |
elasticsearch_thread_pool_threads_count | Thread Pool current threads count |
elasticsearch_transport_rx_packets_total | Count of packets received |
elasticsearch_transport_rx_size_bytes_total | Total number of bytes received |
elasticsearch_transport_tx_packets_total | Count of packets sent |
elasticsearch_transport_tx_size_bytes_total | Total number of bytes sent |
elasticsearch_clusterinfo_last_retrieval_success_ts | Timestamp of the last successful cluster info retrieval |
elasticsearch_clusterinfo_up | Up metric for the cluster info collector |
elasticsearch_clusterinfo_version_info | Constant metric with ES version information as labels |
elasticsearch_slm_stats_up | Up metric for SLM collector |
elasticsearch_slm_stats_total_scrapes | Number of scrapes for SLM collector |
elasticsearch_slm_stats_json_parse_failures | JSON parse failures for SLM collector |
elasticsearch_slm_stats_retention_runs_total | Total retention runs |
elasticsearch_slm_stats_retention_failed_total | Total failed retention runs |
elasticsearch_slm_stats_retention_timed_out_total | Total retention run timeouts |
elasticsearch_slm_stats_retention_deletion_time_seconds | Retention run deletion time |
elasticsearch_slm_stats_total_snapshots_taken_total | Total snapshots taken |
elasticsearch_slm_stats_total_snapshots_failed_total | Total snapshots failed |
elasticsearch_slm_stats_total_snapshots_deleted_total | Total snapshots deleted |
elasticsearch_slm_stats_snapshots_taken_total | Snapshots taken by policy |
elasticsearch_slm_stats_snapshots_failed_total | Snapshots failed by policy |
elasticsearch_slm_stats_snapshots_deleted_total | Snapshots deleted by policy |
elasticsearch_slm_stats_snapshot_deletion_failures_total | Snapshot deletion failures by policy |
elasticsearch_slm_stats_operation_mode | SLM operation mode (running, stopping, stopped) |
elasticsearch_data_stream_stats_up | Up metric for Data Stream collection |
elasticsearch_data_stream_stats_total_scrapes | Total scrapes for Data Stream stats |
elasticsearch_data_stream_stats_json_parse_failures | Number of parsing failures for Data Stream stats |
elasticsearch_data_stream_backing_indices_total | Number of backing indices for Data Stream |
elasticsearch_data_stream_store_size_bytes | Current size of data stream backing indices in bytes |
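
As an illustration of reading one of the metrics in the table from the command line, the sketch below extracts the current cluster health color from the exporter's text output. It assumes that elasticsearch_cluster_health_status is reported as one series per color label (green, yellow, red) with the active color set to 1; check your own /metrics output before relying on this, since the format can differ between exporter versions. The function name es_health_color is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the active cluster health color, reading
# Prometheus text-format output on stdin.
# Assumed series format (verify against your exporter output):
#   elasticsearch_cluster_health_status{cluster="c1",color="green"} 1
es_health_color() {
    awk '
        # Only consider health status series whose value (last field) is 1.
        /^elasticsearch_cluster_health_status[{]/ && $NF == 1 {
            if (match($0, /color="[a-z]+"/)) {
                # Skip the 7-character label prefix and drop the closing quote.
                print substr($0, RSTART + 7, RLENGTH - 8)
            }
        }
    '
}
```

One possible invocation, with a port-forward to the exporter running: curl -s http://localhost:9114/metrics | es_health_color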