Version: 25.01.01

Metrics

Elasticsearch is a search and analytics engine for various data types. Monitoring its metrics is vital for maintaining performance, stability, and reliability. This page explains how to access Elasticsearch metrics in PDS and lists the metrics that are exposed, so that administrators can optimize performance, troubleshoot issues, and keep the Elasticsearch cluster running smoothly.

note

For Elasticsearch deployments, the data service metrics are accessible on port 9114.

Access metrics

Below is a step-by-step guide on how to access Elasticsearch metrics for PDS deployments:

  1. Identify the Elasticsearch pod running in your namespace:

    kubectl get pods -n <your-namespace>

    Look for the pod name that corresponds to your Elasticsearch instance or its sidecar exporter.

  2. Port-forward from your local machine’s port 9114 to the pod’s port 9114:

    kubectl port-forward -n <your-namespace> <elasticsearch-pod-name> 9114:9114

  3. Open http://localhost:9114/metrics in a browser, or request it with curl.

    You should see a text-based Prometheus metrics output specific to Elasticsearch.

  4. Alternatively, to reach the metrics through a Kubernetes Service instead of a port-forward, find the service that exposes the Elasticsearch exporter, for example, <release-name>-elasticsearch-exporter:

    kubectl get svc -n <your-namespace>

  5. Access the metrics according to the service type:

    • If the type is NodePort, note the assigned <nodeport>:

      http://<node-ip>:<nodeport>/metrics

    • If the type is LoadBalancer, note the <loadbalancer-ip>:

      http://<loadbalancer-ip>:9114/metrics

  6. Verify metrics:

    • Using curl:

      curl http://<host>:9114/metrics

      Replace <host> with localhost (if using port-forward) or <loadbalancer-ip> (LoadBalancer). For a NodePort service, use <node-ip> as the host and replace 9114 with <nodeport>.

    • Prometheus UI:

      In Prometheus, navigate to the Expression browser and search for metrics beginning with elasticsearch_ or similar Elasticsearch-related prefixes to confirm they are being scraped.

    • Grafana or other dashboards:

      If you have Grafana connected to Prometheus, open your dashboard. Check that Elasticsearch metrics (those starting with elasticsearch_) are being ingested and displayed.
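The curl check in step 6 can also be scripted. The following is a minimal Python sketch, assuming the port-forward from step 2 is active so that the exporter answers on http://localhost:9114/metrics. The function names fetch_metrics and metric_names are illustrative and are not part of PDS or the exporter.

```python
import urllib.request


def fetch_metrics(url="http://localhost:9114/metrics", timeout=5):
    """Download the raw Prometheus text output from the exporter."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def metric_names(text, prefix="elasticsearch_"):
    """Return the sorted, de-duplicated metric names that start with prefix."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first '{' (labels) or at whitespace (value).
        name = line.split("{", 1)[0].split()[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)
```

With the port-forward running, metric_names(fetch_metrics()) returns the Elasticsearch metric names that the endpoint currently exposes. An empty list suggests that the wrong pod or port was forwarded.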

note
  • Ensure that any NetworkPolicies or firewall rules allow inbound traffic on port 9114 if you plan to expose it externally.
  • Metrics naming conventions can vary depending on the Elasticsearch exporter version. Generally, look for prefixes like elasticsearch_.
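Because metric names can differ between exporter versions, it can help to compare a scrape against the names your dashboards or alerts rely on. The sketch below is one hedged way to do that in Python. It works on the text returned by the /metrics endpoint, and the helper names exposed_names and missing_metrics are hypothetical.

```python
def exposed_names(text):
    """Collect every metric name that appears on a sample line."""
    return {
        # The name ends at the first '{' (labels) or at whitespace (value).
        line.split("{", 1)[0].split()[0]
        for line in text.splitlines()
        # Ignore blank lines and '# HELP' / '# TYPE' comments.
        if line.strip() and not line.lstrip().startswith("#")
    }


def missing_metrics(text, expected):
    """Return the expected metric names that the scrape does not contain."""
    return sorted(set(expected) - exposed_names(text))
```

For example, passing the scrape text together with a list such as ["elasticsearch_os_load1", "elasticsearch_jvm_memory_used_bytes"] returns only the names that are absent, which shows quickly whether a dashboard query needs to be adjusted for the deployed exporter version.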

Elasticsearch metrics

| Metric name | Description |
| --- | --- |
| elasticsearch_breakers_estimated_size_bytes | Estimated size in bytes of breaker |
| elasticsearch_breakers_limit_size_bytes | Limit size in bytes for breaker |
| elasticsearch_breakers_tripped | Number of times the breaker has tripped |
| elasticsearch_cluster_health_active_primary_shards | The number of primary shards in your cluster. This is an aggregate total across all indices. |
| elasticsearch_cluster_health_active_shards | Aggregate total of all shards across all indices, which includes replica shards. |
| elasticsearch_cluster_health_delayed_unassigned_shards | Shards delayed to reduce reallocation overhead |
| elasticsearch_cluster_health_initializing_shards | Count of shards that are being freshly created. |
| elasticsearch_cluster_health_number_of_data_nodes | Number of data nodes in the cluster. |
| elasticsearch_cluster_health_number_of_in_flight_fetch | The number of ongoing shard info requests. |
| elasticsearch_cluster_health_number_of_nodes | Number of nodes in the cluster. |
| elasticsearch_cluster_health_number_of_pending_tasks | Cluster-level changes that have not yet been executed |
| elasticsearch_cluster_health_task_max_waiting_in_queue_millis | Maximum time in milliseconds that a task has been waiting in the queue. |
| elasticsearch_cluster_health_relocating_shards | The number of shards that are currently moving from one node to another node. |
| elasticsearch_cluster_health_status | Whether all primary and replica shards are allocated. |
| elasticsearch_cluster_health_timed_out | Number of cluster health checks timed out |
| elasticsearch_cluster_health_unassigned_shards | The number of shards that exist in the cluster state, but cannot be found in the cluster itself. |
| elasticsearch_clustersettings_stats_max_shards_per_node | Current maximum number of shards per node setting. |
| elasticsearch_clustersettings_allocation_threshold_enabled | Whether the disk allocation decider is enabled. |
| elasticsearch_clustersettings_allocation_watermark_flood_stage_bytes | Flood stage watermark in bytes. |
| elasticsearch_clustersettings_allocation_watermark_high_bytes | High watermark for disk usage in bytes. |
| elasticsearch_clustersettings_allocation_watermark_low_bytes | Low watermark for disk usage in bytes. |
| elasticsearch_clustersettings_allocation_watermark_flood_stage_ratio | Flood stage watermark as a ratio. |
| elasticsearch_clustersettings_allocation_watermark_high_ratio | High watermark for disk usage as a ratio. |
| elasticsearch_clustersettings_allocation_watermark_low_ratio | Low watermark for disk usage as a ratio. |
| elasticsearch_filesystem_data_available_bytes | Available space on block device in bytes |
| elasticsearch_filesystem_data_free_bytes | Free space on block device in bytes |
| elasticsearch_filesystem_data_size_bytes | Size of block device in bytes |
| elasticsearch_filesystem_io_stats_device_operations_count | Count of disk operations |
| elasticsearch_filesystem_io_stats_device_read_operations_count | Count of disk read operations |
| elasticsearch_filesystem_io_stats_device_write_operations_count | Count of disk write operations |
| elasticsearch_filesystem_io_stats_device_read_size_kilobytes_sum | Total kilobytes read from disk |
| elasticsearch_filesystem_io_stats_device_write_size_kilobytes_sum | Total kilobytes written to disk |
| elasticsearch_indices_active_queries | The number of currently active queries |
| elasticsearch_indices_docs | Count of documents on this node |
| elasticsearch_indices_docs_deleted | Count of deleted documents on this node |
| elasticsearch_indices_deleted_docs_primary | Count of deleted documents with only primary shards |
| elasticsearch_indices_docs_primary | Count of documents with only primary shards on all nodes |
| elasticsearch_indices_docs_total | Count of documents with shards on all nodes |
| elasticsearch_indices_fielddata_evictions | Evictions from field data |
| elasticsearch_indices_fielddata_memory_size_bytes | Field data cache memory usage in bytes |
| elasticsearch_indices_filter_cache_evictions | Evictions from filter cache |
| elasticsearch_indices_filter_cache_memory_size_bytes | Filter cache memory usage in bytes |
| elasticsearch_indices_flush_time_seconds | Cumulative flush time in seconds |
| elasticsearch_indices_flush_total | Total flushes |
| elasticsearch_indices_get_exists_time_seconds | Total time spent on get exists operations in seconds |
| elasticsearch_indices_get_exists_total | Total get exists operations |
| elasticsearch_indices_get_missing_time_seconds | Total time spent on get missing operations in seconds |
| elasticsearch_indices_get_missing_total | Total get missing operations |
| elasticsearch_indices_get_time_seconds | Total get time in seconds |
| elasticsearch_indices_get_total | Total get operations |
| elasticsearch_indices_indexing_delete_time_seconds_total | Total time spent on indexing deletes in seconds |
| elasticsearch_indices_indexing_delete_total | Total indexing deletes |
| elasticsearch_indices_index_current | The number of documents currently being indexed to an index |
| elasticsearch_indices_indexing_index_time_seconds_total | Cumulative index time in seconds |
| elasticsearch_indices_indexing_index_total | Total index calls |
| elasticsearch_indices_mappings_stats_fields | Count of fields currently mapped by index |
| elasticsearch_indices_mappings_stats_json_parse_failures_total | Number of errors while parsing JSON |
| elasticsearch_indices_mappings_stats_scrapes_total | Current total Elasticsearch Indices Mappings scrapes |
| elasticsearch_indices_mappings_stats_up | Whether the last scrape of the Elasticsearch Indices Mappings endpoint was successful |
| elasticsearch_indices_merges_docs_total | Cumulative docs merged |
| elasticsearch_indices_merges_total | Total merges |
| elasticsearch_indices_merges_total_size_bytes_total | Total merge size in bytes |
| elasticsearch_indices_merges_total_time_seconds_total | Total time spent merging in seconds |
| elasticsearch_indices_query_cache_cache_total | Count of query cache |
| elasticsearch_indices_query_cache_cache_size | Size of query cache |
| elasticsearch_indices_query_cache_count | Count of query cache hit/miss |
| elasticsearch_indices_query_cache_evictions | Evictions from query cache |
| elasticsearch_indices_query_cache_memory_size_bytes | Query cache memory usage in bytes |
| elasticsearch_indices_query_cache_total | Size of query cache total |
| elasticsearch_indices_refresh_time_seconds_total | Total time spent refreshing in seconds |
| elasticsearch_indices_refresh_total | Total refreshes |
| elasticsearch_indices_request_cache_count | Count of request cache hit/miss |
| elasticsearch_indices_request_cache_evictions | Evictions from request cache |
| elasticsearch_indices_request_cache_memory_size_bytes | Request cache memory usage in bytes |
| elasticsearch_indices_search_fetch_time_seconds | Total search fetch time in seconds |
| elasticsearch_indices_search_fetch_total | Total number of fetches |
| elasticsearch_indices_search_query_time_seconds | Total search query time in seconds |
| elasticsearch_indices_search_query_total | Total number of queries |
| elasticsearch_indices_segments_count | Count of index segments on this node |
| elasticsearch_indices_segments_memory_bytes | Current memory size of segments in bytes |
| elasticsearch_indices_settings_creation_timestamp_seconds | Timestamp of the index creation in seconds |
| elasticsearch_indices_settings_stats_read_only_indices | Count of indices that have read_only_allow_delete=true |
| elasticsearch_indices_settings_total_fields | Index setting value for index.mapping.total_fields.limit (total allowable mapped fields in an index) |
| elasticsearch_indices_settings_replicas | Index setting value for index.replicas |
| elasticsearch_indices_shards_docs | Count of documents on this shard |
| elasticsearch_indices_shards_docs_deleted | Count of deleted documents on each shard |
| elasticsearch_indices_store_size_bytes | Current size of stored index data in bytes |
| elasticsearch_indices_store_size_bytes_primary | Current size of stored index data in bytes with only primary shards on all nodes |
| elasticsearch_indices_store_size_bytes_total | Current size of stored index data in bytes with all shards on all nodes |
| elasticsearch_indices_store_throttle_time_seconds_total | Throttle time for index store in seconds |
| elasticsearch_indices_translog_operations | Total translog operations |
| elasticsearch_indices_translog_size_in_bytes | Total translog size in bytes |
| elasticsearch_indices_warmer_time_seconds_total | Total warmer time in seconds |
| elasticsearch_indices_warmer_total | Total warmer count |
| elasticsearch_jvm_gc_collection_seconds_count | Count of JVM GC runs |
| elasticsearch_jvm_gc_collection_seconds_sum | GC run time in seconds |
| elasticsearch_jvm_memory_committed_bytes | JVM memory currently committed by area |
| elasticsearch_jvm_memory_max_bytes | JVM memory max |
| elasticsearch_jvm_memory_used_bytes | JVM memory currently used by area |
| elasticsearch_jvm_memory_pool_used_bytes | JVM memory currently used by pool |
| elasticsearch_jvm_memory_pool_max_bytes | JVM memory max by pool |
| elasticsearch_jvm_memory_pool_peak_used_bytes | JVM memory peak used by pool |
| elasticsearch_jvm_memory_pool_peak_max_bytes | JVM memory peak max by pool |
| elasticsearch_os_cpu_percent | Percent CPU used by the OS |
| elasticsearch_os_load1 | Short-term (1-minute) load average |
| elasticsearch_os_load5 | Mid-term (5-minute) load average |
| elasticsearch_os_load15 | Long-term (15-minute) load average |
| elasticsearch_process_cpu_percent | Percent CPU used by process |
| elasticsearch_process_cpu_seconds_total | Process CPU time in seconds |
| elasticsearch_process_mem_resident_size_bytes | Resident memory in use by process in bytes |
| elasticsearch_process_mem_share_size_bytes | Shared memory in use by process in bytes |
| elasticsearch_process_mem_virtual_size_bytes | Total virtual memory used in bytes |
| elasticsearch_process_open_files_count | Open file descriptors |
| elasticsearch_snapshot_stats_number_of_snapshots | Total number of snapshots |
| elasticsearch_snapshot_stats_oldest_snapshot_timestamp | Oldest snapshot timestamp |
| elasticsearch_snapshot_stats_snapshot_start_time_timestamp | Last snapshot start timestamp |
| elasticsearch_snapshot_stats_latest_snapshot_timestamp_seconds | Timestamp of the latest SUCCESS or PARTIAL snapshot |
| elasticsearch_snapshot_stats_snapshot_end_time_timestamp | Last snapshot end timestamp |
| elasticsearch_snapshot_stats_snapshot_number_of_failures | Last snapshot number of failures |
| elasticsearch_snapshot_stats_snapshot_number_of_indices | Last snapshot number of indices |
| elasticsearch_snapshot_stats_snapshot_failed_shards | Last snapshot failed shards |
| elasticsearch_snapshot_stats_snapshot_successful_shards | Last snapshot successful shards |
| elasticsearch_snapshot_stats_snapshot_total_shard | Last snapshot total shards |
| elasticsearch_thread_pool_active_count | Thread pool threads active |
| elasticsearch_thread_pool_completed_count | Thread pool operations completed |
| elasticsearch_thread_pool_largest_count | Thread pool largest threads count |
| elasticsearch_thread_pool_queue_count | Thread pool operations queued |
| elasticsearch_thread_pool_rejected_count | Thread pool operations rejected |
| elasticsearch_thread_pool_threads_count | Thread pool current threads count |
| elasticsearch_transport_rx_packets_total | Count of packets received |
| elasticsearch_transport_rx_size_bytes_total | Total number of bytes received |
| elasticsearch_transport_tx_packets_total | Count of packets sent |
| elasticsearch_transport_tx_size_bytes_total | Total number of bytes sent |
| elasticsearch_clusterinfo_last_retrieval_success_ts | Timestamp of the last successful cluster info retrieval |
| elasticsearch_clusterinfo_up | Up metric for the cluster info collector |
| elasticsearch_clusterinfo_version_info | Constant metric with ES version information as labels |
| elasticsearch_slm_stats_up | Up metric for SLM collector |
| elasticsearch_slm_stats_total_scrapes | Number of scrapes for SLM collector |
| elasticsearch_slm_stats_json_parse_failures | JSON parse failures for SLM collector |
| elasticsearch_slm_stats_retention_runs_total | Total retention runs |
| elasticsearch_slm_stats_retention_failed_total | Total failed retention runs |
| elasticsearch_slm_stats_retention_timed_out_total | Total retention run timeouts |
| elasticsearch_slm_stats_retention_deletion_time_seconds | Retention run deletion time |
| elasticsearch_slm_stats_total_snapshots_taken_total | Total snapshots taken |
| elasticsearch_slm_stats_total_snapshots_failed_total | Total snapshots failed |
| elasticsearch_slm_stats_total_snapshots_deleted_total | Total snapshots deleted |
| elasticsearch_slm_stats_snapshots_taken_total | Snapshots taken by policy |
| elasticsearch_slm_stats_snapshots_failed_total | Snapshots failed by policy |
| elasticsearch_slm_stats_snapshots_deleted_total | Snapshots deleted by policy |
| elasticsearch_slm_stats_snapshot_deletion_failures_total | Snapshot deletion failures by policy |
| elasticsearch_slm_stats_operation_mode | SLM operation mode (running, stopping, stopped) |
| elasticsearch_data_stream_stats_up | Up metric for Data Stream collection |
| elasticsearch_data_stream_stats_total_scrapes | Total scrapes for Data Stream stats |
| elasticsearch_data_stream_stats_json_parse_failures | Number of parsing failures for Data Stream stats |
| elasticsearch_data_stream_backing_indices_total | Number of backing indices for Data Stream |
| elasticsearch_data_stream_store_size_bytes | Current size of data stream backing indices in bytes |
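Several of the metrics listed above are most useful in combination. As an illustration only, the following Python sketch parses the text served at /metrics and derives two signals: the current cluster health color and the JVM heap usage ratio. It assumes that elasticsearch_cluster_health_status carries a color label whose active sample has the value 1, and that the JVM memory metrics carry an area label with the value heap. Both label names are assumptions that you should confirm against the output of your exporter version. The function names are hypothetical.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text output into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comments.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            labels = dict(_LABEL.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


def cluster_status(samples):
    """Return the color whose health status sample equals 1 (assumed label: color)."""
    for name, labels, value in samples:
        if name == "elasticsearch_cluster_health_status" and value == 1:
            return labels.get("color")
    return None


def heap_usage_ratio(samples):
    """Return used/max JVM heap, summed over all samples (assumed label: area)."""
    used = sum(v for n, l, v in samples
               if n == "elasticsearch_jvm_memory_used_bytes" and l.get("area") == "heap")
    limit = sum(v for n, l, v in samples
                if n == "elasticsearch_jvm_memory_max_bytes" and l.get("area") == "heap")
    return used / limit if limit else None
```

Summing heap values across all samples gives a cluster-wide ratio. Grouping by a per-node label instead would show which node is under memory pressure, at the cost of slightly more code.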