Metrics
Kafka is a distributed event streaming platform capable of handling a large number of events per day, enabling real-time data processing and integration across various applications and systems. Monitoring its metrics is crucial for ensuring performance, stability, and reliability. The following is a list of key Kafka metrics in PDS. Understanding these metrics will help administrators optimize performance, troubleshoot issues, and ensure the Kafka cluster runs smoothly.
For Kafka deployments, the data service metrics are accessible on port 5555.
Access metrics
Below is a step-by-step guide on how to access Kafka metrics for PDS deployments:
- Identify the Kafka pod running in your namespace:
  kubectl get pods -n <your-namespace>
  Look for the pod name that corresponds to your Kafka instance or its sidecar exporter.
- Port-forward from your local machine’s port 5555 to the pod’s port 5555:
  kubectl port-forward -n <your-namespace> <kafka-pod-name> 5555:5555
- Open a browser or use curl to go to http://localhost:5555/metrics. You should see a text-based Prometheus metrics output specific to Kafka.
- Check for the service exposing the Kafka exporter, for example, <release-name>-kafka-exporter:
  kubectl get svc -n <your-namespace>
- Access the metrics:
  - If the service type is NodePort, note the <nodeport> value:
    http://<node-ip>:<nodeport>/metrics
  - If the service type is LoadBalancer, note the <loadbalancer-ip> value:
    http://<loadbalancer-ip>:5555/metrics
Verify metrics:
-
Using curl:
curl http://<host>:5555/metrics
Replace
<host>
with either localhost (if using port-forward),<node-ip>
(NodePort), or<loadbalancer-ip>
(LoadBalancer). -
Prometheus UI:
In Prometheus, navigate to the Expression browser and search for metrics beginning with
kafka_
or similar Kafka-related prefixes to confirm they are being scraped. -
Grafana or other dashboards:
If you have Grafana connected to Prometheus, open your dashboard. Check that Kafka metrics (those starting with
kafka_
) are being ingested and displayed.
-
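The access and verification steps can be combined into a small shell helper. This is a minimal sketch, not part of PDS: the function names metrics_url, count_kafka_samples, and verify_kafka_metrics are hypothetical, and it assumes curl and grep are installed and that the endpoint is already reachable (for example, through a port-forward running in another terminal).

```shell
# Illustrative helpers; the names are assumptions, not PDS tooling.

metrics_url() {
  # Usage: metrics_url <host> [port]; port defaults to 5555.
  # For a NodePort service, pass the node IP and the node port.
  local host="$1" port="${2:-5555}"
  printf 'http://%s:%s/metrics\n' "$host" "$port"
}

count_kafka_samples() {
  # Reads Prometheus text-format output on stdin and prints the number of
  # sample lines whose metric name starts with kafka_. Comment lines such
  # as "# HELP" and "# TYPE" start with "#" and are not counted.
  grep -c '^kafka_'
}

verify_kafka_metrics() {
  # Usage: verify_kafka_metrics <host> [port]
  # Fetches the endpoint and succeeds only if at least one kafka_ sample exists.
  local url count
  url="$(metrics_url "$@")"
  # -f: fail on HTTP errors; -sS: stay quiet but still report errors.
  count="$(curl -fsS "$url" | count_kafka_samples)"
  echo "$url: $count kafka_ samples"
  [ "${count:-0}" -gt 0 ]
}
```

With a port-forward active, `verify_kafka_metrics localhost` checks the local endpoint. For a NodePort service, `verify_kafka_metrics <node-ip> <nodeport>` does the same against the node.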
- Ensure that any NetworkPolicies or firewall rules allow inbound traffic on port 5555 if you plan to expose it externally.
- Metrics naming conventions can vary depending on the Kafka exporter version. Generally, look for prefixes like kafka_.
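Because prefixes can vary, it may help to tally which ones your endpoint actually emits. The following is a rough sketch under the assumption that the output is in the Prometheus text format. The function name list_metric_prefixes is hypothetical, and "prefix" here simply means the part of the metric name before the first underscore.

```shell
# Illustrative helper; the name is an assumption, not PDS tooling.

list_metric_prefixes() {
  # Reads Prometheus text-format output on stdin and prints one
  # "<prefix> <sample-count>" line per distinct prefix, sorted by prefix.
  awk '
    /^#/ || NF == 0 { next }        # skip comment and blank lines
    {
      name = $1
      sub(/[{].*/, "", name)        # drop the label set, if any
      split(name, parts, "_")       # prefix = text before the first "_"
      count[parts[1]]++
    }
    END { for (p in count) print p, count[p] }
  ' | sort
}
```

For example, `curl -fsS http://localhost:5555/metrics | list_metric_prefixes` prints one line per prefix found at the port-forwarded endpoint.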
Kafka metrics
Metric name | Type |
---|---|
jmx_config_reload_failure_total | counter |
jmx_config_reload_success_total | counter |
jmx_exporter_build_info | gauge |
jmx_scrape_cached_beans | gauge |
jmx_scrape_duration_seconds | gauge |
jmx_scrape_error | gauge |
jvm_buffer_pool_capacity_bytes | gauge |
jvm_buffer_pool_used_buffers | gauge |
jvm_buffer_pool_used_bytes | gauge |
jvm_classes_currently_loaded | gauge |
jvm_classes_loaded_total | counter |
jvm_classes_unloaded_total | counter |
jvm_compilation_time_seconds_total | counter |
jvm_gc_collection_seconds | summary |
jvm_memory_committed_bytes | gauge |
jvm_memory_init_bytes | gauge |
jvm_memory_max_bytes | gauge |
jvm_memory_objects_pending_finalization | gauge |
jvm_memory_pool_allocated_bytes_total | counter |
jvm_memory_pool_collection_committed_bytes | gauge |
jvm_memory_pool_collection_init_bytes | gauge |
jvm_memory_pool_collection_max_bytes | gauge |
jvm_memory_pool_collection_used_bytes | gauge |
jvm_memory_pool_committed_bytes | gauge |
jvm_memory_pool_init_bytes | gauge |
jvm_memory_pool_max_bytes | gauge |
jvm_memory_pool_used_bytes | gauge |
jvm_memory_used_bytes | gauge |
jvm_runtime_info | gauge |
jvm_threads_current | gauge |
jvm_threads_daemon | gauge |
jvm_threads_deadlocked | gauge |
jvm_threads_deadlocked_monitor | gauge |
jvm_threads_peak | gauge |
jvm_threads_started_total | counter |
jvm_threads_state | gauge |
kafka_controller_kafkacontroller_activebrokercount | gauge |
kafka_controller_kafkacontroller_activecontrollercount | gauge |
kafka_controller_kafkacontroller_eventqueueoperationsstartedcount | gauge |
kafka_controller_kafkacontroller_eventqueueoperationstimedoutcount | gauge |
kafka_controller_kafkacontroller_fencedbrokercount | gauge |
kafka_controller_kafkacontroller_globalpartitioncount | gauge |
kafka_controller_kafkacontroller_globaltopiccount | gauge |
kafka_controller_kafkacontroller_lastappliedrecordlagms | gauge |
kafka_controller_kafkacontroller_lastappliedrecordoffset | gauge |
kafka_controller_kafkacontroller_lastappliedrecordtimestamp | gauge |
kafka_controller_kafkacontroller_lastcommittedrecordoffset | gauge |
kafka_controller_kafkacontroller_metadataerrorcount | gauge |
kafka_controller_kafkacontroller_migratingzkbrokercount | gauge |
kafka_controller_kafkacontroller_newactivecontrollerscount | gauge |
kafka_controller_kafkacontroller_offlinepartitionscount | gauge |
kafka_controller_kafkacontroller_preferredreplicaimbalancecount | gauge |
kafka_controller_kafkacontroller_timedoutbrokerheartbeatcount | gauge |
kafka_controller_kafkacontroller_zkmigrationstate | gauge |
kafka_log_log_logendoffset | gauge |
kafka_log_log_logstartoffset | gauge |
kafka_log_log_numlogsegments | gauge |
kafka_log_log_size | gauge |
kafka_log_logcleaner_cleaner_recopy_percent | gauge |
kafka_log_logcleaner_deadthreadcount | gauge |
kafka_log_logcleaner_max_buffer_utilization_percent | gauge |
kafka_log_logcleaner_max_clean_time_secs | gauge |
kafka_log_logcleaner_max_compaction_delay_secs | gauge |
kafka_log_logcleanermanager_max_dirty_percent | gauge |
kafka_log_logcleanermanager_time_since_last_run_ms | gauge |
kafka_log_logcleanermanager_uncleanable_bytes | gauge |
kafka_log_logcleanermanager_uncleanable_partitions_count | gauge |
kafka_log_logmanager_logdirectoryoffline | gauge |
kafka_log_logmanager_offlinelogdirectorycount | gauge |
kafka_network_processor_idlepercent | gauge |
kafka_network_requestchannel_requestqueuesize | gauge |
kafka_network_requestchannel_responsequeuesize | gauge |
kafka_network_requestmetrics_errors_total | counter |
kafka_network_requestmetrics_requests_total | counter |
kafka_network_socketserver_expiredconnectionskilledcount | gauge |
kafka_network_socketserver_memorypoolavailable | gauge |
kafka_network_socketserver_memorypoolused | gauge |
kafka_network_socketserver_networkprocessoravgidlepercent | gauge |
kafka_server_assignmentsmanager_queuedreplicatodirassignments | gauge |
kafka_server_brokertopicmetrics_bytesin_total | counter |
kafka_server_brokertopicmetrics_bytesout_total | counter |
kafka_server_brokertopicmetrics_bytesrejected_total | counter |
kafka_server_brokertopicmetrics_failedfetchrequests_total | counter |
kafka_server_brokertopicmetrics_failedproducerequests_total | counter |
kafka_server_brokertopicmetrics_fetchmessageconversions_total | counter |
kafka_server_brokertopicmetrics_invalidmagicnumberrecords_total | counter |
kafka_server_brokertopicmetrics_invalidmessagecrcrecords_total | counter |
kafka_server_brokertopicmetrics_invalidoffsetorsequencerecords_total | counter |
kafka_server_brokertopicmetrics_messagesin_total | counter |
kafka_server_brokertopicmetrics_nokeycompactedtopicrecords_total | counter |
kafka_server_brokertopicmetrics_producemessageconversions_total | counter |
kafka_server_brokertopicmetrics_reassignmentbytesin_total | counter |
kafka_server_brokertopicmetrics_reassignmentbytesout_total | counter |
kafka_server_brokertopicmetrics_replicationbytesin_total | counter |
kafka_server_brokertopicmetrics_replicationbytesout_total | counter |
kafka_server_brokertopicmetrics_totalfetchrequests_total | counter |
kafka_server_brokertopicmetrics_totalproducerequests_total | counter |
kafka_server_controllerserver_linux_disk_read_bytes | gauge |
kafka_server_controllerserver_linux_disk_write_bytes | gauge |
kafka_server_controllerserver_yammer_metrics_count | gauge |
kafka_server_delayedoperationpurgatory_numdelayedoperations | gauge |
kafka_server_delayedoperationpurgatory_purgatorysize | gauge |
kafka_server_fetchsessioncache_incrementalfetchsessionevictions_total | counter |
kafka_server_fetchsessioncache_numincrementalfetchpartitionscached | gauge |
kafka_server_fetchsessioncache_numincrementalfetchsessions | gauge |
kafka_server_kafkaserver_brokerstate | gauge |
kafka_server_kafkaserver_linux_disk_read_bytes | gauge |
kafka_server_kafkaserver_linux_disk_write_bytes | gauge |
kafka_server_kafkaserver_yammer_metrics_count | gauge |
kafka_server_metadataloader_currentcontrollerid | gauge |
kafka_server_metadataloader_currentmetadataversion | gauge |
kafka_server_metadataloader_handleloadsnapshotcount | gauge |
kafka_server_replicaalterlogdirsmanager_deadthreadcount | gauge |
kafka_server_replicaalterlogdirsmanager_failedpartitionscount | gauge |
kafka_server_replicaalterlogdirsmanager_maxlag | gauge |
kafka_server_replicaalterlogdirsmanager_minfetchrate | gauge |
kafka_server_replicafetchermanager_deadthreadcount | gauge |
kafka_server_replicafetchermanager_failedpartitionscount | gauge |
kafka_server_replicafetchermanager_maxlag | gauge |
kafka_server_replicafetchermanager_minfetchrate | gauge |
kafka_server_replicamanager_atminisrpartitioncount | gauge |
kafka_server_replicamanager_failedisrupdates_total | counter |
kafka_server_replicamanager_isrexpands_total | counter |
kafka_server_replicamanager_isrshrinks_total | counter |
kafka_server_replicamanager_leadercount | gauge |
kafka_server_replicamanager_offlinereplicacount | gauge |
kafka_server_replicamanager_partitioncount | gauge |
kafka_server_replicamanager_partitionswithlatetransactionscount | gauge |
kafka_server_replicamanager_produceridcount | gauge |
kafka_server_replicamanager_reassigningpartitions | gauge |
kafka_server_replicamanager_underminisrpartitioncount | gauge |
kafka_server_replicamanager_underreplicatedpartitions | gauge |
kafka_server_snapshotemitter_latestsnapshotgeneratedagems | gauge |
kafka_server_snapshotemitter_latestsnapshotgeneratedbytes | gauge |
process_cpu_seconds_total | counter |
process_max_fds | gauge |
process_open_fds | gauge |
process_resident_memory_bytes | gauge |
process_start_time_seconds | gauge |
process_virtual_memory_bytes | gauge |
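The Type column corresponds to the "# TYPE" comment lines in the Prometheus text format. As a hedged sketch, a name and type listing for your own deployment could be rebuilt from the endpoint as shown below. The function name metric_types_table is hypothetical, and the result depends on which metrics your exporter version emits, so it may not match the table exactly.

```shell
# Illustrative helper; the name is an assumption, not PDS tooling.

metric_types_table() {
  # Reads Prometheus text-format output on stdin and prints one
  # "<metric name> | <type> |" row per "# TYPE <name> <type>" line,
  # sorted by metric name.
  awk '$1 == "#" && $2 == "TYPE" { print $3 " | " $4 " |" }' | sort
}
```

For example, `curl -fsS http://localhost:5555/metrics | metric_types_table` lists the rows for the port-forwarded endpoint.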