Metrics
Kafka is a distributed event streaming platform that can handle very high volumes of events, enabling real-time data processing and integration across applications and systems. Monitoring its metrics is crucial for ensuring performance, stability, and reliability. The tables below list the key Kafka metrics available in PDS. Understanding these metrics helps administrators optimize performance, troubleshoot issues, and keep the Kafka cluster running smoothly.
Broker metrics
Description | MBean name | Normal value or notes |
---|---|---|
Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=([-.\w]+) | Incoming message rate per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Byte in rate from clients | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic=([-.\w]+) | Rate of bytes received from clients, per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Byte in rate from other brokers | kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec | Rate of bytes received from other brokers (replication traffic), across all topics. |
Controller request rate from broker | kafka.controller:type=ControllerChannelManager,name=RequestRateAndQueueTimeMs,brokerId=([0-9]+) | The rate (requests per second) at which the ControllerChannelManager takes requests from the queue of the given broker, and the time a request stays in this queue before it is taken. |
Controller event queue size | kafka.controller:type=ControllerEventManager,name=EventQueueSize | Size of the ControllerEventManager's queue. |
Controller event queue time | kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs | Time that any event (except the Idle event) waits in the ControllerEventManager's queue before being processed. |
Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower},version=([0-9]+) | |
Error rate | kafka.network:type=RequestMetrics,name=ErrorsPerSec,request=([-.\w]+),error=([-.\w]+) | Number of errors in responses counted per-request-type, per-error-code. If a response contains multiple errors, all are counted. error=NONE indicates successful responses. |
Produce request rate | kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=([-.\w]+) | Produce request rate per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Fetch request rate | kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec,topic=([-.\w]+) | Rate of fetch requests (from clients or followers) per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Failed produce request rate | kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec,topic=([-.\w]+) | Rate of failed produce requests per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Failed fetch request rate | kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec,topic=([-.\w]+) | Rate of failed fetch requests (from clients or followers) per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Request size in bytes | kafka.network:type=RequestMetrics,name=RequestBytes,request=([-.\w]+) | Size of requests for each request type. |
Temporary memory size in bytes | kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request={Produce|Fetch} | Temporary memory used for message format conversions and decompression. |
Message conversion time | kafka.network:type=RequestMetrics,name=MessageConversionsTimeMs,request={Produce|Fetch} | Time in milliseconds spent on message format conversions. |
Message conversion rate | kafka.server:type=BrokerTopicMetrics,name={Produce|Fetch}MessageConversionsPerSec,topic=([-.\w]+) | Message format conversion rate, for Produce or Fetch requests, per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Request Queue Size | kafka.network:type=RequestChannel,name=RequestQueueSize | Size of the request queue. |
Byte out rate to clients | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=([-.\w]+) | Rate of bytes sent to clients, per topic. Omitting 'topic=(...)' will yield the all-topic rate. |
Byte out rate to other brokers | kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesOutPerSec | Rate of bytes sent to other brokers (replication traffic), across all topics. |
Rejected byte rate | kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=([-.\w]+) | Rate of bytes rejected per topic because the record batch size exceeded the max.message.bytes configuration. Omitting 'topic=(...)' will yield the all-topic rate. |
Message validation failure rate due to no key specified for compacted topic | kafka.server:type=BrokerTopicMetrics,name=NoKeyCompactedTopicRecordsPerSec | 0 |
Message validation failure rate due to invalid magic number | kafka.server:type=BrokerTopicMetrics,name=InvalidMagicNumberRecordsPerSec | 0 |
Message validation failure rate due to incorrect crc checksum | kafka.server:type=BrokerTopicMetrics,name=InvalidMessageCrcRecordsPerSec | 0 |
Message validation failure rate due to non-continuous offset or sequence number in batch | kafka.server:type=BrokerTopicMetrics,name=InvalidOffsetOrSequenceRecordsPerSec | 0 |
Log flush rate and time | kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | |
# of offline log directories | kafka.log:type=LogManager,name=OfflineLogDirectoryCount | 0 |
Leader election rate | kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs | non-zero when there are broker failures |
Unclean leader election rate | kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec | 0 |
Is controller active on broker | kafka.controller:type=KafkaController,name=ActiveControllerCount | only one broker in the cluster should have 1 |
Pending topic deletes | kafka.controller:type=KafkaController,name=TopicsToDeleteCount | |
Pending replica deletes | kafka.controller:type=KafkaController,name=ReplicasToDeleteCount | |
Ineligible pending topic deletes | kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount | |
Ineligible pending replica deletes | kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount | |
# of under replicated partitions (|ISR| < |all replicas|) | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | 0 |
# of under minIsr partitions (|ISR| < min.insync.replicas) | kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount | 0 |
# of at minIsr partitions (|ISR| = min.insync.replicas) | kafka.server:type=ReplicaManager,name=AtMinIsrPartitionCount | 0 |
Producer Id counts | kafka.server:type=ReplicaManager,name=ProducerIdCount | Count of all producer ids created by transactional and idempotent producers in each replica on the broker |
Partition counts | kafka.server:type=ReplicaManager,name=PartitionCount | mostly even across brokers |
Offline Replica counts | kafka.server:type=ReplicaManager,name=OfflineReplicaCount | 0 |
Leader replica counts | kafka.server:type=ReplicaManager,name=LeaderCount | mostly even across brokers |
ISR shrink rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | If a broker goes down, ISR for some of the partitions will shrink. When that broker is up again, ISR will be expanded once the replicas are fully caught up. Other than that, the expected value for both ISR shrink rate and expansion rate is 0. |
ISR expansion rate | kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | See above |
Failed ISR update rate | kafka.server:type=ReplicaManager,name=FailedIsrUpdatesPerSec | 0 |
Max lag in messages between follower and leader replicas | kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica | lag should be proportional to the maximum batch size of a produce request. |
Lag in messages per follower replica | kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) | lag should be proportional to the maximum batch size of a produce request. |
Requests waiting in the producer purgatory | kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce | non-zero if acks=-1 (acks=all) is used |
Requests waiting in the fetch purgatory | kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch | size depends on fetch.max.wait.ms in the consumer |
Request total time | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower} | broken into queue, local, remote and response send time |
Time the request waits in the request queue | kafka.network:type=RequestMetrics,name=RequestQueueTimeMs,request={Produce|FetchConsumer|FetchFollower} | |
Time the request is processed at the leader | kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower} | |
Time the request waits for the follower | kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower} | non-zero for produce requests when acks=-1 |
Time the request waits in the response queue | kafka.network:type=RequestMetrics,name=ResponseQueueTimeMs,request={Produce|FetchConsumer|FetchFollower} | |
Time to send the response | kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower} | |
Number of messages by which the consumer lags behind the producer. Published by the consumer, not the broker. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id={client-id} Attribute: records-lag-max | |
The average fraction of time the network processors are idle | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | between 0 and 1, ideally > 0.3 |
The number of connections disconnected on a processor due to a client not re-authenticating and then using the connection beyond its expiration time for anything other than re-authentication | kafka.server:type=socket-server-metrics,listener=[SASL_PLAINTEXT|SASL_SSL],networkProcessor=<#>,name=expired-connections-killed-count | ideally 0 when re-authentication is enabled, implying there are no longer any older, pre-2.2.0 clients connecting to this (listener, processor) combination |
The total number of connections disconnected, across all processors, due to a client not re-authenticating and then using the connection beyond its expiration time for anything other than re-authentication | kafka.network:type=SocketServer,name=ExpiredConnectionsKilledCount | ideally 0 when re-authentication is enabled, implying there are no longer any older, pre-2.2.0 clients connecting to this broker |
The average fraction of time the request handler threads are idle | kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | between 0 and 1, ideally > 0.3 |
Bandwidth quota metrics per (user, client-id), user or client-id | kafka.server:type={Produce|Fetch},user=([-.\w]+),client-id=([-.\w]+) | Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. byte-rate indicates the data produce/consume rate of the client in bytes/sec. For (user, client-id) quotas, both user and client-id are specified. If per-client-id quota is applied to the client, user is not specified. If per-user quota is applied, client-id is not specified. |
Request quota metrics per (user, client-id), user or client-id | kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+) | Two attributes. throttle-time indicates the amount of time in ms the client was throttled. Ideally = 0. request-time indicates the percentage of time spent in broker network and I/O threads to process requests from client group. For (user, client-id) quotas, both user and client-id are specified. If per-client-id quota is applied to the client, user is not specified. If per-user quota is applied, client-id is not specified. |
Requests exempt from throttling | kafka.server:type=Request | exempt-throttle-time indicates the percentage of time spent in broker network and I/O threads to process requests that are exempt from throttling. |
Max time to load group metadata | kafka.server:type=group-coordinator-metrics,name=partition-load-time-max | maximum time, in milliseconds, it took to load offsets and group metadata from the consumer offset partitions loaded in the last 30 seconds (including time spent waiting for the loading task to be scheduled) |
Avg time to load group metadata | kafka.server:type=group-coordinator-metrics,name=partition-load-time-avg | average time, in milliseconds, it took to load offsets and group metadata from the consumer offset partitions loaded in the last 30 seconds (including time spent waiting for the loading task to be scheduled) |
Max time to load transaction metadata | kafka.server:type=transaction-coordinator-metrics,name=partition-load-time-max | maximum time, in milliseconds, it took to load transaction metadata from the transaction state log partitions (__transaction_state) loaded in the last 30 seconds (including time spent waiting for the loading task to be scheduled) |
Avg time to load transaction metadata | kafka.server:type=transaction-coordinator-metrics,name=partition-load-time-avg | average time, in milliseconds, it took to load transaction metadata from the transaction state log partitions (__transaction_state) loaded in the last 30 seconds (including time spent waiting for the loading task to be scheduled) |
Rate of transactional verification errors | kafka.server:type=AddPartitionsToTxnManager,name=VerificationFailureRate | Rate of verifications that failed, either in the AddPartitionsToTxn API response or through errors in the AddPartitionsToTxnManager. In steady state 0, but transient errors are expected during rolls and reassignments of the transactional state partition. |
Time to verify a transactional request | kafka.server:type=AddPartitionsToTxnManager,name=VerificationTimeMs | The amount of time queueing while a possible previous request is in-flight plus the round trip to the transaction coordinator to verify (or not verify) |
Consumer Group Offset Count | kafka.server:type=GroupMetadataManager,name=NumOffsets | Total number of committed offsets for Consumer Groups |
Consumer Group Count | kafka.server:type=GroupMetadataManager,name=NumGroups | Total number of Consumer Groups |
Consumer Group Count, per State | kafka.server:type=GroupMetadataManager,name=NumGroups[PreparingRebalance,CompletingRebalance,Empty,Stable,Dead] | The number of Consumer Groups in each state: PreparingRebalance, CompletingRebalance, Empty, Stable, Dead |
Number of reassigning partitions | kafka.server:type=ReplicaManager,name=ReassigningPartitions | The number of reassigning leader partitions on a broker. |
Outgoing byte rate of reassignment traffic | kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesOutPerSec | 0; non-zero when a partition reassignment is in progress. |
Incoming byte rate of reassignment traffic | kafka.server:type=BrokerTopicMetrics,name=ReassignmentBytesInPerSec | 0; non-zero when a partition reassignment is in progress. |
Size of a partition on disk (in bytes) | kafka.log:type=Log,name=Size,topic=([-.\w]+),partition=([0-9]+) | The size of a partition on disk, measured in bytes. |
Number of log segments in a partition | kafka.log:type=Log,name=NumLogSegments,topic=([-.\w]+),partition=([0-9]+) | The number of log segments in a partition. |
First offset in a partition | kafka.log:type=Log,name=LogStartOffset,topic=([-.\w]+),partition=([0-9]+) | The first offset in a partition. |
Last offset in a partition | kafka.log:type=Log,name=LogEndOffset,topic=([-.\w]+),partition=([0-9]+) | The last offset in a partition. |
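The per-topic MBean names above follow a single pattern, and several rows have a fixed normal value. The Python sketch below illustrates both under stated assumptions: the helper names (`broker_topic_mbean`, `broker_alerts`, `single_active_controller`) are hypothetical, the metric values are assumed to have already been scraped from JMX into a dictionary keyed by MBean name (for meters, a single number such as the one-minute rate), and the 0.3 idle threshold comes from the "ideally > 0.3" guidance in the table.

```python
from typing import List, Mapping, Optional


def broker_topic_mbean(name: str, topic: Optional[str] = None) -> str:
    """Build a BrokerTopicMetrics MBean name (hypothetical helper).

    Leaving out the topic gives the all-topic MBean, as the table describes.
    """
    mbean = f"kafka.server:type=BrokerTopicMetrics,name={name}"
    if topic is not None:
        mbean += f",topic={topic}"
    return mbean


# MBeans whose normal value in the table is 0.
EXPECTED_ZERO = (
    "kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions",
    "kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount",
    "kafka.server:type=ReplicaManager,name=OfflineReplicaCount",
    "kafka.log:type=LogManager,name=OfflineLogDirectoryCount",
    "kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec",
)

# Idle-fraction gauges that should ideally stay above 0.3.
IDLE_GAUGES = (
    "kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent",
    "kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent",
)


def broker_alerts(values: Mapping[str, float], min_idle: float = 0.3) -> List[str]:
    """Return one message per scraped value outside its normal range.

    `values` is assumed to map MBean name -> one already-scraped number.
    MBeans missing from `values` are skipped.
    """
    alerts = []
    for mbean in EXPECTED_ZERO:
        value = values.get(mbean)
        if value is not None and value != 0:
            alerts.append(f"{mbean} = {value} (expected 0)")
    for mbean in IDLE_GAUGES:
        value = values.get(mbean)
        if value is not None and value <= min_idle:
            alerts.append(f"{mbean} = {value} (expected > {min_idle})")
    return alerts


def single_active_controller(active_counts: Mapping[str, int]) -> bool:
    """True if exactly one broker reports ActiveControllerCount = 1.

    `active_counts` is assumed to map broker id -> ActiveControllerCount.
    """
    counts = list(active_counts.values())
    return counts.count(1) == 1 and all(c in (0, 1) for c in counts)
```

For example, a dictionary containing only `{"kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions": 3}` makes `broker_alerts` return a single message for the under-replicated partitions.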
KRaft monitoring
KRaft quorum monitoring metrics
Metric | MBean name | Description |
---|---|---|
Current State | kafka.server:type=raft-metrics,name=current-state | The current state of this member; possible values are leader, candidate, voted, follower, unattached, observer. |
Current Leader | kafka.server:type=raft-metrics,name=current-leader | The current quorum leader's id; -1 indicates unknown. |
Current Voted | kafka.server:type=raft-metrics,name=current-vote | The current voted leader's id; -1 indicates not voted for anyone. |
Current Epoch | kafka.server:type=raft-metrics,name=current-epoch | The current quorum epoch. |
High Watermark | kafka.server:type=raft-metrics,name=high-watermark | The high watermark maintained on this member; -1 if it is unknown. |
Log End Offset | kafka.server:type=raft-metrics,name=log-end-offset | The current raft log end offset. |
Number of Unknown Voter Connections | kafka.server:type=raft-metrics,name=number-unknown-voter-connections | Number of unknown voters whose connection information is not cached. The value of this metric is always 0. |
Average Commit Latency | kafka.server:type=raft-metrics,name=commit-latency-avg | The average time in milliseconds to commit an entry in the raft log. |
Maximum Commit Latency | kafka.server:type=raft-metrics,name=commit-latency-max | The maximum time in milliseconds to commit an entry in the raft log. |
Average Election Latency | kafka.server:type=raft-metrics,name=election-latency-avg | The average time in milliseconds spent on electing a new leader. |
Maximum Election Latency | kafka.server:type=raft-metrics,name=election-latency-max | The maximum time in milliseconds spent on electing a new leader. |
Fetch Records Rate | kafka.server:type=raft-metrics,name=fetch-records-rate | The average number of records fetched from the leader of the raft quorum. |
Append Records Rate | kafka.server:type=raft-metrics,name=append-records-rate | The average number of records appended per sec by the leader of the raft quorum. |
Average Poll Idle Ratio | kafka.server:type=raft-metrics,name=poll-idle-ratio-avg | The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records. |
Current Metadata Version | kafka.server:type=MetadataLoader,name=CurrentMetadataVersion | Outputs the feature level of the current effective metadata version. |
Metadata Snapshot Load Count | kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount | The total number of times we have loaded a KRaft snapshot since the process was started. |
Latest Metadata Snapshot Size | kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes | The total size in bytes of the latest snapshot that the node has generated. If none have been generated yet, this is the size of the latest snapshot that was loaded. If no snapshots have been generated or loaded, this is 0. |
Latest Metadata Snapshot Age | kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs | The interval in milliseconds since the latest snapshot that the node has generated. If none have been generated yet, this is approximately the time delta since the process was started. |
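Several quorum metrics are most useful when compared across the members of the quorum. A minimal Python sketch, assuming the `current-state`, `current-leader`, and `current-epoch` values of each voter have already been collected into a dictionary keyed by node id; the function name `quorum_issues` and that input shape are hypothetical. During an election some of these checks fail transiently, so an alert would usually fire only if they persist.

```python
from typing import Any, List, Mapping


def quorum_issues(members: Mapping[int, Mapping[str, Any]]) -> List[str]:
    """Cross-check raft-metrics reported by each quorum voter (hypothetical helper).

    `members` is assumed to map node id -> {"current-state": str,
    "current-leader": int, "current-epoch": int}.
    """
    issues = []

    # A stable quorum has exactly one member in the "leader" state.
    leaders = sorted(nid for nid, m in members.items() if m["current-state"] == "leader")
    if len(leaders) != 1:
        issues.append(f"expected exactly one leader, found {leaders}")

    # current-leader is -1 when a member does not know the leader.
    reported = {m["current-leader"] for m in members.values()}
    if -1 in reported:
        issues.append("some members report current-leader = -1 (unknown)")
    known = reported - {-1}
    if len(known) > 1:
        issues.append(f"members disagree on the leader: {sorted(known)}")

    # Members should converge on the same quorum epoch.
    epochs = {m["current-epoch"] for m in members.values()}
    if len(epochs) > 1:
        issues.append(f"members report different epochs: {sorted(epochs)}")
    return issues
```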
KRaft controller monitoring metrics
Metric | MBean name | Description |
---|---|---|
Active Controller Count | kafka.controller:type=KafkaController,name=ActiveControllerCount | The number of Active Controllers on this node. Valid values are '0' or '1'. |
Event Queue Time Ms | kafka.controller:type=ControllerEventManager,name=EventQueueTimeMs | A Histogram of the time in milliseconds that requests spent waiting in the Controller Event Queue. |
Event Queue Processing Time Ms | kafka.controller:type=ControllerEventManager,name=EventQueueProcessingTimeMs | A Histogram of the time in milliseconds that requests spent being processed in the Controller Event Queue. |
Fenced Broker Count | kafka.controller:type=KafkaController,name=FencedBrokerCount | The number of fenced brokers as observed by this Controller. |
Active Broker Count | kafka.controller:type=KafkaController,name=ActiveBrokerCount | The number of active brokers as observed by this Controller. |
Global Topic Count | kafka.controller:type=KafkaController,name=GlobalTopicCount | The number of global topics as observed by this Controller. |
Global Partition Count | kafka.controller:type=KafkaController,name=GlobalPartitionCount | The number of global partitions as observed by this Controller. |
Offline Partition Count | kafka.controller:type=KafkaController,name=OfflinePartitionCount | The number of offline topic partitions (non-internal) as observed by this Controller. |
Preferred Replica Imbalance Count | kafka.controller:type=KafkaController,name=PreferredReplicaImbalanceCount | The count of topic partitions for which the leader is not the preferred leader. |
Metadata Error Count | kafka.controller:type=KafkaController,name=MetadataErrorCount | The number of times this controller node has encountered an error during metadata log processing. |
Last Applied Record Offset | kafka.controller:type=KafkaController,name=LastAppliedRecordOffset | The offset of the last record from the cluster metadata partition that was applied by the Controller. |
Last Committed Record Offset | kafka.controller:type=KafkaController,name=LastCommittedRecordOffset | The offset of the last record committed to this Controller. |
Last Applied Record Timestamp | kafka.controller:type=KafkaController,name=LastAppliedRecordTimestamp | The timestamp of the last record from the cluster metadata partition that was applied by the Controller. |
Last Applied Record Lag Ms | kafka.controller:type=KafkaController,name=LastAppliedRecordLagMs | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the controller. For active Controllers the value of this lag is always zero. |
Timed-out Broker Heartbeat Count | kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount | The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. |
Number Of Operations Started In Event Queue | kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount | The total number of controller event queue operations that were started. This includes deferred operations. |
Number of Operations Timed Out In Event Queue | kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount | The total number of controller event queue operations that timed out before they could be performed. |
Number Of New Controller Elections | kafka.controller:type=KafkaController,name=NewActiveControllersCount | Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. |
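Some controller metrics are running totals (TimedOutBrokerHeartbeatCount, for example, counts timeouts since the process started), so an alert is usually based on their increase between two scrapes rather than on the raw value. A minimal sketch, assuming two consecutive scrapes of the active controller are available as dictionaries keyed by the short metric name. The function name is hypothetical, treating MetadataErrorCount, EventQueueOperationsTimedOutCount, and NewActiveControllersCount as cumulative is inferred from their descriptions, and the expectation that OfflinePartitionCount and FencedBrokerCount are normally 0 is an assumption.

```python
from typing import List, Mapping

# Counters assumed to be running totals: an increase means a new event
# (error, timeout, or controller election) happened between scrapes.
CUMULATIVE = (
    "MetadataErrorCount",
    "TimedOutBrokerHeartbeatCount",
    "EventQueueOperationsTimedOutCount",
    "NewActiveControllersCount",
)
# Gauges assumed to be 0 on a healthy cluster.
ZERO_GAUGES = ("OfflinePartitionCount", "FencedBrokerCount")


def controller_issues(prev: Mapping[str, int], curr: Mapping[str, int]) -> List[str]:
    """Compare two scrapes of KRaft controller metrics (hypothetical helper)."""
    issues = []
    active = curr["ActiveControllerCount"]
    if active not in (0, 1):
        issues.append(f"ActiveControllerCount = {active} (valid values are 0 or 1)")
    for name in CUMULATIVE:
        delta = curr[name] - prev[name]
        # A negative delta means the process restarted and the total reset.
        if delta > 0:
            issues.append(f"{name} increased by {delta}")
    for name in ZERO_GAUGES:
        if curr[name] > 0:
            issues.append(f"{name} = {curr[name]}")
    return issues
```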
KRaft broker monitoring metrics
Metric | MBean name | Description |
---|---|---|
Last Applied Record Offset | kafka.server:type=broker-metadata-metrics,name=last-applied-record-offset | The offset of the last record from the cluster metadata partition that was applied by the broker. |
Last Applied Record Timestamp | kafka.server:type=broker-metadata-metrics,name=last-applied-record-timestamp | The timestamp of the last record from the cluster metadata partition that was applied by the broker. |
Last Applied Record Lag Ms | kafka.server:type=broker-metadata-metrics,name=last-applied-record-lag-ms | The difference between now and the timestamp of the last record from the cluster metadata partition that was applied by the broker. |
Metadata Load Error Count | kafka.server:type=broker-metadata-metrics,name=metadata-load-error-count | The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it. |
Metadata Apply Error Count | kafka.server:type=broker-metadata-metrics,name=metadata-apply-error-count | The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta. |
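The broker-side lag metric is defined as the current time minus last-applied-record-timestamp, and the two error metrics are counts, so a rise between scrapes signals a new error. A minimal sketch that recomputes the lag and flags new metadata errors between two scrapes; the function names, the dictionary input keyed by metric name, the assumption that timestamps are epoch milliseconds, and the 30-second lag threshold are all hypothetical.

```python
import time
from typing import List, Mapping, Optional


def metadata_lag_ms(last_applied_ts_ms: int, now_ms: Optional[int] = None) -> int:
    """Recompute last-applied-record-lag-ms (timestamps assumed in epoch ms)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return now_ms - last_applied_ts_ms


# Hypothetical threshold; tune it for your cluster.
MAX_LAG_MS = 30_000
ERROR_COUNTS = ("metadata-load-error-count", "metadata-apply-error-count")


def broker_metadata_issues(prev: Mapping[str, int], curr: Mapping[str, int],
                           now_ms: int, max_lag_ms: int = MAX_LAG_MS) -> List[str]:
    """Flag high metadata lag and new metadata errors between two scrapes."""
    issues = []
    lag = metadata_lag_ms(curr["last-applied-record-timestamp"], now_ms)
    if lag > max_lag_ms:
        issues.append(f"metadata lag {lag} ms exceeds {max_lag_ms} ms")
    for name in ERROR_COUNTS:
        if curr[name] > prev[name]:
            issues.append(f"{name} increased by {curr[name] - prev[name]}")
    return issues
```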
Common monitoring metrics for producers, consumers, connect, or streams
Name | Description |
---|---|
connection-close-rate | Connections closed per second in the window. |
connection-close-total | Total number of connections closed. |
connection-creation-rate | New connections established per second in the window. |
connection-creation-total | Total number of new connections established. |
network-io-rate | The average number of network operations (reads or writes) on all connections per second. |
network-io-total | The total number of network operations (reads or writes) on all connections. |
outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. |
outgoing-byte-total | The total number of outgoing bytes sent to all servers. |
request-rate | The average number of requests sent per second. |
request-total | The total number of requests sent. |
request-size-avg | The average size of all requests in the window. |
request-size-max | The maximum size of any request sent in the window. |
incoming-byte-rate | Bytes/second read off all sockets. |
incoming-byte-total | Total bytes read off all sockets. |
response-rate | Responses received per second. |
response-total | Total responses received. |
select-rate | Number of times the I/O layer checked for new I/O to perform per second. |
select-total | Total number of times the I/O layer checked for new I/O to perform. |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes in nanoseconds. |
io-wait-time-ns-total | The total time the I/O thread spent waiting in nanoseconds. |
io-wait-ratio | The fraction of time the I/O thread spent waiting. |
io-time-ns-avg | The average length of time for I/O per select call in nanoseconds. |
io-time-ns-total | The total time the I/O thread spent doing I/O in nanoseconds. |
io-ratio | The fraction of time the I/O thread spent doing I/O. |
connection-count | The current number of active connections. |
successful-authentication-rate | Connections per second that were successfully authenticated using SASL or SSL. |
successful-authentication-total | Total connections that were successfully authenticated using SASL or SSL. |
failed-authentication-rate | Connections per second that failed authentication. |
failed-authentication-total | Total connections that failed authentication. |
successful-reauthentication-rate | Connections per second that were successfully re-authenticated using SASL. |
successful-reauthentication-total | Total connections that were successfully re-authenticated using SASL. |
reauthentication-latency-max | The maximum latency in ms observed due to re-authentication. |
reauthentication-latency-avg | The average latency in ms observed due to re-authentication. |
failed-reauthentication-rate | Connections per second that failed re-authentication. |
failed-reauthentication-total | Total connections that failed re-authentication. |
successful-authentication-no-reauth-total | Total connections that were successfully authenticated by older, pre-2.2.0 SASL clients that do not support re-authentication. Non-zero only if such older clients have connected. |
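Each `-rate` metric above is an average the client computes over its own sample window, while the matching `-total` metric is a count. Assuming the `-total` value is a cumulative counter, a monitoring system can derive its own rate over any interval from two samples, and ratios such as the authentication failure fraction follow directly from the totals. A minimal sketch; the helper names and the `(timestamp, value)` sample shape are hypothetical.

```python
from typing import Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value of a *-total metric)


def rate_from_totals(prev: Sample, curr: Sample) -> float:
    """Per-second rate of a cumulative *-total counter between two samples."""
    (t0, v0), (t1, v1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


def auth_failure_fraction(successful_total: float, failed_total: float) -> float:
    """Share of authentication attempts that failed, from the two *-total metrics."""
    attempts = successful_total + failed_total
    return failed_total / attempts if attempts else 0.0
```

For example, if connection-creation-total goes from 100 to 150 over 10 seconds, `rate_from_totals((0.0, 100.0), (10.0, 150.0))` returns 5.0 new connections per second.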
Common per-broker metrics for producers, consumers, connect, or streams
Name | Description |
---|---|
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. |
outgoing-byte-total | The total number of outgoing bytes sent for a node. |
request-rate | The average number of requests sent per second for a node. |
request-total | The total number of requests sent for a node. |
request-size-avg | The average size of all requests in the window for a node. |
request-size-max | The maximum size of any request sent in the window for a node. |
incoming-byte-rate | The average number of bytes received per second for a node. |
incoming-byte-total | The total number of bytes received for a node. |
request-latency-avg | The average request latency in ms for a node. |
request-latency-max | The maximum request latency in ms for a node. |
response-rate | Responses received per second for a node. |
response-total | Total responses received for a node. |
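The per-broker metrics use the same names as the client-wide ones but are reported separately for each broker node the client talks to, which makes it possible to spot a single slow or overloaded broker. A minimal sketch, assuming the per-node values have been collected into a dictionary keyed by node id; the function names and input shape are hypothetical.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical input: node id -> that node's per-broker client metrics.
PerNode = Mapping[str, Mapping[str, float]]


def slowest_node(per_node: PerNode) -> Tuple[str, float]:
    """Node with the highest request-latency-avg, and that latency in ms."""
    node_id = max(per_node, key=lambda n: per_node[n]["request-latency-avg"])
    return node_id, per_node[node_id]["request-latency-avg"]


def outgoing_share(per_node: PerNode) -> Dict[str, float]:
    """Fraction of the client's outgoing-byte-rate sent to each node."""
    total = sum(m["outgoing-byte-rate"] for m in per_node.values())
    if total == 0:
        return {n: 0.0 for n in per_node}
    return {n: m["outgoing-byte-rate"] / total for n, m in per_node.items()}
```

A node whose share is far above the others often points to an uneven distribution of partition leaders across brokers.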
Producer monitoring
Name | Description |
---|---|
waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. |
buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). |
buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). |
bufferpool-wait-time | The fraction of time an appender waits for space allocation. |
bufferpool-wait-time-total | Deprecated; replaced by bufferpool-wait-time-ns-total. The total time an appender waits for space allocation in nanoseconds. |
bufferpool-wait-time-ns-total | The total time an appender waits for space allocation in nanoseconds. |
flush-time-ns-total | The total time the Producer spent in Producer.flush in nanoseconds. |
txn-init-time-ns-total | The total time the Producer spent initializing transactions in nanoseconds (for EOS). |
txn-begin-time-ns-total | The total time the Producer spent in beginTransaction in nanoseconds (for EOS). |
txn-send-offsets-time-ns-total | The total time the Producer spent sending offsets to transactions in nanoseconds (for EOS). |
txn-commit-time-ns-total | The total time the Producer spent committing transactions in nanoseconds (for EOS). |
txn-abort-time-ns-total | The total time the Producer spent aborting transactions in nanoseconds (for EOS). |
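When buffer-available-bytes approaches zero, `send()` calls start to block and waiting-threads rises above 0. A minimal sketch of two derived values: current buffer utilization, and the blocked time per unit of wall-clock time, computed from two samples of bufferpool-wait-time-ns-total. The function names are hypothetical, and the wait total is assumed to be summed over all blocked appender threads, so the second value can exceed 1 when several threads wait at once.

```python
def buffer_utilization(total_bytes: float, available_bytes: float) -> float:
    """Fraction of buffer-total-bytes currently in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("buffer-total-bytes must be positive")
    return 1.0 - available_bytes / total_bytes


def blocked_share(prev_wait_ns_total: int, curr_wait_ns_total: int,
                  elapsed_ns: int) -> float:
    """Blocked time per unit of wall-clock time between two samples of
    bufferpool-wait-time-ns-total.

    Assumes the total is summed over appender threads, so the result can exceed 1.
    """
    if elapsed_ns <= 0:
        raise ValueError("elapsed_ns must be positive")
    return (curr_wait_ns_total - prev_wait_ns_total) / elapsed_ns
```

For example, with the default buffer.memory of 32 MiB (33554432 bytes) and 8 MiB still available, `buffer_utilization` returns 0.75.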
Producer sender metrics
Name | Description |
---|---|
batch-size-avg | The average number of bytes sent per partition per-request. |
batch-size-max | The max number of bytes sent per partition per-request. |
batch-split-rate | The average number of batch splits per second. |
batch-split-total | The total number of batch splits. |
compression-rate-avg | The average compression rate of record batches, defined as the average ratio of the compressed batch size over the uncompressed size. |
metadata-age | The age in seconds of the current producer metadata being used. |
produce-throttle-time-avg | The average time in ms a request was throttled by a broker. |
produce-throttle-time-max | The maximum time in ms a request was throttled by a broker. |
record-error-rate | The average per-second number of record sends that resulted in errors. |
record-error-total | The total number of record sends that resulted in errors. |
record-queue-time-avg | The average time in ms record batches spent in the send buffer. |
record-queue-time-max | The maximum time in ms record batches spent in the send buffer. |
record-retry-rate | The average per-second number of retried record sends. |
record-retry-total | The total number of retried record sends. |
record-send-rate | The average number of records sent per second. |
record-send-total | The total number of records sent. |
record-size-avg | The average record size. |
record-size-max | The maximum record size. |
records-per-request-avg | The average number of records per request. |
request-latency-avg | The average request latency in ms. |
request-latency-max | The maximum request latency in ms. |
requests-in-flight | The current number of in-flight requests awaiting a response. |
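Because compression-rate-avg is the ratio of compressed to uncompressed size, lower is better: 0.25 means batches shrink to a quarter of their original size. A minimal sketch that turns a few sender metrics into a quick summary; the function name, the dictionary input keyed by metric name, and the choice of flags are assumptions.

```python
from typing import Any, Dict, Mapping


def summarize_sender(m: Mapping[str, float]) -> Dict[str, Any]:
    """Quick health summary from producer sender metrics (hypothetical helper)."""
    return {
        # compression-rate-avg = compressed size / uncompressed size,
        # so 1 minus it is the fraction of bytes saved by compression.
        "compression_savings": 1.0 - m["compression-rate-avg"],
        # Any throttling by a broker shows up as a non-zero maximum.
        "throttled": m["produce-throttle-time-max"] > 0,
        "retrying": m["record-retry-rate"] > 0,
        "erroring": m["record-error-rate"] > 0,
    }
```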
Producer topic metrics: kafka.producer:type=producer-topic-metrics,client-id="{client-id}",topic="{topic}"
Name | Description |
---|---|
byte-rate | The average number of bytes sent per second for a topic. |
byte-total | The total number of bytes sent for a topic. |
compression-rate | The average compression rate of record batches for a topic, defined as the average ratio of the compressed batch size over the uncompressed size. |
record-error-rate | The average per-second number of record sends that resulted in errors for a topic. |
record-error-total | The total number of record sends that resulted in errors for a topic. |
record-retry-rate | The average per-second number of retried record sends for a topic. |
record-retry-total | The total number of retried record sends for a topic. |
record-send-rate | The average number of records sent per second for a topic. |
record-send-total | The total number of records sent for a topic. |
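The topic-level metrics attribute producer traffic and errors to individual topics. A minimal sketch, assuming the per-topic values are collected into a dictionary keyed by topic name; the function names are hypothetical, and `producer_topic_mbean` reproduces the MBean template given for this table. The names a live client registers may differ in quoting, so confirm them with a JMX browser first.

```python
from typing import List, Mapping

PerTopic = Mapping[str, Mapping[str, float]]  # topic -> producer topic metrics


def producer_topic_mbean(client_id: str, topic: str) -> str:
    """MBean name built from the template shown for this table (quotes included)."""
    return (f'kafka.producer:type=producer-topic-metrics,'
            f'client-id="{client_id}",topic="{topic}"')


def topics_with_errors(per_topic: PerTopic) -> List[str]:
    """Topics whose record-error-rate is above zero, sorted by name."""
    return sorted(t for t, m in per_topic.items() if m["record-error-rate"] > 0)


def busiest_topics(per_topic: PerTopic, n: int = 3) -> List[str]:
    """The n topics with the highest byte-rate, busiest first."""
    return sorted(per_topic, key=lambda t: per_topic[t]["byte-rate"], reverse=True)[:n]
```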
Consumer monitoring
Name | Description |
---|---|
time-between-poll-avg | The average delay in ms between invocations of poll(). |
time-between-poll-max | The maximum delay in ms between invocations of poll(). |
last-poll-seconds-ago | The number of seconds since the last poll() invocation. |
poll-idle-ratio-avg | The average fraction of time the consumer's poll() is idle as opposed to waiting for the user code to process records. |
committed-time-ns-total | The total time, in nanoseconds, the consumer spent in committed() calls. |
commit-sync-time-ns-total | The total time, in nanoseconds, the consumer spent committing offsets (for AOS). |
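If the gap between poll() calls exceeds the consumer's max.poll.interval.ms setting (300000 ms by default), the consumer is considered failed and leaves the group, which triggers a rebalance. A minimal check, assuming time-between-poll-max is reported in milliseconds; the function names and the 20% headroom threshold are illustrative choices, not Kafka settings.

```python
# Kafka's default for the consumer setting max.poll.interval.ms.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000


def poll_headroom(time_between_poll_max_ms: float,
                  max_poll_interval_ms: float = DEFAULT_MAX_POLL_INTERVAL_MS) -> float:
    """Fraction of max.poll.interval.ms left unused by the longest poll gap."""
    return 1.0 - time_between_poll_max_ms / max_poll_interval_ms


def poll_at_risk(time_between_poll_max_ms: float,
                 max_poll_interval_ms: float = DEFAULT_MAX_POLL_INTERVAL_MS,
                 min_headroom: float = 0.2) -> bool:
    """True when the slowest poll gap leaves less than `min_headroom` spare.

    `min_headroom` is a hypothetical alerting threshold, not a Kafka setting.
    """
    return poll_headroom(time_between_poll_max_ms, max_poll_interval_ms) < min_headroom


print(poll_at_risk(270_000))  # True: only 10% of the interval is left
print(poll_at_risk(60_000))   # False: 80% of the interval is left
```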
Consumer group metrics
Name | Description |
---|---|
commit-latency-avg | The average time in ms taken for a commit request. |
commit-latency-max | The maximum time in ms taken for a commit request. |
commit-rate | The number of commit calls per second. |
commit-total | The total number of commit calls. |
assigned-partitions | The number of partitions currently assigned to this consumer. |
heartbeat-response-time-max | The maximum time in ms taken to receive a response to a heartbeat request. |
heartbeat-rate | The average number of heartbeats per second. |
heartbeat-total | The total number of heartbeats. |
join-time-avg | The average time in ms taken for a group rejoin. |
join-time-max | The maximum time in ms taken for a group rejoin. |
join-rate | The number of group joins per second. |
join-total | The total number of group joins. |
sync-time-avg | The average time in ms taken for a group sync. |
sync-time-max | The maximum time in ms taken for a group sync. |
sync-rate | The number of group syncs per second. |
sync-total | The total number of group syncs. |
rebalance-latency-avg | The average time in ms taken for a group rebalance. |
rebalance-latency-max | The maximum time in ms taken for a group rebalance. |
rebalance-latency-total | The total time in ms taken for group rebalances so far. |
rebalance-total | The total number of group rebalances participated in. |
rebalance-rate-per-hour | The number of group rebalances participated in per hour. |
failed-rebalance-total | The total number of failed group rebalances. |
failed-rebalance-rate-per-hour | The number of failed group rebalance events per hour. |
last-rebalance-seconds-ago | The number of seconds since the last rebalance event. |
last-heartbeat-seconds-ago | The number of seconds since the last heartbeat was sent to the group coordinator. |
partitions-revoked-latency-avg | The average time in ms taken by the onPartitionsRevoked rebalance listener callback. |
partitions-revoked-latency-max | The maximum time in ms taken by the onPartitionsRevoked rebalance listener callback. |
partitions-assigned-latency-avg | The average time in ms taken by the onPartitionsAssigned rebalance listener callback. |
partitions-assigned-latency-max | The maximum time in ms taken by the onPartitionsAssigned rebalance listener callback. |
partitions-lost-latency-avg | The average time in ms taken by the onPartitionsLost rebalance listener callback. |
partitions-lost-latency-max | The maximum time in ms taken by the onPartitionsLost rebalance listener callback. |
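rebalance-latency-avg reflects only the recent sample window. Dividing the cumulative rebalance-latency-total by rebalance-total gives a lifetime mean instead, which can help spot a gradual slowdown. A sketch, assuming both counters cover the same set of rebalances and that latencies are in ms; the helper name is hypothetical.

```python
from typing import Optional


def mean_rebalance_latency_ms(rebalance_latency_total: float,
                              rebalance_total: float) -> Optional[float]:
    """Lifetime mean rebalance latency from two cumulative consumer metrics.

    Assumes rebalance-latency-total and rebalance-total count the same
    rebalances; returns None before the first rebalance.
    """
    if rebalance_total <= 0:
        return None
    return rebalance_latency_total / rebalance_total


print(mean_rebalance_latency_ms(12_000.0, 4))  # 3000.0
print(mean_rebalance_latency_ms(0.0, 0))       # None
```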
Consumer fetch metrics
Name | Description |
---|---|
bytes-consumed-rate | The average number of bytes consumed per second. |
bytes-consumed-total | The total number of bytes consumed. |
fetch-latency-avg | The average time in ms taken for a fetch request. |
fetch-latency-max | The maximum time in ms taken for any fetch request. |
fetch-rate | The number of fetch requests per second. |
fetch-size-avg | The average number of bytes fetched per request. |
fetch-size-max | The maximum number of bytes fetched per request. |
fetch-throttle-time-avg | The average throttle time in ms. |
fetch-throttle-time-max | The maximum throttle time in ms. |
fetch-total | The total number of fetch requests. |
records-consumed-rate | The average number of records consumed per second. |
records-consumed-total | The total number of records consumed. |
records-lag-max | The maximum lag, in number of records, for any partition in this window. NOTE: This is based on the consumer's current position, not its committed offset. |
records-lead-min | The minimum lead, in number of records, for any partition in this window. |
records-per-request-avg | The average number of records in each request. |
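records-lag and records-lead are measured from the consumer's current position (the next offset it will fetch), not its committed offset. Lag is the distance to the end of the readable log (the high watermark, or the last stable offset for read_committed consumers); lead is the distance from the log start offset, so a lead approaching 0 means retention may delete records before they are consumed. A sketch of that arithmetic, with a hypothetical `PartitionOffsets` record standing in for offsets obtained elsewhere:

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    """Hypothetical snapshot of one partition's offsets."""
    log_start_offset: int  # earliest offset still retained on the broker
    end_offset: int        # high watermark (or last stable offset for read_committed)
    position: int          # consumer's next fetch offset, not its committed offset


def records_lag(p: PartitionOffsets) -> int:
    """Records between the consumer's position and the end of the readable log."""
    return p.end_offset - p.position


def records_lead(p: PartitionOffsets) -> int:
    """Records between the log start offset and the consumer's position."""
    return p.position - p.log_start_offset


p = PartitionOffsets(log_start_offset=1_000, end_offset=5_000, position=4_200)
print(records_lag(p), records_lead(p))  # 800 3200
```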
kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",topic="{topic}"
Name | Description |
---|---|
bytes-consumed-rate | The average number of bytes consumed per second for a topic. |
bytes-consumed-total | The total number of bytes consumed for a topic. |
fetch-size-avg | The average number of bytes fetched per request for a topic. |
fetch-size-max | The maximum number of bytes fetched per request for a topic. |
records-consumed-rate | The average number of records consumed per second for a topic. |
records-consumed-total | The total number of records consumed for a topic. |
records-per-request-avg | The average number of records in each request for a topic. |
kafka.consumer:type=consumer-fetch-manager-metrics,partition="{partition}",topic="{topic}",client-id="{client-id}"
Name | Description |
---|---|
preferred-read-replica | The current read replica for the partition, or -1 if reading from the leader. |
records-lag | The latest lag of the partition, in number of records. |
records-lag-avg | The average lag of the partition, in number of records. |
records-lag-max | The maximum lag of the partition, in number of records. |
records-lead | The latest lead of the partition, in number of records. |
records-lead-avg | The average lead of the partition, in number of records. |
records-lead-min | The minimum lead of the partition, in number of records. |
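The MBean headers above follow JMX ObjectName syntax: a domain, a colon, then comma-separated key=value properties, with the quoted {placeholders} standing for real values. A minimal parser sketch for turning such a name into a domain and a property dictionary, for example to label scraped metrics. It is hypothetical and splits naively on commas, so it does not handle quoted values that themselves contain commas or escaped quotes; Java's javax.management.ObjectName handles the full syntax.

```python
def parse_mbean_name(name: str) -> tuple:
    """Split an MBean name into (domain, {key: value}).

    Naive sketch: splits on ',' and strips surrounding double quotes, so values
    containing commas or escaped quotes are not supported.
    """
    domain, _, props = name.partition(":")
    properties = {}
    for pair in props.split(","):
        key, _, value = pair.partition("=")
        properties[key.strip()] = value.strip().strip('"')
    return domain, properties


domain, props = parse_mbean_name(
    'kafka.consumer:type=consumer-fetch-manager-metrics,'
    'partition="0",topic="orders",client-id="consumer-1"'
)
print(domain)                              # kafka.consumer
print(props["topic"], props["partition"])  # orders 0
```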