In this lesson we will:
- Learn about the Kafka performance test scripts, which can be used to measure, test and optimise the performance of your Kafka cluster.
The Kafka performance test scripts allow us to generate and consume high volumes of data through a Kafka cluster in order to measure its performance characteristics, such as throughput and latency.
The tests we run using these scripts can be configured to match your real workloads, including setting the number and frequency of messages, message sizes and the level of reliability you need for each message.
For each test, we can capture outputs including the number of messages sent, the amount of data transferred, and the average, maximum and percentile latencies (50th, 95th, 99th and 99.9th), in order to understand the real-world performance characteristics of your cluster.
As well as focussing on latency, the Kafka performance test scripts can also be used for load testing by simulating high volumes of messages to ensure the broker and other processes remain available and within acceptable bounds for performance.
The performance test scripts are included in the Kafka distribution, and can be found in the ./bin folder of your deployment:
cd ./bin
ls -la kafka*perf*
These two scripts are used for producing and consuming data respectively. They can be used independently, or be run in parallel in different terminal windows.
We will begin the lesson by starting the console consumer script which we will use to visualise our test messages as they are generated. We will specify the server to connect to and a topic for our test, in this case new_orders:
cd ./bin
./kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic new_orders
This will begin a console consumer listening to the new_orders topic which we can use to visualise our test.
Next, we will simulate a series of records at high volume, pushing them into the same new_orders topic using the kafka-producer-perf-test.sh script.
Various command line parameters can be specified to configure the test and control the nature of the messages produced:
- num-records - The total number of records to generate during the test;
- throughput - The number of records to send per second;
- record-size - The size of each record in bytes.
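Before running a test, it can help to work out what a given combination of these parameters implies. The following is a rough sketch using shell arithmetic; the variable names are chosen for illustration and are not part of the Kafka scripts.

```shell
# Hypothetical test plan; each variable mirrors one of the parameters above.
num_records=100    # --num-records
throughput=10      # --throughput (records per second)
record_size=100    # --record-size (bytes)

# Expected run time if the producer sustains the requested rate.
expected_seconds=$(( num_records / throughput ))

# Total payload volume and target data rate, excluding protocol overhead.
total_bytes=$(( num_records * record_size ))
bytes_per_second=$(( throughput * record_size ))

echo "expected duration: ${expected_seconds}s"
echo "total payload:     ${total_bytes} bytes"
echo "target data rate:  ${bytes_per_second} bytes/sec"
```

With these values the test should run for about 10 seconds and move 10000 bytes at 1000 bytes per second. That is roughly 0.001 MB/sec, so a throughput figure reported in MB/sec to two decimal places would show as 0.00 for a test this small.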
As an example, we can issue the following command to simulate 100 records, each of 100 bytes, sent at a rate of 10 per second. Try this in a new terminal window so we can monitor the messages arriving in the console consumer that was set up above.
./kafka-producer-perf-test.sh --producer.config config/producer.properties --topic new_orders --num-records 100 --throughput 10 --record-size 100
If everything is working well, you should begin to see messages arrive at the consumer:
LCTLNPENKLVJFLYLGNNWXWHZZWXLWBCRHZVHISZOFRZPBULSFXKFQNJIGJYBCGFPYTFAERUHAZASRTDEWAUIGWIQPKZPCI
AJOQITXRTLRATEHMSMAQKLZOUIUNVHYKASMUPWDUKASWNGEBWCAYDASZJZLWYBMEXFNYXNHOVYCEXDPNPWYZLNBPJJWYNH
DFMTMBAZWZUSVJQCTSFYUXDQICTIEWDOYTGTZOHVKBFIBXJPBIXDMQYNNGUIVAWCVHCPGTPHKTDQNVKTMTEWERWOBFJVIW
QQEYKLOOGHBBNVHHDNQZTSILOTWIOOOBVYHXFGEAPSVHXENJVCRWIMWKKHAWTOFMCQKKVOJJEHRIZEQSDNFMTJIDKAMQUA
JEJNMKPRSOYASPIJZQEYNPNOJXRAJNSHVPGGYWLQFGKLOEMZDXKFURIMFCNSQMEGEVQJSWEEDAHAMDEOJDCPJKZMVFRAVT
GJTOXOLGCIVVVYLLUUFNSCGYHKPXCFNJGIMSXWAQMUSXMDGQFPJVFSHKMJVSFQCMGMKXCLAUMLQZUBJISKLOTUGZQUTSKZ
GHIASALHWQOGCGJLXULUFZCUAUMYKIDKWFNACQRGSAAYCZGLWHZVPLJYKZKLTPYFGEDPCRILJRWREXFITOIOEVYTDWQZAE
UENSFPCRLYMJFLKJGRIACAFZVLPWMJNAEIXKTRMSQHMPPHGCDLPDULWQDROVIWATICTIEBABTHWPYSTQJOFPLLPOONOYRH
RXNEJEYQTBJHZGTFWGAXZIIKFWFELLHHGBAFCKQYGHUOKGYHPSMJWXSXBFLIHPTQZYTYIUJNWLDTKLBUDEVUOFOCABLMPD
EQCAVVAEUXSJAKFFRPJUJQBZTPFBDZWZUNRSKETJOPJDFKBLZNEVDNWNARUNDXFBVSFITKGIWWAUXMIYKYLWQIZSUJIQXW
LDLDPXDCPGKPULAOSFGUQUEUQBMPUJLDNZQQRRITNFFNLZOLWSVZTQVMLRYIMJJVYADMBRQSZHYHJXOGQSPUVEVVDINQVI
NSLZWPXJGNUQHPENCNBFOHXCADCWBWBEZSWUTQXZWULQJSSUQGTZNHMRBVMHTPIUZWQUXRMDJNSSPDEWJFPZXOMGFVTCYK
These are the random records being created by the kafka-producer-perf-test.sh script. Your specific messages will of course look different to these random strings.
As the kafka-producer-perf-test.sh script runs, it periodically prints a status line showing its performance:
52 records sent, 10.2 records/sec (0.00 MB/sec), 28.1 ms avg latency, 805.0 ms max latency.
This shows the send rate, together with the average and maximum latency of the messages sent.
When the test completes, the performance test script prints a summary for the whole run. In this instance, we can see that we achieved the requested rate of about 10 messages per second, with each message taking 16.76 ms on average to publish. The slowest message took 805 ms and the 50th percentile was 5 ms.
100 records sent, 10.085729 records/sec (0.00 MB/sec), 16.76 ms avg latency, 805.00 ms max latency, 5 ms 50th, 55 ms 95th, 805 ms 99th, 805 ms 99.9th.
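If we want to compare runs or check a result against a target, we can pull individual figures out of this summary line. The sketch below assumes the line keeps the comma-separated layout shown above; the extract_metric helper and the 500 ms budget are made up for this example.

```shell
# Sample summary line, copied from the output above.
summary='100 records sent, 10.085729 records/sec (0.00 MB/sec), 16.76 ms avg latency, 805.00 ms max latency, 5 ms 50th, 55 ms 95th, 805 ms 99th, 805 ms 99.9th.'

# Print the number at the start of the comma-separated part containing a label.
# $1 = summary line, $2 = label to look for, e.g. "avg latency" or "99th".
extract_metric() {
  printf '%s\n' "$1" | awk -v label="$2" '{
    n = split($0, parts, ", ")
    for (i = 1; i <= n; i++) {
      if (index(parts[i], label) > 0) {
        split(parts[i], words, " ")
        print words[1]
        exit
      }
    }
  }'
}

avg=$(extract_metric "$summary" "avg latency")
p99=$(extract_metric "$summary" "99th")
echo "avg=${avg}ms p99=${p99}ms"

# Hypothetical latency budget: flag runs whose 99th percentile exceeds it.
budget_ms=500
if [ "$p99" -gt "$budget_ms" ]; then
  echo "p99 latency ${p99}ms exceeds budget of ${budget_ms}ms"
fi
```

For the sample line this reports an average of 16.76 ms and a 99th percentile of 805 ms, which is over the example budget. With only 100 records, a single slow first send is enough to dominate the upper percentiles, so larger runs give more meaningful figures.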
In the previous example, we created a console consumer to visualise the test, and tested the producer performance. The next step is to performance test the consuming process.
As our new_orders topic already has 100 messages from the previous run, we can use these messages for our first consumer test.
When running the consumer performance test, at a minimum we will need to specify the topic name and the number of messages to consume.
./bin/kafka-consumer-perf-test.sh --bootstrap-server localhost:9092 --topic new_orders --messages 100
If successful, this will consume the 100 messages, then output some statistics about the consumer process:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec
2021-12-02 14:15:16:298, 2021-12-02 14:15:17:437, 0.0095, 0.0084, 100, 87.7963, 980, 159, 0.0600, 628.9308
Here we can see the start time and end time of the test, how much data was processed, and how many messages per second were retrieved. In this case, our consumer, reading from a single-node broker on the same machine, fetched about 628 messages per second (fetch.nMsg.sec). The lower overall figure of about 87 messages per second (nMsg.sec) also includes the 980 ms spent on the initial consumer group rebalance.
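To make this output easier to read, we can pair each column name with its value and check how the two rate columns relate to the timings. This is a sketch based only on the sample output above: the label_fields and rate helpers are made up for this example, and the arithmetic shows that the figures are consistent with dividing the message count by total elapsed time and by fetch time respectively.

```shell
# Header and data row, copied from the sample output above.
header='start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec, rebalance.time.ms, fetch.time.ms, fetch.MB.sec, fetch.nMsg.sec'
row='2021-12-02 14:15:16:298, 2021-12-02 14:15:17:437, 0.0095, 0.0084, 100, 87.7963, 980, 159, 0.0600, 628.9308'

# Print "name=value" for each column, one per line.
label_fields() {
  printf '%s\n%s\n' "$1" "$2" | awk -F', ' '
    NR == 1 { for (i = 1; i <= NF; i++) name[i] = $i }
    NR == 2 { for (i = 1; i <= NF; i++) printf "%s=%s\n", name[i], $i }
  '
}

# Messages per second for a message count ($1) and a duration in ms ($2).
rate() {
  awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.4f\n", n / (ms / 1000) }'
}

label_fields "$header" "$row"

# 14:15:16.298 to 14:15:17.437 is 1139 ms elapsed in total.
echo "overall:    $(rate 100 1139) msg/sec"
# 1139 ms elapsed minus 980 ms rebalancing leaves 159 ms of fetching.
echo "fetch only: $(rate 100 159) msg/sec"
```

The two computed rates, 87.7963 and 628.9308, match the nMsg.sec and fetch.nMsg.sec columns. For a short test like this one, the rebalance time dominates the overall figure, so the fetch columns are the better indication of steady-state consumer throughput.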
In a real-world setting, we could then experiment with the configuration of our producer, such as the batch size, message compression, acknowledgement level and other settings, to see how each one affects the results. We could also make changes to the configuration of the broker cluster in order to optimise the end-to-end performance and determine our production configuration.
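One way to organise such an experiment is to loop over the settings of interest and run the producer test once per combination. The sketch below is a dry run that only prints the commands. It assumes the --producer-props flag, which the producer test script accepts in the Kafka releases we are aware of (check --help on your version), and the values for acks, compression.type and batch.size are illustrative rather than recommendations.

```shell
# Dry run: print one producer test command per combination of settings.
# Remove the leading "echo" in run_test to execute against a real cluster.
run_test() {
  # $1 = acks, $2 = compression.type, $3 = batch.size (bytes)
  echo ./bin/kafka-producer-perf-test.sh \
    --topic new_orders --num-records 100 --throughput 10 --record-size 100 \
    --producer-props bootstrap.servers=localhost:9092 \
    "acks=$1" "compression.type=$2" "batch.size=$3"
}

# Walk through every combination of the example values.
print_matrix() {
  for acks in 1 all; do
    for compression in none lz4; do
      for batch_size in 16384 65536; do
        run_test "$acks" "$compression" "$batch_size"
      done
    done
  done
}

print_matrix
```

This prints eight commands, one per combination. In practice we would use a much larger --num-records value for each run, change one setting at a time where possible, and record the summary line from every run so that the results can be compared.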
In the example above, the producer performance test script was simply generating random strings 100 bytes long. To get more realistic tests, we may wish to send real messages, perhaps JSON messages if that is what you will ultimately be using.
These messages can be supplied using the --payload-file flag of the kafka-producer-perf-test.sh script, which is used in place of --record-size.
./bin/kafka-producer-perf-test.sh --producer.config config/producer.properties --topic new_orders --num-records 100 --throughput 10 --payload-file /tmp/kafka-messages.json
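As far as we are aware, the producer test treats each line of the payload file as one message by default and picks from those lines at random when sending. A minimal sketch for creating such a file is shown below; the JSON field names are invented for this example and should be replaced with the shape of your real messages.

```shell
# Write a small payload file of JSON order messages, one message per line.
# The field names (order_id, customer, amount) are made up for this example.
payload_file=/tmp/kafka-messages.json
: > "$payload_file"   # create or truncate the file

i=1
while [ "$i" -le 5 ]; do
  printf '{"order_id": %d, "customer": "customer-%d", "amount": %d.99}\n' \
    "$i" "$i" "$(( i * 10 ))" >> "$payload_file"
  i=$(( i + 1 ))
done

# Show what was written.
cat "$payload_file"
```

Each message must stay on a single line, so pretty-printed JSON would need to be compacted first. Using a sample of real production messages, with realistic sizes and field values, should give results closer to what the cluster will see in practice, particularly when compression is enabled.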
Running a performance test on a single laptop doesn't tell us much, as we soon hit the limits of the machine, such as the number of CPU cores.
In a more realistic situation, we would have the client and server running on different hosts, which would give us more compute capacity but introduce network latency. We would also likely have Kafka running as part of a cluster. The Kafka performance test scripts become much more useful in these real-world deployments.