P99 latency is a term used in performance testing to denote the time below which 99% of the network requests are served. It's a good way to understand the tail-end behavior of your system, as it provides insights beyond just average or median latencies, which can often mask outliers.
To calculate P99 latency, collect all request latencies, sort them in ascending order, and find the value below which 99% of the measurements fall.
For instance, assume latencies is a list storing all request latencies in milliseconds:
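A minimal sketch of the calculation (the sample values below are illustrative):

```python
# latencies: all observed request latencies in milliseconds (sample data)
latencies = [120, 85, 340, 95, 410, 150, 88, 99, 230, 75]

sorted_latencies = sorted(latencies)
# Index corresponding to the 99th percentile: list length times 0.99,
# truncated to an integer and clamped for small samples
index = min(int(len(sorted_latencies) * 0.99), len(sorted_latencies) - 1)
p99 = sorted_latencies[index]
print(p99)
```

Note that with only 10 samples, as here, the computed index lands on the maximum value; P99 is only meaningful once you have collected enough measurements.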
In this Python code snippet, we first sort the latencies, then find the index corresponding to the 99th percentile (the length of the list times 0.99, truncated to an integer). The P99 latency is the value at this index in the sorted list.
However, in production environments where latencies are gathered over multiple instances and long periods of time, you might use statistical approximation algorithms, like t-digest or HDR Histogram, as sorting large amounts of data may not be feasible due to resource constraints.
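To illustrate the idea behind these approximation algorithms, here is a hedged sketch of a fixed-width-bucket histogram, in the spirit of (but much simpler than) HDR Histogram; the 5 ms bucket width and the synthetic workload are assumptions for the example, and real implementations use variable-precision buckets:

```python
import random

BUCKET_MS = 5  # assumed bucket width in milliseconds; trades memory for precision

def record(hist, latency_ms):
    """Increment the count of the bucket this latency falls into."""
    bucket = int(latency_ms // BUCKET_MS)
    hist[bucket] = hist.get(bucket, 0) + 1

def percentile(hist, p):
    """Walk buckets in order until p of all counts are covered;
    return the upper edge of that bucket as the estimate."""
    total = sum(hist.values())
    target = total * p
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= target:
            return (bucket + 1) * BUCKET_MS
    return None

# Synthetic workload: exponentially distributed latencies, ~100 ms mean
hist = {}
random.seed(0)
for _ in range(10_000):
    record(hist, random.expovariate(1 / 100))

print(percentile(hist, 0.99))
```

The key property is that memory usage depends on the number of buckets, not the number of requests, so histograms from many instances can be merged by summing bucket counts; the estimate's error is bounded by the bucket width.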