Latency in computing refers to the delay before a transfer of data begins following an instruction for its transfer. It's a crucial aspect of understanding system performance and user experience, especially in distributed systems.
When we talk about "p95" and "p99" latencies, we are referring to percentile latencies: the p95 latency is the response time below which 95% of requests complete, and the p99 latency is the response time below which 99% of requests complete. In other words, only the slowest 5% (or 1%) of requests take longer than these values.
It's worth mentioning that p99 latency represents more extreme outliers in your system's performance than p95 latency. Thus, if you're optimizing for the best performance under peak conditions, paying attention to p99 latency can be more important because it helps ensure good experiences even for those users who might otherwise have unusually long wait times.
For example, consider a simple method for measuring request latency in Python using the `time` module:
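A minimal sketch of such a measurement, using `time.perf_counter()` for a monotonic, high-resolution clock (the `measure_latency` helper and the dummy `handle_request` handler are illustrative names, not part of any particular framework):

```python
import time

def measure_latency(handler, *args, **kwargs):
    """Call a request handler and return (result, latency in milliseconds)."""
    start = time.perf_counter()
    result = handler(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000
    return result, latency_ms

def handle_request():
    """Stand-in for real request-handling work."""
    time.sleep(0.01)  # simulate ~10 ms of processing
    return "ok"

result, latency_ms = measure_latency(handle_request)
```

`time.perf_counter()` is preferable to `time.time()` here because it is monotonic and not affected by system clock adjustments.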
You could collect these latencies over time for all requests, then calculate the p95 and p99 latencies like so:
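One way to do this with NumPy (the randomly generated `latencies_ms` array stands in for latencies you would collect from real requests):

```python
import numpy as np

# In practice this array would hold measured latencies; synthetic data for illustration.
rng = np.random.default_rng(seed=42)
latencies_ms = rng.exponential(scale=50, size=10_000)

# np.percentile returns the value below which the given percentage of data falls.
p95 = np.percentile(latencies_ms, 95)
p99 = np.percentile(latencies_ms, 99)
print(f"p95: {p95:.1f} ms, p99: {p99:.1f} ms")
```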
In this code, `np.percentile()` calculates the desired percentile value (in our case, p95 or p99) from the given list of latencies.
While both p95 and p99 are useful metrics, they serve different purposes based on your performance optimization goals. In general, monitoring various percentiles can give you a more holistic view of your system's performance.