A performance problem has three parts:
- The event that introduces the problem (e.g., application configuration change)
- The symptoms of the problem (e.g., CPU usage spike)
- The cause (e.g., logging was left in DEBUG mode)
This page focuses on the causes of performance problems.
- Failures
  - These rapidly take all or part of the system to zero health (i.e., crashes, outages).
  - It is difficult or impossible to provide advance warning that a failure will occur.
  - Examples: uncaught insufficient permission exceptions, network cable failures.
- Resource saturation issues
  - These gradually take all or part of the system to zero health.
  - You can often identify these early by monitoring resource metrics like CPU or memory utilization.
  - Examples: memory leaks, inefficient DB queries.
A cause can sit in any of these parts of the stack:
- Application (e.g., a bug that introduces a memory leak)
- Middleware such as a web server, load balancer, or message queue (e.g., web server plug-in failure)
- Container or server infrastructure (e.g., disk failure)
- Network (e.g., incorrect DNS server settings)
- External resources (e.g., 3rd-party payment processing service is down)
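The early-warning idea for resource saturation can be sketched in a few lines. This is a minimal illustration, not a production monitor: it assumes utilization samples (in percent) collected at equal intervals, fits a straight line to them, and estimates how many more intervals remain before a limit is reached. The function names, the 100% default limit, and the linear-growth assumption are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def linear_trend(samples: Sequence[float]) -> float:
    """Return the least-squares slope of equally spaced samples (units per interval)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def intervals_until_saturation(
    samples: Sequence[float], limit: float = 100.0
) -> Optional[float]:
    """Estimate how many sampling intervals remain before `limit` is reached.

    Returns None when utilization is flat or falling, because no saturation
    is predicted in that case. Assumes growth stays roughly linear.
    """
    slope = linear_trend(samples)
    if slope <= 0:
        return None
    # Extrapolate from the most recent sample using the fitted slope.
    return max(0.0, (limit - samples[-1]) / slope)


if __name__ == "__main__":
    memory_percent = [50, 52, 54, 56, 58]  # hypothetical readings
    print(intervals_until_saturation(memory_percent))
```

A check like this only helps with saturation issues. A failure such as a network cable fault gives no gradual trend to extrapolate from, which is why advance warning is difficult or impossible for that category.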
A breakdown of causes by the type of problem they produce and the part of the system they affect
Once you've isolated the potential source of a performance problem, you can use this table to form a hypothesis about what the problem is and how to remediate it.
| Part of stack | Ways it can fail | Ways it can experience resource saturation |
|---|---|---|
| Application | | |
| Middleware | | |
| Server or container | | |
| Network | | |
| External service | | |
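One way to put a table like this to work is as a lookup keyed by stack part and problem type. The sketch below is illustrative only: it is populated with the examples mentioned earlier on this page, and the cell each example is placed in is an assumption rather than content from the table itself. The names `ProblemType`, `HYPOTHESES`, and `candidate_causes` are hypothetical.

```python
from enum import Enum
from typing import Dict, List, Tuple


class ProblemType(Enum):
    FAILURE = "failure"
    SATURATION = "resource saturation"


# Keys are (part of stack, problem type). The entries reuse examples from
# this page; which cell each one belongs to is an assumption of this sketch.
HYPOTHESES: Dict[Tuple[str, ProblemType], List[str]] = {
    ("application", ProblemType.FAILURE): [
        "uncaught insufficient permission exception",
    ],
    ("application", ProblemType.SATURATION): [
        "memory leak",
        "inefficient DB queries",
    ],
    ("middleware", ProblemType.FAILURE): ["web server plug-in failure"],
    ("server or container", ProblemType.FAILURE): ["disk failure"],
    ("network", ProblemType.FAILURE): [
        "network cable failure",
        "incorrect DNS server settings",
    ],
    ("external service", ProblemType.FAILURE): [
        "third-party payment processing service is down",
    ],
}


def candidate_causes(part: str, problem_type: ProblemType) -> List[str]:
    """Return candidate causes for a stack part and problem type, or an empty list."""
    return HYPOTHESES.get((part.strip().lower(), problem_type), [])


if __name__ == "__main__":
    for cause in candidate_causes("Application", ProblemType.SATURATION):
        print(cause)
```

An empty result means only that this sketch has no entry for that cell, not that the combination cannot occur.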
- Methodologies and frameworks
- Web framework/service configuration parameters and tuning
  - https://www.playframework.com/documentation/2.6.x/Configuration
  - https://nodesource.com/blog/node-js-performance-monitoring-part-3-debugging-the-event-loop
  - https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/
  - https://book.varnish-software.com/4.0/chapters/Tuning.html
  - https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/
  - https://httpd.apache.org/docs/2.4/misc/perf-tuning.html
  - https://wiki.squid-cache.org/SquidFaq/TroubleShooting
  - https://support.incapsula.com/hc/en-us/articles/209074918-Common-Incapsula-Errors-and-Their-Solutions
  - https://kubernetes.io/docs/tasks/debug-application-cluster/debug-cluster/#a-general-overview-of-cluster-failure-modes
  - https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_9.0.0/com.ibm.mq.tro.doc/q038530_.html
- AppDynamics SaaS operations guides and interviews with AppDynamics colleagues