Summary: The Service is returning HTTP 5xx (server error) responses
Customer Impact: Customers receive error responses
Actions:
Join and check the following channels:
Read them to understand what is happening and gain context, including whether a major incident (MI) is in progress. If one is, participate and add relevant notes, e.g.
Application-2 received a 5xx alert at xx:xx
If there is no MI, continue investigating the Application-2 issue.
You will need to determine whether Application-2 is causing the problem or only showing symptoms of a dependency issue.
Check the four golden signals (latency, traffic, errors, saturation) dashboard
Is there:
Diagnosis:
Check the past hour of activity through DOWNSTREAM_DEPENDENCY_B
Look for high-frequency patterns from the same client IP address
Normal activity
Unusual activity
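If the DOWNSTREAM_DEPENDENCY_B logs can be exported, the client-IP check above can be sketched as below. This is a minimal sketch, not an existing tool: the `(timestamp, client_ip)` record shape, the `noisy_client_ips` helper, and the default threshold of 1000 requests per hour are all hypothetical and should be adapted to the real log format and normal traffic levels.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical record shape: (timestamp, client_ip) pairs exported from
# DOWNSTREAM_DEPENDENCY_B's access logs. Adapt to the real log format.
Record = tuple[datetime, str]


def noisy_client_ips(
    records: list[Record],
    now: datetime,
    window: timedelta = timedelta(hours=1),
    threshold: int = 1000,  # hypothetical; tune to normal traffic levels
) -> list[tuple[str, int]]:
    """Return (ip, request_count) for every IP with at least `threshold`
    requests inside the window, busiest first."""
    cutoff = now - window
    # Count only requests that fall inside the look-back window.
    counts = Counter(ip for ts, ip in records if ts >= cutoff)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    records = [(now - timedelta(minutes=5), "10.0.0.1")] * 3 + [
        (now - timedelta(minutes=10), "10.0.0.2")
    ]
    print(noisy_client_ips(records, now, threshold=2))
```

A handful of IPs dominating the output points towards unusual activity; an empty list is consistent with normal activity.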
Remedial Action:
Diagnosis:
Check the past hour of activity through the API Cache
Normal activity
Low number of hits
Unusual activity
High number of hits clustered together
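One way to tell "low number of hits" from "high number of hits clustered together" is to bucket cache-hit timestamps per minute and flag unusually dense buckets. This is a minimal sketch under stated assumptions: the `hit_clusters` helper is hypothetical, and the definition of "clustered" (a bucket with at least `min_hits` hits and more than `factor` times the average per-minute count) is an illustrative choice, not a documented rule for the API Cache.

```python
from collections import Counter
from datetime import datetime, timedelta


def hit_clusters(
    hit_times: list[datetime],
    now: datetime,
    window: timedelta = timedelta(hours=1),
    bucket: timedelta = timedelta(minutes=1),
    factor: float = 5.0,  # hypothetical: "clustered" = 5x the average bucket
    min_hits: int = 10,  # hypothetical floor so sparse traffic is never flagged
) -> list[tuple[datetime, int]]:
    """Return (bucket_start, hits) for buckets that look like a burst."""
    start = now - window
    n_buckets = int(window / bucket)
    # Map each hit inside [start, now) to the index of its time bucket.
    counts = Counter(
        int((t - start) / bucket) for t in hit_times if start <= t < now
    )
    # Average over all buckets in the window, including empty ones.
    mean = sum(counts.values()) / n_buckets
    return [
        (start + i * bucket, c)
        for i, c in sorted(counts.items())
        if c >= min_hits and c > factor * mean
    ]
```

An empty result suggests normal activity; returned buckets show when hits were clustered, which helps line them up against the time of the 5xx alert.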
Remedial Action:
Diagnosis: When was our last release?
Remedial Action:
If neighbouring services aren’t failing, consider a rollback.
To roll back, you will need to find the previous pipeline run to production and know how to re-run it.
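The release check and rollback guidance above can be expressed as a small decision rule. This is a hedged sketch only: the `should_consider_rollback` helper and its two-hour `suspect_window` are hypothetical; the runbook itself only says to check when the last release happened and to consider a rollback if neighbouring services are healthy.

```python
from datetime import datetime, timedelta


def should_consider_rollback(
    last_release: datetime,
    first_alert: datetime,
    neighbours_failing: bool,
    suspect_window: timedelta = timedelta(hours=2),  # hypothetical window
) -> bool:
    """Suggest a rollback only when our own release shortly precedes the
    first 5xx alert and neighbouring services look healthy."""
    released_before_alert = last_release <= first_alert
    recent = first_alert - last_release <= suspect_window
    return released_before_alert and recent and not neighbours_failing
```

If neighbours are also failing, the shared dependency is the more likely cause, and rolling back Application-2 would probably not help.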