Voice - Call Failures
Incident Report for Inspira Technology Group
Postmortem

Voice Outage 19/01/2021 - Post Incident Report

Yesterday, 19th January 2021, at 10:30, we declared an emergency and alerted you to an issue on our network, which later caused a complete loss of service for a number of clients connected to our hosted phone system.

At around 10:25, our automated network checks alerted us to a fluctuating link between the data centres that house our voice and database servers. We took the necessary steps to reroute traffic over other links to avoid any disturbance to ongoing calls, and confirmed the automated actions our platform takes in scenarios like this. To begin with, this was successful and had no impact on service. Between 10:30 and 10:33, connections from our voice servers to our database servers, which had been left pending by the fluctuating link, tried to re-establish in order to correct the state of the affected calls and complete them properly. The resulting volume of connections overloaded our database servers, preventing any further calls.
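As a rough illustration of why the reconnection surge described above can overwhelm a database, the following Python sketch counts how many connections are open at the same moment. It is a toy model only: the function name, the connection count of 900, the 5-second hold time and the spread-out schedule are all hypothetical figures chosen for the example, not measurements from the incident, and the spread-out case is shown purely for contrast rather than as a description of any fix.

```python
def peak_concurrent_connections(reconnect_times, hold_seconds):
    """Return the highest number of connections open at the same moment.

    reconnect_times: second at which each pending connection reconnects.
    hold_seconds: how long each connection stays open (illustrative value).
    """
    events = []
    for start in reconnect_times:
        events.append((start, 1))                   # connection opens
        events.append((start + hold_seconds, -1))   # connection closes
    # Tuples sort by time first, then by delta, so a close (-1) at a given
    # second is processed before an open (+1) at that same second.
    events.sort()
    open_now = 0
    peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak


# Hypothetical figures: 900 pending connections, each held for 5 seconds.
burst = [0] * 900                        # every connection retries at once
spread = [i // 5 for i in range(900)]    # 5 retries per second over 180 s

burst_peak = peak_concurrent_connections(burst, hold_seconds=5)
spread_peak = peak_concurrent_connections(spread, hold_seconds=5)
print(burst_peak, spread_peak)  # prints: 900 25
```

In this model the same 900 connections produce a peak of 900 when they all return together, but only 25 when they arrive gradually, which is why a short burst can exhaust a server that copes comfortably with the same work under normal conditions.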

To reduce the number of connections to the database, we disabled all secondary services, such as the hosted web portal, dialler and BLF (busy lamp field).
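The step above is a form of load shedding: non-essential services give up their database connections so that core call handling can recover. A minimal Python sketch of the idea follows. Everything in it is an assumption made for illustration: the `Service` class, the `shed_secondary_services` helper, the connection counts and the limit of 700 are hypothetical and are not taken from the actual platform.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    db_connections: int  # connections currently held (made-up figures)
    essential: bool      # True for core call handling


def shed_secondary_services(services, connection_limit):
    """Disable non-essential services, largest first, until the total
    number of database connections fits under connection_limit.

    Returns (names_disabled, connections_remaining).
    """
    total = sum(s.db_connections for s in services)
    disabled = []
    # Only secondary services are candidates; essential ones are never shed.
    candidates = sorted(
        (s for s in services if not s.essential),
        key=lambda s: s.db_connections,
        reverse=True,
    )
    for service in candidates:
        if total <= connection_limit:
            break
        total -= service.db_connections
        disabled.append(service.name)
    return disabled, total


# Hypothetical connection counts, for illustration only.
services = [
    Service("voice", 600, essential=True),
    Service("web_portal", 250, essential=False),
    Service("dialler", 300, essential=False),
    Service("blf", 150, essential=False),
]
disabled, remaining = shed_secondary_services(services, connection_limit=700)
print(disabled, remaining)  # prints: ['dialler', 'web_portal', 'blf'] 600
```

With these example figures, all three secondary services have to be disabled before the total drops below the limit, leaving only the essential voice connections in place.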

We then proceeded to investigate and mitigate the impact this outage had on our customers, and were able to restore service at around 11:10. During the 40 minutes of disruption, a few thousand outbound calls were still made successfully, but full service could not be resumed until the number of connections to our database servers had been reduced and the voice servers cleared.

During this incident, inbound calls were diverted to our system disaster recovery platform, which was purpose-built for a scenario like this. This was the second time it has had to be enabled since it entered service, and it handled an amazing 85% of our inbound call volume, which was diverted to backup destinations set up on our platform in advance.

We have already begun an internal investigation into why the connections to our database servers became overloaded, and we will implement solutions to prevent this from happening in future.

We know how important our service reliability is to you and therefore strive to reach 100% uptime.

We are truly sorry that this issue affected your service, and we will take any necessary steps to prevent it from happening again.

Posted Jan 20, 2021 - 21:31 GMT

Resolved
Traffic has been rerouted and services resumed. We will confirm further details as soon as more information is available.
Posted Jan 19, 2021 - 12:39 GMT
Monitoring
Traffic has been rerouted and services resumed. We will confirm further details as soon as more information is available.
Posted Jan 19, 2021 - 11:27 GMT
Update
We are continuing to reroute traffic. Please stand by for further updates.
Posted Jan 19, 2021 - 11:04 GMT
Update
We are continuing to work on a fix for this issue.
Posted Jan 19, 2021 - 10:50 GMT
Identified
We have identified an issue with an upstream carrier and are currently rerouting traffic.
Posted Jan 19, 2021 - 10:46 GMT
This incident affected: Connectivity & Voice (Inspira Hosted VOIP Platform).