Loren Data Outage Event
Incident Report for Fenwick
Postmortem

1. Customer communication is usually a core strength, but it was a challenge during this event because the communication tool went down along with the cluster. Within the next 2 weeks, a secondary communication method will be put in place, with multiple people having access. It will allow notifications to be sent independently of the network.

2. Reduce database size to speed up restores, indexing, and recovery

3. Institute a more rapid cutover to the secondary data center with no data loss

4. Display a notice in user interfaces when the network is unavailable

5. Loren Data teams will perform desktop simulations of disaster recovery scenarios on a quarterly basis

Posted Nov 17, 2021 - 11:10 AEDT

Resolved
Loren Data detected a failure in the primary cluster node, which resulted in a network outage. The outage was identified internally, and the DR team, Executive Team, and TechOps worked to investigate and determine the best steps for recovery. After coordinating with the software vendor, it became clear that failing over to the secondary data center could extend the recovery time and risk data loss. The Executive Team determined that the best path forward was to recover the primary data center in order to preserve data integrity and minimize customer impact.

About 7 hours after the initial recovery, the network experienced a failed volume drive, which was resolved within a few minutes.

Because Loren Data has full backups and mirrored drives, TechOps was able to completely recover the cluster node with little to no lost or missing data. Databases could also be restored to the minute of failure without data loss.
Posted Nov 15, 2021 - 09:30 AEDT