A combination of factors caused the failover service to drop the IPs without monitoring detecting the interruption. The failover services did not communicate successfully with one another during a network restart, and the transitional state of our control plane, which was in the final stage of a migration, compounded the problem. These issues have been resolved, and we will continue working to improve the resilience of these services.
Posted Mar 09, 2023 - 17:13 AWST
The main IP address became unresponsive. Restarting the failover service reinstated the IP, and all systems are now operating normally.
Posted Mar 09, 2023 - 15:49 AWST
We are experiencing an outage affecting the Nimbus dashboard and API. Access to Nimbus instances themselves is not affected. We are currently investigating the issue.
Posted Mar 09, 2023 - 15:16 AWST
This incident affected: Nimbus (Nimbus dashboard).