Issues with /astro metadata servers

Incident Report for Pawsey Supercomputing Research Centre

Resolved

No further issues were seen with the two MDSs throughout the day.

Posted Jan 12, 2021 - 05:26 AWST

Monitoring

Both MDSs seem to be performing normally following a reboot.

Posted Jan 11, 2021 - 07:01 AWST

Identified

The pair of servers that make up the metadata service for /astro (running the MGS and both MDTs) isn't handling failover correctly. One server rebooted earlier this morning, but its disk LUNs aren't being handed off correctly to the partner. Fixing this will require a reboot of one, possibly both, servers.
During this time, access to /astro will be impacted.

Posted Jan 11, 2021 - 04:14 AWST