No further issues seen with the two MDS servers throughout the day.
Posted Jan 12, 2021 - 05:26 AWST
Both MDSs seem to be performing normally following a reboot.
Posted Jan 11, 2021 - 07:01 AWST
The pair of servers that make up the metadata service for /astro (running the MGS and both MDTs) aren't handling failover correctly. One server rebooted earlier this morning, but the disk LUNs aren't being correctly handed off to its partner. This will require a reboot of one, possibly both, servers. During this time, access to /astro will be impacted.
Posted Jan 11, 2021 - 04:14 AWST
This incident affected: Magnus (Magnus Compute nodes), Garrawarla (Garrawarla compute nodes), Zeus (Zeus Compute nodes, Data Mover nodes (CopyQ)), and Lustre filesystems (/astro filesystem).