Servers are back up with the filesystem successfully mounted. There may be some delays while clients reconnect.
Posted May 13, 2020 - 07:36 AWST
One of the servers providing the /group filesystem (pgfs-oss12) became unresponsive overnight and has just been rebooted. Services may hang temporarily when accessing certain paths on the /group filesystem while its disks are failed over to the partner host (pgfs-oss11).
Posted May 13, 2020 - 07:32 AWST
This incident affected: Galaxy (Galaxy Compute nodes, Galaxy login nodes), Magnus (Magnus Compute nodes, Magnus login nodes), Zeus (Zeus Compute nodes), Lustre filesystems (/group filesystem), and Topaz (GPU partition).