Disk array failure in cxfs cluster
Incident Report for Pawsey Supercomputing Centre
The unscheduled outage of Pawsey storage systems has been resolved.
Posted Sep 10, 2020 - 17:45 AWST
Our vendor has replaced more hardware and the drive pools are rebuilding. Once that's completed, Pawsey staff will reboot the array again and commence recovery of the filesystems if the hardware comes up cleanly.
Posted Sep 09, 2020 - 16:30 AWST
Our vendor is still working on the issue, but progress has been hampered by a public holiday in the US.
Posted Sep 08, 2020 - 16:46 AWST
Although the array enclosure has been replaced and successfully powered up, our vendor is studying diagnostic logs before making any further recommendations for service recovery.
Posted Sep 07, 2020 - 18:39 AWST
Updated list of impacted services
Posted Sep 07, 2020 - 15:39 AWST
One of the disk arrays that compose the HSM filesystem has just failed, and the filesystems that use it are unresponsive.
Pawsey staff are shutting down the nodes in the cxfs cluster to allow the on-site vendor engineer to repair the array.

At this stage we have no indication of damage and no ETA for services being restored.
Posted Sep 07, 2020 - 15:23 AWST
This incident affected: Storage Systems (Data Portal Systems, Hierarchical Storage Management Systems, MWA Nodes, CASDA Nodes).