Reduced availability of some nodes
Resolved
Closing incident: all systems appear stable.
Posted Dec 21, 2021 - 06:09 AWST
Monitoring
The replacement switch has been fitted, and nodes are being brought back online.
Posted Dec 08, 2021 - 10:58 AWST
Identified
Due to a hardware failure in an InfiniBand switch, the 'nvlinkq' partition on Topaz and the 'copyq' partition on Garrawarla are currently unavailable. We are awaiting a replacement switch from the vendor but do not yet have an ETA.
Posted Nov 18, 2021 - 15:24 AWST
This incident affected: Garrawarla (Garrawarla workq partition).