High-speed interconnect issue
Resolved
The replacement part has been installed successfully; Zeus is now operating as normal.
Posted Jun 24, 2020 - 11:38 AWST
Update
The replacement parts have arrived from the vendor after considerable delay. Preparations are being made to install the parts next Wednesday, 24th June. For more detailed information about this scheduled maintenance, a log has been created here: https://support.pawsey.org.au/documentation/display/US/M-2020-06-24-SC
Posted Jun 19, 2020 - 13:40 AWST
Update
We are still awaiting the replacement part. We expect it to arrive in time for the scheduled maintenance on Tuesday.
Posted May 27, 2020 - 08:45 AWST
Update
Today's update from the vendor: they have requested an escalation from the Intel Support team to expedite the RMA process, but we still don't have any parts on site or an ETA for their arrival.
Posted May 14, 2020 - 13:37 AWST
Update
We are still waiting for replacement parts to be shipped to site from our vendor.
Posted May 12, 2020 - 12:00 AWST
Update
We are in contact with the vendor requesting replacement parts. At this stage, we have not been able to secure an ETA from the vendor.
Posted May 07, 2020 - 09:39 AWST
Identified
We've identified the failed component and the Omni-Path fabric is stable. The Omni-Path-connected Lustre clients are in recovery mode, and recovery may take some time to complete.
Posted May 06, 2020 - 08:58 AWST
Update
We are investigating an issue with the Omni-Path switch. This affects certain Zeus partitions. Please see the incident log page for more information: https://support.pawsey.org.au/documentation/display/US/I-2020-05-05-SC
Posted May 05, 2020 - 14:41 AWST
Investigating
We are investigating an issue with the Omni-Path switch.
Posted May 05, 2020 - 14:05 AWST