Annual High Voltage inspection shutdown

Scheduled Maintenance Report for Pawsey Supercomputing Research Centre

Completed

The work to connect Phase 2 of the Setonix system and implement the upgraded Cassini cards has been completed and Phase 1 has now been returned to service.

While we understand researchers' frustration at the delay, the work was required to ensure Setonix was restored as a stable system.

Pawsey staff and our vendor partners continue to work on Phase 2 to bring that into service within the February timeframe.

Please note that the February maintenance window for Setonix will be extended to 2 days (Tuesday 7th and Wednesday 8th of February) to allow sufficient time for the configuration changes to Phase 1 that are needed to bring Phase 2 on-line.

We appreciate your support. If you have any questions, please e-mail help@pawsey.org.au.

Posted Jan 25, 2023 - 15:09 AWST

Update

Our vendor is working on bringing Setonix back into service in its final configuration. The remaining services at Pawsey should all be operational. If you are having issues, please create a ticket via https://pawsey.org.au/support/

While Setonix was offline, Pawsey took the opportunity to ask HPE to replace all the network cards in Phase 1 with the upgraded Cassini cards, the same network cards that are in Phase 2. This is a significant improvement to networking capacity and capability: the new cards offer higher bandwidth (200 Gb/s) and come with an updated libfabric which resolves many of the MPI issues encountered with Phase 1. HPE are rectifying issues with the high-speed fabric, which is taking longer than expected. This work is required to ensure Phase 1 is restored in a resilient fashion.

While we understand researchers' frustration at the delay, the work currently being done will ensure that Setonix is restored as a stable system and will allow us to bring Phase 2 into production as soon as possible.

Posted Jan 19, 2023 - 12:06 AWST

In progress

Scheduled maintenance is currently in progress. We will provide updates as necessary.

Posted Jan 13, 2023 - 08:00 AWST

Scheduled

Over the weekend of 14/15 January, the high voltage supply switchgear for Pawsey will undergo its annual inspection and maintenance. This requires the supplies to be isolated, so ALL Pawsey systems will be shut down for the weekend.
Systems will be shut down from 08:00 on Friday, and startup will begin on Monday morning. We expect systems to be restored by the evening of Tuesday the 17th.

Posted Nov 29, 2022 - 08:25 AWST

This scheduled maintenance affected: ASKAP (ASKAP ingest nodes, ASKAP service nodes), Central Services (Authentication and Authorization, Service Desk, License Server, Application Portal, Origin, /home filesystem, /pawsey filesystem, Central Slurm Database), The Australian Biocommons (Fgenesh++), Storage Systems (Acacia Ingest, Acacia Projects, Banksia, Data Portal Systems), Lustre filesystems (/scratch filesystem, /software filesystem, /askapbuffer filesystem, /askapingest filesystem), Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition), and Visualisation Services (Setonix vis nodes, Nebula vis nodes, Visualisation Lab).