Annual High Voltage inspection shutdown
The work to connect Phase 2 of the Setonix system and implement the upgraded Cassini cards has been completed, and Phase 1 has now been returned to service.
While we understand researchers' frustration at the delay, the work was required to ensure Setonix was restored as a stable system.
Pawsey staff and our vendor partners continue to work on Phase 2 to bring that into service within the February timeframe.
Please note that the February maintenance window for Setonix will be extended to 2 days (Tuesday 7th and Wednesday 8th of February). This allows sufficient time to make the configuration changes to Phase 1 that are needed to bring Phase 2 online.
We appreciate your support. If you have any questions, please e-mail firstname.lastname@example.org.
Our vendor is working on bringing Setonix back into service in its final configuration. The remaining services at Pawsey should all be operational. If you are having issues, please create a ticket via https://pawsey.org.au/support/
While Setonix was offline, Pawsey took the opportunity to ask HPE to replace all the network cards in Phase 1 with the upgraded Cassini cards, the same network cards that are in Phase 2. This is a significant improvement to networking capacity and capability: the new cards provide higher bandwidth (200 Gb/s) and come with an updated libfabric, which resolves many of the MPI issues encountered with Phase 1. HPE are rectifying issues with the high-speed fabric, which is taking longer than expected. This work is required to ensure Phase 1 is restored in a resilient fashion.
While we understand researchers' frustration at the delay, the work currently being done will ensure that Setonix is restored as a stable system and will allow us to bring Phase 2 into production as soon as possible.
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Over the weekend of 14/15 January, the high voltage supply switchgear for Pawsey will undergo its annual inspection and maintenance. This requires that the supplies are isolated and so ALL Pawsey systems will be shut down for the weekend.
Systems will be shut down from Friday at 08:00, and startup will begin on Monday morning. We expect systems to be restored by the evening of Tuesday the 17th.
This scheduled maintenance affected: Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition), ASKAP (ASKAP ingest nodes, ASKAP service nodes), Storage Systems (Acacia, Banksia, Data Portal Systems, MWA Nodes, CASDA Nodes, Ingest), Central Services (Authentication and Authorization, Service Desk, License Server, Application Portal, Origin, /home filesystem, /pawsey filesystem, Central Slurm Database), Lustre filesystems (/scratch filesystem (new), /software filesystem, /scratch filesystem (legacy), /group filesystem, /astro filesystem, /askapbuffer filesystem, /askapingest filesystem), The Australian Biocommons (Fgenesh++), Garrawarla (Garrawarla workq partition, Garrawarla gpuq partition, Garrawarla asvoq partition, Garrawarla copyq partition, Garrawarla login node, Slurm Controller (Garrawarla)), Topaz (GPU partition, Topaz login nodes, Slurm Controller (topaz)), Nimbus (Ceph storage, Nimbus instances, Nimbus dashboard), Visualisation Services (Remote Vis, Nebula, Visualisation Lab), and Legacy Systems (Galaxy Compute nodes, Galaxy login nodes, Slurm Controller (Galaxy)).