Pawsey Scheduled Maintenance (August)
Scheduled Maintenance Report for Pawsey Supercomputing Research Centre
Completed
We are done.
Posted Aug 16, 2024 - 14:04 AWST
Update
Garrawarla is operational.

Still to be deployed are:
* casda-an01
* carta
* setonix-workflow
Posted Aug 14, 2024 - 16:12 AWST
Verifying
Setonix has been returned to service. We caught an issue early this morning, which HPE has since fixed.

We are currently bringing back the last three cabinets and hope to have them operational by tomorrow. In the meantime, all partitions should have resources available.

The management backplane has been replaced, but no major changes have been made to the user-facing platform. If you notice any issues, please contact us at help@pawsey.org.au.

Special thanks to the Pawsey and HPE teams who made this happen.
Posted Aug 14, 2024 - 16:11 AWST
Update
Scheduled maintenance is still in progress. We will provide updates as necessary.
Posted Aug 14, 2024 - 10:47 AWST
Update
Maintenance on Nimbus, Acacia and core services is complete.

Setonix is in the final stretch of customisation and testing. We are currently running ReFrame tests and making the minor adjustments they have brought to light.

All login and visualisation nodes now sit on the same 100 gigabit fabric as Acacia.

Setonix and Garrawarla are still on schedule to be returned to service tomorrow.
Posted Aug 13, 2024 - 16:32 AWST
Update
Scheduled maintenance is still in progress. We will provide updates as necessary.
Posted Aug 13, 2024 - 11:50 AWST
Update
Scheduled maintenance is still in progress. We will provide updates as necessary.
Posted Aug 13, 2024 - 09:30 AWST
Update
Scheduled maintenance is still in progress. We will provide updates as necessary.
Posted Aug 13, 2024 - 08:31 AWST
Update
With the Slurm database update complete, the ASKAP Ingest Slurm cluster has been returned to service.

HPE continue to update Setonix, and we eagerly await their progress.
Posted Aug 09, 2024 - 13:59 AWST
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Aug 09, 2024 - 08:00 AWST
Update
Setonix (including the Setonix visualisation nodes) and Garrawarla will be unavailable starting this Friday, August 9th, from 8 AM AWST.

The Slurm database daemon (slurmdbd) will be updated while Setonix and Garrawarla are offline. ASKAP Ingest will also be affected by the upgrade.

Regular Pawsey systems maintenance is scheduled for Tuesday 13th August:
* Nimbus is being upgraded to OpenStack Wallaby
* Benchmarking will be performed on Acacia
* Core Pawsey Services will be patched
Posted Aug 06, 2024 - 12:48 AWST
Scheduled
Our regular first-Tuesday-of-the-month maintenance will not proceed in August. Instead, an extended maintenance period for Setonix is scheduled for August 9th - 14th.

During the shutdown, HPE will replace Setonix's management system, bringing several benefits:
• Going forward, system updates and patches will be applied during regular maintenance, minimising disruptions.
• HPE reports improved system stability with the upgraded management system.

What does this mean?
• Setonix (including the Setonix visualisation nodes) and Garrawarla will be unavailable from August 9th to 14th.
• All Pawsey systems will be subject to regular maintenance on the 13th of August including disruptive testing on both Acacia clusters, an upgrade to the control plane of Nimbus and patching of the Banksia system.

The replacement of Setonix's management system has been successfully implemented on Pawsey's test and development system. When Setonix is returned to service, neither the version of the Cray Operating System nor the software stack provided by Pawsey will have changed. Only security fixes are being applied.

The login and visualisation nodes are being moved to the 100 gigabit network, which means anyone providing external software access to Setonix should allow access from the 146.118.74.0/22 network.
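If you maintain an external service with its own allow-list, the minimal Python sketch below checks whether a connecting address falls inside the announced range. It uses only the standard `ipaddress` module; the names `SETONIX_NETWORK` and `is_from_setonix` are illustrative, not part of any Pawsey tooling. Note that 146.118.74.0 does not sit on a /22 boundary, so `ipaddress` rejects it unless `strict=False` is passed, in which case it is masked to 146.118.72.0/22 (146.118.72.0 to 146.118.75.255). Confirm the intended range with help@pawsey.org.au before relying on it.

```python
import ipaddress

# Network quoted in the announcement (illustrative constant name). As written it
# has host bits set for a /22 prefix, so strict=False is required; Python then
# masks it to 146.118.72.0/22.
SETONIX_NETWORK = ipaddress.ip_network("146.118.74.0/22", strict=False)


def is_from_setonix(addr: str) -> bool:
    """Hypothetical helper: True if addr lies inside the announced Setonix range."""
    return ipaddress.ip_address(addr) in SETONIX_NETWORK


if __name__ == "__main__":
    for candidate in ("146.118.74.10", "146.118.72.1", "146.118.76.1"):
        print(candidate, is_from_setonix(candidate))
```

With the masking above, the first two addresses are inside the range and 146.118.76.1 is not.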

Further updates will be provided on status.pawsey.org.au, and any questions should be directed to help@pawsey.org.au.
Posted Jul 25, 2024 - 11:15 AWST
This scheduled maintenance affected: Garrawarla (Garrawarla workq partition, Garrawarla gpuq partition, Garrawarla asvoq partition, Garrawarla copyq partition, Garrawarla login node, Slurm Controller (Garrawarla)), Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition, Setonix debug partition, Setonix long partition, Setonix copy partition, Setonix askaprt partition, Setonix highmem partition, Setonix gpu partition, Setonix gpu high mem partition, Setonix gpu debug partition), and Visualisation Services (Remote Vis, Reservation).