Completed
Setonix was handed back to Pawsey at 8 AM this morning. We (Pawsey) have rebooted the entire system and run our usual battery of tests. As usual, a small number of nodes (including visualisation nodes) have been put into reservations so that hardware issues can be triaged and resolved.
Setonix has been returned to service.
The Pawsey team will issue separate communication about the current configuration and the use of NVMe devices on Setonix GPU nodes.
Acacia is still undergoing a rolling upgrade of its operating system, which is being tracked on a separate maintenance page (https://status.pawsey.org.au/incidents/39nvt00xyhm4).
The next scheduled maintenance will be 5th August 2025. Setonix will be upgraded to the next extended support release of Cray Operating System which is based on SLES 15 SP6.
Thank you to everyone involved in maintenance for their hard work.
If you have any questions, please contact help@pawsey.org.au. Ask nicely.
Posted Jun 25, 2025 - 13:14 AWST
Update
Apologies about the AWSET, it is where my brain currently is ....
Posted Jun 24, 2025 - 16:09 AWST
Update
Banksia has been returned to service (except for Kafka notifications, which will be returned to service later this afternoon). The ScoutAM update, storage controller firmware update and tape library firmware update have been completed. A storage controller will need to be replaced, but this will be done live.
Acacia (MWA) intrusive testing work is complete.
Acacia (Projects) and Acacia (Ingest) operating system upgrades are one-third complete. The service has been available throughout, but the availability is considered at risk. The upgrades will continue tomorrow, and a separate maintenance page will be opened.
Operating System updates to the SLURM database daemon, patching of visualisation services and patching of core Pawsey services are complete.
ASKAP Ingest has also been returned to service.
HPE continue to work on Setonix. We hope to have an update at 5 PM (AWST). However, we still expect HPE to hand Setonix back to Pawsey sometime tomorrow.
CARTA and Setonix Visualisation nodes cannot be put back into production until HPE hands Setonix back to Pawsey.
Posted Jun 24, 2025 - 16:06 AWST
Update
Patching of visualisation services is complete. CARTA and Setonix Visualisation nodes are waiting on HPE before they can be returned to service.
Patching of core Pawsey services is complete.
Posted Jun 24, 2025 - 15:22 AWST
Update
The Setonix management control plane has been upgraded to HPCM 1.13. The system has been handed over to the local HPE team to perform remediation work due to the extended work during the last maintenance (May). The system will be handed back to the remote team this evening to complete the management control plane upgrade.
Upgrades on other systems have commenced.
Posted Jun 24, 2025 - 08:47 AWST
In progress
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Posted Jun 23, 2025 - 16:00 AWST
Update
The Setonix maintenance has been brought forward from 5 PM to 4 PM to allow Pawsey staff to safely shut down services before handing the system over to HPE.
Posted Jun 22, 2025 - 13:14 AWST
Update
Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
Posted Jun 19, 2025 - 17:53 AWST
Scheduled
Maintenance will be carried out on Setonix starting at 5 PM, Monday June 23rd 2025. This start time is to support the HPE engineers performing the HPE Performance Cluster Manager (HPCM) upgrade, who are in a different time zone to Perth.
Maintenance will be carried out on all other Pawsey systems on Tuesday the 24th June to apply required patches and updates to improve the systems' stability, security, and performance. This maintenance window will also be used to undertake other tasks which require downtime.
Planned work for this window includes:
• Update of NVMe SLURM gres configuration on Setonix GPU nodes
• Banksia ScoutAM update
• Banksia storage controller firmware update
• Banksia tape library firmware update
• Acacia (MWA) will be rebooted to test that network changes "stick"
• Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
• Operating System updates to the SLURM database daemon
• Patching visualisation services
• Patching of core Pawsey services
We expect to be able to bring all services (except Setonix) back by the end of the day. Setonix is scheduled to be handed back to Pawsey early Wednesday morning (June 25th 2025).
The next scheduled maintenance will be 5th August 2025. Setonix will be upgraded to the next extended support release of Cray Operating System which is based on SLES 15 SP6.
If you have any questions, please contact help@pawsey.org.au.
Posted Jun 17, 2025 - 14:18 AWST
This scheduled maintenance affected: ASKAP (ASKAP ingest nodes, ASKAP service nodes), Central Services (Authentication and Authorization, Service Desk, License Server, Application Portal, Origin, /home filesystem, /pawsey filesystem, Central Slurm Database, Documentation), Storage Systems (Acacia - Projects, Banksia, Data Portal Systems, MWA Nodes, CASDA Nodes, Acacia - Ingest, MWA ASVO), Lustre filesystems (/scratch filesystem, /software filesystem, /askapbuffer filesystem, /askapingest filesystem), Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition, Setonix debug partition, Setonix long partition, Setonix copy partition, Setonix askaprt partition, Setonix highmem partition, Setonix gpu partition, Setonix gpu high mem partition, Setonix gpu debug partition), and Visualisation Services (Remote Vis, Vis scheduler, Setonix vis nodes, Nebula vis nodes, Visualisation Lab, Reservation, CARTA - Stable, CARTA - Test, Pawsey Remote VR).