Completed -
Setonix was handed back to Pawsey at 8 AM this morning. We (Pawsey) have rebooted the entire system and run our usual battery of tests. As usual, a small number of nodes (including visualisation nodes) have been put into reservations so that hardware issues can be triaged and resolved.
Setonix has been returned to service.
The Pawsey team will issue a separate communication about the current configuration and the use of NVMe devices on Setonix GPU nodes.
Acacia's rolling operating system upgrade is still in progress and is being tracked on a separate maintenance page (https://status.pawsey.org.au/incidents/39nvt00xyhm4).
The next scheduled maintenance will be on 5th August 2025. Setonix will be upgraded to the next extended support release of the Cray Operating System, which is based on SLES 15 SP6.
Thank you to everyone involved in maintenance for their hard work.
If you have any questions, please contact help@pawsey.org.au. Ask nicely.
Jun 25, 13:14 AWST
Update -
Apologies for the "AWSET" typo; it is where my brain currently is...
Jun 24, 16:09 AWST
Update -
Banksia has been returned to service (except for Kafka notifications, which will be returned to service later this afternoon). The ScoutAM update, storage controller firmware update and tape library firmware update have been completed. A storage controller will need to be replaced, but this will be done live.
Acacia (MWA) intrusive testing work is complete.
Acacia (Projects) and Acacia (Ingest) operating system upgrades are 1/3 complete. The service has been available throughout, but the availability is considered at risk. The upgrades will continue tomorrow, and a separate maintenance page will be opened.
Operating System updates to the SLURM database daemon, patching of visualisation services and patching of core Pawsey services are complete.
ASKAP Ingest has also been returned to service.
HPE continue to work on Setonix. We hope to have an update at 5 PM (AWST). However, we still expect HPE to hand Setonix back to Pawsey sometime tomorrow.
CARTA and Setonix Visualisation nodes cannot be put back into production until HPE hands Setonix back to Pawsey.
Jun 24, 16:06 AWST
Update -
Patching of visualisation services is complete. CARTA and Setonix Visualisation nodes are waiting on HPE to hand Setonix back before they can return to service.
Patching of core Pawsey services is complete.
Jun 24, 15:22 AWST
Update -
The Setonix management control plane has been upgraded to HPCM 1.13. The system has been handed over to the local HPE team to perform remediation work arising from the extended work during the last maintenance (May). The system will be handed back to the remote team this evening to complete the management control plane upgrade.
Upgrades on other systems have commenced.
Jun 24, 08:47 AWST
In progress -
Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jun 23, 16:00 AWST
Update -
The Setonix maintenance has been brought forward from 5 PM to 4 PM to allow Pawsey staff to safely shut down services before handing the system over to HPE.
Jun 22, 13:14 AWST
Update -
Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
Jun 19, 17:53 AWST
Scheduled -
Maintenance will be carried out on Setonix starting at 5 PM, Monday June 23rd 2025. This start time is to support the HPE engineers performing the HPE Performance Cluster Manager (HPCM) upgrade, who are in a different time zone from Perth.
Maintenance will be carried out on all other Pawsey systems on Tuesday June 24th to apply required patches and updates that improve the systems' stability, security, and performance. This maintenance window will also be used to undertake other tasks that require downtime.
Planned work for this window includes:
• Update of NVMe SLURM gres configuration on Setonix GPU nodes
• Banksia ScoutAM update
• Banksia storage controller firmware update
• Banksia tape library firmware update
• Acacia (MWA) will be rebooted to test that network changes "stick"
• Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
• Operating System updates to the SLURM database daemon
• Patching visualisation services
• Patching of core Pawsey services
We expect to be able to bring all services (except Setonix) back by the end of the day. Setonix is scheduled to be handed back to Pawsey early Wednesday morning (June 25th 2025).
The next scheduled maintenance will be on 5th August 2025. Setonix will be upgraded to the next extended support release of the Cray Operating System, which is based on SLES 15 SP6.
If you have any questions, please contact help@pawsey.org.au.
Jun 17, 14:18 AWST