The reservation on the system has been lifted. We (Pawsey) will monitor the system overnight.
Posted Jun 03, 2026 - 18:40 AWST
Update
The onsite HPE team, together with HPE L3 support, is satisfied that Setonix is now fully back up and stable. HPE will continue to monitor the system and will check it again later this evening and tomorrow morning.
Posted Jun 03, 2026 - 17:31 AWST
Update
HPE is still working on getting the six leader nodes and the /gluster filesystem synced and healthy.
HPE L3 are assisting local engineers to resolve the problem.
Once this is resolved, HPE will reboot all compute and GPU nodes to return them to production.
Thanks, HPE.
Posted Jun 03, 2026 - 15:31 AWST
Update
HPE have provided no further updates.
Posted Jun 03, 2026 - 15:13 AWST
Identified
HPE were able to bring the leader nodes back up. HPE will now reboot the leader nodes to confirm there are no further problems.
ER is currently focused on assisting the site team to bring production back online and will provide a post-mortem afterwards.
Posted Jun 03, 2026 - 14:17 AWST
Update
A critical issue has been raised with the vendor. We are waiting for a response.
Posted Jun 03, 2026 - 13:04 AWST
Investigating
As a result of work performed by HPE yesterday, there appears to be a name resolution issue on Setonix.
Pawsey staff are attempting a workaround.
Posted Jun 03, 2026 - 12:44 AWST
This incident affects: Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition, Setonix debug partition, Setonix long partition, Setonix copy partition, Setonix askaprt partition, Setonix highmem partition, Setonix gpu partition, Setonix gpu high mem partition, Setonix gpu debug partition).