Name resolution issues on Setonix

Incident Report for Pawsey Supercomputing Research Centre

Resolved

This incident has been resolved.

Posted Jun 05, 2026 - 08:08 AWST

Monitoring

The reservation on the system has been lifted. Pawsey staff will monitor the system overnight.

Posted Jun 03, 2026 - 18:40 AWST

Update

The onsite HPE team, together with HPE L3 engineers, is satisfied that Setonix is now fully back up and stable. HPE will continue to monitor the system and will check it again later this evening and tomorrow morning.

Posted Jun 03, 2026 - 17:31 AWST

Update

HPE is still working to get the six leader nodes and the /gluster filesystem synced and healthy.

HPE L3 are assisting the local engineers to resolve the problem.

Once this is resolved, HPE will reboot all compute and GPU nodes to return them to a production-ready state.

Posted Jun 03, 2026 - 15:31 AWST

Update

HPE have provided no further updates.

Posted Jun 03, 2026 - 15:13 AWST

Identified

HPE were able to get the leader nodes back up. HPE are going to reboot them to make sure there are no further problems.

Posted Jun 03, 2026 - 14:17 AWST

Update

A critical issue has been raised with the vendor. We are waiting for a response.

Posted Jun 03, 2026 - 13:04 AWST

Investigating

Because of the work performed by HPE yesterday, there appears to be a name resolution issue on Setonix.

Pawsey staff are attempting a workaround.

Posted Jun 03, 2026 - 12:44 AWST

This incident affected: Setonix (Login nodes, Data-mover nodes, Slurm scheduler, Setonix work partition, Setonix debug partition, Setonix long partition, Setonix copy partition, Setonix askaprt partition, Setonix highmem partition, Setonix gpu partition, Setonix gpu high mem partition, Setonix gpu debug partition).