Setonix Login nodes experiencing Slingshot issues

Update

setonix-04 reported a C_EC_CRIT error yesterday. It is not in the login pool, but HPE are stumped as to why this is happening.
Posted Nov 07, 2025 - 13:40 AWST

Monitoring

HPE rebooted a number of Slingshot switches during maintenance.

We haven't observed any Slingshot errors on the login, data mover or visualisation nodes for 48 hours.

We will continue to monitor.
Posted Nov 06, 2025 - 12:29 AWST

Update

HPE have provided no new information.
Posted Oct 31, 2025 - 08:11 AWST

Update

HPE have provided no new information.
Posted Oct 27, 2025 - 10:59 AWST

Update

HPE have provided no new information.
Posted Oct 24, 2025 - 21:01 AWST

Update

HPE have provided no new information.

setonix-08 has Slingshot issues. Pawsey is rebooting it.
Posted Oct 20, 2025 - 13:25 AWST

Update

setonix-02 and setonix-03 have been added back to the round-robin (RR) DNS.
Posted Oct 16, 2025 - 14:09 AWST

Investigating

There appears to be an issue with the Slingshot interfaces on the Setonix login nodes. We appear to be down to one login node in the normal login pool.

We have had a case open with HPE for weeks, but they appear to be no closer to providing any kind of solution.

Please, please, please, please don't run any computationally intensive operations on the login nodes. We have lovely compute nodes for that.

Please be aware that you can log into setonix-workflow.pawsey.org.au and get access to additional "workflow" nodes.
Posted Oct 16, 2025 - 12:02 AWST
This incident affects: Setonix (Login nodes).