Pawsey Supercomputing Research Centre
Update - HPE has informed Pawsey that the recent issues with /scratch, communicated by Pawsey over the last couple of days, are caused by a Lustre bug related to the use of "fallocate". Pawsey has disabled "fallocate" on /scratch following tests performed on Setonix's Test and Development System (TDS). Workloads using fallocate may experience a minor performance hit, non-zero return codes, and files not pre-allocating on the filesystem. Most researchers should not experience any significant changes to their jobs, and given the issues we have had, some jobs may run faster. However, researchers are asked to contact Pawsey through the Help Desk if they encounter any issues.
Jul 09, 2025 - 15:50 AWST
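
For illustration, a minimal sketch in Python of how a workload might request pre-allocation and tolerate it being unavailable. The path and size below are placeholders, and exactly how a refused pre-allocation surfaces (an error code versus a slow fallback) depends on the interface used, so treating pre-allocation as optional is the safest assumption:

    # Minimal sketch, placeholder path and size: ask for pre-allocation
    # and continue if the filesystem refuses it. With fallocate disabled
    # on /scratch, the request may fail (e.g. EOPNOTSUPP) rather than
    # pre-allocate; treating that as non-fatal avoids spurious job failures.
    import errno
    import os

    fd = os.open("/scratch/myproject/output.dat",  # placeholder path
                 os.O_CREAT | os.O_WRONLY, 0o644)
    try:
        os.posix_fallocate(fd, 0, 1 << 30)  # try to pre-allocate 1 GiB
    except OSError as e:
        if e.errno != errno.EOPNOTSUPP:
            raise
        # Pre-allocation unsupported or disabled: carry on without it.
        print("fallocate unavailable on this filesystem; continuing")
    finally:
        os.close(fd)
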
Update - Storage Node Failover
* A different storage node has been identified with the same high-load issues
* An HA failover will be performed on the node
* There will be a slight pause in access to /scratch

Jul 09, 2025 - 09:55 AWST
Monitoring - Failover has been completed by the engineers
* High-availability resources have been restored to the original nodes
* System IO load looks nominal
* Logs and dumps are being submitted to the vendor engineers for analysis

Jul 07, 2025 - 14:12 AWST
Identified - The issue has been identified and a fix is being implemented.
Jul 07, 2025 - 11:25 AWST
Investigating - An Object Storage Server (OSS) is showing the same symptoms as a previously detected server that was slow to respond.

The HPE engineers are going to fail over resources to the partner OSS to allow them to reboot the OSS. There will be a brief pause in access to /scratch as clients reconnect.

Please be aware that the /scratch filesystem is becoming increasingly full. We would appreciate the assistance of researchers in removing unused files from the filesystem to ensure it remains accessible to all.
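
As a rough sketch (not an official Pawsey tool; the project directory is a placeholder), the snippet below lists files untouched for more than 30 days so candidates for removal can be reviewed; it prints paths only and deletes nothing:

    # Rough sketch: print files under a placeholder project directory
    # whose modification time is older than 30 days. Review the output
    # yourself; this script never deletes anything.
    import os
    import time

    root = "/scratch/myproject"        # placeholder: your project directory
    cutoff = time.time() - 30 * 86400  # 30 days, in seconds

    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    print(path)
            except OSError:
                pass  # unreadable or vanished mid-walk; skip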

Jul 07, 2025 - 10:50 AWST
Setonix Operational
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Operational
Acacia - Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Acacia - Ingest Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Nimbus - Legacy Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Nimbus APIs Operational

Scheduled Maintenance

Acacia Ingest maintenance Jul 15, 2025 09:00-17:00 AWST

We will be upgrading the Ceph cluster to a new version. Services will remain available for the whole duration, although the system will be considered at risk.
Posted on Jul 08, 2025 - 12:22 AWST

Banksia Spectralogic Annual Tape Library Cleans - Reduced capacity Jul 21, 2025 08:00 - Jul 25, 2025 17:00 AWST

We will be undertaking non-disruptive Banksia tape library maintenance at this time, so available tape copies will be reduced from two to one.
Posted on Jul 04, 2025 - 07:58 AWST
Jul 9, 2025

Unresolved incident: /scratch Object Storage Server (OSS) - Failover / reboot.

Jul 8, 2025

No incidents reported.

Jul 7, 2025
Jul 6, 2025

No incidents reported.

Jul 5, 2025

No incidents reported.

Jul 4, 2025
Resolved - Resources have been rebalanced to their optimal configuration. HPE have collected logs from the filesystem and will provide Pawsey with a root cause analysis in due course.
Jul 4, 08:21 AWST
Monitoring - Failover has been completed
* The resource has been restored to its nominal high-availability pair
* It will be monitored

Jul 3, 21:19 AWST
Investigating - It appears we have a slow Object Storage Server (OSS) serving part of /scratch. The HPE engineers are going to fail over resources to the partner OSS to allow them to reboot the OSS. There will be a brief pause in access to /scratch as clients reconnect.

Please be aware that the /scratch filesystem is becoming increasingly full. We would appreciate the assistance of researchers in removing unused files from the filesystem to ensure it is accessible to all.

Jul 3, 14:28 AWST
Jul 3, 2025
Jul 2, 2025
Resolved - This incident has been resolved.
Jul 2, 11:19 AWST
Monitoring - Automatic QoS changes for accounts that have exceeded allocation are now running again. Staff have identified a few users who may experience 'InvalidQOS' for jobs and will work on a resolution shortly.
Jul 2, 08:19 AWST
Investigating - There was an issue with the overnight quarterly automatic reset of Slurm priorities for projects that had exceeded their allocation in Q2. Staff are working on a resolution; jobs can continue to be submitted and will run as resources become available. Once we have resolved the issue, any jobs which have the incorrect QoS will be automatically updated.
Jul 1, 16:34 AWST
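
For researchers wanting to check whether their own jobs were affected, a hedged sketch below lists each queued job's QoS and pending reason using standard squeue format fields (%i job id, %q QoS, %r reason); jobs showing the 'InvalidQOS' reason will be updated automatically once the issue is resolved:

    # Sketch: list the current user's jobs with their QoS and pending
    # reason (squeue fields: %i=jobid, %q=qos, %r=reason) to spot jobs
    # held with the 'InvalidQOS' reason described above.
    import getpass
    import subprocess

    result = subprocess.run(
        ["squeue", "-u", getpass.getuser(), "-h", "-o", "%i %q %r"],
        capture_output=True, text=True, check=True,
    )
    for line in result.stdout.splitlines():
        if not line.strip():
            continue
        jobid, qos, reason = line.split(maxsplit=2)
        note = "  <-- affected; will be updated" if reason == "InvalidQOS" else ""
        print(f"job {jobid}: qos={qos}, reason={reason}{note}")
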
Jul 1, 2025
Resolved - The issue with Staging is resolved. Staging is continuing.
Jul 1, 12:43 AWST
Monitoring - A fix has been implemented and we are monitoring the results.
Jul 1, 12:40 AWST
Identified - The root cause appears to have been identified by our vendor and we need to take Banksia offline immediately to implement the required fix.
Jul 1, 10:59 AWST
Investigating - Currently there is an issue affecting file staging; only a portion of requests are succeeding. It is recommended not to use Banksia at this time.
Jul 1, 10:58 AWST
Jun 30, 2025

No incidents reported.

Jun 29, 2025

No incidents reported.

Jun 28, 2025

No incidents reported.

Jun 27, 2025

No incidents reported.

Jun 26, 2025

No incidents reported.

Jun 25, 2025
Completed - The scheduled maintenance has been completed.
Jun 25, 17:00 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jun 25, 08:00 AWST
Scheduled - Acacia Projects and Acacia Ingest will continue their move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
Jun 24, 16:35 AWST
Completed - Setonix was handed back to Pawsey at 8 AM this morning. We (Pawsey) have rebooted the entire system and run our usual battery of tests. As usual, a small number of nodes (including visualisation nodes) have been put into reservations so hardware issues can be triaged and resolved.

Setonix has been returned to service.

The Pawsey team will issue separate communication about the current configuration and the use of NVMe devices on Setonix GPU nodes.

Acacia is still undergoing a rolling upgrade of its operating system, which is being tracked on a separate maintenance page (https://status.pawsey.org.au/incidents/39nvt00xyhm4)

The next scheduled maintenance will be 5th August 2025. Setonix will be upgraded to the next extended support release of the Cray Operating System, which is based on SLES 15 SP6.

Thank you to everyone involved in maintenance for their hard work.

If you have any questions, please contact help@pawsey.org.au. Ask nicely.

Jun 25, 13:14 AWST
Update - Apologies about the "AWSET" typo; it is where my brain currently is ....
Jun 24, 16:09 AWST
Update - Banksia has been returned to service (except for Kafka notifications, which will be returned to service later this afternoon). The ScoutAM update, storage controller firmware update, and tape library firmware update have been completed. A storage controller will need to be replaced, but this will be done live.

Acacia (MWA) intrusive testing work is complete.

Acacia (Projects) and Acacia (Ingest) operating system upgrades are 1/3 complete. The service has been available throughout, but the availability is considered at risk. The upgrades will continue tomorrow, and a seperate maintenance page will be opened.

Operating System updates to the SLURM database daemon, patching of visualisation services and patching of core Pawsey services are complete.

ASKAP Ingest has also been returned to service.

HPE continue to work on Setonix. We hope to have an update at 5 PM (AWST). However, we still expect HPE to hand Setonix back to Pawsey sometime tomorrow.

CARTA and Setonix visualisation nodes are dependent on HPE handing Setonix back to Pawsey before they can be put back into production.

Jun 24, 16:06 AWST
Update - Patching of visualisation services is complete. CARTA and Setonix visualisation nodes are waiting on HPE handing Setonix back before they can be returned to service.

Patching of core Pawsey services is complete.

Jun 24, 15:22 AWST
Update - The Setonix management control plane has been upgraded to HPCM 1.13. The system has been handed over to the local HPE team to perform remediation work arising from the extended work during the last maintenance (May). The system will be handed back to the remote team this evening to complete the management control plane upgrade.

Upgrades on other systems have commenced.

Jun 24, 08:47 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jun 23, 16:00 AWST
Update - The Setonix maintenance has been brought forward to 4 PM to allow Pawsey staff to safely shut down services before handing the system over to HPE.
Jun 22, 13:14 AWST
Update - Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
Jun 19, 17:53 AWST
Scheduled - Maintenance will be carried out on Setonix starting at 5 PM, Monday June 23rd 2025. This is to support the HPE engineers performing the HPE Performance Cluster Manager (HPCM) upgrade, who are in a different time zone to Perth.

Maintenance will be carried out on all other Pawsey systems on Tuesday 24th June to apply required patches and updates that improve system stability, security, and performance. This maintenance window will also be used to undertake other tasks that require downtime.

Planned work for this window includes:
• Update of NVMe SLURM gres configuration on Setonix GPU nodes
• Banksia ScoutAM update
• Banksia storage controller firmware update
• Banksia tape library firmware update
• Acacia (MWA) will be rebooted to test network changes "stick"
• Acacia (Projects) and Acacia (Ingest) will move to newer operating systems via a rolling upgrade. There is no planned service outage, but the availability will be considered at risk.
• Operating System updates to the SLURM database daemon
• Patching visualisation services
• Patching of core Pawsey services

We expect to be able to bring all services (except Setonix) back by the end of the day. Setonix is scheduled to be handed back to Pawsey early Wednesday morning (June 25th 2025).

The next scheduled maintenance will be 5th August 2025. Setonix will be upgraded to the next extended support release of the Cray Operating System, which is based on SLES 15 SP6.

If you have any questions, please contact help@pawsey.org.au.

Jun 17, 14:18 AWST