Pawsey Supercomputing Research Centre

Partially Degraded Service

Setonix Degraded Performance
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Degraded Performance
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Operational
Acacia Ingest Operational
Acacia MWA Operational
Acacia Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Nimbus - Legacy Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Nimbus APIs Operational

Scheduled Maintenance

Pawsey HV Shutdown Feb 9, 2026 11:00 - Feb 12, 2026 12:00 AWST

Mandatory annual testing of the site main electrical ACB switches and high-voltage equipment will be performed on 10th February 2026.

Pawsey will shut down all services housed in the Pawsey Centre starting at 11am on Monday 9th February 2026 in preparation for the mandatory testing.

Staff will commence returning services and performing routine maintenance activities on Wednesday, 11th February 2026.

Systems will return to service progressively throughout Wednesday 11th February, and all services are expected to be returned to researchers by noon on Thursday 12th February 2026.

Planned work for this window includes:
• Setonix will have its /software filesystem replaced with an NFS-based filesystem. A final synchronisation of the Lustre filesystem and NFS filesystem will be performed.
• The new visualisation software stack on the Setonix Remote Visualisation nodes will be made default.
• Banksia will have a ScoutAM upgrade.
• The tape libraries supporting Banksia will have a firmware update.
• Acacia Projects will conclude migration work off Puppet infrastructure.
• Patching of core Pawsey services will be undertaken.

The removal of the 2023.08, 2025.03, and 2024.05 software stacks, previously scheduled for the February maintenance, has been rescheduled. The stacks will be removed in that order across the March, April, and May maintenances to allow researchers more time to migrate to the 2025.08 software stack.

Posted on Feb 03, 2026 - 16:59 AWST
Feb 4, 2026

No incidents reported today.

Feb 3, 2026

No incidents reported.

Feb 2, 2026
Resolved - The secondary tape library maintenance was successful on Friday and we've been monitoring it over the weekend.
Feb 2, 09:33 AWST
Identified - Banksia – one of two tape copies unavailable (at risk)
The Banksia service is currently in a degraded, “at risk” state as it is operating with only one tape library instead of the standard two. As a result, the secondary copy of files will be unavailable for staging or archiving until Library 2 is restored to service. The primary copy is still available for all data, so this should not impact the service. If you experience any issues accessing data, please let us know at help@pawsey.org.au.

The issue has been traced to a faulty tape drive in DBA 5. To resolve this, the field engineer will remove the drives from the DBA, replace the faulty unit, and reseat all components. Since DBA 5 is currently blocked, DBA 6 will need to be removed first to allow access for the repair work. The engineer indicates that in the worst case this work will take until Tuesday, but they are hoping for resolution today.

Jan 30, 09:36 AWST
Feb 1, 2026

No incidents reported.

Jan 31, 2026

No incidents reported.

Jan 30, 2026
Jan 29, 2026

No incidents reported.

Jan 28, 2026

No incidents reported.

Jan 27, 2026
Resolved - HPE believe the issue has been resolved and have closed the support case.

They believe the issue was:
"Global Flow Control was disabled on the E1000's. Once enabled performance was regained. ClusterStor team working on a fix (CSPROD-18819) to make Global Flow Control enabled all the time moving forward."

Jan 27, 10:06 AWST
Update - HPE made a change to the Global Flow Control on scratch and software during January's maintenance, as well as a modification to the configuration of the LAG ports in the Slingshot fabric.

We haven't seen any C_EC_CRIT errors on the login nodes since maintenance, and are continuing to monitor them like hawks.

Jan 19, 08:21 AWST
Update - setonix-04 reported a C_EC_CRIT error yesterday. It is not in the login pool, but HPE are stumped as to why this is happening.
Nov 7, 13:40 AWST
Monitoring - HPE rebooted a number of Slingshot switches during maintenance.

We haven't observed any Slingshot errors on the login, data mover or visualisation nodes for 48 hours.

We will continue to monitor.

Nov 6, 12:29 AWST
Update - HPE have provided no new information.
Oct 31, 08:11 AWST
Update - HPE have provided no new information.
Oct 27, 10:59 AWST
Update - HPE have provided no new information.
Oct 24, 21:01 AWST
Update - HPE have provided no new information.

setonix-08 has Slingshot issues. Pawsey is rebooting it.

Oct 20, 13:25 AWST
Update - setonix-02 and setonix-03 have been added back to the RR DNS.
Oct 16, 14:09 AWST
Investigating - There appears to be an issue with the Slingshot interfaces in the login nodes in Setonix. We appear to be down to 1 login node in the normal pool of login nodes.

We have had a case open with HPE for weeks, but they appear to be no closer to providing any kind of solution.

Please, please, please, please don't run any computationally intensive operations on the login nodes. We have lovely compute nodes for that.

Please be aware that you can log into setonix-workflow.pawsey.org.au and get access to additional "workflow" nodes.

Oct 16, 12:02 AWST
Jan 26, 2026

No incidents reported.

Jan 25, 2026

No incidents reported.

Jan 24, 2026

No incidents reported.

Jan 23, 2026

No incidents reported.

Jan 22, 2026

No incidents reported.

Jan 21, 2026

No incidents reported.