Pawsey Supercomputing Research Centre
Update - Maintenance is in progress on the tape library holding the first copy. However, some tapes in the second-copy online tape library are presenting issues. Normally this would trigger retrieval of the affected files from the first copy, but because the first-copy library is offline until next week, a number of stages will be delayed.
May 14, 2026 - 12:06 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
May 11, 2026 - 07:05 AWST
Scheduled - We will be undertaking non-disruptive Banksia tape library maintenance at this time, so available tape copies will be reduced from two to one.
May 11, 2026 07:05 - May 15, 2026 18:05 AWST
Setonix Operational
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Under Maintenance
Acacia Ingest Operational
Acacia MWA Operational
Acacia Projects Operational
Banksia Under Maintenance
Data Portal Systems Under Maintenance
MWA ASVO Under Maintenance
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational

Scheduled Maintenance

Upstream network provider maintenance May 25, 2026 22:00 - May 26, 2026 05:00 AWST

Our upstream network provider is performing work on the links that provide Pawsey's primary Internet connection. There may be one or more brief interruptions to access to Pawsey during this time as we switch to alternative paths. Internal connectivity between Pawsey resources (such as Setonix to Acacia) will not be impacted. Jobs on our HPC systems will continue to be scheduled and launched but may have degraded access to off-site licence servers during this window.
Posted on May 13, 2026 - 12:52 AWST
May 14, 2026

Unresolved incident: Banksia Spectralogic Annual Tape Library Cleans - Reduced capacity.

May 13, 2026

No incidents reported.

May 12, 2026

No incidents reported.

May 11, 2026
May 10, 2026

No incidents reported.

May 9, 2026

No incidents reported.

May 8, 2026

No incidents reported.

May 7, 2026

No incidents reported.

May 6, 2026

No incidents reported.

May 5, 2026
Completed - Maintenance is complete.

All services have been returned to service.

Three cheers to the superstars who toil behind the scenes.

Please note for Setonix:
* The 2025.03 software stack on Setonix has been removed.
* The gpu-dev partition has been reduced from 10 to 8 nodes.
* The gpu-highmem partition maximum walltime has been increased to 2 days.
* The "linaro-forge" module is now available, providing the map profiler and ddt debugger, and set to replace the "arm-forge" module in July.

If you have any questions, problems or just want to say hi, please contact the Helpdesk (help@pawsey.org.au).

May 5, 16:46 AWST
Update - Startup completed—Banksia is back in operation.
May 5, 15:51 AWST
Verifying - Core services have been returned to service.

Maintenance work completed on all the vis services.

Xenon are currently updating the storage controllers, and their ETA for return to Pawsey is 5:30 PM.

The Slurm database pruning has been cancelled. Once the backup has been restored, Setonix will be verified as fit for service. ETA for return to service is 6 PM.

Apologies for the delays.

May 5, 15:38 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
May 5, 08:00 AWST
Scheduled - Maintenance will be carried out on Pawsey systems on Tuesday 5 May to apply required patches and updates that improve the systems' stability, security, and performance. This maintenance window will also be used to undertake other tasks that require downtime.

Planned work for this window includes:
• Pawsey's core network switch will have a firmware update applied. *This will interrupt all connectivity into Pawsey*.
• Update the firmware of Aruba management switches in Setonix.
• Setonix will have the latest bug and security fixes applied from openSUSE Leap 15.6.
• The removal of the 2025.03 software stack on Setonix.
• The gpu-dev partition will be reduced from 10 to 8 nodes.
• The gpu-highmem partition will have maximum walltime increased to 2 days.
• Acacia and Nectar deployment system will be reconfigured with new machines.
• Banksia will have the latest bug and security fixes from Rocky Linux.
• Banksia will have a ScoutAM upgrade.
• Banksia's storage array will have a firmware update.
• Mediaflux server will have the latest bug and security fixes from Rocky Linux.
• Mediaflux will be upgraded.
• Patching of visualisation services will be undertaken.
• Patching of core Pawsey services will be undertaken.

If you have any questions, please contact help@pawsey.org.au.

Apr 28, 09:51 AWST
May 4, 2026

No incidents reported.

May 3, 2026

No incidents reported.

May 2, 2026

No incidents reported.

May 1, 2026
Resolved - Askapbuffer
* There have been no signs of any volumes being marked inactive or uncontactable after the filesystem check was performed on all volumes served by Storage Array 05
* A debrief was given in the RTG group
* The vendor has confirmed from the support bundle that the replacement controller is working normally

May 1, 11:09 AWST
Monitoring - OST Filesystem Scan
* The filesystem check was completed on the remaining Storage Array 05 LUNs/OSTs
* Only OST0024 presented issues requiring correction

Storage Controller replacement part
* The part arrived during the remediation
* Storage Array 05 Controller A has been replaced
* The storage unit looks correct; a vendor storage bundle was collected for submission to confirm system health

Askap Ingest Cluster
* As before, nodes with the problematic (i.e. inactive) volumes mounted were locked and would not reconnect
* The cluster was rebooted to get a clean slate and re-attach the missing OSTs

Setonix
* The Setonix data-mover nodes reconnected to /askapbuffer once it was fully online
* CASDA nodes reconnected to /askapbuffer once it was fully online

Apr 29, 17:12 AWST
Identified - Storage Volume becoming inactive / locked
* It has been identified that the other volumes attached to askapbuffer oss03 from Array 05 are developing issues similar to those of the volumes already checked
* "Askapbuffer" will be going down at 3pm AWST
* A filesystem check will then be performed on the other Array 05 OST volumes that sit on askapbuffer oss03, i.e. OST00[21,23,24,25,27]
* Systems with these volumes mounted will freeze until the filesystem is released
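
For readers unfamiliar with the bracketed shorthand used in these updates, the following is a minimal sketch of how a name such as OST00[21,23,24,25,27] or ost00[20|22|26] can be read as a list of individual volume names. It assumes the bracket group simply lists suffixes to append to the prefix; the function name expand_ost_shorthand is hypothetical and is not part of any Pawsey or Lustre tooling.

```python
import re


def expand_ost_shorthand(shorthand: str) -> list[str]:
    """Expand e.g. 'OST00[21,23,24,25,27]' into individual volume names.

    Assumption: the bracket group is a list of suffixes separated by
    ',' or '|', each appended verbatim to the text before the bracket.
    """
    match = re.fullmatch(r"(\w+)\[([^\]]+)\]", shorthand)
    if match is None:
        # No bracket group: treat the input as a single, complete name.
        return [shorthand]
    prefix, body = match.groups()
    suffixes = re.split(r"[,|]", body)
    return [prefix + suffix.strip() for suffix in suffixes]


if __name__ == "__main__":
    for name in expand_ost_shorthand("OST00[21,23,24,25,27]"):
        print(name)
```

Suffixes are kept as strings rather than converted to numbers, so the names are reproduced exactly as written in the update.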

Apr 29, 14:04 AWST
Update - Filesystem "/askapbuffer" (5:15pm)
* After e2fsck on ost00[20|22|26] the volumes are now mountable / writable
* Some nodes in the askapingest cluster were stuck and would not reconnect to the volumes; this has been addressed. Example error:
* "lfs check: error: check 'askapfs1-OST0022-osc-ffff9c33b8dd8800': Cannot send after transport endpoint shutdown (108)"
* Askapingest cluster nodes were rebooted to get a clean state to enable remounting the filesystem

Apr 28, 17:16 AWST
Update - Storage Volumes OST00[22|26] have become either read-only or uncontactable
* A filesystem check is required for OST00[22|26]
* The storage volume pair will be taken offline to check these volumes
* Systems with these volumes mounted will freeze during this check until the filesystem is restored

After primary checks of OST00[22|26]
* e2fsck has been run

OST0020 has similar issues
* OST0020 will be addressed at 4:05pm

Apr 28, 15:09 AWST
Update - Pre-emptive replacement on Controller A for Storage Array 05 is pending
* The vendor has indicated that the replacement part is on backorder and is delayed

Apr 28, 10:12 AWST
Update - We are continuing to monitor for any further issues.
Apr 23, 11:11 AWST
Update - Support logs have been reviewed by the vendor
* Recommendation: pre-emptively replace "Storage Controller A"
* The part will be shipped, and "Storage Controller A" will be replaced in Storage Array 05 in the "/askapbuffer" system

Apr 23, 10:45 AWST
Update - System has been restored
* We are just waiting for a vendor review of the support bundle logs before we close this incident

Apr 21, 10:17 AWST
Monitoring - Storage Controller A has been restored for array05
* We are monitoring Storage Controller A
* Storage LUNs have regained high-availability access

Apr 20, 10:10 AWST
Identified - We have identified an issue with the "Askap Buffer" Lustre filesystem:
* Filesystem is functional / usable but in a degraded state
* Storage Array 05 no longer has high availability as "Storage Controller A" is non-functional
* There will be an attempt to remediate "Storage Controller A"

Apr 20, 09:46 AWST
Apr 30, 2026
Resolved - Mitigation in place.
Apr 30, 11:03 AWST
Identified - They have a critical security vulnerability which can only be fixed by rebooting them.
Apr 30, 09:43 AWST