Pawsey Supercomputing Research Centre
Update - Filesystem "/askapbuffer" (5:15pm)
* After running e2fsck on OST00[20|22|26], the volumes are now mountable and writable
* Some nodes in the askapingest cluster were stuck and would not reconnect to the volumes; this has been addressed. Example error:
* "lfs check: error: check 'askapfs1-OST0022-osc-ffff9c33b8dd8800': Cannot send after transport endpoint shutdown (108)"
* Askapingest cluster nodes were rebooted to get a clean state to enable remounting the filesystem

Apr 28, 2026 - 17:16 AWST
Update - Storage Volumes OST00[22|26] have either become read-only or unreachable
* A filesystem check is required for OST00[22|26]
* The storage volume pair will be taken offline to check these volumes
* Systems with these volumes mounted will freeze during this check until the filesystem is restored

After primary checks of OST00[22|26]
* e2fsck has been run on these volumes

OST0020 has similar issues
* OST0020 to be addressed at 4:05pm

Apr 28, 2026 - 15:09 AWST
Update - Pre-emptive replacement of Controller A in Storage Array 05 is pending
* The vendor has indicated that the replacement part is on backorder and delayed

Apr 28, 2026 - 10:12 AWST
Update - We are continuing to monitor for any further issues.

Apr 23, 2026 - 11:11 AWST
Update - Support logs have been reviewed by the vendor
* Recommendation: pre-emptively replace "Storage Controller A"
* The part will be shipped, and "Storage Controller A" will be replaced in Storage Array 05 in the "/askapbuffer" system

Apr 23, 2026 - 10:45 AWST
Update - System has been restored
* We are just waiting for a vendor review of the support bundle logs before we close this incident

Apr 21, 2026 - 10:17 AWST
Monitoring - Storage Controller A has been restored for Storage Array 05
* We are monitoring Storage Controller A
* Storage LUNs have had high-availability access restored

Apr 20, 2026 - 10:10 AWST
Identified - We have identified an issue with the "Askap Buffer" Lustre filesystem:
* The filesystem is functional and usable but in a degraded state
* Storage Array 05 no longer has high availability as "Storage Controller A" is non-functional
* There will be an attempt to remediate "Storage Controller A"

Apr 20, 2026 - 09:46 AWST
Setonix Degraded Performance
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Degraded Performance
Setonix gpu debug partition Operational
Lustre filesystems Degraded Performance
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Degraded Performance
/askapingest filesystem Operational
Storage Systems Operational
Acacia Ingest Operational
Acacia MWA Operational
Acacia Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Status legend: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance

Scheduled Maintenance

Pawsey Scheduled Maintenance (May) May 5, 2026 08:00-17:00 AWST

Maintenance will be carried out on Pawsey systems on Tuesday 5 May to apply required patches and updates that improve system stability, security, and performance. This maintenance window will also be used for other tasks that require downtime.

Planned work for this window includes:
• Pawsey's core network switch will have a firmware update applied. *This will interrupt all connectivity into Pawsey*.
• Update the firmware of Aruba management switches in Setonix.
• Setonix will have the latest bug and security fixes applied from openSUSE Leap 15.6.
• The removal of the 2025.03 software stack on Setonix.
• The gpu-dev partition will be reduced from 10 to 8 nodes.
• The gpu-highmem partition will have maximum walltime increased to 2 days.
• The Acacia and Nectar deployment system will be reconfigured with new machines.
• Banksia will have the latest bug and security fixes from Rocky Linux.
• Banksia will have a ScoutAM upgrade.
• Banksia's storage array will have a firmware update.
• Mediaflux server will have the latest bug and security fixes from Rocky Linux.
• Mediaflux will be upgraded.
• Patching of visualisation services will be undertaken.
• Patching of core Pawsey services will be undertaken.

If you have any questions, please contact help@pawsey.org.au.

Posted on Apr 28, 2026 - 09:51 AWST
Apr 28, 2026

Unresolved incident: Askapbuffer - Degraded - Storage Array 05 Controller A is non-functional.

Apr 27, 2026

No incidents reported.

Apr 26, 2026

No incidents reported.

Apr 25, 2026

No incidents reported.

Apr 24, 2026

No incidents reported.

Apr 23, 2026
Apr 22, 2026

No incidents reported.

Apr 21, 2026
Apr 20, 2026
Apr 19, 2026

No incidents reported.

Apr 18, 2026

No incidents reported.

Apr 17, 2026

No incidents reported.

Apr 16, 2026

No incidents reported.

Apr 15, 2026

No incidents reported.

Apr 14, 2026
Resolved - This incident has been resolved.
Apr 14, 09:57 AWST
Update - We are continuing to investigate this issue.
Apr 13, 14:48 AWST
Investigating - We are currently investigating, with our tape service company, an issue affecting Banksia Tape Library One. It has been escalated as P1.
Apr 13, 14:27 AWST