Pawsey Supercomputing Research Centre

All Systems Operational

Setonix Operational
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Operational
Acacia - Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Acacia - Ingest Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Nimbus Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Nimbus APIs Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
May 18, 2025

No incidents reported today.

May 17, 2025

No incidents reported.

May 16, 2025
Resolved - This incident has been resolved.
May 16, 16:03 AWST
Monitoring - Flash pool usage has returned to less concerning levels. We are monitoring usage and implementing additional systems to migrate data that has not been used in a while to disk, freeing up the highest-speed storage. Pawsey remains impressed with our colleagues' ability to generate data on Setonix faster than we can deal with it, and acknowledges that under normal circumstances we'd be celebrating it rather than panicking. Thank you for your patience.
May 16, 08:50 AWST
Investigating - There are performance and usability issues affecting the /scratch filesystem:
* Flash storage pools are approaching maximum capacity.
* This affects the overall performance and usability of /scratch.
* Data is being generated faster than it can be migrated to non-flash pools (our mitigation effort).

If you have unnecessary files on /scratch, please remove them ASAP.

May 15, 15:23 AWST
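
For anyone unsure where to start with the cleanup request above, here is a minimal sketch (not an official Pawsey tool) that lists files which have not been read or modified for a while, largest first. The function name `find_stale_files` and the 30-day default are assumptions made for illustration. The script only reports files and never deletes anything. On very large Lustre directory trees, Lustre's own `lfs find` command may be a better fit than a Python walk.

```python
import os
import stat
import time


def find_stale_files(root, min_age_days=30, now=None):
    """Return (size_bytes, path) pairs for regular files under `root`
    that were neither accessed nor modified in the last `min_age_days`
    days, sorted largest first. Nothing is deleted."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other non-files
            # Treat the more recent of access and modification as "last used".
            last_used = max(info.st_atime, info.st_mtime)
            if last_used < cutoff:
                stale.append((info.st_size, path))
    stale.sort(reverse=True)
    return stale
```

Calling `find_stale_files` on your own directory under /scratch returns candidates to review by hand before removing anything. Access times depend on how the filesystem is mounted, so treat the result as a hint rather than proof that a file is unused.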
May 15, 2025
May 14, 2025

No incidents reported.

May 13, 2025

No incidents reported.

May 12, 2025

No incidents reported.

May 11, 2025

No incidents reported.

May 10, 2025

No incidents reported.

May 9, 2025

No incidents reported.

May 8, 2025

No incidents reported.

May 7, 2025
Completed - All Pawsey services have been returned to service. Please note that the next maintenance is currently scheduled for the *last* Tuesday in June due to HPE resourcing issues. We will provide further updates in the near future.

Work completed:
• HPE have installed RAID cards in the SU Leader nodes in Setonix.
• The NetApp providing the /home filesystem has had an OnTap upgrade.
• Banksia has had new batteries installed in the storage controllers.
• Temporary bridge interfaces on Acacia MWA hosts have been removed.
• ASKAP Buffer has had disk firmware updates applied.
• Operating System updates to the SLURM database daemon and SLURM controller for ASKAP Ingest have been applied.
• Visualisation services have been patched.
• Core Pawsey services have been patched.

Thank you to all Pawsey and HPE staff involved.

As always, be kind, and e-mail our friendly help desk staff (help@pawsey.org.au) if you encounter any issues.

May 7, 10:22 AWST
Verifying - HPE handed Setonix to Pawsey at 4 AM this morning.

Pawsey have booted Setonix into a testing state and are currently running ReFrame.

May 7, 08:46 AWST
Update - Apparently HPE have not "resolved the issue" and refuse to tell Pawsey what the issue is.
May 6, 21:31 AWST
Update - HPE have "resolved the issue"; however, they will require at least 4 hours before they can return Setonix to Pawsey.

At this stage, Pawsey estimate that they won't be able to return Setonix to researchers until tomorrow.

May 6, 18:00 AWST
Update - We (Pawsey) are still waiting for HPE to hand Setonix back to Pawsey. We have no ETA. We wait.
May 6, 16:39 AWST
Update - Core services have been patched.

ASKAP Ingest has been handed back to ASKAP.

HPE are still struggling with the SU Leader nodes. We have no ETA on when Setonix will be handed back to Pawsey.

May 6, 14:07 AWST
Update - Banksia is up and staging has resumed.
May 6, 10:16 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
May 6, 08:00 AWST
Scheduled - Maintenance will be carried out on Pawsey systems on Tuesday the 6th of May to apply required patches and updates that improve the systems' stability, security, and performance. This maintenance window will also be used to undertake other tasks that require downtime.

Planned work for this window includes:
• HPE will be installing RAID cards in the SU Leader nodes in Setonix. This will allow HPE to provide operations staff with advanced monitoring as part of their HPE Cray EX platform.
• The NetApp providing the /home filesystem will have an OnTap upgrade.
• Banksia will have new batteries installed in the storage controllers.
• Operating system upgrades for Acacia.
• Remove temporary additional bridge interfaces on Acacia MWA hosts.
• ASKAP Buffer will have disk firmware updates applied.
• Operating System updates to the SLURM database daemon and SLURM controller for ASKAP Ingest.
• Patching of visualisation services will be undertaken.
• Patching of core Pawsey services will be undertaken.

We expect to be able to bring all services back by the end of the day (in the case of Setonix, sometime in the evening). If you have any questions, please contact help@pawsey.org.au.

Apr 29, 12:46 AWST
May 6, 2025
May 5, 2025

No incidents reported.

May 4, 2025

No incidents reported.