Pawsey Supercomputing Research Centre
Update - Currently, Data Portal and pshell users may experience unusually long wait times when downloading files from Banksia. These downloads sometimes end in errors (e.g. a timeout or "content not available"); however, the Banksia scheduler will still eventually bring the files back online, and you will be able to download them if you try again a day or so after the initial attempt. We are investigating this with the vendor as a potential scheduler issue.
Jan 23, 2025 - 14:47 AWST
Update - We have resolved most of the instability between Mediaflux and the tape storage system. Some minor issues remain which we are continuing to monitor and work on.
Dec 18, 2024 - 13:39 AWST
Update - Work continues with two vendors to resolve this. Today we will be applying and testing some recommended steps to see whether this can be completely resolved.
Dec 13, 2024 - 07:57 AWST
Update - A setting was adjusted on the Mediaflux server, which has improved the success rate for retrieving files; however, some issues remain, which we are still investigating with the vendor.
Dec 12, 2024 - 08:02 AWST
Investigating - We are aware of an issue that may be affecting some users attempting file transfers with Mediaflux (Data Portal and pshell). It is currently being investigated.
Dec 11, 2024 - 11:44 AWST
Setonix Operational
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem (new) Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Operational
Acacia - Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Acacia - Ingest Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Nimbus Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Nimbus APIs Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Past Incidents
Feb 9, 2025

No incidents reported today.

Feb 8, 2025

No incidents reported.

Feb 7, 2025

No incidents reported.

Feb 6, 2025
Completed - All primary systems have been returned to service. Pawsey staff are still working on ancillary services, which we hope to have operational soon.

Setonix was returned to service last night. It has been updated to the latest available "Extended Support" version of the Cray Operating System, which provides bug fixes and security patches.

The 2022.11 Pawsey software stack has been removed from /software as per normal operating procedures.

The CARTA service provided by the Setonix system is one of the ancillary services being worked on.

The purge policy on /scratch will be reinstated at 5 PM on Friday, 7th February 2025. Any file that has not been accessed in the past 21 days will be removed.

Thank you for your patience. If you require any assistance, please reach out to us via help@pawsey.org.au.

And a big shout-out to the CSIRO building staff. You guys are rock stars.

Feb 6, 09:00 AWST
Update - Acacia (Projects and Ingest), Banksia, Nimbus and ASKAP filesystems (ingest and buffer) have all been returned to service.

The ASKAP ingest cluster is being powered on.

HPE are powering up Setonix after performing mechanical maintenance on the system. We do not yet have an ETA for when the system will be handed back to Pawsey.

Feb 5, 12:11 AWST
Update - Networking and core services have been restored. The Pawsey Team are currently starting up additional services.
Feb 5, 09:17 AWST
Update - Pawsey staff have been allowed back into the building. Networking and core services are being brought online.
Feb 4, 16:20 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Feb 3, 12:00 AWST
Update - Mandatory annual testing of the High Voltage equipment at the Pawsey Centre will be performed next week (4th February 2025).

Pawsey will begin shutting down all services housed in the Pawsey Centre at 12 PM on 3rd February 2025. This is to preserve the integrity of the data in our storage system and reduce the risk of damage to equipment.

By 5 PM Monday (3rd February 2025) all Pawsey Supercomputing Centre services will be unavailable.

We will begin returning services from Wednesday, 5th February 2025, as soon as we receive the all-clear from CBIS.

Jan 28, 15:34 AWST
Scheduled - Mandatory annual testing of the Site Main Electrical ACB Switches and High Voltage equipment will be performed on 4th February 2025.

Pawsey will shut down all services housed in the Pawsey Centre starting at 12 PM on 3rd February 2025. This is to preserve the integrity of the data in our storage system and reduce the risk of damage to equipment.

During this time all Pawsey Supercomputing Centre services will be unavailable.

We will begin returning services from Wednesday, 5th February 2025, as soon as we receive the all-clear from CBIS.

Jan 6, 10:34 AWST
Feb 5, 2025
Feb 4, 2025
Feb 3, 2025
Feb 2, 2025

No incidents reported.

Feb 1, 2025

No incidents reported.

Jan 31, 2025

No incidents reported.

Jan 30, 2025

No incidents reported.

Jan 29, 2025
Completed - The busbar has been replaced and all nodes have been tested. A number of nodes are currently held in reservations for HPE to investigate.
Jan 29, 13:34 AWST
Scheduled - HPE are performing maintenance on Cabinet X1007 in Setonix to replace a busbar. This requires all power to the cabinet to be isolated.

This means the nodes in the "gpu high mem partition" will be unavailable.

Jan 28, 09:19 AWST
Jan 28, 2025

No incidents reported.

Jan 27, 2025

No incidents reported.

Jan 26, 2025

No incidents reported.