Pawsey Supercomputing Research Centre
Update - Data Portal and pshell users may currently experience unusually long wait times when downloading files from Banksia. Downloads will sometimes end in errors (e.g. timeout or content not available); however, the Banksia scheduler will still eventually bring the files back online, and you will be able to download them by trying again a day or so after the initial attempt. We are investigating this with the vendor as a potential scheduler issue.
Jan 23, 2025 - 14:47 AWST
Update - We have resolved most of the instability between Mediaflux and the tape storage system. Some minor issues remain, which we are continuing to monitor and work on.
Dec 18, 2024 - 13:39 AWST
Update - Work continues with two vendors to resolve this. Today we will apply and test some recommended steps to see whether the issue can be completely resolved.
Dec 13, 2024 - 07:57 AWST
Update - A setting was adjusted on the Mediaflux server, which has improved the success rate for obtaining files. However, some issues remain, which we are still investigating with the vendor.
Dec 12, 2024 - 08:02 AWST
Investigating - We are aware of an issue that may be affecting some users attempting file transfers with Mediaflux (Data Portal and pshell). It is currently being investigated.
Dec 11, 2024 - 11:44 AWST
Setonix Operational
Login nodes Operational
Data-mover nodes Operational
Slurm scheduler Operational
Setonix work partition Operational
Setonix debug partition Operational
Setonix long partition Operational
Setonix copy partition Operational
Setonix askaprt partition Operational
Setonix highmem partition Operational
Setonix gpu partition Operational
Setonix gpu high mem partition Operational
Setonix gpu debug partition Operational
Lustre filesystems Operational
/scratch filesystem Operational
/software filesystem Operational
/askapbuffer filesystem Operational
/askapingest filesystem Operational
Storage Systems Operational
Acacia - Projects Operational
Banksia Operational
Data Portal Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Acacia - Ingest Operational
MWA ASVO Operational
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Nimbus Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Nimbus APIs Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Documentation Operational
Visualisation Services Operational
Remote Vis Operational
Vis scheduler Operational
Setonix vis nodes Operational
Nebula vis nodes Operational
Visualisation Lab Operational
Reservation Operational
CARTA - Stable Operational
CARTA - Test Operational
Pawsey Remote VR Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Apr 25, 2025

No incidents reported today.

Apr 24, 2025
Resolved - The database communication issue with RemoteVis has been resolved and the service is operational again.
Apr 24, 09:19 AWST
Identified - RemoteVis has encountered a database communication error. It affects all RemoteVis users, including Nebula and Setonix vis users. The RemoteVis web portal login page is not loading and users cannot establish a connection to any vis node.
Pawsey staff are working to restore the service as soon as possible.

Apr 24, 08:58 AWST
Apr 23, 2025

No incidents reported.

Apr 22, 2025

No incidents reported.

Apr 21, 2025

No incidents reported.

Apr 20, 2025

No incidents reported.

Apr 19, 2025

No incidents reported.

Apr 18, 2025

No incidents reported.

Apr 17, 2025
Resolved - All OSTs in scratch are operational and we will continue to monitor the filesystem over the weekend.

The issue may have been caused by the flash pool becoming critically full.

Please remember that the scratch filesystem is a shared resource for the temporary storage of results of simulations and data processing.

Please be mindful of other researchers, and do not store unnecessary data.

Apr 17, 15:40 AWST
Monitoring - The problematic OSTs have been checked and /scratch is fully operational.
Apr 17, 14:04 AWST
Update - HPE has reported that two of the OSTs have "journal errors" and is performing a check.
Apr 17, 11:14 AWST
Update - Two OSSes have failed.
Apr 17, 11:00 AWST
Investigating - Performance on the "/scratch" filesystem is degraded.
* This is related to the SSD storage OST pools being close to full capacity.

Apr 17, 10:35 AWST
Apr 16, 2025

No incidents reported.

Apr 15, 2025

No incidents reported.

Apr 14, 2025

No incidents reported.

Apr 13, 2025

No incidents reported.

Apr 12, 2025

No incidents reported.

Apr 11, 2025
Completed - This maintenance completed without issue. The system is available for use again.
Apr 11, 11:42 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Apr 11, 11:00 AWST
Update - We will be undergoing scheduled maintenance during this time.
Apr 11, 08:06 AWST
Scheduled - We need to take Banksia offline for an hour today to downgrade the Versity software version due to a new bug. This will affect the scheduler, the archive, and the S3 gateways.
Apr 11, 07:50 AWST