Pawsey Supercomputing Centre
Monitoring - The replacement switch has been fitted and nodes are being brought back online.
Dec 8, 10:58 AWST
Identified - Due to a hardware failure in an InfiniBand switch, the 'nvlinkq' partition on Topaz and the 'copyq' partition on Garrawarla are currently unavailable. We are awaiting a replacement switch from the vendor, but do not have an ETA at this time.
Nov 18, 15:24 AWST
Magnus Operational
Magnus Compute nodes Operational
Magnus login nodes Operational
Slurm Controller (Magnus) Operational
Galaxy Operational
Galaxy Compute nodes Operational
Galaxy login nodes Operational
Slurm Controller (Galaxy) Operational
Topaz Operational
GPU partition Operational
Topaz login nodes Operational
Slurm Controller (Topaz) Operational
Zeus Operational
Zeus Compute nodes Operational
Zeus login node Operational
Data Mover nodes (CopyQ) Operational
Slurm Controller (Zeus) Operational
Lustre filesystems Operational (100.0% uptime over the past 90 days)
/scratch filesystem Operational (100.0% uptime over the past 90 days)
/group filesystem Operational (100.0% uptime over the past 90 days)
/astro filesystem Operational (100.0% uptime over the past 90 days)
/askapbuffer filesystem Operational (100.0% uptime over the past 90 days)
/askapingest filesystem Operational (100.0% uptime over the past 90 days)
ASKAP Operational
ASKAP ingest nodes Operational
ASKAP service nodes Operational
Garrawarla Operational
Garrawarla compute nodes Operational
Garrawarla login node Operational
Slurm Controller (Garrawarla) Operational
Nimbus Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Storage Systems Operational
Data Portal Systems Operational
Hierarchical Storage Management Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
Application Portal Operational
Origin Operational
/home filesystem Operational
/pawsey filesystem Operational
Central Slurm Database Operational
Visualisation Services Operational
Remote Vis Operational
The Australian Biocommons Operational
Fgenesh++ Operational
Past Incidents
Dec 8, 2021
Resolved - This incident has been resolved.
Dec 8, 10:58 AWST
Monitoring - The array rebuilt successfully yesterday onto a hot spare, and a replacement disk has been despatched from the vendor.
Dec 7, 05:16 AWST
Investigating - Over the weekend, one of the HDDs in /askapbuffer failed, leading to degraded performance and blocked I/O to one of the OSTs.
Dec 6, 06:01 AWST
Dec 7, 2021
Dec 6, 2021
Dec 5, 2021

No incidents reported.

Dec 4, 2021

No incidents reported.

Dec 3, 2021

No incidents reported.

Dec 2, 2021

No incidents reported.

Dec 1, 2021

No incidents reported.

Nov 30, 2021

No incidents reported.

Nov 29, 2021

No incidents reported.

Nov 28, 2021

No incidents reported.

Nov 27, 2021

No incidents reported.

Nov 26, 2021

No incidents reported.

Nov 25, 2021

No incidents reported.

Nov 24, 2021

No incidents reported.