Pawsey Supercomputing Centre
All Systems Operational
Magnus Operational
Magnus Compute nodes Operational
Magnus login nodes Operational
Slurm (Magnus) Operational
Galaxy Operational
Galaxy Compute nodes Operational
Galaxy login nodes Operational
Slurm (Galaxy) Operational
Topaz Operational
Slurm Controller (topaz) Operational
GPU partition Operational
Zeus Operational
Zeus login node Operational
Zeus Compute nodes Operational
Galaxy ingest nodes Operational
Data Mover nodes (CopyQ) Operational
Slurm (Zeus) Operational
Central Slurm Database Operational
Lustre filesystems Operational
98.87% uptime over the past 90 days
/scratch filesystem Operational
98.86% uptime over the past 90 days
/group filesystem Operational
98.86% uptime over the past 90 days
/astro filesystem Operational
98.87% uptime over the past 90 days
/askapbuffer filesystem Operational
98.87% uptime over the past 90 days
Nimbus Operational
Ceph storage Operational
Nimbus instances Operational
Nimbus dashboard Operational
Storage Systems Operational
Data Portal Systems Operational
Hierarchical Storage Management Systems Operational
MWA Nodes Operational
CASDA Nodes Operational
Central Services Operational
Authentication and Authorization Operational
Service Desk Operational
License Server Operational
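The 90-day uptime percentages shown for the Lustre filesystems can be turned into an approximate number of hours of downtime. The sketch below assumes the percentage is a plain fraction of the 90-day window, which may not match how the status page actually weights partial and major outages; the function name `downtime_hours` is hypothetical.

```python
def downtime_hours(uptime_percent, window_days=90):
    """Hours of downtime implied by an uptime percentage over a window.

    Assumption: the percentage is a simple fraction of the whole window.
    """
    window_hours = window_days * 24
    return window_hours * (100.0 - uptime_percent) / 100.0


# The two distinct uptime figures listed for the Lustre filesystems.
for pct in (98.87, 98.86):
    hours = downtime_hours(pct)
    print(f"{pct}% uptime over 90 days is about {hours:.1f} hours of downtime")
```

Under that assumption, each 0.01 percentage point of a 90-day window corresponds to roughly 13 minutes.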
Past Incidents
Jul 11, 2020

No incidents reported today.

Jul 10, 2020

No incidents reported.

Jul 9, 2020

No incidents reported.

Jul 8, 2020
Resolved - This incident has been resolved.
Jul 8, 10:44 AWST
Monitoring - The supercomputing clusters Magnus, Galaxy, Zeus, and Topaz, along with the Lustre filesystems, have returned to operation.
Jul 3, 16:41 AWST
Identified - One of the Lustre filesystems (/astro) isn't coming back online cleanly. A call has been lodged with the vendor and we're awaiting an update from them before proceeding. Supercompute services will remain offline until the filesystem issue is addressed.
Jul 2, 21:05 AWST
Update - All the affected services are fed via a single sub-board. CSIRO facilities staff are on their way to site to investigate the cause, and Pawsey staff will commence service recovery when given the all-clear.
Jul 2, 16:25 AWST
Update - We are continuing to investigate this issue.
Jul 2, 15:57 AWST
Investigating - We are investigating a possible power loss to some parts of the Pawsey infrastructure. It appears to affect services located in the supercompute cell. Staff are investigating and will update shortly. See also https://support.pawsey.org.au/documentation/display/US/I-2020-07-02-Pawsey
Jul 2, 15:34 AWST
Jul 7, 2020
Completed - All supercomputing and data systems have been restored to service after maintenance.
Jul 7, 19:06 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jul 7, 08:00 AWST
Scheduled - Pawsey technicians will be using the July 2020 maintenance window to undertake preventative and remedial work to improve system performance and reliability across the Supercomputing and Data systems.
The following systems will intermittently be offline during their maintenance window, and the work may impact other systems which rely on them.
We appreciate your support. If you have any questions, please email help@pawsey.org.au.
The Maintenance and Incidents web pages are updated with relevant information.
Jun 30, 10:48 AWST
Jul 6, 2020

No incidents reported.

Jul 5, 2020

No incidents reported.

Jul 4, 2020

No incidents reported.

Jul 3, 2020
Jul 2, 2020
Jul 1, 2020

No incidents reported.

Jun 30, 2020

No incidents reported.

Jun 29, 2020
Completed - layout_mdts_completed: 2
layout_osts_completed: 64
layout_repaired: 4815812
namespace_mdts_completed: 2
Jun 29, 14:18 AWST
Update - Scheduled maintenance is still in progress. We will provide updates as necessary.
Jun 29, 14:17 AWST
Update - Dry run complete; running lfsck again, this time allowing it to make changes.
Jun 29, 13:20 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Jun 29, 12:00 AWST
Scheduled - We will be performing an online lfsck of the /askapbuffer filesystem this afternoon. During that time, access will continue unaffected, but performance may be lower than normal.
Jun 29, 09:23 AWST
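The completion update for this lfsck run reports its results as `name: value` counter lines. As a minimal sketch, assuming a reader wants those counters in a structured form, the snippet below parses such lines into a dictionary. The helper name `parse_lfsck_counters` is hypothetical, and the parser only understands the plain `name: integer` shape seen in this update, not the full lfsck output format.

```python
def parse_lfsck_counters(text):
    """Parse 'name: integer' lines into a dict, skipping everything else."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        # Skip lines without a colon or without a purely numeric value
        # (for example timestamps such as "Jun 29, 14:18 AWST").
        if not sep or not value.isdigit():
            continue
        words = name.split()
        if not words:
            continue
        # Keep only the last word so a prefix like "Completed - " is dropped.
        counters[words[-1]] = int(value)
    return counters


# Counter lines copied from the completion update above.
update = """\
Completed - layout_mdts_completed: 2
layout_osts_completed: 64
layout_repaired: 4815812
namespace_mdts_completed: 2
Jun 29, 14:18 AWST
"""

for name, value in parse_lfsck_counters(update).items():
    print(f"{name} = {value}")
```

Keeping only integer-valued lines is a deliberate simplification, so status text and timestamps in the same update are ignored rather than misread as counters.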
Jun 28, 2020

No incidents reported.

Jun 27, 2020

No incidents reported.