Maintenance will be carried out on Pawsey systems on Tuesday the 3rd of October to apply required patches and updates that improve system stability, security, and performance. This maintenance window will also be used for other tasks that require downtime.
Planned work for this window includes:
* Patching of core Pawsey services (including LDAP, Jira and Confluence)
* Upgrade of Tape Library firmware
* Replacement of DDN hardware
* Running HPL on Setonix as part of acceptance testing
Posted on Sep 26, 2023 - 15:36 AWST
Coolant has been flushed. Nodes returned to service.
Sep 29, 09:54 AWST
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Sep 25, 07:00 AWST
HPE will be flushing and replacing the coolant in the racks delivered as part of Phase 1 (nid001000 to nid001511). The highmem partition will be unavailable and the work partition will be operating at reduced capacity.
Sep 18, 14:37 AWST
We have seen no further issues with the metadata servers. We are still waiting for a root cause analysis from HPE.
Sep 26, 15:26 AWST
Both metadata targets have been remounted by HPE engineers, who are monitoring the system. The root cause of the issue is currently under investigation.
Sep 25, 14:25 AWST
Today is a public holiday in Western Australia; however, we are still monitoring this incident and awaiting an update from our vendor. We are aware that over 600 jobs in the Slurm queue are stuck in the 'Completing' state, presumably because they were unable to finalise file I/O before exiting.
Sep 25, 05:10 AWST
Both metadata servers have been STONITHed. A critical case has been lodged with HPE.
Sep 22, 17:34 AWST
Both metadata servers have booted. The metadata targets have been mounted and are currently in recovery mode.
Sep 22, 16:17 AWST
We are currently investigating this issue.
Sep 22, 13:21 AWST