Resolved -
This has been resolved.
* The temperature rise on the SANs matches the read/write pattern on a storage volume (OST) that happened to be on front-end storage server oss04, which maps to back-end askapbuffer storage array 07|08.
* It has been established that this is normal behaviour for this workload pattern.
Mar 26, 12:00 AWST
Monitoring -
A fix has been implemented and we are monitoring the results.
Mar 26, 11:19 AWST
Identified -
We are investigating an issue with the Lustre filesystem "/askapbuffer".
* There is an artificially high load/temperature on one of the physical back-end SAN storage units, Array8.
* It looks like there is a high-load Lustre thread on one of the front-end Lustre nodes, i.e. OSS4, which is attached to this unit.
* OSS4 will undergo a high-availability failover to its matching pair, OSS3.
* The original storage LUNs will then be restored from OSS3 back to OSS4.
* Failover has been completed.
Mar 26, 10:36 AWST
Resolved -
This item has been resolved. Both tape libraries are back in full operation for Banksia.
Mar 25, 12:17 AWST
Identified -
We have identified the issue with our external engineering company, and we plan to return both tape libraries to full use this week.
Mar 23, 12:34 AWST
Update -
We are continuing to work on this issue with our external engineering company.
Mar 20, 12:54 AWST
Update -
We are continuing to work on this issue with our external engineering company, and this work may continue until at least Friday.
Mar 19, 11:12 AWST
Update -
We are continuing to work on this issue with our external engineering company.
Mar 18, 14:58 AWST
Update -
We are continuing to work on this issue with our external engineering company.
Mar 16, 13:17 AWST
Investigating -
The Banksia service is currently in a degraded, “at risk” state because it is operating with only one tape library instead of the standard two. As a result, the alternate copy of files on tape will be unavailable for staging or archiving until Library 1 is restored to service. The secondary copy is still available for all data, so this should not impact the Banksia offline files service. If you experience any issues accessing data, please let us know at help@pawsey.org.au.
The service maintenance provider requires some work to be carried out on Library 1 to resolve a minor fault affecting a limited number of tapes. This work will take some time: the engineer indicates that in the worst case it will take until Tuesday, but they hope to have it resolved by COB Monday at the latest.
Mar 16, 13:16 AWST