Disk failure in askapbuffer
Incident Report for Pawsey Supercomputing Centre
Resolved
A third replacement disk has arrived from the vendor and appears to be working as expected.
Closing the incident as the filesystem is now back to normal operation.
Posted May 14, 2020 - 13:33 AWST
Update
One of the drives was replaced successfully. A second failed drive has also been replaced, but the replacement wasn't recognised by the system. We're waiting for the vendor to investigate a possible faulty slot.
Posted May 12, 2020 - 12:06 AWST
Update
We have identified two failed drives (in separate arrays) and replacements are being ordered from the vendor.
Posted May 04, 2020 - 09:17 AWST
Identified
One of the drives in array02 of the askapfs1 (/askapbuffer) filesystem has failed.

Whilst there should be sufficient redundancy to prevent an outage, there may be a performance impact while the array rebuilds.


An error was reported by a disk drive. (disk: channel: 0, ID: 39, SN: VAJMXM0R, enclosure: 0, slot: 39) (Key,Code,Qual,UEC:0x3,0x11,0x0,0x0000) (CDB:Rd 00000005 0004)(Info:0x00000008)(CmdSpc:0x0, FRU:0x0, SnsKeySpc:0xD0)(Medium Error, unrecovered read error)
Posted May 02, 2020 - 10:48 AWST
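
The drive error quoted in the initial report follows a regular layout, so the affected disk and the SCSI sense data can be pulled out programmatically. The following is only a minimal sketch: the function name parse_disk_error and the two lookup tables are hypothetical, the tables cover just the codes seen in this incident (sense key 0x3 and additional sense code 0x11 with qualifier 0x0), and the assumption that other controller messages use the same layout is untested.

```python
import re

# Hypothetical lookup tables, limited to the codes that appear in this report.
SENSE_KEYS = {0x3: "Medium Error"}
ADDITIONAL_SENSE = {(0x11, 0x00): "unrecovered read error"}

# The log line exactly as quoted in the report.
LOG_LINE = (
    "An error was reported by a disk drive. "
    "(disk: channel: 0, ID: 39, SN: VAJMXM0R, enclosure: 0, slot: 39) "
    "(Key,Code,Qual,UEC:0x3,0x11,0x0,0x0000) "
    "(CDB:Rd 00000005 0004)(Info:0x00000008)"
    "(CmdSpc:0x0, FRU:0x0, SnsKeySpc:0xD0)"
    "(Medium Error, unrecovered read error)"
)

HEX = r"(0x[0-9a-fA-F]+)"


def parse_disk_error(line):
    """Return the disk location and decoded sense data, or None if no match."""
    disk = re.search(r"SN: (\w+), enclosure: (\d+), slot: (\d+)", line)
    sense = re.search(rf"Key,Code,Qual,UEC:{HEX},{HEX},{HEX},{HEX}", line)
    if disk is None or sense is None:
        return None

    # Sense key, additional sense code, qualifier, and unit error code.
    key, asc, ascq, _uec = (int(value, 16) for value in sense.groups())
    return {
        "serial": disk.group(1),
        "enclosure": int(disk.group(2)),
        "slot": int(disk.group(3)),
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "detail": ADDITIONAL_SENSE.get((asc, ascq), "unknown"),
    }


result = parse_disk_error(LOG_LINE)
print(result)
```

For the line above, the decoded sense key and detail agree with the text the controller itself appends at the end of the message.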
This incident affected: Lustre filesystems (/askapbuffer filesystem).