Writes have been re-enabled for the 4 OSTs that were previously 100% full. /scratch is now back to its normal configuration.
Posted Mar 27, 2025 - 09:05 AWST
Monitoring
Bulk deletion of older files from /scratch (see the policy at https://pawsey.atlassian.net/wiki/display/US//Filesystem+Policies) has increased the free capacity of the flash pool to over 100 TB. Individual OSTs within this pool are still between 90% and 95% full, so researchers are requested to delete any unneeded files from /scratch to prevent jobs from failing when they cannot write output.
Posted Mar 27, 2025 - 05:56 AWST
Investigating
The /scratch filesystem used by Setonix is composed of two pools: a high-performance flash component and a much larger but slower disk component. The flash pool is presently 98% full, with some individual OSTs having only a few tens of GB free. While there is still plenty of capacity (over 3 PB) free on the disk pool, users may see jobs fail with write errors, especially if they are overriding the default striping applied by Pawsey.
Posted Mar 26, 2025 - 04:35 AWST
This incident affected: Lustre filesystems (/scratch filesystem (new)).
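
For researchers clearing out /scratch as requested in the updates above, the following is a minimal Python sketch for listing files that have not been modified recently, so they can be reviewed before deletion. It is not a Pawsey tool: the function name find_stale_files and the 14-day threshold mentioned below are illustrative assumptions, and the sketch only reports candidates without deleting anything. Walking a very large directory tree this way generates many metadata requests, so it is best pointed at a specific project or job directory rather than the top of /scratch.

```python
import os
import stat
import time
from pathlib import Path


def find_stale_files(root, max_age_days, now=None):
    """Return (path, size_in_bytes) for regular files under `root`
    whose last modification is older than `max_age_days`.

    Hypothetical helper for illustration; it never deletes anything.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day

    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                stale.append((path, info.st_size))
    return stale


def total_size_gb(entries):
    """Sum the sizes of (path, size_in_bytes) entries, in GB (10**9 bytes)."""
    return sum(size for _path, size in entries) / 1e9
```

As a usage example, calling find_stale_files on one of your own directories under /scratch with max_age_days=14 (an arbitrary value chosen here, not the policy threshold) returns the candidate files, and total_size_gb on that result gives a rough idea of how much space removing them would free.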