Singularity Containers - Failed to find loop device.
Resolved
We have put an epilog script in place to manage the loop devices. As always, there will be edge cases, so if you run into the problem, please e-mail help@pawsey.org.au.
Posted Dec 22, 2023 - 08:55 AWST
Update
We are continuing to investigate this issue.
Posted Nov 23, 2023 - 10:11 AWST
Update
Dear Researchers,

We have identified a secondary issue pertaining to the error:

“error: while mounting image proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available”

A hotfix was applied to Setonix at approximately 9:40 AM AWST on the 23rd of November to address this secondary issue with the use of Singularity.
Posted Nov 23, 2023 - 10:06 AWST
Update
To address the recently reported loop device errors when using Singularity, we changed the Singularity configuration on Tuesday the 21st of November at 8:30 AM AWST. The change enables shared loop devices in Singularity, so that multiple instances of the same container share a single device. This reduces the total number of loop devices used by some workflows and should make the related errors less frequent.
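
For readers who want to see what this setting looks like, the following is a minimal Python sketch that reads a `name = value` directive from the text of a `singularity.conf` file. The helper name `read_directive` and the sample fragment are illustrative assumptions; the fragment is not the actual Setonix configuration, and the location of the configuration file is site-specific.

```python
def read_directive(conf_text, name):
    """Return the value of a `name = value` directive, or None if absent.

    Hypothetical helper for illustration; if the directive appears more
    than once, the last occurrence wins.
    """
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, val = line.partition("=")
        if sep and key.strip() == name:
            value = val.strip()
    return value


# Illustrative fragment only (assumed values, not the Setonix config).
sample = """\
# limit on loop devices, then the sharing switch
max loop devices = 256
shared loop devices = yes
"""

print(read_directive(sample, "shared loop devices"))
```

With sharing enabled, several instances of the same container image can attach to one loop device instead of each claiming their own.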

We also note that the previous software stack available under /software/setonix/2022.11 is deprecated, and researchers should not use the previous versions of Singularity. It is expected that this older version of Singularity will be removed in the upcoming maintenance on the 5th of December.
Posted Nov 21, 2023 - 08:35 AWST
Investigating
Dear Researchers,

Setonix is currently experiencing a fault which affects some containerised workflows, particularly those employing MPI or using GPUs. Affected jobs may fail with an error message such as:

“error: while mounting image proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available”

We are working to diagnose and remedy the problem. If you encounter this issue, please submit a ticket to the Pawsey Helpdesk by emailing help@pawsey.org.au and include the jobID of the failed job.
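
As a convenience when preparing a ticket, the check below is a small Python sketch for spotting this error in a job's output and recovering the jobID. It assumes the job output was written to a file using Slurm's default `slurm-<jobid>.out` naming, which may not match your job script; the function names are hypothetical.

```python
import re

# Final fragment of the error message quoted above.
LOOP_ERROR = "no loop devices available"


def has_loop_device_error(log_text):
    """Return True if the job output contains the loop device error."""
    return LOOP_ERROR in log_text


def job_id_from_filename(filename):
    """Extract the jobID from a default Slurm output file name.

    Assumes the `slurm-<jobid>.out` pattern; returns None otherwise.
    """
    match = re.fullmatch(r"slurm-(\d+)\.out", filename)
    return match.group(1) if match else None
```

For example, `job_id_from_filename("slurm-123456.out")` returns `"123456"`, which is the value to include in the e-mail.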
Posted Nov 17, 2023 - 16:40 AWST
This incident affected: Setonix (Login nodes, Data-mover nodes, Setonix work partition, Setonix debug partition, Setonix long partition, Setonix copy partition, Setonix askaprt partition, Setonix highmem partition, Setonix gpu partition, Setonix gpu high mem partition, Setonix gpu debug partition).