Storage Platforms

Robust storage to support the next generation of data and AI

The Sanger's storage platforms couple with the HPC and cloud resources to produce a versatile system for enabling computational research.

iRODS

Genome sequencing results are archived in iRODS, both locally and with off-site backups. Data from the Illumina and PacBio sequencers is archived using this grid-based storage system (see Chiang et al., 2011).

This storage platform is accessible from the farm within batch jobs and the OpenStack cloud, enabling a wide range of bioinformatics pipelines.

The iRODS system has a total capacity of 59.41PB.

NFS

The farm uses NFS for networked user home directories, and for more permanent storage areas.

This includes the warehouse areas as well as team-specific areas, which are used for long-term storage of project-specific tools and results. This ensures that research performed using the Sanger's HPC is as repeatable as possible, since the tools used and the results obtained by each team are retained for each study.

Additionally, the software area hosts standardised, RSE-RTP-maintained software stacks and modules. This area is maintained in direct consultation and collaboration with research teams, which allows for a higher standard of repeatability. See software services for details.

The capacity of the NFS system is 4.06PB.

Lustre

Lustre is the high-performance striped storage system used for batch computing on the farm. It is designed to handle the high I/O loads generated by thousands of batch jobs running simultaneously, and is managed by an RSE-RTP-led team in consultation with research programme informatics teams.
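
To illustrate what "striped" means here, the following is a minimal sketch of a simple round-robin (RAID-0 style) striping layout, in which a file is cut into fixed-size stripes that are spread across several storage targets. The stripe size of 1 MiB and stripe count of 4 are illustrative assumptions, not the Sanger's actual configuration, and the names `StripeLocation` and `locate` are hypothetical helpers written for this example.

```python
from dataclasses import dataclass

MIB = 1024 * 1024


@dataclass(frozen=True)
class StripeLocation:
    target_index: int   # which of the stripe_count targets holds the byte
    object_offset: int  # byte offset inside that target's object


def locate(offset: int, stripe_size: int = MIB, stripe_count: int = 4) -> StripeLocation:
    """Map a byte offset in a file to a storage target under round-robin striping.

    stripe_size and stripe_count are assumed values chosen for illustration.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")

    # Which stripe of the file the byte falls in, counting from zero.
    stripe_number = offset // stripe_size
    # Stripes are dealt out to targets in turn: 0, 1, ..., stripe_count - 1, 0, 1, ...
    target_index = stripe_number % stripe_count
    # Each full pass over all targets adds one stripe to every target's object.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return StripeLocation(target_index, object_offset)
```

Under these assumptions, consecutive 1 MiB stripes of a large file land on different targets, so many jobs reading or writing different parts of the file can be served in parallel rather than queueing on a single device.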

Lustre storage is managed by quota per research programme and team. Since the purpose of the Lustre storage is to handle the I/O loads of the farm, it is designed for intermediate results rather than permanent storage, and the system is therefore not backed up.
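
Because the Lustre storage is not backed up, results worth keeping need to be copied to a more permanent area once a job finishes. The following is a minimal sketch of such a stage-out step, assuming plain file copies are acceptable and that both areas are mounted as ordinary directories. The function names and any paths passed to them are hypothetical and do not describe an official Sanger workflow.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_out(scratch_dir: Path, permanent_dir: Path) -> List[Path]:
    """Copy every file under scratch_dir into permanent_dir and verify each copy.

    Both directories are placeholders supplied by the caller; nothing here
    is specific to Lustre or NFS.
    """
    copied: List[Path] = []
    for source in sorted(scratch_dir.rglob("*")):
        if not source.is_file():
            continue
        # Preserve the directory layout relative to the scratch root.
        destination = permanent_dir / source.relative_to(scratch_dir)
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, destination)
        # Compare checksums so a truncated or corrupted copy is caught early.
        if sha256_of(source) != sha256_of(destination):
            raise IOError(f"checksum mismatch for {source}")
        copied.append(destination)
    return copied
```

A job script could call `stage_out(Path(scratch), Path(permanent))` as its final step and only remove the scratch copy after the call returns without raising.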

There are seven on-site disks for storage, totalling 29.45PB of capacity.

Next generation storage procurement

Read more about our plans to procure storage to bolster the iRODS system for the next generation of data production from the sequencers here.