Storage Platforms
Robust storage to support the next generation of data and AI
The Sanger's storage platforms couple with the HPC and cloud resources to produce a versatile system for enabling computational research.
iRODS
Genome sequencing results are archived in iRODS, both locally
and with off-site backups. Data from the Illumina and PacBio sequencers is
archived using this grid-based storage system (see Chiang et al., 2011).
This storage platform is accessible from batch jobs on the farm and from the OpenStack cloud, enabling a wide range of bioinformatics pipelines.
The iRODS system has a total capacity of 59.41PB.
NFS
The farm uses NFS for networked user home directories, and for more permanent storage areas.
This includes the warehouse areas as well as team-specific areas, which are used
for long-term storage of project-specific tools and results. This ensures that
research performed using the Sanger's HPC is as repeatable as possible, since the
tools used and the results obtained by each team are retained for each study.
Additionally, the software area hosts standardised, RSE-RTP-maintained
software stacks and modules. This area is maintained in direct consultation
and collaboration with research teams, which allows for a higher standard of
repeatability. See software services for details.
The capacity of the NFS system is 4.06PB.
Lustre
Lustre is the high-performance striped storage system used for batch computing processes on the farm. It is designed to handle the high IO loads generated by thousands of batch jobs running simultaneously, and is managed by an RSE-RTP-led team in consultation with research programme informatics teams.
Lustre storage is managed by quota per research programme and team. Since the purpose of the Lustre storage is to handle the IO loads of the farm, it is designed for intermediate results rather than permanent storage, and so the system is not backed up.
There are seven on-site disks for storage, totalling 29.45PB of capacity.
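Because Lustre is not backed up, final results need to be moved to a more permanent area once a job finishes. The following is a minimal sketch of one way to do that, not a description of how Sanger pipelines actually work: it copies every file from a scratch directory into a permanent directory and verifies each copy by checksum. The function names and any directory paths passed to them are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_results(scratch_dir: Path, permanent_dir: Path) -> list[Path]:
    """Copy all files under scratch_dir into permanent_dir, keeping the layout.

    Both directories are hypothetical placeholders for a scratch area
    (not backed up) and a longer-term storage area.
    """
    copied = []
    for source in sorted(scratch_dir.rglob("*")):
        if not source.is_file():
            continue
        target = permanent_dir / source.relative_to(scratch_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copies contents and file metadata
        # Verify the copy before treating the scratch file as disposable.
        if sha256_of(source) != sha256_of(target):
            raise IOError(f"checksum mismatch for {source}")
        copied.append(target)
    return copied
```

The scratch files are deliberately left in place; removing them is a separate step that is best taken only after the copy has been verified.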
Next generation storage procurement
Read more about our plans to procure storage to bolster the iRODS system for the next generation of data production from the sequencers here.