Streamlining AI Workflows

To support the research initiatives of the AICE lab and the CASM research programme, researchers needed to use many GPUs at once to meet the demands of their intensive AI workflows.

To accommodate these needs, the RTP and RSE teams have ensured that suitable hardware is available and have dedicated two Nvidia 8xH100 hosts to these highly parallelised workflows. The hosts form a dedicated, restricted-access queue that allows users to run a workflow across all 16 GPUs, greatly benefiting the training and inference of the teams' AI models.

Additionally, the RSE team has put together standardised LSF tooling for submitting cross-host parallel GPU jobs. This improves the repeatability and reliability of the research and gives users a low barrier to entry for running their pipelines.
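
As a rough illustration of what a cross-host GPU submission involves, the sketch below assembles an LSF `bsub` command that asks for 8 GPUs on each of 2 hosts. It is not the RSE team's tooling: the queue name `gpu-multihost`, the script `./train.sh`, the helper `build_bsub_command`, and the choice of one job slot per GPU are all assumptions made for the example. Only standard `bsub` options are used (`-q`, `-n`, `-R "span[ptile=...]"`, `-gpu`).

```python
import shlex


def build_bsub_command(queue, hosts, gpus_per_host, job_command):
    """Build a bsub argument list for a job spanning several GPU hosts.

    Assumption: one job slot per GPU, so the total slot count is
    hosts * gpus_per_host and span[ptile=...] places gpus_per_host
    slots on each host.
    """
    total_slots = hosts * gpus_per_host
    return [
        "bsub",
        "-q", queue,                                   # hypothetical queue name
        "-n", str(total_slots),                        # total slots across all hosts
        "-R", f"span[ptile={gpus_per_host}]",          # slots placed per host
        "-gpu", f"num={gpus_per_host}:mode=exclusive_process",  # GPUs per host
        *shlex.split(job_command),                     # the user's own command
    ]


# Two 8-GPU hosts, matching the 16-GPU setup described above.
cmd = build_bsub_command("gpu-multihost", hosts=2, gpus_per_host=8,
                         job_command="./train.sh")

# Print a shell-quoted command line for inspection; nothing is submitted.
print(shlex.join(cmd))
```

Wrapping the resource request in a single helper like this is one way standardised tooling can keep submissions repeatable: users supply only their own command, and the host and GPU layout is set consistently for every job.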