This article originally appeared on The Next Platform.
While there are a lot of different file system and object storage options available for HPC and AI customers, many AI organizations and a lot of traditional HPC simulation and modeling centers choose either the open source Lustre parallel file system or the modern variants of IBM’s General Parallel File System (GPFS), known previously as Spectrum Scale and now known as IBM Storage Scale, as the storage underpinning of their applications.
You can rent managed Lustre services in the cloud, but up until recently, you could not get managed GPFS services in the cloud. But now, thanks to Sycomp, you can.
Speaking very generally, across the parallel file system scratchpads that are set up with tens to hundreds of petabytes of capacity and terabytes to tens of terabytes per second of I/O bandwidth on disk or flash storage, Lustre has about half of the capacity on pre-exascale and exascale HPC systems and GPFS has slightly less than that share.
That ratio has held roughly true for the past two decades. There are plenty of options, and new ones emerging all the time, lately from Vast Data, WekaIO, and DataDirect Networks, which are all peddling disaggregated and software-defined object and file storage for high performance AI and HPC systems.
With Lustre being open source as well as popular, it makes perfect sense that the big cloud builders would offer managed Lustre services on their rentable infrastructure, both for customers who need the I/O and capacity of a parallel file system and for their own internal use. Amazon Web Services, which arguably took HPC seriously earliest among the big clouds, started first with its FSx for Lustre service way back in 2018, and Microsoft followed suit with its Azure Managed Lustre in the summer of 2023. With HPC on the cloud becoming more common, Google Cloud and Oracle Cloud Infrastructure, the number three and four clouds in the world, launched their respective managed Lustre services this year and are pitching them as a way to boost the utilization of GPUs for AI and HPC processing.
But what about those AI and HPC customers who want to run GPFS on the cloud? They can always negotiate with IBM for Storage Scale licenses and deploy the parallel file system themselves on one of the big clouds, just as customers who wanted Lustre on the cloud could always deploy it themselves before the big cloud builders mentioned above wrapped up their own managed services.
While Big Blue can deploy Storage Scale on the IBM Cloud under a bring-your-own-license method, including deployment templates in its cloud development kit, none of the other big clouds that are selling HPC and AI compute have created a GPFS managed service. But, luckily, Sycomp has worked with IBM to package up Storage Scale into a managed service that is deployable on Google Cloud and on Microsoft Azure. And it is reasonable to assume this managed GPFS storage service, which Sycomp calls the Intelligent Data Storage Platform, will eventually be available on AWS and OCI if enough customers ask for it.

Sycomp was founded back in 1994 by Michael Symons, the company’s chief executive officer, as a global services and logistics provider serving the enterprise IT market. Today, the company has over 55 locations around the world where its techies work, including eight integration centers and warehouses from which it supports customers in more than 150 countries with products from over 400 technology partners.
Sycomp’s Intelligent Data Storage Platform is ‘infrastructure as code’, a term of art for applying DevOps approaches to configuring, provisioning, and managing IT infrastructure. Scott Fadden, HPC solutions architect at Sycomp, explains to The Next Platform that the company is not just throwing around that infrastructure as code terminology for marketing purposes, but means it literally.
To be specific, says Fadden, Sycomp engineers took the GPFS code that is embodied in IBM’s Storage Scale and packaged it up into Terraform, which is the provisioning tool that HashiCorp (now part of IBM) created to span compute, storage, and networking for both on-premises and cloud infrastructure and manage it all in a consistent way.
“In fact, we deliver Intelligent Data Storage Platform as modules of Terraform,” says Fadden. “It’s not a click through GUI because the way that customers use it is they add it into their pipelines. They have their own Terraform pipeline that does whatever they do in their own compute, storage, and networking. They just add our pipeline into it. That’s why we deliver it in that form, and that’s why it’s considered infrastructure as code.”
John Zawistowski, global storage solutions executive at Sycomp, says that the company created a GPFS managed service for Google Cloud and Microsoft Azure not just because GPFS is three decades old and considered rock solid at this point in its history, but also because of Active File Management (AFM), which was added to GPFS a little more than a decade ago. AFM is what enables Sycomp Intelligent Data Storage Platform to hook into NFS file systems or any object storage system that is compliant with the AWS S3 object format and pull data from them into a high performance storage tier on the cloud that can underpin HPC and AI workloads.
“When you are talking object store, for example, the other vendors write proprietary objects so those objects can be seen by their storage system, but not necessarily by any other object storage,” explains Zawistowski. “The way that GPFS and its AFM handles objects, you can use traditional cloud object tools that you are used to, and IBM’s GPFS can read anybody’s object. And when we put the object back, we put it back in a format that is usable by all of the traditional cloud provider object tools. So it’s a unique way that we handle that, and AFM helps us do that.”
AFM is also the foundation for creating a single, unified view of files and directories that span multiple cloud datacenters and regions and that can also be extended to on premises installations, creating a global namespace that can actually go global.
Turning GPFS into the managed service that is Sycomp’s Intelligent Data Storage Platform took a lot of tweaks, according to Fadden.
“There were extensions we actually had to write, for example, for NFS IP failover,” he says. “If you install GPFS, NFS IP failover doesn’t know how to interact with the cloud. To make that work we had to extend it. We also tied into cloud monitoring at Google Cloud and natively generate Google Alerts. GPFS serves a lot of different markets, and a lot of different infrastructure, so it’s a set of tools. We simplified the use of GPFS, and we could do that because we have a limited scope. We don’t have to support every bit of hardware out there. So instead of the customer having to run ten different commands, they run one. For example, with AFM we monitor it for them. When customers are migrating data, they can run one command and we monitor the progress. We make sure the data is there, and we will tell you when it’s there, and then you can kick off the next job. If you want us to add and remove clients, we can do it in batch. We automated a lot of GPFS operations.”
Sycomp has built the Intelligent Data Storage Platform managed service atop Rocky Linux and Red Hat Enterprise Linux. Rocky is a bug-for-bug compatible clone of IBM’s Red Hat Enterprise Linux.
Sycomp Intelligent Data Storage Platform is generally available on Google Cloud, providing GPFS as a managed service, and can be deployed with a click through the Google Cloud Marketplace. Sycomp and Google support the deployment of Intelligent Data Storage Platform on the C3 machines and, most recently, on the Z3 family of instances, which were announced in July, are optimized for I/O intensive workloads, and include flash SSDs with up to 72 TB of capacity as well as Google’s own Titanium I/O offload processing engines to drive higher I/O operations per second.
