Learn how Solidigm is partnering with Supermicro to advance the role of AI storage in meeting customer needs for AI data pipelines, with Wendell Wenjen, Director of Storage Market Development, and Paul McLeod, Product Director of Storage, both at Supermicro.
In this AI Field Day video, Wendell and Paul discuss the challenges and solutions for AI use cases and how high-density storage helps solve those challenges. They also emphasize the need for large-capacity storage to handle the various phases of the AI data pipeline, and talk about how Solidigm SSDs help meet these challenges and provide storage solutions.
Learn more about Solidigm SSDs and their role in AI solutions here.
Featuring Wendell Wenjen, Director of Storage Market Development, Supermicro, and Paul McLeod, Product Director, Storage, Supermicro
Wendell Wenjen: Thank you to Solidigm. We appreciate you inviting us to AI Field Day. You know, we use Solidigm SSDs in our servers: in our storage servers, as well as in our GPU-accelerated servers. So Paul will talk about that. I'm Wendell Wenjen. I'm the Director of Storage Market Development for Supermicro. And with me I have Paul McLeod, who's our Product Director for Storage.
Today we're going to talk about some of the challenges. We've talked a lot about the software this morning, and we've talked about the media, with flash. Now we're going to talk about the systems that are running all of this AI training, and particularly the storage piece of it.
We'll cover the storage challenges with AIOps and MLOps, some of the issues with conventional storage approaches, and then an approach that we've deployed with a number of customers in the multi-petabyte range, and we'll talk about how that works.
So just a little about Supermicro, if you haven't heard of us: we're a server, storage, and GPU-accelerated server company, and we provide networking. And we deliver all of this in a fully integrated rack. We're in Santa Clara today, and our headquarters is in San Jose, just five miles from here, about 10 minutes away, where we do a lot of our manufacturing and rack integration.
One of the notable things about our company, if you notice, is that our revenue from last year to this year is about double. I've been in the server and storage business for 20 years. The only time I've ever seen that was in '96 when I joined Intel's server group, which had just started.
We went from zero to a billion dollars in about a year with the Pentium Pro. So it's a really exciting time for the providers of all of the systems that enable AI, along with the media companies like Solidigm, and the software companies that we work with that are providing the file and object storage solutions.
A little bit more about us. We not only manufacture here in Silicon Valley; we also do so in Taiwan, the Netherlands, and a number of other places, with the capacity to deliver 5,000 integrated racks per month, tested with software and cabled up. We can do that in locations near where our customers are deploying. So we can really deliver these systems from time of order in literally a matter of a couple of weeks.
Audience member Ray Lucchesi: Hey Wendell, what do you attribute the doubling of revenue in one year to? What's driving that?
Wenjen: Yeah, that's an interesting question. A couple of quarters ago, about two quarters ago I think, we announced on an earnings call that over half our revenue was AI related.
Of course, we make a lot of storage as well, and a lot of servers. You know, we started as a server company. So really, this trend of deploying AI that we've all been talking about here today is driving our growth, as well as the traditional business: we sell into the CSP market and the enterprise market, and we have a worldwide channel business for all the products that I mentioned.
But I think, uniquely for our company, about half of the staff are engineers working on system design and development. And we're the number one company providing generative AI and large language model platforms. And that's growing 500% year over year, which is truly amazing.
Just a little bit more about our company: we have a very unique way of developing products that we call the building block solution. We develop modular components that can be reused in different configurations. Think of motherboards, chassis, and power supplies, though really more complex than that, which can be assembled into a huge variety of almost customized solutions for storage, for computing, for GPU-accelerated computing.
So that's the foundation of our product development; we've been doing this for 30 years. Then, in the last 10 years, we've been really focused on delivering that as a fully integrated rack solution: a 42U-high rack delivered in a crate, along with people to go out and assemble it, power it on, and get it up and running on day one.
So that's really been a focus for us. We support and sell into a lot of large customers and CSPs, but also a lot of enterprises through the channel, and provide a variety of solutions.
And then the third thing I'll just mention, that's really very near and dear to our CEO and founders, [is] green computing. We want to be as efficient as possible with the energy that we're using.
And so that means that one of the things we deliver to our customers, if they need it, is water-cooled systems, [which are] really a lot more efficient in the data center. We develop our own power supplies. In fact, uniquely among all of the server and storage companies that we compete with, we're pretty much the only one that designs, develops, and manufactures everything from the power supplies to the boards to the full systems.
I've been in this business a long time. And, you know, we've done it; the companies I've worked for have done that manufacturing for a lot of the big OEMs.
Audience member Donnie Berkholz: Just to clarify what you said: “Unique,” but then you said, “Pretty much.” Is it unique or is it rare?
Wenjen: I would say I don't know of any other major OEM that's doing their own manufacturing. They're all using ODMs and contract manufacturers. And I've worked for some of those, so I'm familiar with that market.
Audience member Ben Young[1]: What does the ownership model look like? Obviously, over the last few years we've seen a number of hardware manufacturers pivot towards this "as a service" model. Does Supermicro have something that plays in that space, or is it all currently capital expenditure at this point?
Wenjen: Yeah, we're shipping and providing equipment to our customers.
Audience member Ben Young[1]: But they just buy them outright? Or is there an "as a service" option? I'm thinking HPE GreenLake or Pure as-a-Service, where you can subscribe to a unit rate, and then as your capacity grows, you plug more equipment into it.
Wenjen: Yeah, we haven't announced anything like that. I mean, in some ways that really tends to compete with what our customers are doing, that are CSPs.
So, let me go on to the topic here about storage for AI and machine learning. Our partner WEKA did a survey of 1,500 customers and looked at the main inhibitors to their customers' success.
And as you can imagine, compute performance is one area, and security, with leaking of data into maybe these public models, is another area. But really the largest area is around data management: both collecting and preparing the data, and providing that data to the GPU clusters for training and inference.
Solidigm talked about the AI data pipeline. What I want to mention here, without repeating that, is that we have products focused on each of the phases in the AI data pipeline. In ingest, we know that customers typically don't really know what kind of data they're going to need for a model a year or two from now. So they often have to collect a lot of digital data, customer service data, manufacturing data; it all can be valuable in the future. That really requires a large data lake, which is optimal for scaled-out unstructured storage using the object storage that we've talked about. We have very high-capacity, 90-bay disk systems with dual processors that can be the foundation of that type of storage system.
In the clean and transform [phases] you have things like labeling and ETL. From my own experience here: I worked on a proof-of-concept project at another company where we were using machine learning, developing a supervised learning model for electron-beam microscopy data for wafers. We were looking for defects. And it turns out that, because the data is labeled, you need people to tell you what a defect looks like. There were only a handful of engineers in the company who could tell you what a defective e-beam scan looked like versus a non-defective one. To me, they all looked the same. But they weren't that interested in looking at 50,000 images and labeling them for us; it's very time consuming, and you're not going to outsource it to Amazon, because it's very proprietary data.
So this whole process of clean and transform, where we have systems that could be using flash, or a combination of disk and flash, is an important area to really think about, depending on the type of model. In the training and evaluation area, the thing I'll mention is that, of course, you have the training data. It could be labeled, it could be unlabeled. But you really need to retain that data for all the cycles of your model development, and everything that was used to deploy that model, for explainable AI, right? Because if you deploy this model and you start getting bizarre results, which sometimes happens, you need to be able to go back and trace what the input data was that created that model.
And then you also need, of course, a separate set of data for validating that model that wasn't used for training. So all of that really speaks to the need for a very large capacity of storage for that phase. And then in inferencing, as Solidigm was talking about, a lot of that can be done at the edge. We will talk about our portfolio of products that's been optimized for edge environments.
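The traceability requirement Wendell describes, tying a deployed model back to the exact data that trained and validated it, is often handled with a simple lineage record. The following is only an illustrative Python sketch with hypothetical names, not a tool from Supermicro or its partners:

```python
import hashlib
import json
import time
from pathlib import Path

def record_training_run(model_version: str, train_files: list[str],
                        eval_files: list[str], registry: Path) -> None:
    """Append a lineage record tying a model version to the exact
    training and validation files (by content hash) that produced it."""
    def digest(path: str) -> str:
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    record = {
        "model_version": model_version,
        "timestamp": time.time(),
        "train_data": {f: digest(f) for f in train_files},
        "eval_data": {f: digest(f) for f in eval_files},  # held out from training
    }
    with registry.open("a") as out:  # one JSON record per line
        out.write(json.dumps(record) + "\n")
```

A record like this is only useful if the referenced files are retained across all those model development cycles, which is exactly the large-capacity storage requirement being described.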
So one of the things we see with that training pipeline is what we call the I/O blender effect. We see here in this first pipeline what the stages of the AI data pipeline are, but of course that's not the only pipeline that's often running, right? You might have a second pipeline that's perhaps offset.
So you're now running a dual I/O profile. And then later we'll see a mixed I/O profile, where you have a number of these pipelines. That may be because you have multiple data scientists running different models, or running different versions of the same model, or maybe you're in a multi-tenant environment with multiple parties accessing the data.
So that creates this mix of different I/O profiles that Solidigm talked about. In some data that our partner WEKA collected through the dashboard in solutions we deployed with them, we see a combination of very, very small I/Os: a lot of 4K I/Os that are split between reads and writes. And we also see some large-block I/Os as well. That mixture of very small I/Os, which is really not optimal for something like a traditional NAS solution, is really problematic, and it really does require a specifically designed solution for this type of storage.
Audience member Ray Lucchesi: That's the mixed IO pattern that you're seeing when all those pipelines are running concurrently?
Wenjen: That's, that's part of it, yeah. We have quite a bit more data that we didn't include in here, but this is sort of representative.
Audience member Ray Lucchesi: It's surprising to me that it's so high. And the write is so small.
Wenjen: Yeah. I mean, this is what was collected and it's really hard to say what was causing it. But this came from customers.
Audience member Ray Lucchesi: Certainly not checkpointing going on. It's something else.
Wenjen: There's checkpointing. There's archiving. There's ETL going on. So I think the point is that it's very hard to predict what those I/O patterns are going to be in advance. And so rather than trying to guess, and maybe be wrong, it's better to design for a variety of I/O patterns. And that's what we're going to talk about.
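To make "a variety of I/O patterns" concrete, here is a minimal Python sketch of the kind of blended load this discussion implies. The 80/20 split between small random and large I/Os, and the sizes, are illustrative assumptions, not figures from the WEKA dashboard data:

```python
import os
import random

def blended_workload(path: str, total_ops: int = 10_000,
                     small: int = 4096, large: int = 1 << 20,
                     read_fraction: float = 0.5) -> None:
    """Mix small random reads/writes with occasional large reads against
    one file, loosely mimicking several pipeline stages (ETL, training
    reads, checkpoints) hitting the same storage at once."""
    size = os.path.getsize(path)  # expects a pre-created multi-GiB test file
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(total_ops):
            if random.random() < 0.8:  # mostly 4K random I/O...
                offset = random.randrange(size - small)
                if random.random() < read_fraction:  # ...split reads/writes
                    os.pread(fd, small, offset)
                else:
                    os.pwrite(fd, os.urandom(small), offset)
            else:  # ...with some large-block reads blended in
                os.pread(fd, large, random.randrange(size - large))
    finally:
        os.close(fd)
```

Real benchmarking would use a tool like fio with several concurrent jobs, but the shape of the load is the point: small random reads and writes interleaved with large transfers, which is hard on storage tuned for only one of those patterns.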
So let me turn it over to Paul McLeod, who's going to talk about the storage solution.
Paul McLeod: Thanks, Wendell. Yeah, so in terms of this data on the screen here, this is actually one subset. It's the one that most people don't think about when they're thinking about big data. They're usually thinking, oh, this is all large files, we're going to move these sequentially. But it really becomes this blender effect.
And it's one of the things that our partner WEKA really got into early. In terms of a partner, they're a software-defined storage partner. But the whole goal of a storage solution for this kind of environment is something that's going to work for all those different stages of the workflow, and all the different files and file sizes that are concurrently happening in that environment. And WEKA did a great job in terms of that. Because one of the things that we ran into five or six years ago, when NVMe drives came out, was that for the first time in my career (and I've been in storage for over 25 years), storage was faster than the processor, right?
Even one NVMe drive was outpacing the processor at moving data. You'd run out of processor before you ran out of the capabilities of these flash devices. So that kind of data set and that kind of performance need a very special architecture if you want to scale. I can work with one NVMe and get great performance. If I work with 1,000 NVMes, I'm going to run into some problems with metadata. I'm going to run into problems with things you don't normally think about when you say, "Hey, just give me a faster thing. Give me a faster pipe. Give me a faster device."
Well, that's one of the things that I think was well thought out in the WEKA architecture. The other piece that was well thought out was the integration with an S3 object store. Whether that's flash or hard drive, having an object store makes that data transportable: movable from a file-based application to the cloud, or to anywhere in your environment where you don't have the same kind of pressure to deliver the I/O that you would in, say, a block device.
And then the other part of this, which is key for GPU workloads, especially with NVIDIA, is GPUDirect Storage. GPUDirect Storage, for those of you who don't know, basically gives the application, if your storage supports it, essentially an RDMA relationship directly with the GPU memory.
So you're circumventing the CPU memory and working directly with the GPU, which removes one of those latency steps, because every one of these parts of the process is going to add latency, and when you try to scale, that's when you run into problems. So Supermicro is well positioned, in terms of portfolio, for that architecture, going from deep 3.5-inch storage to high-performance flash. And because our portfolio is so vast, we have systems that are multi-node, and blades. So in terms of being able to create a footprint of storage using the bite sizes that best fit your environment, we can deliver those at rack level and essentially tune the environment for our customers. Because not every customer is going to be buying a SuperPOD, right?
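As a rough illustration of what the GPUDirect Storage path looks like from application code, here is a minimal sketch using the open-source kvikio bindings for NVIDIA's cuFile API. This is a generic example, not Supermicro or WEKA code; the file path is hypothetical, and it assumes a GDS-capable filesystem and GPU:

```python
import cupy as cp   # GPU arrays
import kvikio       # Python bindings for NVIDIA's cuFile (GPUDirect Storage)

# Destination buffer allocated in GPU memory. With GDS, the DMA goes
# NVMe -> GPU directly, skipping the CPU bounce buffer.
buf = cp.empty(1 << 20, dtype=cp.uint8)  # 1 MiB on the GPU

# Hypothetical path on a GDS-enabled filesystem.
with kvikio.CuFile("/mnt/fs/train/shard-0000.bin", "r") as f:
    nbytes = f.read(buf)  # kvikio falls back to a POSIX path if GDS is unavailable

print(f"read {nbytes} bytes straight into GPU memory")
```

The latency point in the talk is visible in the data path: the read lands in GPU memory without being staged through CPU memory first.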
There are customers doing rack-level AI integration, in which case something like a multi-node system may be a better fit than something like our Petascale line. So on our flash side, we begin with multi-node and we end with Petascale. And on the hard drive side, we have an enormous portfolio of 3.5-inch storage servers going up to 90 bays in a 4U enclosure.
And so, depending on whether or not you have already implemented 3.5-inch storage… I think this is one of the questions that came up in one of the Solidigm [presentations]. It's like, yeah, it would be great to have that greenfield and deploy all flash; it would be great to put everything in memory, really. But there are budgetary constraints, and there's whether or not it is a greenfield project. A lot of times our customers are bringing in a data set that already exists on 3.5-inch storage. So their ability to move onto this greenfield environment may be difficult, and that will take time.
We have others that have greenfield projects and deep pockets, and they will deploy all flash. And again, one of the things that's nice about Solidigm having those tiered flash devices is that I can put lower-cost flash here, and more performant flash closer to the GPU application.
Audience member Ray Lucchesi: So, Paul, these would be nodes on a WEKA cluster? Is that how I read this? Is that correct?
McLeod: So, yes. In terms of the flash storage, if I go back to the previous slide, you'll see I basically have multi-node all the way to 3.5-inch. The 3.5-inch typically lives in that S3 stack, right? You wouldn't really want a GPU trying to pull data in a random manner from a 3.5-inch drive. But in the case of WEKA, those are all file accessible. There are other platforms as well where you basically have a file handle into this S3 storage device or cloud, so you can pull the data out as a file. That will hydrate the flash, right? The first operation is basically coming off the S3 and hydrating the flash, and then my GPUs can work on that at flash speeds.
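The hydration step Paul describes is, at its simplest, a staged copy from the object tier to the local flash tier. Here is a minimal sketch using boto3; the bucket, prefix, and NVMe mount path are hypothetical:

```python
import boto3
from pathlib import Path

def hydrate(bucket: str, prefix: str, flash_dir: str) -> list[Path]:
    """Pull a dataset out of the S3 data lake onto local NVMe flash,
    so the GPUs can then read it at flash speed."""
    s3 = boto3.client("s3")
    dest = Path(flash_dir)
    dest.mkdir(parents=True, exist_ok=True)
    local_paths = []
    # Walk every object under the prefix and stage it onto flash.
    for page in s3.get_paginator("list_objects_v2").paginate(
            Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            target = dest / Path(obj["Key"]).name
            s3.download_file(bucket, obj["Key"], str(target))
            local_paths.append(target)
    return local_paths

# Hypothetical usage:
# hydrate("data-lake", "training/shards/", "/mnt/nvme/cache")
```

In a deployment like the one described, a filesystem such as WEKA can do this tiering transparently behind a single namespace; the sketch just makes the data flow explicit: object tier first, flash tier for the GPUs.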
So, very important to the pipeline again is having the portfolio to hit all these different elements. The other key piece that we were talking about in the Solidigm session is the IoT edge. So Supermicro: if you haven't gone to our site, go there. You'll get lost in how many servers we have. This is just a minute fraction of the number of servers and markets that we address.
In this slide, I'm showing, at the far edge, essentially a fanless industrial computer that would go on a factory floor, right? This is just a box that you may have walked by, maybe even in one of these buildings.
It looks like a heat sink. Then it goes all the way to the extra large, which would be something used in a telco environment, in one of those centers. And each of these will have some element of storage. GPUs are probably hitting more in the medium to large range, [since it's] very rare to have a very small [server] with GPUs, but now I'm going to give you the exceptions.
We do have very small edge and far-edge servers that have GPUs. Cases for that would be things like restaurant locations, ordering systems, and so on. That data interacts with this AI, because the companies that have all these locations are pooling that information and bringing it somewhere to actually analyze and figure out how they can improve their business.
And in the center we have that pole-mounted architecture. This is the example of having a server with flash and a GPU that could be out in the weather. So really, we're looking at all the different fields where AI touches the hardware, and creating hardware for those applications.
Back to the main event, in terms of how we're doing the main data center storage for AI. Supermicro has a huge portfolio of GPU servers; that's another reason why we have this big uptick in AI and interest in AI: we have every form factor you could possibly want for deploying GPUs from all the manufacturers that make GPUs. And then, for NVIDIA environments and GPUDirect Storage, we have our partner WEKA working with Solidigm on these high-capacity all-flash systems, and [we're] able to tune that to the customer's location.
And then finally, in the data lake environment, typically this will be a 3.5-inch storage deployment, and that could be in the cloud. But we also work with all the different partners providing S3 storage and high-capacity S3 storage. So, typically, what happens is our customers will already have deployed a Scality cluster or an ActiveScale cluster that they're using for object storage, where they're storing assets that are key to their business, and then they're adding these AI elements to their environment.
Here's a deeper dive into that same architecture. In terms of how we deliver our products, it's fully integrated. Our preference is to ship you a rack with everything in it: software, all the plumbing. We basically roll it in, plug it in, and then turn the keys over to you, and you get to roll with your discrete application. But we are open to any sort of partnership there. One of the [ways] that Supermicro, I think, differentiates itself is that we very much listen to our customers' requests.
So in this example, we're using a 400G network to talk to the flash. That flash pairs into 3.5-inch storage using 25G or 100G networking, and then Supermicro delivers that at rack level with our own switches. So essentially the customer can get control of almost everything they'd ever want control of. More and more we're seeing engagements with OCP and OpenBMC and all these things, which we're open to because, again, we are trying to make sure our customers get what they need to do their work.
So let's take a little deeper dive into our flash Petascale architecture. You can think of our Petascale architecture as the tip of the spear. This architecture is designed with the latest flash innovations. You may have heard of EDSFF, or you may not have. EDSFF is a new form factor for flash devices. Solidigm was a leader in that space, Intel was a leader in that space, and we have been a leader since the start. So basically, for the past five or six years, we have been producing servers that have this technology.
Now that the PCIe bus is moving faster and faster, EDSFF is becoming more and more relevant, because it was ahead of the curve. The U.2 drives that are available today are sort of running out of gas: the connectors, how they behave in a thermal environment. Because again, flash, as most people deploy it today, is living inside a box that was designed for a rotating drive, so it's not optimal.
So with the Petascale architecture, we're looking at how the PCIe buses move through the processor, because you're going to be putting a processor in the middle of your storage and your networking, and that networking may be more advanced networking like DPUs, with acceleration and security protocols built into it.
So really, from a plumbing perspective, we wanted to have a balance of the PCIe lanes. If we look in that far corner there, you can see that we're as balanced as you can be within a processor environment. One of the benefits we also get with this architecture is CXL; that next-generation memory technology is also being delivered in an EDSFF enclosure.
So this is essentially the future that's here today. And all our large-scale customers are looking at this, because again, this is up to 32 NVMes in a 2U enclosure, so very much cutting-edge stuff.
And this is a closer look at that 32-drive enclosure. This is an AMD system, and with AMD, it's unique: we can bifurcate by two, so we can actually offer a server that has 32 NVMes if you're more interested in the higher capacity side. Or we can do a 1U server with 16 drives in it, in which case those are going to be equivalent in terms of performance. So I can get double the performance if I basically take that same 2U and put two of those 16-drive units in there.
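To see why the 16-drive 1U and the 32-drive 2U come out equivalent, and why two 1U boxes double the throughput, a rough PCIe lane budget helps. This is a back-of-the-envelope sketch assuming "bifurcate by two" means splitting each x4 NVMe link into two x2 links, which is our reading rather than a confirmed spec:

```python
# Back-of-the-envelope PCIe lane budget (assumed x4 -> two x2 bifurcation).
X2, X4 = 2, 4                 # lanes per drive link

lanes_2u = 32 * X2            # 32 drives at x2 each -> 64 lanes
lanes_1u = 16 * X4            # 16 drives at x4 each -> 64 lanes
assert lanes_2u == lanes_1u   # same lane budget, hence equivalent performance

lanes_two_1u = 2 * lanes_1u   # two 1U servers (two CPUs, two NICs) -> 128 lanes
print(lanes_two_1u / lanes_2u)  # 2.0: double the aggregate bandwidth
```

Under that assumption, the choice is capacity density per chassis versus aggregate bandwidth per rack unit, which is exactly the trade-off described next.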
So these are the kinds of decisions that our customers are usually having to make, and we can guide them: how do you want to do it? And part of our partnership with software-defined storage vendors like WEKA is to test and tune these systems and select best-of-breed components.
Basically, [we] build these architectures to make it easy for our customers to get to the right storage solution for their environment. And here's a 1U; this is actually the E1 NVMe EDSFF form factor. So again, the EDSFF form factor is a game changer.
You're going to see a lot more things happening in that space. And Solidigm, as I said, has been right in there.
[1] Audience member Ben Young is erroneously named as Donnie Berkholz in two instances within the video. Ben's name has been correctly credited within the transcript.