We are generating and consuming data like never before. Across the world, and in every industry, data never sleeps. Experts project that by 2025, we will generate, consume, and analyze 181 zettabytes of data per year [1]. With a compound annual growth rate (CAGR) of 23%, what we are observing is a massive surge in data generation and consumption. The question becomes: where are we going to store all that data?
When we talk about data storage locations, there are three main subcategories to consider. At the endpoint, data lives on the device itself, such as a laptop, mobile device, wearable, vehicle, or local server. At the edge, data from these endpoints flows to enterprise-hardened locations like regional offices and small data centers, or on to the core (aka the cloud).
While we have been producing more data year over year, usage patterns have changed in the last few years. Instead of holding data in temporary caches and then overwriting it, we are now retaining new data at unprecedented rates. Most of that data will be stored in the cloud or at the edge [2].
The majority of this data is going to the cloud. Gartner estimates more than 95% of new digital initiatives will be cloud-native by 2025 [3]. Data centers housing this content will need to be cooled; in fact, as much as 40% of data center energy cost can be attributed to cooling. And with advanced AI models growing 10,000x in size every few years [4], we need to keep data center drives performant while making them gentler on power consumption. Multi-story data centers make those cooling and power challenges even more pronounced.
At the same time, the data storage phenomenon referred to as "edgeification" is now real. The Internet of Things (IoT) is impacting data creation, which in turn impacts data storage and retrieval. With about 14.5 billion connected IoT devices expected by the end of 2023 and local edge SSD capacity growing at a 50% CAGR [5, 6], the edge presents unique locality challenges ranging from weight and form factor to ruggedness.
Now more than ever, we need SSDs with higher density that provide the right endurance with an eye on sustainability for the real world. One of the key challenges of sustainability is footprint reduction. By increasing unit-level density and providing the right endurance, Solidigm storage solutions go a long way toward improving those sustainability metrics: lower drive disposal costs, rack-level consolidation, and total power reduction.
Now, let's look at three emerging use cases spanning core and edge to see how the density and endurance requirements are shaping up in real-world applications.
For advanced driver-assistance systems (ADAS), commonly referred to as autonomous driving, there is a massive amount of data logging and detection work that needs to take place. SSDs are the right choice here for their superior shock and vibration specifications and capabilities; bumps and road imperfections demand that for this edge storage application. These systems can have a fill rate requirement as high as 19TB/hour. While it is not a 24/7 duty cycle, a 17,600-minute-per-year driving profile will generate roughly 5PB of data per year [7].
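As a sanity check on those figures, here is a rough back-of-envelope calculation, treating the quoted 19TB/hour peak fill rate as the average logging rate (an assumption for illustration):

```python
# Back-of-envelope check of the ADAS data-logging figures cited above.
fill_rate_tb_per_hour = 19         # peak logging rate from the text
driving_minutes_per_year = 17_600  # annual driving profile from the text

driving_hours = driving_minutes_per_year / 60            # ~293 hours of driving
data_tb_per_year = fill_rate_tb_per_hour * driving_hours # ~5,573 TB logged
print(f"{data_tb_per_year:,.0f} TB/year (~{data_tb_per_year / 1000:.1f} PB)")
# -> 5,573 TB/year (~5.6 PB), in line with the ~5PB/year cited
```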
Multiple SSDs in the Solidigm portfolio can meet the endurance requirements of smart driving applications. More importantly, QLC product lines can provide both the performance and endurance required for emerging storage use cases such as smart agriculture and precision agriculture, which improve crop yields while optimizing resources such as labor, water, and fertilizer. These processes drive a lot of data need at the edge and are primarily read-dominated, feeding decision making [8].
Robotic systems, drones, and satellite image analysis create real-time data. For example, in the case of the See and Spray feature from John Deere, 20 images per second per crop are taken while the vehicle is moving. These images are then compared against 1 million stored images to determine the area that needs to be sprayed. This is assisted by an onboard camera capturing images continuously, accounting for 6TB/day of storage needed per sprayer [9].
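For a sense of scale, the sketch below derives the implied average image size from the figures above, assuming a hypothetical 10-hour operating day (the operating time is not stated in the source):

```python
# Illustrative only: implied average image size for the See and Spray figures above.
images_per_second = 20
operating_hours_per_day = 10   # assumption for illustration, not from the source
storage_tb_per_day = 6

images_per_day = images_per_second * operating_hours_per_day * 3600
avg_image_mb = storage_tb_per_day * 1e6 / images_per_day   # 1 TB = 1e6 MB (decimal)
print(f"{images_per_day:,} images/day, ~{avg_image_mb:.1f} MB per image")
# -> 720,000 images/day, ~8.3 MB per image under these assumptions
```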
At core data centers, object storage is the data storage solution for unstructured data. Imagine a user needing to expand to cover a 5PB data pool. Dell EMC's F600 or F900 systems are excellent installations for that. The Dell F900 can house almost a petabyte of storage. According to Dell's own field trace data analysis, they could extract 14 years of drive life from some of the deployed D5-P5316 drives [10].
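As a rough sizing illustration, the sketch below estimates how many roughly petabyte-class chassis a 5PB pool might take; the usable-capacity and protection-overhead figures are assumptions, not Dell or Solidigm guidance:

```python
# Rough sizing sketch for the 5PB object-store example above.
import math

target_pool_pb = 5.0
raw_pb_per_chassis = 1.0    # "almost a petabyte" per F900, per the text
protection_overhead = 0.20  # assumed erasure-coding/protection overhead

usable_pb_per_chassis = raw_pb_per_chassis * (1 - protection_overhead)
chassis_needed = math.ceil(target_pool_pb / usable_pb_per_chassis)
print(f"~{chassis_needed} chassis for a {target_pool_pb:.0f}PB usable pool")
# -> ~7 chassis under these assumptions
```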
From some of these real-life use cases, we can also draw on drive-level analysis and fleet data.
There is an increasing need for dedicated swim lanes for storage use cases. Which SSD you choose for your application will depend on your target drive writes per day (DWPD) as well as your workload: write-heavy, read-heavy, or some mixture of both. One size does not fit all, and the best results will come from weighing multiple factors as you plan your storage solution build. Figure 1 can help you get started choosing the right storage solution for the application you want to run.
At Solidigm, we provide a range of endurance and performance levels dedicated to a wide variety of applications. As you can see from Figure 1, the Solidigm D7-P5810 (SLC NAND based) provides the highest endurance, measured in terms of DWPD. At the same time, the QLC-based SSDs with 0.5+ DWPD, by virtue of their massive capacity (up to 61.44TB), deliver the highest petabytes-written capability of the Solidigm family.
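To see why capacity matters as much as DWPD, here is a minimal sketch of how a DWPD rating translates into total petabytes written over a warranty period; the 5-year warranty term and the 800GB/50 DWPD comparison drive are illustrative assumptions, not published specifications:

```python
# How a DWPD rating relates to total host petabytes written (PBW) over a warranty period.
def petabytes_written(dwpd: float, capacity_tb: float, warranty_years: float = 5) -> float:
    """Total host writes (PB) implied by a DWPD rating over the warranty period."""
    return dwpd * capacity_tb * 365 * warranty_years / 1000

# A 61.44TB QLC drive at 0.5 DWPD vs. a hypothetical 800GB drive at 50 DWPD
print(petabytes_written(0.5, 61.44))  # ~56 PB written
print(petabytes_written(50, 0.8))     # ~73 PB written
```

The takeaway is that a low-DWPD, high-capacity QLC drive can absorb a total write volume in the same ballpark as a much smaller, high-DWPD drive.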
The following table shows how Solidigm's drives can be utilized for a range of data center workloads based on their relative write endurance capability. As we embark on use cases from the core to the edge, understanding the right mix of density, endurance, and performance requirements will be key to making sure you have the right storage solution for your use case.
Some of the key use cases for SSDs have been evolving as solid-state storage technology has propelled in multiple directions and use cases have expanded.
First-generation SSDs were expected to meet very high write endurance requirements, in some cases 10 to 20 DWPD. The write amplification factor (the number of times data is written to NAND compared to what the host requests) has gone down since then thanks to better SSD firmware architectures, and SLC, TLC, and QLC NAND endurance, in terms of program/erase cycles, has somewhat standardized.
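Here is a minimal sketch of how write amplification factor (WAF) eats into the host-visible write budget; the capacity and P/E-cycle numbers are placeholders, not specifications of any particular drive:

```python
# Write amplification factor (WAF) = NAND writes / host writes.
def effective_host_writes_tb(capacity_tb: float, pe_cycles: int, waf: float) -> float:
    """Host data (TB) a drive can absorb before exhausting its NAND P/E budget."""
    return capacity_tb * pe_cycles / waf

# Lowering WAF from 4 to 2 doubles the host writes the same NAND budget can serve.
print(effective_host_writes_tb(capacity_tb=7.68, pe_cycles=3000, waf=4.0))  # ~5,760 TB
print(effective_host_writes_tb(capacity_tb=7.68, pe_cycles=3000, waf=2.0))  # ~11,520 TB
```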
JESD219-type standards have provided much-needed clarity on the type of workload used for endurance measurement. The upper section of Figure 1 is an example of how the workload landscape can be mapped to the unique endurance offering defined by each of the product swim lanes.
With generational changes happening in solid-state storage and use cases evolving, areas beyond bandwidth and IOPS are gaining more attention. Consistency of IOPS after transitional workloads, latency response following TRIM operations, and low-to-medium queue depth (QD) performance are all of interest.
With drive sizes getting larger, newer techniques are being implemented at the NAND and drive firmware level to support multi-tenancy applications, where I/Os from one tenant do not cause latency impact on another. In the future, granular controls with Flexible Data Placement (FDP) mode will help the host place data without incurring the endurance and performance loss that can be caused by the drive firmware's internal garbage collection.
Gone are the days of the one-size-fits-all story for data center SSDs with an HDD-inherited 2.5" form factor. EDSFF has provided different form factors with better signal integrity and hot-plug robustness at the connectors. Implementation of a common connector design has enabled multiple long, short, and tall form factors for deployment in cloud and data center platforms.
First-generation SSDs deployed in data centers had two key features as must-haves: 1) power-loss data protection, and 2) the ability to shield data end-to-end with ECC as it traverses from temporary buffer to the NAND media.
Advanced features like out-of-band management, telemetry, and the ability to track latency and drive health on the fly are must-haves in modern-day SSDs. With the advent of computational storage and AI, we expect to see these technologies used for failure prediction of the drive itself in future deployments. One key area of future development is the sustainable use of SSDs for a longer life: reuse, repurpose, and re-provision for that extra mile of usage.
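As one example of what on-the-fly health tracking can look like from the host side, the sketch below polls the NVMe SMART/health log with the open-source nvme-cli tool; JSON field names can vary across nvme-cli versions, so treat the keys shown as examples rather than a fixed schema:

```python
# Minimal sketch: poll NVMe SMART/health data for fleet telemetry using nvme-cli.
import json
import subprocess

def smart_log(device: str = "/dev/nvme0") -> dict:
    """Return the NVMe SMART/health log as a dict via `nvme smart-log -o json`."""
    out = subprocess.run(["nvme", "smart-log", device, "-o", "json"],
                         capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

log = smart_log()
# Wear and media-error indicators commonly tracked for drive-health monitoring;
# key names are examples and may differ by nvme-cli version.
print({k: v for k, v in log.items()
       if k in ("percent_used", "media_errors", "data_units_written", "temperature")})
```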
With data growth and the maturity of SSDs, usage models are no longer tied to traditional enterprise applications. Cloud Service Providers (CSPs) revolutionized the deployment of SSDs at massive scale. CSPs also helped shape the industry by embracing new features, encouraging new form factors, and taking advantage of a portfolio differentiated by NAND media and endurance tier. Emerging use cases at the edge, along with the need for storage to propel the AI revolution, will impose further requirements on next-generation SSDs.
Gone are the days of fitting one density and one form factor to the needs of the overall market. Innovation for the future will be led and defined by the workload needs of the storage use case landscape. As a technology, Solidigm SSDs are ready and nimble enough to take those on.
[1] https://explodingtopics.com/blog/data-generated-per-day
[3] https://www.datacenterdynamics.com/en/opinions/the-five-big-trends-powering-tomorrows-data-center/
[4] https://pages.dataiku.com/report-idc-2023
[5] https://iot-analytics.com/number-connected-iot-devices/
[6] https://www.idc.com/getdoc.jsp?containerId=US50673423
[7] https://www.solidigm.com/products/technology/inonet-used-solidigm-qlc-drives-for-duration-cost-accuracy-of-test-drive-results.html and https://www.visualcapitalist.com/network-overload/
[9] https://www.deere.com/en/sprayers/see-spray-ultimate/
[10] https://www.storagereview.com/review/dell-powerscale-benefitting-from-qlc-ssd-economics-and-performance
[11] https://www.usenix.org/conference/fast22/presentation/maneas
[12] https://www.usenix.org/system/files/fast22-maneas.pdf
Tahmid Rahman is the Data Center Director of Product Marketing at Solidigm. His primary responsibilities include product positioning, benchmarking, and customer requirement integration for current and future products. He has a bachelor's degree from Bangladesh University of Engineering and Technology, a master's degree in Electrical Engineering from Texas A&M, and an MBA from the University of California, Davis. He loves outdoor activities, including sightseeing and hiking.