The future of storage and how data volumes are driving change [Q&A]
Storage capacity has increased rapidly in recent years, but the way the technology is used is largely unchanged. We still load data from storage into memory, process it, and write any changes back.
But as storage reaches petabyte scale, that model will become more difficult to maintain. The future of storage will require layers of abstraction and heterogeneous computing, allowing scalability while reducing unnecessary complexity.
To learn more about what storage will look like in the future, we spoke to Tong Zhang, co-founder and chief scientist of ScaleFlux.
BN: There seems to be a gap between storage capacity, which keeps increasing, and data processing capability, which does not. What does this mean for the future of storage?
TZ: The ever-growing gap between data volume and CPU processing power is exactly why computational storage has gained so much attention in recent years. The slowdown of Moore’s Law is forcing the IT industry to shift from traditional homogeneous, CPU-based computing to heterogeneous, domain-specific computing. This inevitable paradigm shift offers an unprecedented opportunity to rethink and innovate the design of future data storage devices and systems (especially solid-state data storage).
With the transition to heterogeneous, domain-specific computing, the entire computing software ecosystem will become increasingly ready to adopt computational storage, which tightly integrates domain-specific compute capability into data storage hardware. By moving computation closer to where the data resides, instead of moving the data to the CPU (or GPU), computational storage could bring significant performance and power benefits to the overall computing system. The future of storage therefore lies in integrating compute capabilities into storage devices.
BN: Storage architecture has remained largely unchanged since the days of tape and floppy disks. How does that affect future storage trends, and do you see an evolution starting to happen?
TZ: Storage architecture has remained largely unchanged over the decades mainly because the function of data storage hardware has remained unchanged (i.e., storing data and serving I/O requests). By fundamentally expanding the function of data storage hardware, computational storage is sure to open a new chapter in the data storage industry, with many new and exciting opportunities to come.
BN: What are Data Processing Units (DPUs) and how do they impact storage device management performance?
TZ: The term DPU derives mainly from network processors. To avoid overloading host processors with network processing as network traffic keeps growing, data centers have widely deployed SmartNICs (smart network interface cards), which use dedicated network processors to offload heavy network processing operations (e.g., packet encapsulation, encryption of data in transit, and most recently NVMe-oF) from host CPUs. To further enhance their value proposition, network processor chip vendors (e.g., Nvidia/Mellanox, Marvell, Broadcom) have recently started to expand beyond the network realm into storage (and even general-purpose computing). Network processor chips are being complemented with additional special-purpose hardware engines (e.g., compression, security), more embedded processors (e.g., ARM or RISC-V cores), and stronger PCIe connectivity (e.g., PCIe switches with multiple ports).
Thus, the term DPU was coined to distinguish these chips from traditional network processors. With many embedded processors, DPUs could offload storage-related functions (for example, storage virtualization, RAID, and erasure coding) from host CPUs, leading to a lower total cost of ownership for the system. Of course, deploying DPUs in the IT infrastructure comes with high development and integration costs and requires significant modifications to the existing software stack. Therefore, it will take at least two to three years before the true value and potential of DPUs is better understood.
BN: How are AI and machine learning affecting modern storage? And in the future?
TZ: AI/ML will be one of the most important (if not the most important) drivers for data storage, from both a demand and an innovation perspective:
- Increased demand for data storage capacity: Ever-increasing amounts of data are generated every day, and AI/ML provides the means to make effective use of that data. As a result, people have a greater incentive to store data, at least temporarily, which directly increases demand for data storage capacity.
- Higher requirements for storage system innovation: AI/ML training platforms mainly contain three components: training kernel computation, data preprocessing, and data storage. Most past and current R&D activity focuses on improving the efficiency of the first component (i.e., kernel computation), and hence the efficiency of AI/ML training kernel computation has improved dramatically over the years. This, however, leaves the entire AI/ML training system increasingly bottlenecked by the performance and efficiency of the other two components (i.e., data preprocessing and storage). For example, as recently reported by Facebook at Hot Chips ’21, data preprocessing and storage account for over 50% of total AI/ML training energy consumption. This calls for rethinking the design and implementation of data preprocessing and storage in AI/ML training platforms, for which computational storage could be a very interesting solution.
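The bottleneck shift described above can be illustrated with simple arithmetic. The following is a minimal Python sketch, not a measurement: the baseline cost split (80/10/10) and the helper name `stage_shares` are hypothetical, chosen only to show how the share of preprocessing and storage grows when only the kernel computation gets faster.

```python
def stage_shares(kernel, preprocess, storage, kernel_speedup):
    """Return each stage's share of total cost after speeding up only the kernel.

    Inputs are arbitrary cost units (time or energy); all values are hypothetical.
    """
    kernel_after = kernel / kernel_speedup
    total = kernel_after + preprocess + storage
    return {
        "kernel": kernel_after / total,
        "preprocess": preprocess / total,
        "storage": storage / total,
    }


# Hypothetical baseline: kernel computation dominates at 80% of the cost.
for speedup in (1, 4, 16):
    shares = stage_shares(80, 10, 10, speedup)
    other = shares["preprocess"] + shares["storage"]
    print(f"kernel speedup {speedup:>2}x: preprocessing + storage share = {other:.0%}")
```

Under these assumed numbers, the two untouched components go from a minority of the cost to the majority without getting any slower themselves.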
BN: Would it be practically and commercially feasible to move computing from host processors to storage hardware? How can this idea evolve from academic research papers to the mainstream market?
TZ: Indeed, compared to other heterogeneous computing solutions (e.g., GPU, TPU, SmartNIC/DPU, video codec), moving computation into storage devices through the I/O stack suffers from a much higher abstraction-breaking cost. The ongoing standardization efforts of the SNIA and NVMe communities will undoubtedly play a crucial role in reducing this abstraction-breaking cost. Nevertheless, history teaches us that it will be a long time (more than five years) before the whole ecosystem can readily adopt the enhancements being developed by SNIA and NVMe. Additionally, modifying various applications to take advantage of the underlying computational storage hardware is not trivial and requires significant investment.
Therefore, to start the journey toward commercialization, we need to let go of the mindset of “explicitly offloading computation through the I/O stack” and instead focus on native in-storage computation that is transparent to the other abstraction layers. By eliminating the abstraction-breaking cost, transparent in-storage computation makes it much easier to establish a commercially justifiable cost/benefit trade-off.
Meanwhile, to further enhance the benefits, transparent in-storage computation should have two properties: wide applicability and low efficiency when implemented on CPUs or GPUs. General-purpose lossless data compression is a good candidate here. Besides its almost universal applicability, lossless data compression (e.g., the well-known LZ77 and its variants such as LZ4, Snappy, and ZSTD) is dominated by random data access, which causes very high CPU/GPU cache miss rates, leading to very low CPU/GPU hardware utilization efficiency and therefore low speed. Native in-storage compression could therefore transparently exploit runtime data compressibility to reduce the cost of storage without consuming host CPU/GPU cycles and without incurring any abstraction-breaking cost.
The benefit of native in-storage compression goes far beyond “transparent storage cost reduction”. The design of any data management system (e.g., relational database, key-value store, or file system) is subject to trade-offs between read/write performance, implementation complexity, and storage space usage. In-storage compression essentially decouples the logical storage space usage visible to the host from the physical storage space usage, allowing data management systems to deliberately trade logical storage space for higher read/write performance and/or lower implementation complexity, without sacrificing the true cost of physical storage. This opens a new spectrum of design space for innovative data management systems without requiring any modification of existing abstractions.
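The decoupling of logical from physical space can be made concrete with a small experiment. This is a minimal Python sketch under stated assumptions, not how any particular drive works: `zlib` stands in for the drive's built-in compression, and the 4 KiB slot size, the record sizes, and the helper names `packed_layout`, `padded_layout`, and `physical_size` are all hypothetical. It compares a tightly packed layout with a simpler layout that pads every record to a fixed-size slot.

```python
import random
import zlib

SLOT = 4096  # hypothetical fixed slot size in bytes


def packed_layout(records):
    """Space-efficient layout: records stored back to back."""
    return b"".join(records)


def padded_layout(records, slot=SLOT):
    """Simpler layout: each record zero-padded to its own fixed-size slot."""
    return b"".join(r.ljust(slot, b"\x00") for r in records)


def physical_size(logical_bytes):
    """Bytes left after compression (zlib is a stand-in for the drive)."""
    return len(zlib.compress(logical_bytes, 6))


# 64 hypothetical records of 1000 incompressible bytes each (fixed seed).
rng = random.Random(0)
records = [bytes(rng.getrandbits(8) for _ in range(1000)) for _ in range(64)]

packed = packed_layout(records)
padded = padded_layout(records)

print("logical  packed/padded:", len(packed), len(padded))
print("physical packed/padded:", physical_size(packed), physical_size(padded))
```

The padded layout uses roughly four times the logical space, yet after compression its physical footprint stays close to that of the packed layout, because the zero padding compresses to almost nothing. That is the sense in which a data management system could spend logical space on a simpler or faster layout without paying for it in physical storage.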
The most plausible starting point for computational storage is a computational storage drive with built-in transparent compression. We are confident that transparent compression will carry computational storage from a concept to the mainstream market. Of course, the full potential goes far beyond transparent compression. As the ecosystem becomes more ready to adopt computational storage drives with more diverse and programmable compute functions, we will see a great wave of innovation across the IT infrastructure.