In June 2017, the NVM Express group released version 1.3 of the NVMe specification, chock full of goodies and new features. Some of the features are pretty straightforward (e.g., Boot Partitions and Error Log Updates), but some are just vague enough to warrant further explanation (e.g., Virtualization, Streams, and Directives).
My ultimate goal is to turn this into a series of articles that takes many of the new features and describes them in Plain English™, so that you can get a better understanding of what each of these features is supposed to do and why it has been included.
Before we begin, however, I’d like to point you to the excellent NVM Express BrightTalk webinar on “What’s New in NVMe 1.3” by Jonmichael Hands of Intel. Also, if you have access to the materials for the 2017 Flash Memory Summit, Peter Onufryk has a quick-and-dirty summary that he presented during the NVMe track. In fact, many of these articles pull some of their source material from Jonmichael and Peter and would not have been possible without their fantastic work. Of course, the NVMe 1.3 spec can be found here.
In order to make this a bit more consumable, I’m going to break the articles down into individual posts, starting with some of the more “heavyweight” feature additions that may affect upcoming deployment options. This article, then, will act as a “landing page” and the links will be updated as articles get posted.
Let’s take a high-level view of what each of these new features is.
New Features in 1.3
The new features in rev 1.3 fall into two broad categories:
- Data Center and Enterprise
- Client/Mobile Devices
Before we take a look at the details of each feature in turn, let’s look at how they are grouped together (massive hat tip to Peter Onufryk for allowing me to use this chart):
We also need to clarify a couple of terms before we get started. This will become apparent as we go through the new features. I’m trying to keep things accurate but approachable at the same time, so try not to beat me up too badly if the fuzzy edges of my definitions and descriptions aren’t as crisp as they are in the standard. You can get the non-fuzzy explanation in the spec itself and in the document of changes for 1.3.
Host: Normally in storage architectures, we think of a “host” as a server, or some computing function (either physical or virtual). In the NVMe spec, though, a host means the software that communicates to a non-volatile memory subsystem (a controller – see below – and its corresponding Non-Volatile Memory – NVM – media).
Controller: In the NVMe spec, a controller is a PCI Express function that implements NVMe. This is important because storage people often think of a controller as a specific entity that manages storage media, provides feature functionality, and the like. Here, however, the term is highly specialized: a controller simply means a PCIe function that speaks NVMe.
So, let’s take a quick look and see what each of these things mean at a high level. We’ll be saving the deeper explanations for their own articles, of course.
Data Center & Enterprise
Directives
You can think of Directives as a category of functions and features that shapes the way hosts and controllers communicate information with each other. That’s a fancy way of saying that if data needs to be handled in a specific way, the host and the controller can agree not just that it needs to be done, but also how.
Streams
The first major Directive is Streams, which can be used to optimize data placement on non-volatile media. The goal is to increase endurance for NAND-based SSDs. For example, suppose an application needs to indicate to the device that certain logical blocks are part of one connected, contiguous stream of data. The storage device can then use this information to make intelligent and efficient decisions about how to allocate those blocks. It could, say, reduce the Write Amplification Factor by enabling the device to free up all of the media associated with that stream in one fell swoop.
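To see why stream separation helps write amplification, here’s a tiny illustrative model (not real NVMe I/O — the erase-block geometry and workload are entirely hypothetical). Garbage-collecting an erase block requires copying out any still-live pages first; if short-lived “hot” data is mixed in with long-lived “cold” data, those copies are pure overhead, whereas a stream-separated hot block is fully invalid by GC time and can simply be erased.

```python
# Illustrative model only (no real NVMe I/O): how many pages must be
# relocated when garbage-collecting an erase block, with and without
# stream-aware placement. Geometry and workload are hypothetical.

ERASE_BLOCK_PAGES = 64  # pages per erase block in this toy model

def gc_copy_cost(block):
    """Count live pages (True) that must be copied out before erasing."""
    return sum(block)

# Mixed placement: short-lived "hot" pages (already invalid -> False) are
# interleaved with long-lived "cold" pages (still live -> True).
mixed = [i % 2 == 0 for i in range(ERASE_BLOCK_PAGES)]

# Stream-separated placement: the hot stream filled its own erase block,
# so by GC time every page in it is invalid; cold data sits elsewhere.
hot_only = [False] * ERASE_BLOCK_PAGES

print(gc_copy_cost(mixed))     # 32 cold pages get rewritten (amplification)
print(gc_copy_cost(hot_only))  # 0 -- the block is erased in one fell swoop
```

The 32 relocated pages in the mixed case are exactly the extra internal writes that inflate the Write Amplification Factor; the stream-separated case erases the whole block with no copying at all.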
Virtualization Enhancements
In large Data Center environments, high density, oversubscription, and multi-tenancy are facts of life. On top of all that, virtual machines are inherently dynamic, and often mobile.
One of the virtualization enhancements in NVMe 1.3 is called Direct Assignment. Here, we enable each Virtual Machine to directly access the NVMe controller, which eliminates software overhead and allows guest OSes to use the standard NVMe driver.
NVMe already supports SR-IOV, but these new virtualization enhancements standardize physical functionality and allow flexible dynamic mapping of resources. In doing so, this additional abstraction capability allows future mechanisms beyond SR-IOV.
Emulated Controller Optimization
This makes software-defined (emulated) NVMe controllers, such as those a hypervisor presents to its guests, work better and with lower latency.
Client & Mobile
Boot Partitions
As you can imagine, because it’s part of the “Client & Mobile” category, this functionality is primarily for, well, client and mobile systems. What it does is set up a simple bootstrap mechanism (in an optional area of NVM storage) that lets a host read boot code without needing to set up submission and completion queues in memory, or even enable the controller.
The reason why this is cool is that it allows a system to boot to a pre-OS environment using the resident storage instead of needing to have a completely separate storage medium, such as SPI flash.
Host Controlled Thermal Management
From time to time, hosts need to manage how much heat a system generates. Host Controlled Thermal Management (HCTM) allows the host to set temperature thresholds at which the controller should throttle itself, trading some performance for lower heat output. The host configures two Thermal Management Temperatures: a lighter one (TMT1), at which the controller should start throttling with minimal performance impact, and a heavier one (TMT2), at which it should throttle as much as it needs to, performance notwithstanding.
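To make HCTM concrete, here’s a sketch of what the host actually sends: the feature is configured with a Set Features command (Feature Identifier 10h), and the 1.3 spec lays out Command Dword 11 as TMT1 in bits 31:16 and TMT2 in bits 15:00, both in degrees Kelvin. The helper function below is my own illustration of packing that dword, not any real driver’s API.

```python
# Sketch: packing Command Dword 11 for Host Controlled Thermal Management
# (Set Features, Feature Identifier 10h). Field layout assumed from the
# NVMe 1.3 spec: bits 31:16 = TMT1, bits 15:00 = TMT2, both in Kelvin.

def hctm_dword11(tmt1_kelvin: int, tmt2_kelvin: int) -> int:
    if not (0 <= tmt1_kelvin <= 0xFFFF and 0 <= tmt2_kelvin <= 0xFFFF):
        raise ValueError("temperatures are 16-bit Kelvin values")
    if tmt2_kelvin < tmt1_kelvin:
        raise ValueError("TMT2 (heavy throttling) must not be below TMT1")
    return (tmt1_kelvin << 16) | tmt2_kelvin

# e.g. begin light throttling at 70 C (343 K), heavy throttling at 80 C (353 K)
print(hex(hctm_dword11(343, 353)))  # 0x1570161
```

Note the units: the spec uses Kelvin here, so a host working in Celsius has to add 273 before packing the fields.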
Telemetry
Every manufacturer has its own methods of identifying what’s going on with their equipment, and they write that data to logs on their systems. It turns out that it’s pretty handy to be able to get those logs. Telemetry, then, enables manufacturers to collect internal data logs to improve the functionality and reliability of their products, using industry-standard commands and tools to retrieve them. Even though the data itself can be vendor-specific (that is, unique to each vendor), the Telemetry feature defines a standardized way to access and collect it.
Timestamp
Pretty much what it sounds like: this allows a host to set a timestamp in the controller.
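For the curious, the Timestamp feature (Feature Identifier 0Eh) carries an 8-byte data structure; per the 1.3 spec, bytes 0–5 hold a 48-bit millisecond counter (little-endian, typically milliseconds since the Unix epoch as chosen by the host), with an attributes byte and a reserved byte after it. The helper below is my own sketch of building that payload, not a real driver call.

```python
import time

# Sketch of the 8-byte Timestamp feature payload (Set Features, FID 0Eh),
# assuming the 1.3 layout: bytes 0-5 = milliseconds (little-endian 48-bit),
# byte 6 = attributes (mostly meaningful when reading the value back),
# byte 7 = reserved.

def timestamp_payload(ms: int) -> bytes:
    if not 0 <= ms < (1 << 48):
        raise ValueError("timestamp is a 48-bit millisecond counter")
    return ms.to_bytes(6, "little") + b"\x00\x00"  # attributes, reserved

buf = timestamp_payload(int(time.time() * 1000))
print(len(buf))  # 8
```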
Error Log Updates
NVMe now returns more information on where errors occur (whether in the queue, with a particular command, LBA, or namespace, and so on), along with an error count.
Device Self-Test
This allows a standardized way of requesting and running an internal check of an SSD’s health, ensuring devices are operating as expected (regardless of vendor).
Sanitize
As you might imagine, Sanitize commands are used when retiring SSDs, putting them out to pasture, having them shuffle off this mortal coil, etc. It’s also pretty useful when you want to repurpose a drive for a new use case.
This functionality is quite thorough: it alters the data so that it’s unrecoverable, with a few different options offering various levels of security and strength. Unlike the existing Format command, Sanitize keeps running after a reset, and during the operation it clears out all metadata, sensitive log pages, and status, as well as all user data in the caches.
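Those “different options” are selected by the Sanitize Action field (SANACT, bits 2:0 of Command Dword 10 in the Sanitize command) as defined in the 1.3 spec. Here’s a sketch of those values and of packing the dword; the enum names and the helper are my own illustration, and the other Dword 10 fields (overwrite pass count, invert pattern, etc.) are omitted for brevity.

```python
from enum import IntEnum

# Sanitize Action (SANACT) values from Command Dword 10 of the Sanitize
# command, per the NVMe 1.3 spec (bits 2:0). Names are illustrative.

class SanitizeAction(IntEnum):
    EXIT_FAILURE_MODE = 0b001  # recover from a previously failed sanitize
    BLOCK_ERASE = 0b010        # low-level block erase of the media
    OVERWRITE = 0b011          # overwrite the media with a data pattern
    CRYPTO_ERASE = 0b100       # destroy the media encryption key

def sanitize_cdw10(action: SanitizeAction,
                   allow_unrestricted_exit: bool = False) -> int:
    # Bit 3 = AUSE (Allow Unrestricted Sanitize Exit); other Dword 10
    # fields (OWPASS, OIPBP, ...) are left at zero in this sketch.
    return (int(allow_unrestricted_exit) << 3) | int(action)

print(hex(sanitize_cdw10(SanitizeAction.CRYPTO_ERASE)))  # 0x4
```

Crypto Erase is worth calling out: because it only has to destroy a key rather than touch every block, it’s typically by far the fastest of the three erase methods.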
More to Come!
Obviously, this was a quick-and-dirty description of what’s new in NVMe. Each of these new features has significantly more information behind it, but for people wanting a high-level overview this should be a decent starting point. We’ll be looking at many of these, in turn, in upcoming articles. Also, don’t forget that you can watch Jonmichael Hands’ great webinar on what’s new in NVMe 1.3 as well.
In the meantime, feel free to ask questions, make comments, or offer suggestions, and I’ll do my best to make things as clear as possible.
Hi, nice article, thanks. I have a question on NS sharing in an SR-IOV enabled SSD. Is namespace sharing the host’s responsibility?
Actually, it’s the NVMe Controller on the NVM Subsystem, mostly. One thing of note: SR-IOV is handled at the PCIe level, where the physical and virtual functions are separate from anything happening at the NVMe layer. The namespace sharing will occur between the host driver and the NVMe Controller, sitting on top of those SR-IOV functions.
If you’re asking about how the host coordinates NSIDs with SR-IOV, I don’t actually remember off the top of my head. If this is what you mean, and you want to know, I can go back and look (or ask; I’m very good at asking people smarter than I am 🙂 ).