Quora Question: Is It Worth Upgrading A SSD From TLC to QLC?

In Storage, Technology by J Michel Metz

This is actually a good question that came through Quora, and I thought the answer was worth repeating here.

The short answer is that QLC is not necessarily an “upgrade” from TLC. The tricky part is explaining why.

First, let me say that I think I understand where the confusion comes from. We are used to thinking that “new” = “better.” If TLC is the “old” technology, then QLC must be “better.” Or, if SSDs have 3 bits per cell with TLC, having 4 bits per cell must be better, right?

As I said, I can see why it would be easy to think this way, but in reality, not everything evolves according to that principle.

In storage, everything is a tradeoff. Everything. When they say you can’t get something for free, nowhere in technology is that more true than in storage.

By now, you know that SSDs are much, much faster than spinning hard drives. In fact, everyone knows that. Most people don’t know (or care) about why they’re faster, though. While that’s fine if you’re just looking to compare an SSD to an HDD, not knowing a few more details can cause confusion like we see in this question.

SSD Tradeoffs

It used to be really easy to shop for drives, simply because you didn’t have many choices. At the time, most people really only had spinning hard drives (HDDs) as a storage medium, and effectively your options came down to two questions:

  • How much did it cost ($/£/€)
  • How much capacity did you get (GB)

If you were looking for something specific, you also had cache size, rotational speed, and connector (e.g., SAS/SATA) options. However, for the most part, the key metric was that precious “dollar-per-Gigabyte” ($/GB).

SSDs came along, and really changed the playing field drastically. It really cannot be overstated how much this impacted the storage industry, as well as making so many things possible that we take for granted today (your mobile phone is probably the most obvious example).

Generally speaking, SSDs contain a storage medium called NAND Flash. NAND Flash is the technology that happens to have the best balance of durability and speed and cost for the kinds of uses most people need.

[One thing to note: SSDs are a form factor, not a storage medium. SSDs do not need to use NAND Flash, but more often than not they do. There is another type of medium called NOR Flash, but it is not often used in commercial/end user applications.]

NAND is not a perfect technology, however. While it is much, much faster than a spinning disk, it is also more expensive per bit over time (in terms of actual cost, as well as physical wear and tear on the device).

Yes, SSDs suffer from wear and tear. In fact, that’s one of the reasons why QLC is not “better” than TLC.

Why? I’m glad you asked.

The Fundamentals of Storing a Bit

It’s useful to remember that storage – all storage – has One Job™:

Give me back the correct bit I asked you to store.

Not “to store data.” This is incorrect; the job of storage is to give the correct data back to you. This is very, very important. Remember this – we will be returning to this concept.

For our purposes here, there are three components worth knowing about, though we’ll be focusing on just the first. (Bear with me – it will make sense at the end, I promise.)

In the graphic above, the block marked “1” represents the cell. This is where the actual bit is stored on the NAND device (called a “die”). These cells are collected into a page (“2”), which is the smallest unit that can be written to. In other words, when you write data to a Flash device, you must write to an entire page – you cannot simply write to a specific cell. That’s right – even if you only have 1 bit to write (hypothetically), you’ll need to use an entire page to do it.

As if that wasn’t confusing enough, these pages are combined into blocks. While you can write to a page, if you want to erase data, you must erase the entire block to do it.

So, that means that even if you have data stored in a cell just sitting there, minding its own business, if the block in which it’s residing must be erased, it needs to be moved.
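To make the page/block asymmetry concrete, here is a minimal toy model in Python. It is only a sketch: the `Block` class, the `reclaim` helper, and the four-pages-per-block figure are all hypothetical names and numbers chosen for illustration, not how any real SSD controller is written.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # hypothetical; real blocks hold far more pages


@dataclass
class Block:
    # Each slot is None (erased) or a [data, valid] pair.
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)

    def program(self, data):
        """Write into the next erased page; a page cannot be overwritten in place."""
        for i, page in enumerate(self.pages):
            if page is None:
                self.pages[i] = [data, True]
                return i
        raise RuntimeError("block is full; it must be erased before reuse")

    def invalidate(self, index):
        """Mark a page as stale (its data was deleted or rewritten elsewhere)."""
        self.pages[index][1] = False

    def erase(self):
        """Erasing only works on the whole block at once."""
        self.pages = [None] * PAGES_PER_BLOCK


def reclaim(old, spare):
    """Copy still-valid pages to a spare block, then erase the old block.

    Returns the number of pages that had to be moved.
    """
    moved = 0
    for page in old.pages:
        if page is not None and page[1]:
            spare.program(page[0])
            moved += 1
    old.erase()
    return moved


old, spare = Block(), Block()
for name in ["a", "b", "c", "d"]:
    old.program(name)
old.invalidate(1)  # "b" is stale
old.invalidate(2)  # "c" is stale

moved = reclaim(old, spare)
print(moved)  # 2: "a" and "d" were minding their own business and still had to move
```

Every one of those moves is an extra program operation on some other cell, which is why erasing at block granularity matters so much for wear.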

That movement can happen for a couple of different reasons. For right now, suffice it to say that data on SSDs gets moved around a lot – sometimes every day! (Generally speaking, data is not usually kept in the same cell for longer than a month when the drive is in use.)

TLC Doesn’t Mean “Tender Loving Care”

At a high level, a cell stores a bit by keeping a charge. If there’s a charge, then there’s a bit. If there’s no charge, there’s no bit. Pretty simple.

[Technical info: Now, we’ll get into this a little later, but in Flash, bits are represented by voltage levels. The simplest case uses two levels, representing ‘1’ and ‘0’. Special thanks to my friend Rob Peglar for helping me re-word this so that it’s clear and accurate!]

In fact, this simplicity is what made Flash appealing (and so fast!) in the first place. Once again, though, it came with a tradeoff.

Storing a single bit in a cell – also called “Single Level Cell,” or SLC – has the distinct benefit that its simplicity lets it last a very, very long time. Remember when I said that bits get moved around all the time? With SLC Flash, you can expect a life of close to 100,000 program/erase cycles. That is, when you write to a cell, you program it, in storage vernacular, and you can do that close to 100k times.

That gives it a really, really long life span. This is good. It is also really, really fast, since you only are doing the on/off programming of the cell. This is also good.

The problem is that there is only a limited number of bits you can store at a time, so the capacities are pretty small. This can be not-so-good. It’s also very expensive, because you are dedicating quite a bit of energy to babysit individual bits. Sometimes, simplicity comes at a cost.

In order to reduce that cost, it makes sense to add more bits into the cell. So, instead of just having one bit (on/off), you can add multiple bits into the cell. The way that it does this is to have different voltages going through the cell. So, if you have two voltages, you can detect whether one, both, or none of them are active. So now you have effectively doubled your capacity. Yay!

Here’s the bad news – running voltage through a cell is not as “clean” as whether or not the voltage is on or off. So, by running multiple voltages through a cell, you get multiple chances for error. As you add bits per cell (2 for Multi-Level Cell, or MLC; 3 for Triple-Level Cell, or TLC; 4 for Quad-Level Cell, or QLC), you start to compound the chances for error.

[More technical info, in case you’re curious. SLC has two voltage levels to represent the ‘1’ and ‘0,’ MLC has four, TLC has eight. As we’ll see in a second, QLC has sixteen.]
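The level counts in that note follow from simple arithmetic: n bits per cell need 2^n distinguishable voltage levels. The short Python sketch below works that out, and also shows how quickly the gap between neighboring levels shrinks. The even-spacing assumption and the function names are mine, purely for illustration; real devices do not necessarily space their levels evenly.

```python
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3, "QLC": 4}  # bits per cell


def voltage_levels(bits_per_cell: int) -> int:
    """Number of distinct voltage levels needed to encode the given bits."""
    return 2 ** bits_per_cell


def relative_spacing(bits_per_cell: int) -> float:
    """Gap between adjacent levels as a fraction of the full voltage window,
    assuming evenly spaced levels (an idealization)."""
    return 1 / (voltage_levels(bits_per_cell) - 1)


for name, bits in CELL_TYPES.items():
    print(f"{name}: {bits} bit(s), {voltage_levels(bits)} levels, "
          f"spacing {relative_spacing(bits):.3f} of the window")
```

Under that idealization, SLC has the whole window between its two levels, while QLC has only one fifteenth of it between neighbors, so the same amount of electrical noise is far more likely to push a QLC cell into the wrong level.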

The more of a chance for error, the more “stuff” you have to put into the device in order to make sure you actually have the bit you think you have. That “stuff” includes, but is not limited to, error-correction codes as well as different caching mechanisms.
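As a rough illustration of what error-correction “stuff” looks like, here is a sketch of a Hamming(7,4) code in Python. To be clear, this is a textbook code swapped in because it fits in a few lines; real SSD controllers use much stronger codes (BCH or LDPC, for example) over far larger chunks of data. The point is only that extra stored bits let you recover the right answer after a bit flips.

```python
def encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s3  # 0 means no error detected
    if error_position:
        c[error_position - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]


stored = encode([1, 0, 1, 1])
stored[4] ^= 1           # simulate one cell being read at the wrong level
print(decode(stored))    # [1, 0, 1, 1]
```

Notice the cost: 7 stored bits for every 4 bits of data, plus the work of checking on every read. That overhead is exactly the tradeoff described above.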

Now, the good news is that you get a lot more capacity. So, not only are you putting a lot more data into the same space (thus driving down the costs, as well), but you are also not wasting the really expensive NAND for use cases that don’t require it.

Take your friendly USB thumb drive, for instance. These things are so cheap as to basically be free (in fact, these things are given away more at tech conferences than candy at Halloween!). Why? Because they are not intended to be used as permanent storage. The TLC NAND in those devices is really only rated for about, oh, 500 cycles (compare that to the 100,000 cycles for SLC!). If you’re like me, you probably lose your drive much earlier than you hit that limit!
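For a feel of what those cycle counts mean in practice, here is a back-of-the-envelope estimate in Python. It is a sketch under generous assumptions: perfect wear leveling, decimal units, and a hypothetical 64 GB device. The `write_amplification` parameter is my own addition to stand in for the extra internal writes caused by moving data around; its default of 1.0 is the best case, not a measured value.

```python
def total_writes_tb(capacity_gb: float, pe_cycles: int,
                    write_amplification: float = 1.0) -> float:
    """Rough upper bound on host data (in TB) that can be written before
    every cell has used up its rated program/erase cycles."""
    return capacity_gb * pe_cycles / write_amplification / 1000


CAPACITY_GB = 64  # hypothetical device size, for illustration only

thumb_drive_tlc = total_writes_tb(CAPACITY_GB, 500)   # cycle count quoted above
slc = total_writes_tb(CAPACITY_GB, 100_000)           # cycle count quoted above

print(f"TLC thumb drive: about {thumb_drive_tlc:.0f} TB of writes")
print(f"SLC:             about {slc:.0f} TB of writes")
print(f"Ratio:           {slc / thumb_drive_tlc:.0f}x")
```

Real numbers will be lower, since write amplification is above 1.0 in practice, but the ratio between the two media is the part worth remembering.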

One of the other downsides to this, too, is that it slows down the write speed considerably. Remember, you need to not only store the bit, but you have to do it at the right voltage. And since we need to ensure that we’re going to be able to do storage’s One Job™, we have to add in quite a bit of that “stuff” to make sure we get the correct bit back.

In some cases, like the USB thumb drives, we simply don’t even bother with any data protection techniques like caching. The use case is cost, not quality. If you lose data on a thumb drive, the assumption is that it was disposable anyway, and not the source of your primary data.

So, here’s a piece of advice. Don’t use USB thumb drives as your primary data storage!

And Now… QLC!

So, you probably see where this has been leading. QLC NAND Flash continues the trend of packing more into the media. Despite the “Quad” name, storing 4 bits means the cell has to distinguish 16 different voltage levels, which leaves far less wiggle room between levels than SLC, MLC, or TLC have.

As a result, QLC has adopted all of the benefits of the lower cost-per-bit (and, consequently, cost-per-Gigabyte), and solved some of the major capacity problems/limits for using SSDs. After all, HDDs have been really, really good at long-term archiving for data that doesn’t need to be accessed or changed regularly.

Now, then, if you have long-term backup or archiving solutions, you can actually use QLC NAND because it’s cheap and the data is not supposed to change frequently.

However, it has also inherited the fragility of multi-level cell technology. That means that it has a very low program cycle count, and the fragility of the medium means that you need to compensate somewhere (usually in the software stack). This means slower storage, overall, and storage that shouldn’t be used when the data changes much (if at all).

So… “Upgrade?”

There are devices on the market for Data Centers that use QLC as the primary storage medium, but they are not intended to be used for high-transaction data (such as databases), nor would I recommend them for limited deployments. That is, QLC is best for when you are planning on having a lot of storage devices with multiple copies that don’t need to change very much.

To that end, I would not put QLC drives in my personal computer, or even in my home NAS devices. I might create a special archive box in my personal lab with QLC storage because I would use it as an archive node only – but even then I’d effectively make two of them because the media is so fragile, and I tend to forget to refresh my drives when I should. 🙂

Please don’t take this as a condemnation of QLC Flash. Nothing could be further from the truth. Like any technology, QLC Flash is good for the appropriate use case. The trick is in knowing whether or not your use case is appropriate.

If you’re looking for more information on how to avoid memory loss for Flash, you may want to take a look at another article I wrote on the subject:

Storage: How Does Flash Memory Avoid Data Loss?

You will actually see a familiar graphic and description in there as well. 🙂

Another good source for more information is, believe it or not, the Wikipedia entry. I know, I’m shocked too!

[Update: Check out this good article:  Your Next SSD May Be Slower (Thanks to QLC Flash)]
