FCoE vs. iSCSI: The Cagefight! – Performance

In a previous post, I posed a series of questions about the nature of the FCoE and iSCSI marketplaces. In this post I’m going to address one of those questions:

Is iSCSI fast enough at 10GbE, 40GbE, and 100GbE – even with its overhead concerns – to rival the performance and efficiency of FCoE?

[2012.08.09 Update: This has finally been tested and I’ve written an update about the performance of the two protocols on my Cisco Blog]

Defining The Terms

I want to try to avoid the “yeah, but” or fanboi comments from the outset. First, I understand FCoE much, much better than I understand iSCSI. So, there may be some specifics or details that I am missing, and I highly encourage corrections or additions. My motive here is to examine the technologies in as detached and unbiased a manner as possible to get to the true performance numbers.

Also, I’m looking here at the question of performance. By itself, performance is a Pandora’s box of “it depends,” and I understand and accept that burden from the get-go. Performance, like price, must be handled as a purchase criterion in context, so I’m not suggesting that any recommendations be made solely upon any one element over another.

Having said that, what exactly are the performance concerns we should have with iSCSI vs. FCoE?

The Nitty Gritty

At first glance, it appears that FCoE provides a more efficient encapsulation method: Fibre Channel frames ride directly inside standard Ethernet frames at Layer 2. There is no need to travel as far up and down the OSI layer stack, for example, which means that there is less processing required on either end of a point-to-point network for dealing with additional headers.

If you’re new to this, think of it this way: You have a letter you want to send to Santa Claus. You write your letter and place it in an envelope and then drop it in the mail. That letter then arrives at the North Pole (if you addressed it properly) and Santa’s helpers open the letter and hand it to him. That’s the FCoE metaphor. (Actually, here’s a much better – and visually appealing – description).

How many layers?

The TCP/IP metaphor (with respect to layers) means that you have to take that letter to Santa Claus, place it into a larger envelope, and then put that larger envelope into a box before sending it on its way. The extra layers of packing and unpacking take time and processing power.

iSCSI requires more packing and unpacking in order to get to the letter, the argument goes, so over time that would mean that Santa would – in theory – be able to open fewer letters in the same amount of time.
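To put rough numbers on the metaphor, here is a minimal sketch comparing per-frame encapsulation overhead. The header sizes are assumptions I’m using for illustration (FCoE: Ethernet plus FCoE plus FC headers; iSCSI: Ethernet plus IP plus TCP headers, with the 48-byte iSCSI PDU header ignored because it is amortized across many segments), not vendor-measured figures.

```python
# Rough per-frame efficiency: payload bytes delivered per byte on the wire.
# All header sizes are assumptions for illustration, not measured values.

def efficiency(payload_bytes, overhead_bytes):
    """Fraction of each frame that is actual data."""
    return payload_bytes / (payload_bytes + overhead_bytes)

# FCoE: Ethernet framing (~18 B incl. FCS) + FCoE encapsulation (~18 B)
# + FC header/CRC (~28 B) carrying a full 2112-byte FC data field.
fcoe_payload, fcoe_overhead = 2112, 18 + 18 + 28

# iSCSI on a standard 1500-byte MTU: Ethernet (~18 B) + IP (20 B) + TCP (20 B);
# the 48-byte iSCSI PDU header is spread across many segments and ignored here.
iscsi_payload, iscsi_overhead = 1500 - 20 - 20, 18 + 20 + 20

print(f"FCoE  per-frame efficiency: {efficiency(fcoe_payload, fcoe_overhead):.1%}")
print(f"iSCSI per-frame efficiency: {efficiency(iscsi_payload, iscsi_overhead):.1%}")
```

The per-frame difference is small; the larger cost the metaphor points at is the extra CPU work of climbing the TCP/IP stack, which these byte counts don’t capture.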

There is evidence to suggest that this conventional wisdom may be misleading, however. There are a lot of factors that can affect performance, to the degree that a properly tuned iSCSI system can outperform an improperly configured FC system.

In fact, an iSCSI storage system can actually outperform a FC-based product depending on more important factors than bandwidth, including the number of processors, host ports, cache memory and disk drives and how wide they can be striped. (Inverted.com).

Ujjwal Rajbhandari from Dell wrote a blog piece comparing the performance of iSCSI, FCoE, and FC, in which he found that iSCSI’s efficiency gains can be profound, especially when jumbo frames are enabled.
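To illustrate why jumbo frames matter so much for iSCSI, here is a small sketch (using the same assumed header sizes as above, for illustration only) comparing payload efficiency at a standard 1500-byte MTU versus a 9000-byte jumbo MTU.

```python
# Effect of jumbo frames on iSCSI payload efficiency.
# Header sizes are assumptions for illustration only.
ETHERNET = 18   # Ethernet header + FCS
IP = 20
TCP = 20

def iscsi_efficiency(mtu):
    """Payload bytes per byte on the wire for a full-sized TCP segment."""
    payload = mtu - IP - TCP
    return payload / (mtu + ETHERNET)

for mtu in (1500, 9000):
    print(f"MTU {mtu}: ~{iscsi_efficiency(mtu):.1%} of each frame is data")
```

Fewer, larger frames also mean fewer interrupts and less per-packet processing on the host, which is where much of the real-world gain tends to show up.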

Dell’s measurements are somewhat difficult to place in context, however. Although the article was written in late October 2009, only 4Gb throughput was used, even though FCoE cards running at line rate had been available for more than half a year. (The graphs are also difficult to interpret: one of them doesn’t make much sense, as it presents CPU utilization as a continuum from reading to writing rather than as separate categories of activity.)

It seems to me that the whole point of understanding protocol efficiencies becomes more salient as speeds increase. The immediate question I have is this: if Dell points out that comparing 1GbE iSCSI efficiencies against faster FC speeds is inappropriate, why would Dell compare slower FC speeds and efficiencies against 10Gb iSCSI?
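For context, here are the nominal usable data rates of the links being compared. The figures are the commonly cited ballpark values (8b/10b encoding for 1GbE and 4/8Gb FC, 64b/66b for 10GbE), so treat them as approximations rather than vendor specifications.

```python
# Commonly cited ballpark figures for usable storage bandwidth per direction.
# Treat these as approximations, not vendor specifications.
usable_mb_per_s = {
    "1GbE iSCSI":          125,   # 1 Gbit/s of data
    "4Gb FC":              400,   # ~4.25 Gbaud with 8b/10b encoding
    "8Gb FC":              800,   # ~8.5 Gbaud with 8b/10b encoding
    "10GbE iSCSI / FCoE": 1200,   # 64b/66b encoding, ~10 Gbit/s of data
}

for link, rate in usable_mb_per_s.items():
    print(f"{link:>18}: ~{rate} MB/s")
```

That is the crux of the complaint: a 4Gb FC baseline simply can’t tell us much about how 10Gb iSCSI and 10Gb FCoE compare to each other.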

Link speed interacts with efficiency in non-obvious ways, too: when moving from 4Gb to 8Gb HBAs, even within a pure 4Gb switching environment using 4Gb storage, the overall throughput and bandwidth efficiency can increase significantly due to improved buffer-credit handling.
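As a rough illustration of why credit handling matters, here is a sketch of the textbook buffer-to-buffer credit limit: if a link doesn’t have enough credits to cover its bandwidth-delay product, the sender stalls waiting for R_RDY acknowledgements and never reaches line rate. The credit counts, frame size, and distance below are made-up inputs, not measurements of any particular HBA.

```python
# Buffer-to-buffer credit sketch: achievable throughput is roughly capped at
# (credits * frame_size) / round_trip_time. All inputs are illustrative only.

def credit_limited_throughput_mb_s(credits, frame_bytes, data_gbps, distance_km):
    """Return min(usable line rate, credit-limited rate) in MB/s."""
    light_in_fiber_km_s = 200_000                      # ~2/3 the speed of light
    rtt_s = 2 * distance_km / light_in_fiber_km_s      # frame out, R_RDY back
    serialization_s = frame_bytes * 8 / (data_gbps * 1e9)
    credit_rate = credits * frame_bytes / (rtt_s + serialization_s)
    line_rate = data_gbps * 1e9 / 8
    return min(credit_rate, line_rate) / 1e6

# ~6.8 Gbit/s is the usable data rate of an 8Gb FC link after 8b/10b encoding.
for credits in (8, 16, 64):
    mb_s = credit_limited_throughput_mb_s(credits, 2148, data_gbps=6.8, distance_km=50)
    print(f"{credits:>2} credits over 50 km: ~{mb_s:.0f} MB/s")
```

The distance here exaggerates the effect to make it visible; the same arithmetic is what lies behind the credit-handling gains described above.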

Nevertheless, there is plenty of evidence to suggest that iSCSI performance is impressive. In February, Frank Berry wrote an article about how Intel and Microsoft are tweaking iSCSI for enterprise applications, improving CPU efficiency and blasting through some very impressive IOPS numbers. Stephen Foskett has a very interesting article on how it was done and rightfully asks the more important question: can your storage handle the truth?

Now, it’s very easy to get sidetracked into other aspects of an FCoE/iSCSI decision tree. “Yeah, but…” becomes very compelling to say, but for our purposes here we’re going to stick with the performance question.

How much performance is enough?

Ultimately, the question comes down to the criteria for data center deployment. How much bandwidth and throughput does your data center need? Are you currently getting 4Gb/s of storage bandwidth in your existing infrastructure?

There is more to SAN metrics than IOPS, of course; you need to take it hand-in-hand with latency (which is where the efficiency question comes into play). Additionally, there is the question of how well the iSCSI target drivers have been written and tuned.
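The IOPS-versus-latency point can be made concrete with Little’s Law: the IOPS a host can sustain is bounded by its outstanding I/Os divided by the average latency per I/O. Here is a minimal sketch with made-up queue depths and latencies, purely for illustration.

```python
# Little's Law for storage: sustained IOPS ~= outstanding I/Os / average latency.
# Queue depths and latencies below are illustrative, not benchmark results.

def max_iops(queue_depth, latency_ms):
    return queue_depth / (latency_ms / 1000.0)

for qd, latency in [(1, 0.5), (32, 0.5), (32, 2.0)]:
    print(f"queue depth {qd:>2} at {latency} ms -> ~{max_iops(qd, latency):,.0f} IOPS")
```

Headline IOPS numbers, in other words, say as much about queue depth and latency as they do about the transport.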

So, obviously iSCSI can be highly tuned to deliver jaw-dropping performance when given the right circumstances. The question that comes to mind, then, is…

How does performance scale?

iSCSI best practices require a completely separate iSCSI VLAN or network, which helps dedicate bandwidth to SAN traffic. Nevertheless, what’s not clear is what happens to the performance at larger scales:

  • What happens with boot-from-SAN (e.g., PXE) environments?
  • What is the theoretical maximum node count?
  • What is the practical maximum node count?
  • What is the effect of in-flight security (e.g., encryption) upon performance? What is the threshold for performance degradation?
  • How does scaling affect the performance of IQN management (e.g., an iSNS name server)?
  • Where is the retransmission threshold for congestion, and what is its impact on the performance curve? (See the sketch just after this list.)
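On that last question, a classic back-of-the-envelope model for TCP under loss (the Mathis et al. approximation: throughput ≈ MSS / (RTT · √p)) gives a feel for how quickly retransmissions bend the performance curve. It is a textbook approximation applied to made-up inputs here, not a measurement of any particular iSCSI stack.

```python
# Mathis et al. approximation: TCP throughput ~= (MSS / RTT) * 1 / sqrt(loss_rate).
# MSS, RTT, and loss rates below are illustrative inputs, not measurements.
from math import sqrt

MSS_BYTES = 8960     # full-sized segment with 9000-byte jumbo frames
RTT_S = 0.001        # 1 ms round trip, e.g. with queueing delay under congestion

def tcp_throughput_gbps(loss_rate):
    return (MSS_BYTES / RTT_S) * (1 / sqrt(loss_rate)) * 8 / 1e9

for loss in (1e-6, 1e-5, 1e-4, 1e-3):
    print(f"loss rate {loss:.0e}: ~{tcp_throughput_gbps(loss):.1f} Gb/s "
          f"(model value; real throughput is capped at the 10Gb line rate)")
```

The √p in the denominator is the point: a small increase in drop rate can pull sustained throughput well below a 10Gb line rate.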

This is where my limited experience with iSCSI is likely to get me into trouble. I’m having a hard time finding the answers to those questions as it relates to 10Gb iSCSI, so I’m open to input and clarification.

Bottom line.

Even with these open questions about the factors that affect performance, it’s clear that iSCSI has the performance capability for data center storage traffic. There are other considerations, of course, and I’ll be addressing them over time. Nevertheless, I think it’s quite clear that, all things being equal (and yes, I know, they never are), iSCSI can easily put up the numbers to rival FCoE.


You can subscribe to this blog to get notifications of future articles in the column on the right. You can also follow me on Twitter: @jmichelmetz