Gartner on FCoE. Whoa There, Sparky

In Storage, Technology by J Michel Metz

I’m too cheap to shell out the $195 (that’s $20/page!) for the latest Gartner report on FCoE, but there have been enough reports on the report that it’s not difficult to see the gist is consistent with Gartner’s long-running hate-affair with the nascent technology. I’m the first to admit that FCoE isn’t a panacea for the data center’s myriad issues, but Gartner seems far too willing to play the convenient role of grumpy old man, and needs to be put into perspective.

As I said, I’m too cheap to shell out $200 for a 10-page report, but fortunately several trade rags have provided direct quotes on some of the hot topics, which allows me to address some of the more interesting points.

Define Your Terms

Storagenewsletter points out that Joe Skorupa, research vice president at Gartner, is a wee bit skeptical about what a converged data center core can really mean, but Skorupa seems to mix and match his terms:

“The industry is abuzz with the promise of a single converged network infrastructure, this time in the data center core,” said Joe Skorupa, research vice president at Gartner. “Alternatively described as Fibre Channel over Ethernet (FCoE), Data Center Ethernet (DCE), or more precisely, Data Center Bridging (DCB), this latest set of developments hopes to succeed where InfiniBand failed in its bid to unify computing, networking and storage networks.”

There are two issues that I have with Gartner’s statement. First, FCoE != DCB. Data Center Bridging (or Converged Enhanced Ethernet, or Data Center Ethernet – pick your poison) refers to a set of Ethernet enhancements which, in turn, set the stage for Fibre Channel over Ethernet. In other words, FCoE is built on DCB, but DCB is the broader set of standards: you do not need to have FCoE in order to have or use DCB.

Not all DCB is FCoE
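If a snippet helps make the split concrete, here’s a loose way to picture it. The grouping below is my own, purely for illustration (which DCB pieces FCoE strictly needs versus merely benefits from varies by implementation), but the standards names are real:

```python
# "Not all DCB is FCoE": DCB is a family of Ethernet enhancements, and FCoE
# (defined over in T11 as FC-BB-5) is just one consumer of them. Rough sketch only.
dcb_standards = {
    "PFC":  "IEEE 802.1Qbb - per-priority flow control (lossless traffic classes)",
    "ETS":  "IEEE 802.1Qaz - bandwidth allocation among traffic classes",
    "DCBX": "IEEE 802.1Qaz - capability exchange between link peers",
    "QCN":  "IEEE 802.1Qau - congestion notification",
}
fcoe_leans_on = {"PFC", "DCBX"}   # lossless delivery, plus negotiation in practice

# iSCSI, NFS, or plain old LAN traffic can use any of these with no FCoE in sight.
print("DCB pieces FCoE depends on:     ", sorted(fcoe_leans_on))
print("DCB pieces that exist regardless:", sorted(set(dcb_standards) - fcoe_leans_on))
```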

Why is this important? Because understanding where FCoE sits within the broader spectrum of Ethernet standards can mitigate confusion. If Gartner (or anyone) conflates the two, it becomes easier to level criticisms that apply to one but not the other, and since Gartner begins with this erroneous assumption we have to wonder whether the criticisms apply to FCoE specifically or to DCB in general.

The second problem I have with this is that yes, there is a very positive potential for converged networks in the data center core, but if there weren’t why would the industry be abuzz? Gartner claims that the “buzz” is about the core “this time,” as if to say that vendors have failed to capitalize on any earlier promises and are now attempting to shift focus elsewhere.

The potential, or promise, of a converged core is still over a year away, and despite bloggers, pundits, and critics claiming that vendors have been all a-gaga about this, I have yet to see an actual vendor make a claim about FCoE today that has turned out to be a broken promise.

Learning How to Count

Then we get into the crux of some of the issues:

“The promise that a single converged data center network would require fewer switches and ports doesn’t stand up to scrutiny…This is because as networks grow beyond the capacity of a single switch, ports must be dedicated to interconnecting switches. In large mesh networks, entire switches do nothing but connect switches to one another. As a result, a single converged network actually uses more ports than a separate local area network (LAN) and storage area network (SAN). Additionally, since more equipment is required, maintenance and support costs are unlikely to be reduced.”

It is true that one of the major hassles with large mesh networks is that you can often find yourself dedicating entire switches to ISLs. This is particularly true in FC-land, where you must dedicate ports as E_Ports to interconnect switches. A notable exception to this rule is my alma mater, QLogic, whose nifty 5800 stackable switch uses dedicated high-speed ISLs for inter-switch traffic, leaving standard ports available for what they were intended for.

This “stackable” mentality is the driving force behind FCoE as well. Both Cisco and Brocade have designed their 50x0 and 8000 switches, respectively, to be Top of Rack (TOR) solutions. It’s an imperfect (okay, damn ugly) metaphor, but for our purposes here let’s stick with it.

By using a TOR solution, which in turn feeds an End of Row (EOR) switch chassis, the data center becomes more streamlined, not more cluttered. Gartner’s premise that mesh networks burn ports just to sustain the mesh is the genesis of the assertion that a single converged network uses more ports than a separate LAN or SAN.

What Gartner is missing – and it is glaringly obvious when you look at real-world deployments – is just how many ports are currently being used by LAN and SAN environments. When I was presenting on FCoE back in 2008 and 2009, I would routinely ask my audience just how many NICs they had installed per server. In nearly every presentation (and I did dozens), there were at least one or two members of the audience who had 16 NICs per server.

That’s 16, folx. 4 to the power of 2. 2 x 2 x 2 x 2. 16 FREAKIN’ NICs. Dayum!

Mathematically, if I can replace 16 NICs (even single-port ones) and 4 HBAs (again, single-port) with 2 dual-port CNAs, that’s 20 ports down to 4. You have port reduction.
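Here’s that arithmetic as a trivial sketch. The 16-NIC/4-HBA box comes from those audience answers; the two dual-port CNAs are the hypothetical replacement:

```python
# Back-of-the-envelope port math for one (hypothetical) heavily-cabled server.
nics, hbas = 16, 4                  # single-port adapters before convergence
ports_before = nics + hbas          # 20 server ports, 20 cables, 20 switch ports

cnas, ports_per_cna = 2, 2          # two dual-port CNAs after convergence
ports_after = cnas * ports_per_cna  # 4 server ports

print(f"{ports_before} ports -> {ports_after} ports "
      f"({ports_before - ports_after} fewer cables, transceiver pairs, and switch ports)")
```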

Financial Barriers?

Gartner makes the assertion that because more ports are necessary, more equipment is required, and ergo your maintenance and support costs balloon.

All things being equal, this would be true. However, all things are not equal.

For one thing, we’re talking about moving to FCoE as an evolutionary, expansionist approach rather than rip-and-replace. This is a move to 10GbE, which isn’t ‘free’ the way single 1GbE infrastructures are often perceived to be. It means you are buying one 10GbE/FCoE switch as opposed to both an Ethernet and a Fibre Channel switch. You are purchasing one CNA as opposed to one 10GbE NIC and one 8Gb FC HBA. You’re purchasing one cable as opposed to two. Two SFP+ transceivers per Ethernet/FC port combo versus four.

This is asset reduction, and it only gets better when you start figuring out how many NICs and HBAs you’re truly replacing.

Maintenance costs? Should we get into the power and cooling savings you get when you swap out Cat 6a for TwinAx (roughly 16W vs. 0.1W per port)? In another post I’ll break it down for you in detail, but the bottom line is that for large DC deployments you’re talking over $70k/year in power reduction just from cabling.
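For the impatient, here’s a rough version of that cabling math. The per-port wattages are the figures above; the port count and electricity rate are hypothetical placeholders, and cooling overhead is ignored:

```python
# Rough annual power savings from swapping Cat 6a (10GBASE-T) for TwinAx (SFP+ DAC).
watts_cat6a, watts_twinax = 16.0, 0.1   # per-port figures cited above
ports = 5000                            # hypothetical large-DC port count
usd_per_kwh = 0.10                      # hypothetical utility rate
hours_per_year = 24 * 365

delta_kw = ports * (watts_cat6a - watts_twinax) / 1000.0
savings = delta_kw * hours_per_year * usd_per_kwh
print(f"~{delta_kw:.1f} kW less draw, ~${savings:,.0f}/year before you even count cooling")
# -> ~79.5 kW and roughly $70k/year with these assumptions
```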

Increased Complexity

Gartner asserts that by layering multiple protocols onto a single infrastructure, increased complexity is inevitable:

Gartner also believes that there are significant design and management issues to be addressed. When two networks are overlaid on a single infrastructure, complexity increases significantly. As traffic shares ports, line cards and inter-switch links, avoiding congestion (hot spots) becomes extremely difficult.

This is true. And it’s also not news.

Fibre Channel over Ethernet is, in effect, a method of virtualizing storage traffic. The method by which the links are created involves virtualizing the nodes and ports, which in turn abstracts the links themselves. An entirely new link-discovery protocol, FIP (FCoE Initialization Protocol), had to be developed in order to handle not only the addressing scheme but also the link management.

But again, why is this a shock? Is Gartner shocked, shocked I tell you! that VMware needed to increase complexity in order to virtualize multiple hosts onto bare-metal hardware, for instance? Is ESX doomed, doomed I tell you! because putting multiple guests on a single hardware platform is more complex?

Mr. Skorupa said that over time, emerging standards, such as Transparent Interconnection of Lots of Links (TRILL) may make it easier to avoid these hot spots, but mature, standards-compliant implementations are at least two to three years away.

Well, no and yes. TRILL is not a method of handling congestion; it’s a link-state approach to Layer 2 forwarding that mitigates loop issues and addresses some of the problems surrounding spanning tree. It’s a topic worth exploring on its own, but what it isn’t is a method of “hot spot handling.”

What is true is that standards-compliant implementations are a ways away. However, it’s not clear what the criticism here really is. Is Gartner complaining that they’re not here yet? Are they claiming that FCoE vendors are saying that it is? Are they saying that because it’s not available now we should stop working on it, pack up our bags and go home?

What’s your point, dude?

It’s not a Debug, it’s a De-feature!

Gartner makes a very interesting claim about debugging problems on a converged network, since the “interactions between LAN and SAN traffic can make root cause analysis more difficult.”

Since many problems are transient in nature, events must be correlated across the two virtual networks, increasing complexity. Should an outage be required for solving a problem or simply for performing maintenance, a downtime window that is acceptable for both environments may be required. This increases complexity and may increase cost, as well.

Wait, what?

When you have an Ethernet problem, what are your troubleshooting steps? Chances are if you’re an Ethernet network admin you have a series of steps to go through (I’m being sarcastic here; there are very well-known and well-tested troubleshooting techniques. Packet sniffing, anyone?). If you’re a Fibre Channel guy, you have your own troubleshooting techniques.

With Fibre Channel over Ethernet – guess what? – you sniff packets and troubleshoot the same. freakin’. way. You use the same tools you have always used, because to an admin it is still Fibre Channel and Ethernet.

The notion that “problems are transient in nature, events must be correlated across the two virtual networks,” is bizarre in light of the way that FCoE packets are handled in a DCB switch. FCoE traffic is not transient across the link by any means, and to suggest that somehow LAN traffic and SAN traffic intermingle is to imply that Gartner has no clue how PFC works.

Because FCoE is necessarily a virtualized abstraction layer, the traffic does not get “correlated.” Don’t believe me? Take a look at the Ethernet frame and tell me how decisions based upon EtherTypes can somehow get ‘confused.’
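To make the point concrete, here’s a toy sketch (illustration only, not switch firmware) of the decision: the EtherType in the frame header says which class a frame belongs to. 0x8906 and 0x8914 are the registered FCoE and FIP EtherTypes; the parsing below is deliberately simplified:

```python
import struct

# Toy classifier: the EtherType alone tells LAN traffic and storage traffic apart.
ETHERTYPES = {
    0x8906: "FCoE (storage class, kept lossless via PFC)",
    0x8914: "FIP  (FCoE Initialization Protocol)",
    0x0800: "IPv4 (LAN class)",
    0x0806: "ARP  (LAN class)",
}

def classify(frame: bytes) -> str:
    ethertype = struct.unpack("!H", frame[12:14])[0]
    if ethertype == 0x8100:                          # 802.1Q tag: real EtherType follows it
        ethertype = struct.unpack("!H", frame[16:18])[0]
    return ETHERTYPES.get(ethertype, f"something else (0x{ethertype:04x})")

# A made-up frame: destination MAC + source MAC + the FCoE EtherType + padding.
fake_frame = bytes(12) + struct.pack("!H", 0x8906) + bytes(50)
print(classify(fake_frame))   # -> FCoE (storage class, kept lossless via PFC)
```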

If an outage occurs for solving a problem “or simply performing maintenance,” Gartner appears to be concerned that admins must find an “acceptable” window for both environments. It’s a good thing that outages don’t cause that kind of grief right now, eh?

From the Sublime to the Surreal

How is this for the most bizarre conclusion ever:

“[W]hile the promise that a unified fabric will require fewer switches and ports, resulting in a simpler network that consumes less power and cooling, may go unfulfilled, that doesn’t mean that enterprises should forgo the benefits of a unified network technology.”

There are so many things wrong with this statement it’s difficult to know where to begin.

After saying that the “promise” is not available now, but in some cases only a year (or three) away, Gartner now claims that it will “go unfulfilled.” Way to throw the baby out with the bathwater on this one.

Well, there you have it. It’s not available now, so it will NEVER be available. I might as well just take my bat and ball and go home and never go outside again.

But then, with no explanation whatsoever (and no connection to the logic), Gartner qualifies the claim that enterprises shouldn’t “forgo the benefits” of a unified network technology – those same benefits that it just said would go unfulfilled. Gartner then goes on to support two separate networks in the conclusion of the press release.

In short, it appears that Gartner is attempting to switch back and forth between the circles in the Venn diagram shown above, thus reinforcing the importance of properly defining your terms in the first place.

Missed Opportunities

What’s interesting to me is how Gartner apparently missed the boat on some very real, and very legitimate concerns about FCoE and converged networks.

For one thing, congestion management is something SAN admins have a right to be concerned about, and while the Ethernet folks are getting it sorted out, they’re using terminology that may not be familiar to storage network admins. Some major clarification is going to have to happen there.

For another, there is still the question and concern about multi-hop capabilities. Fibre Channel has its own limitations, and FCoE currently does not permit multi-hop configurations. Multi-hop support is slated for the next standards revision, but it is certainly a legitimate concern for anyone interested in implementing FCoE in any sizable deployment.

Still another is the real comparison between FCoE and 10Gb iSCSI. For those who simply want a wicked-fast storage environment, iSCSI has been maturing nicely, showing incredible performance when tweaked properly, and there is no shortage of technical gurus to help companies get where they want to go.

Additionally, and perhaps most importantly, there are the cultural changes required for FCoE deployment. Most glaring is the simple fact that cross-functional planning is required across teams that are traditionally heavily fortified silos. Had Gartner focused on the cultural implications, rather than some very bizarre technical claims, it would have remained on pretty solid ground.

Myopic Strawmen

Gartner misses some of the most important potential benefits of FCoE and the reason why it’s been getting so much buzz.

The operating cost reduction (see the cable example above) and asset reduction (see the NIC and HBA reduction example above) are not the only benefits of FCoE. There are two things, in particular, that are quite interesting about the technology that you don’t hear a lot about (and are worth exploring in detail on their own at a later date).

First, the actual implementation of FCoE, both from a frame and a transmission (PFC) standpoint, is incredibly simple and modular. Its abstracted nature means portability and flexibility, just like any other virtualized environment. The full ramifications of what is possible haven’t even been fully explored, let alone tested. With simple building blocks you can create some very customizable solutions.

Second, there is the quietly looming benefit of Enhanced Transmission Selection (ETS). In a nutshell, PFC – which splits the pause flow-control mechanism out into multiple traffic classes – doesn’t provide a way to allocate bandwidth among those classes or prioritize between them. ETS addresses that by classifying traffic, queuing it, and applying more granular transmission selection.

While a detailed description of ETS goes beyond the scope of this particular post, the important point is this: with these basic building blocks and the flexibility they imply, LAN and SAN administrators have an incredibly powerful ability to customize their networks and tune them precisely the way they want them.
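As a flavor of what that tuning looks like, here’s a toy model of ETS-style minimum-bandwidth guarantees with borrowing of idle capacity. The class names, percentages, and offered loads are hypothetical, and a real switch does this per-frame in its hardware scheduler; this just shows the idea:

```python
# Toy ETS-style allocation on a converged 10GbE link: each class gets a guaranteed
# minimum share, and whatever a quiet class doesn't use is available to the others.
link_gbps = 10.0
min_share = {"FCoE": 0.50, "LAN": 0.30, "iSCSI": 0.20}   # hypothetical guarantees
offered   = {"FCoE": 2.0,  "LAN": 6.0,  "iSCSI": 3.0}    # Gbps each class wants to send

# Pass 1: everyone gets up to their guaranteed minimum.
granted = {c: min(offered[c], min_share[c] * link_gbps) for c in min_share}

# Pass 2: leftover bandwidth goes to classes that still want more (greedy for simplicity).
spare = link_gbps - sum(granted.values())
for c in sorted(min_share, key=lambda k: offered[k] - granted[k], reverse=True):
    extra = min(spare, offered[c] - granted[c])
    granted[c] += extra
    spare -= extra

print(granted)   # -> {'FCoE': 2.0, 'LAN': 6.0, 'iSCSI': 2.0} with these numbers
```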

Conclusion

Ultimately, anyone who believes that FCoE is the Second Coming needs to stop drinking with Jim Jones. There’s no question that it is not at the maturity level of either of its underlying technologies, but nobody who is looking at it seriously suggests that it is.

Gartner’s Chicken Little approach seems to be a way of garnering (pun intended) attention with respect to FCoE, but it does so in a way that suggests either they’re not completely familiar with the technology or they simply feel it’s important to be contrarian.

FCoE has numerous obstacles to overcome, but to dismiss the “promise” of a technology because that promise has yet to be fulfilled seems disingenuous.


You can subscribe to this blog to get notifications of future articles in the column on the right. You can also follow me on Twitter: @jmichelmetz

Comments

  1. This is reminiscent of the technical superiority of Token Ring over Ethernet. FDDI was good at moving multimedia. For a while, ATM was the high-bandwidth solution in the LAN. TDM telephony was not afraid of VoIP. It goes on and on. 802.3ba is less than 3 months away. If I were a Gartner customer, I would not find this helpful.

  2. J Michel – this is an excellent piece of work. Your ability to cut through the BS on this topic is impressive. I want to state that I, like you, have not read the entire report. But the excerpts and tone are pretty clear from the summaries that have been published.

    Personally I believe it stems from two factors: 1) Gartner trying to be Gartner and 2) The fact that Gartner talks to users about these topics.

    I speak to lots of users too and I will make two observations: 1) many don’t understand FCoE and 2) they’re apprehensive about change (wow there’s a shocker). So I think this confusion and fear are seeping through in Gartner’s analysis and of course Gartner’s only too happy to play on those. I don’t blame them but it’s unclear what service they provide to buyers with this report.

    The problem as you state is the flawed logic of “it’s 2-3 yrs away; it’s unfulfilled; maybe it will work with TRILL, etc.” It’s unclear as a buyer what conclusion to make. I might as a buyer say– OK…this is all BS and I’ll ignore it because it won’t save me money. Or “Oh…I don’t have to worry about this one it’s all nonsense.”

    I would not make these recommendations to users. I would say the potential is there to absolutely cut connection costs. And I would also point out that heat density is of increasing importance in many data centers and as you point out the reduction in the number of things I’m stuffing into servers will absolutely help address this problem.

    Further – I would advise buyers to get educated on this topic because with all the vendor investment in FCoE there is a very good chance (P = 0.9 Gartner dude!) that it’s going to happen.

    And rightly I would say that users are concerned with performance, they are concerned with cascading dependencies and they haven’t thought through the organizational implications.

    I would advise buyers to start playing with the technology and doing POC’s now. It’s an excellent lab project that if I were a CIO I would have my people on now so that I can know more than all the sales guys coming into my shop.

    I would also advise CIO’s – as you point out – to start thinking about the organizational implications of bringing the networking and storage groups together. Different worlds with different reporting lines with different incentives. Better start thinking about roles.

    I’m glad Gartner did this because it puts a stake in the ground for the crowdsource crowd to vet the issue and add additional value to the user community.

  3. Author

    Scott and Dave, I agree with everything you’ve both said. The one question I keep coming back to is, “What does Gartner gain from this?” The only thing I can think of is that they can attempt to maintain some dependency from their customers. Even that seems a bit weak, but I can’t think of anything else.

    I admit it’s been a while since I’ve been in front of an FCoE audience (about 6 months now), but from what I’ve seen their problems and concerns haven’t changed all that much. The general impression I’ve gotten from people is that this type of thing simply isn’t useful. *shrug*.

  4. Pingback: ViewYonder » Chicken Little in the Unified Data Center starring Joe Skorupa of Gartner

  5. Excellent post. I too haven’t read the full report, because it was clear from the points being published that Mr. Skorupa was being fed an incomplete sound bite, and neither he nor the person writing the report really understood the basics of FCoE.

    While FCoE holds promise, it still needs to mature before being set loose in the data center. But I think it’s more a question of ‘when’ rather than ‘if’.

  6. Not that Gartner wants or needs any defending (they are doing just fine on their own, thank you very much), and not that I want to be in the business of defending them (I’m usually in the business of challenging them), but it bears pointing out that Storage Newsletter deserves some blame here. If you actually read the Gartner report, this is the central thrust:

    “However, while the promise that a unified fabric will require fewer switches and ports, resulting in a simpler network that consumes less power and cooling, may go unfulfilled, that doesn’t mean that enterprises should forgo the benefits of a unified network technology.”

    Mr. Skorupa said that there is clear benefit in standardizing on a single technology for all data center networking if that technology adequately supports the needs of applications. This will simplify acquisition, training and sparing.

    There are two implications to this: (1) the report supports unifying on Ethernet even if that means maintaining two separate Ethernet networks for the foreseeable future–and this actually helps make the case for FCoE going forward; and (2) the report is supportive of FCoE in the access layer.

    Storage Newsletter obviously picked up on the more sensationalist aspects of the report while glossing over its central contention. Part of the problem here, IMHO, is the sensationalist title of the report, which actually does a disservice to an otherwise fairly well-balanced read. Just the same, I honestly believe that both the Storage Newsletter piece and this blog mis-characterize what Joe Skorupa intended.

    Let the flaming begin : )

  7. Author

    No flaming, Jesse.

    As I mentioned, I didn’t get the chance to read the full report, but Storage Newsletter didn’t necessarily hype anything up. Essentially what they did was reprint the press release from Gartner. (I saw the PR, but can’t seem to find it at the moment and don’t have the time to scour for the link; Dave Simpson reprints it here: http://bit.ly/amdtmS)

    So, Gartner is responsible for its own sensationalism here, I believe.

    Gartner has had a long-standing issue with FCoE. In October 2008 they published (by the same author) a report called “The Folly of FCoE” and promoted PCIe as the interconnect of choice. This is an interesting claim and definitely should be examined further, but the debate seems disingenuous when many of the claims about FCoE simply don’t hold water (or confuse DCB for FCoE).

    Of course, if someone wants to donate the Gartner report I will be more than happy to give it its full due diligence and will gladly write a more complete and comprehensive review based on more than just the press release. 🙂

  8. Oh I should also mention that I work for Cisco as fair disclosure.

  9. J Michael, that’s a very fair point. I do believe that Gartner’s PR folks glommed onto the more sensationalist aspect of the report — whether to sell more copies or because they didn’t understand the nuances. Just the same, as a Cisco contractor, I find the report to be neutral/positive for Cisco and its FCoE interests when it is taken as a whole.

    Personally, I’d love to send you a copy of the report, and I feel like Gartner should do so for their own benefit, but it’s their business model, for better or worse, and that’s for them to work out. Still, I really hope they do because I think it would clear up some misconceptions.

  10. Author

    Well, I’m not sure if they’ve heard about this (my) response, but no one from Gartner has contacted me. I try to be as fair as I can be but from what I can see from the reaction this piece has gotten (and it’s shocked the hell out of me, TBH), it appears that Gartner has established an uphill battle for themselves. Their own press release has put them in a position where now they’ll have to go back and claim “that’s not what I meant,” which is an awkward place to need to start from.

  11. The FCoE debate is in fact caused by the vendors themselves, not the analysts.

    None of the switch and CNA vendors have products that work with existing storage devices without adding ports to the equation (in large deployments). FCoE-native storage will be widely available around mid-2011 or earlier. Until then, most storage will remain FC, which means additional ports and support issues while FCoE matures. These FCoE vendors blame protocol certification processes, but in fact they did a poor job of understanding storage requirements in the first place. They also did a poor job by not putting at least one 10Gb connection into every server sold; 1Gb is still the default in 99.999% of servers sold. That would have drawn attention to the move to FCoE.

    This FCoE technology needs to mature and provide “facts,” not the future promises the vendors have been making for the last three years. Otherwise the protocol will fail the way iSCSI and InfiniBand failed with the same concept and the same promise of cheap storage connections over the last five years.

    A massive amount of FC product is running today. None of it will be scrapped based solely on future promises of lower cost. It will take another two to three years before we start to see FCoE gaining significant market share over other storage protocols.

  12. Good article and discussion in the comments – I actually have a copy of the Gartner report and do think that the negative pieces were highlighted a bit more. A question that I ask people is “what is success for FCoE”. The comment from Visiotech seems to say that if a protocol isn’t #1 that it is a “failure”. iSCSI has done very well in the commercial markets. InfiniBand is doing great in HPC environments. Customers are risk averse and industries often take a decade or more to adopt new technologies. FCoE will NOT be the default solution this year, but that doesn’t mean that the solution set and ecosystem aren’t growing nicely. Understand what can be done today and plan towards the future – this is the general point in the Gartner article and one that I’d agree with.

  13. Pingback: FCoE vs. iSCSI: The Cagefight! – Flexibility « J Metz's Blog

  14. A point of precision for Mr. Miniman.

    For me, iSCSI and InfiniBand are failures at replacing Fibre Channel, not failures in terms of achievement. As you say, they both have their own markets, with respectable success too.

    I remember five years ago when startups and large IP vendors were riding 10Gb iSCSI as the next big thing in IT for Fibre Channel replacement. We all know now it did not happen as they promised. Yes, they grabbed a small part of the market. Connectivity-wise, NAS with NFS and CIFS probably has a much bigger impact today than all the other protocols.

    Another area IT directors are looking at today: cost, and more cost. Most FCoE gear available today is about two years old, and most vendors replace their products within three years. Just in time for FCoE to be ready in 2011 and 2012…

    Like experienced surfers, they are looking at the horizon and waiting for the big wave to hit the reef before jumping in…

  15. I work for NetApp and am a big proponent of Ethernet storage, regardless of protocol. We are currently the only storage vendor to support FCoE.

    I think FCoE is very interesting as a path for upgrading existing hardware. Meaning, for those with large FC investments, FCoE offers a significant opportunity to improve efficiency in the data center. And the performance is excellent. However, an interesting question to ask is: with 10GbE port prices dropping below $500 and 10GbE ports soon to be found embedded on motherboards, what will be the choice for new data centers or new storage deployments? iSCSI or FCoE? What impact does virtualization, or the inevitable FCoE software initiator, have on that decision?

    DCB helps all data traffic types, and the foundation of FCoE is still FC. I think iSCSI benefits from FCoE and DCB for at least a couple of reasons.

    1) FCoE validates Ethernet as an enterprise storage fabric
    2) 10GbE levels the playing field for the physical network transport – no more performance debate (at least in terms of transport bandwidth)
    3) New server CPUs are so fast that IP overhead is becoming a non-issue

    Add to the above, the scalability, ubiquity, and virtual addressing capabilities of IP protocols which all give iSCSI an advantage.

    Ultimately, moving to 10GbE with DCB offers the ability to manage all of the environment more effectively and efficiently, regardless of which protocol you choose. And having a unified storage platform that can support all protocols equally makes that message even more compelling.
