This past week I went to the SNIA Storage Developer’s Conference in Santa Clara. It was the first time that I came to a conference as an attendee in a very, very long time. Overall, the experience was positive and interesting, but there were some moments of disappointment as well, as some major limitations in how the industry perceives storage came out and hit me across the face like an 18-pound sledgehammer.
For me, personally, the highlights of the week were some of the keynote presentations. The top two, by far, were presented on Wednesday morning by GoDaddy and Netflix.
In particular, Julia Palmer, Manager of Virtual Storage and Backup Teams at GoDaddy, presented a fascinating walkthrough of the growing pains the company had to endure to accommodate rapid growth from its inception in the late 90s through the 2000s, up through the crunch of 2008. I was impressed with the candor (and humor) of her presentation as she discussed the need to accurately model storage growth, including the growth in metadata, where freeware tools fall far, far short. When you have 31+ petabytes of customer data, you can’t afford to make mistakes.
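The keynote didn’t share GoDaddy’s actual model, but a toy sketch illustrates why metadata matters in capacity planning: even a fixed metadata fraction compounds right along with data growth. All numbers, the function name, and the parameters below are hypothetical, purely for illustration:

```python
def project_capacity(data_pb, annual_growth, metadata_ratio, years):
    """Project total storage need (data + metadata) over a horizon.

    data_pb        -- current customer data, in petabytes (hypothetical)
    annual_growth  -- fractional year-over-year data growth (e.g. 0.5 = 50%)
    metadata_ratio -- metadata overhead as a fraction of data; in real
                      systems this fraction often creeps up as object
                      counts explode, which a fixed ratio understates
    years          -- number of years to project
    """
    projection = []
    for year in range(years + 1):
        data = data_pb * (1 + annual_growth) ** year
        metadata = data * metadata_ratio
        projection.append(round(data + metadata, 2))
    return projection

# Toy run: 10 PB today, 50% annual growth, 10% metadata overhead.
print(project_capacity(10, 0.5, 0.1, 2))  # [11.0, 16.5, 24.75]
```

Even this crude compounding model shows the gap that opens up over a few years; freeware tools that assume flat metadata overhead would miss the real curve entirely.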
Jason Brown, Senior Software Engineer at Netflix, took a similar approach to outlining what has to be done to get several hundred terabytes of data moving to millions of subscribers. I’m not a programmer, but even I could follow most of why things weren’t working properly and why certain decisions were made (especially after Netflix’s very public outage on Christmas Eve last year, thanks to AWS).
In both cases, the presenters delivered excellent, technical, and yet entertaining keynotes about their particular case studies that went beyond the typical “this is the veiled commercial about what we do and wouldn’t you like to be a subscriber too?” Instead, both presenters focused on admitting that there were times they didn’t know what they didn’t know, and moved forward based upon what they thought was ‘common sense.’
However, common sense actually can make things worse, especially in storage. GoDaddy had an excellent example of this. During the evaluation process for new storage solutions, the main question was, “would this work in production?”
The common sense answer was to establish a change in the procedure by introducing new storage solutions in a controlled production environment with slow customer ramp-up. The expectation was that any issues that came up would only affect a small number of customers who, because of the low numbers, could be easily re-routed to other environments.
GoDaddy moved forward with designs based upon vendor specifications, and everything seemed to be working okay. However, what may be common sense in a testing environment may not hold under full load, which is where storage deployments often break (they don’t bend). Sure enough, load-related failures can take up to 12 months to appear, by which point thousands of customers could potentially be disrupted.
Ultimately, this led to the CIO mandate: “No more testing on live customers!”
This, in turn, led to changes in thinking about the way they did their modeling and testing, and they moved to a more efficient (albeit not freeware) solution that more accurately modeled their production environment. As a result, they’ve been able to better handle the growth requirements and even prepare for unexpected hiccups in a systematic fashion.
The keynotes definitely weren’t the only great sessions to watch, however. Without question the most consistent track of all was the SMB track, which featured presenters from Microsoft (of course), Red Hat, and some independent developers. This doesn’t surprise me, as Microsoft has been working non-stop and full-bore to improve the SMB protocol to work with Hyper-V, and that effort has paid off handsomely with some incredible improvements, as well as some very interesting science projects (such as SCSI direct access via SMB3). It would have been nice to see some information about what’s going on with StorSimple, however.
One of the other stellar sessions I went to was, without a doubt, the one from Jeda Networks. I first learned about Jeda a while ago, and if there’s anyone who comes closest to understanding a “software-defined” model for solving problems, they do.
Jeda’s solution rests on a very simple principle: take the fabric intelligence of a storage network (like Fibre Channel or FCoE) that currently has to reside in switch hardware – and be distributed to every switch in the network – and abstract that fabric information into a software controller instead. This means the fabric can take advantage of a Data Center Bridging (DCB) Ethernet network’s support for lossless traffic, and it removes the scale limitations that older ASIC hardware imposes.
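To make the idea concrete: here is a deliberately tiny sketch, in no way Jeda’s actual design, of what it means to pull fabric state (zoning, in this toy) out of individual switches and into one software controller that every switch consults. All class and method names are my own invention for illustration:

```python
class FabricController:
    """Central software controller holding fabric-wide zoning state
    (hypothetical sketch -- real fabric services carry much more)."""

    def __init__(self):
        self._zones = {}  # zone name -> set of member endpoint ids

    def add_zone(self, name, members):
        # One update here changes policy for the whole fabric at once,
        # instead of pushing config to every switch individually.
        self._zones[name] = set(members)

    def may_communicate(self, a, b):
        # Two endpoints may talk iff some zone contains both of them.
        return any(a in z and b in z for z in self._zones.values())


class Switch:
    """A switch keeps no zoning config of its own; it defers to the
    controller, which is the essence of the software-defined model."""

    def __init__(self, controller):
        self.controller = controller

    def forward_allowed(self, src, dst):
        return self.controller.may_communicate(src, dst)


fabric = FabricController()
fabric.add_zone("zone_a", ["host1", "array1"])
switch = Switch(fabric)
print(switch.forward_allowed("host1", "array1"))  # True
print(switch.forward_allowed("host1", "array2"))  # False
```

The payoff of this shape is that the switches become simple forwarders over a lossless DCB Ethernet underlay, while the fabric’s brains live in replaceable, scalable software.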
Is it perfect? No. Are there more questions to answer? Absolutely. But without question it was the closest I saw during the entire week to a solution that actually had software define a network’s attributes and capabilities.
In my mind the Storage Developer’s Conference is everything that SNW should return to. It’s a technical big brother to the Storage Plumbing and Data Engineering Conference (SPDECon) and is focused more on the extremely technical deep dives for storage, which is necessary.
However, as I’ll explain in the next post, there are some major, major problems that became apparent as I moved from track to track and session to session. From the perspective of the storage industry, I think my eyes were opened even wider to the struggles and problems we face as an industry.