Could the internet, one day, ever be full? Like, no more content would fit or run out of storage space?
I was going to pass on this Quora question, but it got me thinking. I actually know the answer from a unique perspective, so why not provide a useful response?
As I write this, and hopefully for some time to come, I am on the Board of Directors for three storage industry associations: the Storage Networking Industry Association (SNIA); NVM Express; and the Fibre Channel Industry Association (FCIA).
Between these three organizations, and the related and partner organizations, we work on this very problem every single day. It is our responsibility not just to make sure that something like this never happens, but to make sure it can never happen.
Behind The Question
The question pertains to how much data the Internet can hold. At first glance, I know there are a lot of people who would scoff at such an idea – after all, there are always more hard drives that you can buy, always more network links to add.
Here’s the thing, though: That storage is absolutely worthless if you can’t get it back.
You see, storage has One Job™. Give me back the correct bit I asked you to hold on to.
That’s it. There’s nothing more to it. If you can’t get back the bit you need, when you need it, then storage is worthless to you.
However, it’s much, much easier to retrieve an item from storage in a situation like this:
Not only do you have to know where the bit is, but you have to know the route to get there. You also have to know if it’s the correct bit.
The larger the Internet gets, the more difficult that job is to do correctly.
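To make the One Job a little more concrete, here is a minimal sketch in Python of the "is it the correct bit?" part. It is a toy and not how any real storage product works: the `VerifiedStore` class and its method names are made up for illustration, and it assumes a SHA-256 digest recorded at write time is a good enough stand-in for the integrity checks real systems use.

```python
import hashlib


class VerifiedStore:
    """Toy in-memory store that records a SHA-256 digest for every value.

    Hypothetical illustration only: the class and method names are invented.
    """

    def __init__(self):
        self._blobs = {}    # key -> raw bytes (the data itself)
        self._digests = {}  # key -> hex digest recorded when the data was written

    def put(self, key: str, data: bytes) -> None:
        # Remember both the data and a fingerprint of what it should be.
        self._blobs[key] = data
        self._digests[key] = hashlib.sha256(data).hexdigest()

    def get(self, key: str) -> bytes:
        # Step 1: do we know where the data is?
        if key not in self._blobs:
            raise KeyError(f"no data stored under {key!r}")
        data = self._blobs[key]
        # Step 2: is it still the data we were asked to hold on to?
        if hashlib.sha256(data).hexdigest() != self._digests[key]:
            raise ValueError(f"data under {key!r} failed its integrity check")
        return data


store = VerifiedStore()
store.put("greeting", b"hello")
print(store.get("greeting"))  # the bytes come back exactly as written
```

If the stored bytes are altered behind the store's back, even by a single bit, `get` raises an error instead of quietly handing back the wrong data. Doing that check reliably across many machines and many networks is where the difficulty grows.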
How Large is Too Large?
The Internet is actually an Inter-Net: an inter-networked set of networks. It grew out of the ARPANET, a network designed by the Advanced Research Projects Agency (ARPA, later renamed the Defense Advanced Research Projects Agency, DARPA) to promote resource sharing among universities and research institutions.
Sidenote: Many people think that the Internet was developed to withstand a nuclear war. This is simply not true. ARPA realized that they were spending upwards of $2M in 1965/6 dollars funding the same projects at different universities and wanted them to work together.
When Bolt, Beranek and Newman (BBN) was contracted to develop the first cross-country network to connect these universities, they opted for a technology called packet switching. It just so happened that packet switching had been designed to build a kind of reliability into networks that hadn’t existed before, and up until that time it hadn’t even been tested.
It was theorized at the time that a packet-switched network with seven nodes could withstand an outage caused by a catastrophic event, such as an atomic blast. However, it’s too much of a stretch to say that the Internet was developed with this as a primary goal in mind.
What we do in these industry standards bodies is make sure that when you want to get your bit from across the planet, it gets back to you safe and sound – no matter what equipment you’re using.
To do this, you need to have a handle on how the growth of storage is managed, not just the storage itself. The more data you dump into a publicly (or semi-publicly) accessible realm like the Internet, the more difficult it will be to find later on, so entire systems (and systems of systems) need to be put in place.
Every year we talk about how to handle the zettabytes and yottabytes of data that are coming. Every year we talk about how to cope with that much data, and as of this writing many people are content to let the big hyperscaler data warehouse companies (like AWS, Azure, Google Cloud, etc.) handle the heavy lifting for them.
But even those guys are acutely aware of how difficult it is to keep up with the increasing pace of data storage, and when you have customers who are looking to get their data back, they cannot afford to lose even a single bit among that zettabyte data pile.
There is a difference between how much data and storage there is on the Internet, and how much you can access (or find). So, the question changes from “finding a place to store your data” to “finding your data,” and that is a very, very real concern.
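The gap between "stored" and "findable" can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Catalog` class, its method names, and the location strings are all hypothetical, and a real system would use far more elaborate indexing than a pair of dictionaries.

```python
class Catalog:
    """Toy model of storage with a separate index (hypothetical names throughout)."""

    def __init__(self):
        self._objects = {}  # location -> bytes: everything that is stored
        self._index = {}    # name -> location: everything that can be found

    def store(self, location, data, name=None):
        # The data always lands somewhere, but it only becomes findable
        # if someone also records a name for it in the index.
        self._objects[location] = data
        if name is not None:
            self._index[name] = location

    def find(self, name):
        # Lookup goes through the index; data without an entry is invisible.
        location = self._index.get(name)
        return None if location is None else self._objects[location]

    def stored_count(self):
        return len(self._objects)

    def findable_count(self):
        return len(self._index)


catalog = Catalog()
catalog.store("rack1/disk3/0001", b"cat photo", name="cat.jpg")
catalog.store("rack9/disk7/4242", b"orphaned data")  # stored, never indexed

print(catalog.stored_count())    # 2
print(catalog.findable_count())  # 1
print(catalog.find("cat.jpg"))   # b'cat photo'
```

Both objects take up space, but only one of them can be retrieved by asking for it. Keeping that second number as close as possible to the first is the part that gets harder as the pile grows.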