Cloud storage, especially object storage, is often marketed by touting its “durability,” with many providers boasting eleven or even thirteen “nines.” Eleven nines means 99.999999999% durability. It sounds great, about as close to 100% reliable as you can get. But what is durability in relation to storage, and do you really need those eleven nines?
All storage resides on underlying media, in most cases hard disk drives and in some cases flash storage arrays. Regardless of where the media is located within the data center, different technologies can access it, split it up, and share it among different hosting products. You can read more about types of cloud storage to see how the primary platforms of file, block, and object differ.
Durability is a measure of how likely your stored data is to survive intact, free of the tiny errors that creep into files through these underlying media. When you write, read, and rewrite gigabytes, terabytes, and petabytes of information to the same drives, one or more individual bytes can get corrupted or lost.
Not every service provider even offers a durability rating, as it can be difficult to measure and guarantee. A more important question to ask your cloud hosting provider is how they protect against data loss generally. What technologies are in play? What are your odds of recovering data? How can you tie in backup?
For object storage, which is designed around storing massive quantities of files, especially media-rich files like documents, images, and video, durability becomes especially important. Once you reach the petabytes, dropping even a single nine of durability, say from 99.999999999% to 99.99999999%, multiplies the expected rate of loss by ten, which at that scale might mean dozens or even hundreds of extra files lost.
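The effect of a single nine can be estimated with simple arithmetic. The sketch below assumes that a durability figure is read as the yearly chance that any one object survives, which is a common interpretation but not something every provider defines the same way. The function name and the object count are illustrative only.

```python
from decimal import Decimal


def expected_annual_loss(object_count: int, nines: int) -> Decimal:
    """Expected number of objects lost per year at `nines` nines of durability.

    Assumption: N nines of durability means each object has a 10^-N
    chance of being lost in a given year.
    """
    loss_probability = Decimal(10) ** -nines
    return object_count * loss_probability


# Hypothetical environment: ten billion stored objects.
objects = 10_000_000_000
for nines in (11, 10):
    print(nines, "nines:", expected_annual_loss(objects, nines), "objects per year")
```

Under that assumption, ten billion objects work out to an expected 0.1 lost objects a year at eleven nines and 1 lost object a year at ten nines. The tenfold jump is the same at any scale. Only the absolute numbers grow with the size of the environment.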
One method of fighting byte loss is erasure coding. When a file is written to cloud storage, erasure coding splits it into fragments and computes extra parity fragments from them, rather than storing a straight duplicate. This means that when a fragment is lost or corrupted, it can be reconstructed from the surviving pieces spread across the entire storage area.
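To make the idea concrete, here is a minimal sketch of the simplest possible erasure code: a file split into k data fragments plus one XOR parity fragment. This is an illustration only. Production object stores typically use stronger schemes such as Reed-Solomon coding that tolerate several lost fragments at once, and the function names here are made up for the example.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-size fragments and append one parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")  # zero-pad the final fragment
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def recover(fragments: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, frag in enumerate(fragments) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [frag for frag in fragments if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired


def decode(fragments: List[Optional[bytes]], original_len: int) -> bytes:
    """Repair if needed, drop the parity fragment, and strip the padding."""
    data_fragments = recover(fragments)[:-1]
    return b"".join(data_fragments)[:original_len]


original = b"durability matters"
stored = encode(original, k=4)
stored[2] = None  # simulate a drive losing one fragment
restored = decode(stored, len(original))
```

With four data fragments and one parity fragment, the storage overhead is 25% rather than the 100% a full duplicate would cost, which is the main appeal of the approach.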
So instead of worrying about the number of nines, which is hard to prove anyway, ask if erasure coding or another data protection method is available to keep your data intact and recoverable at all times.
Erasure coding may not be available for all forms of cloud storage, however. Deduplication is another way that copies can be kept without storing two complete duplicate versions of every file for every backup. The system stores each unique piece of data only once and adds only new or changed data to your backup, keeping the storage footprint down. Unlike erasure-coded data, a corrupted deduplicated backup cannot be reconstructed, but a deduped backup of block or file storage is still a good way to hedge your bets against data loss.
A full copy is also faster to restore than one rebuilt from erasure-coded storage.
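The deduplication idea can be sketched as a store that cuts each backup into fixed-size chunks and keeps each distinct chunk only once, keyed by its hash. This is a simplified illustration under stated assumptions: the `DedupStore` class, its method names, and the tiny chunk size are all hypothetical, and real backup products use their own chunking and indexing schemes.

```python
import hashlib
from typing import Dict, List


class DedupStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}         # SHA-256 digest -> chunk
        self.manifests: Dict[str, List[str]] = {}  # backup name -> digests

    def put(self, name: str, data: bytes) -> None:
        """Store a backup, adding only chunks that are not already present."""
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.manifests[name] = digests

    def get(self, name: str) -> bytes:
        """Reassemble a backup from its chunk list."""
        return b"".join(self.chunks[d] for d in self.manifests[name])

    def stored_bytes(self) -> int:
        """Bytes actually held, after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("monday", b"AAAABBBBCCCC")
store.put("tuesday", b"AAAABBBBDDDD")  # only the last chunk is new
```

Here two 12-byte backups occupy 16 bytes instead of 24, because the two shared chunks are stored once. The trade-off is the one described above: there is no redundancy, so a damaged chunk affects every backup that references it.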
When planning your cloud storage, the vital questions become “What type of storage is best suited for my environment?” and “Can this data stand reduced durability?” Critical business data that you need for daily operations should absolutely have a full backup and preferably multiple backups in geographically separated data centers.
If you have lower file volume, in the gigabytes or a few terabytes, durability is much less important, as the expected number of lost bytes and corrupted files will be far smaller than in a petabyte or exabyte environment.