
Thursday, January 10, 2013

Backups Using Amazon S3 and Glacier: A Clarification

Some days ago I published a post about the recent integration of Glacier as a new storage class in Amazon S3 and how it paves the way to new and interesting use cases, even for home users, despite being a service geared more towards enterprise users. The post was then kindly cited by Ted Forbes in the latest instalment (at the time of writing) of his excellent photography podcast, The Art Of Photography: Episode 118, Photo Storage with Amazon Glacier and S3. The podcast has surely driven a great many visits to my blog post, and I've received lots of emails with questions related to it.

Many of them asked whether I had ever considered, or why the post did not consider, other more user-friendly backup solutions in the cloud. In fact, most of these comments focussed on a completely different kind of service, with a particular emphasis on services which enable easy and automatic backups of entire computers, drives, or folders.

Now, it was never my intention to go into the details of that kind of offering, and I won't do it now. But I do think that a follow-up to the original post is necessary to clarify a couple of things.

First of all, I want to stress a fundamental assumption that I took for granted when choosing S3 and Glacier as a cold storage service for some of my files: I want to offload files from my disks, assuming I'm done working with them and almost certainly won't need to access them in the medium term (if not in the foreseeable future). Ted did a great job in his podcast episode explaining how Amazon S3 and Glacier can be used and suggesting some interesting use cases. He certainly did a better job than I did in the original blog post in making the point that Glacier is an interesting option for offloading big files we don't use often to a reliable and affordable cloud storage service.

In fact, in my current workflow there's no room (nor will) for strategies other than offloading from my workstations, and I suspect many users out there have similar workflows and issues (I guess photographers do). Some kinds of content are very "bulky": photographs and video footage can easily reach tens of gigabytes per work session, if not more, and even an amateur photographer like me can easily outgrow their hard disk, no matter how big it is. Of course, I've always kept expanding my disk pools at home to satisfy the ever-increasing need for space, but I'm certainly not willing to keep unnecessary files on the internal hard disks of my machines beyond the time strictly necessary to work on them. Once I'm done with them, I either back them up to my home storage appliance (if I foresee the need to have them quickly available) or I offload them.

That's the use case Glacier is great for! I'm not asking for anything more, nor anything less, than an affordable and reliable place to store them until I need them, should that ever happen.

To make a long story short, I agree there are lots of alternatives out there, each of them with its own features, strengths and shortcomings, and a different level of complexity. Google Drive, for example, is just great for keeping a relatively small amount of content organised and synchronised across a wide range of devices. CrashPlan's offerings for home users are a great way to start easily backing up entire computers and drives. Zoolz has a similar offering, with distinct online and cold storage tiers.

Nevertheless, what I really don't like about some of these services is the fact that they sometimes charge depending on the number of users and/or computers you're backing up. I'm using many different devices and, because of my workflow, they're all still pretty easy to set up and contain pretty much the same data: I just keep locally the applications I need and the data I'm working on. Everything else is not kept on my internal hard drives. This approach is very convenient because I never worry about the loss of a machine: I just need to install the OS and the applications which, of course, I always keep available. As an OS X user I don't even use Time Machine, because it's quicker (much quicker) to just reinstall the OS and the apps I need. Let alone synchronising tens of gigabytes over the internet. For me it's just nonsense: I need to work fast and to recover fast. But I recognise it's certainly appealing to a lot of other users with different needs.

For that reason, in my workflow I really don't need nor want any client synchronising anything over the wire. I just load the data I'm working on onto my workstations (a photo session, for example), back it up locally elsewhere (as you should always do with assets you need and cannot lose) and, when I'm finished with it, I offload it somewhere else and delete it from my drives.

That somewhere is currently Amazon S3 and Amazon Glacier: they're affordable, they're easy to use and, no matter how many devices I'm working on, I can always grab my data if I need it.
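
For readers who asked what "offloading" could look like in practice, here is a minimal sketch, not my actual setup. Since Glacier is exposed as a storage class within S3, one way to use it is a lifecycle rule on a bucket that moves objects under a given prefix to Glacier some days after upload. The function name `glacier_lifecycle_rule`, the prefix `photos/2013/` and the one-day delay are all assumptions made up for illustration; the sketch only builds the rule as a plain dictionary and does not contact AWS.

```python
import json


def glacier_lifecycle_rule(prefix, days=0, rule_id=None):
    """Build an S3 lifecycle configuration (as a plain dict) that moves
    objects whose key starts with `prefix` to the GLACIER storage class
    `days` days after they were created.

    Hypothetical helper for illustration only: it does not talk to AWS.
    """
    if days < 0:
        raise ValueError("days must be non-negative")
    # Derive a readable rule ID from the prefix when none is given.
    name = prefix.strip("/").replace("/", "-") or "all"
    return {
        "Rules": [
            {
                "ID": rule_id or f"offload-{name}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


if __name__ == "__main__":
    # Assumed example values: a made-up prefix and a one-day delay.
    config = glacier_lifecycle_rule("photos/2013/", days=1)
    print(json.dumps(config, indent=2))
    # Assuming boto3 is installed and credentials are configured, a dict
    # of this shape could be passed as the LifecycleConfiguration argument
    # of the S3 client's put_bucket_lifecycle_configuration call.
```

With a rule like this in place, the workflow stays as simple as described above: upload the finished photo session under the chosen prefix, let S3 move it to Glacier, and delete the local copy once the upload is verified.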