20090206

ZFS at Home

Gone are the days when the average consumer needed reliable data storage only at the office. Today, many people need reliable storage at home for everything from digital media to tax and business records to massive data sets for research. In my case, I work mainly from home, and using the remote storage at my office is sometimes a slow and painful process. My wife is in the middle of a PhD, and she is constantly working with several versions of data sets that each span several hundred megabytes. Both of us would also love to have a Home Theatre PC (HTPC) to play audio, view our photos, and watch video from a central location.

That being said, reliability does not solve every problem. Data can also be lost by a simple overwrite, which is human error more than anything. Luckily, the clever folks at Sun Microsystems took human error into account and designed a filesystem with built-in snapshots: read-only, point-in-time copies of an entire filesystem. A snapshot allows the filesystem to be 'rolled back' to an earlier state, or an older version of a single file to be copied back into place. The filesystem is called ZFS.

From the user's point of view, ZFS works much like Apple's Time Machine, although ZFS came first: Sun released it in 2005, while Time Machine shipped with Mac OS X Leopard in 2007. Unlike Time Machine, ZFS scales from one disk to many, but it works best under the same conditions as RAID. Like RAID, ZFS takes advantage of several physical hard disks pooled together; unlike RAID, it goes a step further and can take snapshots of your files, either on demand or on a schedule. In my view, this puts ZFS and Sun Microsystems at the forefront of the data storage market. A snapshot takes only a fraction of a second, even for tens of gigabytes of data, and it does not substantially increase disk usage, because ZFS is copy-on-write: a snapshot shares every unchanged block with the live filesystem, and only blocks that change afterwards consume additional space.
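To see why a snapshot can be nearly instant and nearly free, here is a toy model in Python. It is only a sketch of the general idea, not how ZFS is actually implemented: the class and method names are made up for illustration, and it assumes that a snapshot records references to existing data rather than copying it, and that new writes always go to new blocks.

```python
class ToyCowStore:
    """Toy copy-on-write store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}      # block id -> bytes, never modified in place
        self.live = {}        # file name -> block id (the "active" view)
        self.snapshots = {}   # snapshot name -> frozen copy of the file map
        self._next_id = 0

    def write(self, name, data):
        # Always allocate a new block; the old block stays untouched
        # for any snapshot that still refers to it.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.live[name] = block_id

    def snapshot(self, snap_name):
        # Copies only the name -> block id map, never the file data.
        self.snapshots[snap_name] = dict(self.live)

    def read(self, name, snap_name=None):
        table = self.live if snap_name is None else self.snapshots[snap_name]
        return self.blocks[table[name]]

    def rollback(self, snap_name):
        # Make the live view point at the snapshot's blocks again.
        self.live = dict(self.snapshots[snap_name])

    def stored_bytes(self):
        # Total data held, counting each block once (no cleanup in this toy).
        return sum(len(data) for data in self.blocks.values())


store = ToyCowStore()
store.write("thesis.dat", b"v1" * 1000)
store.write("photo.jpg", b"x" * 5000)
size_before = store.stored_bytes()
store.snapshot("monday")
size_after_snapshot = store.stored_bytes()   # unchanged by the snapshot
store.write("thesis.dat", b"v2" * 1000)      # only this change costs space
```

In this model the snapshot itself stores no file data at all; space grows only when a file is changed afterwards, and the old contents stay readable through the snapshot.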

Now, if Sun Microsystems or any OEM reseller really wants to make money, they should develop a 4-disk network attached storage (NAS) solution for the home using ZFS. The NAS could easily be powered by Solaris, FreeBSD, or (hopefully soon) Linux. It could provide a nice web interface for management, along with the familiar CIFS protocol (Windows file sharing), which works across all major operating systems.

However, to really harness the buying power of the general consumer, the reseller should also provide a network API and tools, so that home users can take advantage of the Time-Machine-like features of ZFS from the comfort of their own desktop or laptop. That means having a Windows Explorer-like application in which the user can easily navigate to a file, right-click, and instantly choose which version of the file should be active.
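As a rough idea of what such a tool could build on: ZFS exposes each snapshot as a read-only directory under a hidden .zfs/snapshot directory at the root of the filesystem. The Python sketch below lists the versions of one file and copies a chosen version back into place. The function names are hypothetical, and it assumes the tool can see the filesystem's mount point directly (on the NAS itself, or through a share that exposes the .zfs directory).

```python
import shutil
from pathlib import Path


def list_versions(mountpoint, relative_path):
    """Return (snapshot_name, path) pairs for each snapshot holding the file.

    Snapshots are sorted by name, which matches creation order only if the
    names are chosen that way (for example, dates).
    """
    snap_root = Path(mountpoint) / ".zfs" / "snapshot"
    if not snap_root.is_dir():
        return []
    versions = []
    for snap_dir in sorted(snap_root.iterdir()):
        candidate = snap_dir / relative_path
        if candidate.is_file():
            versions.append((snap_dir.name, candidate))
    return versions


def restore_version(mountpoint, relative_path, snapshot_name):
    """Copy the file as it was in the given snapshot over the live file."""
    source = Path(mountpoint) / ".zfs" / "snapshot" / snapshot_name / relative_path
    target = Path(mountpoint) / relative_path
    shutil.copy2(source, target)
    return target
```

A right-click menu could call list_versions to populate its list of versions and restore_version when the user picks one. Copying the file back, rather than rolling back the whole filesystem, leaves every other file untouched.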

Now, because several networked users could be accessing the same file at the same time, but with different versions, it might make sense to have either

a) simple file locks
* only one version of a file is active, and in use, at any one time, for all users
b) or per-user versioning

I would suggest going with option 'a' for simplicity's sake. In the end, several users at home can back up their digital media and important documents to an easy-to-use, well-integrated, centralized location, without fear of losing their data, and with the added benefit that they can use a copy of that data from any point in time at which a snapshot was taken.
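Option 'a' could be as simple as the following Python sketch. It is just one possible reading of "only one version is active at a time": the first user to open a file decides which version is active, other users may join on that same version, and a different version is refused until everyone has closed the file. The class and method names are hypothetical, and a real NAS would need this to work across network clients rather than inside a single process.

```python
class ActiveVersionLock:
    """Tracks the single active version of each file (illustrative only)."""

    def __init__(self):
        self._active = {}  # path -> (version, set of users holding it open)

    def acquire(self, path, version, user):
        """Try to open `path` at `version`; return True on success."""
        entry = self._active.get(path)
        if entry is None:
            # Nobody has the file open, so this user picks the version.
            self._active[path] = (version, {user})
            return True
        active_version, users = entry
        if active_version != version:
            # A different version is in use by someone else.
            return False
        users.add(user)
        return True

    def release(self, path, user):
        """Close the file for `user`; free the lock once nobody holds it."""
        entry = self._active.get(path)
        if entry is None:
            return
        _, users = entry
        users.discard(user)
        if not users:
            del self._active[path]
```

Per-user versioning (option 'b') would instead need a separate view of the file for each user, which is why I consider it the more complicated choice.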

Here is a small table containing typical use-cases for the general consumer.

                       Importance
File Type              Reliability?   Versioning?
Media (Audio/Video)    Yes            No
Media (Photos)         Yes            No
Business Documents     Yes            Yes
Tax Documents          Yes            Yes

As you can see, versioning is quite important for documents, but reliable storage is the one thing that nobody will do without. If we were to make a use case for people who work with multimedia, then versioning would become very important for media files as well.

I believe this translates into sales for any company that can implement a ZFS NAS for the home user, including management software. It does go against the current trend of the big players to migrate toward cloud computing, where all of the end users' data sits somewhere on a remote network. Realistically though, many people like to have direct access to their data with very little delay. Latency and transfer times would become a problem if cloud storage were used for all digital media, which also supports the business case for having ZFS at home.
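To put a rough number on that delay, here is some back-of-the-envelope arithmetic in Python. The link speeds are illustrative assumptions, not measurements: 1 Mbit/s stands in for a home broadband uplink and 100 Mbit/s for a wired home network, and 300 MB stands in for one of the data sets of several hundred megabytes mentioned earlier. Protocol overhead is ignored.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal transfer time: 1 megabyte = 8 megabits, no protocol overhead."""
    return size_megabytes * 8 / link_megabits_per_second


dataset_mb = 300  # assumed size of one data set

cloud_seconds = transfer_seconds(dataset_mb, 1)    # assumed 1 Mbit/s uplink
home_seconds = transfer_seconds(dataset_mb, 100)   # assumed 100 Mbit/s LAN

print(f"Upload to the cloud: {cloud_seconds / 60:.0f} minutes")
print(f"Copy to a home NAS:  {home_seconds:.0f} seconds")
```

Under these assumptions, saving one data set to the cloud takes about 40 minutes, against about 24 seconds on the home network.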

If anyone, particularly from Sun or any OEM of NAS equipment, feels that this idea has some merit, then please let me know. I'm open for business.

2 comments:

Edmond Cote said...

Couldn't you have just talked to me about this? I don't think that your idea makes too much business sense.

Not that I work on anything like this, but I get to talk to a lot of people at work and see where things are going. If you want, call me and we can chat about it sometime.

Unknown said...

There is the trend toward keeping data stored in "the cloud", which is probably the best argument against what I'm proposing. The second best argument is that, for Windows users, the entire infrastructure of a versioned filesystem would have to be built into Windows, and I'm pretty sure that the Windows filesystem backend is pretty awful. I personally wouldn't want to write the code for that, but I'm sure somebody would.