The Guru College
Replicated, shared-nothing file systems
I’m starting to get very tired of Sun/Oracle and their long-term stance on Solaris and ZFS. While I understand they own the technology (which they invented and implemented), I bought into OpenSolaris with the idea that it was… well, Open. It’s feeling less and less open now, especially since the community has, for all intents and purposes, been shut down, and “OpenSolaris” has turned into “Solaris 11 Express”.
Thankfully my file server is rocking along, and I’m backing up everything with CrashPlan, just in case, but it’s time to start looking at a new storage infrastructure. Having learned my lesson from running a single, expensive, monolithic storage server, I think it’s time to expand horizontally with low-cost GigE storage nodes, and build a distributed environment from them. The requirements are:
- It has to be truly open
- All data is replicated in real time between nodes
- Upgrades are seamless
- Good diagnostics and self healing
I’m honestly not sure this is possible. I’d also like to move in the direction of Shared Nothing – so I don’t have to invest in expensive quorum devices, or fenced/multi-homed storage. I’d like to be able to build nodes on the integrated Intel Atom boards, so the motherboard, CPU, power supply and RAM would cost about $160/node, leaving most of the rest of the budget for hard drive purchases. I’d also like the ability to boot from the network via PXE, or failing that, from a USB drive or CF adapter. I do have 8 drives to re-use from my current OpenSolaris servers, but they don’t come out to play until I have at least 3.5 TB of usable disk space that I can move everything to and test on for a little while.
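As a rough back-of-the-envelope sketch of what those numbers imply: if every piece of data is replicated between nodes, the raw disk I need is the usable space multiplied by the number of copies. The replica count (2) and node count (4) below are placeholder assumptions for illustration, not decisions; only the 3.5 TB and $160/node figures come from the plan above.

```python
def raw_capacity_tb(usable_tb, replicas):
    """Raw disk space needed when every block is stored `replicas` times."""
    return usable_tb * replicas


def base_hardware_cost(nodes, cost_per_node=160):
    """Cost of boards, CPUs, PSUs and RAM only; drives are extra."""
    return nodes * cost_per_node


usable_tb = 3.5   # from the plan above
replicas = 2      # assumption: two copies of everything
nodes = 4         # assumption: hypothetical cluster size

raw_tb = raw_capacity_tb(usable_tb, replicas)
print("raw capacity needed (TB):", raw_tb)
print("raw capacity per node (TB):", raw_tb / nodes)
print("base hardware cost ($):", base_hardware_cost(nodes))
```

Bumping the replica count to 3 pushes the raw requirement to 10.5 TB, which is why keeping the per-node cost low matters so much.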
There is a final requirement – the storage must be natively accessible from Mac OS X Lion. My definition of native in this case is a filesystem that Aperture thinks is local enough to load an Aperture Library from. (It’s very picky, and refuses to use NFS, SMB or even Apple’s own AFP.) So, this could be iSCSI, but it could also be MacFUSE, depending on how well that works, or another technology that has tapped into the VFS layer of Mac OS X.
I’m currently looking into GlusterFS, Ceph and the filesystem side of Hadoop (HDFS). In terms of technical architecture they seem the best suited to my needs, but I’m not sure yet whether they will actually work for me. The other contender is OpenAFS. I use it at work, so I know it will work, but it doesn’t yet support replicated read/write volumes, and without replicated storage it’s a non-starter for me.