The Guru College

Flip Mino

We have been moaning for a while that we were missing memories of the little Q man for lack of a video capture device in the household. Last Friday, we went out and got a Flip Mino. The standard one – not the HD uber-ness. Maybe in a year or two, if we really use it, we’ll consider upgrading. It’s got the features we need – it’s simple, it’s durable, it fits in your pocket, and it doesn’t have anything to lose.

I’ll be honest – the video quality is nothing to write home about. Here’s a clip of Q in the car on a bright day, so there’s plenty of light but also a lot of movement. Still, it will work, and we’re going to keep it and keep using it. But we know it’s not a fine-art kind of thing. Hopefully, before he gets into things like school plays, RED cameras will be cheaper, or Nikon will make a DSLR with enough ISO sensitivity at a low enough price to be a reasonable purchase.

ZFS Dedup – wait, what?

As it stands, there is actually no reason to turn on dedup (apart from possibly less I/O) as you can’t actually use the space you save.

In short, they created the dedup engine and did all the work to make dedup possible in the filesystem, but never updated the dataset model to allow people to use the blocks freed by dedup’ing? Wait, what? They are aware of the problem, and I assume they will fix it, but still. This is a little crazy.

Schumpeter’s Gale

The opening up of new markets and the organizational development from the craft shop and factory to such concerns as US Steel illustrate the process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one … [The process] must be seen in its role in the perennial gale of creative destruction; it cannot be understood on the hypothesis that there is a perennial lull.

—Joseph Schumpeter, “The Process of Creative Destruction”, Capitalism, Socialism and Democracy, 1942

The process of creative destruction is an important one. A classic example is the death of the Polaroid instant film camera once digital cameras became popular. Others include the increasing marketplace irrelevance of IBM when Microsoft released Windows, and the turning of the tide against Microsoft towards Google. The lesson to be learned, on every level: expand your talents and skills, because one day you will have to stop doing things the way you currently do them. If you don’t, you will wake up one morning in the near future and realize that you are obsolete.

This is true of IT more than most other industries. Things are still moving fast enough that the market looks very different every few months. If you fall asleep at the wheel and keep doing the thing you’ve been doing for the last 5 years, you’ll wake up to find someone else doing it faster, cheaper and much, much better. The best large-scale example of this that comes to mind is email. For years, companies, universities and governments have been buying and maintaining their own email systems. Nobody really thought about the economies of scale – not even those who were in the market providing webmail (Hotmail comes to mind). They never figured out the two most important things about email: users need more space, and the interfaces suck. Before Google arrived on the scene, the average webmail provider handed out between 5 and 50 MB of disk space per mailbox. Universities offered between 5 and 500 MB, usually tending towards the lower numbers. Usability was a nightmare – most things were left to the desktop client, with painful webmail interfaces otherwise.

Google changed all of that. When they launched GMail, they announced that everyone would get a gigabyte of storage – and the world thought it was an April Fool’s joke, due to unfortunate timing. They had keyboard shortcuts that made sense for reading email, and most importantly, the system felt like it was built by people who really understood email. Today, 5 years later, GMail gives out 7.5GB of space without blinking an eye. It’s integrated into their Jabber server (GTalk) and their web-based office suite (Google Docs). They have single sign-on between all their related services. What makes it even crazier is what they charge for it. If you are only buying 50 or 500 accounts, it’s around $60/user/year (including 10 years of legally compliant archives). When you scale out to a university system, or the City of Los Angeles, the cost drops rapidly. The numbers I’ve heard are around $14/user/year.

Any mail server administrator who runs Exchange, GroupWise, or Lotus Notes needs to be paying attention. It’s getting to the point where, if you support fewer than 4,000 users, you are no longer cost effective. It will be cheaper to replace you, the servers you run, and your expensive benefits (401K, health insurance, etc.) with the faceless and unsleeping Google Apps administration. Another member of the organization will spend a few weeks scripting the Active Directory environment to auto-provision Google email accounts when new people show up (and, hopefully, disable accounts as people leave). The lowly mail server administrator’s career is going to be over in a few years.
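
The glue itself isn’t complicated. Here’s a rough sketch of that kind of sync script in Python – it assumes you can dump active users from the directory to a CSV file with a username column, and provision_account / suspend_account are hypothetical stand-ins for whatever Google Apps provisioning interface the organization actually wires up; none of the names here come from a real API.

    import csv

    def load_users(path):
        # Return the set of usernames in a CSV export with a 'username' column.
        with open(path, newline="") as fh:
            return {row["username"] for row in csv.DictReader(fh)}

    def provision_account(username):
        # Hypothetical stand-in: create the Google Apps mailbox for a new hire.
        print("would provision", username)

    def suspend_account(username):
        # Hypothetical stand-in: suspend the mailbox when someone leaves.
        print("would suspend", username)

    def sync(directory_csv, google_csv):
        directory = load_users(directory_csv)   # active users in the directory
        google = load_users(google_csv)         # accounts already in Google Apps
        for user in sorted(directory - google):
            provision_account(user)             # in the directory, not yet in Google
        for user in sorted(google - directory):
            suspend_account(user)               # gone from the directory, still in Google

    sync("ad_users.csv", "google_apps_users.csv")

A few weeks of work is mostly in the plumbing around a loop like that, not in the loop itself.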

Now is the time to learn other skills. Most good administrators aren’t good because they know one product very well. They are usually good because they know a whole range of products and skills very well. They look at Google Apps as a welcome thing: Google is offering to take a tedious, boring and repetitive task off their hands, and free them up to work on the next fun project. Control freaks and lesser administrators fear Google, because it’s going to replace their only skill set. Schumpeter’s perennial gale of creative destruction is blowing.

ZFS Dedup

It’s made the putback, and ZFS dedup is on its way to us. The scheduled build is b128 – which should hit our grubby little mitts in about a month’s time. Using zdb -S, I was able to pull a block signature list for the root pool (rpool) on the secondary fileserver, thor. There are currently 473,912 non-zero blocks in the pool, of which 376,823 are unique. This suggests I can dedup more than 97,000 blocks – or just over 20% of the blocks in the pool. I didn’t count block sizes directly, but the average blocksize on the rpool is just over 32K. That would be close to 3GB saved on a 15GB pool. This piqued my interest, so I ran the same block signature list generator against the zpool I had been sending Time Machine backups to. Total: 3.43 million blocks. After dedup’ing out duplicate checksums: 2.39 million. That’s a little over 30% of the blocks that could be dedup’ed. Here’s the kicker – the average blocksize for that pool is 124K. That’s over 120 GB of duplicate blocks, using my fuzzy math.
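
To make the fuzzy math explicit, here it is as a few lines of Python – the block counts come straight from the zdb -S runs above, and the block sizes are the rough averages I quoted, so the results are estimates, not exact on-disk numbers:

    # Back-of-the-envelope dedup savings from the zdb -S block counts above.
    def dedup_savings(total_blocks, unique_blocks, avg_block_size):
        # Return (duplicate blocks, duplicate fraction, bytes reclaimable).
        duplicates = total_blocks - unique_blocks
        return duplicates, duplicates / total_blocks, duplicates * avg_block_size

    GIB = 1024 ** 3

    # rpool: ~32K average block size
    dups, frac, saved = dedup_savings(473_912, 376_823, 32 * 1024)
    print(f"rpool: {dups} duplicate blocks ({frac:.0%}), ~{saved / GIB:.1f} GiB reclaimable")

    # Time Machine pool: ~124K average block size
    dups, frac, saved = dedup_savings(3_430_000, 2_390_000, 124 * 1024)
    print(f"tm pool: {dups} duplicate blocks ({frac:.0%}), ~{saved / GIB:.0f} GiB reclaimable")

That works out to roughly 97,089 duplicate blocks (about 3 GiB) on the rpool and about 1.04 million duplicate blocks (around 123 GiB) on the Time Machine pool.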

Thanks to Jeff Bonwick and all the others at Sun who have made this happen. The only other distantly promised feature I wait for now is the ability to add and remove top-level vdevs from raidz/raidz2 pools.

Photo Management

Something occurred to me last night, as I moved yet another Aperture project over to my ZFS share: the ability to arbitrarily relocate projects to different storage platforms essentially gives me a manual version of the buzzword from yesteryear, Information Lifecycle Management (ILM). In essence, old data (projects) can be moved to slower disks, where speed isn’t as critical, and active data can be kept on the faster disks. This way, the project you completed last year that you haven’t touched in months won’t be filling up the platters of your high speed drives.

It does suck that the granularity for Aperture relocation is the project, not the file or even the album. Perhaps Aperture 3.0 (or Aperture X, if the rumors are to be believed) will include that? While we’re dreaming, it would be really nice to be able to tell Aperture that it could use a number of disks or locations, and have it move files around based on usage patterns – a real, automatic policy. For example: anything not viewed in a month moves to the slowest storage, anything used in the last month but not in the last 48 hours goes on your regular disks, and files used in the last 48 hours get moved to the SSD or 15K SAS drive you keep for Photoshop scratch space. I know most users don’t have 15K drives in their machines, or even performance SSDs yet, but it would be a way for Apple to raise the bar on Lightroom. It would also allow photographers to free up scratch space for Photoshop and other image manipulation programs, which would help the appeal of Aperture to professional photographers.
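
Just to be clear about what I’m wishing for – Aperture does nothing like this today – here’s a rough sketch of the policy in Python. It assumes last-access time is a good enough proxy for “viewed,” and that three plain directories stand in for the three storage tiers; the mount points are made up for illustration.

    import shutil
    import time
    from pathlib import Path

    DAY = 86_400
    HOT_WINDOW = 2 * DAY      # viewed in the last 48 hours -> SSD / 15K scratch disk
    WARM_WINDOW = 30 * DAY    # viewed in the last month    -> regular disks
                              # anything older              -> slow archive disks

    def pick_tier(last_access, now, tiers):
        # Map a last-access timestamp to the tier directory it belongs on.
        age = now - last_access
        if age <= HOT_WINDOW:
            return tiers["fast"]
        if age <= WARM_WINDOW:
            return tiers["regular"]
        return tiers["slow"]

    def rebalance(tiers):
        # Sweep every tier and move files whose access age no longer matches it.
        now = time.time()
        for tier in tiers.values():
            for path in list(Path(tier).rglob("*")):
                if not path.is_file():
                    continue
                target = pick_tier(path.stat().st_atime, now, tiers)
                if Path(tier) != target:
                    shutil.move(str(path), str(target / path.name))

    # Hypothetical mount points, purely for illustration.
    rebalance({
        "fast": Path("/Volumes/scratch-ssd"),
        "regular": Path("/Volumes/raid"),
        "slow": Path("/Volumes/archive"),
    })

The policy itself is trivial; the hard part would be wiring it into the library so edits and previews follow the masters around.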

Back in the real world, I’m considering what it would take to set up an iSCSI volume on a zpool made from a mirror of fast disks to keep current projects on, and then move projects to the regular raidz-backed CIFS share once they are no longer active. About $200 would get me a fast 320GB pool for active projects, accelerated with an SSD and mirrored for failure protection. Not a bad deal. It makes me understand the appeal of the Hybrid Storage Pool a little better.

ThinkTank Camera Bag

As a birthday present, my wife got me a new camera bag – the one I mentioned a few days ago – the ThinkTank Digital Holster 40. As I was unwrapping it, I was a little nervous, as it didn’t seem all that big. But I’ve taken it out for two long walks now, attached to the belt of the Ergo baby carrier. With the boy on my back, the dog’s leash in my hand, and the camera on my hip, it all worked out very well. I managed a significant percentage of crisp 200mm shots, and after walking the 2 mile ‘long loop’, nothing hurt. In the past, a two mile walk with the camera and 70-200 hanging around my neck would have been a recipe for neck and shoulder pain. The only problem with the walks was that by the end of each there wasn’t enough light to shoot below ISO 800 – and ISO 800 looks pretty bad when you’re zoomed in to 200mm.

My only complaint about the bag is that the rain fly is pretty bulky, and it sandwiches the camera in a little too tightly for my tastes. Handily, it’s removable. I’m going to have to play with it a little to see if I can make it work, though, as the rain fly will be very useful to have, especially when hiking or camping.

Aperture versus Lightroom

I’ve begun the horrid task of relocating all of my master images from Aperture to a ZFS-backed CIFS share. It’s slow and painful, but it needs to happen, as it’s one of the steps in the chain to make my photo management a little more sane. Right now, my Aperture library has grown to 550 GB. This is with all of the master images, plus all edits, databases of metadata, tags, and multiple revisions of files. It’s impossible to back up, and it’s chewing up far too much disk space when viewed as a monolithic block.

By relocating the master files to a network share, they become easier to access and therefore easier to back up. The image files will be separated from the Aperture data, making each easier to maintain and manage. I can also use other image manipulation programs on the files – possibly even setting up a photo sharing web application, hosted from the file server itself. The other thing this will allow me to do is evaluate Adobe Lightroom. I love Aperture, and it’s not missing any features per se, but it’s been a long time since it was upgraded. Not nearly as bad as the 10 years that HyperCard sat as an inactive product on Apple’s website, but still. I want to know that I have options. This won’t preserve my edits, but at least I’ll be able to get at the files themselves.

The other issue: ZFS is getting deduplication, soon. Hopefully this year. Which means by moving the image files to a ZFS share, I can take advantage of block level deduplication for my image files. This may not get me terribly much with different files, but I know I have a lot of copies of the same images. It would be excruciating to try to get rid of duplicates by hand. ZFS will make this a whole lot less bad.
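
For comparison, doing that job by hand at the file level would look something like the sketch below: hash every master, group files by checksum, and flag any group with more than one copy. ZFS dedup does essentially this per block and transparently, which is exactly why I’d rather let the filesystem handle it. The path here is made up, and this is only an illustration of the idea, not how ZFS itself is implemented.

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    def file_hash(path, chunk=1024 * 1024):
        # Hash the file in 1 MB chunks so big master images don't blow out memory.
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            for block in iter(lambda: fh.read(chunk), b""):
                digest.update(block)
        return digest.hexdigest()

    def find_duplicates(root):
        # Group every file under `root` by content hash; groups of 2+ are exact copies.
        groups = defaultdict(list)
        for path in Path(root).rglob("*"):
            if path.is_file():
                groups[file_hash(path)].append(path)
        return {h: paths for h, paths in groups.items() if len(paths) > 1}

    for digest, paths in find_duplicates("/Volumes/masters").items():
        print(digest[:12], [str(p) for p in paths])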
