The Guru College
A Python For The Web
I’ve been learning Python recently at work. It’s a decent language that has been well thought out, and is honestly better suited to large projects than Perl, my language of choice. I hate to say it, but it’s true. As Rami pointed out recently, no language is good at scale if there isn’t a modern, commercially viable computer game written in it. Can’t think of Python’s example? EVE Online is written in Stackless, a Python variant. Meanwhile, I can’t find a shipping game written in Perl. If someone can throw me an example in the comments, I’d appreciate it. (No – not an engine with Perl bindings, but an engine written in Perl that has been a commercial success.)
That aside, getting into Python means needing a project. I have a couple at work that I’m juggling, but I’ve also got copious amounts of free time at night, with a toddler and all. So, I’ve been working with the web framework django. Like Python, it saves a lot of time, as it’s really well thought out. The deeper I delve into it, the more I like it. I’m going to take a stab at writing a photoblog in it, and I may even port my consulting-project-nagios code to it, as it lends itself better to almost all of those functions. (So, look for sites on http://gurucollege.homeftp.net/ again soon!)
The model-view-template ethos is deeply ingrained, and this in turn fits the model of web design and rapid development fairly well. The default template system is straightforward and well thought out, and if you don’t like it, it can be replaced with TGenshi or a number of other Python template engines. As limiting as it is to have the SQL structures force-fed to you, in a way it’s also really nice. You never worry about INSERT versus UPDATE – you just call .save(). You don’t have to write or debug SQL joins – it’s just a matter of Model.objects.filter(key='var'). And while it’s a pain in the rear, you can work with legacy database structures.
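To make that concrete, here’s a minimal sketch of the pattern – the Photo model, its fields, and the query are my own illustration, not code from my project:

```python
# A minimal sketch of the Django ORM behaviour described above.
# The Photo model and its fields are hypothetical examples.
import datetime

from django.db import models


class Photo(models.Model):
    title = models.CharField(max_length=200)
    taken_at = models.DateTimeField(default=datetime.datetime.now)
    iso = models.IntegerField()


photo = Photo(title='Bagel shop at dawn', iso=1800)
photo.save()      # no primary key yet, so Django issues an INSERT
photo.iso = 2000
photo.save()      # the object now has a primary key, so Django issues an UPDATE

# No hand-written SQL, no JOIN debugging:
high_iso = Photo.objects.filter(iso__gte=1600)
```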
Python and django aren’t perfect, and I’m not going to try to convert the guys at work over to it yet. But there are a number of places where it seems to fit in.
Using Nagios Logs For Availability Calculation
I’m of the opinion that using Nagios log data for any kind of statistical percentage data is a bad idea. My employer runs a distributed, load balanced Nagios system (that I architected, deployed and currently maintain), so I am in a position to have looked at this problem repeatedly. Nagios gets almost all of its data by polling a host to see if a given service is responding properly. To keep load down on servers, checks are usually run no more frequently than once every 15 minutes. Two things come to mind that must be overcome before you can assess Nagios log data properly: granularity of event data, and the fact that Nagios servers aren’t perfect.
First, granularity. Let’s take a hypothetical look at a service that is checked every 13 minutes by the Nagios polling servers (node1 and node2), which report events to the master notification nodes (master1 and master2):
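Here’s the timing problem as a stripped-down Python sketch – a toy model of the schedule, not actual Nagios output and not our real configuration:

```python
# Toy model of check granularity: a service polled every 13 minutes,
# and an outage that starts just after one check and ends just before the next.
CHECK_INTERVAL = 13 * 60                              # seconds between checks

checks = [n * CHECK_INTERVAL for n in range(10)]      # check times over ~2 hours

outage_start = 2 * CHECK_INTERVAL + 5                 # begins 5 seconds after a check
outage_end = outage_start + 12 * 60 + 50              # lasts 12 minutes 50 seconds

seen = any(outage_start <= t < outage_end for t in checks)
print("outage observed by any check:", seen)          # -> False
```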
An outage just shy of 13 minutes is never reported, even though it happened at 8:00 AM. Many users know their service was down, but the official uptime report says that all is well and you are still at 100% availability. The next night, this happens:
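Sketching it the same way, with hypothetical retry settings (a one-minute retry interval and five check attempts – not necessarily what we run in production):

```python
# Toy model of the opposite failure mode: over-reporting.
CHECK_INTERVAL = 13   # minutes between regular checks
RETRY_INTERVAL = 1    # minutes between SOFT re-checks
MAX_ATTEMPTS = 5      # failed checks before the HARD state is logged

first_failure = 0                                                     # a check lands just as the service dies
hard_critical = first_failure + (MAX_ATTEMPTS - 1) * RETRY_INTERVAL   # minute 4
next_regular = hard_critical + CHECK_INTERVAL                         # minute 17: first check that sees OK

# The service actually recovers seconds after the minute-4 retry (a 4 minute
# outage), but the logs show CRITICAL from the first failure until the OK check.
print("reported down for", next_regular - first_failure, "minutes")   # -> 17
```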
By failing enough times and tripping the HARD state, Nagios falls back into its regular checking routine. In this case, a 4 minute outage is reported as CRITICAL for 17 minutes. Yes, I fully acknowledge that this is a worst-case-scenario example, but nothing really strange has to happen to make it fail in this manner. If the check interval is set to 30 minutes instead of 13, an outage of 30 minutes can be totally missed, or a 4 minute outage can be reported as a 34 minute outage. Going the other way and setting the check interval to 2 minutes is just as bad a solution, as we have over 5000 service checks that are executed on both poller nodes – a 2 minute interval would melt the face off our servers. Perhaps when we move to Merlin this summer, this can be fine-tuned.
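To summarize the arithmetic above in one place, under the same toy assumptions (retries take about four minutes to trip the HARD state):

```python
def worst_cases(check_interval_minutes, retry_window=4):
    """Rough worst-case bounds for a given check interval (toy model, not Nagios)."""
    longest_missed = check_interval_minutes                  # an outage just shy of this is never seen
    short_outage_reported_as = retry_window + check_interval_minutes
    return longest_missed, short_outage_reported_as

print(worst_cases(13))   # -> (13, 17)
print(worst_cases(30))   # -> (30, 34)
```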
The second problem comes from the fact that the Nagios system was designed for reliability and notifications, not for statistical accuracy. The system is designed so that outages don’t impact its ability to report on service problems in a timely way. Another way to phrase this: is an outage seen from master1 but not seen from master2 an outage? Unfortunately, the front end nodes don’t always have the same data. Recent events at work back this up:
- Recent switch maintenance gave master2 a very different view of the network than master1, as master2 was on the switch stack that the Networking group was working on. If you compared the log files, you would get very different numbers about host and service availability.
- A few weeks ago, master2 was moved between racks, and it was down for 45 minutes. For those 45 minutes, in the view of master2, no state changes took place.
- A few weeks before that, the NSCA daemon on master1 hung, and stopped accepting service check results for over an hour before it was caught and fixed. For that hour, master1 thought that no state changes took place.
That list goes on and on, pretty much forever. The nature of services is that they go down from time to time, including the monitoring services. In order to accurately correlate the data and calculate statistical availability numbers, you’d have to keep track of every time something happened to the Nagios servers, and adjust your results accordingly. Remember: it’s a different adjustment if a poller goes down, or if there is network separation, or if a firewall allows one poller node to check a remote host while the other poller node can’t communicate. The data about outages needed to do that kind of processing isn’t recorded by anyone where I work, and even if it were, it wouldn’t be simple to translate that information into availability numbers. And again, Merlin may help make the numbers consistent by making state data a shared resource, but that is very much not a common use scenario.
I’m sure I could go on, but there’s just no way I can see to justify relying on anything other than the log files from the hosts providing a service to see when the service was up or down. When people can log in and access their data, the service is up. When they get errors or get denied entry, it’s down.
Photo Tips
Nothing much to say today, just a comment: pixiq.com posts some great content. Like the recent walk-through of illuminating product shots, where the product is glass and can be tricky to light.
Nokia Is Burning
Stephen Elop, CEO of Nokia, in an open letter to the company:
The battle of devices has now become a war of ecosystems, where ecosystems include not only the hardware and software of the device, but developers, applications, ecommerce, advertising, search, social applications, location-based services, unified communications and many other things. Our competitors aren’t taking our market share with devices; they are taking our market share with an entire ecosystem. This means we’re going to have to decide how we either build, catalyse or join an ecosystem.
I would honestly be unsurprised if 24 months from now there are three major smartphone players left: Apple at the middle to high end, Android everywhere, and Microsoft screwing about trying to figure out who they are in the mobile space. Apple will likely stay where they are with their focus on unparalleled user experience and unheard-of profit margins; Android appears to be the de facto response to the iPhone, and it costs carriers very little in terms of short-term cash; and Microsoft, who can afford to pour billions of dollars down the drain getting back into the market with Windows Phone 7.
Symbian, MeeGo, WebOS and a dozen other software platforms? I don’t see a long term future for them. They cost a lot to develop internally, they cost a lot to maintain, and they cost a lot to market. Android is a zero-cost solution, and I suspect we’ll see Android phones available in the feature phone segment in AT&T and Verizon stores soon. There is still some time for the carriers to suck profits out of the specialness of a “smartphone” – but I suspect in 24 months all phones will be smart. To some degree or another.
Other Camera Things
The new camera uses SD or SDHC memory cards; the old camera used Compact Flash. No biggie – the prices are similar when you get quality cards. The problem? My card reader that is nailed to my bookshelf doesn’t have a working SDHC slot. (Hey, don’t give me that look. It was the 3.25″ media bay from my Shuttle). The other reader I have is pretty flimsy, and sticks out of the computer down at toddler level. So, it’s time to get a new card reader.
Or is it? It may be a better route to get an EyeFi Pro X2. It’s an 8GB SDHC card that does an interesting form of geo-tagging based on the SkyHook wireless system, as well as being able to act as an 802.11n wifi client. So you can transfer images from the camera back to your computer from any public wifi network in the world. Well, the “back to your computer” is only true for very large values of “your computer”, but I have no doubt that I can force it to work.
The downside? A new media bay to nail to my desk is $18. An EyeFi Pro X2 is $150. So, we’ll see how long I can last before I break down and need a better mechanism for photo transport. There’s also the issue that I want a number of other camera-related gizmos before too long.
The next bit of kit that I’ll be trying to get my hands on is the battery grip for the D7000. The $240-ish device lets me load an additional battery into the camera. It also gives me a portrait-orientation grip and a bit more bulk, which I really miss about my D200 and its battery grip. Also, the longer shooting life is nice – it means vacations are a little easier to pack for.
Finally, I want to spend more time with off-camera flash. It may seem ironic that I now have a camera that can take excellent pictures at ISO 5000 and yet I want to add a lot of light and shoot at ISO 100, but it’s one of the things I want to do. I want to be able to take better portraits – ones that actually have some feeling in them, not just the “hey, it’s time for a Christmas card” kind of photo. I want product photos to “pop”. To further this goal, I’ve registered to ride The Flash Bus when it’s in town in March. So it would be nice to get a pair of the Lumo Pro 160s to play with before March…
But I also have a family to play with, a job to do, and life to get on with. So all of these will wait. I guess it’s a new card reader, right?
Catching Up With Now
I’ve been rather out of sorts recently – work, home, snow, and car stuff going on. Finally catching up tonight it seems. The boy is in good health, the wife still tolerates me after all of my, well… being me. I’ve even gotten the office cleaned up! This is great! I’m almost caught up.
While catching up, I was digging through the paperwork that came with my new camera, and found an entry form for a $1,000 Nikon shopping spree.
Great! This may be my chance! Wait…
Nikon, it expired in March. Of 2009. Why is this shipping with a camera that was announced in October of 2010?
Whatever. I love the new camera. It can see better in the dark and take pictures faster than my old camera. But its buffer is smaller, the files are larger, and the RAW images can’t be processed in Adobe Photoshop CS4. Which means I’m running everything through a DNG converter… oh, wait. No I’m not. I’m just not using Photoshop anymore. Which is sad, because Photoshop really does have superior image editing abilities to Aperture.
D7000 in low light
The D7000 is worlds above my old camera when it comes to low-light photography. It focuses a little faster, shoots 1080p video, has a longer and less flashy neck strap, fires all my Speedlights remotely, and has a 100% viewfinder. It’s almost everything I wanted in a new camera. The “almost”: it’s smaller and lighter than the D200, which means my hand wraps around the bottom a little bit, and my hand tremor catches up with me faster. And it works with the $5 wireless IR remote rather than the $100 wired remote. I had an IR remote. In 2009. When I had my D70.
Not that I’m whinging. This camera is amazing. ISO 1800@f/1.4 and 1/160th of a second lets you take handheld shots in a bagel shop at 6:40 AM before the sun is even over the trees. And you can just keep shooting.
Or, you can shoot indoors with any light:
Or, you can shoot in the middle of the night. Well, with the light from a few cars driving by, and a single streetlight. (ISO 6400@f/1.8):