The Guru College
Graphite and Nagios
I do a lot with Nagios and the data it generates. I use it to establish baselines for application performance and network traffic, as well as to help predict future needs for hardware expansion (or reduction, as the case may be). There’s nothing like being able to turn off a service because nobody uses it, even when a small but vocal set of users insist it’s vital.
“Never mess with the people who hold the data. You won’t win.”
To translate data into a usable form, and to do it in a way that lets management easily explain the data to other parts of management on campus, you need a graphing engine. Today, we’re going to talk about Graphite.
I came across Graphite shortly before it was linked from the SysAdmin Advent Calendar, and have fallen in love with it. It takes time-series data and stores it away for future use and cross-referencing. The network protocol it uses is very simple, and can be implemented in less than a dozen lines of straightforward Perl or Python. Graphite itself is three pieces: a Django webapp, a network listener/relay/caching engine called carbon, and a round-robin-style database backend. Carbon is the data management engine that writes incoming data to disk and allows it to be read back off the disk fairly efficiently. With carbon you can manage data redundancy (by sending data streams to multiple storage servers), adjust retention granularity, and manage disk subsystem utilization rates. Whisper, on the other hand, is the data storage format that carbon uses, built to overcome some limitations of the venerable (and amazing) RRDTool. The primary advantage from where I sit is that Whisper allows data to be inserted out of order into a database file, which is a huge saving grace for someone like me who wants to backfill or merge databases.
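To give a sense of how simple the protocol is, here is a rough sketch of a sender in Python. The hostname and metric name are made up for illustration, and port 2003 is carbon’s default plaintext listener port (check your carbon.conf if you’ve changed it):

    import socket
    import time

    CARBON_HOST = 'graphite.example.com'  # placeholder; use your carbon host
    CARBON_PORT = 2003                    # carbon's default plaintext listener port

    def send_metric(path, value, timestamp=None):
        # The plaintext protocol is one "metric.path value timestamp\n" line per datapoint
        if timestamp is None:
            timestamp = int(time.time())
        line = '%s %s %d\n' % (path, value, timestamp)
        sock = socket.create_connection((CARBON_HOST, CARBON_PORT))
        try:
            sock.sendall(line.encode('ascii'))
        finally:
            sock.close()

    # Push a (made up) Nagios performance data point into Graphite
    send_metric('nagios.www1.http.response_time', 0.042)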
One of the nice features of this system is that it’s written almost entirely in Python. This presents a challenge for our environment: packaging. For the same reason we don’t use CPAN on production systems, we also don’t use the Python equivalent, sudo python setup.py install. Not only do our servers need to be easily rebuilt with the current versions of the packages we use, those packages need to be installed in a way that lets us track which files belong to which package, as well as manage which dependencies are in play. When you have a Python package that relies on MySQL being installed, for example, it needs to just work at install time. This limited our deployment of Python-related tools… until I found the bdist subcommands, in our case invoked by running python setup.py bdist_rpm. It nicely takes the tarball, creates a buildable RPM out of it, and leaves the SPEC file in the right place so it can be tweaked for our environment. This lets us tie it into our mock build system and load the result into our many internal Yum repositories, which gives us easy, repeatable and harmonious package installation for Python packages.
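As a rough sketch of how this works, here is a minimal setup.py for a hypothetical internal tool; the package name, module name, and the MySQL-python requirement are placeholders, and the bdist_rpm flags shown are standard distutils options:

    # setup.py for a hypothetical internal package
    from distutils.core import setup

    setup(
        name='graphite-feeder',          # placeholder package name
        version='0.1',
        description='Pushes Nagios performance data into carbon',
        py_modules=['graphite_feeder'],  # placeholder module
    )

    # From the unpacked tarball:
    #   python setup.py bdist_rpm
    #       builds source and binary RPMs under dist/
    #   python setup.py bdist_rpm --spec-only
    #       just writes the SPEC file to dist/ so it can be tweaked
    #   python setup.py bdist_rpm --requires="MySQL-python"
    #       declares an RPM dependency, so the MySQL example above "just works"

Once the SPEC file has been tweaked, it can be fed straight into mock and the resulting RPM dropped into the internal Yum repositories.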
The next task is configuration. I’m not going to cover setting up Django, integrating it with Apache, or doing the SSO work to make it run harmoniously with an internal authentication system. Those are covered ad nauseam on Stack Exchange, and if you aren’t tall enough to ride that ride, you really need to catch up.