Monitoring – Shedding light on your infrastructure
In July 2010, I started working at 5AM Solutions (http://www.5amsolutions.com). One of the first tasks we tackled was to take a hard look at our monitoring and find the best monitoring system for the company. The process was what anyone would imagine from a software shop. Meetings were had, requirements were gathered, options were evaluated. Our process is outlined here, for better or worse.
In the beginning
5AM Solutions had the good fortune of growing quickly. As is often the case, the IT infrastructure wasn’t able to keep pace, despite the best efforts of a competent and dedicated team. The monitoring situation was a series of home-brewed scripts, some utilizing Hudson (http://jenkins-ci.org/) in very novel ways, to keep some basic health checks going on critical systems.
There were several issues with this approach that made it undesirable going forward.
· It would be extremely difficult, bordering on impossible, to scale
· There wasn’t any historical data retention for analysis
· There wasn’t a way to get quick-hit visual representations of data
· There wasn’t a consistent UI or methodology
When the 5AM IT Group sat down to discuss the issues with the existing monitoring set up and come up with a more permanent solution, everyone walked away with a handful of requirements for the new platform.
· The application stack had to be agreeable to our desires within the group. Quite simply, we didn’t want an application written in Perl or Java or .Net. We are all Python fanboys, and while MySQL is fine, we hold PostgreSQL near and dear to our hearts.
· We didn’t have time to learn a whole new language just to run an application. Of course each app will have its unique syntax and workflows, but we wanted to be able to use standard tools (e.g. XML or SQL) to import and export data, configure the application, and talk to an API.
· The solution had to have a short learning curve. The unofficial motto of 5AM IT is “do more with less”. We are committed to innovation and automation. Spending endless hours tweaking and supporting an application is not an option.
· We wanted an Open Source solution. 5AM releases as much of its source code as possible under variations of the GPL. The 5AM IT Group is also committed to FOSS, and contributes to multiple projects and organizations.
· We needed a monitoring application, not an entire helpdesk/inventory/monitoring/coffee solution. 5AM was already committed to the Atlassian software suite for Issue Tracking, and proper inventory management was to be considered further down the road. With that said, we also wanted a mature API, so we could automate and integrate whenever we had the opportunity.
· The application had to be flexible. More and more companies are adopting a hybrid approach to their infrastructure. IaaS, PaaS, and straight web-based applications make up an increasing share of the things that a Systems Administrator has to keep an eye on with regard to performance and availability.
· There had to be an established community for the product we chose. While we want to innovate, we didn’t have time to spend hour after hour solving basic issues with an immature application.
Choosing a Solution
With those requirements in mind, we began looking over the options. Some candidates were immediately eliminated due to their cost or implementation needs. After some research and consideration we ended up with a short list of contenders that included Zabbix, Nagios, and Zenoss.
Zenoss looked to be a great fit, on the surface. Zenoss Core is an open source project written in Python, with a large following. It has a very slick user interface as well. Unfortunately, multiple members of the IT group had deployed or maintained Zenoss installations that suffered from a history of stability issues.
Also, the Zenoss “open core” business model, where it offers a free but limited or degraded version and a fully functional paid one, is not a truly open source solution in our eyes, and was wholly undesirable.
Nagios made its way onto the short list because of its venerable history and massive community within IT, especially among Linux administrators. We spent a long time scratching our heads over both of those points. However, its business model mirrors that of Zenoss.
The final point that eliminated Nagios is often considered its strongest selling point: its age. The application architecture of Nagios, meaning what it defines as its mission and how it goes about it, is too limited, and we were concerned that we would run into those limits down the road. The face of technology changes every day, and none of us felt that Nagios had kept up, nor that its community had any desire for it to.
Zabbix was the next potential application to evaluate. Its feature list was impressive.
· The ability to run multiple types of distributed setups
· web-based checks
· agent-based checks
· SNMP/TCP/ICMP/UDP based passive checks
· simple installation
· mature JSON-RPC API
· light network footprint
· database agnostic (able to run on SQLite, Oracle, MySQL, and PostgreSQL currently)
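To give a flavour of the API item in that list, here is a minimal sketch in Python of what talking to Zabbix can look like. It assumes the JSON-RPC 2.0 endpoint that Zabbix documents at api_jsonrpc.php and uses the apiinfo.version method, which needs no login. The server address and the helper names build_request and call_api are our own placeholders for illustration, not part of Zabbix, and this is not necessarily how we wired things up at 5AM.

```python
import json
import urllib.request

# Hypothetical address; point this at your own Zabbix frontend.
ZABBIX_URL = "http://zabbix.example.com/api_jsonrpc.php"


def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }


def call_api(url, payload, timeout=10):
    """POST the payload to the API and return the 'result' field of the reply."""
    data = json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    # JSON-RPC reports failures in an 'error' member instead of 'result'.
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


# Example (needs a reachable server, so it is left commented out):
# print(call_api(ZABBIX_URL, build_request("apiinfo.version", [])))
```

Keeping the request builder separate from the HTTP call makes it easy to inspect or log a payload before anything goes over the wire, which is handy when you start automating against the API.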
The Zabbix business model is akin to that of OpenNMS (http://www.opennms.org), another open source monitoring system that is designed to solve a different sort of problem. At its heart is a for-profit company, which steers and contributes to a fully open source community. The company generates revenue by providing professional services such as training, specialized deployments, and support for the open source product.
The decision was made to go with Zabbix as our monitoring solution at 5AM.
In 2001, Alexei Vladishev decided that he didn’t like the monitoring tools that were available, and that he was going to do something about it. The product he came up with was named Zabbix. By releasing the software under the GPL (currently GPLv2), he began the Zabbix community.
That community now contains over 20,000 members. Downloads of the Zabbix software average approximately 500 daily. This doesn’t include the binary installations from repositories like EPEL and the Ubuntu universes.
In 2005, Alexei Vladishev launched Zabbix SIA in Riga, Latvia. This corporation steers the Zabbix project and also provides consulting, training, paid support and custom deployment services for Zabbix to customers. Zabbix SIA currently has partners and resellers on four continents.