2014 Open Source Monitoring Conference

This year I got to return to the Open Source Monitoring Conference hosted by Netways in Nürnberg, Germany.

Netways is one of the sponsors of the Icinga project, and for many years this conference was dedicated to Nagios. It is still pretty Nagios-centric, but now it is focused more on the forks of that project than the project itself. There were presentations on Naemon and Sensu as well as Icinga, and then there are the weirdos (non-check script oriented applications) such as Zabbix and OpenNMS.

I like this conference for a number of reasons. Mainly there really isn’t any other conference dedicated to monitoring, much less one focused on open source. This one brings together pretty much the whole gang. Plus, Netways has a lot of experience in hosting conferences, so it is a nice time: well organized, good food and lots of discussion.

My trip started off with an ominous text from American Airlines telling me that my flight from RDU to DFW was delayed. While flying through DFW is out of the way, it enables me to avoid Heathrow, which is worth the extra time and effort. On the way to the airport I was told my outbound flight was delayed to the point that I wouldn’t be able to make my connection, so I called the airline to ask about options.

With the acquisition by US Airways, I had the option to fly through CLT. That would cut several hours off the trip and let me ride on an Airbus A330. American flies mainly Boeing equipment, so I was curious to see if the Airbus was any better.

As usual with flights to Europe, you leave late in the evening and arrive early in the morning. Ulf and I settled in for the flight and I was looking forward to meeting up with Ronny when we landed.

The trip was uneventful and we met up with Ronny and took the ICE train from the airport to Nürnberg. The conference is at the Holiday Inn hotel, and with nearly 300 of us there we kind of take over the place. I did think it was funny that on my first trip there the instructions on how to get to the hotel from the train station were not very direct. I found out the reason was that the most direct route takes you by the red light district and I guess they wanted us to avoid that, although I never felt unsafe wandering around the city.

We arrived mid-afternoon and checked in with Daniela to get our badges and other information. She is one of the people who work hard to make sure all attendees have a great time.

I managed to take a short nap and get settled in, and then we met up for dinner. The food at these events is really nice, and I’m always a fan of German beer.

I excused myself after the meal, due in part to jet lag and in part because I needed to finish my presentation, and I wanted to be ready for the first real day of the conference.

The conference was started by Bernd Erk, who is sort of the master of ceremonies.

He welcomed us and covered some housekeeping issues. The party that night was to be held at a place called Terminal 90, which is actually at the airport. Last time they tried to use buses, but it became pretty hard to organize, so this time they arranged for us to take public transportation via the U-Bahn. After the introduction we then broke into two tracks and I decided to stay to hear Kris Buytaert.

I’ve known Kris through his blog for years now, but this was the first time I got to see him in person. He is probably most famous in my circles for introducing the hashtag #monitoringsucks. Since I use OpenNMS I don’t really agree, but he does raise a number of issues that make monitoring difficult, and he covered some of the methods he uses to address them.

The rest of the day saw a number of good presentations. As this conference has a large number of Germans in attendance, a little less than half of the talks were given in German, but there was always an English-language track at the same time.

One of my favorite talks from the first day was on MQTT, a protocol for monitoring the Internet of Things. It addresses how to deal with devices that might not always be on-line, and was demonstrated via software running on a Raspberry Pi. I especially liked the idea of a “last will and testament” which describes how the device should be treated if it goes offline. I’m certain we’ll be incorporating MQTT into OpenNMS in the future.
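To make the "last will and testament" idea concrete, here is a toy model of the behavior in Python. This is not real MQTT (and certainly not the Raspberry Pi demo from the talk); the class and method names are purely illustrative. The key point is that the client registers its will with the broker up front, and the broker publishes it on the client's behalf only if the connection is lost without a clean disconnect:

```python
# Toy model of MQTT's "last will and testament" (LWT) behavior.
# Not a real MQTT implementation; all names here are made up.

class ToyBroker:
    def __init__(self):
        self.wills = {}        # client_id -> (topic, payload)
        self.messages = []     # (topic, payload) pairs "published"

    def connect(self, client_id, will_topic, will_payload):
        # The will is registered at connect time, before anything else.
        self.wills[client_id] = (will_topic, will_payload)

    def disconnect(self, client_id, clean=True):
        # A clean disconnect discards the will; a dropped connection
        # (say, a Raspberry Pi losing power) causes the broker to
        # publish it, so subscribers learn the device went offline.
        topic, payload = self.wills.pop(client_id)
        if not clean:
            self.messages.append((topic, payload))

broker = ToyBroker()
broker.connect("pi-1", "devices/pi-1/status", "offline")
broker.disconnect("pi-1", clean=False)   # connection lost, will fires
print(broker.messages)                   # [('devices/pi-1/status', 'offline')]
```

The clever part is that the device never has to announce its own death; the broker does it for it.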

Ronny and I missed the subway trip to the restaurant because I discovered a bug in my presentation configuration and it took me a little while to correct it, but I managed to get it done and we just grabbed a taxi. Even though it was at the airport, it was a nice venue and we caught up with Kris and my friend Rihards Olups from Zabbix. I first met Rihards at this conference several years ago and he brought me a couple of presents from Latvia (he lives near Riga). I still have the magnet on my office door.

Ulf, however, wasn’t as pleased to meet them.

We had a lot of fun eating, drinking and talking. The food was good and the staff was attentive. Ulf was much happier with our waitress (so was Ronny):

I had to call it an early night since my presentation was the first one on Thursday, but a lot of people didn’t. After the restaurant closed they moved on to “Checkpoint Jenny,” which was right across the street from the hotel (and under my window). Some were up until 6am.

Needless to say, the crowds were a little lighter for my talk. I think it went well, but next year I might focus more on why you might want to move away from check scripts to something a little more scalable. I did a really cool demo (well, in my mind) about sending events into OpenNMS to monitor the status of scripts running on remote servers, but it probably was hard to understand from a Nagios point of view.

Both Rihards and Kris made it to my talk, and Rihards once again brought gifts. I got a lot of tasty Latvian candy (now at the office, since my wife ordered me to get it out of the house before it all got eaten) as well as a bottle of Black Balsam, a liqueur local to the region.

Rihards spoke after lunch, and most people were mobile by then. I enjoyed his talk and was very impressed to learn that every version of the remote proxy ever written for Zabbix is still supported.

I had to head back to Frankfurt that evening so I could fly home on Friday (my father celebrated his 75th birthday and I didn’t want to miss it) but we did find time to get together for a beer before I left. It was cool to have people from so many different monitoring projects brought together through a love of open source.

Next year the conference is from 16-18 November. I plan to attend, and I hope to spend more time in Germany on that trip than I had available this time.

Shameless Promotion

Just a heads up that I have a couple of new websites that aren’t open source or OpenNMS related.

The first is tarus.io where I plan to put all of the geeky things that really don’t belong here, and because all of the cool kids seem to be getting .io addresses.

The second is forgottencocktails.com which is a blog where I’m trying to make all of the drinks in the seminal Vintage Spirits and Forgotten Cocktails book. You might notice a trend in the frequency of posts there versus here (grin).

I figure at least one of my three readers might be interested in such things, but not to worry as I’ll still be providing open source insight and reckless commentary here for your enjoyment.

Can a Service Outage be Fraud?

I’m in Germany for the always excellent Open Source Monitoring Conference (review coming) and I wanted to have data for my mobile phone. At the airport we stopped at a Relay store and bought an Ortel SIM card for 20 euros (well, €19.90). Since Ronny was with me I just let him activate the card (the process was mainly in German) and we got on the train to Nürnberg.

During the two hour trip I must have exhausted the small amount of default data that came with it, and thus began an odyssey that took over 24 hours to get resolved.

First we tried to go to the “Mein Ortel” site, but it was down.

Then, we downloaded the “Mein Ortel” app from Google Play. It loaded but we could never authenticate.

This lasted for hours.

After we had arrived at the hotel, we noticed that the website, at least, had become available. But whenever we tried to purchase more time, we’d get yet another error.

They do have a customer service number, but they charge €0.49 per minute to use it. In desperation we called it but they had closed for the day, so there was no resolution to be had on the first night.

The next day we tried, unsuccessfully, to get the web site and the app to work. Finally Ronny called, was put on hold (!), and was then told that they were having issues with their payment system. Why it would take so long to determine that the payment system was completely down that callers had to be put on hold is beyond me, but my guess is that Ortel just wanted to ratchet up a few more euros from me.

At lunch we went in search of another provider. We found a Base store that sold Ortel and Blau SIMs, but we were told that Blau may take up to 24 hours to activate. We then found a Vodafone store but they wanted €45 for a SIM. In the end, we decided to buy an Ortel voucher (the SIM was activated at least) for €15 and with the help of the lady at the Base store managed to get the credit applied, and I should have service for the remainder of my stay.

My question is: isn’t it fraud to take money for a service and then fail to deliver that service? I’m only here for three days and I was without data on my phone for more than a third of the trip, all because Ortel can’t be bothered to implement network management.

I’m doubly surprised that this happened in Germany, since they tend to be more strict on these things than most countries.

Yeah, I know “first world problems”, but as someone who is in this country with nearly 300 other professionals to discuss monitoring, it seems like Ortel could benefit from sending some people to this conference. As commercial network services become even more prevalent and important, I do expect to see the implementation of fines for outages.

Anyway, if you are ever offered the option to get mobile service from Ortel, run the other way.

Net Neutrality and Enron

Yesterday, Senator Ted Cruz from Texas tweeted the following:

It was in response to President Obama making a statement in support of Net Neutrality by wanting to classify broadband Internet as a utility. Despite the fact that it was about six years too late, I had to roll my eyes because I knew that if Obama came out in support of something, the Republicans would feel required to take the opposite stance.

Treating broadband as a utility is a no-brainer. It is basically an extension of the telephone system which has done very well as a utility, and it has become so important to most people and businesses that creating barriers to access would be a huge step backward. The OpenNMS Group would not have been able to survive in a world where we would have to pay to compete for access at levels that HP and IBM can afford, and there are thousands of other small businesses and entrepreneurs in the same boat.

But Senator Cruz and others have received a large amount of money from cable companies, especially Comcast, who stand to benefit the most if they can charge different rates to different content providers. This isn’t a new argument; Jon Stewart discussed it on his show back in 2006:

But now with Obama’s stance and the newly minted Republican-controlled Congress wanting to flex its muscles, expect it to become a hotter topic.

I was made aware of this through The Oatmeal, and while Matt Inman is dead on as usual, his language and analogies are, hmm, shall we say, not often for gentle ears. So while he makes his point he is basically preaching to the choir, and we need to frame the discussion in something that may actually shame the Republicans into doing the right thing.

Then I remembered Enron.

If broadband is not a utility, but seems like one, what could happen if we put control into private hands? That’s exactly what California did in 1996 by partially deregulating its energy market. This led to an energy crisis in 2000 and 2001 that, according to Wikipedia, was “caused by market manipulations, illegal shutdowns of pipelines by the Texas energy consortium Enron, and capped retail electricity prices”.

It’s eerie that Comcast’s shutdown of Netflix traffic is so similar to “illegal shutdowns of pipelines”. It’s already happening.

So, when faced with irrational statements like those from Senator Cruz, remain calm and just point out “so you think we need an Enron of the Internet?”. Keep saying it, over and over again.

Perhaps they’ll get the message.

Test Driven Development

One of the things that bothers me a lot about the software industry is this idea that proprietary software is somehow safer and better written than open source software. Perhaps it is because a lot of people still view software as “magic”, and since you can’t see the code, it must be more “magical”. Or perhaps it is because people assume that something you have to pay for must be better than something that is free.

I’ve worked for and with a number of proprietary software companies, so I’ve seen how the sausage is made, and in some cases you don’t want to know. Don’t get me wrong, I’ve seen well managed commercial software companies that produce solid code because in the long run solid code is better and costs less, but I’ve also seen the opposite done simply to get a product to market quickly.

With open source, at least if you expect contribution, you have to produce code that is readable. It also helps if it is well written since good programmers respect and like working with other good programmers. It’s out there for everyone to see, and that puts extra demands on its quality.

In the interest of making great code, many years ago we switched to the Spring framework, which had the benefit that we could start writing software tests. This test-driven development is one reason OpenNMS is able to stay so stable despite lots of code changes and a small test team.

What’s funny is that we’ve talked to at least two other companies who started implementing test-driven development but then dropped it because it was too hard. It wasn’t easy for us, either, but as of this writing we run 5496 tests every time something changes in the main OpenNMS application, and that doesn’t include all of the other branches and projects such as Newts. We use the Bamboo product from Atlassian to manage the tests, so I want to take this opportunity to thank them for supporting us.
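If you have never seen the practice, the gist is that the test states the desired behavior first and the code exists only to make it pass. A tiny illustration in Python (OpenNMS itself is Java and its suite is vastly larger, so this is just the shape of the idea, not our code; the function and test names are invented):

```python
# Minimal test-driven development illustration: the tests below were
# written first, against the desired behavior, and the implementation
# exists only to make them pass.
import unittest

def parse_interface(addr):
    """Split an 'ip/service' string into its parts, rejecting bad input."""
    ip, sep, service = addr.partition("/")
    if not sep or not ip or not service:
        raise ValueError("expected ip/service, got %r" % addr)
    return ip, service

class ParseInterfaceTest(unittest.TestCase):
    def test_splits_ip_and_service(self):
        self.assertEqual(parse_interface("172.20.1.38/Update"),
                         ("172.20.1.38", "Update"))

    def test_rejects_missing_service(self):
        with self.assertRaises(ValueError):
            parse_interface("172.20.1.38")

# Run the suite so a code change that breaks the contract fails loudly.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseInterfaceTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Multiply that by a few thousand tests run on every commit and you get the safety net that lets a small team make big changes with confidence.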

OpenNMS 14 contained some of the biggest code changes in the platform’s history, but so far it has been one of the smoothest releases yet. While most of that was due to the great team of developers we have, part of it was due to the transparency that the open source process encourages.

Commercial software could learn a thing or two from it.

OpenNMS 14 Timelines

I often talk about how OpenNMS is a platform and not just an application, and with the release of OpenNMS 14 there is a lovely way to demonstrate the difference.

There is a cool little GUI improvement, which I believe was started at last year’s Dev Jam, that provides a graphical timeline for outages. So now instead of having to look at the outage table on a node’s page, you can just look at the service availability section.

Cool, huh? What you may not realize is that instead of hardcoding the feature the timelines are rendered through ReST. The GUI sends a ReST request to the server which returns the graphic information. Let’s examine the “Update” service above.

The query

/opennms/rest/timeline/image/46/172.20.1.38/Update/1415119622/1415206023/480

returns the timeline image, with a format of:

/opennms/rest/timeline/image/{nodeId}/{ipAddress}/{serviceName}/{start}/{end}/{width}

Even the header graphic is done the same way

/opennms/rest/timeline/header/1415119622/1415206023/480

returns the header image, with a format of:

/opennms/rest/timeline/header/{start}/{end}/{width}

Of course, assembling all of that can be tedious, so this query:

/opennms/rest/timeline/html/46/172.20.1.38/Update/1415119622/1415206023/480

with a format of:

/opennms/rest/timeline/html/{nodeId}/{ipAddress}/{serviceName}/{start}/{end}/{width}

will create the whole HTML code needed to render the timeline:

document.write('<img src="/opennms/rest/timeline/image/46/172.20.1.38/Update/1415119622/1415206023/480" usemap="#46-172.20.1.38-Update">
<map name="46-172.20.1.38-Update"><area shape="rect" coords="128,2,412,18" href="/opennms/outage/detail.htm?id=153740" alt="Id 153740"
title="2014-11-04 18:13:24.628"><area shape="rect" coords="-111,2,-26,18" href="/opennms/outage/detail.htm?id=153724" alt="Id 153724"
title="2014-11-04 06:12:56.322"><area shape="rect" coords="-2051,2,-1925,18" href="/opennms/outage/detail.htm?id=153348" alt="Id 153348"
title="2014-10-31 06:13:11.421"><area shape="rect" coords="-2291,2,-2291,18" href="/opennms/outage/detail.htm?id=153289" alt="Id 153289"
title="2014-10-30 18:11:33.006"><area shape="rect" coords="-2691,2,-2397,18" href="/opennms/outage/detail.htm?id=153258" alt="Id 153258"
title="2014-10-29 22:13:27.086"><area shape="rect" coords="-2871,2,-2871,18" href="/opennms/outage/detail.htm?id=153235" alt="Id 153235"
title="2014-10-29 13:12:29.747"><area shape="rect" coords="-3071,2,-2884,18" href="/opennms/outage/detail.htm?id=153137" alt="Id 153137"
title="2014-10-29 03:12:13.887"><area shape="rect" coords="-3232,2,-3231,18" href="/opennms/outage/detail.htm?id=153132" alt="Id 153132"
title="2014-10-28 19:11:02.873"><area shape="rect" coords="-3690,2,-3670,18" href="/opennms/outage/detail.htm?id=153086" alt="Id 153086"
title="2014-10-27 20:14:11.949"><area shape="rect" coords="-6431,2,-6431,18" href="/opennms/outage/detail.htm?id=152786" alt="Id 152786"
title="2014-10-22 03:11:05.149"></map>');

If a service isn’t monitored, such as the StrafePing service in the above example, that empty timeline is also available:

/opennms/rest/timeline/empty/1415119622/1415206023/480

with a format of:

/opennms/rest/timeline/empty/{start}/{end}/{width}

Pretty cool, huh? A lot of OpenNMS is accessible via ReST, and the wiki page covers most of the options. Thus you can use the data via the OpenNMS GUI or integrate it into tools of your own.
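Assembling those paths by hand gets tedious, so in practice you would generate them. Here is a small Python helper (my own sketch, not part of OpenNMS) that reproduces the example URLs above from the documented formats:

```python
# Assemble the timeline ReST paths shown above. The URL formats come
# from the examples in this post; the helper functions are my own.

BASE = "/opennms/rest/timeline"

def timeline_image(node_id, ip, service, start, end, width):
    return f"{BASE}/image/{node_id}/{ip}/{service}/{start}/{end}/{width}"

def timeline_header(start, end, width):
    return f"{BASE}/header/{start}/{end}/{width}"

def timeline_html(node_id, ip, service, start, end, width):
    return f"{BASE}/html/{node_id}/{ip}/{service}/{start}/{end}/{width}"

def timeline_empty(start, end, width):
    return f"{BASE}/empty/{start}/{end}/{width}"

print(timeline_image(46, "172.20.1.38", "Update", 1415119622, 1415206023, 480))
# → /opennms/rest/timeline/image/46/172.20.1.38/Update/1415119622/1415206023/480
```

Note that the start and end values are Unix timestamps in seconds and the final parameter is the rendered width in pixels.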

Announcing OpenNMS 14 and Newts 1.0

It is with great pleasure that I can announce the release of OpenNMS 14. Yup, you heard right, OpenNMS *fourteen*.

It’s been more than 12 years since OpenNMS 1.0, so we’ve decided to pull a Java and drop the “1.” from the version numbers. Also, we are doing away with stable and development branches. The Master branch has been replaced with the develop branch, which will be much more stable than development releases have been in the past, and we’ll name the next major stable release 15, followed by 16, etc. Do expect bug fix point releases as in the past, but the plan is to release more than one major release per year.

A good overview of all the new features in 14 can be found here:

https://github.com/OpenNMS/opennms/blob/release-14.0.0/WHATSNEW.md

The development team has been working almost non-stop over the last two months to make OpenNMS 14 the best and most tested version yet. A lot of things have been added, such as new topology and geographic maps, and some big things have been made better, such as linkd. Plus, oodles of little bugs have finally been closed, making the whole release seem more polished and easier to use.

Today we also released Newts 1.0, the first release in a new time series data storage library. Published under the Apache License, this technology is built on Cassandra and is aimed at meeting Big Data and Internet of Things needs by providing fast, hugely scalable and redundant data storage. You can find out more about this technology here:

http://newts.io

While not yet integrated with OpenNMS, the 1.0 release is the first step in the process. Users will have the option to replace the JRobin/RRDtool storage strategies with Newts. Since Newts stores raw data, there will be a number of options for post-processing and graphing that data, which I know many of you will find useful. Whether your data needs are simple or complex, Newts represents a way to meet them.
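To see why storing raw data matters, compare it with the write-time consolidation that RRD-style tools perform. A toy comparison in plain Python (this has nothing to do with the actual Newts API; the numbers are invented):

```python
# Toy illustration of raw-sample storage versus write-time consolidation.
# Nothing here uses the real Newts API; it just shows why keeping raw
# samples leaves your post-processing options open.

samples = [12.0, 15.0, 11.0, 90.0, 14.0, 13.0]  # e.g. response times in ms

# RRD-style: consolidate to a single average at write time.
# The 90 ms spike is smoothed away and the detail is gone forever.
consolidated = sum(samples) / len(samples)

# Raw storage: any aggregation can still be computed after the fact.
maximum = max(samples)
spikes = [s for s in samples if s > 50.0]

print(round(consolidated, 2))  # 25.83
print(maximum)                 # 90.0
print(spikes)                  # [90.0]
```

With consolidated storage the choice of aggregation is baked in when the sample is written; with raw storage you can decide later whether you care about averages, maxima, percentiles or outliers.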

Feel free to check out both projects. OpenNMS 14 should be in both the yum and apt repos, and as usual I welcome feedback as to what you think about it.