Okay, I’ve been heads down on 1.1.3/1.2 and ignoring the list, but I hear talk that folks are looking for a “roadmap”. This is something I sent around internally, and it is subject to heavy changes, but I thought I’d share it for what it is worth.
Here is a draft of the roadmap for OpenNMS, represented by pairs of version numbers reflecting the development/production cycle for each release.
Note that this list is by no means exhaustive, and represents the main points I want to cover.
Version: 1.1/1.2
Timeframe: As soon as possible
- This is almost complete. All the features are basically in, but many things are broken.
I have been spending time in Bugzilla cleaning things up so that I can set up a “blocker”
bug that will track when we can release 1.1.3.
- Installer (Bug 804): We need to fix the installer to handle the install more cleanly.
- Documentation: The documentation needs a lot of work. I want to sequester myself for a week
and fix this.
In the “would be nice” category:
- Performance enhancements to data collection
- Support for the latest Java: 1.4.2, as well as IBM’s SDK.
- OpenNMS on a disk: Using White Dwarf, support OpenNMS as a single CD iso.
Version: 1.3/1.4
Timeframe: release stable (1.4) in December
- Internationalization. By having versions of OpenNMS available worldwide, not only will we see increased adoption, but since people overseas tend to have a more open attitude toward open source it is important to allow them to use OpenNMS in their native tongue.
- Path Outage: The idea here is to implement a method of detecting network outages (at the beginning limited to layer 3) and handling sympathetic events to limit the number of notifications that get sent. For example, if we are monitoring ten servers on the other side of a router, and the router goes down, we will get at least 11 events: 10 nodeDown events from the servers and one from the router. Instead, we should get a single “path outage” event that the router is down.
- Nessus Integration: Network management is often defined by FCAPS: Fault, Configuration, Accounting, Performance and Security. We currently handle Fault and Performance, and I would like to add Security. Nessus is a vulnerability scanner. Much like our capsd process, vulscand scans the network, not for services, but for security holes. Much of the groundwork has been done on this; it just needs to be completed.
- Java: OpenNMS is mainly a Java product that requires some external code to handle various tasks. We need to make OpenNMS as pure Java as possible. Not only will that provide performance and code management benefits, it will look more professional to the open source world and will result in greater consideration of the product.
- webUI Redesign: Leveraging the design team at Blast, we need to look into improving the webUI.
- Installer improvements: Create a professional looking installer that can help new users get OpenNMS installed painlessly.
- webUI configuration: Increase the amount of configuration done via the webUI to reduce the reliance on hand editing of XML files.
- Availability report improvements: OpenNMS currently has poor availability reporting. Since availability is based on the outages table in the database, we could use something like Crystal Reports to really improve on how the outage information is displayed. I believe there are open source Java alternatives to Crystal Reports that we can leverage.
- User-based views and security: The ability to segment what part of the network can be seen by an OpenNMS user, and the ability to customize, for that user, how information is presented.
- Top Ten: We also need to look to the community and take their top ten enhancement requests into consideration as possible features.
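The Path Outage item above can be sketched in a few lines. This is only an illustration of the idea in Python, not OpenNMS code: the parent_of map (which node sits behind which), the function names and the "pathOutage" label are all assumptions made for the example, and it assumes each node has a single upstream parent.

```python
def root_cause(node, down, parent_of):
    """Return the topmost down node on the path from `node` toward the core."""
    cause = node
    seen = {node}
    parent = parent_of.get(node)
    while parent is not None and parent not in seen:
        if parent in down:
            cause = parent
        seen.add(parent)
        parent = parent_of.get(parent)
    return cause


def correlate(down_nodes, parent_of):
    """Collapse sympathetic nodeDown events into one event per root cause.

    Returns a list of (event_name, node, hidden_nodes) tuples. The event
    names are hypothetical labels, not an existing event definition.
    """
    down = set(down_nodes)
    hidden_behind = {}
    for node in down:
        cause = root_cause(node, down, parent_of)
        hidden_behind.setdefault(cause, [])
        if cause != node:
            hidden_behind[cause].append(node)
    return [
        ("pathOutage" if hidden else "nodeDown", cause, sorted(hidden))
        for cause, hidden in sorted(hidden_behind.items())
    ]


# Ten servers behind one router, as in the example above.
parent_of = {f"server{i}": "router" for i in range(1, 11)}
events = correlate(list(parent_of) + ["router"], parent_of)
```

With the router and all ten servers down, events holds a single entry for the router instead of eleven separate notifications. A lone server failure still comes through as an ordinary nodeDown.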
Version: 1.9/2.0
Timeframe: 1.9.0 available by the end of the year.
The idea of OpenNMS 2.0 is to provide a truly scalable management platform where resources can be added and removed from the system as needed. This also includes the ability for third parties to plug their technology into the framework, as well as redundancy and failover.
Here are some examples:
A large corporation needs to monitor a large number of devices that are spread out worldwide.
For remote offices they have instances of OpenNMS that do local polling and performance
measurement and send that information to the central Network Operations Center (NOC).
At headquarters there are a large number of servers to be monitored, so locally there are a number of pollers to help manage the load. If a poller starts to get behind, OpenNMS can instruct another system to take part of the load. A similar thing will happen if a poller should fail.
The architecture should support “n” tiers, meaning that I can consolidate information at multiple levels. So countries can consolidate their sites and roll that up into a region, regions can be consolidated into, say, continents, and continents can be consolidated into a single world view.
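As a rough picture of the "n" tier roll-up, here is a small Python sketch that sums outage counts from sites up through each level. The tier names, the children map and the rollup function are made up for illustration; how tiers would really be described in 2.0 is not decided here.

```python
def rollup(children, outages, root):
    """Return the total outage count for every tier at or below `root`."""
    totals = {}

    def visit(tier):
        # A tier's total is its own outages plus everything consolidated into it.
        total = outages.get(tier, 0)
        for child in children.get(tier, ()):
            total += visit(child)
        totals[tier] = total
        return total

    visit(root)
    return totals


# Hypothetical hierarchy: countries roll up into continents, continents into the world.
children = {
    "world": ["europe", "americas"],
    "europe": ["uk", "de"],
    "americas": ["us"],
}
outages = {"uk": 2, "de": 1, "us": 4}
totals = rollup(children, outages, "world")
```

The same function works for any depth, so adding a "region" level between country and continent only means adding entries to the children map.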
In addition, there should be a set API for integrating other vendors’ products into OpenNMS. Thus as OpenNMS reaches out for world domination, companies such as IBM and Cisco can easily use OpenNMS instead of their current platform.
Finally, the system should be self healing.
Hey You old timers? Remember me?!!!
I went digging in the dist source.. Still trying to ascertain a flow of sorts. Some of my initial observations:
TONs of Threads! (There’s got to be a better way!)
Has anyone taken a good look at SEDA? http://sourceforge.net/projects/seda
This stuff runs rings around the old thread per socket paradigm… In fact, I heard through the Grapevine that somebody was porting JBoss to utilize SEDA.
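For anyone who has not met SEDA: the idea is to split the work into stages joined by queues, with each stage served by a small fixed pool of threads, rather than one thread per socket. A minimal Python sketch of that shape follows. The Stage class, its method names and the poll/store handlers are invented for illustration and are not taken from the SEDA project or from the OpenNMS source.

```python
import queue
import threading


class Stage:
    """One SEDA-style stage: an inbox queue served by a fixed pool of workers."""

    def __init__(self, handler, workers=2, downstream=None):
        self.handler = handler
        self.downstream = downstream
        self.inbox = queue.Queue()
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, item):
        self.inbox.put(item)

    def _run(self):
        while True:
            item = self.inbox.get()
            try:
                result = self.handler(item)
                # Hand the result to the next stage before marking the item done,
                # so drain() on this stage implies everything was passed on.
                if self.downstream is not None:
                    self.downstream.submit(result)
            finally:
                self.inbox.task_done()

    def drain(self):
        """Block until every submitted item has been handled."""
        self.inbox.join()


def run_pipeline(addresses):
    """Push addresses through a fake 'poll' stage into a 'store' stage."""
    stored = []
    store = Stage(stored.append, workers=1)
    # The poll handler is a stand-in; a real one would talk to the network.
    poll = Stage(lambda addr: (addr, "up"), workers=4, downstream=store)
    for addr in addresses:
        poll.submit(addr)
    poll.drain()
    store.drain()
    return stored


results = run_pipeline([f"10.0.0.{i}" for i in range(1, 51)])
```

Fifty addresses are handled by five threads in total, and the thread count stays fixed no matter how many addresses are queued. A real stage would also need error handling in the worker loop and some form of queue length limit.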
Another MAJOR thought I had was concerning the integration of OpenNMS services in with the Globus Grid Toolkit. This would give you a secure way of doing a plug and play of services / pollers / UIs that would scale much larger than anything in the commercial realm. Check out http://www.globus.org and imagine OpenNMS running on 500 Linux boxes!
SNMPv3 needs to be supported…
Another thing I noticed was a single OID per GETNEXT… A lot of agents will take a GETNEXT with multiple OIDs. This may increase speed significantly.
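To put a number on that: walking the 22 columns of ifEntry one OID at a time costs 22 requests per row, while packing them ten to a request costs three. A small Python sketch of the batching step, with no SNMP library involved; batch_oids and the batch size of ten are assumptions for the example, and the safe batch size depends on the agent and the maximum message size.

```python
def batch_oids(oids, per_request):
    """Split a list of OIDs into groups, one group per GETNEXT request."""
    if per_request < 1:
        raise ValueError("per_request must be at least 1")
    return [oids[i:i + per_request] for i in range(0, len(oids), per_request)]


# The 22 columns of ifEntry (1.3.6.1.2.1.2.2.1.1 through .22).
IF_ENTRY_COLUMNS = [f"1.3.6.1.2.1.2.2.1.{n}" for n in range(1, 23)]

one_at_a_time = batch_oids(IF_ENTRY_COLUMNS, 1)
ten_at_a_time = batch_oids(IF_ENTRY_COLUMNS, 10)
```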
On the SNMPv2 GetBulk, you kinda have to guess how big the bulk range is. I did not see a function to take these table sizes and store them for subsequent polls to enable some level of efficiency regarding the guesstimation… (BTW, there’s an RFC published by the Network Management Research Group that proposes a GETBULK that is table focused…. No need to guess any more…)
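The "store the table size for the next poll" idea could look something like the Python sketch below. It is not based on the OpenNMS collector; BulkSizeCache, its default of 10 and its ceiling of 50 are assumptions picked for illustration, and the ceiling stands in for whatever keeps a response under the agent's maximum message size.

```python
class BulkSizeCache:
    """Remember how many rows each table had on the last poll."""

    def __init__(self, default=10, ceiling=50):
        self.default = default
        self.ceiling = ceiling
        self._rows = {}

    def max_repetitions(self, agent, table_oid):
        """Suggest a max-repetitions value for the next GETBULK on this table."""
        rows = self._rows.get((agent, table_oid))
        if rows is None:
            return self.default  # nothing known yet, fall back to a guess
        # Ask for one extra repetition so the reply runs past the end of the
        # table, which tells the caller the walk is finished.
        return min(rows + 1, self.ceiling)

    def record(self, agent, table_oid, rows_seen):
        """Store the row count observed during the poll that just finished."""
        self._rows[(agent, table_oid)] = rows_seen


cache = BulkSizeCache()
IF_TABLE = "1.3.6.1.2.1.2.2"
first_guess = cache.max_repetitions("10.0.0.1", IF_TABLE)
cache.record("10.0.0.1", IF_TABLE, 24)
second_guess = cache.max_repetitions("10.0.0.1", IF_TABLE)
```

On the first poll the cache can only guess; after a 24-row walk has been recorded, the next poll asks for 25 repetitions and can usually fetch a column in one round trip.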
Have you considered doing a Metadata DB and distributing RRD data? This could be an “embedded” sort of thing where everything gets frontended by daemons that use the Metadata DB… BerkeleyDB-XML comes to mind!!! VERY FAAAAAAAST!!!!! Probably handle > 50K inserts / sec on Linux…
Path outage is still a guess for me… I went looking for the schema to the DB… Maybe it’s there and I missed it… I need to see what’s discovered. Is all the data there?
Tables I’d need:
system
ifEntry
ipNetToMediaEntry
ipAddrEntry
Routers
atTableEntry
OSPF Area Tables
BGP Peering Tables
Switches
Bridge-MIB FDB tables.
Have you considered a SOAP interface into Events?
There’s a significant amount of real time event data missing… SYSLOG… Cisco outputs ~ 7500 different message types and it is significantly more verbose than Traps.
I went looking for the mapping… No can find. Reason I went looking is that I’m doing 3D VR mapping of the network @ Work… But I kinda exchanged the normal maps pair-o-dime to 50 cent… 😉 Most of my maps are < 50K in size… I currently run with a browser plugin but I have a Java class that’s OpenSores that can display the same thing and still give me an EAI compliant server connection.
What about WebStart? Viable or no?
On the Vendors API thingy… A lot of vendors use cookie and/or cert based authentication schemes. Probably need to take a looksee at how to proxy these “under the covers” so that a single user sign-on is possible. Like sign on once and get access to ARSWeb, CW2K, and maybe even WebTop…
Self Healing… I think a start to this is a Control port… I have some compadres that use an open JAVA based control port mechanism. It gives you the ability to “see” what objects are instantiated, what threads are spawned, etc. via a CTRL port. Another guy I know is using a Process manager to check the Control ports to make sure stuff is doing stuff. (Lemme go find out about that…)
Dougie!!!
I noticed you’re still using Bugzilla. Nice app, if you don’t mind using a brain-dead perl CGI that requires you to do manual DB schema updates every time you upgrade 😉
I’ve been using JIRA: http://www.atlassian.com/ and it’s free for OSS so you guys at OpenNMS should love it. Also, it’s Java. It’s one of the best Java apps I’ve seen. It also has one of the best interfaces of any web app I’ve seen.
Check it out, even if it’s just for the experience, and I fully recommend you give it a try for your bug tracking system! It’s really smart, and has a Bugzilla import system too! 🙂
Man, I should sell Atlassian’s gear professionally and get paid for it … sadly, I’m just a happy user 🙂