Scheduler Monitoring Done Right – Obsidian Scheduler

One of the most important features of Obsidian is the ability to be notified of application events and also quickly locate and correct any issues that arise. While products like Quartz and cron4j give us the basic scheduling, we have to develop our own solutions for monitoring and notification of events. Unfortunately, in many corporate environments, developers are very limited in how much time they can work on infrastructure pieces and things like error handling and monitoring end up getting far too little focus. When you are developing in-house or revenue-generating software, managers just want the product out the door, and aren’t worried about maintenance implications until far too late.

With these realities in mind, we’ve built in 3 levels of functionality to facilitate monitoring. Based on decades of software and operations experience, we’ve identified the primary areas where developers and support teams spend their time, and the ways that we could reduce costs so that using our scheduler would be a huge gain for our clients.

Level One – Logging Console

The first problem we identified is many applications rely on text logs generated by frameworks like log4j to troubleshoot and locate issues. The problem is, these logs can be very verbose, especially over time, and if not correctly configured, the information you need to troubleshoot an issue just may not be there. It is also a major pain to find what you are looking for within these logs. Add in issues with access control for production environments and delays from change requests, and you have a major headache on your hands.

To alleviate this problem, we developed a dashboard console within our scheduler web application. This dashboard has records of dozens of types of application events. This dashboard can be made accessible to read-only users, and can even be used by support teams with little software experience to see issues when they arise.Â Events can be filtered by type, time, severity and the contents of the message. Any time an event worth logging occurs, it is put into the dashboard history and can then be navigated from the dashboard console.

The real beauty of this approach is that the information is available at your fingertips and can easily be located again and again with a few simple clicks. All events are logged so there is no chance of missing information because it is at too fine a level, and you can easily configure our default job to clean old dashboard information. It also avoids the issue of granting production access to support teams, or having to contact administrators to provide logs.

Level Two – Notifications

Another big issue of system monitoring and maintenance is knowing immediately when things go wrong, or when unexpected things happen. We knew we needed some kind of notification support, but we had to carefully consider the best way to provide this to our users so they have the flexibility and simplicity they require to maintain their applications inexpensively.

Since we had such a clean and detailed system to log application events, we leveraged this to provide event notifications. Every event may be tied to a specific entity – this may be a scheduled job, a job configuration, etc. We also have severity for every event which range from Trace to Fatal (inspired by logging levels).

The natural way to approach notification was to allow users to subscribe to a certain severity level for events of a certain type. Not only can they be alerted when errors occur, but they can also subscribe to informational events or warnings.Â Users can also target a specific entity to limit when they are notified – for example, they may wish to know when any job fails, or when a specific job fails. We outlined every type of subscribable event and exposed a clean and simple interface within our administration console so that users could quickly change their event subscriptions, and even temporarily disable them.

We include email notification support out of the box currently and this also gives virtually all mobile users access to SMS notifications as well – see Wikipedia’s list of SMS email gateways to see the address to use for your carrier.

Level Three – Full and Complete Application Logging

While we log just about any event of any relevance within Obsidian, we also know that organizations have current standards and infrastructure for logging in place. In response to this, we use log4j to log all events logged to the dashboard, plus additional diagnostic information. Since we use the well-established log4j framework, you have access to all its appender types including text appenders, network appenders and JDBC appenders. In this way, developers can have customize Obsidian’s logging to fit their needs. We still believe our notification and event logging approach is tough to beat, but we want to make life as easy for our customers as possible. This approach also makes it easy for our clients to embed Obsidian within their applications while keeping logging consistent and simple.

If you have any questions about our experience or would like to suggest a feature for Obsidian’s monitoring and logging, leave a comment.