Ramin's blog (ramin at ramin dot net)

Sunday, January 25, 2009

Elements of a monitoring system



A good monitoring system is hard to find. There are plenty of tools/scripts/applications that provide a solution for a narrow use case. For example, someone wants a way to poll a service to see when it is not available, so they write a script to do that. There are hardly any systems that provide a good end-to-end solution for monitoring. This is the primary reason why we chose to develop our own monitoring system inside Yahoo. It was the only way we could provide a solution that was flexible enough to fit the diverse usage in Yahoo while also being designed around the way Yahoo service engineers operated. Scalability is a big factor for Yahoo, and none of the existing solutions addressed scalability to the extent that Yahoo required.

What are the various elements of a complete solution for monitoring?
They are (in no particular order):


* Data Collection

* Status Tracking

* Alert Generation

* Storage

* Configuration Management

* User Interface



You can argue the list is too short (or too long), but the purpose of the list is to capture the main areas (the main elements). Each area can be broken down into sub-areas. I'll cover each area in a little more depth to provide some clarity. Keep in mind that this topic has so much detail that a whole book could be devoted to it!
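
To make the six elements a bit more concrete, here is a toy sketch in Python of how they might fit together in a single process. It is only an illustration and says nothing about how Yahoo's system was built: the names `Check`, `Monitor`, `run_once` and `report`, the two status values, and the simple "value above threshold" rule are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Check:
    """Configuration management: one monitored item and its settings (hypothetical)."""
    name: str
    collect: Callable[[], float]  # data collection: returns one measurement
    threshold: float              # values above this are treated as CRITICAL


@dataclass
class Monitor:
    checks: List[Check]
    history: Dict[str, List[float]] = field(default_factory=dict)  # storage
    status: Dict[str, str] = field(default_factory=dict)           # status tracking
    alerts: List[str] = field(default_factory=list)                # alert generation

    def run_once(self) -> None:
        """Collect one sample per check, store it, and update its status."""
        for check in self.checks:
            value = check.collect()
            self.history.setdefault(check.name, []).append(value)
            new = "CRITICAL" if value > check.threshold else "OK"
            old = self.status.get(check.name, "OK")
            # Alert only when the status changes, not on every bad sample.
            if new != old:
                self.alerts.append(f"{check.name}: {old} -> {new} (value={value})")
            self.status[check.name] = new

    def report(self) -> str:
        """User interface: a plain-text summary of the current status."""
        return "\n".join(f"{name}: {state}" for name, state in sorted(self.status.items()))


# Usage with canned measurements instead of a real service.
samples = iter([0.2, 0.9, 0.95, 0.1])
monitor = Monitor([Check("latency_seconds", lambda: next(samples), threshold=0.5)])
for _ in range(4):
    monitor.run_once()
print(monitor.report())
print(monitor.alerts)
```

Even in a toy like this, each element is a separate concern: the collector knows nothing about alerting, and an alert is raised on a status change rather than on every bad sample. A real system would split these pieces across processes and machines, which is where most of the difficulty comes from.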

About Me

Ramin Naimi
I have over 18 years of experience in various high-tech industries. I am currently leading the Web Infrastructure team at TinyPrints, a small company that is revolutionizing the greeting card business. In the recent past, I managed Yahoo’s monitoring infrastructure group (part of the platform engineering group). We developed and operated Yahoo’s internal monitoring and operational metrics collection systems. I have a wide range of experience, from client-side development to distributed servers.