Wednesday, August 30, 2006

Monitoring server

A server for monitoring IT services is crucial for effective management of IT infrastructure and for guaranteeing SLAs. Two years ago we developed something like that, and in the end it was a really useful piece of software. ZABBIX is a very interesting open-source project for network and service monitoring. For further information, check out http://www.zabbix.com

6 comments:

cooper said...

Hi Dave,
'we developed': did you take part in Zabbix? I'm considering migrating from Nagios. Zabbix looks very good, but I've heard (on root.cz) that it has problems with a larger number of monitored hosts/services. I'm talking about a scale of roughly 100 hosts x 500 services. What do you think?

cooper said...

Sorry, I forgot to write in English. In short: I am considering migrating from Nagios, but I have heard that Zabbix has problems with a higher number of monitored hosts/services. I am thinking of 100 hosts and 500 services...

David Pasek said...

Hi Cooper. 'We developed' means that Pechy and I worked together on our own proprietary systems. One was based on MRTG, and the second one (a watchdog system) was written in Perl. The watchdog system was a really simple Perl script plus a config file, with special external Perl hook functions that followed a defined interface. Anybody could write their own test hook functions, such as ping, flood_ping, http_ok, etc. I haven't tried Zabbix yet, but from what I've read it looks really good.
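The hook-based watchdog design described above can be sketched roughly as follows. This is not the original Perl code, which was never published; it is a minimal Python sketch assuming a hypothetical interface where every hook takes a target string and returns `(ok, message)`, and a config file with one `<hook> <target>` pair per line. The names `HOOKS`, `hook`, `parse_config`, and `run_checks` are illustrative only.

```python
import subprocess
import urllib.request
from typing import Callable, Dict, List, Tuple

# Hypothetical hook interface (an assumption, not the original one):
# a hook receives a target string and returns (ok, message).
HookResult = Tuple[bool, str]
Hook = Callable[[str], HookResult]

# Registry mapping the names used in the config file to hook functions.
HOOKS: Dict[str, Hook] = {}


def hook(name: str) -> Callable[[Hook], Hook]:
    """Register a test function under a name usable in the config file."""
    def register(func: Hook) -> Hook:
        HOOKS[name] = func
        return func
    return register


@hook("ping")
def ping(target: str) -> HookResult:
    # One ICMP echo via the system ping tool (Linux/macOS "-c" syntax).
    proc = subprocess.run(["ping", "-c", "1", target], capture_output=True)
    return proc.returncode == 0, f"ping exit code {proc.returncode}"


@hook("http_ok")
def http_ok(target: str) -> HookResult:
    # Succeeds only if the URL answers with HTTP 200 within 5 seconds.
    try:
        with urllib.request.urlopen(target, timeout=5) as resp:
            return resp.status == 200, f"HTTP {resp.status}"
    except Exception as exc:  # HTTP errors, DNS failures, timeouts, ...
        return False, str(exc)


def parse_config(text: str) -> List[Tuple[str, str]]:
    """Parse lines of the form '<hook> <target>'; '#' starts a comment."""
    checks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            name, target = line.split(None, 1)
            checks.append((name, target))
    return checks


def run_checks(checks: List[Tuple[str, str]]) -> List[Tuple[str, str, bool, str]]:
    """Run every configured check and collect (hook, target, ok, message)."""
    results = []
    for name, target in checks:
        func = HOOKS.get(name)
        if func is None:
            results.append((name, target, False, "unknown hook"))
            continue
        ok, msg = func(target)
        results.append((name, target, ok, msg))
    return results
```

With this layout, adding a test such as flood_ping means writing one more decorated function; the runner and the config format stay unchanged.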

Z Company said...

The problem with Zabbix is that it has a terrible number of bugs; they probably don't even test it before releasing it. You can't go into production with it.

Jakub Suchy said...

I think the comment on Root.cz about Zabbix's problems with more hosts was mine. We tried Zabbix and loved it, but immediately after we started migrating all our monitors to it, we ran into many problems, and eventually the only solution was to switch back to Nagios. That was 1-1.5 years ago, so it may be fixed now, but I doubt it. The problem was deep in Zabbix's structure; it was something like this:

Imagine you create a monitor that raises a SOFT alert (as in Nagios) after 5 minutes, another one after 5 more minutes, and then a HARD alert. This means 15 minutes pass before any alert is sent. But Zabbix didn't use real timestamps to mark these times, so if you configure many hosts (100+), the load on your server makes performing these tests a little slower. Zabbix doesn't detect that anything is slower, and you get a HARD alert in 3 minutes instead of 15. Sorry to be this vague, but it has been quite a long time since we last tried it.
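The exact Zabbix defect is not reproduced here, since the description above is admittedly vague. As a rough sketch of the alternative the comment implies, the code below keys escalation to real elapsed time since the first failed check, rather than to the number of check cycles that have run, so the delay does not depend on how fast or slow the checks are. The thresholds follow the scenario in the comment (SOFT at 5 minutes, SOFT again at 10, HARD at 15); `ServiceState` and `ESCALATION` are hypothetical names, and each check result is assumed to arrive with a wall-clock timestamp in seconds.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Escalation steps as (seconds since first failure, alert type), taken from
# the scenario above: SOFT at 5 min, SOFT again at 10 min, HARD at 15 min.
ESCALATION = [(300, "SOFT"), (600, "SOFT"), (900, "HARD")]


@dataclass
class ServiceState:
    first_failure: Optional[float] = None  # wall-clock time of first failed check
    level: int = 0                         # escalation steps already fired
    alerts: List[str] = field(default_factory=list)

    def record(self, now: float, ok: bool) -> None:
        """Process one check result taken at real timestamp `now` (seconds)."""
        if ok:
            # A successful check resets the escalation.
            self.first_failure = None
            self.level = 0
            return
        if self.first_failure is None:
            self.first_failure = now
        elapsed = now - self.first_failure
        # Fire every step whose deadline has passed, judged by elapsed real
        # time, never by how many check cycles have happened so far.
        while self.level < len(ESCALATION) and elapsed >= ESCALATION[self.level][0]:
            self.alerts.append(ESCALATION[self.level][1])
            self.level += 1
```

Because each deadline is compared against elapsed time, slower or faster check cycles only change how promptly a step is noticed (at most one check interval late), never how early it can fire.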

Jakub Suchy, Enlogit s.r.o.

David Pasek said...

I have never used Zabbix in production, so I don't have any real experience. I just read the documentation and tested it in my lab. On the other hand, Zabbix offers enterprise support, so they should fix bugs and suggest solutions for enterprise deployments. By the way, I found another interesting open-source product that I want to test. Look at http://davidpasek.blogspot.com/2008/01/system-management-monitoring-server.html