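I poked at Nagios last month too, but got busy and dropped it Orz. This time I sat down and filled in the gaps. In short: installed on a single machine, Nagios only checks itself; add nagios-nrpe-server on the other machines and it is promoted to remote monitoring.
Just pick one machine as the Monitor and install the nagios3 environment on it (the nagios3 server). On each machine to be monitored, install nagios-nrpe-server, then configure which sources may monitor it and which commands they may run.
Servers to be monitored:
Some resources, such as disk space and CPU load, can only be measured on the server itself, so nagios-nrpe-server is used to make them queryable remotely.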
$ sudo apt-get install nagios-nrpe-server
$ sudo vim /etc/nagios/nrpe_local.cfg
allowed_hosts=127.0.0.1,MonitorServerIP
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 15% -c 5%
$ sudo service nagios-nrpe-server restart
$ netstat -at | grep nrpe
tcp 0 0 *:nrpe *:* LISTEN
$ grep nrpe /etc/services
nrpe 5666/tcp # Nagios Remote Plugin Executor
Test the connection locally; the same test also works from the Monitor Server against the server's IP:
$ telnet localhost 5666
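telnet only shows that the port is open. To see whether the commands defined in nrpe_local.cfg actually answer, they can be run from the Monitor Server with check_nrpe (installed by the nagios-nrpe-plugin package in the next step). A minimal sketch, assuming the Debian/Ubuntu plugin path used in this post; the function name nrpe_probe and the PLUGIN_DIR variable are made up for illustration, and ServerIP is a placeholder:

```shell
#!/bin/sh
# Directory where the Debian/Ubuntu packages place the plugins (assumption).
PLUGIN_DIR=${PLUGIN_DIR:-/usr/lib/nagios/plugins}

# Run one NRPE command on a remote host.
# $1 = host or IP of the monitored server
# $2 = command name as defined in nrpe_local.cfg (e.g. check_load)
nrpe_probe() {
    if [ ! -x "$PLUGIN_DIR/check_nrpe" ]; then
        echo "check_nrpe not found in $PLUGIN_DIR" >&2
        return 3    # 3 is the plugin convention for UNKNOWN
    fi
    "$PLUGIN_DIR/check_nrpe" -H "$1" -c "$2"
}

# Usage (replace ServerIP with the monitored server's address):
# nrpe_probe ServerIP check_load
# nrpe_probe ServerIP check_all_disks
```

If the Monitor's IP is missing from allowed_hosts on the monitored server, the call is expected to fail rather than return a load or disk result, which makes this a quick way to check that setting too.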
Monitor Server:
$ sudo apt-get install nagios3 nagios-nrpe-plugin
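Don't forget that the login account and password are set in /etc/nagios3/htpasswd.users. Also, /usr/lib/nagios/plugins/ already holds a pile of ready-to-use tools, for example checking whether Google is up: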
$ /usr/lib/nagios/plugins/check_http -H www.google.com
HTTP OK: HTTP/1.1 200 OK - 12316 bytes in 0.057 second response time |time=0.056623s;;;0.000000 size=12316B;;;0
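The plugins in that directory share one convention: the exit status carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and whatever follows the | on the output line is performance data. A small sketch of reading both by hand; the function names status_name, perfdata and run_check are made up for illustration and are not part of Nagios:

```shell
#!/bin/sh
# Map a plugin exit status to the Nagios state name.
status_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Print the performance data part of a plugin output line
# (everything after the first '|'), or an empty line if there is none.
perfdata() {
    case "$1" in
        *'|'*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

# Run any plugin command line and summarise state and perfdata.
run_check() {
    rc=0
    out=$("$@") || rc=$?
    echo "state=$(status_name "$rc") perfdata=$(perfdata "$out")"
}

# Usage:
# run_check /usr/lib/nagios/plugins/check_http -H www.google.com
```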
Next, define your own services and machines:
$ sudo vim /etc/nagios3/conf.d/my-server.cfg
# define host
define host {
host_name db
alias db.xxxx.com
address db.xxxx.com
hostgroups ssh-servers,remote-servers,https-servers
use generic-host
}
define host {
host_name www
alias www.xxxx.com
address www.xxxx.com
hostgroups ssh-servers,remote-servers,http-servers,https-servers
use generic-host
}
# define hostgroup
define hostgroup {
hostgroup_name mysql-servers
alias MySQL DB Service
members db
}
define hostgroup {
hostgroup_name https-servers
alias HTTPS Service
members db
}
define hostgroup {
hostgroup_name remote-servers
alias Remote Server
members db
}
# define service checking
define service {
hostgroup_name https-servers
service_description HTTPS
check_command check-https!$HOSTADDRESS$!443
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define service {
hostgroup_name remote-servers
service_description Remote NRPE CPU Load
check_command check_nrpe_1arg!check_load
use generic-service
notification_interval 0
}
define service {
hostgroup_name remote-servers
service_description Remote NRPE Disk Space
check_command check_nrpe_1arg!check_all_disks
use generic-service
notification_interval 0
}
# define commands
define command{
command_name check-https
command_line /usr/lib/nagios/plugins/check_http -I $ARG1$ -p $ARG2$ -S
}
$ sudo service nagios3 restart
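A typo in conf.d can keep nagios3 from coming back up after a restart, so it may be worth running Nagios's own pre-flight check (the -v option) first. A minimal sketch; the function name check_config and the NAGIOS_BIN / NAGIOS_CFG variables are made up for illustration, and /etc/nagios3/nagios.cfg is assumed to be the main config file, as in the Debian/Ubuntu package:

```shell
#!/bin/sh
# Binary name and main config path are assumptions based on the
# Debian/Ubuntu nagios3 package; override them if yours differ.
NAGIOS_BIN=${NAGIOS_BIN:-nagios3}
NAGIOS_CFG=${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}

# Run the pre-flight check; succeeds only if the checker exits with status 0.
check_config() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}

# Usage (as root or via sudo, so that every config file is readable):
# check_config && service nagios3 restart
```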
With that in place, log in at http://MonitorServerIP/nagios3 to see the current status. The checks above cover http, https, ssh, CPU load, disk space and so on. To also monitor the MySQL DB service, try check_mysql_health, which has to be downloaded separately:
Download and build check_mysql_health:
$ cd /tmp
$ wget -qO- http://labs.consol.de/download/shinken-nagios-plugins/check_mysql_health-2.1.8.2.tar.gz | tar -xzvf -
$ cd check_mysql_health-2.1.8.2
$ ./configure
$ make
$ sudo cp /tmp/check_mysql_health-2.1.8.2/plugins-scripts/check_mysql_health /usr/lib/nagios/plugins/
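Next, write the MySQL DB service check. The db host and the mysql-servers hostgroup were already defined above, so adjust those existing entries rather than adding second copies, since Nagios is likely to reject duplicate definitions: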
$ sudo vim /etc/nagios3/conf.d/my-server.cfg
define host {
host_name db
alias db.xxxx.com
address db.xxxx.com
hostgroups ssh-servers,mysql-servers
use generic-host
}
define hostgroup {
hostgroup_name mysql-servers
alias MySQL DB Service
members db
}
define service {
hostgroup_name mysql-servers
service_description MySQL Remote Connection
check_command check-mysql-db!$HOSTADDRESS$
#check_command check_tcp!3306
use generic-service
notification_interval 0 ; set > 0 if you want to be renotified
}
define command{
command_name check-mysql-db
command_line /usr/lib/nagios/plugins/check_mysql_health --hostname $ARG1$ --username nagios --password nagiospassword --mode querycache-hitrate --warning 90 --critical 95
}
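How check_command and command_line fit together: the part before the first ! names the command, and each following !-separated value fills $ARG1$, $ARG2$, and so on in that command's command_line. The sketch below only imitates that substitution in shell to make it visible; it is not how Nagios itself is implemented, the function name expand_check_command is made up, and it assumes simple arguments without |, & or backslashes:

```shell
#!/bin/sh
# Imitate Nagios argument passing for illustration.
# $1 = check_command value, e.g. 'check-https!db.xxxx.com!443'
# $2 = command_line template containing $ARG1$, $ARG2$, ...
expand_check_command() {
    spec=$1
    line=$2
    # Drop the command name; what remains is the '!'-separated argument list.
    case "$spec" in
        *'!'*) rest=${spec#*!} ;;
        *)     rest="" ;;
    esac
    i=1
    while [ -n "$rest" ]; do
        arg=${rest%%!*}
        # Replace every literal $ARGi$ with the i-th argument.
        line=$(printf '%s\n' "$line" | sed "s|\\\$ARG${i}\\\$|${arg}|g")
        case "$rest" in
            *'!'*) rest=${rest#*!} ;;
            *)     rest="" ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

# Usage:
# expand_check_command 'check-https!db.xxxx.com!443' \
#     '/usr/lib/nagios/plugins/check_http -I $ARG1$ -p $ARG2$ -S'
```

In the real configuration the first argument is written as $HOSTADDRESS$, which Nagios replaces with the host's address value before the arguments are handed over.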
Also, don't forget to create an account that the monitor server can use to connect to the DB server. On the DB server, create the nagios account:
mysql> GRANT usage ON *.* TO 'nagios'@'nagios_monitor_server' IDENTIFIED BY 'nagiospassword';
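Once the account exists, the plugin can be run by hand from the monitor server before Nagios schedules it, which separates login problems from config problems. A minimal sketch, assuming the plugin was copied to /usr/lib/nagios/plugins as above; the function name mysql_probe and the PLUGIN_DIR variable are made up for illustration, and connection-time is used here because that mode only measures how long connecting takes:

```shell
#!/bin/sh
# Directory the plugin was copied to in the build step (assumption).
PLUGIN_DIR=${PLUGIN_DIR:-/usr/lib/nagios/plugins}

# Try a plain connection to the DB server with the nagios account.
# $1 = DB host name or IP, $2 = password of the nagios account
mysql_probe() {
    if [ ! -x "$PLUGIN_DIR/check_mysql_health" ]; then
        echo "check_mysql_health not found in $PLUGIN_DIR" >&2
        return 3    # 3 is the plugin convention for UNKNOWN
    fi
    "$PLUGIN_DIR/check_mysql_health" --hostname "$1" \
        --username nagios --password "$2" --mode connection-time
}

# Usage (the password is visible in the process list while this runs):
# mysql_probe db.xxxx.com nagiospassword
```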