Tuesday, April 15, 2014

[Linux] Periodically checking system status with Nagios and nagios-nrpe-server



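I poked at nagios a bit last month too, but got too busy and dropped it Orz. This time I sat down and filled in the gaps. In short: installed on a single machine, it only checks that machine itself; add nagios-nrpe-server and it becomes remote monitoring.

Just pick one machine as the Monitor and install the nagios3 environment (nagios3 server) on it. On each of the other machines to be monitored, install nagios-nrpe-server, then configure which sources are allowed to query it and which check commands it exposes.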

Servers to be monitored:

Some resources still have to be checked from the server itself, such as disk space and CPU load, so nagios-nrpe-server is used to expose those checks for remote queries.

$ sudo apt-get install nagios-nrpe-server
$ sudo vim /etc/nagios/nrpe_local.cfg
allowed_hosts=127.0.0.1,MonitorServerIP
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 15% -c 5%
$ sudo service nagios-nrpe-server restart
$ netstat -at | grep nrpe
tcp        0      0 *:nrpe                  *:*                     LISTEN
$ grep nrpe /etc/services
nrpe            5666/tcp                        # Nagios Remote Plugin Executor
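To get a feel for what the thresholds in `check_load -w 15,10,5 -c 30,25,20` mean: each list is compared against the 1-, 5- and 15-minute load averages in that order. The snippet below is only a rough sketch of that comparison for a single value; `classify_load` is a hypothetical helper written for this post, not part of Nagios, and the real plugin's boundary handling may differ.

```shell
#!/bin/sh
# Hypothetical helper: classify one load value against one warning and
# one critical threshold, roughly the way check_load treats each of the
# three load averages.
classify_load() {
    # $1 = measured load, $2 = warning threshold, $3 = critical threshold
    awk -v l="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (l + 0 > c + 0)      print "CRITICAL"
        else if (l + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}

# Example use on a Linux box (first field of /proc/loadavg is the
# 1-minute average, which pairs with -w 15 and -c 30 above):
#   read l1 l5 l15 rest < /proc/loadavg
#   classify_load "$l1" 15 30
```

With the thresholds from nrpe_local.cfg, a 1-minute load of 16 would come out as WARNING and 31 as CRITICAL.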


Test the connection locally; the same test can also be run from the Monitor Server against the server's IP:

$ telnet localhost 5666
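telnet only shows that the port is open. Once nagios-nrpe-plugin is installed on the Monitor Server, the `check_nrpe` plugin can ask the agent to actually run one of the commands defined in nrpe_local.cfg. A small wrapper, as a sketch: `probe_nrpe` is a made-up name, and the plugin directory is assumed to be the Debian/Ubuntu default used elsewhere in this post.

```shell
#!/bin/sh
# Assumed plugin directory (Debian/Ubuntu layout); override if needed.
PLUGIN_DIR="${PLUGIN_DIR:-/usr/lib/nagios/plugins}"

# Hypothetical helper: query an NRPE agent from the Monitor Server.
# $1 = server address, $2 = optional NRPE command name (e.g. check_load)
probe_nrpe() {
    if [ ! -x "$PLUGIN_DIR/check_nrpe" ]; then
        echo "check_nrpe not found in $PLUGIN_DIR" >&2
        return 3    # 3 is the plugin convention for UNKNOWN
    fi
    if [ -n "$2" ]; then
        "$PLUGIN_DIR/check_nrpe" -H "$1" -c "$2"
    else
        # Without -c the agent just reports its NRPE version.
        "$PLUGIN_DIR/check_nrpe" -H "$1"
    fi
}

# Usage:
#   probe_nrpe ServerIP               # is the agent answering at all?
#   probe_nrpe ServerIP check_load    # run the check_load command remotely
```

If the Monitor Server's IP is missing from allowed_hosts, this is usually where it shows up.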

Monitor Server:

$ sudo apt-get install nagios3 nagios-nrpe-plugin

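Don't forget that the web login credentials are configured in /etc/nagios3/htpasswd.users. Also, /usr/lib/nagios/plugins/ already contains a bunch of ready-to-use tools; for example, to check whether Google is responding: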

$ /usr/lib/nagios/plugins/check_http -H www.google.com
HTTP OK: HTTP/1.1 200 OK - 12316 bytes in 0.057 second response time |time=0.056623s;;;0.000000 size=12316B;;;0
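When running a plugin by hand like this, the text is for humans; what Nagios acts on is the exit status, which by plugin convention is 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. A tiny sketch to print that mapping (`nagios_state` is a hypothetical helper name, not something shipped with Nagios):

```shell
#!/bin/sh
# Hypothetical helper: turn a plugin exit status into its state name.
nagios_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3 by convention; anything else is lumped in here
    esac
}

# Usage, right after running a plugin:
#   /usr/lib/nagios/plugins/check_http -H www.google.com
#   nagios_state $?
```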


Next, define your own services and hosts:

$ sudo vim /etc/nagios3/conf.d/my-server.cfg

# define host

define host {
        host_name db
        alias db.xxxx.com
        address db.xxxx.com
        hostgroups ssh-servers,remote-servers,https-servers
        use generic-host
}

define host {
        host_name www
        alias www.xxxx.com
        address www.xxxx.com
        hostgroups ssh-servers,remote-servers,http-servers,https-servers
        use generic-host
}

# define hostgroup

define hostgroup {
        hostgroup_name          mysql-servers
        alias                   MySQL DB Service
        members                 db
}

define hostgroup {
        hostgroup_name          https-servers
        alias                   HTTPS Service
        members                 db
}

define hostgroup {
        hostgroup_name          remote-servers
        alias                   Remote Server
        members                 db
}

# define service checking

define service {
        hostgroup_name          https-servers
        service_description     HTTPS
        check_command           check-https!$HOSTADDRESS$!443
        use                     generic-service
        notification_interval   0 ; set > 0 if you want to be renotified
}

define service {
        hostgroup_name          remote-servers
        service_description     Remote NRPE CPU Load
        check_command           check_nrpe_1arg!check_load
        use                     generic-service
        notification_interval   0
}

define service {
        hostgroup_name          remote-servers
        service_description     Remote NRPE Disk Space
        check_command           check_nrpe_1arg!check_all_disks
        use                     generic-service
        notification_interval   0
}

# define commands

define command{
        command_name    check-https
        command_line    /usr/lib/nagios/plugins/check_http -I $ARG1$ -p $ARG2$ -S
}
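The link between a service's check_command and the command definition is the `!` separator: the first piece is the command_name, and the following pieces fill `$ARG1$`, `$ARG2$`, and so on in command_line. The sketch below only mimics that splitting to make it visible; `split_check_command` is a hypothetical helper, and 192.0.2.10 is a documentation address standing in for an already-resolved `$HOSTADDRESS$`.

```shell
#!/bin/sh
# Hypothetical helper: show how a check_command value is split on "!".
# $1 = check_command value, e.g. "check-https!192.0.2.10!443"
split_check_command() {
    rest=$1
    echo "command=${rest%%!*}"
    i=1
    # Keep going while there is still a "!" left in the remainder.
    while [ "$rest" != "${rest#*!}" ]; do
        rest=${rest#*!}
        echo "ARG$i=${rest%%!*}"
        i=$((i + 1))
    done
}

# Usage:
#   split_check_command 'check-https!192.0.2.10!443'
#   split_check_command 'check_nrpe_1arg!check_load'
```

So for the HTTPS service above, the address lands in `-I $ARG1$` and 443 in `-p $ARG2$`.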


$ sudo service nagios3 restart
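Before a restart like the one above, it can be worth letting Nagios validate the configuration first with its `-v` option, so a typo in my-server.cfg does not take the monitor down. A minimal sketch, assuming the Debian/Ubuntu nagios3 paths and that it is run as root; `verify_then_restart` is a made-up name.

```shell
#!/bin/sh
# Assumed locations for the nagios3 package on Debian/Ubuntu.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/sbin/nagios3}"
NAGIOS_CFG="${NAGIOS_CFG:-/etc/nagios3/nagios.cfg}"

# Hypothetical helper: restart only if the config check passes.
# Run as root (or via sudo sh -c ...), since it reads the config
# and restarts the service.
verify_then_restart() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        echo "config OK, restarting nagios3"
        service nagios3 restart
    else
        echo "config check failed, not restarting" >&2
        return 1
    fi
}

# Usage:
#   verify_then_restart
# To see the actual error messages, run the check directly:
#   sudo /usr/sbin/nagios3 -v /etc/nagios3/nagios.cfg
```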

With that, log in at http://MonitorServerIP/nagios3 to see the current status. The checks above cover http, https, ssh, CPU load, disk space, and so on. To also monitor a MySQL DB service, try check_mysql_health, which has to be downloaded separately:

Download and build check_mysql_health:

$ cd /tmp
$ wget -qO- http://labs.consol.de/download/shinken-nagios-plugins/check_mysql_health-2.1.8.2.tar.gz | tar -xzvf -
$ cd check_mysql_health-2.1.8.2
$ ./configure
$ make
$ sudo cp /tmp/check_mysql_health-2.1.8.2/plugins-scripts/check_mysql_health /usr/lib/nagios/plugins/


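Next, write the MySQL DB service check. The db host and the mysql-servers hostgroup already appear in the file above, so edit those entries rather than adding second copies: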

$ sudo vim /etc/nagios3/conf.d/my-server.cfg

define host {
        host_name db
        alias db.xxxx.com
        address db.xxxx.com
        hostgroups ssh-servers,mysql-servers
        use generic-host
}

define hostgroup {
        hostgroup_name          mysql-servers
        alias                   MySQL DB Service
        members                 db
}

define service {
        hostgroup_name          mysql-servers
        service_description     MySQL Remote Connection
        check_command           check-mysql-db!$HOSTADDRESS$
        #check_command          check_tcp!-H!$HOSTADDRESS$!-p!3306
        use                     generic-service
        notification_interval   0 ; set > 0 if you want to be renotified
}

define command{
        command_name    check-mysql-db
        command_line    /usr/lib/nagios/plugins/check_mysql_health --hostname $ARG1$ --username nagios --password nagiospassword --mode querycache-hitrate --warning 90 --critical 95
}


Also, don't forget to create an account that the monitor server can use to connect to the db server. On the db server, create the nagios account:

mysql> GRANT usage ON *.* TO 'nagios'@'nagios_monitor_server' IDENTIFIED BY 'nagiospassword';
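Once the account exists, it is probably easiest to run the plugin by hand from the monitor server before relying on the Nagios service definition, so that login or network problems show up directly. A sketch, assuming the plugin was copied to the path used earlier; `check_mysql_by_hand` is a hypothetical wrapper, and connection-time is simply one of the plugin's modes that needs nothing beyond a successful login.

```shell
#!/bin/sh
# Assumed plugin location, matching the cp step earlier; override if it differs.
PLUGIN="${PLUGIN:-/usr/lib/nagios/plugins/check_mysql_health}"

# Hypothetical wrapper: run check_mysql_health manually against a db host.
# $1 = db host, $2 = MySQL user, $3 = password
check_mysql_by_hand() {
    if [ ! -x "$PLUGIN" ]; then
        echo "plugin not found: $PLUGIN" >&2
        return 3    # UNKNOWN, following the plugin exit-code convention
    fi
    "$PLUGIN" --hostname "$1" --username "$2" --password "$3" \
        --mode connection-time
}

# Usage (values from this post's examples):
#   check_mysql_by_hand db.xxxx.com nagios nagiospassword
```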
