SUSE Linux Enterprise 10/11 – Monitoring the online patch updates with Nagios through SNMP
Automating the watch for new online patch updates is extremely helpful.
Especially when it is integrated into an existing monitoring and notification
tool like Nagios, it eliminates the need to log on to each system and check
its updater icon. There are already descriptions on
the web that explain such a setup for Red Hat and Debian Linux in detail.
Below are the modifications that make it all work with SUSE Linux
Enterprise Server 10 (SLES10 SP2) and subsequent versions up to SLES11.
This guide was written for Novell SLES10 SP2/SP3 and references the 'rug' command
for OS update management. Novell discontinued 'rug' in SLES11, so the update
monitoring plugin has been rewritten to work with the 'zypper' command. Zypper works
*much* better than 'rug' and I am glad 'rug' is gone. There were plenty of complaints on
the web that rug was unreliable, broken and slow, running on the .NET implementation
called Mono. Even on old SLES10 machines I replaced the rug scripts with the new zypper plugin.
The new plugin is named 'check-zypper-update.pl' and it is available here.
Below is an overview of how the rug plugin works. Zypper is used almost identically and has options similar to rug's.
ml08460:/home/local/fm # rug ping
ZMD 7.2.2, Copyright (C) 2007 Novell, Inc.
Started at 10/2/2008 3:16:23 PM (uptime: 0 days, 20 hours, 32 minutes)
RSS size: 31352
Network Connected: Yes
Running on Mono 1.2.2
OS Target: SUSE Linux Enterprise Server 10 (x86_64)
Module Name | Description
-------------------+-------------------------------------------------
Inventory | Software and Hardware inventory module for Linux
NetworkManager | NetworkManager support
Package Management | Package Management module for Linux
ZENworks Server | SOAP methods used by a ZENworks server
XML-RPC interface | Export ZMD public interfaces over XML-RPC
ml08460:/home/local/fm # rug ca
Sub'd? | Name | Service
-------+-----------------------------+----------------------
Yes | SLES10-SP2-Updates | https://nu.novell.com
Yes | SLES10-SP2-Pool | https://nu.novell.com
Yes | SLES10-SP2-Online | https://nu.novell.com
| SLE10-SP2-Debuginfo-Updates | https://nu.novell.com
ml08460:/home/local/fm # rug lu
ml08460:/usr/local/bin # ./check-rug-update.pl --run-rug
OK - system is up to date
ml08460:/home/app/nagios/libexec # cat test
S | Catalog           | Bundle | Name     | Version   | Arch
--+-------------------+--------+----------+-----------+-------
  | SLES10-SP2-Online |        | SPident  | 0.9-74.24 | noarch
ml08460:/home/app/nagios/libexec # ./check-rug-update.pl --file=test
WARNING - 1 update(s) available: SPident Version 0.9-74.24
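The session above shows the plugin turning a 'rug lu' table into a one-line Nagios status. The following is only a minimal shell sketch of that idea, not the Perl code of 'check-rug-update.pl'. The function name check_rug_file is hypothetical, and it assumes the table layout shown in the 'test' file (a header row, a separator row, then one row per update with six '|'-separated columns).

```shell
#!/bin/sh
# Sketch only: turn 'rug lu'-style table output (read from a file) into a
# Nagios status line. check_rug_file is a hypothetical name for illustration.
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_rug_file() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "UNKNOWN - cannot read $file"
        return 3
    fi
    # Skip the header and separator rows; every remaining line with at
    # least six '|'-separated fields is counted as one pending update.
    updates=$(awk -F'[|]' '
        NR > 2 && NF >= 6 {
            name = $4; ver = $5
            gsub(/^ +| +$/, "", name)   # trim padding around the name
            gsub(/^ +| +$/, "", ver)    # trim padding around the version
            printf "%s%s Version %s", (n++ ? ", " : ""), name, ver
        }' "$file")
    if [ -z "$updates" ]; then
        echo "OK - system is up to date"
        return 0
    fi
    count=$(awk -F'[|]' 'NR > 2 && NF >= 6 { n++ } END { print n + 0 }' "$file")
    echo "WARNING - $count update(s) available: $updates"
    return 1
}
```

Called as check_rug_file test, it mirrors the '--file=test' run shown above. Output without table rows, such as a plain "No updates are available." message, is treated as up to date.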
ml08460:/etc/snmp # echo "extend nagiosupdate /home/app/nagios/libexec/check-rug-update.pl --run-rug" >> snmpd.conf
ml08460:/etc/snmp # /etc/init.d/snmpd restart
Shutting down snmpd: done
Starting snmpd
ml08460:/etc/snmp # snmpget -v 2c -c myread 127.0.0.1 NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"checkupdate\"
NET-SNMP-EXTEND-MIB::nsExtendOutputFull."checkupdate" = STRING: No updates are available.
ml08460:/home/app/nagios/libexec # ls -l check_snmp_extend.sh
-rwxr-x--- 1 nagios nagios 1979 2008-10-02 16:50 check_snmp_extend.sh
ml08460:/home/app/nagios/libexec # ./check_snmp_extend.sh
Syntax: check_snmp_extend.sh ipaddr community extend-name
ml08460:/home/app/nagios/libexec # ./check_snmp_extend.sh 127.0.0.1 myread nagiosupdate
OK - system is up to date
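The source of check_snmp_extend.sh is not reproduced here, so the following is only a hypothetical sketch of what such a wrapper might look like. It assumes the wrapper fetches nsExtendOutputFull for the named extend entry with snmpget, prints the text, and derives the Nagios exit code from the leading keyword. The function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of an SNMP extend wrapper; NOT the original
# check_snmp_extend.sh. Function names are assumptions for illustration.

# Map the leading keyword of a plugin status line to a Nagios exit code.
status_to_exit_code() {
    case "$1" in
        OK*)       return 0 ;;
        WARNING*)  return 1 ;;
        CRITICAL*) return 2 ;;
        *)         return 3 ;;   # anything else is treated as UNKNOWN
    esac
}

check_snmp_extend() {
    if [ "$#" -ne 3 ]; then
        echo "Syntax: check_snmp_extend.sh ipaddr community extend-name"
        return 3
    fi
    # -Oqv makes snmpget print only the value of the requested object.
    output=$(snmpget -v 2c -c "$2" -Oqv "$1" \
        "NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"$3\"" 2>/dev/null)
    if [ -z "$output" ]; then
        echo "UNKNOWN - no SNMP response from $1 for extend '$3'"
        return 3
    fi
    # Strip surrounding double quotes, in case snmpget printed any.
    output=${output#\"}
    output=${output%\"}
    echo "$output"
    status_to_exit_code "$output"
}
```

In a standalone script, the last lines would be check_snmp_extend "$@" followed by exit $?, so that Nagios sees the mapped exit code.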
ml08460:/home/app/nagios/etc/objects # vi commands.cfg
# 'check_snmp_extend' command definition
# check_snmp_extend hostip community extend-name
define command{
command_name check_snmp_extend
command_line $USER1$/check_snmp_extend.sh $HOSTADDRESS$ $ARG1$ $ARG2$
}
ml08460:/home/app/nagios/etc/objects # vi patch-services.cfg
###############################################################################
# Define a servicegroup for Linux patch update service checks
# check_patch_sles10 service checks will be member of this group
###############################################################################
define servicegroup{
servicegroup_name patch-checks ; The name of the servicegroup
alias OS Update Checks ; Long name of the group
}
###############################################################################
# Define the patch update check service template
###############################################################################
define service{
name generic-patch
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
failure_prediction_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 120 ; check every 2 hours
retry_check_interval 1
contact_groups frankonly
notification_options u,w,c,r
notification_interval 1440 ; notify only once a day
notification_period 24x7
register 0
servicegroups patch-checks
}
###############################################################################
# SLES10 OS Patch Update Check via SNMP extend scripts
###############################################################################
define service {
use generic-patch
host_name ml08460
name check_snmp_extend
service_description check_patch_sles10
check_command check_snmp_extend!myread!nagiosupdate
}
###############################################################################
ml08460:/home/app/nagios/etc/objects # echo "cfg_file=/home/app/nagios/etc/objects/patch-services.cfg" >> /home/app/nagios/etc/nagios.cfg
ml08460:/home/app/nagios/etc/objects # /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Occasionally, the 'rug' commands started to hang on our Novell SLES10 SP2 servers, meaning they never complete their run. 'rug lu', 'rug update' and 'rug ca' commands all just pile up; a trace shows 'wait4(-1, <unfinished ...>' and 'connect(12, {sa_family=AF_FILE, path="/var/run/zmd/zmd-remoting.socket"}, 35'. So it is the Novell zmd daemon that hangs. The last meaningful log message in /var/log/zmd-messages.log is '14 Feb 2009 08:15:09 INFO ServiceManager Failed to add service 'https://nu.novell.com' (keeping): Failed to parse XML metadata: cannot rollback transaction - SQL statements in progress'.

Trying to stop the zmd daemon (/etc/init.d/novell-zmd stop) fails; only kill -9 removes the faulty zmd daemon. Just restarting the zmd daemon did not resolve the issue. After reading similar descriptions on the web, I resolved it by rebuilding the zmd and zypper databases in /var/lib/zmd and /var/lib/zypp from scratch. Still, this issue is annoying and occasionally recurs. Any comments are highly welcome.

I updated 'check-rug-update.pl' to exit if multiple rug commands are already running, so it does not add to the pile-up and subsequent SNMP requests continue to work.
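A guard against piling up rug commands could be sketched in shell as follows. This is not the code from 'check-rug-update.pl'. The threshold, the function names, the UNKNOWN return status, and the assumption that the hung commands show up under the process name 'rug' are all illustrative choices.

```shell
#!/bin/sh
# Sketch of a "do not pile up" guard. MAX_RUG_PROCS, the function names and
# the UNKNOWN exit status are assumptions, not taken from the original plugin.
MAX_RUG_PROCS=1

# Count running processes whose name is exactly 'rug' (assumed process name).
count_rug_procs() {
    pgrep -x rug | wc -l
}

# Given the number of rug processes already running, decide whether the
# check should be skipped. Returns 3 (Nagios UNKNOWN) when it should.
rug_guard() {
    running=$1
    if [ "$running" -gt "$MAX_RUG_PROCS" ]; then
        echo "UNKNOWN - $running rug processes already running, skipping check"
        return 3
    fi
    return 0
}
```

A plugin would run rug_guard "$(count_rug_procs)" first and exit early on a non-zero result, so a hung zmd daemon does not collect one more blocked rug process per check interval.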