SUSE Linux Enterprise 10/11 – Monitoring the online patch updates with Nagios through SNMP

Frank4DD, © 2008,
updated 03/2010

Automating the watch for new online patch updates is extremely helpful. Especially when it is integrated into an existing monitoring and notification tool like Nagios, it eliminates the need to log on to each system and check its updater icon. There are already descriptions on the web that explain such a setup for Red Hat and Debian Linux in detail. Below is the modification that makes it all work with SUSE Linux Enterprise Server 10 (SLES10 SP2) and subsequent versions up to SLES11.



  • Update for Novell SLES-11
  • This guide was originally written for Novell SLES10 SP2/SP3 and references the 'rug' command for OS update management. Novell discontinued 'rug' in SLES-11, so the update monitoring plugin has been rewritten to work with the 'zypper' command. Zypper works *much* better than 'rug', and I am glad 'rug' is gone: there were plenty of complaints on the web about rug being unreliable, broken and slow, running on Mono, the open-source .NET implementation. Even on older SLES10 machines I replaced the rug scripts with the new zypper plugin. The new plugin is named 'check-zypper-update.pl' and it's available here.

    Below is an overview of how the rug plugin works. Zypper is used in almost the same way and has options similar to rug's.
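    On SLES-11, a much smaller alternative to parsing zypper's package list is to look only at the exit status of 'zypper patch-check'. The bash sketch below is not 'check-zypper-update.pl'; the function name check_zypper_patches is hypothetical, and it assumes the installed zypper returns the documented patch-check exit codes 100 (patches available) and 101 (security patches available). Reporting security patches as CRITICAL is an arbitrary choice of this sketch.

```bash
#!/bin/bash
# Sketch: turn the exit status of 'zypper patch-check' into a Nagios result.
# Assumption: the installed zypper uses these documented patch-check codes:
#   0   no patches needed
#   100 patches available           (ZYPPER_EXIT_INF_UPDATE_NEEDED)
#   101 security patches available  (ZYPPER_EXIT_INF_SEC_UPDATE_NEEDED)
# Nagios plugin convention: 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN.

# Hypothetical helper, not part of check-zypper-update.pl.
check_zypper_patches() {
    local rc
    zypper --non-interactive patch-check >/dev/null 2>&1
    rc=$?
    case $rc in
        0)   echo "OK - system is up to date";                 return 0 ;;
        100) echo "WARNING - patches available";               return 1 ;;
        101) echo "CRITICAL - security patches available";     return 2 ;;
        *)   echo "UNKNOWN - zypper patch-check returned $rc"; return 3 ;;
    esac
}
```

    A plugin wrapper would call the function and pass its return code on to Nagios, e.g. 'check_zypper_patches; exit $?'.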

    1. Check the SLES ZENworks update service using the 'rug' command
      ml08460:/home/local/fm # rug ping
      ZMD 7.2.2, Copyright (C) 2007 Novell, Inc.
      Started at 10/2/2008 3:16:23 PM (uptime: 0 days, 20 hours, 32 minutes)
      RSS size: 31352
      Network Connected: Yes
      Running on Mono 1.2.2
      
      OS Target: SUSE Linux Enterprise Server 10 (x86_64)
      
      Module Name        | Description
      -------------------+-------------------------------------------------
      Inventory          | Software and Hardware inventory module for Linux
      NetworkManager     | NetworkManager support
      Package Management | Package Management module for Linux
      ZENworks Server    | SOAP methods used by a ZENworks server
      XML-RPC interface  | Export ZMD public interfaces over XML-RPC
      
      ml08460:/home/local/fm # rug ca
      Sub'd? | Name                        | Service
      -------+-----------------------------+----------------------
      Yes    | SLES10-SP2-Updates          | https://nu.novell.com
      Yes    | SLES10-SP2-Pool             | https://nu.novell.com
      Yes    | SLES10-SP2-Online           | https://nu.novell.com
             | SLE10-SP2-Debuginfo-Updates | https://nu.novell.com
      
      ml08460:/home/local/fm # rug lu
      No updates are available.

    2. Place the script check-rug-update.pl on the SLES10 server to produce Nagios-usable output
      ml08460:/usr/local/bin # ./check-rug-update.pl --run-rug
      OK - system is up to date
      
      ml08460:/home/app/nagios/libexec # cat test
      S | Catalog           | Bundle | Name     | Version   | Arch
      --+-------------------+--------+----------+-----------+-------
        | SLES10-SP2-Online |        | SPident  | 0.9-74.24 | noarch
      
      ml08460:/home/app/nagios/libexec # ./check-rug-update.pl --file=test
      WARNING - 1 update(s) available: SPident Version 0.9-74.24
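      The parsing step of such a plugin can be sketched in bash as below. This is not the original check-rug-update.pl: the function name rug_lu_to_nagios is hypothetical, the column layout is taken from the sample file above, and joining several updates with ", " is an assumption. It reads 'rug lu' output on standard input and returns the Nagios exit codes 0 (OK), 1 (WARNING) or 3 (UNKNOWN).

```bash
#!/bin/bash
# Sketch of the parsing step only (the real plugin is the Perl script
# check-rug-update.pl). Reads 'rug lu' output on stdin, prints one Nagios
# status line and returns 0 (OK), 1 (WARNING) or 3 (UNKNOWN).
# Assumptions: columns are "S | Catalog | Bundle | Name | Version | Arch"
# as in the sample above; several updates are joined with ", ".
rug_lu_to_nagios() {
    local input list count
    input=$(cat)

    if printf '%s\n' "$input" | grep -q '^No updates are available'; then
        echo "OK - system is up to date"
        return 0
    fi

    # Rows after the "--+---" separator are updates: field 4 is the package
    # name, field 5 the version. awk prints "<count><TAB><list>".
    list=$(printf '%s\n' "$input" | awk -F'|' '
        seen && NF >= 6 {
            name = $4; ver = $5
            gsub(/^ +| +$/, "", name)
            gsub(/^ +| +$/, "", ver)
            sep = (n++ ? ", " : "")
            out = out sep name " Version " ver
        }
        /^-+[+]/ { seen = 1 }
        END { printf "%d\t%s", n, out }')
    count=${list%%$'\t'*}
    list=${list#*$'\t'}

    if [ "$count" -gt 0 ]; then
        echo "WARNING - ${count} update(s) available: ${list}"
        return 1
    fi
    echo "UNKNOWN - could not parse rug output"
    return 3
}
```

      Fed with the sample file ('rug_lu_to_nagios < test'), it prints the same WARNING line as shown above; 'rug lu | rug_lu_to_nagios' checks the live system.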
      

    3. Enable the 'check-rug-update.pl' script to report results through snmpd and the NET-SNMP Extend MIB
      ml08460:/etc/snmp # echo "extend nagiosupdate /home/app/nagios/libexec/check-rug-update.pl --run-rug" >> snmpd.conf
      
      ml08460:/etc/snmp # /etc/init.d/snmpd restart
      Shutting down snmpd: done
      Starting snmpd 
      ml08460:/etc/snmp # snmpget -v 2c -c myread 127.0.0.1 NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"checkupdate\"
      NET-SNMP-EXTEND-MIB::nsExtendOutputFull."checkupdate" = STRING: No updates are available.
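      Running the echo command twice appends a duplicate extend line. The bash sketch below only appends the line when no extend of that name exists yet. The function name add_snmp_extend is hypothetical; it assumes a simple alphanumeric extend name and only recognizes the 'extend NAME PROG ARGS' form, not the variant with an explicit MIB OID. snmpd still has to be restarted afterwards, and the name used in the nsExtendOutputFull query has to match the name on the extend line.

```bash
#!/bin/bash
# Sketch: add "extend <name> <command...>" to snmpd.conf only once.
# Hypothetical helper; assumes a plain alphanumeric extend name and only
# recognizes the "extend NAME PROG ARGS" form (no explicit MIB OID).
# Usage: add_snmp_extend <snmpd.conf> <extend-name> <command> [args...]
add_snmp_extend() {
    local conf=$1 name=$2
    shift 2
    # Look for an existing "extend <name> ..." line, leading blanks allowed.
    if grep -Eq "^[[:space:]]*extend[[:space:]]+${name}[[:space:]]" "$conf" 2>/dev/null; then
        echo "extend '${name}' already present in ${conf}"
        return 0
    fi
    # "$*" joins the remaining command words with single spaces.
    echo "extend ${name} $*" >> "$conf"
    echo "added extend '${name}' to ${conf}"
}
```

      For the setup above this would be called as 'add_snmp_extend /etc/snmp/snmpd.conf nagiosupdate /home/app/nagios/libexec/check-rug-update.pl --run-rug'.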
      

    4. Get, install and configure the check_snmp_extend.sh Nagios plugin
      ml08460:/home/app/nagios/libexec # ls -l check_snmp_extend.sh
      -rwxr-x--- 1 nagios nagios 1979 2008-10-02 16:50 check_snmp_extend.sh
      
      ml08460:/home/app/nagios/libexec # ./check_snmp_extend.sh
       Syntax: check_snmp_extend.sh ipaddr community extend-name
      
      ml08460:/home/app/nagios/libexec # ./check_snmp_extend.sh 127.0.0.1 myread nagiosupdate
      OK - system is up to date
      
      ml08460:/home/app/nagios/etc/objects # vi commands.cfg
      # 'check_snmp_extend' command definition
      # check_snmp_extend hostip community extend-name
      define command{
        command_name check_snmp_extend
        command_line $USER1$/check_snmp_extend.sh $HOSTADDRESS$ $ARG1$ $ARG2$
      }
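      As a sketch of what a plugin like check_snmp_extend.sh does, the bash function below reads two columns of the NET-SNMP Extend MIB for the given extend name: nsExtendOutputFull (the text the script printed) and nsExtendResult (its exit status), and returns the latter as the Nagios exit code. The function name, the fixed SNMP version 2c and the -Oqv output options are assumptions of this sketch, not necessarily what check_snmp_extend.sh itself does.

```bash
#!/bin/bash
# Sketch of a check_snmp_extend.sh-style plugin, not the original script.
# Assumptions: net-snmp's snmpget is in PATH, the agent answers SNMP v2c,
# and NET-SNMP-EXTEND-MIB is available so the symbolic names resolve.
#   nsExtendOutputFull."<name>" = full text printed by the extend command
#   nsExtendResult."<name>"     = its exit status, reused as Nagios exit code
check_snmp_extend() {
    if [ $# -ne 3 ]; then
        echo "Syntax: check_snmp_extend.sh ipaddr community extend-name"
        return 3
    fi
    local host=$1 community=$2 name=$3 output result

    # -Oqv prints only the value (no OID name, no type prefix).
    output=$(snmpget -v 2c -c "$community" -Oqv "$host" \
        "NET-SNMP-EXTEND-MIB::nsExtendOutputFull.\"$name\"" 2>/dev/null) ||
        { echo "UNKNOWN - SNMP query to $host failed"; return 3; }
    result=$(snmpget -v 2c -c "$community" -Oqv "$host" \
        "NET-SNMP-EXTEND-MIB::nsExtendResult.\"$name\"" 2>/dev/null) ||
        { echo "UNKNOWN - SNMP query to $host failed"; return 3; }

    # Drop surrounding double quotes in case snmpget quotes the string.
    output=${output#\"}
    output=${output%\"}
    echo "$output"

    # Only pass through valid Nagios states; anything else is UNKNOWN.
    case $result in
        0|1|2|3) return "$result" ;;
        *)       return 3 ;;
    esac
}
```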
      

    6. Configure the Nagios service, e.g. in a separate file patch-services.cfg
      ml08460:/home/app/nagios/etc/objects # vi patch-services.cfg
      ###############################################################################
      # Define a servicegroup for Linux patch update service checks
      # check_patch_sles10 service checks will be members of this group
      ###############################################################################
      define servicegroup{
        servicegroup_name             patch-checks     ; The name of the servicegroup
        alias                         OS Update Checks ; Long name of the group
      }
      ###############################################################################
      # Define the patch update check service template
      ###############################################################################
      define service{
        name generic-patch
        active_checks_enabled         1
        passive_checks_enabled        1
        parallelize_check             1
        obsess_over_service           1
        check_freshness               0
        notifications_enabled         1
        event_handler_enabled         1
        flap_detection_enabled        1 
        failure_prediction_enabled    1
        process_perf_data             1
        retain_status_information     1
        retain_nonstatus_information  1
        is_volatile                   0
        check_period                  24x7
        max_check_attempts            3
        normal_check_interval         120                      ; check every 2 hours
        retry_check_interval          1
        contact_groups                frankonly
        notification_options          u,w,c,r
        notification_interval         1440                     ; notify only once a day
        notification_period           24x7
        register                      0
        servicegroups                 patch-checks
      }
      ###############################################################################
      # SLES10 OS Patch Update Check via SNMP extend scripts
      ###############################################################################
      define service {
        use                           generic-patch
        host_name                     ml08460
        name                          check_snmp_extend
        service_description           check_patch_sles10
        check_command                 check_snmp_extend!myread!nagiosupdate
      }
      ###############################################################################
      
      ml08460:/home/app/nagios/etc/objects # echo "cfg_file=/home/app/nagios/etc/objects/patch-services.cfg" >> /home/app/nagios/etc/nagios.cfg
      
      ml08460:/home/app/nagios/etc/objects # /etc/init.d/nagios restart
      Running configuration check...done.
      Stopping nagios: .done.
      Starting nagios: done.

    8. Enjoy the Nagios patch update monitoring (example screenshots)

      [Screenshots: service group detail, Nagios service detail]

    10. ... and the resulting Nagios notification e-mail body (example screenshots)

      [Screenshot: Nagios e-mail notification]

    12. Additional comments
      Occasionally, the 'rug' commands started to 'hang' on our Novell SLES10 SP2 servers, meaning they no longer complete their run. 'rug lu', 'rug update' and 'rug ca' commands all just pile up, and a trace shows 'wait4(-1, <unfinished ...>' and 'connect(12, {sa_family=AF_FILE, path="/var/run/zmd/zmd-remoting.socket"}, 35'. So the Novell zmd daemon hangs. The last meaningful log message in /var/log/zmd-messages.log is '14 Feb 2009 08:15:09 INFO ServiceManager Failed to add service 'https://nu.novell.com' (keeping): Failed to parse XML metadata: cannot rollback transaction - SQL statements in progress'.

      Stopping the zmd daemon with /etc/init.d/novell-zmd stop fails; only kill -9 removes the faulty zmd process. Simply restarting the zmd daemon did not resolve the issue. Following similar reports on the web, I resolved it by rebuilding the zmd and zypper databases in /var/lib/zmd and /var/lib/zypp from scratch. Still, this issue is annoying and recurs occasionally... Any comments are highly welcome.

      I updated 'check-rug-update.pl' to exit if multiple rug commands are running, so it doesn't make the problem worse and subsequent SNMP requests continue to work.
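      The guard itself lives in the Perl script. A bash sketch of the same idea follows; the function name rug_guard, the pgrep pattern, the default threshold (one already-running rug process) and the choice to report UNKNOWN (exit code 3) are all assumptions. Because rug runs under Mono, it may show up in the process list as rug.exe rather than rug, so the pattern should be checked against 'ps' output on the actual system. The pattern deliberately does not match the plugin's own name, check-rug-update.pl.

```bash
#!/bin/bash
# Sketch of a "don't pile up rug calls" guard (the real one lives in the
# Perl script check-rug-update.pl). Hypothetical helper.
# Assumptions: a running rug command can be found with pgrep -f and the
# pattern below (rug runs under Mono, so it may appear as .../rug.exe);
# already-running rug processes are reported as UNKNOWN (Nagios code 3).
RUG_PATTERN='(^|/)rug(\.exe)?( |$)'

rug_guard() {
    local max=${1:-1} running
    # Count processes whose full command line matches the pattern;
    # $(( )) also strips any padding that wc -l may add.
    running=$(( $(pgrep -f "$RUG_PATTERN" | wc -l) ))
    if [ "$running" -ge "$max" ]; then
        echo "UNKNOWN - ${running} rug process(es) already running, zmd may be hanging"
        return 3
    fi
    return 0
}
```

      Called at the top of the plugin as 'rug_guard || exit $?', it keeps the plugin from adding yet another rug call to a hanging zmd.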

    14. Credits, copyrights, original scripts, etc.