Monitoring the Windows Update Status with Nagios through SNMP

by Frank4DD, © 2009
This tutorial describes an approach to check whether Windows systems are being properly patched. This is particularly important if you run larger numbers of servers and need to evaluate their compliance and risk status for your company. The typical existing solutions are running reports through Windows Server Update Services (WSUS), or running scripts against the registry to list the applied patches and compare them against a baseline (security scanners like Foundstone or Nessus do just that). The latter approach is certainly the most accurate, but also the most effort-intensive. With Microsoft releasing patches every month, these patch lists grow huge over time. Even when they finally collapse into a service pack after many months, the patch lists change frequently and are confusing.

The approach described here does not check for the presence of each individual patch. Instead, it checks that the automatic update service is set up correctly, pointing either to Microsoft's online service or to a local WSUS server. It then runs the Windows built-in check to see whether any patches are outstanding for the system, and reports the result to the central monitoring system, Nagios. The benefits are:
  1. Leveraging the existing monitoring setup of Nagios:
    • Using the already defined server inventory (windows server list)
    • Using the already defined administrator notification and contact list
    • Integrating with the OS patch monitoring of other systems (Linux, IOS)
      for an 'enterprise' view of patch compliance
  2. Small server footprint and easy rollout
  3. Easy verification of the patch infrastructure setup (are the server settings OK?)
The drawbacks of this method are:
  1. It is less accurate than comparing against a patch baseline list.
For example, it cannot tell directly and independently whether a particular patch has been applied, and it does not reveal settings that override or suppress the installation of a particular patch. Time will tell whether this alternative method is sufficient and practical enough. The method already works well for our Linux servers, where the patch check connects to the online patch service and returns the list of outstanding patches for a particular system. Our Linux vendor's (Novell SUSE Linux Enterprise Server) patch release cycle is even more frequent than Microsoft's, with patches being released on an almost daily basis. With active Nagios notifications being sent to our admins, I found that patching gets done much faster and more proactively: nobody wants their servers listed in warning status for too long. With patch reminders being sent until the patching is complete, administrators cannot 'forget' the patch task in their daily struggle of shifting priorities between server maintenance and project work.
  1. Set up and test the Windows patch check: win_update_trapsend.vbs
    First, we need a program that determines the current patch status. Because Microsoft's Windows Script Host is available on every Windows system, we can write the check program win_update_trapsend.vbs in VBScript. Before the first run, edit the top of the script to set the SNMP trap destination IP. Run without further options, Windows Script Host works in interactive mode and opens an output window. We want to suppress that window and redirect all output into a local logfile. I created a batch file called win_update_trapsend.bat so that I do not need to re-type the command-line options when running the check by hand. Finally, we need a good home directory for the script; admins often already have one for their ops scripts. If not, I tend to use C:\update-monitor.
    C:\> cscript.exe -NoLogo C:\update-monitor\win_update_trapsend.vbs > C:\update-monitor\win_update_trapsend.log
    
    [Screenshot: example run of win_update_trapsend.vbs]
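
    The VBScript itself is not reproduced here. As a rough illustration of what its result has to look like, the following Python sketch turns a count of outstanding updates into the standard Nagios plugin states (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name and the mapping are assumptions: outstanding patches become WARNING because the service template below notifies on warnings and recoveries, and a failed update search becomes UNKNOWN.

```python
# Hypothetical sketch (not the original VBScript): map the number of
# outstanding Windows updates to a standard Nagios plugin state.
# Assumptions: outstanding patches -> WARNING (matches notification_options w,r),
# a failed update search (count is None) -> UNKNOWN.

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def patch_state(outstanding):
    """Return (return_code, plugin_output) for a count of missing updates."""
    if outstanding is None:
        return UNKNOWN, "UNKNOWN: Windows Update search failed"
    if outstanding == 0:
        return OK, "OK: no outstanding Windows updates"
    return WARNING, f"WARNING: {outstanding} Windows update(s) outstanding"


if __name__ == "__main__":
    for count in (0, 3, None):
        code, text = patch_state(count)
        print(code, STATE_NAMES[code], "|", text)
```

    Keeping the output text prefixed with the state keyword makes it easy for the receiving side to recover the state from the trap text alone.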
  2. Transmit the check results to the Nagios system: TrapGen
    We use SNMP to monitor Windows servers, and SNMP is our central monitoring protocol across all systems. On Linux, the extend function of Net-SNMP (formerly UCD-SNMP) lets us run scripts remotely and receive their output through SNMP. Unfortunately, the SNMP service shipping with Windows is limited: it supports neither SNMPv3 nor extend. As a result, we face the dilemma of how to initiate the check and how to transport the monitoring result back to Nagios. One solution is to add a service such as NRPE-NT, which is made exactly for that purpose. But should we do that just for one single script? Repeat after me: "I don't want another daemon! I don't want another daemon!..." :-) In an enterprise with hundreds of servers, rolling out a small client program is much easier than going through all the testing required to implement another service. I successfully tested TrapGen from Network Computing Technologies, Inc., a small 136KB binary that can send custom SNMP traps from Windows systems. Together with an SNMP trap daemon on the Nagios server and a passive service configuration in Nagios, we receive the results of Windows update checks that are launched daily through the Windows scheduler.
    [Screenshot: Windows scheduler setup]
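
    TrapGen sends SNMPv1 traps, while snmptrapd identifies each notification by its SNMPv2 trap OID. The translation rule from RFC 3584 (section 3.1) explains the odd-looking OID enterprises.2854.0.1 in the trap handler configuration below: an enterprise-specific trap becomes <enterprise>.0.<specific-trap>, and the generic traps map into the snmpTraps subtree (coldStart becomes 1.3.6.1.6.3.1.1.5.1). The Python sketch assumes the update trap uses enterprise 1.3.6.1.4.1.2854 with specific-trap number 1; this is inferred from the snmptrapd.conf entry, not taken from TrapGen's documentation.

```python
# Sketch of the SNMPv1 -> SNMPv2 trap OID translation (RFC 3584, section 3.1).
# Assumption: the Windows update trap uses enterprise 1.3.6.1.4.1.2854 and
# specific-trap 1, inferred from the traphandle line in snmptrapd.conf.

SNMP_TRAPS = "1.3.6.1.6.3.1.1.5"  # snmpTraps: coldStart .1, warmStart .2, ...
ENTERPRISE_SPECIFIC = 6           # generic-trap value for enterprise-specific traps


def v2_trap_oid(enterprise, generic_trap, specific_trap):
    """Return the snmpTrapOID.0 value that corresponds to an SNMPv1 trap."""
    if generic_trap == ENTERPRISE_SPECIFIC:
        # enterprise-specific traps: <enterprise>.0.<specific-trap>
        return f"{enterprise}.0.{specific_trap}"
    # generic traps 0..5 map to snmpTraps.(generic-trap + 1)
    return f"{SNMP_TRAPS}.{generic_trap + 1}"


if __name__ == "__main__":
    print(v2_trap_oid("1.3.6.1.4.1.2854", 6, 1))  # 1.3.6.1.4.1.2854.0.1
```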
  3. Submit the update status data into Nagios
    The client setup is easy on the Windows system, and the Nagios side is easy as well, because we can leverage the existing SNMP trap implementation of our Windows Reboot Monitoring. We just add a new trap handler definition to '/etc/snmp/snmptrapd.conf' and update the send_trap_data.pl script, which processes the received SNMP trap data and submits it to Nagios as a passive check result. Passive checks have disadvantages: we cannot force a re-check of the service from Nagios. If we want to update the Nagios status (manager after patching: "Make it green!"), we either need to wait for the next scheduled check to kick in, or run the check script on the Windows client by hand. A second disadvantage is that a system's monitoring configuration can break without anyone noticing; the passive check then simply stops receiving new data. Fortunately, we can make this visible in Nagios using the 'freshness' parameters together with a check_command definition for 'stale' results (see no-patch-report in the next section).
    susie3 ~ # cat /etc/snmp/snmptrapd.conf
    ###############################################################################
    # snmptrapd.conf:
    #    configuration file for configuring the ucd-snmp snmptrapd agent.
    ###############################################################################
    
    # first, we define the access control
    authCommunity   log,execute,net SECtrap
    
    # next, the trap handlers.
    # capture Windows reboots: SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart
    traphandle   SNMPv2-MIB::coldStart              /srv/app/nagios/libexec/send_trap_data.pl
    
    # capture Win update traps: SNMPv2-MIB::snmpTrapOID.0 = RFC1155-SMI::enterprises.2854.0.1
    traphandle   RFC1155-SMI::enterprises.2854.0.1  /srv/app/nagios/libexec/send_trap_data.pl
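
    snmptrapd hands each matching notification to the trap handler on standard input: the first line is the sending host's name, the second its address (net-snmp 5.x writes it as 'UDP: [ip]:port->[ip]:port'), and each further line holds one variable binding as 'OID value'. The original send_trap_data.pl is not shown here, so the following Python sketch only illustrates the idea of such a handler. The OID-to-service table, the assumption that one string varbind carries plugin output beginning with OK/WARNING/CRITICAL/UNKNOWN, and the function names are hypothetical; the line it produces follows the syntax of Nagios' PROCESS_SERVICE_CHECK_RESULT external command.

```python
# Hypothetical sketch of a trap handler (not the original send_trap_data.pl):
# parse snmptrapd traphandle input and build a Nagios passive check result.
import re
import time

# Hypothetical mapping: snmpTrapOID.0 value -> Nagios service_description
TRAP_SERVICES = {
    "RFC1155-SMI::enterprises.2854.0.1": "check_trap_winpatch",
    "SNMPv2-SMI::enterprises.2854.0.1": "check_trap_winpatch",
}
STATES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def parse_trap(text):
    """Split traphandle input into (host, address, {oid: value})."""
    lines = text.strip().splitlines()
    host = lines[0].strip()
    # net-snmp 5.x: 'UDP: [10.1.2.3]:1031->[10.1.2.1]:162'; older: a bare address
    match = re.search(r"\[([^\]]+)\]", lines[1])
    address = match.group(1) if match else lines[1].strip()
    varbinds = {}
    for line in lines[2:]:
        oid, _, value = line.strip().partition(" ")
        varbinds[oid] = value.strip().strip('"')  # drop quotes around strings
    return host, address, varbinds


def passive_result(text, now=None):
    """Return a PROCESS_SERVICE_CHECK_RESULT line, or None for unrelated traps."""
    host, _address, varbinds = parse_trap(text)
    service = TRAP_SERVICES.get(varbinds.get("SNMPv2-MIB::snmpTrapOID.0", ""))
    if service is None:
        return None
    # assumption: the first value that starts with a state keyword is the output
    output = next(
        (v for v in varbinds.values() if v.split(":")[0] in STATES),
        "UNKNOWN: no check output found in trap",
    )
    code = STATES[output.split(":")[0]]
    stamp = int(time.time() if now is None else now)
    return f"[{stamp}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"
```

    A real handler would write that line, followed by a newline, to the Nagios command pipe configured as command_file in nagios.cfg. The host name must match a Nagios host_name, and the service name in the table is where the match with service_description in step 4 happens.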
    

  4. Configure the plugin and service in Nagios
    The most important item here is the service description: it must match the name configured in send_trap_data.pl. Otherwise, Nagios cannot relate the incoming result to an existing service and process it.
    vi /srv/app/nagios/etc/nagios.cfg
    
    # passive service check for Windows Patch Update SNMP traps
    cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg
    
    vi /srv/app/nagios/etc/objects/patch-services-windows.cfg
    
    ###############################################################################
    # Define a servicegroup for patch service checks
    # All patch service checks will be members of this group
    ###############################################################################
    define servicegroup{
      servicegroup_name        patch-checks-win     ; The name of the servicegroup
      alias                    Patch Checks Windows ; Long name of the group
    }
    ###############################################################################
    # Define the patch check template service
    ###############################################################################
    define service{
      name                          generic-patch-win
      active_checks_enabled         0               ; traps are only passive checks
      passive_checks_enabled        1               ; yes, check passive
      parallelize_check             1               ; yes, please
      obsess_over_service           0               ; we don't run extra commands
      check_freshness               1               ; check if a report came in
      freshness_threshold           93600           ; 26 hour threshold for stale, the
                                                    ; patch check should run once a day
      check_command                 no-patch-report ; runs if service result is "stale"
      notifications_enabled         1               ; send notifications
      event_handler_enabled         1               ; yes, but we have none
      flap_detection_enabled        0               ; with auto-OK, we don't
      failure_prediction_enabled    1               ; dependency checks
      process_perf_data             0               ; don't send this to perfdata
      retain_status_information     1               ; yes, once auto-OK'ed, keep it
      retain_nonstatus_information  1
      is_volatile                   1               ; enable for passive checks
      check_period                  24x7            ; always check for submissions
      max_check_attempts            1               ; one trap is enough
      normal_check_interval         1
      retry_check_interval          1
      contact_groups                frankonly
      notification_options          w,r             ; notify for warnings and recovery
      notification_interval         1440            ; notify once a day
      notification_period           24x7            ; always notify
      register                      0               ; template, don't register
      service_groups                patch-checks-win
    }
    
    ###############################################################################
    # Receive SNMP traps for Windows update status
    ###############################################################################
    define service {
      use                           generic-patch-win
      hostgroup                     2-windows-servers
      name                          check_trap_winpatch
      service_description           check_trap_winpatch
    }
    
    vi command.cfg and add the definition for check_command no-patch-report:
    
    # This will always return "OK" but tells us no patch report came in that day.
    # see also http://nagios.sourceforge.net/docs/3_0/freshness.html
    define command{
     command_name no-patch-report
     command_line $USER1$/check_dummy 0 "Daily patch check result was not reported!"
    }
    
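    Alternatively, instead of editing nagios.cfg with vi, append the line from the shell (do only one of the two):
    susie3:/srv/app/nagios/etc/objects # echo "cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg" >> /srv/app/nagios/etc/nagios.cfg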
    
    susie3:/srv/app/nagios/etc/objects # /etc/init.d/nagios restart
    Running configuration check...done.
    Stopping nagios: .done.
    Starting nagios: done.
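
    The freshness settings above boil down to simple arithmetic. The Python sketch below is a simplified model, not Nagios' actual code: a result counts as stale once it is older than freshness_threshold (26 hours, presumably to leave some slack for a daily task that runs late) plus Nagios' additional_freshness_latency, assumed here to be at its default of 15 seconds. When that happens, Nagios runs the check_command, i.e. no-patch-report. The constant names are illustrative only.

```python
# Simplified model of the freshness rule behind check_freshness/freshness_threshold.
# Assumption: additional_freshness_latency is at its nagios.cfg default (15 s);
# scheduling and startup details of the real Nagios logic are left out.

HOUR = 3600
FRESHNESS_THRESHOLD = 26 * HOUR        # 93600 s, as in generic-patch-win
NOTIFICATION_INTERVAL_MIN = 1440       # minutes, with the default interval_length=60 s
ADDITIONAL_LATENCY = 15                # seconds (assumed default)


def is_stale(last_result, now, threshold=FRESHNESS_THRESHOLD,
             latency=ADDITIONAL_LATENCY):
    """True if the last passive result (epoch seconds) is older than allowed."""
    return now - last_result > threshold + latency


if __name__ == "__main__":
    print(FRESHNESS_THRESHOLD)                         # 93600
    print(NOTIFICATION_INTERVAL_MIN * 60 // HOUR)      # 24 (hours between notifications)
    print(is_stale(0, 24 * HOUR))                      # False: daily report arrived in time
    print(is_stale(0, 27 * HOUR))                      # True: no-patch-report would run
```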
    

  5. Example screenshots of the Nagios Windows Update Monitoring

    [Screenshots: servicegroup detail, service detail, status details]

  6. ... and the resulting Nagios notification e-mail example

    [Screenshot: Nagios service notification e-mail]

  7. Comments for Windows Server Update Services (WSUS)
    With most servers being set to use WSUS, Windows patches must first be approved in WSUS and are then deployed on a fixed schedule. This means the patch check could *always* return OK, because patches only become visible to a system shortly before the actual patching. Also, we depend fully on the WSUS administrator to determine which patches are applicable. The solution? For the duration of our patch check, we switch from WSUS to the *official* Microsoft online update service, and back to WSUS after the check. It is quite an effort (registry key changes, proxy settings, etc.), but it is the only way to get an independent check. This code is still in development and testing; your comments are highly welcome.
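
    The core of such a switch is a save-override-restore pattern: remember the current setting, point the client to Microsoft's online service, run the check, and restore the setting no matter how the check ends. The Python sketch below shows only that pattern, with a dict standing in for the registry. The value it toggles, UseWUServer under HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU, is the policy setting that tells the client to use the WSUS server named in WUServer. Proxy settings and restarting the update service are left out, and the function name is hypothetical.

```python
# Hypothetical sketch of "switch to Microsoft Update, check, switch back".
# A plain dict stands in for the registry; proxy settings and restarting
# the Windows Update service are not covered.
from contextlib import contextmanager

AU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"


@contextmanager
def microsoft_update_temporarily(registry):
    """Disable UseWUServer for the duration of the block, then restore it."""
    settings = registry.setdefault(AU_KEY, {})
    saved = settings.get("UseWUServer")        # None if the value did not exist
    settings["UseWUServer"] = 0                # 0: ignore WSUS, use Microsoft's service
    try:
        yield settings
    finally:
        if saved is None:
            settings.pop("UseWUServer", None)  # value was absent before: remove it
        else:
            settings["UseWUServer"] = saved    # back to WSUS, even if the check failed


if __name__ == "__main__":
    reg = {AU_KEY: {"UseWUServer": 1}}
    with microsoft_update_temporarily(reg) as au:
        print("during check:", au["UseWUServer"])
    print("after check:", reg[AU_KEY]["UseWUServer"])
```

    Using try/finally means a failing or crashing update search cannot leave a server permanently pointed away from WSUS.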
  8. Troubleshooting Tips
    Implementing a passive service with SNMP traps is not for the faint of heart. Here are some tips to get it going:
    • Check that the Windows trap data has been sent: run win_update_trapsend.bat by hand
    • Check that the trap data arrives at the Nagios server, using a packet sniffer such as tcpdump or Ethereal (now Wireshark)
    • Check that the snmptrapd daemon processes the data (a firewall might block it, the daemon configuration might be wrong, or the daemon might not be running)
    • Check that send_trap_data.pl generates the correct data for Nagios, using the tmp file
    • Check that the data is received by Nagios, by looking at the nagios.log file
    • Check that the data is correctly associated with a Nagios service and host name
    Most of these troubleshooting steps are also described (in more detail) in the Windows Reboot Monitoring tutorial.
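
    For the nagios.log step, a small filter shows whether passive results for the service actually arrive. The Python sketch below assumes log_passive_checks=1 (log lines 'PASSIVE SERVICE CHECK: host;service;code;output') and/or log_external_commands=1 ('EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;...') in nagios.cfg; the function name is hypothetical.

```python
# Troubleshooting sketch: list the passive results Nagios logged for one service.
# Assumes log_passive_checks=1 and/or log_external_commands=1 in nagios.cfg.
import re

LOG_LINE = re.compile(
    r"^\[(\d+)\] (?:PASSIVE SERVICE CHECK: |EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;)"
    r"([^;]*);([^;]*);(\d);(.*)$"
)


def passive_results(lines, service="check_trap_winpatch"):
    """Yield (timestamp, host, return_code, output) for the given service."""
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group(3) == service:
            yield int(m.group(1)), m.group(2), int(m.group(4)), m.group(5)
```

    Pass it an open log file, e.g. passive_results(open('/srv/app/nagios/var/nagios.log')); that path is a guess, the real one is set by log_file in nagios.cfg. If nothing shows up for check_trap_winpatch, the problem lies before Nagios (trap daemon or handler); if results appear under an unexpected host name, the host association is wrong.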

  9. Credits, copyrights, original scripts, etc.