|
Monitoring the Windows Update Status with Nagios through SNMP |
|
| by Frank4DD, @2009 |
This tutorial describes an approach to check whether Windows systems are being properly patched.
This is particularly important if you run servers in larger numbers and need to
evaluate their compliance and risk status for your company.
The typical existing solutions are running reports through Windows Server Update Services (WSUS),
or running scripts against the registry to list and compare the applied patches against
a baseline (security scanners like Foundstone or Nessus do just that). The latter approach
is certainly the most accurate, but also the most labor-intensive. With Microsoft releasing
patches every month, these patch lists grow huge over time. Even when they finally
collapse into a service pack after many months, patch lists change frequently and are
confusing.
|
The approach described here does not check for the existence of each single patch. Instead, it checks
that the automatic patch service is set up correctly, pointing either to Microsoft Online or to a local WSUS server.
It then runs the Windows built-in check to see whether any patches are outstanding for this system
and reports the results to the central monitoring system, Nagios. The benefits are:
|
|
First, we need a program that determines the current patch status. Microsoft's Windows Script
Host is universally available, so we can use VBScript to write the check program
win_update_trapsend.vbs.
We start by editing the top of the script to set our SNMP trap destination IP.
When run without further options, Windows scripting runs in interactive mode and opens an
output window. We want to suppress that window and redirect any output into a local logfile. I created
a batch file called win_update_trapsend.bat so I do not need to
re-type the command-line options when I want to run it by hand. Finally, we need to find a good home
directory for our script. Admins often already have such a home for their ops scripts; if not, I tend to use C:\update-monitor.
|
C:\> cscript.exe -NoLogo C:\update-monitor\win_update_trapsend.vbs > C:\update-monitor\win_update_trapsend.log |
|
|
We use SNMP to monitor Windows servers, and SNMP is our central monitoring protocol used
across all systems. On Linux, we have the extend function in UCD Net-SNMP
that allows us to run scripts remotely and receive the output through SNMP.
Unfortunately, the SNMP service shipping with Windows is limited: it is incapable of SNMPv3
and has no extend. As a result, we face the dilemma of how to initiate the check and how to
transport our monitoring result back to Nagios. One solution is to add a service such as
NRPE-NT, which is made for exactly that purpose.
But should we do that just for one single script?
Repeat after me: "I don't want another daemon! I don't want another daemon!..." :-)
In an enterprise with hundreds of servers, it makes a difference whether we get a small client program
rolled out or go through all the required testing of implementing another service. I successfully
tested TrapGen from Network Computing Technologies, Inc.,
a small 136KB binary that can send custom SNMP traps from Windows systems. Together with the setup of
an SNMP trap daemon, plus the passive service configuration in Nagios, we receive Windows update
check results from checks that are launched daily through the Windows scheduler.
|
The client setup is easy on the Windows system and also easy on the Nagios side, because
we can leverage the existing SNMP trap implementation of our Windows Reboot Monitoring.
We just add a new trap handler definition to '/etc/snmp/snmptrapd.conf' and update the
send_trap_data.pl script, which is responsible for processing the
received SNMP trap data and submitting it to Nagios as a passive check result.
Passive checks have disadvantages: we cannot force a re-check of the service from Nagios.
If we want to update the Nagios status (Manager after patching: "Make it green!"), we need to either
wait for the next scheduled check to kick in, or we need to run the check script on the
Windows client by hand. A second disadvantage is that a system's monitoring configuration
can break without anyone noticing, in which case the passive check simply stops receiving new data. Fortunately,
we can make this visible in Nagios using the 'freshness' parameters together with the check_command
definition for 'stale' results (see no-patch-report in the next section).
|
susie3 ~ # cat /etc/snmp/snmptrapd.conf
###############################################################################
# snmptrapd.conf:
# configuration file for configuring the ucd-snmp snmptrapd agent.
###############################################################################
# first, we define the access control
authCommunity log,execute,net SECtrap
# next, the trap handlers.
# capture Windows reboots: SNMPv2-MIB::snmpTrapOID.0 = SNMPv2-MIB::coldStart
traphandle SNMPv2-MIB::coldStart /srv/app/nagios/libexec/send_trap_data.pl
# capture Win update traps: SNMPv2-MIB::snmpTrapOID.0 = RFC1155-SMI::enterprises.2854.0.1
traphandle RFC1155-SMI::enterprises.2854.0.1 /srv/app/nagios/libexec/send_trap_data.pl
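To illustrate what a traphandle program has to do, here is a minimal Python sketch. It is not the send_trap_data.pl script itself, which is not shown in this tutorial. snmptrapd feeds the handler on standard input: one line with the sender's hostname, one line with the transport address, then one "OID VALUE" line per variable binding. The handler turns that into a PROCESS_SERVICE_CHECK_RESULT line for the Nagios external command file. The sample hostname, the payload OID and text, and the return code used below are assumptions made up for the example.

```python
import time

# Must match the service_description configured in Nagios.
SERVICE_DESCRIPTION = "check_trap_winpatch"


def parse_trap(text):
    """Split snmptrapd handler input into (hostname, address, varbinds)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    hostname, address = lines[0], lines[1]
    varbinds = []
    for ln in lines[2:]:
        # Each remaining line is "OID VALUE"; the value may contain spaces.
        oid, _, value = ln.partition(" ")
        varbinds.append((oid, value.strip()))
    return hostname, address, varbinds


def format_passive_result(host, service, return_code, output, now=None):
    """Build one line in the Nagios external command format."""
    ts = int(time.time() if now is None else now)
    output = output.replace("\n", " ")  # the command must stay on one line
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        ts, host, service, return_code, output)


if __name__ == "__main__":
    # Hypothetical trap as snmptrapd would hand it to the handler.
    sample = (
        "winsrv01\n"
        "UDP: [192.0.2.10]:1029\n"
        "SNMPv2-MIB::snmpTrapOID.0 RFC1155-SMI::enterprises.2854.0.1\n"
        'RFC1155-SMI::enterprises.2854.1 "2 updates pending"\n'
    )
    host, _, varbinds = parse_trap(sample)
    payload = varbinds[-1][1].strip('"')
    # Return code 1 (WARNING) is an assumption for this example.
    print(format_passive_result(host, SERVICE_DESCRIPTION, 1, payload))
```

A real handler would append the resulting line to the Nagios command file (the path set by command_file in nagios.cfg) instead of printing it.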
|
Here, one important item is the service description name.
It must match the name configured in send_trap_data.pl.
Otherwise, Nagios cannot relate the event to any existing service for processing it.
|
vi /srv/app/nagios/etc/nagios.cfg
# passive service check for Windows Patch Update SNMP traps
cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg
vi /srv/app/nagios/etc/objects/patch-services-windows.cfg
###############################################################################
# Define a servicegroup for patch service checks
# All patch service checks will be members of this group
###############################################################################
define servicegroup{
        servicegroup_name       patch-checks-win        ; The name of the servicegroup
        alias                   Patch Checks Windows    ; Long name of the group
        }
###############################################################################
# Define the patch check template service
###############################################################################
define service{
        name                            generic-patch-win
        active_checks_enabled           0               ; traps are only passive checks
        passive_checks_enabled          1               ; yes, check passive
        parallelize_check               1               ; yes, please
        obsess_over_service             0               ; we don't run extra commands
        check_freshness                 1               ; check if a report came in
        freshness_threshold             93600           ; 26 hour threshold for stale, the
                                                        ; patch check should run once a day
        check_command                   no-patch-report ; runs if service result is "stale"
        notifications_enabled           1               ; send notifications
        event_handler_enabled           1               ; yes, but we have none
        flap_detection_enabled          0               ; with auto-OK, we don't
        failure_prediction_enabled      1               ; dependency checks
        process_perf_data               0               ; don't send this to perfdata
        retain_status_information       1               ; yes, once auto-OK'ed, keep it
        retain_nonstatus_information    1
        is_volatile                     1               ; enable for passive checks
        check_period                    24x7            ; always check for submissions
        max_check_attempts              1               ; one trap is enough
        normal_check_interval           1
        retry_check_interval            1
        contact_groups                  frankonly
        notification_options            w,r             ; notify for warnings and recovery
        notification_interval           1440            ; notify once a day
        notification_period             24x7            ; always notify
        register                        0               ; template, don't register
        service_groups                  patch-checks-win
        }
###############################################################################
# Receive SNMP traps for Windows update status
###############################################################################
define service {
        use                     generic-patch-win
        hostgroup               2-windows-servers
        name                    check_trap_winpatch
        service_description     check_trap_winpatch
        }
vi command.cfg and add the definition for check_command no-patch-report:
# This will always return "OK" but tells us no patch report came in that day.
# see also http://nagios.sourceforge.net/docs/3_0/freshness.html
define command{
        command_name    no-patch-report
        command_line    $USER1$/check_dummy 0 "Daily patch check result was not reported!"
        }
susie3:/srv/app/nagios/etc/objects # echo "cfg_file=/srv/app/nagios/etc/objects/patch-services-windows.cfg" >> /srv/app/nagios/etc/nagios.cfg
susie3:/srv/app/nagios/etc/objects # /etc/init.d/nagios restart
Running configuration check...done.
Stopping nagios: .done.
Starting nagios: done.
|
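The freshness_threshold of 93600 seconds in the template above can be checked with a little arithmetic. The following Python sketch is a simplified model of the staleness test, assuming Nagios only compares the age of the last passive result with the threshold; the real freshness logic in Nagios takes further factors into account. The function names are made up for illustration.

```python
# Threshold from the generic-patch-win template, in seconds.
FRESHNESS_THRESHOLD = 93600


def hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


def is_stale(last_result_ts, now_ts, threshold=FRESHNESS_THRESHOLD):
    """True when the newest passive result is older than the threshold."""
    return (now_ts - last_result_ts) > threshold


if __name__ == "__main__":
    print("threshold in hours:", hours(FRESHNESS_THRESHOLD))
    # A trap arriving on schedule, 24 hours after the previous one, is fresh.
    print("stale after 24h:", is_stale(0, 24 * 3600))
    # If one daily run is missed, the result goes stale 26 hours after the
    # last trap and Nagios runs the check_command no-patch-report.
    print("stale after 27h:", is_stale(0, 27 * 3600))
```

The two extra hours over the daily schedule give the scheduled task some slack before the service is flagged as unreported.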
|
With most servers being set to use WSUS, Windows patches are WSUS approved and then deployed on a fixed schedule.
That means the patch check could *always* return OK, because patches only become visible to the system shortly before
the actual patching. Also, we depend fully on the WSUS administrator to determine which patches are applicable. The solution?
For the duration of our patch check, we switch from WSUS to the *official* Windows Online update service and back to WSUS
after our check. It is quite an effort (registry key changes, proxy settings, etc.), but it is the only way to get an independent check. This code is in development/testing; your comments are highly welcome.
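The "switch away and always switch back" pattern can be sketched in Python as follows. This is not the author's code, which is still in development. It assumes the WSUS binding is controlled by the UseWUServer policy value under HKLM\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU, and it leaves out proxy settings and any restart of the Windows Update service that may be needed. The read_dword and write_dword callables are hypothetical; on Windows they could wrap the winreg module, while here an in-memory dictionary stands in for the registry.

```python
from contextlib import contextmanager

# Policy location read by the Windows Update client (under HKLM).
AU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU"
USE_WSUS = "UseWUServer"  # DWORD: 1 = use the configured WSUS server


@contextmanager
def wsus_bypassed(read_dword, write_dword):
    """Temporarily set UseWUServer to 0 and restore the old value afterwards.

    read_dword(key, name) and write_dword(key, name, value) are hypothetical
    callables supplied by the caller.
    """
    original = read_dword(AU_KEY, USE_WSUS)
    if not original:
        # Value missing or already 0: the system is not bound to WSUS.
        yield original
        return
    write_dword(AU_KEY, USE_WSUS, 0)
    try:
        yield original
    finally:
        # Runs even if the patch check fails, so WSUS is always re-enabled.
        write_dword(AU_KEY, USE_WSUS, original)


# In-memory stand-in for the registry, for demonstration only.
fake_registry = {(AU_KEY, USE_WSUS): 1}


def fake_read(key, name):
    return fake_registry.get((key, name))


def fake_write(key, name, value):
    fake_registry[(key, name)] = value


if __name__ == "__main__":
    with wsus_bypassed(fake_read, fake_write):
        # The independent patch check would run here.
        print("during check:", fake_registry[(AU_KEY, USE_WSUS)])
    print("after check:", fake_registry[(AU_KEY, USE_WSUS)])
```

Restoring the value in a finally block matters here: a server that stays pointed at the public update service after a failed check would silently leave WSUS control.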
|
|
Implementing a passive service with SNMP traps is not for the faint of heart. Here are some tips to get it going:
|