Monitoring Apache Session load with Nagios through mod_status

by Frank4DD, @2009

We needed Apache web server monitoring that can watch the current session status of the web server. It should alert us when the web server is reaching its client session limit, after which user sessions may be dropped.
We already use the Nagios 'check_http' plugin to monitor site access. This plugin also reports the size and access time of the retrieved URL as a performance indicator. This data can help identify access performance bottlenecks, but it is influenced by the network speed and is therefore a very unreliable indicator, as our graphs show.
We also have historical monitoring that can provide session information (e.g. through webalizer). That is only useful for watching the overall trend; it cannot raise alarms for ad-hoc session peaks, because the information is gathered from historical log data.
The following method describes the setup for Apache session load monitoring using the Apache module 'mod_status'. It has been tested under Nagios 3.0.6 running on SuSE Linux Enterprise Server 10 SP2, monitoring Apache2 web servers running on various platforms such as Linux, Windows and AIX.

  1. Providing Apache session information with 'mod_status'
    Introduced with Apache 1.1, the module mod_status provides a simple, static HTML page with session details under the server's location /server-status. More information is available from the Apache project for versions 1.3, 2.0 and 2.2.

    First, we need to load the module. Apache modules are enabled with the Apache directive "LoadModule", but the recommended way to enable modules depends somewhat on the web server installation and the operating system. For SuSE Linux Enterprise Server, /etc/sysconfig/apache2 contains the list of modules to be enabled. This list is expanded into /etc/apache2/sysconfig.d/loadmodule.conf, which is included in the main httpd.conf.
    # vi /etc/sysconfig/apache2
    
    APACHE_MODULES="authz_host actions alias auth_basic authz_groupfile authn_file authz_user
    autoindex cgi dir include log_config mime negotiation setenvif ssl php5 perl python
    apparmor auth_pam auth_sys_group authz_default rewrite status"
    
    # grep status /etc/apache2/sysconfig.d/loadmodule.conf
    LoadModule status_module                  /usr/lib64/apache2-prefork/mod_status.so
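
    As a quick cross-check of the module list, here is a minimal Python sketch (not part of the original setup) that reads the APACHE_MODULES value and confirms that 'status' is listed. The function name enabled_modules and the shortened sample text are illustrative assumptions; on a real SLES system you would pass in the contents of /etc/sysconfig/apache2.

```python
import re


def enabled_modules(sysconfig_text):
    """Return the module names from the APACHE_MODULES="..." assignment."""
    # The quoted value may span several lines, as in the listing above.
    match = re.search(r'^APACHE_MODULES="([^"]*)"', sysconfig_text, re.MULTILINE)
    if match is None:
        return []
    return match.group(1).split()


# Shortened sample of /etc/sysconfig/apache2 (illustrative, not the full list).
sample = '''APACHE_MODULES="authz_host actions alias auth_basic
autoindex cgi dir rewrite status"
'''

modules = enabled_modules(sample)
print("mod_status enabled:", "status" in modules)
```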
    
    Now we need to enable extended status information and restrict access to the web location. Again, depending on the web server configuration, the place to configure could be in the global httpd.conf file, or in one of its includes. For SLES10, it is in the mod_status.conf file.
    # cat mod_status.conf
    #
    # Allow server status reports generated by mod_status,
    # with the URL of http://servername/server-status
    #
    # see http://httpd.apache.org/docs-2.2/mod/mod_status.html
    #
    <IfModule mod_status.c>
        <Location /server-status>
            SetHandler server-status
            Order deny,allow
            Deny from all
            Allow from 192.168.103.32 192.168.103.34 127.0.0.1 localhost
        </Location>
        ExtendedStatus On
    </IfModule>
    After updating the Apache configuration, we reload it to enable the new module. The server-status web page should now be available to the allowed client IPs. An example server-status page is saved here.
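
    To confirm the access restriction from a given client, a small probe can help. The following is a minimal Python sketch, assuming plain HTTP on port 80; the helper names describe_http_status and probe_server_status are hypothetical, and the hints are my own reading of the usual status codes (403 when the client IP is not in the 'Allow from' list, 404 when the handler is not mapped).

```python
import urllib.error
import urllib.request


def describe_http_status(code):
    """Map an HTTP status code to a hint about the mod_status setup."""
    if code == 200:
        return "server-status is reachable from this client"
    if code == 403:
        return "access denied: client IP is probably not in 'Allow from'"
    if code == 404:
        return "not found: mod_status is probably not loaded or not mapped"
    return "unexpected HTTP status %d" % code


def probe_server_status(host, timeout=10):
    """Fetch http://<host>/server-status and return (code, hint)."""
    url = "http://%s/server-status" % host
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            code = response.getcode()
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx; the status code is still what we want.
        code = err.code
    return code, describe_http_status(code)
```

    Calling probe_server_status("192.168.103.34") from one of the allowed addresses should report status 200; from any other client the same call should report 403.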

  2. check_apachestatus.pl - Parsing the 'mod_status' output for Nagios
    Nagiosexchange provided the script check_apachestatus.pl as a Nagios plugin to process the output of mod_status. It uses the Perl LWP library to make an HTTP "GET" request for the location /server-status and parses the output accordingly. I updated the script to make it work independently of the Nagios Perl module and to work with mod_status whether "ExtendedStatus On" is set or not. It can be found here. If "ExtendedStatus" is enabled, extra performance data is returned. Let's place it in 'nagioshome'/libexec and run a test by hand to verify that it works correctly:
    m08460:/srv/app/nagios/libexec # ./check_apachestatus.pl -H 192.168.103.34
    OK 0.038002 seconds response time. Idle 8, busy 1, open slots 141 | 'Waiting for
    Connection'=8 'Starting Up'=0 'Reading Request'=0 'Sending Reply'=1 'Keepalive
    (read)'=0 'DNS Lookup'=0 'Closing Connection'=0 'Logging'=0 'Gracefully finishing'=0
    'Idle cleanup'=0 'Open slot'=141 'Requests/sec'=0.1 'B per sec'=516.0B 'kB per
    Request'=7.0kB
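
    The part of the plugin output after the '|' character is Nagios performance data in 'label'=value[unit] form. As an illustration of how that data can be picked apart, here is a minimal Python sketch; parse_plugin_output is a hypothetical helper, and the sample is the test output above joined into a single line (the listing is wrapped for display).

```python
import re

# One perfdata item: 'label'=number followed by an optional unit such as B or kB.
PERF_ITEM = re.compile(r"'([^']+)'=([0-9.]+)([A-Za-z%]*)")


def parse_plugin_output(line):
    """Split plugin output into the status text and a dict of metrics."""
    text, _, perfdata = line.partition("|")
    metrics = {}
    for label, value, unit in PERF_ITEM.findall(perfdata):
        metrics[label] = (float(value), unit)
    return text.strip(), metrics


sample_output = (
    "OK 0.038002 seconds response time. Idle 8, busy 1, open slots 141 | "
    "'Waiting for Connection'=8 'Starting Up'=0 'Reading Request'=0 "
    "'Sending Reply'=1 'Keepalive (read)'=0 'DNS Lookup'=0 "
    "'Closing Connection'=0 'Logging'=0 'Gracefully finishing'=0 "
    "'Idle cleanup'=0 'Open slot'=141 'Requests/sec'=0.1 "
    "'B per sec'=516.0B 'kB per Request'=7.0kB"
)

status_text, metrics = parse_plugin_output(sample_output)
print(status_text)
print(metrics["Open slot"], metrics["B per sec"])
```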
    
    All right, we receive data for Nagios, but there is hardly any load on our server! Let's generate some artificial web load. I am going to use the load generator 'curl-loader' to max out the sessions my web server has available (MaxClients 150). I already have one monitoring connection, so with 150 additional parallel clients we should start seeing a few timeout errors for clients that can no longer connect. We are also going to verify that the plugin alarms correctly by setting the warning threshold to 10 sessions left and the critical threshold to 2 sessions left (-w 10 -c 2).
    # vi frank4dd-conf
    ########### GENERAL SECTION ################################
    BATCH_NAME=frank4dd-test
    CLIENTS_NUM_MAX=150
    CLIENTS_NUM_START=150
    CLIENTS_RAMPUP_INC=1
    INTERFACE   =eth0
    NETMASK=255.255.255.0
    IP_ADDR_MIN= 192.168.203.99
    IP_ADDR_MAX= 192.168.203.99
    CYCLES_NUM= -1
    URLS_NUM= 1
    ########### URL SECTION ####################################
    URL=http://192.168.103.34/wpad.dat
    URL_SHORT_NAME="frank4dd-80-wpad"
    REQUEST_TYPE=GET
    TIMER_URL_COMPLETION = 3000
    TIMER_AFTER_URL_SLEEP =100
    
    ~ # ./curl-loader -f frank4dd-conf
    
    ============  loading batch is: frank4dd- ======================================
    
     Operations:             Success                 Failed                 Timed out
    URL0:frank4dd-80-       3158   6040             7      7                7      7 
    --------------------------------------------------------------------------------
    Interval stats (latest:3 sec, clients:150, CAPS-curr:1046):
    H/F   Req:3118,1xx:0,2xx:3125,3xx:0,4xx:0,5xx:0,Err:0,T-Err:7,D:1ms,D-2xx:1ms,\
    Ti:2167740B/s,To:129166B/s
    H/F/S Req:0,1xx:0,2xx:0,3xx:0,4xx:0,5xx:0,Err:0,T-Err:0,D:0ms,D-2xx:0ms,Ti:0B/s,To:0B/s
    --------------------------------------------------------------------------------
    Summary stats (runs:6 secs, CAPS-average:1032):
    H/F   Req:6040,1xx:0,2xx:6040,3xx:0,4xx:0,5xx:0,Err:0,T-Err:7,D:1ms,D-2xx:1ms,\
    Ti:2094889B/s,To:124971B/s
    H/F/S Req:0,1xx:0,2xx:0,3xx:0,4xx:0,5xx:0,Err:0,T-Err:0,D:0ms,D-2xx:0ms,Ti:0B/s,To:0B/s
    =================================================================================
     Manual: clients:max[150],curr[150]. Inc num: [+|*].
    =================================================================================
    
    m08460:/srv/app/nagios/libexec # ./check_apachestatus.pl -H 192.168.103.34 -w 10 -c 2
    CRITICAL 0.347821 seconds response time. Idle 0, busy 148, open slots 2 | 'Waiting for
    Connection'=0 'Starting Up'=0 'Reading Request'=0 'Sending Reply'=3
    'Keepalive (read)'=141 'DNS Lookup'=0 'Closing Connection'=0 'Logging'=4
    'Gracefully finishing'=0 'Idle cleanup'=0 'Open slot'=2 'Requests/sec'=0.4
    'B per sec'=676.0B 'B per Request'=1850.0B
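
    The CRITICAL result above is consistent with a simple rule: count the slots that are still available and compare that number with the -w and -c values. Here is a minimal Python sketch of that rule, assuming that "available" means idle workers plus open slots and that a threshold is hit when the count is less than or equal to it. This matches the output shown here but is not taken from the Perl source. The numeric codes 0, 1 and 2 are the standard Nagios plugin return codes for OK, WARNING and CRITICAL.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate_slots(idle, open_slots, warn=None, crit=None):
    """Return (exit_code, label) for the number of slots still available."""
    # Assumption: idle workers and open slots can both take new sessions.
    available = idle + open_slots
    if crit is not None and available <= crit:
        return CRITICAL, "CRITICAL"
    if warn is not None and available <= warn:
        return WARNING, "WARNING"
    return OK, "OK"


# Values from the two test runs in this section.
print(evaluate_slots(idle=8, open_slots=141))                   # no thresholds
print(evaluate_slots(idle=0, open_slots=2, warn=10, crit=2))    # under load
```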
    

  3. Configuring the new Nagios Service
    Here is a saved /server-status page showing how mod_status displays our full sessions table. Next, we add the Nagios command definition for our new check. I named it check_http_stat in 'nagioshome'/commands.cfg, and now we can configure our first Apache load service object. Add the new configuration file to 'nagioshome'/etc/nagios.cfg, restart Nagios and check for the new service, which is initially marked 'pending'.
    ~ # vi /srv/app/nagios/etc/objects/commands.cfg
    
    # check_apachestatus.pl -H  [-p ] [-t ] [-w  -c ] [-V]
    define command{
      command_name    check_http_stat
      command_line    $USER1$/check_apachestatus.pl -H $HOSTADDRESS$ $ARG1$
    }
    
    # vi website-services.cfg
    ###############################################################################
    # Define a servicegroup for web service checks
    # web service checks will be a member of this group
    ###############################################################################
    define servicegroup{
      servicegroup_name        website-checks ; The name of the servicegroup
      alias                    Web Site Checks ; Long name of the group
    }
    define servicegroup{
      servicegroup_name        website-load ; The name of the servicegroup
      alias                    Web Site Load ; Long name of the group
      members                  frank4dd-server,apache-load-frank4dd
    }
    ###############################################################################
    # Define the environment check template service
    ###############################################################################
    define service{
      name                          generic-website
      active_checks_enabled         1
      passive_checks_enabled        1
      parallelize_check             1
      obsess_over_service           1
      check_freshness               0
      notifications_enabled         1
      event_handler_enabled         1
      flap_detection_enabled        1
      failure_prediction_enabled    1
      process_perf_data             1
      retain_status_information     1
      retain_nonstatus_information  1
      is_volatile                   0
      check_period                  24x7
      max_check_attempts            4
      normal_check_interval         5
      retry_check_interval          1
      contact_groups                frankonly
      notification_options          c,r
      notification_interval         180
      notification_period           24x7
      register                      0
      servicegroups                 website-checks
    }
    ###############################################################################
    # Check web access to frank4dd-server
    ###############################################################################
    define service{
      use                           generic-website
      host_name                     frank4dd-server
      service_description           website-check-frank4dd
      check_command                 check_http!-S -a nagios:nagios2check
    }
    define serviceextinfo {
      service_description           website-check-frank4dd
      host_name                     frank4dd-server
      use                           basic
    }
    ###############################################################################
    # Check web load on frank4dd-server
    ###############################################################################
    define service{
      use                           generic-website
      host_name                     frank4dd-server
      service_description           apache-load-frank4dd
      check_command                 check_http_stat!-w 30 -c 15
    }
    define serviceextinfo {
      service_description           apache-load-frank4dd
      host_name                     frank4dd-server
      use                           basic
    }
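
    To see which command line the service check_command "check_http_stat!-w 30 -c 15" produces, the macro substitution can be traced by hand. Here is a minimal Python sketch of that substitution; it is a simplified illustration, not how Nagios itself is implemented. The function expand_check_command is hypothetical, $USER1$ is assumed to point to the libexec directory used earlier, and the host address of frank4dd-server is assumed to be the test address 192.168.103.34.

```python
def expand_check_command(check_command, command_line, macros):
    """Expand $ARGn$ and named macros the way a check_command is resolved."""
    # The part before the first '!' is the command name, the rest are arguments.
    name, *args = check_command.split("!")
    expanded = command_line
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, value)
    for macro, value in macros.items():
        expanded = expanded.replace("$%s$" % macro, value)
    return name, expanded


name, command = expand_check_command(
    "check_http_stat!-w 30 -c 15",
    "$USER1$/check_apachestatus.pl -H $HOSTADDRESS$ $ARG1$",
    {
        "USER1": "/srv/app/nagios/libexec",   # assumed value from resource.cfg
        "HOSTADDRESS": "192.168.103.34",      # assumed address of the host
    },
)
print(name)
print(command)
```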
    

  4. We are done, enjoy the Apache session monitoring (example screenshots)
  5. [Screenshots: nagios snmp trap service detail 1, 2 and 3; nagios webload service notification]

  6. Here is the Nagiosgraph image example for historical Apache session trends
  7. [Screenshot: nagios e-mail notification]

  8. Additional comments on using 'check_apachestatus.pl'
  9. I slightly modified the original script 'check_apachestatus.pl' and removed the dependency on the Nagios Perl modules (utils.pm). The script now runs independently of the Nagios server itself. Some of my Nagios services need to call 'check_apachestatus.pl' through the SNMP extend access method, using the 'check_snmp_extend.sh' plugin. Some of my hosts are in a DMZ with only port 443 (HTTPS) open, so I cannot access port 80 for /server-status directly over the network. With SNMP access open for the Nagios monitoring system, I placed the session load monitoring script 'check_apachestatus.pl' on the host itself, where it is called by the SNMP daemon (configured in /etc/snmp/snmpd.conf).
    Also, 'check_apachestatus.pl' currently parses the HTML page /server-status, while mod_status also provides text output in a standard format under the URL /server-status?auto, which is possibly a better way to retrieve the data (example). Your opinion is very welcome.
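
    To illustrate why the machine-readable format may be easier to handle, here is a minimal Python sketch that parses /server-status?auto style text. That output consists of "Key: value" lines, and the Scoreboard line uses one character per slot. The helper names parse_auto_status and scoreboard_counts are hypothetical, and the sample text is constructed to mirror the idle test run in section 2, not captured from a real server.

```python
from collections import Counter

# Scoreboard key as documented by mod_status.
SCOREBOARD_KEY = {
    "_": "Waiting for Connection",
    "S": "Starting Up",
    "R": "Reading Request",
    "W": "Sending Reply",
    "K": "Keepalive (read)",
    "D": "DNS Lookup",
    "C": "Closing Connection",
    "L": "Logging",
    "G": "Gracefully finishing",
    "I": "Idle cleanup",
    ".": "Open slot",
}


def parse_auto_status(text):
    """Turn 'Key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def scoreboard_counts(scoreboard):
    """Count the slots in each state, using the labels from the key."""
    counts = Counter(scoreboard)
    return {name: counts.get(symbol, 0) for symbol, name in SCOREBOARD_KEY.items()}


# Illustrative sample: 8 idle workers, 1 busy worker, 141 open slots.
sample = (
    "BusyWorkers: 1\n"
    "IdleWorkers: 8\n"
    "Scoreboard: " + "_" * 8 + "W" + "." * 141 + "\n"
)

fields = parse_auto_status(sample)
counts = scoreboard_counts(fields["Scoreboard"])
print(fields["BusyWorkers"], fields["IdleWorkers"], counts["Open slot"])
```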
    In the latest version 1.6, the plugin check_apachestatus.pl supports connections through a web proxy and, if /server-status is provided through SSL, HTTPS connections. The combination of both, using a proxy with HTTPS, was quite tricky to get working; it uses Crypt::SSLeay with the HTTPS_PROXY environment variable. Finally, a trick for the special case when your system is defined in Nagios with a different (e.g. NAT) IP address while the URL for /server-status needs another (real) IP:

    check_command check_http_stat!-x "http://192.168.100.184:80" -s -H 211.145.179.138

  10. Credits, copyrights, original scripts etc.