Gentoo - Time Series Monitoring

By: John McFarlane <john.mcfarlane@rockfloat.com>
Last updated: 10/05/2005 @01:00

Abstract:
This document is a beginner's guide to graphically monitoring things like network and disk I/O, CPU utilization, and memory consumption.



1. Introduction

It can be very difficult to keep track of the health of your servers if your only means of doing so is to periodically log in and run top. It's much more useful to graph their health over time. For example, if demand keeps growing as it has over the past few months, a graph makes it much easier to estimate how much RAM you will need a few months from now.

There are lots of tools to help in this regard. One thing to understand up front is that this howto does not cover alerting, that is, receiving a notification when something exceeds a threshold. Look to Nagios for things like that.

We will use the following tools to accomplish this task:

Name      Purpose
Net-SNMP  Exposes raw data such as memory, CPU, and network usage on the server, using the standard SNMP protocol.
RRDtool   Both a database format and a set of tools for manipulating the database and producing very pretty graphs.
MRTG      Used here as an SNMP client that queries the servers and writes to an RRDtool-compatible database.

Much of this howto was derived from the drwindows tutorial on the Gentoo Forums. REFERENCE


2. Install Net-SNMP

Install Net-SNMP per the usual:
root# emerge -a net-snmp
This installs both an SNMP server and an SNMP client.

3. Configure Net-SNMP server

Net-SNMP lets you configure what information is exposed, and how, in its main configuration file: /etc/snmp/snmpd.conf. You can create one by running snmpconf -g basic_setup and answering the prompts. For now just make it look like this:

com2sec local     127.0.0.1/32  public
com2sec local     10.0.2.0/24   public

group MyROGroup v1              local
group MyROGroup v2c             local
group MyROGroup usm             local

view all    included  .1        80

access MyROGroup ""     any     noauth    exact  all    none   none

syslocation Cincinnati
syscontact John McFarlane
		

A few things remain to complete this step:

  1. Start the Net-SNMP daemon:
    root# /etc/init.d/snmpd start
  2. Set Net-SNMP to start at boot:
    root# rc-update add snmpd default
  3. Test to see what data Net-SNMP is exposing:
    root# snmpwalk -v 1 -c public HOSTNAME
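
The snmpwalk output is long, so it can help to check only the values that get graphed later in this howto. The following is a minimal sketch, not part of the original procedure: check_oids is a hypothetical helper name, the OID list is copied from the MRTG targets in step 5, and the SNMPGET variable is an assumption that exists only so the command can be swapped out for a dry run.

```shell
#!/bin/bash
# Sketch: query each OID that the MRTG targets in this howto rely on.
# SNMPGET can be overridden (for example SNMPGET=echo) for a dry run.
SNMPGET=${SNMPGET:-snmpget}

# OIDs used by the device configuration in step 5
OIDS="UCD-SNMP-MIB::ssCpuRawUser.0
UCD-SNMP-MIB::ssCpuRawSystem.0
UCD-SNMP-MIB::ssCpuRawIdle.0
UCD-SNMP-MIB::ssCpuRawNice.0
TCP-MIB::tcpCurrEstab.0
HOST-RESOURCES-MIB::hrSystemProcesses.0
UCD-SNMP-MIB::memTotalFree.0
UCD-SNMP-MIB::memAvailReal.0
UCD-SNMP-MIB::memAvailSwap.0"

# check_oids HOST [COMMUNITY]
# Prints one result line per OID; returns non-zero if any query failed.
check_oids() {
    host=$1
    community=${2:-public}
    failed=0
    for oid in $OIDS; do
        if ! "$SNMPGET" -v 1 -c "$community" "$host" "$oid"; then
            echo "FAILED: $oid" >&2
            failed=1
        fi
    done
    return $failed
}
```

After sourcing the file, something like check_oids HOSTNAME public should print one line per value. Any FAILED line points at a value that snmpd is not exposing yet.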


4. Install MRTG

Install MRTG per the usual:

root# emerge -a mrtg
root# mkdir -p /etc/mrtg/devices
root# mkdir -p /etc/mrtg/graphs
		

5. Configure MRTG

I have always used one main configuration file for MRTG, plus one file per monitored device. Let's take them one by one:
  1. Make the main MRTG configuration /etc/mrtg/mrtg.cfg look like this:
    
    WorkDir: /var/lib/mrtg
    Logdir: /var/log
    LogFormat: rrdtool
    RunAsDaemon: No
    
    LoadMIBs: /usr/share/snmp/mibs/UCD-SNMP-MIB.txt, /usr/share/snmp/mibs/TCP-MIB.txt, /usr/share/snmp/mibs/HOST-RESOURCES-MIB.txt
    				
  2. Here is a tip for gathering details you will need in the next step. This lists the network interfaces that SNMP exposes:
    
    root# snmptable -v 1 -c public `hostname` ifTable | cut -b-33
    				
  3. Create the first device configuration /etc/mrtg/devices/HOSTNAME.inc:
    
    Target[HOSTNAME.usrsys]: ssCpuRawUser.0&ssCpuRawSystem.0:public@HOSTNAME
    MaxBytes[HOSTNAME.usrsys]: 100
    Title[HOSTNAME.usrsys]: CPU usr sys
    
    Target[HOSTNAME.idlenice]: ssCpuRawIdle.0&ssCpuRawNice.0:public@HOSTNAME
    MaxBytes[HOSTNAME.idlenice]: 100
    Title[HOSTNAME.idlenice]: CPU idle nice
    
    Target[HOSTNAME.tcpopen]: tcpCurrEstab.0&tcpCurrEstab.0:public@HOSTNAME
    MaxBytes[HOSTNAME.tcpopen]: 1000000
    Title[HOSTNAME.tcpopen]: Open TCP connections
    Options[HOSTNAME.tcpopen]: gauge
    
    Target[HOSTNAME.proc]: hrSystemProcesses.0&hrSystemProcesses.0:public@HOSTNAME
    MaxBytes[HOSTNAME.proc]: 1000
    Title[HOSTNAME.proc]: Number of running processes
    Options[HOSTNAME.proc]: gauge
    
    Target[HOSTNAME.freemem]: memTotalFree.0&memTotalFree.0:public@HOSTNAME
    MaxBytes[HOSTNAME.freemem]: 1000000
    Title[HOSTNAME.freemem]: Free Memory Total
    Options[HOSTNAME.freemem]: gauge
    
    Target[HOSTNAME.ram.swap]: memAvailReal.0&memAvailSwap.0:public@HOSTNAME
    MaxBytes[HOSTNAME.ram.swap]: 456488
    Title[HOSTNAME.ram.swap]: RAM vs swap Free Memory
    Options[HOSTNAME.ram.swap]: gauge
    
    Target[HOSTNAME.diskspercent]: .1.3.6.1.4.1.2021.9.1.9.1&.1.3.6.1.4.1.2021.9.1.9.2:public@HOSTNAME
    MaxBytes[HOSTNAME.diskspercent]: 100
    Title[HOSTNAME.diskspercent]: disk usage percent
    Options[HOSTNAME.diskspercent]: gauge,nopercent
    
    Target[HOSTNAME.disks.usage]: .1.3.6.1.4.1.2021.9.1.7.1&.1.3.6.1.4.1.2021.9.1.7.2:public@HOSTNAME
    MaxBytes1[HOSTNAME.disks.usage]: 24691824
    MaxBytes2[HOSTNAME.disks.usage]: 14048404
    Title[HOSTNAME.disks.usage]: disk available totals
    Options[HOSTNAME.disks.usage]: gauge,nopercent
    
    Target[HOSTNAME.traffic]: \eth0:public@HOSTNAME:
    MaxBytes[HOSTNAME.traffic]: 12500000
    Title[HOSTNAME.traffic]: IP-ADDRESS -- HOSTNAME
    				
    Rinse and repeat for other devices you want to monitor.
    Tip: You can monitor almost anything. Use snmpwalk to find the OIDs to put in the Target line for the device in question.
    Here's an example of what you might use to monitor a router:
    
    Target[HOSTNAME.traffic]: 2:public@IP-ADDRESS
    MaxBytes1[HOSTNAME.traffic]: 250000
    MaxBytes2[HOSTNAME.traffic]: 125000
    Title[HOSTNAME.traffic]: Cable Modem Traffic Analysis
    				
  4. Include your device config(s) in the main MRTG configuration by appending to the end of /etc/mrtg/mrtg.cfg:
    
    Include: devices/HOSTNAME.inc
    #Include: devices/foobar.inc
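
Two details of the device configuration lend themselves to a small sketch. First, for interface targets MaxBytes is given in bytes per second, so the 12500000 used for eth0 corresponds to a 100 Mbit/s link (100,000,000 / 8), and the 250000 in the router example would correspond to 2 Mbit/s. Second, "rinse and repeat" mostly means substituting a real host name for the HOSTNAME placeholder. The helper names mbit_to_maxbytes and render_device_config, and the idea of keeping HOSTNAME.inc around as a template, are assumptions for illustration only.

```shell
#!/bin/bash
# mbit_to_maxbytes MBITS
# Convert a link speed in Mbit/s to the bytes-per-second value MaxBytes expects.
mbit_to_maxbytes() {
    echo $(( $1 * 1000000 / 8 ))
}

# render_device_config TEMPLATE HOST
# Print TEMPLATE with every occurrence of the HOSTNAME placeholder replaced by HOST.
# Assumes HOST contains only characters that are safe in a sed replacement
# (letters, digits, dots, hyphens).
render_device_config() {
    sed "s/HOSTNAME/$2/g" "$1"
}
```

A possible use, assuming a host called elk:

    root# render_device_config /etc/mrtg/devices/HOSTNAME.inc elk > /etc/mrtg/devices/elk.inc
    root# echo "Include: devices/elk.inc" >> /etc/mrtg/mrtg.cfg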
    				

6. Install RRDtool

Add the perl USE flag to /etc/portage/package.use like this (note the >>, which appends instead of overwriting the file):

root# echo "net-analyzer/rrdtool perl" >> /etc/portage/package.use
		
Install RRDtool per the usual:
root# emerge -a rrdtool
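
If you script this setup, re-running the echo for the USE flag will add a duplicate line each time. Here is a minimal sketch of an idempotent variant. ensure_use_line is a hypothetical helper name, and it assumes /etc/portage/package.use is a plain file as in this howto.

```shell
#!/bin/bash
# ensure_use_line FILE LINE
# Append LINE to FILE unless an identical line is already present.
ensure_use_line() {
    file=$1
    line=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
}
```

For example: ensure_use_line /etc/portage/package.use "net-analyzer/rrdtool perl"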

7. Initialize the databases using RRDtool


root# rrdtool create /var/lib/mrtg/`hostname`.disks.usage.rrd \
--start `date +"%s"` \
DS:ds0:GAUGE:600:0:24691824 \
DS:ds1:GAUGE:600:0:14048404 \
--step 300 \
RRA:AVERAGE:0.5:1:800 \
RRA:AVERAGE:0.5:6:800 \
RRA:AVERAGE:0.5:24:800 \
RRA:AVERAGE:0.5:288:800 \
RRA:MIN:0.5:1:800 \
RRA:MIN:0.5:6:800 \
RRA:MIN:0.5:24:800 \
RRA:MIN:0.5:288:800 \
RRA:MAX:0.5:1:800 \
RRA:MAX:0.5:6:800 \
RRA:MAX:0.5:24:800 \
RRA:MAX:0.5:288:800

root# rrdtool create /var/lib/mrtg/`hostname`.ram.swap.rrd \
--start `date +"%s"` \
DS:ds0:GAUGE:600:0:1034728 \
DS:ds1:GAUGE:600:0:506036 \
--step 300 \
RRA:AVERAGE:0.5:1:800 \
RRA:AVERAGE:0.5:6:800 \
RRA:AVERAGE:0.5:24:800 \
RRA:AVERAGE:0.5:288:800 \
RRA:MIN:0.5:1:800 \
RRA:MIN:0.5:6:800 \
RRA:MIN:0.5:24:800 \
RRA:MIN:0.5:288:800 \
RRA:MAX:0.5:1:800 \
RRA:MAX:0.5:6:800 \
RRA:MAX:0.5:24:800 \
RRA:MAX:0.5:288:800

root# rrdtool create /var/lib/mrtg/`hostname`.freemem.rrd \
--start `date +"%s"` \
DS:ds0:GAUGE:600:0:1034728 \
DS:ds1:GAUGE:600:0:506036 \
--step 300 \
RRA:AVERAGE:0.5:1:800 \
RRA:AVERAGE:0.5:6:800 \
RRA:AVERAGE:0.5:24:800 \
RRA:AVERAGE:0.5:288:800 \
RRA:MIN:0.5:1:800 \
RRA:MIN:0.5:6:800 \
RRA:MIN:0.5:24:800 \
RRA:MIN:0.5:288:800 \
RRA:MAX:0.5:1:800 \
RRA:MAX:0.5:6:800 \
RRA:MAX:0.5:24:800 \
RRA:MAX:0.5:288:800

root# rrdtool create /var/lib/mrtg/`hostname`.traffic.rrd \
--start `date +"%s"` \
DS:ds0:COUNTER:600:0:50000 \
DS:ds1:COUNTER:600:0:25000 \
--step 300 \
RRA:AVERAGE:0.5:1:800 \
RRA:AVERAGE:0.5:6:800 \
RRA:AVERAGE:0.5:24:800 \
RRA:AVERAGE:0.5:288:800 \
RRA:MIN:0.5:1:800 \
RRA:MIN:0.5:6:800 \
RRA:MIN:0.5:24:800 \
RRA:MIN:0.5:288:800 \
RRA:MAX:0.5:1:800 \
RRA:MAX:0.5:6:800 \
RRA:MAX:0.5:24:800 \
RRA:MAX:0.5:288:800 
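
The numbers in these commands are easier to adjust once you know what they mean. A data source is written DS:name:type:heartbeat:min:max, and an archive is written RRA:function:xff:steps:rows. With --step 300, each row of an archive covers 300 x steps seconds, and the archive keeps that many rows before wrapping around. The sketch below works out how much history each of the four archive sizes used above can hold. The function name rra_retention_days is made up for this example.

```shell
#!/bin/bash
# rra_retention_days STEP STEPS_PER_ROW ROWS
# Each row covers STEP * STEPS_PER_ROW seconds and the archive keeps ROWS rows,
# so the total history is their product, printed here in days.
rra_retention_days() {
    awk -v step="$1" -v per_row="$2" -v rows="$3" \
        'BEGIN { printf "%.1f\n", step * per_row * rows / 86400 }'
}

# The four archive resolutions used in this howto (step 300, 800 rows each)
for per_row in 1 6 24 288; do
    echo "RRA:AVERAGE:0.5:${per_row}:800 keeps about $(rra_retention_days 300 "$per_row" 800) days"
done
```

This works out to roughly 2.8 days at 5 minute resolution, 16.7 days at 30 minutes, 66.7 days at 2 hours, and 800 days at 1 day, which lines up with the day, week, month and year graphs generated later.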
		

8. Populate the RRDtool databases

We are using MRTG to make the SNMP queries and write the results to the RRDtool databases. Off we go:
root# mrtg /etc/mrtg/mrtg.cfg
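
To confirm that MRTG actually wrote something, you can ask each database for the time of its last update with rrdtool last, which prints a UNIX timestamp. The following is a rough sketch under a few assumptions: rrd_is_fresh and report_stale are hypothetical helper names, the 600 second default simply mirrors the heartbeat used when the databases were created, and the RRDTOOL variable exists only so the command can be replaced for testing.

```shell
#!/bin/bash
# RRDTOOL can be overridden for testing; it defaults to the real binary.
RRDTOOL=${RRDTOOL:-rrdtool}

# rrd_is_fresh FILE [MAX_AGE]
# Succeeds if FILE was last updated no more than MAX_AGE seconds ago.
rrd_is_fresh() {
    file=$1
    max_age=${2:-600}
    last=$("$RRDTOOL" last "$file") || return 1
    now=$(date +%s)
    [ $(( now - last )) -le "$max_age" ]
}

# report_stale DIR
# Lists the .rrd files in DIR that are unreadable or have not been updated recently.
report_stale() {
    for f in "$1"/*.rrd; do
        [ -e "$f" ] || continue
        rrd_is_fresh "$f" || echo "stale or unreadable: $f"
    done
}
```

Running report_stale /var/lib/mrtg right after an MRTG pass should print nothing if every database was updated.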

9. Create bash scripts to generate graphs

Here are a few examples of RRDtool graphing scripts:
  1. CPU /etc/mrtg/graphs/HOSTNAME-cpu.bash:
    
    #!/bin/bash
    #Generates a CPU info graph
    ###########################
    
    HOST=elk
    
    # Map the requested period to a graph window in seconds.
    # Anything other than week, month or year falls back to one day.
    case "$1" in
       week)
          INTERVAL=604800;   INTERVALSTR="week";;
       month)
          INTERVAL=2678400;  INTERVALSTR="month";;
       year)
          INTERVAL=31622400; INTERVALSTR="year";;
       day|*)
          INTERVAL=86400;    INTERVALSTR="day";;
    esac
    
    echo "Generating cpu.percent.${HOST}.${INTERVALSTR}.png"
    echo "Using ${INTERVAL} interval"
    
    rrdtool graph /rf/blobs/rockfloat/rrdtool/cpu.percent.${HOST}.$INTERVALSTR.png \
    -s -$INTERVAL \
    -a PNG \
    -z \
    DEF:user=/var/lib/mrtg/${HOST}.usrsys.rrd:ds0:AVERAGE \
    DEF:system=/var/lib/mrtg/${HOST}.usrsys.rrd:ds1:AVERAGE \
    DEF:idle=/var/lib/mrtg/${HOST}.idlenice.rrd:ds0:AVERAGE \
    DEF:nice=/var/lib/mrtg/${HOST}.idlenice.rrd:ds1:AVERAGE \
    "CDEF:total=100,idle,-" \
    COMMENT:"                 Max        Avg     Current\n" \
    AREA:system#FF4000:"System  " \
    GPRINT:system:MAX:'%7.2lf %%' \
    GPRINT:system:AVERAGE:"%7.2lf %%" \
    GPRINT:system:LAST:"%7.2lf %%\n" \
    STACK:user#0080FF:"User    " \
    GPRINT:user:MAX:'%7.2lf %%' \
    GPRINT:user:AVERAGE:"%7.2lf %%" \
    GPRINT:user:LAST:"%7.2lf %%\n" \
    STACK:nice#00FFFF:"Nice    " \
    GPRINT:nice:MAX:'%7.2lf %%' \
    GPRINT:nice:AVERAGE:"%7.2lf %%" \
    GPRINT:nice:LAST:"%7.2lf %%\n" \
    LINE1:total#008080:"CPU     " \
    GPRINT:total:MAX:'%7.2lf %%' \
    GPRINT:total:AVERAGE:"%7.2lf %%" \
    GPRINT:total:LAST:"%7.2lf %%\n" \
    -v "%" -t "CPU usage - $INTERVALSTR" -l 0
    				
    root# chmod 755 /etc/mrtg/graphs/`hostname`-cpu.bash
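
Other graphs follow the same pattern as the CPU script. As a second, much smaller example, here is a sketch for the RAM vs swap database, where ds0 is memAvailReal and ds1 is memAvailSwap (both reported in kB). This is not one of my original scripts: the function name graph_ram_swap, the colors, and the choice to pass the output file as a parameter are all assumptions made for illustration. The RRDTOOL variable is only there so the command can be replaced for a dry run.

```shell
#!/bin/bash
# RRDTOOL can be overridden (for example RRDTOOL=echo) to print the command instead.
RRDTOOL=${RRDTOOL:-rrdtool}

# graph_ram_swap HOST SECONDS OUTFILE
# Draws free RAM and free swap for the last SECONDS seconds into OUTFILE.
graph_ram_swap() {
    host=$1
    interval=$2
    out=$3
    rrd=/var/lib/mrtg/${host}.ram.swap.rrd
    "$RRDTOOL" graph "$out" \
        -s "-$interval" \
        -a PNG \
        "DEF:ram=${rrd}:ds0:AVERAGE" \
        "DEF:swap=${rrd}:ds1:AVERAGE" \
        "LINE1:ram#0080FF:RAM free " \
        "GPRINT:ram:LAST:%10.0lf kB\n" \
        "LINE1:swap#FF4000:Swap free" \
        "GPRINT:swap:LAST:%10.0lf kB\n" \
        -v "kB" -t "Free memory - $host" -l 0
}
```

For example, graph_ram_swap elk 86400 /tmp/mem.elk.day.png would draw the last day. Point the output at whatever directory your web server reads from.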

10. Generate all the graphs

We'll use another bash script, /etc/mrtg/gen-graphs.bash, to loop through and run all of the RRDtool graphing scripts:

#!/bin/bash

echo "The graphs are likely running now too.. give them a sec to finish"
sleep 30

cd /etc/mrtg/graphs || exit 1
for file in *.bash; do
	# Double quotes so that ${file} is actually expanded
	echo "Running ${file}..."
	"./$file"
	echo ''
done
		
Actually generate the graphs:

root# chmod 755 /etc/mrtg/gen-graphs.bash
root# /etc/mrtg/gen-graphs.bash
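
Note that gen-graphs.bash calls each graphing script without an argument, so only the default (day) graphs are produced. If you also want the week, month and year graphs, one possible variation is sketched below. The function name run_all_graphs is hypothetical, and it assumes every script in the directory accepts the period as its first argument, as the CPU script does.

```shell
#!/bin/bash
# run_all_graphs DIR
# Runs every executable *.bash script in DIR once for each period.
run_all_graphs() {
    dir=$1
    for script in "$dir"/*.bash; do
        # Skip the literal pattern when nothing matches, and non-executable files
        [ -x "$script" ] || continue
        for period in day week month year; do
            "$script" "$period" || echo "failed: $script $period" >&2
        done
    done
}
```

Calling run_all_graphs /etc/mrtg/graphs would then take the place of the for loop in gen-graphs.bash.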
		

Changelog:
Date               Description
10/05/2005 @01:00  Initial creation

This document was originally created on 10/5/2005


Disclaimer:
This page is not endorsed by gentoo.org or any other cool cats. Any information provided in this document is to be used at your own risk.