Remote DNS response monitor, simple version
1. Create RRDFILE
rrdtool create /root/study/dnsquery.rrd -s 300 \
DS:a:GAUGE:600:-100:10000 \
DS:b:GAUGE:600:-100:10000 \
DS:c:GAUGE:600:-100:10000 \
DS:d:GAUGE:600:-100:10000 \
DS:ns:GAUGE:600:-100:10000 \
DS:f:GAUGE:600:-100:10000 \
DS:g:GAUGE:600:-100:10000 \
RRA:AVERAGE:0.5:1:14400 \
RRA:AVERAGE:0.5:6:4800 \
RRA:AVERAGE:0.5:24:1200 \
RRA:AVERAGE:0.5:288:600 \
RRA:MAX:0.5:1:14400 \
RRA:MAX:0.5:6:4800 \
RRA:MAX:0.5:24:1200 \
RRA:MAX:0.5:288:600
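Create an RRD file to hold the DNS response times; values range from
-100 to 10000 msec.
The RRAs keep 5-minute data for the last 50 days, 30-minute data for
100 days, 2-hour data for 100 days, and one-day data for the last 600
days, each stored both as an average (AVERAGE) and as a maximum (MAX).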
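
The retention figures above follow from step x steps-per-row x rows. As a quick check, here is a small sketch that recomputes them; the function name retention_days is only illustrative and is not part of rrdtool:

```shell
#!/bin/sh
# Retention of one RRA in days = step (sec) * steps per row * rows / 86400.
retention_days() {
    step=$1; per_row=$2; rows=$3
    echo $(( step * per_row * rows / 86400 ))
}

# The four "steps:rows" pairs from the rrdtool create command, step = 300 sec.
for rra in "1 14400" "6 4800" "24 1200" "288 600"; do
    set -- $rra
    echo "RRA $1:$2 keeps $(retention_days 300 "$1" "$2") days"
done
```

This prints 50, 100, 100 and 600 days, matching the description.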
2. DNS rrdtool update/graph
#!/bin/sh
RRD_PATH="/root/study/dnsquery.rrd"
image_path="/www/htdocs/snmp.enum.org.tw/html"
host="a.dns.tw b.dns.tw c.dns.tw d.dns.tw ns.twnic.net e.dns.tw f.dns.tw"
rrd_data=""
for dns in $host
do
msec=`/bin/dig @$dns . ns | grep 'Query time' | sed -e 's/.*: \(.*\) [a-z].*/\1/'`
if [ -z "$msec" ];then
msec=-100
echo "$dns did not respond, please check" | mail -s "$dns no response" your_email@domain
fi
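if [ "$msec" -ge 3000 ];then
# send the alert before overwriting msec, so the mail shows the real value
echo "$dns is responding slowly ($msec msec), please check" | mail -s "$dns slow response" your_email@domain
msec=-100
fi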
rrd_data="$rrd_data:$msec"
done
now=`date +%s`
echo $rrd_data
rrdtool update $RRD_PATH ${now}${rrd_data}
#--------------- RRD graph ----------------------------
# The CDEF section mainly paints white over the lower part of each AREA,
# so that every host gets its own band showing its response time (msec)
time="day week "
for t in $time
do
rrdtool graph /www/htdocs/snmp.enum.org.tw/dnsquery-$t.jpg \
-t "DNS Query Response Time (${t}ly)" \
-w 600 -h 250 -s `date -d "-1 $t" +%s` -v "msec" -X b \
--upper-limit=1000 \
DEF:a=$RRD_PATH:a:MAX \
DEF:b=$RRD_PATH:b:MAX \
DEF:c=$RRD_PATH:c:MAX \
DEF:d=$RRD_PATH:d:MAX \
DEF:ns=$RRD_PATH:ns:MAX \
DEF:f=$RRD_PATH:f:MAX \
DEF:g=$RRD_PATH:g:MAX \
CDEF:z0=-1,a,b,c,d,ns,f,g,+,+,+,+,+,+,7,/,* \
CDEF:a1=a,3000,+ \
CDEF:a11=3000,a,a,-,+ \
CDEF:b1=b,2500,+ \
CDEF:b11=2500,a,a,-,+ \
CDEF:c1=c,2000,+ \
CDEF:c11=2000,a,a,-,+ \
CDEF:d1=d,1500,+ \
CDEF:d11=1500,a,a,-,+ \
CDEF:ns1=ns,1000,+ \
CDEF:ns11=1000,a,a,-,+ \
CDEF:f1=f,500,+ \
CDEF:f11=500,a,a,-,+ \
CDEF:g1=g,0,+ \
AREA:z0#c0c0c0:"Average Response Time(msec)" \
COMMENT:"\n" \
AREA:a1#ff0000:"a.dns.tw" \
GPRINT:a:MAX:"%12.0lf" \
GPRINT:a:AVERAGE:"%12.0lf" \
GPRINT:a:MIN:"%12.0lf" \
GPRINT:a:LAST:"%12.0lf\n" \
AREA:a11#ffffff \
AREA:b1#800000:"b.dns.tw" \
GPRINT:b:MAX:"%12.0lf" \
GPRINT:b:AVERAGE:"%12.0lf" \
GPRINT:b:MIN:"%12.0lf" \
GPRINT:b:LAST:"%12.0lf\n" \
AREA:b11#ffffff \
AREA:c1#00ff00:"c.dns.tw" \
GPRINT:c:MAX:"%12.0lf" \
GPRINT:c:AVERAGE:"%12.0lf" \
GPRINT:c:MIN:"%12.0lf" \
GPRINT:c:LAST:"%12.0lf\n" \
AREA:c11#ffffff \
AREA:d1#008000:"d.dns.tw" \
GPRINT:d:MAX:"%12.0lf" \
GPRINT:d:AVERAGE:"%12.0lf" \
GPRINT:d:MIN:"%12.0lf" \
GPRINT:d:LAST:"%12.0lf\n" \
AREA:d11#ffffff \
AREA:ns1#0000ff:"ns.twnic.net" \
GPRINT:ns:MAX:"%8.0lf" \
GPRINT:ns:AVERAGE:"%12.0lf" \
GPRINT:ns:MIN:"%12.0lf" \
GPRINT:ns:LAST:"%12.0lf\n" \
AREA:ns11#ffffff \
AREA:f1#000080:"f.dns.tw" \
GPRINT:f:MAX:"%12.0lf" \
GPRINT:f:AVERAGE:"%12.0lf" \
GPRINT:f:MIN:"%12.0lf" \
GPRINT:f:LAST:"%12.0lf\n" \
AREA:f11#ffffff \
AREA:g1#ff8040:"g.dns.tw" \
GPRINT:g1:MAX:"%12.0lf" \
GPRINT:g1:AVERAGE:"%12.0lf" \
GPRINT:g1:MIN:"%12.0lf" \
GPRINT:g1:LAST:"%12.0lf\n" \
COMMENT:"note:<0 means no response\n"
done
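Result:
Most people may find this hard, but once you know rrdtool there is nothing
difficult about it. The same method can graph ping times for hosts, web
response times and so on; a single graph can show N values.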
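
To extend the banding trick to N values, the CDEF pairs can be generated instead of typed by hand. The following is only a sketch: the function name band_cdefs and the fixed 500-unit spacing are assumptions taken from the pattern in the script, and this variant uses each source's own name in the cover expression rather than always a:

```shell
#!/bin/sh
# Print two CDEF lines per data source:
#   <ds>1  = value + offset   (top of the coloured band)
#   <ds>11 = offset           (white cover drawn over the band's base)
# Offsets step down by 500 so the last source sits at 0.
band_cdefs() {
    n=$#
    i=0
    for ds in "$@"; do
        offset=$(( (n - 1 - i) * 500 ))
        echo "CDEF:${ds}1=${ds},${offset},+"
        echo "CDEF:${ds}11=${offset},${ds},${ds},-,+"
        i=$(( i + 1 ))
    done
}

band_cdefs a b c d ns f g
```

With the seven names above, the first source gets offset 3000 and the last gets 0, the same layout as the hand-written CDEF lines.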
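Conclusion:
If you copy the sample as it is, it will work as long as rrdtool is
installed, but remember that the number of hosts in rrdtool create must
match the number of hosts in the script.
Remember to put in your own hosts, not mine.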
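
One way to catch a mismatch before running rrdtool update is to compare the two counts. This is a minimal sketch; count_words and ds_names are hypothetical names, and ds_names must be kept in step with the DS lines of your own rrdtool create command:

```shell
#!/bin/sh
# Count the whitespace-separated words in a string.
count_words() {
    set -- $1
    echo $#
}

# DS names as listed in the rrdtool create command (maintained by hand).
ds_names="a b c d ns f g"
# Host list as used in the update script.
host="a.dns.tw b.dns.tw c.dns.tw d.dns.tw ns.twnic.net e.dns.tw f.dns.tw"

if [ "$(count_words "$ds_names")" -eq "$(count_words "$host")" ]; then
    echo "OK: $(count_words "$host") hosts and the same number of DS entries"
else
    echo "MISMATCH: check the DS lines against the host list" >&2
fi
```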
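rrdtool is harder to learn than mrtg but much more useful; if you do not
enjoy digging into things, I suggest you do not use it. You can apply the
same idea directly in mrtg.
mrtg can use its threshold check directly to test the range of a value; when it is triggered (e.g. =0 or >3000) it can call an alert script. rrdtool cannot do this, so you have to write it yourself.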
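
A hand-written check of this kind can be tried without a network or a mail setup. The sketch below reuses the 3000 msec limit and the -100 marker from the script; the function names parse_msec and store_value are hypothetical, and the sample line is made up rather than a real measurement:

```shell
#!/bin/sh
# Extract the number from a dig "Query time" line read on stdin.
parse_msec() {
    sed -n 's/.*Query time: \([0-9][0-9]*\) msec.*/\1/p'
}

# Map a raw measurement to the value stored in the RRD:
# empty (no answer) or >= 3000 msec becomes -100, anything else is kept.
store_value() {
    if [ -z "$1" ] || [ "$1" -ge 3000 ]; then
        echo -100
    else
        echo "$1"
    fi
}

sample=';; Query time: 42 msec'   # made-up sample line in dig's output format
store_value "$(echo "$sample" | parse_msec)"
store_value ""
store_value 3500
```

In a real script, the branch that returns -100 is where the alert mail would be sent.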