Notes Supervisor

See Notes PID1 containers

[program:plop]
directory=/home/plop/front-website/
command=/home/plop/front-website/front-website -bind=":8081"
autostart=true
autorestart=true
startsecs=10
stdout_logfile=/var/log/plop/stdout.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=10
stdout_capture_maxbytes=1MB
stderr_logfile=/var/log/plop/stderr.log
stderr_logfile_maxbytes=1MB
stderr_logfile_backups=10
stderr_capture_maxbytes=1MB
user = plop
[program:gogs]
directory=/home/git/gogs/
command=/home/git/gogs/gogs web
autostart=true
autorestart=true
startsecs=10
stdout_logfile=/var/log/gogs/stdout.log
stdout_logfile_maxbytes=1MB
stdout_logfile_backups=10
stdout_capture_maxbytes=1MB
stderr_logfile=/var/log/gogs/stderr.log
stderr_logfile_maxbytes=1MB
stderr_logfile_backups=10
stderr_capture_maxbytes=1MB
environment = HOME="/home/git", USER="git"
user = git
/usr/bin/supervisorctl -c /opt/etc/supervisord.conf

If the configuration changed

reread
update

Restarting the supervisor service

reload
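
The same actions can also be passed to supervisorctl on the command line instead of typing them at its prompt. A minimal sketch, assuming the configuration path used above; the function name `apply_supervisor_config` and the two override variables are hypothetical:

```shell
# Hypothetical helper: the variable and function names are illustrative.
SUPERVISORCTL=${SUPERVISORCTL:-/usr/bin/supervisorctl}
SUPERVISOR_CONF=${SUPERVISOR_CONF:-/opt/etc/supervisord.conf}

# Re-read the configuration files, then apply the changes
# (programs whose configuration changed are added, removed or restarted).
apply_supervisor_config() {
    "$SUPERVISORCTL" -c "$SUPERVISOR_CONF" reread &&
    "$SUPERVISORCTL" -c "$SUPERVISOR_CONF" update
}

# Usage:
#   apply_supervisor_config
```

`reread` followed by `update` only touches the programs whose configuration changed, whereas `reload` restarts supervisord itself.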

Docker: no log file

[supervisord]
nodaemon=true
logfile=/dev/null
logfile_maxbytes=0
2025/03/24 15:06

Notes on Monitoring

Don't say "monitoring", say "observability". It sounds better.

Tools

Examples:

  • Zabbix
  • Nagios
  • Check_MK / Shinken
  • Nagstamon
  • CachetHQ
  • Riemann.io (monitoring & alerting)

Probes

Apache2
apachectl status || lynx localhost/server-status
SSL/TLS certificate expiration

See:

Script:

openssl s_client -connect gnunet.org:443 </dev/null 2>/dev/null| openssl x509 -enddate -noout

Source: http://www.bortzmeyer.org/tester-expiration-certifs.html
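
The one-liner above prints a line such as `notAfter=Jan  2 00:00:00 2030 GMT`. A minimal sketch for turning that line into a number of days left, assuming GNU date (for `-d`); the function name `days_left` and its optional second argument are hypothetical:

```shell
# Print the number of whole days between "now" and the date found in a
# "notAfter=..." line (output of `openssl x509 -enddate -noout`).
# $1: the notAfter line; $2: optional "now" as an epoch (defaults to the current time).
days_left() {
    enddate=${1#notAfter=}
    now=${2:-$(date +%s)}
    end=$(date -d "$enddate" +%s) || return 2   # GNU date only
    echo $(( (end - now) / 86400 ))
}

# Usage:
#   line=$(openssl s_client -connect gnunet.org:443 </dev/null 2>/dev/null |
#          openssl x509 -enddate -noout)
#   days_left "$line"
```

A negative result means the certificate has already expired.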

check-tls.sh

#!/bin/bash
# Author: Kim Minh Kaplan, 2010.
 
set -e
 
# The $statedir/check-tls.status file should contain one line per
# server to be checked:
#
#    <server>:<port> [last-epoch [CAfile [openssl-extra-args]]]
#
# For example:
#
#    www.example.com:443
#    www.example.org:443 0 /etc/certs/my-own-ca-bundle.pem
#    www.example.net:25 0 /etc/certs/my-own-ca-bundle.pem -starttls smtp
#
# LIMITATIONS/BUGS:
#
# * Requires OpenSSL
#
# * Probably only works on a GNU system (bash, coreutils).
#
# * Only check the expiration date of the certificate. Not its purpose,
#   identity, revocation or any other validity parameters.
#
# * Only check the expiration date of the server certificate but *not*
#   the expiration date of intermediate or root certificate
#
# * Empty lines in $statedir/check-tls.status are *not* ignored and
#   induce an error message "no port defined".
 
OPENSSL="openssl"
 
# Alert when fewer than 90, 60, 30, 15, 7, 6, 5, 4, 3, 2, 1 days remain.
alert=(90 60 30 15 7 6 5 4 3 2 1)
 
statedir=/var/tmp/lib/monitor
test -d "$statedir" || install -d "$statedir"
mkdir "$statedir/check-tls.lock" || exit
trap "rmdir \"$statedir/check-tls.lock\"" 0
 
nowepoch=`date +%s`
>"$statedir/check-tls.$$"
while read host_desc prevepoch ca_file openssl_args
do
    if test -z "$prevepoch"
    then
	prevepoch=0
    fi
 
    # Find expiry epoch
    tmpf=/tmp/$host_desc-$$.log
    if $OPENSSL s_client -CAfile "${ca_file:-/etc/ssl/certs/ca-certificates.crt}" $openssl_args \
	-connect $host_desc </dev/null >"$tmpf" 2>&1
    then
	if grep -q '^ *Verify return code: 0 (ok)$' "$tmpf"
	then
	    true
	else
	    echo "======================================================================" >&2
	    echo "Error verifying $host_desc" >&2
	    cat "$tmpf" >&2
	    rm -f "$tmpf"
	    echo "$host_desc $prevepoch $ca_file $openssl_args" >>"$statedir/check-tls.$$"
	    continue
	fi
	enddate=`$OPENSSL x509 -in "$tmpf" -noout -enddate | cut -f 2- -d =`
	rm -f "$tmpf"
    else
	cat "$tmpf" >&2
	rm -f "$tmpf"
	echo "$host_desc $prevepoch $ca_file $openssl_args" >>"$statedir/check-tls.$$"
	continue
    fi
    endepoch=`date -d "$enddate" +%s`
 
    if test $endepoch -le $nowepoch
    then
	echo "Alert: expired $host_desc" >&2
	prevepoch=$nowepoch
    else
	# Find the largest not yet triggered alert: it is the maximum that is still below prevspan
	prevspan=`expr \( $endepoch - $prevepoch \) / 60 / 60 / 24`
	nextalert=none
	for j in ${alert[@]}
	do
	    if test $j -lt $prevspan
	    then
		if test $nextalert = none
		then
		    nextalert=$j
		elif test $j -gt $nextalert
		then
		    nextalert=$j
		fi
	    fi
	done
	if test $nextalert = none
	then
	    echo "$host_desc $prevepoch $ca_file $openssl_args" >>"$statedir/check-tls.$$"
	    continue
	fi
 
	# Alert if necessary
	spanepoch=`expr $nextalert \* 60 \* 60 \* 24`
	if test `expr $endepoch - $nowepoch` -lt $spanepoch
	then
	    expire=`date -I -d @$endepoch`
	    echo "Alert, $host_desc expires $expire (less than $nextalert days)" >&2
	    prevepoch=$nowepoch
	fi
    fi
    echo "$host_desc $prevepoch $ca_file $openssl_args" >>"$statedir/check-tls.$$"
done <"$statedir/check-tls.status"
mv "$statedir/check-tls.$$" "$statedir/check-tls.status"
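
The threshold selection in the middle of check-tls.sh is the least obvious part. A minimal sketch of that step alone, using the same list of thresholds; the function name `next_alert` is hypothetical and the argument corresponds to `prevspan` in the script (days between the last alert and the expiry date):

```shell
# Print the largest alert threshold (in days) that is still strictly below
# the given span, or "none" if every threshold has already been triggered.
next_alert() {
    prevspan=$1
    nextalert=none
    for j in 90 60 30 15 7 6 5 4 3 2 1; do
        if [ "$j" -lt "$prevspan" ]; then
            if [ "$nextalert" = none ] || [ "$j" -gt "$nextalert" ]; then
                nextalert=$j
            fi
        fi
    done
    echo "$nextalert"
}

# Usage:
#   next_alert 45    # last alert was sent 45 days before expiry
```

The script then sends an alert only once the time left before expiry drops below that threshold, which is why each threshold fires at most once.
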
Generic check probe (to do)

See:

Sensitive files:

  • /etc/passwd
  • /etc/shadow

Read-write partition: touch /.check

Date / time: ntpdate?

Updates

Service down

Alert before domain names expire

/etc/passwd: accounts with uid 0

dmesg

LDAP accounts
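
Two of the ideas above (accounts with uid 0 and the read-write partition test) are easy to sketch. This is only a starting point under stated assumptions: the function names `uid0_accounts` and `check_rw` are hypothetical, and the `.check` marker name comes from the note above.

```shell
# List every account whose uid is 0 (normally only root should show up).
# $1: passwd-format file, defaults to /etc/passwd.
uid0_accounts() {
    awk -F: '$3 == 0 { print $1 }' "${1:-/etc/passwd}"
}

# Succeed if a marker file can be created in the given directory,
# i.e. the filesystem holding it is mounted read-write and writable.
# $1: directory to test, defaults to /.
check_rw() {
    marker="${1:-/}/.check"
    touch "$marker" 2>/dev/null && rm -f "$marker"
}

# Usage:
#   uid0_accounts
#   check_rw / || echo "CRITICAL - / is not writable"
```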

2025/03/24 15:06

Changing the root password from a script on RedHat / CentOS

echo 'root:P@ssw0rd' |chpasswd
#echo 'user:P@ssw0rd' | chpasswd -c SHA512
echo "password" | passwd hacluster --stdin

Warning: this is not secure. The password ends up in clear text in the script or in the shell history.

Other approaches

read -s PASS
 
# Or
set +o history
export PASS=P@ssw0rd
set -o history
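
Another way to keep the password out of the script is to read it from standard input and hand it straight to chpasswd. A minimal sketch; the function name `set_password` and the `CHPASSWD` override variable are hypothetical, and it must run as root to have any effect:

```shell
# Read one line (the password) from stdin and feed "user:password" to chpasswd.
# printf is a shell builtin, so the password is not passed as a command argument.
# $1: account name.
set_password() {
    user=$1
    IFS= read -r pass || [ -n "$pass" ] || return 1
    printf '%s:%s\n' "$user" "$pass" | ${CHPASSWD:-chpasswd}
}

# Usage (the password is typed at the prompt, not stored anywhere):
#   stty -echo; set_password root; stty echo
```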
2025/03/24 15:06

Notes on Nagios monitoring

Administration

Clearing the history of data reported by Nagios checks
/etc/init.d/nagios stop
rm /usr/local/nagios/var/retention.dat
rm /usr/local/nagios/var/objects.cache
/etc/init.d/nagios start

Instead of deleting these files every time before starting Nagios, you can change:

nagios.cfg

#retain_state_information=1
retain_state_information=0

Configuration

Configuration example

Example with check_snmp_mem_cpu.sh

/usr/local/nagios/etc/objects/servers.cfg

define service {
        service_description     Memory
        hostgroup_name          WEB_APP1
        check_command           check_snmp_mem_cpu!mem!80!90
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        check_period            24x7
        notification_interval   2000
        notification_period     24x7
        notification_options    w,c,r
        contact_groups          support
        #event_handler           trigger_memory
        }

/usr/local/nagios/etc/objects/commands.cfg

define command {
        command_name    check_snmp_mem_cpu
        command_line    $USER1$/check_snmp_mem.sh -H $HOSTADDRESS$ -t $ARG1$ -w $ARG2$ -c $ARG3$
        }
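
In `check_snmp_mem_cpu!mem!80!90`, the three values after the command name become `$ARG1$`, `$ARG2$` and `$ARG3$`, so the plugin receives `-t mem -w 80 -c 90`. The plugin itself is not shown here; the following is only a sketch of how such a plugin typically maps a value to the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL). The function name `check_threshold` and the "greater or equal" comparison are assumptions:

```shell
# Compare a usage percentage to warning/critical thresholds and
# report the result the way Nagios expects: one line of text plus an exit code.
# $1: measured value, $2: warning threshold, $3: critical threshold (integers).
check_threshold() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - usage ${value}%"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - usage ${value}%"
        return 1
    fi
    echo "OK - usage ${value}%"
    return 0
}

# Usage:
#   check_threshold 85 80 90
```
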
Monitoring services with no real host attached

See:

See also:

A service must be attached to a host in order to be used.

In some cases you have to create a phantom host to carry the service.

Dummy

commands.cfg

# 'check_dummy' command definition
# NOTE: This command always returns an 'OK' result no matter what.
define command {
        command_name    check_dummy
        command_line    $USER1$/check_dummy 0
}

remotes.cfg

define host {
        host_name	    generic
        use                 generic-host
	check_command	    check_dummy!0     ; Always returns OK
        max_check_attempts  1
        contact_groups      admins
}
 
define service {
        service_description plop
        use generic-service
	host_name generic
	check_command check_plop!80
}
Example host / hostgroup / service configuration
define host {
    use         physical-host
    host_name   busy-host.example.com
    alias       busy-host.example.com
    address     10.43.16.1
    hostgroups  linux,centos,ldap,http,busy
}
 
define host {
    use           physical-host
    host_name     normal-host.example.com
    alias         normal-host.example.com
    address       10.43.1.1
    hostgroups    linux,centos,dns,proxy,ldap,hp,http,puppetmaster
}
 
define service {
    use                   generic-service
    hostgroup_name        linux,!busy
    service_description   Load
    check_command         check_snmp_load
}
 
define service {
    use                   generic-service
    hostgroup_name        busy
    service_description   Load
    check_command         check_snmp_load_busy
}
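
With these definitions, normal-host gets the `check_snmp_load` service and busy-host gets `check_snmp_load_busy`. The sketch below is a simplified model of how a `hostgroup_name` list with a `!` exclusion can be read, not Nagios's actual implementation; the function names `in_list` and `service_applies` are hypothetical:

```shell
# Succeed if item $1 appears in the comma-separated list $2.
in_list() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Succeed if a service declared with hostgroup_name $1 (e.g. "linux,!busy")
# applies to a host whose hostgroups are $2 (comma-separated).
service_applies() {
    expr_list=$1 host_groups=$2
    matched=1
    old_ifs=$IFS
    IFS=,
    for g in $expr_list; do
        case $g in
            \!*)  # excluded group: the host must not belong to it
                if in_list "${g#?}" "$host_groups"; then
                    IFS=$old_ifs
                    return 1
                fi ;;
            *)    # included group: belonging to one of them is enough
                if in_list "$g" "$host_groups"; then
                    matched=0
                fi ;;
        esac
    done
    IFS=$old_ifs
    return $matched
}

# Usage:
#   service_applies 'linux,!busy' 'linux,centos,ldap,http,busy' || echo "not applied"
```
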
Host configuration
Service configuration

etc/objects/servers.cfg

define service {
    use generic-service
    hostgroup_name linux-remotes-servers
    service_description  Total Processes
    max_check_attempts 3         ; Re-check the service up to 3 times in order to determine its final (hard) state
    retry_check_interval 1       ; Re-check the service every minute until a hard state can be determined
    check_command check_snmp_host!procs!400!900
    flap_detection_enabled 0
}

Exclusion

define service {
        service_description     CPU Stats
        servicegroups   sysres
        use             generic
        hostgroup_name  linux
        host_name       !server1
        check_command   check_iostat
}
2025/03/24 15:06

Notes on Munin monitoring

Install Munin

Notes

Munin:

  • Connects to Munin-node on TCP port 4949
  • Generates PNG graphs and HTML pages in /var/cache/munin/www/

Munin-node:

  • Monitoring agent
  • Listens on TCP port 4949

munin-node-c / munin-plugins-c:

  • C implementation of Munin-node and of the plugins
  • Fewer features
  • Lighter and faster to run
  • Uses inetd (no daemon)
$ nc localhost 4949
# munin node at vcigne-1
help
# Unknown command. Try cap, list, nodes, config, fetch, version or quit
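
The node protocol is plain text, so it is easy to script. After a `fetch <plugin>` command the node answers with `<field>.value <number>` lines followed by a line containing a single dot. A minimal sketch that extracts the values from such an answer; the function name `munin_values` is hypothetical:

```shell
# Read a munin-node answer on stdin and print "field value" pairs.
# Banner/comment lines (starting with #), the final "." and empty lines are skipped.
munin_values() {
    while IFS= read -r line; do
        case $line in
            '#'*|.|'') continue ;;
            *.value\ *)
                field=${line%%.value *}
                echo "$field ${line#*.value }" ;;
        esac
    done
}

# Usage:
#   printf 'fetch load\nquit\n' | nc localhost 4949 | munin_values
```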

FIXME

On the zzzzzzzz

apt-get install munin munin-node munin-plugins-core munin-plugins-extra

/etc/munin/munin.conf

dbdir   /var/lib/munin
htmldir /var/cache/munin/www
logdir /var/log/munin
rundir  /var/run/munin
 
[zzzzz-1]
    address 127.0.0.1
    use_node_name yes
 
[zzzzz-1-01]
    address 10.0.1.1
    use_node_name yes
 
[zzzzz-1-02]
    address 10.0.1.3
    use_node_name yes

Munin-node (the monitoring agent) does not start because the HOSTNAME contains underscores
Solution

/etc/munin/munin-node.conf

#host_name localhost.localdomain
host_name vcigne-1

FIXME

On the zzzzzzzzzz

apt-get install munin-node-c munin-plugins-c
# /usr/lib/munin-c/plugins/munin-plugins-c listplugins
cpu
entropy
forks
fw_packets
interrupts
load
open_files
open_inodes
swap
threads
uptime
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/cpu
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/entropy
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/forks
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/fw_packets
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/interrupts
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/load
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/open_files
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/open_inodes
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/swap
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/threads
ln -s /usr/lib/munin-c/plugins/munin-plugins-c /etc/munin/plugins/uptime
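
Since every plugin is a symlink to the same binary, the list above can be generated from the `listplugins` output instead of being typed by hand. A minimal sketch; the function name `link_plugins` is hypothetical, and it uses `ln -sf` (unlike the commands above) so that it can be run again without failing on existing links:

```shell
# Read plugin names on stdin (one per line) and create, for each of them,
# a symlink named after the plugin that points to the multi-call binary.
# $1: path of the binary, $2: directory where the links are created.
link_plugins() {
    binary=$1 plugin_dir=$2
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        ln -sf "$binary" "$plugin_dir/$name"
    done
}

# Usage:
#   bin=/usr/lib/munin-c/plugins/munin-plugins-c
#   "$bin" listplugins | link_plugins "$bin" /etc/munin/plugins
```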

When installing on Debian, the following line is added automatically

/etc/inetd.conf

#:OTHER: Other services
4949 stream tcp nowait nobody /usr/sbin/munin-node-c /usr/sbin/munin-node-c
2025/03/24 15:06