{{tag>Brouillon Cluster Grid Ressource}}
= Notes on the cluster / grid batch scheduler Slurm
== Slurm
Links:
* http://cascisdi.inra.fr/sites/cascisdi.inra.fr/files/slurm_0.txt
* https://wiki.fysik.dtu.dk/niflheim/SLURM
* https://www.glue.umd.edu/hpcc/help/slurm-vs-moab.html
* https://www.crc.rice.edu/wp-content/uploads/2014/08/Torque-to-SLURM-cheatsheet.pdf
* http://slurm.schedmd.com/rosetta.pdf
* http://www.accre.vanderbilt.edu/wp-content/uploads/2012/04/Slurm.pdf
* https://github.com/accre/SLURM
* http://slurm.schedmd.com/quickstart.html
* http://slurm.schedmd.com/slurm_ug_2011/Basic_Configuration_Usage.pdf
* https://www.unila.edu.br/sites/default/files/files/user_guide_slurm.pdf
* https://computing.llnl.gov/tutorials/slurm/slurm.pdf
* https://computing.llnl.gov/tutorials/bgq/
* https://computing.llnl.gov/linux/slurm/quickstart.html
* https://computing.llnl.gov/linux/slurm/faq.html
* https://rc.fas.harvard.edu/resources/running-jobs/
* http://bap-alap.blogspot.fr/2012_09_01_archive.html
* https://fortylines.com/blog/startingWithSLURM.blog.html
* https://github.com/ciemat-tic/codec/wiki/Slurm-cluster
* http://manx.classiccmp.org/mirror/techpubs.sgi.com/library/manuals/5000/007-5814-001/pdf/007-5814-001.pdf
* http://wildflower.diablonet.net/~scaron/slurmsetup.html
* http://wiki.sc3.uis.edu.co/index.php/Slurm_Installation
* http://eniac.cyi.ac.cy/display/UserDoc/Copy+of+Slurm+notes
* http://www.ibm.com/developerworks/library/l-slurm-utility/index.html
* https://www.lrz.de/services/compute/linux-cluster/batch_parallel/
* http://www.gmpcs.lumat.u-psud.fr/spip.php?rubrique35
* https://services-numeriques.unistra.fr/hpc/applications-disponibles/systeme-de-files-dattente-slurm.html
* http://www.brightcomputing.com/Blog/bid/174099/Slurm-101-Basic-Slurm-Usage-for-Linux-Clusters
* https://dashboard.hpc.unimelb.edu.au/started/
API:
* http://slurm.schedmd.com/slurm_ug_2012/pyslurm.pdf
See also:
* https://aws.amazon.com/fr/batch/use-cases
== To do
MPI with Slurm (a batch-script sketch follows this list of links):
* http://slurm.schedmd.com/mpi_guide.html
* openmpi
* https://www.hpc2n.umu.se/batchsystem/slurm_info
* hwloc-nox (Portable Linux Processor Affinity (PLPA))
* https://computing.llnl.gov/linux/slurm/mpi_guide.html
* https://computing.llnl.gov/tutorials/openMP/ProcessThreadAffinity.pdf
* https://www.open-mpi.org/faq/?category=slurm
* http://stackoverflow.com/questions/31848608/slurms-srun-slower-than-mpirun
* https://www.rc.colorado.edu/support/examples-and-tutorials/parallel-mpi-jobs.html
* http://www.brightcomputing.com/Blog/bid/149455/How-to-run-an-OpenMPI-job-in-Bright-Cluster-Manager-through-Slurm
* http://www.hpc2n.umu.se/node/875
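A minimal sketch of an MPI batch script, assuming OpenMPI built with Slurm support, the ''debug'' partition from the slurm.conf below, and a hypothetical ''mpi_hello'' binary:
#!/bin/bash
#SBATCH --job-name=mpi-test
#SBATCH --partition=debug
#SBATCH --ntasks=2
#SBATCH --output=mpi-test-%j.out
# with MpiDefault=openmpi in slurm.conf, srun launches the MPI ranks directly
srun ./mpi_hello
Submit it with ''sbatch mpi-test.sh''.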
== Install
Since Slurm uses **munge** by default to tie the machines' accounts together, **all machines must have their clocks synchronized**.
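One way to keep the clocks in sync on Debian (assuming the ''ntp'' and ''ntpdate'' packages; any NTP client will do):
apt-get install -y ntp ntpdate
# one-shot resync if a clock has already drifted (see also the Problems section below)
ntpdate -u pool.ntp.org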
Manager:
apt-get install slurm-wlm
Nodes:
apt-get install -y slurmd slurm-wlm-basic-plugins
Manager and nodes:
systemctl enable munge.service
zcat /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz > /etc/slurm-llnl/slurm.conf
Adapt slurm.conf; it can be generated from:
* /usr/share/doc/slurmctld/slurm-wlm-configurator.easy.html
* /usr/share/doc/slurmctld/slurm-wlm-configurator.html
* https://computing.llnl.gov/linux/slurm/configurator.html
Copy the same configuration file to the nodes (the manager and the nodes must have an identical file):
scp -3 vmdeb1:/etc/slurm-llnl/slurm.conf vmdeb2:/etc/slurm-llnl/slurm.conf
scp -3 vmdeb1:/etc/munge/munge.key vmdeb2:/etc/munge/munge.key
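Quick check that the shared munge key (and the clocks) are accepted from one host to another, assuming SSH access between the machines:
munge -n | ssh vmdeb2 unmunge
# the remote side must report STATUS: Success (0)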
List the daemons expected on the current host:
scontrol show daemons
On the master (ControlMachine): slurmctld slurmd \\
On the nodes: slurmd
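To start them and have them come back at boot, a sketch assuming the systemd units shipped by the Debian packages (''slurmctld.service'', ''slurmd.service''):
# on the manager (ControlMachine)
systemctl enable slurmctld.service
systemctl restart slurmctld.service
# on every compute node
systemctl enable slurmd.service
systemctl restart slurmd.service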
''/etc/slurm-llnl/slurm.conf''
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=vmdeb1
#ControlAddr=127.0.0.1
#
#MailProg=/bin/mail
#MpiDefault=none
MpiDefault=openmpi
MpiParams=ports=12000-12999
#MpiParams=ports=#-#
#ProctrackType=proctrack/pgid
ProctrackType=proctrack/linuxproc
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/lib/slurm-llnl/slurmd
SlurmUser=slurm
#SlurmdUser=root
#UsePAM=1
DisableRootJobs=YES
EnforcePartLimits=YES
JobRequeue=0
ReturnToService=1
#TopologyPlugin=topology/tree
# Must be writable by user SlurmUser. The file must be accessible by the primary and backup control machines.
# On NFS share !? See http://manx.classiccmp.org/mirror/techpubs.sgi.com/library/manuals/5000/007-5814-001/pdf/007-5814-001.pdf
StateSaveLocation=/var/lib/slurm-llnl/slurmctld
SwitchType=switch/none
#TaskPlugin=task/none
#TaskPlugin=task/cgroup
TaskPlugin=task/affinity
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
Waittime=0
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
#SchedulerPort=7321
SelectType=select/linear
#
#
# LOGGING AND ACCOUNTING
ClusterName=cluster1
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=3
SlurmctldLogFile=/var/log/slurm-llnl/slurmctld.log
#SlurmdDebug=3
SlurmdLogFile=/var/log/slurm-llnl/slurmd.log
SlurmSchedLogFile=/var/log/slurm-llnl/slurmSched.log
#JobCompType=jobcomp/filetxt
#JobCompType=jobcomp/mysql
JobCompType=jobcomp/none
JobCompLoc=/var/log/slurm-llnl/jobcomp
#JobCheckpointDir=/var/lib/slurm-llnl/checkpoint
#AccountingStorageType=jobacct_gather/linux
#AccountingStorageType=accounting_storage/filetxt
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
DefaultStorageType=accounting_storage/slurmdbd
#AccountingStorageLoc=/var/log/slurm-llnl/accounting
AccountingStoragePort=6819
AccountingStorageEnforce=associations
#
#
NodeName=vmdeb1
# COMPUTE NODES
NodeName=DEFAULT
PartitionName=DEFAULT MaxTime=INFINITE State=UP
NodeName=vmdeb2 CPUs=1 RealMemory=494 State=UNKNOWN
NodeName=vmdeb3 CPUs=2 RealMemory=494 TmpDisk=8000 State=UNKNOWN
PartitionName=debug Nodes=vmdeb[2-3] Default=YES MaxTime=INFINITE Shared=YES State=UP
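Once the file has been pushed to every machine and the daemons are running, a quick check that the nodes are visible (''scontrol reconfigure'' reloads slurm.conf without restarting the daemons):
scontrol reconfigure
sinfo -Nl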
=== Installing slurmdbd
MySQL is recommended (not all features are available with PostgreSQL, unfortunately).
Here we assume that you already have a MySQL database with an account and the necessary privileges created.
apt-get install slurmdbd
zcat /usr/share/doc/slurmdbd/examples/slurmdbd.conf.simple.gz > /etc/slurm-llnl/slurmdbd.conf
Adapt the slurmdbd.conf file.
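A minimal sketch of what the adapted file might contain, assuming MySQL on localhost with a ''slurm_acct_db'' database and a ''slurm'' MySQL account (names and password are placeholders to adapt):
AuthType=auth/munge
DbdHost=vmdeb1
DbdPort=6819
SlurmUser=slurm
LogFile=/var/log/slurm-llnl/slurmdbd.log
PidFile=/var/run/slurm-llnl/slurmdbd.pid
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=CHANGEME
StorageLoc=slurm_acct_db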
Then:
service slurmdbd restart
Test it:
sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
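With AccountingStorageEnforce=associations (see slurm.conf above), jobs are rejected until the cluster, an account and the users exist in the accounting database; a sketch with illustrative account names:
sacctmgr add cluster cluster1
sacctmgr add account lab Description="test account" Organization=lab
sacctmgr add user test Account=lab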
== Problems
munge -n | ssh vmdeb1 unmunge
STATUS: Expired credential (15)
Solution:
ntpdate -u pool.ntp.org
sudo -u slurm -- /usr/sbin/slurmctld -Dcvvvv
/usr/sbin/slurmd -Dcvvvv
-c : Clear: clears the previous state, purges the jobs...
-D : Daemon: run in the foreground (do not detach); logs go to STDOUT
-v : Verbose: verbose mode; add more "v"s to be even more verbose
slurmd -C
Prints the configuration of the current host
Help
The **man** pages
and
command --help
command --usage
Variables:
SQUEUE_STATES=all makes the squeue command display jobs in any state (including jobs in COMPLETED and CANCELLED state).
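For example (equivalent to the ''squeue -t all'' shown further down):
export SQUEUE_STATES=all
squeue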
Commands:
sbatch
salloc
srun
sattach
srun -l --ntasks-per-core=1 --exclusive -n 2 hostname
sinfo --Node
scontrol show partition
scancel --user=test --state=pending
scontrol show config
scontrol show job
scancel -i --user=test
# The Slurm -d singleton argument tells Slurm not to dispatch this job until all previous jobs with the same name have completed.
sbatch -d singleton simple.sh
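# "simple.sh" is not shown in these notes; a minimal sketch of such a script (the #SBATCH values are only illustrative)
cat > simple.sh << 'EOF'
#!/bin/bash
#SBATCH --job-name=simple
#SBATCH --ntasks=1
#SBATCH --output=simple-%j.out
srun hostname
EOF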
scontrol ping
sinfo -R
# Also show finished jobs
squeue -t all
#A/I/O/T = "active(in use)/idle/other/total"
sinfo -l
#
sinfo -Nle -o '%n %C %t'
=== Tips
==== Run an **srun** command without waiting
Normally:
$ srun -N2 -l hostname
srun: job 219 queued and waiting for resources
Solution (as root or as the "SlurmUser" account):
# sinfo --noheader -o %N
vmdeb[2-3]
# srun -N2 --no-allocate -w vmdeb[2-3] hostname
-------
Cancel / terminate a job stuck in the "CG" (completing) state:
scontrol update nodename=node4-blender state=down reason=hung
scontrol update nodename=node4-blender state=idle
You will also have to kill the 'slurmstepd' process left over on the nodes.
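For example, on the node holding the stuck job (assuming no other job step worth keeping runs there):
pkill -9 slurmstepd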
Possible network flow problem: Node => Manager on TCP 6817.
Problem:
"srun: error: Application launch failed: User not found on host"
Solution:
The same user must have the same UID on the nodes and on the manager; apparently this is related to **munge**.
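A quick check, assuming a user named ''test'' and SSH access to a node:
# the two UIDs must match
id -u test
ssh vmdeb2 id -u test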
It may be worth using LDAP.