{{tag>Brouillon Réseau Linux Kernel TCP CA}}
= Linux networking: the TCP/IP stack
See also:
* MPTCP, SCTP, DCCP
See:
* https://docs.kernel.org/networking/ip-sysctl.html
* https://www.inetdoc.net/guides/lartc/lartc.kernel.obscure.html
* https://man7.org/linux/man-pages/man7/tcp.7.html
* https://frsag.frsag.narkive.com/IYrhTt32/incidence-des-tcp-timestamps-sur-les-connexions
* hping2
man 7 tcp
== Conntrack
See:
* /proc/net/nf_conntrack
* /proc/sys/net/nf_conntrack_max
apt-get install conntrack
Flush the connection-tracking table:
conntrack -F
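As a quick health check, the table fill level can be computed from the entry count and the limit. A minimal sketch (assumption: the modern ''/proc/sys/net/netfilter/'' paths, which supersede the ''/proc/sys/net/nf_conntrack_max'' path above; the ''conntrack_usage'' name is mine):

```shell
# Conntrack table fill ratio, given current and max entry counts.
# With no arguments, the counts are read from /proc (root not required).
conntrack_usage() {
  local count=${1:-$(cat /proc/sys/net/netfilter/nf_conntrack_count)}
  local max=${2:-$(cat /proc/sys/net/netfilter/nf_conntrack_max)}
  echo "conntrack: $count / $max entries ($(( count * 100 / max ))%)"
}
```

When the table fills up, the kernel drops new connections and logs "nf_conntrack: table full, dropping packet", so watching this ratio is worthwhile on busy gateways.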
=== /proc/sys/net/ipv4/tcp_syn_retries
$ sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 6
With the default of 6 retries and an initial 1 s timeout that doubles after each unanswered SYN, this effectively takes 1+2+4+8+16+32+64 = 127 s before the connection attempt finally aborts.
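The arithmetic above generalises: with a 1 s initial timeout doubling on each retry, n retries give 2^(n+1) - 1 seconds before abort. A small sketch of that formula (the ''syn_timeout'' name is mine):

```shell
# Total time (seconds) before connect() gives up for a given
# net.ipv4.tcp_syn_retries value, assuming the default 1 s initial
# timeout that doubles after every unanswered SYN.
syn_timeout() {
  echo $(( (1 << ($1 + 1)) - 1 ))   # 1 + 2 + 4 + ... + 2^n = 2^(n+1) - 1
}

syn_timeout 6   # default: 127 s
syn_timeout 3   # 15 s
```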
=== /proc/sys/net/ipv4/tcp_synack_retries
=== /proc/sys/net/ipv4/tcp_retries2
See:
* https://stackoverflow.com/questions/5227520/how-many-times-will-tcp-retransmit
* https://pracucci.com/linux-tcp-rto-min-max-and-tcp-retries2.html
See also:
* /proc/sys/net/ipv4/tcp_retries1
* /proc/sys/net/ipv4/tcp_syn_retries
* /proc/sys/net/ipv4/tcp_synack_retries
==== Cluster
In a High Availability (HA) situation, consider decreasing the setting to 3.
RFC 1122 recommends at least 100 seconds for the timeout, which corresponds to a value of at least 8.
Oracle suggests a value of 3 for a RAC configuration.
Source : https://access.redhat.com/solutions/726753
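A sketch of making the HA value persistent, assuming a distribution that reads ''/etc/sysctl.d/'' at boot (the file name is arbitrary):

```shell
# Persist tcp_retries2=3 for an HA cluster node (run as root).
echo 'net.ipv4.tcp_retries2 = 3' > /etc/sysctl.d/90-ha-tcp.conf
sysctl -p /etc/sysctl.d/90-ha-tcp.conf

# Verify the running value.
sysctl net.ipv4.tcp_retries2
```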
==== Number of retransmissions vs. time
An experiment confirms that (on a recent Linux at least) the timeout is more like 13s with the suggested net.ipv4.tcp_retries2=5
“Windows defaults to just 5 retransmissions which corresponds with a timeout of around 6 seconds.”
tcp_retries2=5 means the timeout covers the first transmission plus 5 retransmissions: 12.6 s = (2^6 - 1) × 0.2.
tcp_retries2=15: 924.6 s = (2^10 - 1) × 0.2 + (16 - 10) × 120.
Source : https://github.com/elastic/elasticsearch/issues/102788
Voir aussi : https://www.elastic.co/guide/en/elasticsearch/reference/current/system-config-tcpretries.html#_related_configuration
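The two computations above can be reproduced for any tcp_retries2 value. A sketch assuming the stock 200 ms minimum RTO and the 120 s TCP_RTO_MAX cap (the ''retries2_timeout_ms'' name is mine):

```shell
# Total retransmission timeout (milliseconds) for a given tcp_retries2:
# the RTO starts at 200 ms, doubles after every retransmission, and is
# capped at 120 s (TCP_RTO_MAX). Integer math throughout.
retries2_timeout_ms() {
  local retries=$1 rto=200 total=0 i
  for (( i = 0; i <= retries; i++ )); do
    total=$(( total + rto ))
    rto=$(( rto * 2 ))
    [ "$rto" -gt 120000 ] && rto=120000
  done
  echo "$total"
}

retries2_timeout_ms 5    # 12600 ms = 12.6 s
retries2_timeout_ms 15   # 924600 ms = 924.6 s
```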
=== F-RTO (net.ipv4.tcp_frto)
https://access.redhat.com/solutions/4978771
== TCP keepalive
See:
* https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die
* Python Scripts https://github.com/cloudflare/cloudflare-blog/tree/master/2019-09-tcp-keepalives
tcp_keepalive_time
https://www.veritas.com/support/en_US/article.100028680
=== Configuring TCP/IP keepalive parameters for high availability clients (JDBC)
* tcp_keepalive_probes - the number of probes that are sent and unacknowledged before the client considers the connection broken and notifies the application layer
* tcp_keepalive_time - the interval between the last data packet sent and the first keepalive probe
* tcp_keepalive_intvl - the interval between subsequent keepalive probes
* tcp_retries2 - the maximum number of times a packet is retransmitted before giving up
echo "6" > /proc/sys/net/ipv4/tcp_keepalive_time
echo "1" > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo "10" > /proc/sys/net/ipv4/tcp_keepalive_probes
echo "3" > /proc/sys/net/ipv4/tcp_retries2
Source : https://www.ibm.com/docs/en/db2/9.7?topic=ctkp-configuring-operating-system-tcpip-keepalive-parameters-high-availability-clients
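With the IBM values above, a dead peer is detected after at most tcp_keepalive_time + tcp_keepalive_probes × tcp_keepalive_intvl seconds. A quick sketch of that arithmetic (the ''keepalive_detection_s'' name is mine):

```shell
# Worst-case dead-peer detection time (seconds) for a set of keepalive
# parameters: idle time before the first probe, plus one interval per probe.
keepalive_detection_s() {
  local time=$1 intvl=$2 probes=$3
  echo $(( time + probes * intvl ))
}

keepalive_detection_s 6 1 10      # IBM values above: 16 s
keepalive_detection_s 7200 75 9   # kernel defaults: 7875 s (about 2 h 11 min)
```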
Check per-socket timers (including keepalive countdowns):
ss -o
== Process / diag tools
See:
* https://access.redhat.com/solutions/30453
== Tools
=== TCP retransmissions
See:
* http://arthurchiao.art/blog/tcp-retransmission-may-be-misleading/
* net.ipv4.tcp_early_retrans
Tools:
* tcpretrans.bt (bpftrace)
* tcpretrans ([[https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans|perf-tools]])
* tcpretrans.py ([[https://github.com/iovisor/bcc/blob/master/tools/tcpretrans.py|bpfcc-tools - iovisor/bcc]])
Find the current rto_min and rto_max:
# grep ^Tcp /proc/net/snmp |column -t |cut -c1-99
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets
Tcp: 1 200 120000 -1 6834 964 161 4614
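Rather than counting characters with cut, the same two fields can be picked out by matching the header columns against the value line. A sketch (the ''rto_minmax'' name is mine):

```shell
# Extract RtoMin and RtoMax (milliseconds) from /proc/net/snmp by pairing
# the "Tcp:" header line's column names with the following "Tcp:" value line.
rto_minmax() {   # usage: rto_minmax [snmp-file]
  awk '/^Tcp:/ {
         if (!header_seen) { for (i = 1; i <= NF; i++) col[$i] = i; header_seen = 1 }
         else print "RtoMin=" $col["RtoMin"] " RtoMax=" $col["RtoMax"]
       }' "${1:-/proc/net/snmp}"
}

[ -r /proc/net/snmp ] && rto_minmax || true   # e.g. RtoMin=200 RtoMax=120000
```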
yum install bpftrace
/usr/share/bcc/tools/tcpretrans
timeout 60 ./tcpretrans | nl
sar -n ETCP
sar -n TCP
# netstat -s |egrep 'segments retransmited|segments send out'
107428604792 segments send out
47511527 segments retransmited
(The misspellings "send out" and "retransmited" are in the net-tools output itself.)
# echo "$(( 47511527 * 10000 / 107428604792 ))"
4
i.e. roughly 0.04 % of sent segments were retransmitted (the quotient is in hundredths of a percent).
https://www.ibm.com/support/pages/tracking-tcp-retransmissions-linux
''tcpretransmits.sh''
#!/usr/bin/env bash
# Record the TCP retransmission rate before and after a tcpretrans capture.
test -x /usr/sbin/tcpretrans.bt && TCPRETRANS=/usr/sbin/tcpretrans.bt
test -x /usr/share/bpftrace/tools/tcpretrans.bt && TCPRETRANS=/usr/share/bpftrace/tools/tcpretrans.bt
# https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
test -x ./tcpretrans.pl && TCPRETRANS=./tcpretrans.pl
OUT=/tmp/tcpretransmits.log
if [ -z "$TCPRETRANS" ]; then
    echo "No tcpretrans tool found: install bpftrace or fetch perf-tools' tcpretrans" >&2
else
    date > "$OUT"
    # Retransmitted/sent ratio, in percent. The patterns "sen. out" and
    # "retransmit+ed" tolerate the net-tools spellings "send out" / "retransmited".
    netstat -s |awk '/segments sen. out$/ { R=$1; } /segments retransmit+ed$/ { printf("%.4f\n", ($1/R)*100); }' >> "$OUT"
    "$TCPRETRANS" | tee -a "$OUT"
    netstat -s |awk '/segments sen. out$/ { R=$1; } /segments retransmit+ed$/ { printf("%.4f\n", ($1/R)*100); }' >> "$OUT"
fi
Per the IBM page above, under "Resolving The Problem": \\
TCP retransmissions are almost exclusively caused by failing network hardware, not applications or middleware. Report the failing IP pairs to a network administrator.
== Miscellaneous
**TCP timestamps**
https://access.redhat.com/documentation/fr-fr/red_hat_enterprise_linux/9/html/monitoring_and_managing_system_status_and_performance/benefits-of-tcp-timestamps_tuning-the-network-performance
tcp_low_latency (Boolean; default: disabled; since Linux 2.4.21/2.6; obsolete since Linux 4.14)
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_moderate_rcvbuf = 1
----
The minimum RTO can also be overridden per route with ''ip route change ... rto_min'' (here lowered to 8 ms on the virbr1 network):
# ip route get 192.168.100.11
192.168.100.11 dev virbr1 src 192.168.100.1 uid 1000
cache
# ip route show dev virbr1
192.168.100.0/24 proto kernel scope link src 192.168.100.1
# ip route change dev virbr1 192.168.100.0/24 proto kernel scope link src 192.168.100.1 rto_min 8ms