|author||Eric Dumazet <email@example.com>||2012-07-11 05:50:31 +0000|
|committer||David S. Miller <firstname.lastname@example.org>||2012-07-11 18:12:59 -0700|
tcp: TCP Small Queues
This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB  per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan()  is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments.  New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable  skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by: Eric Dumazet <email@example.com> Cc: Dave Taht <firstname.lastname@example.org> Cc: Tom Herbert <email@example.com> Cc: Matt Mathis <firstname.lastname@example.org> Cc: Yuchung Cheng <email@example.com> Cc: Nandita Dukkipati <firstname.lastname@example.org> Signed-off-by: David S. Miller <email@example.com>
Diffstat (limited to 'Documentation/networking')
1 files changed, 14 insertions, 0 deletions
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 47b6c79e9b0..e20c17a7d34 100644
@@ -551,6 +551,20 @@ tcp_thin_dupack - BOOLEAN
+tcp_limit_output_bytes - INTEGER
+ Controls TCP Small Queue limit per tcp socket.
+ TCP bulk sender tends to increase packets in flight until it
+ gets losses notifications. With SNDBUF autotuning, this can
+ result in a large amount of packets queued in qdisc/device
+ on the local machine, hurting latency of other flows, for
+ typical pfifo_fast qdiscs.
+ tcp_limit_output_bytes limits the number of bytes on qdisc
+ or device to reduce artificial RTT/cwnd and reduce bufferbloat.
+ Note: For GSO/TSO enabled flows, we try to have at least two
+ packets in flight. Reducing tcp_limit_output_bytes might also
+ reduce the size of individual GSO packet (64KB being the max)
+ Default: 131072
udp_mem - vector of 3 INTEGERs: min, pressure, max