From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Nik Unger, Stephen Hemminger, "David S. Miller", Sasha Levin
Subject: [PATCH 4.9 038/241] netem: apply correct delay when rate throttling
Date: Mon, 19 Mar 2018 19:05:03 +0100
Message-Id: <20180319180752.782095032@linuxfoundation.org>
In-Reply-To: <20180319180751.172155436@linuxfoundation.org>
References: <20180319180751.172155436@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

4.9-stable review patch.  If anyone has any objections, please let me know.
------------------

From: Nik Unger

[ Upstream commit 5080f39e8c72e01cf37e8359023e7018e2a4901e ]

I recently reported on the netem list that iperf network benchmarks show
unexpected results when a bandwidth throttling rate has been configured
for netem. Specifically:

1) The measured link bandwidth *increases* when a higher delay is added
2) The measured link bandwidth appears higher than the specified limit
3) The measured link bandwidth for the same very slow settings varies
   significantly across machines

The issue can be reproduced by using tc to configure netem with a
512kbit rate and various (none, 1us, 50ms, 100ms, 200ms) delays on a
veth pair between network namespaces, and then using iperf (or any
other network benchmarking tool) to test throughput. Complete detailed
instructions are in the original email chain here:
https://lists.linuxfoundation.org/pipermail/netem/2017-February/001672.html

There appear to be two underlying bugs causing these effects:

- The first issue causes long delays when the rate is slow and no delay
  is configured (e.g., "rate 512kbit"). This is because SKBs are not
  orphaned when no delay is configured, so orphaning does not occur
  until *after* the rate-induced delay has been applied. For this
  reason, adding a tiny delay (e.g., "rate 512kbit delay 1us")
  dramatically increases the measured bandwidth.

- The second issue is that rate-induced delays are not correctly
  applied, allowing SKB delays to occur in parallel. The intended
  approach is to compute the delay for an SKB and to add this delay to
  the end of the current queue. However, the code does not detect
  existing SKBs in the queue due to improperly testing sch->q.qlen,
  which is nonzero even when packets exist only in the rbtree.
  Consequently, new SKBs do not wait for the current queue to empty.
  When packet delays vary significantly (e.g., if packet sizes are
  different), then this also causes unintended reordering.
I modified the code to expect a delay (and orphan the SKB) when a rate
is configured. I also added some defensive tests that correctly find
the latest scheduled delivery time, even if it is (unexpectedly) for a
packet in sch->q.

I have tested these changes on the latest kernel (4.11.0-rc1+) and the
iperf / ping test results are as expected.

Signed-off-by: Nik Unger
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller
Signed-off-by: Sasha Levin
Signed-off-by: Greg Kroah-Hartman
---
 net/sched/sch_netem.c |   26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -462,7 +462,7 @@ static int netem_enqueue(struct sk_buff
 	/* If a delay is expected, orphan the skb. (orphaning usually takes
 	 * place at TX completion time, so _before_ the link transit delay)
 	 */
-	if (q->latency || q->jitter)
+	if (q->latency || q->jitter || q->rate)
 		skb_orphan_partial(skb);
 
 	/*
@@ -530,21 +530,31 @@ static int netem_enqueue(struct sk_buff
 		now = psched_get_time();
 
 		if (q->rate) {
-			struct sk_buff *last;
+			struct netem_skb_cb *last = NULL;
+
+			if (sch->q.tail)
+				last = netem_skb_cb(sch->q.tail);
+			if (q->t_root.rb_node) {
+				struct sk_buff *t_skb;
+				struct netem_skb_cb *t_last;
+
+				t_skb = netem_rb_to_skb(rb_last(&q->t_root));
+				t_last = netem_skb_cb(t_skb);
+				if (!last ||
+				    t_last->time_to_send > last->time_to_send) {
+					last = t_last;
+				}
+			}
 
-			if (sch->q.qlen)
-				last = sch->q.tail;
-			else
-				last = netem_rb_to_skb(rb_last(&q->t_root));
 			if (last) {
 				/*
 				 * Last packet in queue is reference point (now),
 				 * calculate this time bonus and subtract
 				 * from delay.
 				 */
-				delay -= netem_skb_cb(last)->time_to_send - now;
+				delay -= last->time_to_send - now;
 				delay = max_t(psched_tdiff_t, 0, delay);
-				now = netem_skb_cb(last)->time_to_send;
+				now = last->time_to_send;
 			}
 
 			delay += packet_len_2_sched_time(qdisc_pkt_len(skb), q);