From: Eric Dumazet
Date: Fri, 16 Feb 2018 08:25:58 -0800
Subject: Re: TCP and BBR: reproducibly low cwnd and bandwidth
To: Oleksandr Natalenko
Cc: "David S. Miller", Alexey Kuznetsov, Hideaki YOSHIFUJI, netdev, LKML,
    Soheil Hassas Yeganeh, Neal Cardwell, Yuchung Cheng, Van Jacobson, Jerry Chu
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 16, 2018 at 7:15 AM, Oleksandr Natalenko wrote:
> Hi, David, Eric, Neal et al.
>
> On Thursday, 15 February 2018 at 21:42:26 CET, Oleksandr Natalenko wrote:
>> I've faced an issue with limited TCP bandwidth between my laptop and a
>> server in my 1 Gbps LAN while using BBR as the congestion control
>> mechanism. To verify my observations, I've set up 2 KVM VMs with the
>> following parameters:
>>
>> 1) Linux v4.15.3
>> 2) virtio NICs
>> 3) 128 MiB of RAM
>> 4) 2 vCPUs
>> 5) tested on both non-PREEMPT/100 Hz and PREEMPT/1000 Hz
>>
>> The VMs are interconnected via a host bridge (-netdev bridge). I was
>> running iperf3 in the default and reverse modes. Here are the results:
>>
>> 1) BBR on both VMs
>>
>> upload: 3.42 Gbits/sec, cwnd ~ 320 KBytes
>> download: 3.39 Gbits/sec, cwnd ~ 320 KBytes
>>
>> 2) Reno on both VMs
>>
>> upload: 5.50 Gbits/sec, cwnd = 976 KBytes (constant)
>> download: 5.22 Gbits/sec, cwnd = 1.20 MBytes (constant)
>>
>> 3) Reno on client, BBR on server
>>
>> upload: 5.29 Gbits/sec, cwnd = 952 KBytes (constant)
>> download: 3.45 Gbits/sec, cwnd ~ 320 KBytes
>>
>> 4) BBR on client, Reno on server
>>
>> upload: 3.36 Gbits/sec, cwnd ~ 370 KBytes
>> download: 5.21 Gbits/sec, cwnd = 887 KBytes (constant)
>>
>> So, as you can see, whenever BBR is in use, the upload rate is bad and
>> cwnd is low. On real HW (1 Gbps LAN, laptop and server), BBR limits the
>> throughput to ~100 Mbps (verifiable not only with iperf3 but also with
>> scp while transferring files between the hosts).
>>
>> Also, I've tried YeAH instead of Reno, and it gives me the same results
>> as Reno (IOW, YeAH works fine too).
>>
>> Questions:
>>
>> 1) is this expected?
>> 2) or am I missing some extra BBR tunable?
>> 3) if it is not a regression (I don't have any previous data to compare
>> with), how can I fix this?
>> 4) if it is a bug in BBR, what else should I provide or check for a
>> proper investigation?
>
> I've played with BBR a little bit more and managed to narrow the issue
> down to the changes between v4.12 and v4.13. Here are my observations:
>
> v4.12 + BBR + fq_codel == OK
> v4.12 + BBR + fq == OK
> v4.13 + BBR + fq_codel == Not OK
> v4.13 + BBR + fq == OK
>
> I think this has something to do with the internal TCP implementation of
> pacing that was introduced in v4.13 (commit 218af599fa63) specifically to
> allow using BBR together with non-fq qdiscs. When BBR runs on top of fq,
> the throughput is high and saturates the link, but if another qdisc is in
> use, for instance fq_codel, the throughput drops. Just to be sure, I've
> also tried pfifo_fast instead of fq_codel, with the same outcome: low
> throughput.
>
> Unfortunately, I do not know whether this is expected or should be
> considered a regression, so I'm asking for advice.
>
> Ideas?

The way TCP pacing works, it defaults to internal pacing using a hint
stored in the socket.

If you change the qdisc while a flow is alive, the result could be
unexpected. (The TCP socket remembers that fq was supposed to handle the
pacing.)

What results do you get if you use standard pfifo_fast?

I am asking because internal TCP pacing relies on high-resolution timers,
and those might be weak on your VM.
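
For reference, the hint stored in the socket is sk->sk_pacing_status, and
the internal pacing path added in 218af599fa63 works roughly as below.
This is a simplified sketch of the v4.13 mechanism, not a verbatim copy of
net/ipv4/tcp_output.c:

    /* The socket records who is responsible for pacing. fq_enqueue()
     * sets SK_PACING_FQ the first time it sees the flow; otherwise TCP
     * stays at SK_PACING_NEEDED and paces itself with an hrtimer.
     */
    enum sk_pacing {
            SK_PACING_NONE,         /* no pacing requested */
            SK_PACING_NEEDED,       /* TCP must pace itself (hrtimer) */
            SK_PACING_FQ,           /* the fq qdisc handles pacing */
    };

    static bool tcp_needs_internal_pacing(const struct sock *sk)
    {
            return smp_load_acquire(&sk->sk_pacing_status) ==
                   SK_PACING_NEEDED;
    }

    static void tcp_internal_pacing(struct sock *sk,
                                    const struct sk_buff *skb)
    {
            u64 len_ns;
            u32 rate;

            if (!tcp_needs_internal_pacing(sk))
                    return;
            rate = sk->sk_pacing_rate;
            if (!rate || rate == ~0U)
                    return;

            /* Delay the next transmit long enough that this skb goes
             * out at sk_pacing_rate: len_ns = skb->len / rate, in ns.
             */
            len_ns = (u64)skb->len * NSEC_PER_SEC;
            do_div(len_ns, rate);
            hrtimer_start(&tcp_sk(sk)->pacing_timer,
                          ktime_add_ns(ktime_get(), len_ns),
                          HRTIMER_MODE_ABS_PINNED);
    }

This is why the qdisc matters here: with fq the hrtimer path is skipped
entirely (SK_PACING_FQ), while with fq_codel or pfifo_fast every
inter-packet gap depends on hrtimer resolution, which can be coarse in a
VM.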
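
As one quick sanity check of the timers on the guest, you could query the
resolution of CLOCK_MONOTONIC from userspace; with high-resolution timers
active it reports 1 ns, while a tick-based fallback reports the tick
period (e.g. 10 ms at 100 Hz). A minimal test program:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
            struct timespec res;

            /* A coarse value here (ms rather than ns) would mean the
             * internal pacing hrtimer cannot space packets accurately.
             */
            if (clock_getres(CLOCK_MONOTONIC, &res) != 0) {
                    perror("clock_getres");
                    return 1;
            }
            printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
                   (long)res.tv_sec, res.tv_nsec);
            return 0;
    }

This only shows whether high-resolution timer mode is enabled, not how
much latency virtualization adds to each timer fire, so it is a first
check rather than a full answer.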