Date: Mon, 17 May 2021 19:48:09 -0700
From: Stephen Hemminger
To: Dave Taht
Cc: Willem de Bruijn, "Michael S. Tsirkin", Xianting Tian,
 Linux Kernel Network Developers, LKML, virtualization, bloat,
 Jakub Kicinski, "David S. Miller"
Subject: Re: [Bloat] virtio_net: BQL?
Message-ID: <20210517194809.071fc896@hermes.local>
References: <56270996-33a6-d71b-d935-452dad121df7@linux.alibaba.com> <20210517160036.4093d3f2@hermes.local>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 17 May 2021 16:32:21 -0700
Dave Taht wrote:

> On Mon, May 17, 2021 at 4:00 PM Stephen Hemminger wrote:
> >
> > On Mon, 17 May 2021 14:48:46 -0700
> > Dave Taht wrote:
> >
> > > On Mon, May 17, 2021 at 1:23 PM Willem de Bruijn wrote:
> > > >
> > > > On Mon, May 17, 2021 at 2:44 PM Dave Taht wrote:
> > > > >
> > > > > Not really related to this patch, but is there some reason why virtio
> > > > > has no support for BQL?
> > > >
> > > > There have been a few attempts to add it over the years.
> > > >
> > > > Most recently, https://lore.kernel.org/lkml/20181205225323.12555-2-mst@redhat.com/
> > > >
> > > > That thread has a long discussion. I think the key open issue remains
> > > >
> > > > "The tricky part is the mode switching between napi and no napi."
> > >
> > > Oy, vey.
> > >
> > > I didn't pay any attention to that discussion, sadly enough.
> > >
> > > It's been about that long (2018) since I paid any attention to
> > > bufferbloat in the cloud, and my cloudy provider (linode) switched to
> > > using virtio when I wasn't looking. For over a year now, I'd been
> > > getting reports saying that comcast's pie rollout wasn't working as
> > > well as expected, that evenroute's implementation of sch_cake and sqm
> > > on inbound wasn't working right, nor pf_sense's, along with numerous
> > > other issues at Internet scale.
> > >
> > > Last week I ran a string of benchmarks against starlink's new services
> > > and was really aghast at what I found there, too. But the problem
> > > seemed deeper than just the dishy...
> > >
> > > Without BQL, there's no backpressure for fq_codel to do its thing.
> > > None.
> > > My measurement servers aren't FQ-codeling
> > > no matter how much load I put on them. Since that qdisc is the default
> > > now in most linux distributions, I imagine that the bulk of the cloud
> > > is now behaving as erratically as linux was in 2011, with enormous
> > > swings in throughput and latency from GSO/TSO hitting overlarge rx/tx
> > > rings [1], breaking various rate estimators in codel, pie and the tcp
> > > stack itself.
> > >
> > > See:
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq_codel.png
> > >
> > > See the swings in latency there? That's symptomatic of tx/rx rings
> > > filling and emptying.
> > >
> > > It wasn't until I switched my measurement server temporarily over to
> > > sch_fq that I got a rrul result that was close to the results we used
> > > to get from the virtualized e1000e drivers we were using in 2014.
> > >
> > > http://fremont.starlink.taht.net/~d/virtio_nobql/rrul_-_evenroute_v3_server_fq.png
> > >
> > > While I have long supported the use of sch_fq for tcp-heavy workloads,
> > > it still behaves better with bql in place, and fq_codel is better for
> > > generic workloads... but it needs bql-based backpressure to kick in.
> > >
> > > [1] I really hope I'm overreacting, but, um, er, could someone(s) spin
> > > up a new patch that does bql in some way even half right for this
> > > driver and help test it? I haven't built a kernel in a while.
> >
> > The Azure network driver (netvsc) also does not have BQL. Several years ago
> > I tried adding it, but it benchmarked worse, and there is the added complexity
> > of handling the accelerated networking VF path.
>
> I certainly agree it adds complexity, but the question is what sort of
> network behavior resulted without backpressure inside the vm?
>
> What sorts of benchmarks did you do?
>
> I will get set up to do some testing of this that is less ad hoc.

Less of an issue than it seems for most users.
For the most common case, all transmits are passed through to the underlying
VF network device (Mellanox). Since Mellanox supports BQL, that path works.

The special case is when accelerated networking is disabled, or the host is
being serviced and the slow path is used. Optimizing the slow path is not
that interesting.

I wonder if the use of SR-IOV with virtio (which requires another layer with
the failover device) behaves the same way?
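For anyone wanting to pick this up: below is a rough, untested pseudocode sketch, not a patch, of where the two BQL hooks would sit in a virtio_net-style driver, modeled on how wired NIC drivers use netdev_tx_sent_queue()/netdev_tx_completed_queue(). The function and field names here are approximations of the real driver, and it deliberately ignores the hard part Willem pointed at: the napi/no-napi mode switching.

```c
/* Pseudocode sketch only -- names approximate virtio_net, not a real patch. */

static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

	/* ... add the skb to the tx virtqueue as today ... */

	/* Tell BQL how many bytes are now in flight on this queue, so the
	 * stack can stop the queue once the dynamic byte limit is reached. */
	netdev_tx_sent_queue(txq, skb->len);

	return NETDEV_TX_OK;
}

static void free_old_xmit_skbs(struct send_queue *sq)
{
	unsigned int bytes = 0, packets = 0;

	/* ... reclaim completed buffers from the virtqueue, accumulating
	 *     bytes += skb->len; packets++; for each completed skb ... */

	/* Report completions; this is what releases the backpressure and
	 * lets fq_codel (or sch_fq) see a short, well-regulated ring. */
	netdev_tx_completed_queue(txq, packets, bytes);
}
```

The tricky bit the earlier thread stalled on is that the completion path runs differently in napi and non-napi mode, so the two hooks have to stay consistent across that switch.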