Date: Tue, 4 May 2021 12:51:20 -0400
From: "bfields@fieldses.org"
To: Trond Myklebust
Cc: "neilb@suse.de", "fsorenso@redhat.com", "linux-nfs@vger.kernel.org",
	"aglo@umich.edu", "bcodding@redhat.com", "jshivers@redhat.com",
	"chuck.lever@oracle.com"
Subject: Re: Re: unsharing tcp connections from different NFS mounts
Message-ID: <20210504165120.GA18746@fieldses.org>
References: <20201007140502.GC23452@fieldses.org>
	<85F496CD-9AAC-451C-A224-FCD138BDC591@oracle.com>
	<20201007160556.GE23452@fieldses.org>
	<20210119222229.GA29488@fieldses.org>
	<2d77534fb8be557c6883c8c386ebf4175f64454a.camel@hammerspace.com>
	<20210120150737.GA17548@fieldses.org>
	<20210503200952.GB18779@fieldses.org>
	<162009412979.28954.17703105649506010394@noble.neil.brown.name>
	<4033e1e8b52c27503abe5855f81b7d12b2e46eec.camel@hammerspace.com>
In-Reply-To: <4033e1e8b52c27503abe5855f81b7d12b2e46eec.camel@hammerspace.com>
List-ID: <linux-nfs.vger.kernel.org>

Thanks very much to all of you for the explanations and concrete
suggestions for things to look at, I feel much less stuck!

--b.

On Tue, May 04, 2021 at 02:27:04PM +0000, Trond Myklebust wrote:
> On Tue, 2021-05-04 at 12:08 +1000, NeilBrown wrote:
> > On Tue, 04 May 2021, bfields@fieldses.org wrote:
> > > On Wed, Jan 20, 2021 at 10:07:37AM -0500, bfields@fieldses.org wrote:
> > > >
> > > > So mainly:
> > > >
> > > > > > > Why is there a performance regression being seen by these
> > > > > > > setups when they share the same connection? Is it really the
> > > > > > > connection, or is it the fact that they all share the same
> > > > > > > fixed-slot session?
> > > >
> > > > I don't know.  Any pointers how we might go about finding the
> > > > answer?
> > >
> > > I set this aside and then get bugged about it again.
> > >
> > > I apologize, I don't understand what you're asking for here, but it
> > > seemed obvious to you and Tom, so I'm sure the problem is me.  Are you
> > > free for a call sometime maybe?  Or do you have any suggestions for
> > > how you'd go about investigating this?
> >
> > I think a useful first step would be to understand what is getting in
> > the way of the small requests.
> >  - are they in the client waiting for slots which are all consumed by
> >    large writes?
> >  - are they in the TCP stream behind megabytes of writes that need to be
> >    consumed before they can even be seen by the server?
> >  - are they in a socket buffer on the server waiting to be served
> >    while all the nfsd threads are busy handling writes?
> >
> > I cannot see an easy way to measure which it is.
>
> The nfs4_sequence_done tracepoint will give you a running count of the
> highest slot id in use.
>
> The mountstats 'execute time' will give you the time between the
> request being created and the time a reply was received. That time
> includes the time spent waiting for an NFSv4 session slot.
>
> The mountstats 'backlog wait' will tell you the time spent waiting for
> an RPC slot after obtaining the NFSv4 session slot.
>
> The mountstats 'RTT' will give you the time spent waiting for the RPC
> request to be received, processed and replied to by the server.
>
> Finally, the mountstats also tell you the average per-op bytes
> sent/bytes received.
>
> IOW: The mountstats really gives you almost all the information you
> need here, particularly if you use it in the 'interval reporting' mode.
> The only thing it does not tell you is whether or not the NFSv4 session
> slot table is full (which is why you want the tracepoint).
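For anyone who wants to script this, here is a minimal sketch (Python, not
part of any existing tool mentioned in this thread) that pulls those per-op
averages straight out of /proc/self/mountstats. The field layout (ops,
transmissions, major timeouts, bytes sent/received, then cumulative queue,
RTT and execute times in milliseconds) is assumed from the nfs-utils
mountstats script and may differ across kernel versions; newer kernels
append an extra error count, which this ignores. Taking two samples and
subtracting approximates the interval reporting mode mentioned above.

#!/usr/bin/env python3
# Sketch: per-op backlog wait / RTT / execute time from /proc/self/mountstats.
# Field layout assumed from nfs-utils' mountstats(8); adjust if it differs
# on your kernel.

def per_op_stats(path="/proc/self/mountstats"):
    stats, device, in_ops = {}, None, False
    with open(path) as f:
        for line in f:
            words = line.split()
            if not words:
                continue
            if words[0] == "device":
                device, in_ops = " ".join(words[1:]), False
            elif "per-op statistics" in line:
                in_ops = True
            elif in_ops and words[0].endswith(":") and len(words) >= 9:
                op = words[0].rstrip(":")
                ops, _xmit, _to, snt, rcv, queue_ms, rtt_ms, exec_ms = \
                    [int(w) for w in words[1:9]]
                if ops:
                    stats.setdefault(device, {})[op] = {
                        "ops": ops,
                        "avg bytes sent": snt / ops,
                        "avg bytes recv": rcv / ops,
                        # time waiting for an RPC slot after the session slot
                        "backlog wait ms": queue_ms / ops,
                        # time on the wire plus server processing
                        "RTT ms": rtt_ms / ops,
                        # total, including the NFSv4 session slot wait
                        "execute ms": exec_ms / ops,
                    }
    return stats

if __name__ == "__main__":
    for device, ops in per_op_stats().items():
        print(device)
        for op in ("READ", "WRITE", "GETATTR"):
            if op in ops:
                print("  %-8s %s" % (op, ops[op]))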
> > I guess monitoring how much of the time the client has no free slots
> > might give hints about the first.  If there are always free slots,
> > the first case cannot be the problem.
> >
> > With NFSv3, the slot management happened at the RPC layer and there were
> > several queues (RPC_PRIORITY_LOW/NORMAL/HIGH/PRIVILEGED) where requests
> > could wait for a free slot.  Since we gained dynamic slot allocation -
> > up to 65536 by default - I wonder if that has much effect any more.
> >
> > For NFSv4.1+ the slot management is at the NFS level.  The server sets a
> > maximum which defaults to (maybe is limited to) 1024 by the Linux server.
> > So there are always free rpc slots.
> > The Linux client only has a single queue for each slot table, and I
> > think there is one slot table for the forward channel of a session.
> > So it seems we no longer get any priority management (sync writes used
> > to get priority over async writes).
> >
> > Increasing the number of slots advertised by the server might be
> > interesting.  It is unlikely to fix anything, but it might move the
> > bottleneck.
> >
> > Decreasing the maximum number of TCP slots might also be interesting
> > (below the number of NFS slots at least).
> > That would allow the RPC priority infrastructure to work, and if the
> > large-file writes are async, they might get slowed down.
> >
> > If the problem is in the TCP stream (which is possible if the relevant
> > network buffers are bloated), then you'd really need multiple TCP streams
> > (which can certainly improve throughput in some cases).  That is what
> > nconnect gives you.  nconnect does minimal balancing.  In general it
> > will round-robin, but if the number of requests (not bytes) queued on one
> > socket is below average, that socket is likely to get the next request.
>
> It's not round-robin. Transports are allocated to a new RPC request
> based on a measure of their queue length in order to skip over those
> that show signs of above average congestion.
>
> > So just adding more connections with nconnect is unlikely to help.  You
> > would need to add a policy engine (struct rpc_xprt_iter_ops) which
> > reserves some connections for small requests.  That should be fairly
> > easy to write a proof-of-concept for.
>
> Ideally we would want to tie into cgroups as the control mechanism so
> that NFS can be treated like any other I/O resource.
>
> >
> > NeilBrown
> >
> > >
> > > Would it be worth experimenting with giving some sort of advantage
> > > to readers?  (E.g., reserving a few slots for reads and getattrs and
> > > such?)
> > >
> > > --b.
> > >
> > > > It's easy to test the case of entirely separate state & tcp
> > > > connections.
> > > >
> > > > If we want to test with a shared connection but separate slots I
> > > > guess we'd need to create a separate session for each nfs4_server,
> > > > and a lot of functions that currently take an nfs4_client would
> > > > need to take an nfs4_server?
> > > >
> > > > --b.
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
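And a similarly rough sketch for the tracepoint side: enable
nfs4_sequence_done through tracefs and track the largest slot numbers seen,
as a crude way of telling whether the session slot table is running full
(Neil's first bullet above). The tracefs mount point and the field names in
the event output (slot_nr=, highest_slotid=) are assumptions here; check
the event's format file on your kernel before trusting the regex. Needs
root.

#!/usr/bin/env python3
# Sketch: watch the nfs4_sequence_done tracepoint and report the largest
# slot numbers observed, as a rough indicator of session slot table pressure.
# Paths and field names are assumptions; see
#   <tracefs>/events/nfs4/nfs4_sequence_done/format
import re, sys

TRACEFS = "/sys/kernel/tracing"   # older kernels: /sys/kernel/debug/tracing
EVENT = TRACEFS + "/events/nfs4/nfs4_sequence_done"

def main():
    with open(EVENT + "/enable", "w") as f:
        f.write("1")
    max_slot = max_highest = 0
    pattern = re.compile(r"slot_nr=(\d+).*?highest_slotid=(\d+)")
    try:
        # trace_pipe blocks until events arrive; Ctrl-C to stop.
        with open(TRACEFS + "/trace_pipe") as pipe:
            for line in pipe:
                m = pattern.search(line)
                if not m:
                    continue
                slot_nr, highest = int(m.group(1)), int(m.group(2))
                if slot_nr > max_slot or highest > max_highest:
                    max_slot = max(max_slot, slot_nr)
                    max_highest = max(max_highest, highest)
                    print("max slot_nr=%d max highest_slotid=%d"
                          % (max_slot, max_highest), file=sys.stderr)
    except KeyboardInterrupt:
        pass
    finally:
        with open(EVENT + "/enable", "w") as f:
            f.write("0")

if __name__ == "__main__":
    main()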