From: "NeilBrown"
To: "bfields@fieldses.org"
Cc: "Trond Myklebust", "fsorenso@redhat.com", "linux-nfs@vger.kernel.org",
    "aglo@umich.edu", "bcodding@redhat.com", "jshivers@redhat.com",
    "chuck.lever@oracle.com"
Subject: Re: unsharing tcp connections from different NFS mounts
In-reply-to: <20210503200952.GB18779@fieldses.org>
Date: Tue, 04 May 2021 12:08:49 +1000
Message-id: <162009412979.28954.17703105649506010394@noble.neil.brown.name>

On Tue, 04 May 2021, bfields@fieldses.org wrote:
> On Wed, Jan 20, 2021 at 10:07:37AM -0500, bfields@fieldses.org wrote:
> >
> > So mainly:
> >
> > > > > Why is there a performance regression being seen by these setups
> > > > > when they share the same connection?  Is it really the connection,
> > > > > or is it the fact that they all share the same fixed-slot session?
> > > >
> > > > I don't know.  Any pointers how we might go about finding the answer?
>
> I set this aside and then get bugged about it again.
>
> I apologize, I don't understand what you're asking for here, but it
> seemed obvious to you and Tom, so I'm sure the problem is me.  Are you
> free for a call sometime maybe?  Or do you have any suggestions for how
> you'd go about investigating this?

I think a useful first step would be to understand what is getting in
the way of the small requests.
 - are they in the client, waiting for slots which are all consumed by
   large writes?
 - are they in the TCP stream, behind megabytes of writes that need to
   be consumed before they can even be seen by the server?
 - are they in a socket buffer on the server, waiting to be served while
   all the nfsd threads are busy handling writes?

I cannot see an easy way to measure which it is.  I guess monitoring how
much of the time the client has no free slots might give hints about the
first.  If there are always free slots, the first case cannot be the
problem.

With NFSv3, slot management happened at the RPC layer and there were
several queues (RPC_PRIORITY_LOW/NORMAL/HIGH/PRIVILEGED) where requests
could wait for a free slot.  Since we gained dynamic slot allocation -
up to 65536 by default - I wonder if that has much effect any more.

For NFSv4.1+ the slot management is at the NFS level.  The server sets a
maximum, which defaults to (and maybe is limited to) 1024 by the Linux
server.  So there are always free RPC slots.  The Linux client has only
a single queue for each slot table, and I think there is one slot table
for the forward channel of a session.  So it seems we no longer get any
priority management (sync writes used to get priority over async
writes).

Increasing the number of slots advertised by the server might be
interesting.  It is unlikely to fix anything, but it might move the
bottleneck.

Decreasing the maximum number of tcp slots might also be interesting
(below the number of NFS slots at least).  That would allow the RPC
priority infrastructure to work, and if the large-file writes are async,
they might get slowed down.

If the problem is in the TCP stream (which is possible if the relevant
network buffers are bloated), then you'd really need multiple TCP
streams (which can certainly improve throughput in some cases).  That is
what nconnect gives you.

nconnect does minimal balancing.  In general it will round-robin, but if
the number of requests (not bytes) queued on one socket is below
average, that socket is likely to get the next request.  So just adding
more connections with nconnect is unlikely to help.  You would need to
add a policy engine (struct rpc_xprt_iter_ops) which reserves some
connections for small requests.  That should be fairly easy to write a
proof-of-concept for.
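To make that a little more concrete, here is a rough user-space sketch
of the sort of policy I have in mind.  It is not the actual sunrpc code,
and every name in it (toy_xprt, pick_xprt, NR_RESERVED, ...) is invented
for the illustration: a couple of connections are kept free of large
writes, and within each group selection follows roughly the
"prefer a below-average queue, else round-robin" balancing described
above.

/*
 * User-space toy, not kernel code: all names are invented for the
 * illustration.  Connections 0..NR_RESERVED-1 never carry large writes,
 * so small requests sent there cannot queue behind megabytes of data.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_XPRTS    8   /* total connections, e.g. nconnect=8 */
#define NR_RESERVED 2   /* connections kept free of large writes */

struct toy_xprt {
        int id;
        int queued;     /* requests currently queued on this connection */
};

static struct toy_xprt xprts[NR_XPRTS];

/*
 * Roughly the balancing described above, applied to a subset of the
 * connections: round-robin, but prefer a connection whose queue length
 * is at or below the subset's average.
 */
static struct toy_xprt *pick_balanced(int first, int *cursor)
{
        int n = NR_XPRTS - first;
        long total = 0;
        int i;

        for (i = first; i < NR_XPRTS; i++)
                total += xprts[i].queued;

        for (i = 0; i < n; i++) {
                struct toy_xprt *x = &xprts[first + (*cursor + i) % n];

                if ((long)x->queued * n <= total) {
                        *cursor = (x->id - first + 1) % n;
                        return x;
                }
        }
        /* not reached - the minimum queue is never above the average */
        return &xprts[first];
}

/* The policy: large writes stay off the reserved connections. */
static struct toy_xprt *pick_xprt(bool small_request)
{
        static int small_cursor, large_cursor;

        if (small_request)
                return pick_balanced(0, &small_cursor);
        return pick_balanced(NR_RESERVED, &large_cursor);
}

int main(void)
{
        int i;

        for (i = 0; i < NR_XPRTS; i++)
                xprts[i].id = i;

        /* a burst of large writes, then a handful of small requests */
        for (i = 0; i < 60; i++)
                pick_xprt(false)->queued++;
        for (i = 0; i < 5; i++) {
                struct toy_xprt *x = pick_xprt(true);

                printf("small request %d -> connection %d (queue %d)\n",
                       i, x->id, x->queued);
                x->queued++;
        }
        return 0;
}

With something like that wired into the transport iterator, a burst of
large writes leaves the reserved connections nearly idle, so a small
GETATTR or READ never sits behind them.  Whether that actually helps in
the reported setups would itself tell us whether the TCP stream is where
the small requests are getting stuck.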
NeilBrown


>
> Would it be worth experimenting with giving some sort of advantage to
> readers?  (E.g., reserving a few slots for reads and getattrs and such?)
>
> --b.
>
> > It's easy to test the case of entirely separate state & tcp connections.
> >
> > If we want to test with a shared connection but separate slots I guess
> > we'd need to create a separate session for each nfs4_server, and a lot
> > of functions that currently take an nfs4_client would need to take an
> > nfs4_server?
> >
> > --b.