Return-Path: linux-nfs-owner@vger.kernel.org
Received: from fieldses.org ([173.255.197.46]:38191 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755634AbbBFQIt
	(ORCPT ); Fri, 6 Feb 2015 11:08:49 -0500
Date: Fri, 6 Feb 2015 11:08:48 -0500
From: "J. Bruce Fields"
To: Christoph Hellwig
Cc: Anna Schumaker, linux-nfs@vger.kernel.org
Subject: Re: [PATCH v2 2/4] NFSD: Add READ_PLUS support for data segments
Message-ID: <20150206160848.GA29783@fieldses.org>
References: <1422477777-27933-1-git-send-email-Anna.Schumaker@Netapp.com>
 <1422477777-27933-3-git-send-email-Anna.Schumaker@Netapp.com>
 <20150205141325.GC4522@infradead.org>
 <54D394EC.9030902@Netapp.com>
 <20150205162326.GA18977@infradead.org>
 <54D39DC2.9060808@Netapp.com>
 <20150206115456.GA28915@infradead.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20150206115456.GA28915@infradead.org>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Fri, Feb 06, 2015 at 03:54:56AM -0800, Christoph Hellwig wrote:
> On Thu, Feb 05, 2015 at 11:43:46AM -0500, Anna Schumaker wrote:
> > > The problem is that the typical case of all data won't use splice
> > > ever with your patches, as the 4.2 client will always send a READ_PLUS.
> > >
> > > So we'll have to find a way to use it where it helps.  While we might
> > > be able to add some hacks to only use splice for the first segment, I
> > > guess we just need to make the splice support generic enough in the
> > > long run.
> >
> > I should be able to use splice easily enough if I detect that we're
> > only returning a single DATA segment.
>
> You could also elect to never return more than one data segment as a
> start:
>
>	In all situations, the
>	server may choose to return fewer bytes than specified by the client.
>	The client needs to check for this condition and handle the
>	condition appropriately.

Yeah, I think that was more-or-less what Anna's first attempt did, and I
said "what if that means more round trips?"  The client can't anticipate
the short reads, so it can't make up for them with parallelism.

> But doing any of these for a call that's really just an optimization
> sounds odd.  I'd really like to see an evaluation of the READ_PLUS
> impact on various workloads before offering it.

Yes, unfortunately I don't see a way to make this just an obvious win.

(Is there any way we could make it so with a better protocol?  Maybe RDMA
could help get the alignment right in multiple-segment cases?  But then I
think there needs to be some sort of language about RDMA, or else we're
stuck with:

	https://tools.ietf.org/html/rfc5667#section-5

which I think forces us to return READ_PLUS data inline, another possible
READ_PLUS regression.)

Anyway, ignoring RDMA, I guess the interesting questions to me are:

- How much does READ_PLUS actually help in the cases it was meant to?
  Perhaps create and age a VM image somehow, and then time a read of the
  entire image?

- What are the worst-case regressions?  Read a large file with a lot of
  small holes and measure time and cpu usage?

--b.
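Incidentally, the single-DATA-segment check Anna mentions above could be
as simple as probing for the first hole before choosing the splice path.
A minimal sketch, assuming vfs_llseek() with SEEK_HOLE works on the file;
read_plus_can_splice() is a hypothetical helper name, not code from the
patch series:

#include <linux/fs.h>

/*
 * Hypothetical helper: true when the requested range contains no holes,
 * so the reply would encode as a single DATA segment and the existing
 * zero-copy splice read path can be reused.
 */
static bool read_plus_can_splice(struct file *file, loff_t offset, size_t len)
{
	loff_t hole = vfs_llseek(file, offset, SEEK_HOLE);

	if (hole < 0)
		return true;	/* fs can't report holes; treat as all data */

	/* Splice only if the first hole starts beyond the requested range. */
	return hole >= offset + (loff_t)len;
}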
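The round-trip cost of the never-more-than-one-segment approach shows up
in what any client has to do on a short read: reissue the call, and only
after the previous reply comes back.  An illustrative userspace loop,
with pread() standing in for the client's READ; over NFS each iteration
is another strictly sequential RPC:

#include <sys/types.h>
#include <unistd.h>

static ssize_t read_full(int fd, char *buf, size_t len, off_t off)
{
	size_t done = 0;

	while (done < len) {
		ssize_t n = pread(fd, buf + done, len - done, off + done);
		if (n < 0)
			return -1;	/* error */
		if (n == 0)
			break;		/* EOF */
		done += n;		/* short read: go around again */
	}
	return done;
}

The next offset depends on the previous reply, so these reads can't be
issued in parallel the way fixed-size reads to known offsets can.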
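For the worst-case measurement in the second question, one way to build
the "large file with a lot of small holes" is to write every other 4k
block and let the filesystem keep the gaps sparse.  A sketch, assuming a
filesystem with sparse-file support; the path and sizes are arbitrary:

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK	4096
#define NBLOCKS	(1 << 18)	/* 1GiB: alternating 4k data and 4k holes */

int main(void)
{
	char buf[BLOCK];
	int fd = open("/tmp/holey", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	off_t i;

	if (fd < 0)
		return 1;
	memset(buf, 0xaa, sizeof(buf));

	/* Write blocks 0, 2, 4, ...; odd-numbered blocks stay holes. */
	for (i = 0; i < NBLOCKS; i += 2)
		if (pwrite(fd, buf, BLOCK, i * BLOCK) != BLOCK)
			return 1;

	return close(fd);
}

Timing a full sequential read of that file over a v4.2 mount, against
servers with and without READ_PLUS, should show the per-segment encoding
and alignment cost.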