From: "J. Bruce Fields" Subject: Re: 2.6.30-rc deadline scheduler performance regression for iozone over NFS Date: Thu, 14 May 2009 13:55:00 -0400 Message-ID: <20090514175500.GB5675@fieldses.org> References: <20090508120119.8c93cfd7.akpm@linux-foundation.org> <20090511081415.GL4694@kernel.dk> <20090511165826.GG4694@kernel.dk> <20090512204433.7eb69075.akpm@linux-foundation.org> <1242258338.5407.244.camel@heimdal.trondhjem.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Jeff Moyer , netdev@vger.kernel.org, Andrew Morton , Jens Axboe , linux-kernel@vger.kernel.org, "Rafael J. Wysocki" , Olga Kornievskaia , Jim Rees , linux-nfs@vger.kernel.org To: Trond Myklebust Return-path: In-Reply-To: <1242258338.5407.244.camel@heimdal.trondhjem.org> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, May 13, 2009 at 07:45:38PM -0400, Trond Myklebust wrote: > On Wed, 2009-05-13 at 15:29 -0400, Jeff Moyer wrote: > > Hi, netdev folks. The summary here is: > > > > A patch added in the 2.6.30 development cycle caused a performance > > regression in my NFS iozone testing. The patch in question is the > > following: > > > > commit 47a14ef1af48c696b214ac168f056ddc79793d0e > > Author: Olga Kornievskaia > > Date: Tue Oct 21 14:13:47 2008 -0400 > > > > svcrpc: take advantage of tcp autotuning > > > > which is also quoted below. Using 8 nfsd threads, a single client doing > > 2GB of streaming read I/O goes from 107590 KB/s under 2.6.29 to 65558 > > KB/s under 2.6.30-rc4. I also see more run to run variation under > > 2.6.30-rc4 using the deadline I/O scheduler on the server. That > > variation disappears (as does the performance regression) when reverting > > the above commit. > > It looks to me as if we've got a bug in the svc_tcp_has_wspace() helper > function. I can see no reason why we should stop processing new incoming > RPC requests just because the send buffer happens to be 2/3 full. If we I agree, the calculation doesn't look right. But where do you get the 2/3 number from? ... > @@ -964,23 +973,14 @@ static int svc_tcp_has_wspace(struct svc_xprt *xprt) > struct svc_sock *svsk = container_of(xprt, struct svc_sock, sk_xprt); > struct svc_serv *serv = svsk->sk_xprt.xpt_server; > int required; > - int wspace; > - > - /* > - * Set the SOCK_NOSPACE flag before checking the available > - * sock space. > - */ > - set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > - required = atomic_read(&svsk->sk_xprt.xpt_reserved) + serv->sv_max_mesg; > - wspace = sk_stream_wspace(svsk->sk_sk); > - > - if (wspace < sk_stream_min_wspace(svsk->sk_sk)) > - return 0; > - if (required * 2 > wspace) > - return 0; > > - clear_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > + required = (atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg) * 2; > + if (sk_stream_wspace(svsk->sk_sk) < required) This calculation looks the same before and after--you've just moved the "*2" into the calcualtion of "required". Am I missing something? Maybe you meant to write: required = atomic_read(&xprt->xpt_reserved) + serv->sv_max_mesg * 2; without the parentheses? That looks closer, assuming the calculation is meant to be: atomic_read(..) == amount of buffer space we think we already need serv->sv_max_mesg * 2 == space for worst-case request and reply? --b. > + goto out_nospace; > return 1; > +out_nospace: > + set_bit(SOCK_NOSPACE, &svsk->sk_sock->flags); > + return 0; > }