Return-Path: linux-nfs-owner@vger.kernel.org Received: from fieldses.org ([174.143.236.118]:56708 "EHLO fieldses.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752040Ab3HACt1 (ORCPT ); Wed, 31 Jul 2013 22:49:27 -0400 Date: Wed, 31 Jul 2013 22:49:23 -0400 From: "J.Bruce Fields" To: NeilBrown Cc: Ben Myers , Olga Kornievskaia , NFS Subject: Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure. Message-ID: <20130801024923.GD2668@fieldses.org> References: <20130716015803.GA5271@fieldses.org> <20130716140021.312b5b07@notabene.brown> <20130716142430.GA11977@fieldses.org> <20130718000319.GL1681@sgi.com> <20130724210746.GB5777@fieldses.org> <20130725113023.7bcbc347@notabene.brown> <20130725201805.GB17962@fieldses.org> <20130726063303.0d1495b3@notabene.brown> <20130726141916.GA30651@fieldses.org> <20130730124857.7c066858@notabene.brown> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii In-Reply-To: <20130730124857.7c066858@notabene.brown> Sender: linux-nfs-owner@vger.kernel.org List-ID: On Tue, Jul 30, 2013 at 12:48:57PM +1000, NeilBrown wrote: > On Fri, 26 Jul 2013 10:19:16 -0400 "J.Bruce Fields" > wrote: > > > On Fri, Jul 26, 2013 at 06:33:03AM +1000, NeilBrown wrote: > > > On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" > > > wrote: > > > > > > > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote: > > > > > > > > > > Since we enabled auto-tuning for sunrpc TCP connections we do not > > > > > guarantee that there is enough write-space on each connection to > > > > > queue a reply. > > ... > > > > This is great, thanks! > > > > > > > > Inclined to queue it up for 3.11 and stable.... > > > > > > I'd agree for 3.11. > > > It feels a bit border-line for stable. "dead-lock" and "has been seen in the > > > wild" are technically enough justification... > > > I'd probably mark it as "pleas don't apply to -stable until 3.11 is released" > > > or something like that, just for a bit of breathing space. > > > Your call though. > > > > > > So my takeaway from http://lwn.net/Articles/559113/ was that Linus and > > Greg were requesting that: > > > > - criteria for -stable and late -rc's should really be about the > > same, and > > - people should follow Documentation/stable-kernel-rules.txt. > > > > So as an exercise to remind me what those rules are: > > > > Easy questions: > > > > - "no bigger than 100 lines, with context." Check. > > - "It must fix only one thing." Check. > > - "real bug that bothers people". Check. > > - "tested": yep. It doesn't actually say "tested on stable > > trees", and I recall this did land you with a tricky bug one > > time when a prerequisite was omitted from the backport. > > > > Judgement calls: > > > > - "obviously correct": it's short, but admittedly subtle, and > > performance regressions can take a while to get sorted out. > > - "It must fix a problem that causes a build error (but not for > > things marked CONFIG_BROKEN), an oops, a hang, data > > corruption, a real security issue, or some "oh, that's not > > good" issue. In short, something critical." We could argue > > that "server stops responding" is critical, though not to the > > same degree as a panic. > > - OR: alternatively: "Serious issues as reported by a user of a > > distribution kernel may also be considered if they fix a > > notable performance or interactivity issue." The only bz I've > > personally seen was the result of artificial testing of some > > kind, and it sounds like your case involved a disk failure? > > > > --b. > > Looks like good analysis ... except that it doesn't seem conclusive. Being > conclusive would make it really good. :-) Hah, yeah sorry. > The case that brought it to my attention doesn't require the fix. > A file system was mis-behaving (blocking when it should return EJUKEBOX) and > this resulted in nfsd behaviour different than my expectation. > I expected nfsd to keep accepting requests until all threads were blocks. > However only 4 requests were accepted (which is actually better behaviour, > but not what I expected). > So I looked into it and thought that what I found wasn't really right. Which > turned out to be the case, but not the way I thought... > > So my direct experience doesn't argue for the patch going to -stable at all. > If the only other reports are from artificial testing then I'd leave it out > of -stable. I don't feel -rc4 (that's next I think) is too late for it > though. OK, I think I agree. We can pass it along to stable later if more complaints surface.... --b.