Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752821AbbGBMHv (ORCPT ); Thu, 2 Jul 2015 08:07:51 -0400 Received: from mail-qk0-f182.google.com ([209.85.220.182]:34431 "EHLO mail-qk0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753143AbbGBMHp (ORCPT ); Thu, 2 Jul 2015 08:07:45 -0400 Date: Thu, 2 Jul 2015 08:07:38 -0400 From: Jeff Layton To: Al Viro Cc: Andrey Ryabinin , Linus Torvalds , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [git pull] vfs part 2 Message-ID: <20150702080738.615e1c52@tlielax.poochiereds.net> In-Reply-To: <20150702080026.1c32f1c7@tlielax.poochiereds.net> References: <5538C66F.4050404@samsung.com> <20150621211213.GA18732@ZenIV.linux.org.uk> <5587F943.3040006@samsung.com> <20150701062752.GC17109@ZenIV.linux.org.uk> <55939BE3.6040902@samsung.com> <20150701082753.GD17109@ZenIV.linux.org.uk> <5593A7A0.6050400@samsung.com> <20150701085507.GE17109@ZenIV.linux.org.uk> <5593CE37.4070307@samsung.com> <20150701184408.GF17109@ZenIV.linux.org.uk> <20150702032042.GA32613@ZenIV.linux.org.uk> <20150702080026.1c32f1c7@tlielax.poochiereds.net> X-Mailer: Claws Mail 3.11.1 (GTK+ 2.24.28; x86_64-redhat-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3245 Lines: 72 On Thu, 2 Jul 2015 08:00:26 -0400 Jeff Layton wrote: > On Thu, 2 Jul 2015 04:20:42 +0100 > Al Viro wrote: > > > On Wed, Jul 01, 2015 at 07:44:08PM +0100, Al Viro wrote: > > > Mismatched reply could also be a possibility, but only if we end up with > > > sending more than one request with the same tag without waiting for response > > > for the first one. > > > > ... and I think I see what's going on. Tags are 16bit. Suppose the > > server stalls for some reason *and* we keep piling the requests up. > > New tags keep being grabbed by this: > > > > tag = P9_NOTAG; > > if (type != P9_TVERSION) { > > tag = p9_idpool_get(c->tagpool); > > if (tag < 0) > > return ERR_PTR(-ENOMEM); > > } > > tag is int here. Then we pass tag to > > req = p9_tag_alloc(c, tag, req_size); > > and that's what sets req->tc->tag. OK, but... The argument of p9_tag_alloc() > > in u16, so after 2^16 pending requests we'll wrap around. p9_idpool_get() > > will happily return values greater than 65535 - it's using idr and it's > > used (with different pools) for 16bit tags and 32bit FIDs. > > > > Now, p9_tag_alloc(c, 65539, max_size) will return the same req we'd got from > > p9_tag_alloc(c, 3, max_size). And we are fucked - as far as the server is > > concerned, we'd just sent another request with tag 3. And on the client > > there are two threads waiting for responses on the same p9_req_t. Both > > happen to be TWRITE. Response to the first request arrives and we happen > > to let the second thread go at it first. Voila - the first request had > > been for page-sized write() and got successfully handled. The _second_ one > > had been short and is very surprised to see confirmation of 4Kb worth of > > data having been written. > > > > It should be easy to confirm - in p9_client_prepare_req() add > > if (WARN_ON_ONCE(tag != (u16)tag)) { > > p9_idpool_put(tag, c->tagpool); > > return ERR_PTR(-ENOMEM); > > } > > right after > > tag = p9_idpool_get(c->tagpool); > > if (tag < 0) > > return ERR_PTR(-ENOMEM); > > > > and see if it triggers. I'm not sure if failing with ENOMEM is the > > right response (another variant is to sleep there until the pile > > gets cleaned or until we get killed), and WARN_ON_ONCE() is definitely > > not for the real work, but it will do for confirming that this is what > > we are hitting. > > ISTM that pd_idpool_get ought to be using idr_alloc_cyclic instead. > That should ensure that it's only allocating values from within the > given range. > Erm...and why is it passing in '0' to idr_alloc for the end value if it can't deal with more than 16 bits? That seems like a plain old bug... The other stuff you've noted should also be fixed of course, but the IDR usage here could use a little rework. -- Jeff Layton -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/