Return-Path:
Received: from zeniv.linux.org.uk ([195.92.253.2]:49682 "EHLO ZenIV.linux.org.uk"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753131AbbGBImL
	(ORCPT ); Thu, 2 Jul 2015 04:42:11 -0400
Date: Thu, 2 Jul 2015 09:42:08 +0100
From: Al Viro
To: Andrey Ryabinin
Cc: Linus Torvalds, LKML, linux-fsdevel, "Aneesh Kumar K.V",
	Eric Van Hensbergen, linux-nfs@vger.kernel.org
Subject: Re: running out of tags in 9P (was Re: [git pull] vfs part 2)
Message-ID: <20150702084208.GK17109@ZenIV.linux.org.uk>
References: <5593A7A0.6050400@samsung.com>
	<20150701085507.GE17109@ZenIV.linux.org.uk>
	<5593CE37.4070307@samsung.com>
	<20150701184408.GF17109@ZenIV.linux.org.uk>
	<20150702032042.GA32613@ZenIV.linux.org.uk>
	<20150702041046.GG17109@ZenIV.linux.org.uk>
	<20150702075932.GI17109@ZenIV.linux.org.uk>
	<20150702082529.GJ17109@ZenIV.linux.org.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <20150702082529.GJ17109@ZenIV.linux.org.uk>
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On Thu, Jul 02, 2015 at 09:25:30AM +0100, Al Viro wrote:
> On Thu, Jul 02, 2015 at 11:19:03AM +0300, Andrey Ryabinin wrote:
> > Besides qemu, I've also tried kvmtool with the same result. IOW I'm seeing
> > this under kvmtool as well. It just takes a bit longer to reproduce
> > this in kvmtool.
> >
> > > The bug I suspected to be the cause of that is in tag allocation in
> > > net/9p/client.c - we could end up wrapping around 2^16 with enough pending
> > > requests and that would have triggered that kind of mess. However, Andrey
> > > doesn't see that test (tag wraparound in p9_client_prepare_req()) trigger.
> > > BTW, was that on the run where debugging printk in p9_client_write() *did*
> > > trigger?
> >
> > Yes, WARN_ON_ONCE() in p9_client_prepare_req() didn't trigger,
> > but debug printk in p9_client_write() *did* trigger.
>
> Bloody wonderful...
> Could you check if v9fs_write() in qemu
> hw/9pfs/virtio-9p.c ever gets to
> 	offset = 7;
> 	err = pdu_marshal(pdu, offset, "d", total);
> with total > count on your testcase?

Another thing that might be worth checking: in p9_tag_alloc()
(net/9p/client.c) before
	req->status = REQ_STATUS_ALLOC;
check that req->status == REQ_STATUS_IDLE and yell if it isn't.

BTW, the loop in there (
	/* check again since original check was outside of lock */
	while (tag >= c->max_tag) {
) looks fishy.  If we get more than P9_ROW_MAXTAG allocations at once,
we'll have trouble, but I doubt that this is what we are hitting.  In any
case, adding WARN_ON(c->req[row]); right after
	row = (tag / P9_ROW_MAXTAG);
wouldn't hurt.  I would be very surprised if that one triggered, though.