Date: Thu, 2 Jul 2015 21:44:15 +0100
From: Al Viro
To: Linus Torvalds
Cc: Jeff Layton, Andrey Ryabinin, Linux Kernel Mailing List, linux-fsdevel
Subject: Re: [git pull] vfs part 2
Message-ID: <20150702204415.GQ17109@ZenIV.linux.org.uk>

On Thu, Jul 02, 2015 at 12:16:14PM -0700, Linus Torvalds wrote:
> On Thu, Jul 2, 2015 at 11:40 AM, Al Viro wrote:
> >
> > All they are used for is matching response to request. Basically, you
> > can have up to 65535 pending requests. Reusing it right after getting
> > the response is fine.
>
> Reusing a tag right after getting the completion may be fine in
> theory, but it still sounds like a bad idea. Sure, it's used to match
> the command with the reply, but using those kinds of things for
> matching re-sends and to index into various "current data structures"
> is also very common (not having looked at p9 I don't know how much it
> does), and basically reusing tags "soon" tends to make those kinds of
> things fragile.

_All_ retransmits are done in the transport layer there. It's not NFS - it
really expects a reliable ordered connection for transport. No retransmits,
no duplicates, etc. I'm not dead against circular allocation, but I would
really like to figure out what's going on first.

I still wonder if we are seeing wraparound (should've posted a diff instead
of a verbal description - mea culpa). If we are not, it smells like a response
to a request having arrived while the tag had not been in use from the client
POV, _or_ buggered barriers of some kind. Maybe buggered ordering of
replies somewhere, but that's only if Tflush had been involved, as in
	-> Twrite tag = 3
	-> Tflush tag = 42 old_tag = 3
	<- Rwrite tag = 3
	<- Rflush tag = 42
	mark tag 3 free to be reused
	reuse tag 3
	... somehow get to seeing Rwrite only now

But I don't see where such an ordering violation could've happened at the
moment. The way it's supposed to work is that the sequence
	-> Twhatever tag = N
	-> Tflush old_tag = N
must either end up with no response to the former arriving at all, or
with it arriving before the response to the latter. Transport itself does
preserve ordering (TCP certainly would, but virtio queue also does, AFAICS)
and we really need to have p9_client_cb() called in order of arrival.

Hmm... This is a stab in the dark, but... we have vring_interrupt() calling
req_done(), which does
	while (1) {
		spin_lock_irqsave(&chan->lock, flags);
		rc = virtqueue_get_buf(chan->vq, &len);
		if (rc == NULL) {
			spin_unlock_irqrestore(&chan->lock, flags);
			break;
		}
		chan->ring_bufs_avail = 1;
		spin_unlock_irqrestore(&chan->lock, flags);
		/* Wakeup if anyone waiting for VirtIO ring space. */
		wake_up(chan->vc_wq);
		p9_debug(P9_DEBUG_TRANS, ": rc %p\n", rc);
		p9_debug(P9_DEBUG_TRANS, ": lookup tag %d\n", rc->tag);
		req = p9_tag_lookup(chan->client, rc->tag);
		p9_client_cb(chan->client, req, REQ_STATUS_RCVD);
	}

What's to prevent *another* vring_interrupt() (called from some kind of
IRQ handler) hitting on another CPU and competing with this one for the
queue? The lock is dropped between virtqueue_get_buf() and p9_client_cb(),
so two CPUs could pull buffers off the queue in order and still call
p9_client_cb() out of order. While we are at it, both p9_tag_lookup() and
p9_client_cb() should be fine with being called under spin_lock_irqsave,
so why not hold it outside of the loop?
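
For what it's worth, the Tflush ordering rule described above can be checked against a trace with a small userspace model. This is only a sketch under my own assumptions: struct event, the tag states and first_violation() are hypothetical names for illustration, not anything in net/9p, and the model only assumes that a flushed tag stays reserved until the Rflush is processed, as in the sequence above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event model for illustration; these are not kernel types. */
enum ev_kind { T_REQ, T_FLUSH, R_REPLY, R_FLUSH };

struct event {
	enum ev_kind kind;
	uint16_t tag;		/* tag carried by the message */
	uint16_t old_tag;	/* T_FLUSH only: tag of the request being flushed */
};

enum tag_state { TAG_FREE, TAG_PENDING, TAG_HELD };

#define NTAGS 65536

/*
 * Replay events in the order the client processes them.  Returns -1 if
 * every message refers to a tag in the expected state, otherwise the
 * index of the first event that does not.
 */
long first_violation(const struct event *ev, size_t n)
{
	static uint8_t state[NTAGS];	/* enum tag_state per tag */
	static uint8_t is_flush[NTAGS];	/* pending request is a Tflush */
	static uint8_t flushed[NTAGS];	/* a Tflush is outstanding against tag */
	static uint16_t target[NTAGS];	/* old_tag of a pending Tflush */

	assert(ev != NULL || n == 0);
	memset(state, 0, sizeof(state));
	memset(is_flush, 0, sizeof(is_flush));
	memset(flushed, 0, sizeof(flushed));

	for (size_t i = 0; i < n; i++) {
		uint16_t t = ev[i].tag;

		switch (ev[i].kind) {
		case T_REQ:
		case T_FLUSH:
			if (state[t] != TAG_FREE)
				return (long)i;	/* tag reused while in use */
			state[t] = TAG_PENDING;
			is_flush[t] = (ev[i].kind == T_FLUSH);
			if (is_flush[t]) {
				target[t] = ev[i].old_tag;
				flushed[ev[i].old_tag] = 1;
			}
			break;
		case R_REPLY:
			if (state[t] != TAG_PENDING || is_flush[t])
				return (long)i;	/* reply to a tag not in flight */
			/* Answered, but the tag stays reserved until Rflush. */
			state[t] = flushed[t] ? TAG_HELD : TAG_FREE;
			break;
		case R_FLUSH:
			if (state[t] != TAG_PENDING || !is_flush[t])
				return (long)i;
			state[t] = TAG_FREE;
			is_flush[t] = 0;
			/* Flush done: the flushed tag may now be reused. */
			flushed[target[t]] = 0;
			state[target[t]] = TAG_FREE;
			break;
		}
	}
	return -1;
}
```

Feeding it Twrite 3, Tflush 42/3, Rflush 42, Rwrite 3 flags the late Rwrite. If tag 3 is reused before the stale Rwrite is processed, the model (like the client) matches it to the new request without complaint, and the problem only shows up when the real reply arrives. That is the reason immediate tag reuse makes this kind of reordering hard to spot.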