Subject: Re: [PATCH 2/9] IB: add a proper completion queue abstraction
To: Jason Gunthorpe, Caitlin Bestler
Cc: Bart Van Assche, Christoph Hellwig, "linux-rdma@vger.kernel.org", "sagig@dev.mellanox.co.il", "axboe@fb.com", "linux-scsi@vger.kernel.org", "linux-kernel@vger.kernel.org"
From: Tom Talpey
Message-ID: <5653B0AD.7090402@talpey.com>
Date: Mon, 23 Nov 2015 19:34:53 -0500
In-Reply-To: <20151124000011.GA9301@obsidianresearch.com>
References: <20151113220636.GA32133@obsidianresearch.com>
 <20151114071344.GE27738@lst.de>
 <20151123203712.GB5640@obsidianresearch.com>
 <56537F59.4080708@sandisk.com>
 <20151123212822.GE6062@obsidianresearch.com>
 <56538AFD.9080103@sandisk.com>
 <20151123221806.GA7152@obsidianresearch.com>
 <56539421.9050705@sandisk.com>
 <20151123230659.GA8287@obsidianresearch.com>
 <20151124000011.GA9301@obsidianresearch.com>

On 11/23/2015 7:00 PM, Jason Gunthorpe wrote:
> On Mon, Nov 23, 2015 at 03:30:42PM -0800, Caitlin Bestler wrote:
>> The receive completion can be safely assumed to indicate transmit
>> completion over a reliable connection unless your peer has gone
>> completely bonkers and is replying to a command that it did not
>> receive.
>
> Perhaps iWarp is different and does specify this ordering but IB does
> not.

iWARP is not different. The situation you (Jason) describe has
nothing to do with the transport. It has everything to do with,
as you point out, the lack of causality between send and receive
completions.

It is entirely possible for the reply to be received before the
send is fully processed. For example, the send might be issued
on one core, and that core scheduled away before the completion
for the send is ready. In the meantime, the request goes on
the wire, the target processes it and replies, and the reply
is processed. Boom, the send queue completion is still pending.

Been there, seen that. Bluescreened on it, mysteriously.

A really good way to see this is with software providers, btw.
Try it with soft{roce,iwarp}, under heavy load.

Tom.

>
> The issue with IB is how the ACK protocol is designed. There is not
> strong ordering between ACKs and data transfers. A HCA can send
> ACK,DATA and the network could drop the ACK. The receiver side does
> not know the ACK was lost and goes ahead to process DATA.
>
> Since only ACK advances the sendq and DATA advances the recvq it is
> trivial to get a case where the recvq is advanced with a reply while
> the sendq continues to wait for the ACK to be resent.
>
> Further IB allows ACK coalescing and has no rules for how an ACK is
> placed. It is entirely valid for a HCA to RECV,REPLY,ACK - for
> instance.
>
>> I actually had a bug in an early iWARP emulation where the simulated
>> peer, because it was simulated, responded
>> instantly. The result was a TCP segment that both acked the
>> transmission *and* contained the reply. The bug was
>> that the code processed the reception before the transmission ack,
>> causing the receive completion to be placed
>> on the completion queue before transmit completion.
>
> I don't know if iWARP has the same lax ordering as IB, but certainly,
> what you describe is legal for IB verbs to do, and our kernel ULPs
> have to cope with it.
>
> Jason
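
To make the race discussed above concrete, here is a minimal sketch of one
way a ULP could cope with it: track both events per request and release the
request only when the second one arrives, whichever order they come in.
Everything here (struct request, request_init(), request_event(), the REQ_*
bits) is a hypothetical name made up for illustration and is not part of any
verbs or kernel API. The sketch is also single-threaded; code where the two
completions can be reaped on different cores would need atomics or a lock
around the state update.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event bits; illustrative only, not from any verbs API. */
#define REQ_SEND_DONE  0x1u  /* send completion reaped from the send queue */
#define REQ_REPLY_RCVD 0x2u  /* reply reaped from the receive queue */
#define REQ_ALL_EVENTS (REQ_SEND_DONE | REQ_REPLY_RCVD)

/* Hypothetical per-request state kept by the ULP. */
struct request {
    unsigned int seen;  /* events observed so far */
    bool released;      /* true once the send resources may be reused */
};

static void request_init(struct request *req)
{
    req->seen = 0;
    req->released = false;
}

/*
 * Record one event for a request. Returns true only for the call that
 * observes the last outstanding event, so the caller releases the
 * request exactly once, regardless of completion order.
 */
static bool request_event(struct request *req, unsigned int event)
{
    assert(event == REQ_SEND_DONE || event == REQ_REPLY_RCVD);
    assert(!(req->seen & event));  /* each event is expected only once */

    req->seen |= event;
    if (req->seen != REQ_ALL_EVENTS)
        return false;              /* the other completion is still pending */

    req->released = true;          /* only now is it safe to free or reuse */
    return true;
}
```

In the ordering described above (reply processed while the send completion
is still pending), request_event(&req, REQ_REPLY_RCVD) returns false and
leaves the request alive; the later request_event(&req, REQ_SEND_DONE)
returns true and marks it released.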