Date: Tue, 2 Mar 2021 05:59:44 +0900
From: Keith Busch
To: Hannes Reinecke
Cc: Daniel Wagner, Sagi Grimberg, Jens Axboe, Christoph Hellwig,
	linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme-tcp: Check if request has started before processing it
Message-ID: <20210301205944.GE17228@redsun51.ssa.fujisawa.hgst.com>
References:
	<20210226123534.4oovbzk4wrnfjp64@beryllium.lan>
	<9e209b12-3771-cdca-2c9d-50451061bd2a@suse.de>
	<20210226161355.GG31593@redsun51.ssa.fujisawa.hgst.com>
	<20210226171901.GA3949@redsun51.ssa.fujisawa.hgst.com>
	<20210301132639.n3eowtvkms2n5mog@beryllium.lan>
	<786dcef5-148d-ff34-590c-804b331ac519@suse.de>
	<20210301160547.GB17228@redsun51.ssa.fujisawa.hgst.com>

On Mon, Mar 01, 2021 at 05:53:25PM +0100, Hannes Reinecke wrote:
> On 3/1/21 5:05 PM, Keith Busch wrote:
> > On Mon, Mar 01, 2021 at 02:55:30PM +0100, Hannes Reinecke wrote:
> > > On 3/1/21 2:26 PM, Daniel Wagner wrote:
> > > > On Sat, Feb 27, 2021 at 02:19:01AM +0900, Keith Busch wrote:
> > > > > Crashing is bad, silent data corruption is worse. Is there truly no
> > > > > defense against that? If not, why should anyone rely on this?
> > > >
> > > > If we receive a response for which we don't have a started request, we
> > > > know that something is wrong. Couldn't we just reset the connection
> > > > in this case? We don't have to pretend nothing has happened and
> > > > continue normally. This would avoid a host crash and would not create
> > > > (more) data corruption. Or am I just too naive?
> > > >
> > > This is actually a sensible solution.
> > > Please send a patch for that.
> >
> > Is a bad frame a problem that can be resolved with a reset?
> >
> > Even if so, the reset doesn't indicate to the user if previous commands
> > completed with bad data, so it still seems unreliable.
> >
> We need to distinguish two cases here.
> The one is us receiving a frame with an invalid tag, leading to a crash.
> This can be easily resolved by issuing a reset, as clearly the command was
> garbage and we need to invoke error handling (which is reset).
>
> The other case is us receiving a frame with a _duplicate_ tag, i.e. a tag
> which is _currently_ valid. This is a case which will fail _even now_, as we
> have simply no way of detecting this.
>
> So what again do we miss by fixing the first case?
> Apart from a system which does _not_ crash?

I'm just saying each case is a symptom of the same problem. The only
difference between observing one vs. the other is a race with the host's
dispatch. And since you're proposing this patch, it sounds like this
condition does happen on tcp, unlike other transports where we don't
observe it.

I just thought the implication that data corruption happens is
alarming.
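
To make the two cases concrete, here is a toy model in Python. It is not kernel code and not the proposed patch: ToyQueue, Outcome, dispatch() and handle_response() are all made-up names for illustration. It assumes a fixed tag space and a per-tag "started" flag, and it shows that the same bogus response is either caught or silently accepted depending only on whether the host has already dispatched that tag.

```python
from enum import Enum, auto


class Outcome(Enum):
    COMPLETED = auto()         # tag matched a started request
    RESET_CONNECTION = auto()  # tag out of range, or request not started


class ToyQueue:
    """Hypothetical stand-in for a transport queue with a fixed tag space."""

    def __init__(self, depth):
        # started[tag] is True while a command with that tag is in flight.
        self.started = [False] * depth
        # payload[tag] records what the host asked for (demo only).
        self.payload = [None] * depth

    def dispatch(self, tag, payload):
        self.started[tag] = True
        self.payload[tag] = payload

    def handle_response(self, tag):
        # Case 1: the tag does not name a started request. This is
        # detectable, so escalate to error handling instead of using it.
        if not (0 <= tag < len(self.started)) or not self.started[tag]:
            return Outcome.RESET_CONNECTION, None
        # Case 2 ends up here as well: a bogus response carrying a tag
        # that is currently valid looks exactly like a genuine one.
        self.started[tag] = False
        return Outcome.COMPLETED, self.payload[tag]


if __name__ == "__main__":
    q = ToyQueue(depth=4)
    # Bogus response for tag 2 arrives before tag 2 is dispatched.
    print(q.handle_response(2))
    # The same bogus response arriving just after the host dispatched tag 2.
    q.dispatch(2, "read B")
    print(q.handle_response(2))
```

In this sketch the started check only turns the first case into a reset. The second case passes every check the model has, which is the concern raised above.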