Subject: Re: [PATCH] nvme-tcp: Check if request has started before processing it
From: Hannes Reinecke
To: Keith Busch
Cc: Daniel Wagner, Sagi Grimberg, Jens Axboe, Christoph Hellwig, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Date: Mon, 1 Mar 2021 17:53:25 +0100
In-Reply-To: <20210301160547.GB17228@redsun51.ssa.fujisawa.hgst.com>

On 3/1/21 5:05 PM, Keith Busch wrote:
> On Mon, Mar 01, 2021 at 02:55:30PM +0100, Hannes Reinecke wrote:
>> On 3/1/21 2:26 PM, Daniel Wagner wrote:
>>> On Sat, Feb 27, 2021 at 02:19:01AM +0900, Keith Busch wrote:
>>>> Crashing is bad, silent data corruption is worse. Is there truly no
>>>> defense against that? If not, why should anyone rely on this?
>>>
>>> If we receive a response for which we don't have a started request, we
>>> know that something is wrong. Couldn't we just reset the connection
>>> in this case? We don't have to pretend nothing has happened and
>>> continue normally. This would avoid a host crash and would not create
>>> (more) data corruption. Or am I just too naive?
>>>
>> This is actually a sensible solution.
>> Please send a patch for that.
>
> Is a bad frame a problem that can be resolved with a reset?
>
> Even if so, the reset doesn't indicate to the user if previous commands
> completed with bad data, so it still seems unreliable.
>
We need to distinguish two cases here.

The first is us receiving a frame with an invalid tag, leading to a crash.
This can easily be resolved by issuing a reset, as the command was clearly
garbage and we need to invoke error handling (which is a reset).

The other case is us receiving a frame with a _duplicate_ tag, i.e. a tag
which is _currently_ valid.
This case will fail _even now_, as we simply have no way of detecting it.

So what, again, do we lose by fixing the first case?
Apart from a system which does _not_ crash?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
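
The two cases distinguished in the message above can be illustrated with a small, self-contained sketch. This is not the nvme-tcp driver code: `struct queue`, `struct request`, `handle_response()`, `QUEUE_DEPTH` and the `rx_action` values are all hypothetical names chosen for illustration, and the "reset" is modelled only as a return value. It assumes a fixed-depth queue in which each tag indexes one request slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue depth; a real queue sizes this at setup time. */
#define QUEUE_DEPTH 4

/* What the receive path decides to do with an incoming response. */
enum rx_action {
	RX_COMPLETE,	/* tag matches a started request: complete it */
	RX_RESET,	/* tag is invalid: invoke error handling (reset) */
};

/* Hypothetical per-tag request state. */
struct request {
	bool started;
};

struct queue {
	struct request rqs[QUEUE_DEPTH];
};

/*
 * Case 1: the tag is out of range or names a request that was never
 * started. Instead of touching the request (and crashing), ask for a
 * reset.
 *
 * Case 2: the tag names a request that is currently started. The
 * response is accepted even if the controller sent it for the wrong
 * command, because nothing here can tell the difference.
 */
static enum rx_action handle_response(struct queue *q, unsigned int tag)
{
	assert(q != NULL);

	if (tag >= QUEUE_DEPTH || !q->rqs[tag].started)
		return RX_RESET;

	q->rqs[tag].started = false;	/* request is now completed */
	return RX_COMPLETE;
}
```

With tag 1 started, a first response for tag 1 completes the request; a second response for the same tag, or a response for tag 9, is answered with a reset. A response carrying a tag that happens to be in flight is always completed, which is the duplicate-tag case that remains undetectable.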