Date: Thu, 19 Jul 2018 16:10:25 +0200
From: Johannes Thumshirn
To: Christoph Hellwig
Cc: Sagi Grimberg, Keith Busch, James Smart, Hannes Reinecke, Ewan Milne, Max Gurtovoy, Linux NVMe Mailinglist, Linux Kernel Mailinglist
Subject: Re: [PATCH 0/4] Rework NVMe abort handling
Message-ID: <20180719141025.yveza2svhvc2r4lw@linux-x5ow.site>
References: <20180719132838.15556-1-jthumshirn@suse.de> <20180719134203.GA15212@lst.de>
In-Reply-To: <20180719134203.GA15212@lst.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jul 19, 2018 at 03:42:03PM +0200, Christoph Hellwig wrote:
> Without even looking at the code yet: why? The nvme abort isn't
> very useful, and due to the lack of ordering between different
> queues almost harmful on fabrics. What problem do you try to
> solve?

The problem I'm trying to solve here is really just single commands
timing out because of, e.g., a bad switch in between that causes frame
loss somewhere. I know RDMA and FC are defined to be lossless, but
reality sometimes has a different view on this (I can't say much about
RDMA, but I've had some nice bugs in SCSI due to faulty switches
dropping odd frames).

Of course we can still use the big hammer if one command times out due
to a misbehaving switch, but we can also at least try to abort it. I
know aborts are defined as best effort, but as we're in the error path
anyway it doesn't hurt to at least try.

This would give us a chance to recover from such situations, provided,
of course, that the target actually does something when it receives an
abort.

In the FC case we can even send an ABTS and try to abort the command on
the FC side first, before doing it on NVMe. I'm not sure if we can do
it on RDMA or PCIe as well.

So the issue I'm trying to solve is simple: if one command times out
for whatever reason, there's no need to go the big transport reset
route without even trying to recover from it first.

Possibly we should also try doing a queue reset if aborting failed,
before doing the transport reset.

Byte,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850