Subject: Re: [PATCH 0/3] Provide more fine grained control over multipathing
From: Sagi Grimberg
To: Mike Snitzer
Cc: Christoph Hellwig, Johannes Thumshirn, Keith Busch, Hannes Reinecke,
    Laurence Oberman, Ewan Milne, James Smart, Linux Kernel Mailinglist,
    Linux NVMe Mailinglist, "Martin K. Petersen", Martin George,
    John Meneghini
Date: Thu, 31 May 2018 11:37:20 +0300
Message-ID: (unknown)
In-Reply-To: <20180530220206.GA7037@redhat.com>
References: <20180525125322.15398-1-jthumshirn@suse.de>
 <20180525130535.GA24239@lst.de> <20180525135813.GB9591@redhat.com>
 <20180530220206.GA7037@redhat.com>
List-ID: linux-kernel@vger.kernel.org

> Wouldn't expect you guys to nurture this 'mpath_personality' knob. So
> when features like "dispersed namespaces" land, a negative check would
> need to be added in the code to prevent switching from "native".
>
> And once something like "dispersed namespaces" lands we'd then have to
> see about a more sophisticated switch that operates at a different
> granularity. Could also be that switching one subsystem that is part of
> "dispersed namespaces" would then cascade to all other associated
> subsystems? Not that dissimilar from the 3rd patch in this series that
> allows a 'device' switch to be done in terms of the subsystem.

Which I think is broken by allowing this personality to be changed on
the fly.

> Anyway, I don't know the end from the beginning on something you just
> told me about ;) But we're all in this together. And can take it as it
> comes.
I agree, but this will be exposed to user-space, and we will need to
live with it for a long, long time...

> I'm merely trying to bridge the gap from old dm-multipath while
> native NVMe multipath gets its legs.
>
> In time I really do have aspirations to contribute more to NVMe
> multipathing. I think Christoph's NVMe multipath implementation of a
> bio-based device on top of NVMe core's blk-mq device(s) is very clever
> and effective (blk_steal_bios() hack and all).

That's great.

>> Don't get me wrong, I do support your cause, and I think nvme should
>> try to help, I just think that subsystem granularity is not the
>> correct approach going forward.
>
> I understand there will be limits to this 'mpath_personality' knob's
> utility and it'll need to evolve over time. But the burden of making
> more advanced NVMe multipath features accessible outside of native NVMe
> isn't intended to be on any of the NVMe maintainers (other than maybe
> remembering to disallow the switch where it makes sense in the future).

I would expect that any "advanced multipath features" would be properly
brought up with the NVMe TWG as a ratified standard and find their way
into nvme, so I don't think this is a particularly valid argument.

>> As I said, I've been off the grid, can you remind me why a global
>> knob is not sufficient?
>
> Because once nvme_core.multipath=N is set, native NVMe multipath is
> then not accessible from the same host. The goal of this patchset is
> to give users choice. But not limit them to _only_ using dm-multipath
> if they just have some legacy needs.
>
> Tough to be convincing with hypotheticals, but I could imagine a very
> obvious use case for native NVMe multipathing being PCI-based embedded
> NVMe "fabrics" (especially if/when the numa-based path selector
> lands). But the same host with PCI NVMe could be connected to a FC
> network that has historically always been managed via dm-multipath..
> but say that FC-based infrastructure gets updated to use NVMe (to
> leverage a wider NVMe investment, whatever?) -- but maybe admins would
> still prefer to use dm-multipath for the NVMe over FC.

You are referring to an array exposing media via nvmf and scsi
simultaneously? I'm not sure that there is a clean definition of how
that is supposed to work (ANA/ALUA, reservations, etc..)

>> This might sound stupid to you, but can't users that desperately must
>> keep using dm-multipath (for its mature toolset or what-not) just
>> stack it on the multipath nvme device? (I might be completely off on
>> this so feel free to correct my ignorance.)
>
> We could certainly pursue adding multipath-tools support for native
> NVMe multipathing. Not opposed to it (even if just reporting topology
> and state). But given the extensive lengths to which NVMe multipath
> goes to hide devices, we'd need some way of piercing through the
> opaque nvme device that native NVMe multipath exposes. But that really
> is a tangent relative to this patchset. Since that kind of visibility
> would also benefit the nvme cli... otherwise how are users to even be
> able to trust but verify that native NVMe multipathing did what they
> expected it to?

Can you explain what is missing for multipath-tools to resolve the
topology? nvme list-subsys is doing just that, isn't it? It lists the
subsys-ctrl topology, but that is sort of the important information, as
the controllers are the real paths.
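As a rough sketch of the point above — that subsys-to-controller topology
is enough for a userspace tool to enumerate the real paths — here is what
consuming `nvme list-subsys` output could look like. The JSON shape below
is illustrative only (field names and structure are my assumption, not a
guarantee of the actual nvme-cli schema), and the NQN and addresses are
made up:

```python
import json

# Hypothetical sample in the rough shape of `nvme list-subsys -o json`
# output; the real nvme-cli field names may differ.
SAMPLE = """
{
  "Subsystems": [
    {
      "Name": "nvme-subsys0",
      "NQN": "nqn.2018-05.io.example:subsys0",
      "Paths": [
        { "Name": "nvme0", "Transport": "rdma", "State": "live" },
        { "Name": "nvme1", "Transport": "rdma", "State": "live" }
      ]
    }
  ]
}
"""

def subsys_to_controllers(doc: dict) -> dict:
    """Map each subsystem NQN to its controllers -- the real paths."""
    topo = {}
    for subsys in doc.get("Subsystems", []):
        topo[subsys["NQN"]] = [
            (p["Name"], p["Transport"], p["State"])
            for p in subsys.get("Paths", [])
        ]
    return topo

topo = subsys_to_controllers(json.loads(SAMPLE))
for nqn, ctrls in topo.items():
    print(nqn)
    for name, transport, state in ctrls:
        print(f"  {name} ({transport}, {state})")
```

If something like this is all multipath-tools needs (controller names,
transports, and path states per subsystem), then the visibility argument
reduces to making sure nvme-cli exposes it in a stable, parseable form.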