Date: Wed, 28 Mar 2018 13:47:41 -0600
From: Keith Busch
To: Christoph Hellwig
Cc: Baegjae Sung, axboe@fb.com, sagi@grimberg.me, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] nvme-multipath: implement active-active round-robin path selector
Message-ID: <20180328194741.GJ13039@localhost.localdomain>
References: <20180327043851.6640-1-baegjae@gmail.com> <20180328080646.GB20373@lst.de>
In-Reply-To: <20180328080646.GB20373@lst.de>
User-Agent: Mutt/1.9.1 (2017-09-22)
X-Mailing-List: linux-kernel@vger.kernel.org
On Wed, Mar 28, 2018 at 10:06:46AM +0200, Christoph Hellwig wrote:
> For PCIe devices the right policy is not a round robin but to use
> the pcie device closer to the node. I did a prototype for that
> long ago and the concept can work. Can you look into that and
> also make that policy used automatically for PCIe devices?

Yeah, that is especially true if you have multiple storage-accessing
threads scheduled on different nodes. On the other hand, round-robin
may still benefit if both paths are connected to different root ports
on the same node (who would do that?!).

But I wasn't aware people use dual-ported PCIe NVMe connected to a
single host (a single path from two hosts seems more common). If that's
a thing, we should get some NUMA awareness.

I couldn't find your prototype, though. I had one stashed locally from
a while back and hope it resembles what you had in mind:

---
struct nvme_ns *nvme_find_path_numa(struct nvme_ns_head *head)
{
	int distance, nearest = INT_MAX;
	int node = cpu_to_node(smp_processor_id());
	struct nvme_ns *ns, *path = NULL;

	list_for_each_entry_rcu(ns, &head->list, siblings) {
		if (ns->ctrl->state != NVME_CTRL_LIVE)
			continue;
		/* A live path on the local node always wins. */
		if (ns->disk->node_id == node)
			return ns;
		/* Otherwise remember the nearest live path. */
		distance = node_distance(node, ns->disk->node_id);
		if (distance < nearest) {
			nearest = distance;
			path = ns;
		}
	}
	return path;
}
--