From: Baegjae Sung
Date: Fri, 30 Mar 2018 13:57:25 +0900
Subject: Re: [PATCH] nvme-multipath: implement active-active round-robin path selector
To: Keith Busch
Cc: Christoph Hellwig, axboe@fb.com, sagi@grimberg.me, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Eric Chang
In-Reply-To: <20180328194741.GJ13039@localhost.localdomain>
References: <20180327043851.6640-1-baegjae@gmail.com> <20180328080646.GB20373@lst.de> <20180328194741.GJ13039@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

2018-03-29 4:47 GMT+09:00 Keith Busch:
> On Wed, Mar 28, 2018 at 10:06:46AM +0200, Christoph Hellwig wrote:
>> For PCIe devices the right policy is not a round robin but to use
>> the pcie device closer to the node. I did a prototype for that
>> long ago and the concept can work. Can you look into that and
>> also make that policy used automatically for PCIe devices?
>
> Yeah, that is especially true if you've multiple storage accessing
> threads scheduled on different nodes. On the other hand, round-robin
> may still benefit if both paths are connected to different root ports
> on the same node (who would do that?!).
>
> But I wasn't aware people use dual-ported PCIe NVMe connected to a
> single host (single path from two hosts seems more common). If that's a
> thing, we should get some numa awareness. I couldn't find your prototype,
> though. I had one stashed locally from a while back and hope it resembles
> what you had in mind:

Our prototype uses a dual-ported PCIe NVMe SSD connected to a single
host. The host's HBA is connected to two switches, and both switches are
connected to the dual-port NVMe SSD. In this environment, active-active
round-robin path selection makes it possible to use the full performance
of the dual-port SSD. It also allows failover when a single switch
fails. You can see the prototype at the link below.
https://youtu.be/u_ou-AQsvOs?t=307 (presentation at OCP Summit 2018)

I agree that active-standby selection of the closer path is the right
policy if multiple nodes attempt to access the storage system through
multiple paths. However, I believe that NVMe multipath needs to provide
multiple policies for path selection. Some people may want to use
multiple paths simultaneously (active-active) if they use a small number
of nodes and want to utilize the full capability. If the paths have the
same capability, round-robin can be the right policy. If their
capabilities differ, a more adaptive method would be needed (e.g.,
checking path condition to balance I/O).

We are moving to NVMe over Fabrics for our next prototype, so I think we
will have a chance to discuss this policy issue in more detail. I will
continue to follow this issue.

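For illustration only, the round-robin policy described above can be
sketched in user-space C roughly as follows. This is not the code from
the patch: struct path, struct path_selector, and rr_select() are
hypothetical names, and the "live" flag is an assumed stand-in for a
controller-state check such as ctrl->state == NVME_CTRL_LIVE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's nvme structures. */
struct path {
    int id;
    bool live;      /* assumed to model a "controller is live" check */
};

struct path_selector {
    struct path *paths;
    size_t npaths;
    size_t next;    /* index at which the next search starts */
};

/*
 * Return the next live path in round-robin order, or NULL if no path
 * is live. Non-live paths are skipped, so I/O keeps flowing over the
 * remaining path when one of them goes down.
 */
static struct path *rr_select(struct path_selector *sel)
{
    assert(sel != NULL);

    for (size_t i = 0; i < sel->npaths; i++) {
        size_t idx = (sel->next + i) % sel->npaths;

        if (sel->paths[idx].live) {
            /* Start the following search just past the chosen path. */
            sel->next = (idx + 1) % sel->npaths;
            return &sel->paths[idx];
        }
    }
    return NULL;
}
```

With two live paths, successive calls alternate between them; if one is
marked not live, every call returns the other. A real implementation
would also have to deal with concurrency (e.g. RCU and per-head state),
which this sketch ignores.
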
> ---
> struct nvme_ns *nvme_find_path_numa(struct nvme_ns_head *head)
> {
> 	int distance, current = INT_MAX, node = cpu_to_node(smp_processor_id());
> 	struct nvme_ns *ns, *path = NULL;
>
> 	list_for_each_entry_rcu(ns, &head->list, siblings) {
> 		if (ns->ctrl->state != NVME_CTRL_LIVE)
> 			continue;
> 		if (ns->disk->node_id == node)
> 			return ns;
>
> 		distance = node_distance(node, ns->disk->node_id);
> 		if (distance < current) {
> 			current = distance;
> 			path = ns;
> 		}
> 	}
> 	return path;
> }
> --