From: Baegjae Sung
To: keith.busch@intel.com, axboe@fb.com, hch@lst.de, sagi@grimberg.me, baegjae@gmail.com
Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: [PATCH] nvme-multipath: implement active-active round-robin path selector
Date: Tue, 27 Mar 2018 13:38:51 +0900
Message-Id: <20180327043851.6640-1-baegjae@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Some storage environments (e.g., dual-port NVMe SSD) provide higher
performance when using multiple paths simultaneously. Choosing a
path from multiple paths in a round-robin fashion is a simple and
efficient way to meet these requirements.

We implement the active-active round-robin path selector that
chooses the path that is NVME_CTRL_LIVE and next to the previous
path. By maintaining the structure of the active-standby path
selector, we can easily switch between the active-standby path
selector and the active-active round-robin path selector.

Example usage:
  # cat /sys/block/nvme0n1/mpath_policy
  [active-standby] round-robin
  # echo round-robin > /sys/block/nvme0n1/mpath_policy
  # cat /sys/block/nvme0n1/mpath_policy
  active-standby [round-robin]

Below are the results from a physical dual-port NVMe SSD using fio.

(MB/s)                    active-standby   round-robin
Random Read (4k)                   1,672         2,640
Sequential Read (128k)             1,707         3,414
Random Write (4k)                  1,450         1,728
Sequential Write (128k)            1,481         2,708

A single thread was used for sequential workloads and 16 threads
were used for random workloads. The queue depth for each thread
was 64.

Signed-off-by: Baegjae Sung
---
 drivers/nvme/host/core.c      | 49 +++++++++++++++++++++++++++++++++++++++++++
 drivers/nvme/host/multipath.c | 45 ++++++++++++++++++++++++++++++++++++++-
 drivers/nvme/host/nvme.h      |  8 +++++++
 3 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7aeca5db7916..cc91e8b247d0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -68,6 +68,13 @@ static bool streams;
 module_param(streams, bool, 0644);
 MODULE_PARM_DESC(streams, "turn on support for Streams write directives");
 
+#ifdef CONFIG_NVME_MULTIPATH
+static const char *const mpath_policy_name[] = {
+	[NVME_MPATH_ACTIVE_STANDBY] = "active-standby",
+	[NVME_MPATH_ROUND_ROBIN] = "round-robin",
+};
+#endif
+
 /*
  * nvme_wq - hosts nvme related works that are not reset or delete
  * nvme_reset_wq - hosts nvme reset works
@@ -2603,12 +2610,51 @@ static ssize_t nsid_show(struct device *dev, struct device_attribute *attr,
 }
 static DEVICE_ATTR_RO(nsid);
 
+#ifdef CONFIG_NVME_MULTIPATH
+static ssize_t mpath_policy_show(struct device *dev,
+	struct device_attribute *attr,
+	char *buf)
+{
+	int i, len = 0;
+	struct nvme_ns_head *head = dev_to_ns_head(dev);
+
+	for (i = 0;i < ARRAY_SIZE(mpath_policy_name);i++) {
+		if (i == head->mpath_policy)
+			len += sprintf(buf + len, "[%s] ", mpath_policy_name[i]);
+		else
+			len += sprintf(buf + len, "%s ", mpath_policy_name[i]);
+	}
+	len += sprintf(buf + len, "\n");
+	return len;
+}
+static ssize_t mpath_policy_store(struct device *dev,
+	struct device_attribute *attr, const char *buf,
+	size_t count)
+{
+	int i;
+	struct nvme_ns_head *head = dev_to_ns_head(dev);
+
+	for (i = 0;i < ARRAY_SIZE(mpath_policy_name);i++) {
+		if (strncmp(buf, mpath_policy_name[i], count - 1) == 0) {
+			head->mpath_policy = i;
+			dev_info(dev, "change mpath policy to %s\n", mpath_policy_name[i]);
+		}
+	}
+	return count;
+}
+static DEVICE_ATTR(mpath_policy, S_IRUGO | S_IWUSR, mpath_policy_show, \
+	mpath_policy_store);
+#endif
+
 static struct attribute *nvme_ns_id_attrs[] = {
 	&dev_attr_wwid.attr,
 	&dev_attr_uuid.attr,
 	&dev_attr_nguid.attr,
 	&dev_attr_eui.attr,
 	&dev_attr_nsid.attr,
+#ifdef CONFIG_NVME_MULTIPATH
+	&dev_attr_mpath_policy.attr,
+#endif
 	NULL,
 };
 
@@ -2818,6 +2864,9 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
 	head->subsys = ctrl->subsys;
 	head->ns_id = nsid;
 	kref_init(&head->ref);
+#ifdef CONFIG_NVME_MULTIPATH
+	head->mpath_policy = NVME_MPATH_ACTIVE_STANDBY;
+#endif
 
 	nvme_report_ns_ids(ctrl, nsid, id, &head->ids);
 
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 060f69e03427..6b6a15ccb542 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -75,6 +75,42 @@ inline struct nvme_ns *nvme_find_path(struct nvme_ns_head *head)
 	return ns;
 }
 
+inline struct nvme_ns *nvme_find_path_rr(struct nvme_ns_head *head)
+{
+	struct nvme_ns *prev_ns = srcu_dereference(head->current_path, &head->srcu);
+	struct nvme_ns *ns, *cand_ns = NULL;
+	bool after_prev_ns = false;
+
+	/*
+	 * Active-active round-robin path selector
+	 * Choose the path that is NVME_CTRL_LIVE and next to the previous path
+	 */
+
+	/* Case 1. If there is no previous path, choose the first LIVE path */
+	if (!prev_ns) {
+		ns = __nvme_find_path(head);
+		return ns;
+	}
+
+	list_for_each_entry_rcu(ns, &head->list, siblings) {
+		/*
+		 * Case 2-1. Choose the first LIVE path from the next path of
+		 * previous path to end
+		 */
+		if (after_prev_ns && ns->ctrl->state == NVME_CTRL_LIVE) {
+			rcu_assign_pointer(head->current_path, ns);
+			return ns;
+		}
+		/* Case 2-2. Mark the first LIVE path from start to previous path */
+		if (!cand_ns && ns->ctrl->state == NVME_CTRL_LIVE)
+			cand_ns = ns;
+		if (ns == prev_ns)
+			after_prev_ns = true;
+	}
+	rcu_assign_pointer(head->current_path, cand_ns);
+	return cand_ns;
+}
+
 static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 		struct bio *bio)
 {
@@ -85,7 +121,14 @@ static blk_qc_t nvme_ns_head_make_request(struct request_queue *q,
 	int srcu_idx;
 
 	srcu_idx = srcu_read_lock(&head->srcu);
-	ns = nvme_find_path(head);
+	switch (head->mpath_policy) {
+	case NVME_MPATH_ROUND_ROBIN:
+		ns = nvme_find_path_rr(head);
+		break;
+	case NVME_MPATH_ACTIVE_STANDBY:
+	default:
+		ns = nvme_find_path(head);
+	}
 	if (likely(ns)) {
 		bio->bi_disk = ns->disk;
 		bio->bi_opf |= REQ_NVME_MPATH;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index d733b14ede9d..15e1163bbf2b 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -128,6 +128,13 @@ enum nvme_ctrl_state {
 	NVME_CTRL_DEAD,
 };
 
+#ifdef CONFIG_NVME_MULTIPATH
+enum nvme_mpath_policy {
+	NVME_MPATH_ACTIVE_STANDBY,
+	NVME_MPATH_ROUND_ROBIN, /* active-active round-robin */
+};
+#endif
+
 struct nvme_ctrl {
 	enum nvme_ctrl_state state;
 	bool identified;
@@ -250,6 +257,7 @@ struct nvme_ns_head {
 	struct bio_list		requeue_list;
 	spinlock_t		requeue_lock;
 	struct work_struct	requeue_work;
+	enum nvme_mpath_policy	mpath_policy;
 #endif
 	struct list_head	list;
 	struct srcu_struct	srcu;
-- 
2.16.2
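
For readers who want to trace the selection order without the kernel data structures, here is a minimal userspace sketch of the round-robin rule that nvme_find_path_rr() applies. It is not part of the patch and makes several assumptions: `struct path` and `rr_next_path` are hypothetical names, an array index stands in for the `head->current_path` pointer, and a boolean stands in for the `NVME_CTRL_LIVE` check. The sketch also folds the "no previous path" case into the wrap-around fallback rather than calling `__nvme_find_path`, and it ignores SRCU entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a path; not a kernel structure. */
struct path {
	bool live;	/* models ns->ctrl->state == NVME_CTRL_LIVE */
};

/*
 * Return the index of the path to use next.
 * prev is the index returned by the previous call, or -1 if none.
 * Returns -1 when no path is live.
 */
static int rr_next_path(const struct path *paths, int n, int prev)
{
	int cand = -1;		/* first live path seen from the start */
	bool after_prev = false;

	assert(n == 0 || paths != NULL);

	for (int i = 0; i < n; i++) {
		/* Case 2-1: first live path located after the previous one. */
		if (after_prev && paths[i].live)
			return i;
		/* Case 2-2: remember the first live path for the wrap-around. */
		if (cand < 0 && paths[i].live)
			cand = i;
		if (i == prev)
			after_prev = true;
	}
	/* Nothing live after prev (or no prev at all): wrap to the candidate. */
	return cand;
}
```

Under these assumptions, two live paths give the sequence 0, 1, 0, ... when each result is fed back in as prev. A single live path is returned on every call, and -1 signals that no live path exists.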