Hi,
This is v11 of the passthru patchset. There was little review for v10
so I'm hoping to get some more review this cycle.
v10 addressed the vast majority of Christoph's feedback from v9. There
were a couple of issues that were not addressed; they are discussed below.
I don't think cloning the ctrl_id or the subsysnqn is a good idea.
I sent an email trying to explain why here[1] but there was no response.
In short, I think cloning the ctrl_id will break multipathing over
fabrics and copying the subsysnqn only has the effect of breaking
loopback; the user can always copy the underlying subsysnqn if it
makes sense for their overall system.
I maintain that overriding the CMIC bit in the ctrl id is necessary to
allow multipath over fabrics even if the underlying device does
not support multipath.
I also think the black list for admin commands is appropriate, and I
added it based on Sagi's feedback[2]. There are plenty of commands that
may be dangerous like firmware update and format NVM commands, and NS
attach commands won't work out of the box because we don't copy the
ctrl_id. It seems like there are more commands to be careful of than ones
that are obviously acceptable. So, I think the prudent course
is blacklisting by default until someone has a use case and can show
that the command is safe and makes sense. For our present use cases,
the identify, log page and vendor specific commands are all that we
care about.
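As a rough illustration of the default-deny policy described above (this
is not the code from the series; passthru_admin_cmd_allowed() is a name
made up for this example), the filter could look something like the
following. The opcode values are the ones from the NVMe base
specification: Get Log Page is 0x02, Identify is 0x06, and 0xC0 through
0xFF are vendor specific.

```c
#include <stdbool.h>
#include <stdint.h>

/* Admin opcodes taken from the NVMe base specification. */
enum {
	ADMIN_GET_LOG_PAGE	= 0x02,
	ADMIN_IDENTIFY		= 0x06,
	ADMIN_VENDOR_START	= 0xC0,	/* 0xC0..0xFF are vendor specific */
};

/*
 * Hypothetical helper: reject every admin command unless it is one of
 * the few the cover letter says are needed today.
 */
static bool passthru_admin_cmd_allowed(uint8_t opcode)
{
	/* All vendor specific commands are passed through. */
	if (opcode >= ADMIN_VENDOR_START)
		return true;

	switch (opcode) {
	case ADMIN_GET_LOG_PAGE:
	case ADMIN_IDENTIFY:
		return true;
	default:
		/* Format NVM, firmware update, NS attach, and so on. */
		return false;
	}
}
```

With this shape, allowing another command later is a one-line change
once someone shows it is safe.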
A git branch is available here:
https://github.com/sbates130272/linux-p2pmem nvmet_passthru_v11
Thanks,
Logan
[1] https://lore.kernel.org/linux-block/[email protected]/
[2] https://lore.kernel.org/linux-block/[email protected]/
--
v11 Changes:
1. Rebased onto v5.6-rc2
2. Collected Max's Reviewed-By tag
v10 Changes:
1. Rebased onto v5.5-rc1
2. Disable all exports in core nvme if CONFIG_NVME_TARGET_PASSTHRU is
not set and put them near the end of the file with a big fat
comment (per Christoph)
3. Don't fake up the vs field: pass it through as is and bump
it to 1.2.1 if it is below that (per Christoph)
4. Rework how passthru requests are submitted into the core
with proper nvme_passthru_start/end handling (per Christoph)
5. Rework how commands are parsed with passthru hooks in
parse_admin_cmd() and nvmet_parse_io_cmd() (per Christoph)
6. Rework how commands are handled so they are only done in a work
item if absolutely necessary (per Christoph)
7. The data_len hack was dropped as a patchset was introduced to
remove data_len altogether (per Christoph)
8. The passthru accounting changes are now in v5.5-rc1
9. A large number of other minor cleanups that were pointed out by
Christoph
v9 Changes:
1. Rebased onto v5.4-rc2 (required adjusting nvme_identify_ns() usage)
2. Collected Sagi's Reviewed-By Tags
3. Squashed separate Kconfig patch into passthru patch (Per Sagi)
4. Set REQ_FUA for flush requests and remove special casing
on RQF_IO_STAT (Per Sagi)
v8 Changes:
1. Rebased onto v5.3-rc6
2. Collected Max's Reviewed-By tags
3. Converted admin command black-list to a white-list, but
allow all vendor specific commands. With this, we feel
it's safe to allow multiple connections from hosts.
(As per Sagi's feedback)
v7 Changes:
1. Rebased onto v5.3-rc2
2. Rework nvme_ctrl_get_by_path() to use filp_open() instead of
the cdev changes that were in v6. (Per Al)
3. Override the cmic bit to allow multipath and allow
multiple connections from the same hostnqn. (At the same
time I cleaned up the method of rejecting multiple connections.
See Patch 8)
4. Found a bug when used with the tcp transport (See Patch 10)
v6 Changes:
1. Rebased onto v5.3-rc1
2. Rework configfs interface to simply be a passthru directory
within the existing subsystem. The directory is similar to
and consistent with a namespace directory.
3. Have the configfs take a path instead of a bare controller name
4. Renamed the main passthru file to io-cmd-passthru.c for consistency
with the file and block-dev methods.
5. Cleaned up all the CONFIG_NVME_TARGET_PASSTHRU usage to remove
all the inline #ifdefs
6. Restructured nvmet_passthru_make_request() a bit for clearer code
7. Moved nvme_find_get_ns() call into nvmet_passthru_execute_cmd()
since calling it in nvmet_req_init() causes a lockdep warning
due to nvme_find_get_ns() being able to sleep.
8. Added a check in nvmet_passthru_execute_cmd() to ensure we don't
violate queue_max_segments or queue_max_hw_sectors and overrode
mdts to ensure hosts don't intentionally submit commands
that will exceed these limits.
9. Reworked the code which ensures there's only one subsystem per
passthru controller to use an xarray instead of a list as this is
simpler and more easily fixed some bugs triggered by disabling
subsystems that weren't enabled.
10. Removed the override of the target cntlid with the passthru cntlid;
this seemed like a really bad idea especially in the presence of
mixed systems as you could end up with two ctrlrs with the same
cntlid. For now, commands that depend on cntlid are black listed.
11. Implement block accounting for passthru so the target can track
usage using /proc/diskstats
12. A number of other minor bug fixes and cleanups
v5 Changes (not sent to list, from Chaitanya):
1. Added workqueue for admin commands.
2. Added kconfig option for the pass-thru feature.
3. Restructure the parsing code according to your suggestion,
call nvmet_xxx_parse_cmd() from nvmet_passthru_parse_cmd().
4. Use pass-thru instead of pt.
5. Several cleanups and add comments at the appropriate locations.
6. Minimize the code for checking pass-thru ns across all the subsystems.
7. Removed the delays in the ns related admin commands since I was
not able to reproduce the previous bug.
v4 Changes:
1. Add request polling interface to the block layer.
2. Use request polling interface in the NVMEoF target passthru code
path.
3. Add checks suggested by Sagi for creating one target ctrl per
passthru ctrl.
4. Don't enable the namespace if it belongs to the configured passthru
ctrl.
5. Adjust the code for the latest kernel.
v3 Changes:
1. Split the addition of passthru command handlers and integration
into two different patches since we add guards to create one target
controller per passthru controller. This way it will be easier to
review the code.
2. Adjust the code for 4.18.
v2 Changes:
1. Update the new nvme core controller find API naming and
changed the string comparison of the ctrl.
2. Get rid of the newly added #defines for target ctrl values.
3. Use the newly added structure members in the same patch where
they are used. Aggregate the passthru command handling support
and integration with nvmet-core into one patch.
4. Introduce global NVMe Target subsystem list for connected and
not connected subsystems on the target side.
5. Add check when configuring the target ns and target
passthru ctrl to allow only one target controller to be created
for one passthru subsystem.
6. Use the passthru ctrl cntlid when creating the
target controller.
Chaitanya Kulkarni (1):
nvmet-passthru: Introduce NVMet passthru Kconfig option
Logan Gunthorpe (8):
nvme-core: Clear any SGL flags in passthru commands
nvme: Create helper function to obtain command effects
nvme: Move nvme_passthru_[start|end]() calls to common helper
nvme-core: Introduce nvme_ctrl_get_by_path()
nvme: Export existing nvme core functions
nvmet-passthru: Add passthru code to process commands
nvmet-passthru: Add enable/disable helpers
nvmet-configfs: Introduce passthru configfs interface
drivers/nvme/host/core.c | 229 ++++++++++-------
drivers/nvme/host/nvme.h | 14 +
drivers/nvme/target/Kconfig | 10 +
drivers/nvme/target/Makefile | 1 +
drivers/nvme/target/admin-cmd.c | 7 +-
drivers/nvme/target/configfs.c | 102 ++++++++
drivers/nvme/target/core.c | 13 +-
drivers/nvme/target/nvmet.h | 52 ++++
drivers/nvme/target/passthru.c | 435 ++++++++++++++++++++++++++++++++
include/linux/nvme.h | 1 +
10 files changed, 771 insertions(+), 93 deletions(-)
create mode 100644 drivers/nvme/target/passthru.c
base-commit: 11a48a5a18c63fd7621bb050228cebf13566e4d8
--
2.20.1
nvme_ctrl_get_by_path() is analogous to blkdev_get_by_path() except it
gets a struct nvme_ctrl from the path to its char dev (/dev/nvme0).
It makes use of filp_open() to open the file and uses the private
data to obtain a pointer to the struct nvme_ctrl. If the fops of the
file do not match, -EINVAL is returned.
The purpose of this function is to support NVMe-OF target passthru.
Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Max Gurtovoy <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
---
drivers/nvme/host/core.c | 31 +++++++++++++++++++++++++++++++
drivers/nvme/host/nvme.h | 9 +++++++++
2 files changed, 40 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 219e24304b4e..756f8ee201d9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4227,6 +4227,37 @@ void nvme_sync_queues(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_sync_queues);
+#ifdef CONFIG_NVME_TARGET_PASSTHRU
+/*
+ * The exports that follow within this ifdef are only for
+ * use by the nvmet-passthru and should not be used for
+ * other things.
+ */
+
+struct nvme_ctrl *nvme_ctrl_get_by_path(const char *path)
+{
+ struct nvme_ctrl *ctrl;
+ struct file *f;
+
+ f = filp_open(path, O_RDWR, 0);
+ if (IS_ERR(f))
+ return ERR_CAST(f);
+
+ if (f->f_op != &nvme_dev_fops) {
+ ctrl = ERR_PTR(-EINVAL);
+ goto out_close;
+ }
+
+ ctrl = f->private_data;
+ nvme_get_ctrl(ctrl);
+
+out_close:
+ filp_close(f, NULL);
+ return ctrl;
+}
+EXPORT_SYMBOL_GPL(nvme_ctrl_get_by_path);
+#endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
/*
* Check we didn't inadvertently grow the command structure sizes:
*/
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 1024fec7914c..196a0d38e19c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -687,4 +687,13 @@ void nvme_hwmon_init(struct nvme_ctrl *ctrl);
static inline void nvme_hwmon_init(struct nvme_ctrl *ctrl) { }
#endif
+#ifdef CONFIG_NVME_TARGET_PASSTHRU
+/*
+ * The exports that follow within this ifdef are only for
+ * use by the nvmet-passthru and should not be used for
+ * other things.
+ */
+struct nvme_ctrl *nvme_ctrl_get_by_path(const char *path);
+#endif /* CONFIG_NVME_TARGET_PASSTHRU */
+
#endif /* _NVME_H */
--
2.20.1
This patch adds helper functions which are used in the NVMeOF configfs
when the user is configuring the passthru subsystem. Here we ensure
that only one subsys is assigned to each nvme_ctrl by using an xarray
on the cntlid.
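The one-subsystem-per-controller rule boils down to an atomic "claim the
slot if it is empty" operation keyed by cntlid. A minimal userspace
sketch of that idea, assuming C11 atomics stand in for xa_cmpxchg() and
a small fixed table stands in for the xarray. The names claim_ctrl(),
release_ctrl() and NR_SLOTS are made up for this illustration and are
not part of the patch:

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_SLOTS 16	/* illustration only; the xarray has no such limit */

/* One owner pointer per cntlid; NULL means the controller is unclaimed. */
static _Atomic(void *) owners[NR_SLOTS];

/* Returns 0 if subsys now owns cntlid, -1 if it was already taken. */
static int claim_ctrl(unsigned int cntlid, void *subsys)
{
	void *expected = NULL;

	if (cntlid >= NR_SLOTS)
		return -1;

	/* Store subsys only if the slot still holds NULL. */
	if (atomic_compare_exchange_strong(&owners[cntlid], &expected, subsys))
		return 0;
	return -1;
}

/* Counterpart of xa_erase() in the disable path. */
static void release_ctrl(unsigned int cntlid)
{
	if (cntlid < NR_SLOTS)
		atomic_store(&owners[cntlid], NULL);
}
```

A second subsystem that tries to claim the same cntlid sees a non-NULL
old value and backs off, which is the case the enable helper handles by
dropping its controller reference and returning an error.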
The subsystem's version number is overridden by the passed through
controller's version. However, if that version is less than 1.2.1,
then we bump the advertised version to 1.2.1 and print a warning
in dmesg.
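For reference, the version field packs major, minor and tertiary into
one 32-bit value, so the bump is a plain integer comparison. A small
standalone sketch of that clamp; the macros mirror the kernel's
NVME_VS() layout, while advertised_version() is a name invented for
this example:

```c
#include <stdint.h>

/* Same layout as the kernel's NVME_VS(): major[31:16] minor[15:8] ter[7:0] */
#define NVME_VS(major, minor, tertiary) \
	(((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(tertiary))
#define NVME_MAJOR(ver)		((ver) >> 16)
#define NVME_MINOR(ver)		(((ver) >> 8) & 0xff)
#define NVME_TERTIARY(ver)	((ver) & 0xff)

/* Hypothetical helper: never advertise anything older than 1.2.1. */
static uint32_t advertised_version(uint32_t ctrl_vs)
{
	const uint32_t floor = NVME_VS(1, 2, 1);

	return ctrl_vs < floor ? floor : ctrl_vs;
}
```

A 1.1.0 controller would be advertised as 1.2.1, while a 1.3.0
controller keeps its own version.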
Based-on-a-patch-by: Chaitanya Kulkarni <[email protected]>
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/target/configfs.c | 3 ++
drivers/nvme/target/core.c | 10 +++-
drivers/nvme/target/nvmet.h | 12 +++++
drivers/nvme/target/passthru.c | 87 ++++++++++++++++++++++++++++++++++
4 files changed, 111 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c
index 98613a45bd3b..354d43fab8db 100644
--- a/drivers/nvme/target/configfs.c
+++ b/drivers/nvme/target/configfs.c
@@ -828,6 +828,9 @@ static ssize_t nvmet_subsys_attr_version_store(struct config_item *item,
int major, minor, tertiary = 0;
int ret;
+ /* passthru subsystems use the underlying controller's version */
+ if (nvmet_passthru_ctrl(subsys))
+ return -EINVAL;
ret = sscanf(page, "%d.%d.%d\n", &major, &minor, &tertiary);
if (ret != 2 && ret != 3)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 40998b514439..b282fb4e45ed 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -522,6 +522,12 @@ int nvmet_ns_enable(struct nvmet_ns *ns)
mutex_lock(&subsys->lock);
ret = 0;
+
+ if (nvmet_passthru_ctrl(subsys)) {
+ pr_info("cannot enable both passthru and regular namespaces for a single subsystem\n");
+ goto out_unlock;
+ }
+
if (ns->enabled)
goto out_unlock;
@@ -1418,7 +1424,7 @@ struct nvmet_subsys *nvmet_subsys_alloc(const char *subsysnqn,
if (!subsys)
return ERR_PTR(-ENOMEM);
- subsys->ver = NVME_VS(1, 3, 0); /* NVMe 1.3.0 */
+ subsys->ver = NVMET_DEFAULT_VS;
/* generate a random serial number as our controllers are ephemeral: */
get_random_bytes(&subsys->serial, sizeof(subsys->serial));
@@ -1459,6 +1465,8 @@ static void nvmet_subsys_free(struct kref *ref)
WARN_ON_ONCE(!list_empty(&subsys->namespaces));
+ nvmet_passthru_subsys_free(subsys);
+
kfree(subsys->subsysnqn);
kfree(subsys);
}
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index 6e632d0a88ca..7c247d803015 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -20,6 +20,8 @@
#include <linux/blkdev.h>
#include <linux/radix-tree.h>
+#define NVMET_DEFAULT_VS NVME_VS(1, 3, 0)
+
#define NVMET_ASYNC_EVENTS 4
#define NVMET_ERROR_LOG_SLOTS 128
#define NVMET_NO_ERROR_LOC ((u16)-1)
@@ -230,6 +232,7 @@ struct nvmet_subsys {
#ifdef CONFIG_NVME_TARGET_PASSTHRU
struct nvme_ctrl *passthru_ctrl;
+ char *passthru_ctrl_path;
#endif /* CONFIG_NVME_TARGET_PASSTHRU */
};
@@ -512,6 +515,9 @@ static inline u32 nvmet_dsm_len(struct nvmet_req *req)
}
#ifdef CONFIG_NVME_TARGET_PASSTHRU
+void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys);
+int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys);
+void nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys);
u16 nvmet_parse_passthru_admin_cmd(struct nvmet_req *req);
u16 nvmet_setup_passthru_command(struct nvmet_req *req);
static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys)
@@ -519,6 +525,12 @@ static inline struct nvme_ctrl *nvmet_passthru_ctrl(struct nvmet_subsys *subsys)
return subsys->passthru_ctrl;
}
#else /* CONFIG_NVME_TARGET_PASSTHRU */
+static inline void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys)
+{
+}
+static inline void nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
+{
+}
static inline u16 nvmet_parse_passthru_admin_cmd(struct nvmet_req *req)
{
return 0;
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c
index 01127bfa6c8a..0705f4ff8b16 100644
--- a/drivers/nvme/target/passthru.c
+++ b/drivers/nvme/target/passthru.c
@@ -11,6 +11,11 @@
#include "../host/nvme.h"
#include "nvmet.h"
+/*
+ * xarray to maintain one passthru subsystem per nvme controller.
+ */
+static DEFINE_XARRAY(passthru_subsystems);
+
static void nvmet_passthru_execute_cmd_work(struct work_struct *w)
{
struct nvmet_req *req = container_of(w, struct nvmet_req, p.work);
@@ -346,3 +351,85 @@ u16 nvmet_parse_passthru_admin_cmd(struct nvmet_req *req)
return NVME_SC_INVALID_OPCODE | NVME_SC_DNR;
}
}
+
+int nvmet_passthru_ctrl_enable(struct nvmet_subsys *subsys)
+{
+ struct nvme_ctrl *ctrl;
+ int ret = -EINVAL;
+ void *old;
+
+ mutex_lock(&subsys->lock);
+ if (!subsys->passthru_ctrl_path)
+ goto out_unlock;
+ if (subsys->passthru_ctrl)
+ goto out_unlock;
+
+ if (subsys->nr_namespaces) {
+ pr_info("cannot enable both passthru and regular namespaces for a single subsystem\n");
+ goto out_unlock;
+ }
+
+ ctrl = nvme_ctrl_get_by_path(subsys->passthru_ctrl_path);
+ if (IS_ERR(ctrl)) {
+ ret = PTR_ERR(ctrl);
+ pr_err("failed to open nvme controller %s\n",
+ subsys->passthru_ctrl_path);
+
+ goto out_unlock;
+ }
+
+ old = xa_cmpxchg(&passthru_subsystems, ctrl->cntlid, NULL,
+ subsys, GFP_KERNEL);
+ if (xa_is_err(old)) {
+ ret = xa_err(old);
+ goto out_put_ctrl;
+ }
+
+ if (old)
+ goto out_put_ctrl;
+
+ subsys->passthru_ctrl = ctrl;
+ subsys->ver = ctrl->vs;
+
+ if (subsys->ver < NVME_VS(1, 2, 1)) {
+ pr_warn("nvme controller version is too old: %d.%d.%d, advertising 1.2.1\n",
+ (int)NVME_MAJOR(subsys->ver),
+ (int)NVME_MINOR(subsys->ver),
+ (int)NVME_TERTIARY(subsys->ver));
+ subsys->ver = NVME_VS(1, 2, 1);
+ }
+
+ mutex_unlock(&subsys->lock);
+ return 0;
+
+out_put_ctrl:
+ nvme_put_ctrl(ctrl);
+out_unlock:
+ mutex_unlock(&subsys->lock);
+ return ret;
+}
+
+static void __nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
+{
+ if (subsys->passthru_ctrl) {
+ xa_erase(&passthru_subsystems, subsys->passthru_ctrl->cntlid);
+ nvme_put_ctrl(subsys->passthru_ctrl);
+ }
+ subsys->passthru_ctrl = NULL;
+ subsys->ver = NVMET_DEFAULT_VS;
+}
+
+void nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
+{
+ mutex_lock(&subsys->lock);
+ __nvmet_passthru_ctrl_disable(subsys);
+ mutex_unlock(&subsys->lock);
+}
+
+void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys)
+{
+ mutex_lock(&subsys->lock);
+ __nvmet_passthru_ctrl_disable(subsys);
+ kfree(subsys->passthru_ctrl_path);
+ mutex_unlock(&subsys->lock);
+}
--
2.20.1
Introduce a new nvme_execute_passthru_rq() helper which calls
nvme_passthru_[start|end]() around blk_execute_rq(). This ensures
all passthru calls (including nvme_submit_io()) will be wrapped
appropriately.
nvme_execute_passthru_rq() will also be useful for the nvmet passthru
code.
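The point of the helper is that there is a single entry point, so no
caller can forget the start/end bracketing. A toy userspace model of
that pattern follows; it is only a sketch. The toy_* names and the
struct are invented here, "frozen" stands in for the real
freeze/unfreeze and locking, and the two effect bits are assumed to
match the kernel's NVME_CMD_EFFECTS_LBCC and NVME_CMD_EFFECTS_CSE_MASK
definitions:

```c
#include <stdint.h>

#define FX_LBCC		(1u << 1)	/* logical block content change */
#define FX_CSE_MASK	(3u << 16)	/* submission/execution restrictions */

struct toy_ctrl {
	int frozen;	/* nonzero while I/O is quiesced */
	int executed;	/* commands run so far */
	int ran_frozen;	/* commands that ran while quiesced */
};

static uint32_t toy_passthru_start(struct toy_ctrl *c, uint32_t effects)
{
	if (effects & (FX_LBCC | FX_CSE_MASK))
		c->frozen = 1;
	return effects;
}

static void toy_passthru_end(struct toy_ctrl *c, uint32_t effects)
{
	if (effects & (FX_LBCC | FX_CSE_MASK))
		c->frozen = 0;
}

/* Stand-in for blk_execute_rq(): just records how the command ran. */
static void toy_execute(struct toy_ctrl *c)
{
	c->executed++;
	if (c->frozen)
		c->ran_frozen++;
}

/* Single entry point: every command is bracketed by start/end. */
static void toy_execute_passthru(struct toy_ctrl *c, uint32_t effects)
{
	effects = toy_passthru_start(c, effects);
	toy_execute(c);
	toy_passthru_end(c, effects);
}
```

Callers only ever see toy_execute_passthru(), which mirrors how both
the ioctl paths and the nvmet passthru code are meant to go through
nvme_execute_passthru_rq().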
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/host/core.c | 193 ++++++++++++++++++++-------------------
1 file changed, 100 insertions(+), 93 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 101137bdfece..219e24304b4e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -898,6 +898,105 @@ static void *nvme_add_user_metadata(struct bio *bio, void __user *ubuf,
return ERR_PTR(ret);
}
+static u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+ u8 opcode)
+{
+ u32 effects = 0;
+
+ if (ns) {
+ if (ctrl->effects)
+ effects = le32_to_cpu(ctrl->effects->iocs[opcode]);
+ if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
+ dev_warn(ctrl->device,
+ "IO command:%02x has unhandled effects:%08x\n",
+ opcode, effects);
+ return 0;
+ }
+
+ if (ctrl->effects)
+ effects = le32_to_cpu(ctrl->effects->acs[opcode]);
+
+ switch (opcode) {
+ case nvme_admin_format_nvm:
+ effects |= NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
+ NVME_CMD_EFFECTS_CSE_MASK;
+ break;
+ case nvme_admin_sanitize_nvm:
+ effects |= NVME_CMD_EFFECTS_CSE_MASK;
+ break;
+ default:
+ break;
+ }
+
+ return effects & ~NVME_CMD_EFFECTS_CSUPP;
+}
+
+static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+ u8 opcode)
+{
+ u32 effects = nvme_command_effects(ctrl, ns, opcode);
+
+ /*
+ * For simplicity, IO to all namespaces is quiesced even if the command
+ * effects say only one namespace is affected.
+ */
+ if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
+ mutex_lock(&ctrl->scan_lock);
+ mutex_lock(&ctrl->subsys->lock);
+ nvme_mpath_start_freeze(ctrl->subsys);
+ nvme_mpath_wait_freeze(ctrl->subsys);
+ nvme_start_freeze(ctrl);
+ nvme_wait_freeze(ctrl);
+ }
+ return effects;
+}
+
+static void nvme_update_formats(struct nvme_ctrl *ctrl)
+{
+ struct nvme_ns *ns;
+
+ down_read(&ctrl->namespaces_rwsem);
+ list_for_each_entry(ns, &ctrl->namespaces, list)
+ if (ns->disk && nvme_revalidate_disk(ns->disk))
+ nvme_set_queue_dying(ns);
+ up_read(&ctrl->namespaces_rwsem);
+}
+
+static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
+{
+ /*
+ * Revalidate LBA changes prior to unfreezing. This is necessary to
+ * prevent memory corruption if a logical block size was changed by
+ * this command.
+ */
+ if (effects & NVME_CMD_EFFECTS_LBCC)
+ nvme_update_formats(ctrl);
+ if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
+ nvme_unfreeze(ctrl);
+ nvme_mpath_unfreeze(ctrl->subsys);
+ mutex_unlock(&ctrl->subsys->lock);
+ nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
+ mutex_unlock(&ctrl->scan_lock);
+ }
+ if (effects & NVME_CMD_EFFECTS_CCC)
+ nvme_init_identify(ctrl);
+ if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
+ nvme_queue_scan(ctrl);
+}
+
+static void nvme_execute_passthru_rq(struct request *rq)
+{
+ struct nvme_command *cmd = nvme_req(rq)->cmd;
+ struct nvme_ctrl *ctrl = nvme_req(rq)->ctrl;
+ struct nvme_ns *ns = rq->q->queuedata;
+ struct gendisk *disk = ns ? ns->disk : NULL;
+ u32 effects;
+
+ effects = nvme_passthru_start(ctrl, ns, cmd->common.opcode);
+ blk_execute_rq(rq->q, disk, rq, 0);
+ nvme_passthru_end(ctrl, effects);
+}
+
static int nvme_submit_user_cmd(struct request_queue *q,
struct nvme_command *cmd, void __user *ubuffer,
unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
@@ -936,7 +1035,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
}
}
- blk_execute_rq(req->q, disk, req, 0);
+ nvme_execute_passthru_rq(req);
if (nvme_req(req)->flags & NVME_REQ_CANCELLED)
ret = -EINTR;
else
@@ -1300,99 +1399,12 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
metadata, meta_len, lower_32_bits(io.slba), NULL, 0);
}
-static u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
- u8 opcode)
-{
- u32 effects = 0;
-
- if (ns) {
- if (ctrl->effects)
- effects = le32_to_cpu(ctrl->effects->iocs[opcode]);
- if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
- dev_warn(ctrl->device,
- "IO command:%02x has unhandled effects:%08x\n",
- opcode, effects);
- return 0;
- }
-
- if (ctrl->effects)
- effects = le32_to_cpu(ctrl->effects->acs[opcode]);
-
- switch (opcode) {
- case nvme_admin_format_nvm:
- effects |= NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
- NVME_CMD_EFFECTS_CSE_MASK;
- break;
- case nvme_admin_sanitize_nvm:
- effects |= NVME_CMD_EFFECTS_CSE_MASK;
- break;
- default:
- break;
- }
-
- return effects;
-}
-
-static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
- u8 opcode)
-{
- u32 effects = nvme_command_effects(ctrl, ns, opcode);
-
- /*
- * For simplicity, IO to all namespaces is quiesced even if the command
- * effects say only one namespace is affected.
- */
- if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
- mutex_lock(&ctrl->scan_lock);
- mutex_lock(&ctrl->subsys->lock);
- nvme_mpath_start_freeze(ctrl->subsys);
- nvme_mpath_wait_freeze(ctrl->subsys);
- nvme_start_freeze(ctrl);
- nvme_wait_freeze(ctrl);
- }
- return effects;
-}
-
-static void nvme_update_formats(struct nvme_ctrl *ctrl)
-{
- struct nvme_ns *ns;
-
- down_read(&ctrl->namespaces_rwsem);
- list_for_each_entry(ns, &ctrl->namespaces, list)
- if (ns->disk && nvme_revalidate_disk(ns->disk))
- nvme_set_queue_dying(ns);
- up_read(&ctrl->namespaces_rwsem);
-}
-
-static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects)
-{
- /*
- * Revalidate LBA changes prior to unfreezing. This is necessary to
- * prevent memory corruption if a logical block size was changed by
- * this command.
- */
- if (effects & NVME_CMD_EFFECTS_LBCC)
- nvme_update_formats(ctrl);
- if (effects & (NVME_CMD_EFFECTS_LBCC | NVME_CMD_EFFECTS_CSE_MASK)) {
- nvme_unfreeze(ctrl);
- nvme_mpath_unfreeze(ctrl->subsys);
- mutex_unlock(&ctrl->subsys->lock);
- nvme_remove_invalid_namespaces(ctrl, NVME_NSID_ALL);
- mutex_unlock(&ctrl->scan_lock);
- }
- if (effects & NVME_CMD_EFFECTS_CCC)
- nvme_init_identify(ctrl);
- if (effects & (NVME_CMD_EFFECTS_NIC | NVME_CMD_EFFECTS_NCC))
- nvme_queue_scan(ctrl);
-}
-
static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
struct nvme_passthru_cmd __user *ucmd)
{
struct nvme_passthru_cmd cmd;
struct nvme_command c;
unsigned timeout = 0;
- u32 effects;
u64 result;
int status;
@@ -1419,12 +1431,10 @@ static int nvme_user_cmd(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
if (cmd.timeout_ms)
timeout = msecs_to_jiffies(cmd.timeout_ms);
- effects = nvme_passthru_start(ctrl, ns, cmd.opcode);
status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
(void __user *)(uintptr_t)cmd.addr, cmd.data_len,
(void __user *)(uintptr_t)cmd.metadata,
cmd.metadata_len, 0, &result, timeout);
- nvme_passthru_end(ctrl, effects);
if (status >= 0) {
if (put_user(result, &ucmd->result))
@@ -1440,7 +1450,6 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
struct nvme_passthru_cmd64 cmd;
struct nvme_command c;
unsigned timeout = 0;
- u32 effects;
int status;
if (!capable(CAP_SYS_ADMIN))
@@ -1466,12 +1475,10 @@ static int nvme_user_cmd64(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
if (cmd.timeout_ms)
timeout = msecs_to_jiffies(cmd.timeout_ms);
- effects = nvme_passthru_start(ctrl, ns, cmd.opcode);
status = nvme_submit_user_cmd(ns ? ns->queue : ctrl->admin_q, &c,
(void __user *)(uintptr_t)cmd.addr, cmd.data_len,
(void __user *)(uintptr_t)cmd.metadata, cmd.metadata_len,
0, &cmd.result, timeout);
- nvme_passthru_end(ctrl, effects);
if (status >= 0) {
if (put_user(cmd.result, &ucmd->result))
--
2.20.1
Separate the code to obtain command effects from the code
to start a passthru request and open code nvme_known_admin_effects()
in the new helper.
The new helper function will be necessary for nvmet passthru
code to determine if we need to change out of interrupt context
to handle the effects.
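To illustrate how a caller might use the split-out helper, here is a
simplified userspace sketch. It is not the patch code: the toy_* names
are invented, the effects log is modelled as a plain host-endian array
(the real log is little-endian and read with le32_to_cpu()), the bit
values are assumed to match the kernel's NVME_CMD_EFFECTS_* definitions,
and the "any effect other than CSUPP needs process context" rule is an
assumption about how the target would consume the result.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FX_CSUPP	(1u << 0)	/* command supported */
#define FX_LBCC		(1u << 1)	/* logical block content change */
#define FX_CSE_MASK	(3u << 16)	/* submission/execution restrictions */

#define ADMIN_FORMAT_NVM	0x80
#define ADMIN_SANITIZE_NVM	0x84

/*
 * Effects for an admin opcode: whatever the controller's log reports
 * (acs_log may be NULL if the log is unavailable), plus the effects
 * that are known for format and sanitize regardless of the log.
 */
static uint32_t toy_command_effects(const uint32_t *acs_log, uint8_t opcode)
{
	uint32_t effects = acs_log ? acs_log[opcode] : 0;

	switch (opcode) {
	case ADMIN_FORMAT_NVM:
		effects |= FX_CSUPP | FX_LBCC | FX_CSE_MASK;
		break;
	case ADMIN_SANITIZE_NVM:
		effects |= FX_CSE_MASK;
		break;
	default:
		break;
	}

	return effects;
}

/* Assumed policy: defer to a work item if there is anything to handle. */
static bool toy_needs_process_context(uint32_t effects)
{
	return (effects & ~FX_CSUPP) != 0;
}
```

The start/end handling can sleep (it takes mutexes and waits for
freezes), which is why a caller in interrupt context would want to
check the effects before deciding how to submit the command.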
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/nvme/host/core.c | 39 ++++++++++++++++++++++-----------------
1 file changed, 22 insertions(+), 17 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 330b05ef1b16..101137bdfece 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1300,22 +1300,8 @@ static int nvme_submit_io(struct nvme_ns *ns, struct nvme_user_io __user *uio)
metadata, meta_len, lower_32_bits(io.slba), NULL, 0);
}
-static u32 nvme_known_admin_effects(u8 opcode)
-{
- switch (opcode) {
- case nvme_admin_format_nvm:
- return NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
- NVME_CMD_EFFECTS_CSE_MASK;
- case nvme_admin_sanitize_nvm:
- return NVME_CMD_EFFECTS_CSE_MASK;
- default:
- break;
- }
- return 0;
-}
-
-static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
- u8 opcode)
+static u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+ u8 opcode)
{
u32 effects = 0;
@@ -1331,7 +1317,26 @@ static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
if (ctrl->effects)
effects = le32_to_cpu(ctrl->effects->acs[opcode]);
- effects |= nvme_known_admin_effects(opcode);
+
+ switch (opcode) {
+ case nvme_admin_format_nvm:
+ effects |= NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC |
+ NVME_CMD_EFFECTS_CSE_MASK;
+ break;
+ case nvme_admin_sanitize_nvm:
+ effects |= NVME_CMD_EFFECTS_CSE_MASK;
+ break;
+ default:
+ break;
+ }
+
+ return effects;
+}
+
+static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
+ u8 opcode)
+{
+ u32 effects = nvme_command_effects(ctrl, ns, opcode);
/*
* For simplicity, IO to all namespaces is quiesced even if the command
--
2.20.1
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
> + if (subsys->ver < NVME_VS(1, 2, 1)) {
> + pr_warn("nvme controller version is too old: %d.%d.%d, advertising 1.2.1\n",
> + (int)NVME_MAJOR(subsys->ver),
> + (int)NVME_MINOR(subsys->ver),
> + (int)NVME_TERTIARY(subsys->ver));
> + subsys->ver = NVME_VS(1, 2, 1);
Umm.. is this OK? do we implement the mandatory 1.2.1 features on behalf
of the passthru device?
> + }
> +
> + mutex_unlock(&subsys->lock);
> + return 0;
> +
> +out_put_ctrl:
> + nvme_put_ctrl(ctrl);
> +out_unlock:
> + mutex_unlock(&subsys->lock);
> + return ret;
> +}
> +
> +static void __nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
> +{
> + if (subsys->passthru_ctrl) {
> + xa_erase(&passthru_subsystems, subsys->passthru_ctrl->cntlid);
> + nvme_put_ctrl(subsys->passthru_ctrl);
> + }
> + subsys->passthru_ctrl = NULL;
> + subsys->ver = NVMET_DEFAULT_VS;
> +}
Isn't it strange that a subsystem changes its version in its lifetime?
> +
> +void nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
> +{
> + mutex_lock(&subsys->lock);
> + __nvmet_passthru_ctrl_disable(subsys);
> + mutex_unlock(&subsys->lock);
> +}
> +
> +void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys)
> +{
> + mutex_lock(&subsys->lock);
> + __nvmet_passthru_ctrl_disable(subsys);
> + kfree(subsys->passthru_ctrl_path);
> + mutex_unlock(&subsys->lock);
Nit, any reason why the free is in the mutex?
On 2020-02-26 4:33 p.m., Sagi Grimberg wrote:
>
>> + if (subsys->ver < NVME_VS(1, 2, 1)) {
>> + pr_warn("nvme controller version is too old: %d.%d.%d,
>> advertising 1.2.1\n",
>> + (int)NVME_MAJOR(subsys->ver),
>> + (int)NVME_MINOR(subsys->ver),
>> + (int)NVME_TERTIARY(subsys->ver));
>> + subsys->ver = NVME_VS(1, 2, 1);
>
> Umm.. is this OK? do we implement the mandatory 1.2.1 features on behalf
> of the passthru device?
This was the approach that Christoph suggested. It seemed sensible to
me. However, it would also *probably* be ok to just reject these
devices. Unless you feel strongly about this, I'll probably leave it the
way it is.
>> + }
>> +
>> + mutex_unlock(&subsys->lock);
>> + return 0;
>> +
>> +out_put_ctrl:
>> + nvme_put_ctrl(ctrl);
>> +out_unlock:
>> + mutex_unlock(&subsys->lock);
>> + return ret;
>> +}
>> +
>> +static void __nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
>> +{
>> + if (subsys->passthru_ctrl) {
>> + xa_erase(&passthru_subsystems, subsys->passthru_ctrl->cntlid);
>> + nvme_put_ctrl(subsys->passthru_ctrl);
>> + }
>> + subsys->passthru_ctrl = NULL;
>> + subsys->ver = NVMET_DEFAULT_VS;
>> +}
>
> Isn't it strange that a subsystem changes its version in its lifetime?
It does seem strange. However, it's not at all unprecedented. See
nvmet_subsys_attr_version_store() which gives the user direct control of
the version through configfs.
>> +
>> +void nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
>> +{
>> + mutex_lock(&subsys->lock);
>> + __nvmet_passthru_ctrl_disable(subsys);
>> + mutex_unlock(&subsys->lock);
>> +}
>> +
>> +void nvmet_passthru_subsys_free(struct nvmet_subsys *subsys)
>> +{
>> + mutex_lock(&subsys->lock);
>> + __nvmet_passthru_ctrl_disable(subsys);
>> + kfree(subsys->passthru_ctrl_path);
>> + mutex_unlock(&subsys->lock);
>
> Nit, any reason why the free is in the mutex?
Nope. Will fix.
>>> + if (subsys->ver < NVME_VS(1, 2, 1)) {
>>> + pr_warn("nvme controller version is too old: %d.%d.%d,
>>> advertising 1.2.1\n",
>>> + (int)NVME_MAJOR(subsys->ver),
>>> + (int)NVME_MINOR(subsys->ver),
>>> + (int)NVME_TERTIARY(subsys->ver));
>>> + subsys->ver = NVME_VS(1, 2, 1);
>>
>> Umm.. is this OK? do we implement the mandatory 1.2.1 features on behalf
>> of the passthru device?
>
> This was the approach that Christoph suggested. It seemed sensible to
> me. However, it would also *probably* be ok to just reject these
> devices. Unless you feel strongly about this, I'll probably leave it the
> way it is.
Sounds ok to me.
>>> + }
>>> +
>>> + mutex_unlock(&subsys->lock);
>>> + return 0;
>>> +
>>> +out_put_ctrl:
>>> + nvme_put_ctrl(ctrl);
>>> +out_unlock:
>>> + mutex_unlock(&subsys->lock);
>>> + return ret;
>>> +}
>>> +
>>> +static void __nvmet_passthru_ctrl_disable(struct nvmet_subsys *subsys)
>>> +{
>>> + if (subsys->passthru_ctrl) {
>>> + xa_erase(&passthru_subsystems, subsys->passthru_ctrl->cntlid);
>>> + nvme_put_ctrl(subsys->passthru_ctrl);
>>> + }
>>> + subsys->passthru_ctrl = NULL;
>>> + subsys->ver = NVMET_DEFAULT_VS;
>>> +}
>>
>> Isn't it strange that a subsystem changes its version in its lifetime?
>
> It does seem strange. However, it's not at all unprecedented. See
> nvmet_subsys_attr_version_store() which gives the user direct control of
> the version through configfs.
You're right.