2023-11-15 18:55:25

by Yuanyuan Zhong

Subject: [PATCH] nvme-core: remove head->effects to fix use-after-free

The head->effects field stores a pointer to the controller's Command Sets
Supported and Effects log. When the namespace head is shared by multiple
controllers and the controller that allocated the log is removed,
dereferencing head->effects causes a use-after-free:

[ 227.820066] ==================================================================
[ 227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]

[ 227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
[ 227.820094] nvme_command_effects+0x1f/0x90 [nvme_core]
[ 227.820107] nvme_passthru_start+0x19/0x80 [nvme_core]
[ 227.820121] nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
[ 227.820136] nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
[ 227.820149] nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
[ 227.820162] blkdev_ioctl+0x1c5/0x280
[ 227.820169] __x64_sys_ioctl+0x93/0xd0
[ 227.820174] do_syscall_64+0x45/0xf0
[ 227.820177] entry_SYSCALL_64_after_hwframe+0x6e/0x76

[ 227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k

[ 227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
[ 227.820233] __kmem_cache_alloc_node+0x3c9/0x410
[ 227.820236] kmalloc_trace+0x2a/0xa0
[ 227.820238] nvme_get_effects_log+0x68/0xd0 [nvme_core]
[ 227.820251] nvme_init_identify+0x5ef/0x640 [nvme_core]
[ 227.820263] nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
[ 227.820275] nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
[ 227.820281] nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
[ 227.820286] nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
[ 227.820292] vfs_write+0xd2/0x3d0
[ 227.820294] ksys_write+0x5d/0xd0
[ 227.820297] do_syscall_64+0x45/0xf0
[ 227.820298] entry_SYSCALL_64_after_hwframe+0x6e/0x76

[ 227.820302] freed by task 2521 on cpu 28 at 220.115945s:
[ 227.820329] nvme_free_ctrl+0xa6/0x260 [nvme_core]
[ 227.820342] device_release+0x37/0x90
[ 227.820345] kobject_release+0x57/0x120
[ 227.820347] nvme_sysfs_delete+0x39/0x50 [nvme_core]
[ 227.820360] kernfs_fop_write_iter+0x144/0x1e0
[ 227.820362] vfs_write+0x25f/0x3d0
[ 227.820364] ksys_write+0x5d/0xd0
[ 227.820366] do_syscall_64+0x45/0xf0
[ 227.820368] entry_SYSCALL_64_after_hwframe+0x6e/0x76

Fix this by removing head->effects. Use the Commands Supported and Effects log
in ctrl->cels directly.

Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
Signed-off-by: Yuanyuan Zhong <[email protected]>
Reviewed-by: Randy Jennings <[email protected]>
Reviewed-by: Hamilton Coutinho <[email protected]>
---
drivers/nvme/host/core.c | 17 ++++++++---------
drivers/nvme/host/nvme.h | 1 -
2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 88b54cdcbd68..14fdc2de3a75 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
u32 effects = 0;

if (ns) {
- effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
+ struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
+
+ effects = le32_to_cpu(cel->iocs[opcode]);
if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
dev_warn_once(ctrl->device,
"IO command:%02x has unusual effects:%08x\n",
@@ -2872,7 +2874,8 @@ static int nvme_get_effects_log(struct nvme_ctrl *ctrl, u8 csi,

xa_store(&ctrl->cels, csi, cel, GFP_KERNEL);
out:
- *log = cel;
+ if (log)
+ *log = cel;
return 0;
}

@@ -3388,13 +3391,6 @@ static struct nvme_ns_head *nvme_alloc_ns_head(struct nvme_ctrl *ctrl,
head->shared = info->is_shared;
kref_init(&head->ref);

- if (head->ids.csi) {
- ret = nvme_get_effects_log(ctrl, head->ids.csi, &head->effects);
- if (ret)
- goto out_cleanup_srcu;
- } else
- head->effects = ctrl->effects;
-
ret = nvme_mpath_alloc_disk(ctrl, head);
if (ret)
goto out_cleanup_srcu;
@@ -3779,6 +3775,9 @@ static void nvme_scan_ns(struct nvme_ctrl *ctrl, unsigned nsid)
if (ret || !info.is_ready)
return;

+ if (info.ids.csi && nvme_get_effects_log(ctrl, info.ids.csi, NULL))
+ return;
+
ns = nvme_find_get_ns(ctrl, nsid);
if (ns) {
nvme_validate_ns(ns, &info);
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 39a90b7cb125..38298c072ec3 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -445,7 +445,6 @@ struct nvme_ns_head {
struct kref ref;
bool shared;
int instance;
- struct nvme_effects_log *effects;

struct cdev cdev;
struct device cdev_device;
--
2.34.1


2023-11-15 19:03:24

by Jens Axboe

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On 11/15/23 11:54 AM, Yuanyuan Zhong wrote:
> The head->effects field stores a pointer to the controller's Command Sets
> Supported and Effects log. When the namespace head is shared by multiple
> controllers and the controller that allocated the log is removed,
> dereferencing head->effects causes a use-after-free:
>
> [ 227.820066] ==================================================================
> [ 227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]
>
> [ 227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
> [ 227.820094] nvme_command_effects+0x1f/0x90 [nvme_core]
> [ 227.820107] nvme_passthru_start+0x19/0x80 [nvme_core]
> [ 227.820121] nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
> [ 227.820136] nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
> [ 227.820149] nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
> [ 227.820162] blkdev_ioctl+0x1c5/0x280
> [ 227.820169] __x64_sys_ioctl+0x93/0xd0
> [ 227.820174] do_syscall_64+0x45/0xf0
> [ 227.820177] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> [ 227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k
>
> [ 227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
> [ 227.820233] __kmem_cache_alloc_node+0x3c9/0x410
> [ 227.820236] kmalloc_trace+0x2a/0xa0
> [ 227.820238] nvme_get_effects_log+0x68/0xd0 [nvme_core]
> [ 227.820251] nvme_init_identify+0x5ef/0x640 [nvme_core]
> [ 227.820263] nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
> [ 227.820275] nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
> [ 227.820281] nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
> [ 227.820286] nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
> [ 227.820292] vfs_write+0xd2/0x3d0
> [ 227.820294] ksys_write+0x5d/0xd0
> [ 227.820297] do_syscall_64+0x45/0xf0
> [ 227.820298] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> [ 227.820302] freed by task 2521 on cpu 28 at 220.115945s:
> [ 227.820329] nvme_free_ctrl+0xa6/0x260 [nvme_core]
> [ 227.820342] device_release+0x37/0x90
> [ 227.820345] kobject_release+0x57/0x120
> [ 227.820347] nvme_sysfs_delete+0x39/0x50 [nvme_core]
> [ 227.820360] kernfs_fop_write_iter+0x144/0x1e0
> [ 227.820362] vfs_write+0x25f/0x3d0
> [ 227.820364] ksys_write+0x5d/0xd0
> [ 227.820366] do_syscall_64+0x45/0xf0
> [ 227.820368] entry_SYSCALL_64_after_hwframe+0x6e/0x76
>
> Fix this by removing head->effects. Use the Commands Supported and Effects log
> in ctrl->cels directly.
>
> Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
> Signed-off-by: Yuanyuan Zhong <[email protected]>
> Reviewed-by: Randy Jennings <[email protected]>
> Reviewed-by: Hamilton Coutinho <[email protected]>
> ---
> drivers/nvme/host/core.c | 17 ++++++++---------
> drivers/nvme/host/nvme.h | 1 -
> 2 files changed, 8 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 88b54cdcbd68..14fdc2de3a75 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> u32 effects = 0;
>
> if (ns) {
> - effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
> + struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> +
> + effects = le32_to_cpu(cel->iocs[opcode]);
> if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
> dev_warn_once(ctrl->device,
> "IO command:%02x has unusual effects:%08x\n",

This is in the hot path for passthrough, can't we simply reload
->effects when we need?

--
Jens Axboe

2023-11-15 19:22:18

by Yuanyuan Zhong

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <[email protected]> wrote:
>
> On 11/15/23 11:54 AM, Yuanyuan Zhong wrote:
> > The head->effects field stores a pointer to the controller's Command Sets
> > Supported and Effects log. When the namespace head is shared by multiple
> > controllers and the controller that allocated the log is removed,
> > dereferencing head->effects causes a use-after-free:
> >
> > [ 227.820066] ==================================================================
> > [ 227.820069] BUG: KFENCE: use-after-free read in nvme_command_effects+0x1f/0x90 [nvme_core]
> >
> > [ 227.820091] Use-after-free read at 0x00000000c02dadcf (in kfence-#0):
> > [ 227.820094] nvme_command_effects+0x1f/0x90 [nvme_core]
> > [ 227.820107] nvme_passthru_start+0x19/0x80 [nvme_core]
> > [ 227.820121] nvme_submit_user_cmd+0xc7/0x170 [nvme_core]
> > [ 227.820136] nvme_user_cmd.constprop.16+0x152/0x1d0 [nvme_core]
> > [ 227.820149] nvme_ns_head_ioctl+0x92/0x140 [nvme_core]
> > [ 227.820162] blkdev_ioctl+0x1c5/0x280
> > [ 227.820169] __x64_sys_ioctl+0x93/0xd0
> > [ 227.820174] do_syscall_64+0x45/0xf0
> > [ 227.820177] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > [ 227.820182] kfence-#0: 0x000000000fac1d5d-0x00000000a28a73c3, size=4096, cache=kmalloc-4k
> >
> > [ 227.820185] allocated by task 2559 on cpu 3 at 188.703118s:
> > [ 227.820233] __kmem_cache_alloc_node+0x3c9/0x410
> > [ 227.820236] kmalloc_trace+0x2a/0xa0
> > [ 227.820238] nvme_get_effects_log+0x68/0xd0 [nvme_core]
> > [ 227.820251] nvme_init_identify+0x5ef/0x640 [nvme_core]
> > [ 227.820263] nvme_init_ctrl_finish+0x8d/0x250 [nvme_core]
> > [ 227.820275] nvme_tcp_setup_ctrl+0x1ce/0x710 [nvme_tcp]
> > [ 227.820281] nvme_tcp_create_ctrl+0x359/0x450 [nvme_tcp]
> > [ 227.820286] nvmf_dev_write+0x203/0x3b0 [nvme_fabrics]
> > [ 227.820292] vfs_write+0xd2/0x3d0
> > [ 227.820294] ksys_write+0x5d/0xd0
> > [ 227.820297] do_syscall_64+0x45/0xf0
> > [ 227.820298] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > [ 227.820302] freed by task 2521 on cpu 28 at 220.115945s:
> > [ 227.820329] nvme_free_ctrl+0xa6/0x260 [nvme_core]
> > [ 227.820342] device_release+0x37/0x90
> > [ 227.820345] kobject_release+0x57/0x120
> > [ 227.820347] nvme_sysfs_delete+0x39/0x50 [nvme_core]
> > [ 227.820360] kernfs_fop_write_iter+0x144/0x1e0
> > [ 227.820362] vfs_write+0x25f/0x3d0
> > [ 227.820364] ksys_write+0x5d/0xd0
> > [ 227.820366] do_syscall_64+0x45/0xf0
> > [ 227.820368] entry_SYSCALL_64_after_hwframe+0x6e/0x76
> >
> > Fix this by removing head->effects. Use the Commands Supported and Effects log
> > in ctrl->cels directly.
> >
> > Fixes: be93e87e7802 ("nvme: support for multiple Command Sets Supported and Effects log pages")
> > Signed-off-by: Yuanyuan Zhong <[email protected]>
> > Reviewed-by: Randy Jennings <[email protected]>
> > Reviewed-by: Hamilton Coutinho <[email protected]>
> > ---
> > drivers/nvme/host/core.c | 17 ++++++++---------
> > drivers/nvme/host/nvme.h | 1 -
> > 2 files changed, 8 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 88b54cdcbd68..14fdc2de3a75 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -1085,7 +1085,9 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode)
> > u32 effects = 0;
> >
> > if (ns) {
> > - effects = le32_to_cpu(ns->head->effects->iocs[opcode]);
> > + struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > +
> > + effects = le32_to_cpu(cel->iocs[opcode]);
> > if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))
> > dev_warn_once(ctrl->device,
> > "IO command:%02x has unusual effects:%08x\n",
>
> This is in the hot path for passthrough, can't we simply reload
> ->effects when we need?

Do you mean something like this? If not, can you please elaborate
"when we need"?
- struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
+ struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
+ ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
Will it be good to change ctrl->effects to ctrl->effects[3] for
already defined CSI?


>
> --
> Jens Axboe
>


--
Regards,
Yuanyuan Zhong
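
For illustration only, a rough sketch of the ctrl->effects[3] idea above:
cache one effects log pointer per defined CSI directly on the controller so
the passthrough path avoids the xarray lookup. The effects_by_csi field and
the helper below are assumptions, not part of the posted patch.

/*
 * Sketch: assumes struct nvme_ctrl grows a small per-CSI array, e.g.
 * "struct nvme_effects_log *effects_by_csi[3];", indexed by the defined
 * CSI values (NVM = 0, KV = 1, ZNS = 2).
 */
static struct nvme_effects_log *nvme_ctrl_effects(struct nvme_ctrl *ctrl,
						  u8 csi)
{
	if (csi >= ARRAY_SIZE(ctrl->effects_by_csi))
		return NULL;
	return ctrl->effects_by_csi[csi];
}

The hot path would then read ctrl->effects_by_csi[ns->head->ids.csi] instead
of calling xa_load(), at the cost of filling in the array when the logs are
first fetched.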

2023-11-15 20:43:39

by Keith Busch

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <[email protected]> wrote:
>
> Do you mean something like this? If not, can you please elaborate
> "when we need"?
> - struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> + struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> + ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> Will it be good to change ctrl->effects to ctrl->effects[3] for
> already defined CSI?

I suggest either re-assign the cached head->effects to one from a still
live controller when current path is removed, or move the saved effects
to the subsystem instead of the controller. All controllers in the
subsystem should be reporting the same effects log anyway, so
duplicating all that per-controller is kind of wasteful.
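
A minimal sketch of what the second option (caching at subsystem scope) could
look like, purely for illustration: the subsys->cels xarray below is an
assumed field, not something struct nvme_subsystem has today.

/*
 * Illustration only: look up the Commands Supported and Effects log for
 * a CSI from a subsystem-level cache instead of the per-controller one.
 */
static struct nvme_effects_log *
nvme_subsys_effects(struct nvme_subsystem *subsys, u8 csi)
{
	/* assumed xarray keyed by CSI, filled once per subsystem */
	return xa_load(&subsys->cels, csi);
}

nvme_command_effects() would then call nvme_subsys_effects(ctrl->subsys,
ns->head->ids.csi) and the per-controller copies could go away; the open
question is how the subsystem copy gets refreshed, which comes up later in
the thread.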

2023-11-15 22:44:48

by Yuanyuan Zhong

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <[email protected]> wrote:
>
> On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <[email protected]> wrote:
> >
> > Do you mean something like this? If not, can you please elaborate
> > "when we need"?
> > - struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > + struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > + ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > already defined CSI?
>
> I suggest either re-assign the cached head->effects to one from a still
> live controller when current path is removed, or move the saved effects
> to the subsystem instead of the controller. All controllers in the
> subsystem should be reporting the same effects log anyway, so
Is it specified in spec that all controllers in the subsystem
should be reporting the same effects log?
> duplicating all that per-controller is kind of wasteful.

2023-11-15 23:44:19

by kernel test robot

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

Hi Yuanyuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.7-rc1 next-20231115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Yuanyuan-Zhong/nvme-core-remove-head-effects-to-fix-use-after-free/20231116-025616
base: git://git.infradead.org/users/hch/configfs.git for-next
patch link: https://lore.kernel.org/r/20231115185439.2616073-1-yzhong%40purestorage.com
patch subject: [PATCH] nvme-core: remove head->effects to fix use-after-free
config: x86_64-buildonly-randconfig-001-20231116 (https://download.01.org/0day-ci/archive/20231116/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231116/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

drivers/nvme/host/zns.c: In function 'nvme_update_zone_info':
>> drivers/nvme/host/zns.c:50:48: error: 'struct nvme_ns_head' has no member named 'effects'
50 | struct nvme_effects_log *log = ns->head->effects;
| ^~


vim +50 drivers/nvme/host/zns.c

240e6ee272c07a2 Keith Busch 2020-06-29 47
d525c3c02322161 Christoph Hellwig 2020-08-20 48 int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
240e6ee272c07a2 Keith Busch 2020-06-29 49 {
240e6ee272c07a2 Keith Busch 2020-06-29 @50 struct nvme_effects_log *log = ns->head->effects;
d525c3c02322161 Christoph Hellwig 2020-08-20 51 struct request_queue *q = ns->queue;
240e6ee272c07a2 Keith Busch 2020-06-29 52 struct nvme_command c = { };
240e6ee272c07a2 Keith Busch 2020-06-29 53 struct nvme_id_ns_zns *id;
240e6ee272c07a2 Keith Busch 2020-06-29 54 int status;
240e6ee272c07a2 Keith Busch 2020-06-29 55
240e6ee272c07a2 Keith Busch 2020-06-29 56 /* Driver requires zone append support */
2f4c9ba23b887e7 Javier González 2020-12-01 57 if ((le32_to_cpu(log->iocs[nvme_cmd_zone_append]) &
240e6ee272c07a2 Keith Busch 2020-06-29 58 NVME_CMD_EFFECTS_CSUPP)) {
2f4c9ba23b887e7 Javier González 2020-12-01 59 if (test_and_clear_bit(NVME_NS_FORCE_RO, &ns->flags))
240e6ee272c07a2 Keith Busch 2020-06-29 60 dev_warn(ns->ctrl->device,
2f4c9ba23b887e7 Javier González 2020-12-01 61 "Zone Append supported for zoned namespace:%d. Remove read-only mode\n",
2f4c9ba23b887e7 Javier González 2020-12-01 62 ns->head->ns_id);
2f4c9ba23b887e7 Javier González 2020-12-01 63 } else {
2f4c9ba23b887e7 Javier González 2020-12-01 64 set_bit(NVME_NS_FORCE_RO, &ns->flags);
2f4c9ba23b887e7 Javier González 2020-12-01 65 dev_warn(ns->ctrl->device,
2f4c9ba23b887e7 Javier González 2020-12-01 66 "Zone Append not supported for zoned namespace:%d. Forcing to read-only mode\n",
240e6ee272c07a2 Keith Busch 2020-06-29 67 ns->head->ns_id);
240e6ee272c07a2 Keith Busch 2020-06-29 68 }
240e6ee272c07a2 Keith Busch 2020-06-29 69
240e6ee272c07a2 Keith Busch 2020-06-29 70 /* Lazily query controller append limit for the first zoned namespace */
240e6ee272c07a2 Keith Busch 2020-06-29 71 if (!ns->ctrl->max_zone_append) {
240e6ee272c07a2 Keith Busch 2020-06-29 72 status = nvme_set_max_append(ns->ctrl);
240e6ee272c07a2 Keith Busch 2020-06-29 73 if (status)
240e6ee272c07a2 Keith Busch 2020-06-29 74 return status;
240e6ee272c07a2 Keith Busch 2020-06-29 75 }
240e6ee272c07a2 Keith Busch 2020-06-29 76
240e6ee272c07a2 Keith Busch 2020-06-29 77 id = kzalloc(sizeof(*id), GFP_KERNEL);
240e6ee272c07a2 Keith Busch 2020-06-29 78 if (!id)
240e6ee272c07a2 Keith Busch 2020-06-29 79 return -ENOMEM;
240e6ee272c07a2 Keith Busch 2020-06-29 80
240e6ee272c07a2 Keith Busch 2020-06-29 81 c.identify.opcode = nvme_admin_identify;
240e6ee272c07a2 Keith Busch 2020-06-29 82 c.identify.nsid = cpu_to_le32(ns->head->ns_id);
240e6ee272c07a2 Keith Busch 2020-06-29 83 c.identify.cns = NVME_ID_CNS_CS_NS;
240e6ee272c07a2 Keith Busch 2020-06-29 84 c.identify.csi = NVME_CSI_ZNS;
240e6ee272c07a2 Keith Busch 2020-06-29 85
240e6ee272c07a2 Keith Busch 2020-06-29 86 status = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, id, sizeof(*id));
240e6ee272c07a2 Keith Busch 2020-06-29 87 if (status)
240e6ee272c07a2 Keith Busch 2020-06-29 88 goto free_data;
240e6ee272c07a2 Keith Busch 2020-06-29 89
240e6ee272c07a2 Keith Busch 2020-06-29 90 /*
240e6ee272c07a2 Keith Busch 2020-06-29 91 * We currently do not handle devices requiring any of the zoned
240e6ee272c07a2 Keith Busch 2020-06-29 92 * operation characteristics.
240e6ee272c07a2 Keith Busch 2020-06-29 93 */
240e6ee272c07a2 Keith Busch 2020-06-29 94 if (id->zoc) {
240e6ee272c07a2 Keith Busch 2020-06-29 95 dev_warn(ns->ctrl->device,
240e6ee272c07a2 Keith Busch 2020-06-29 96 "zone operations:%x not supported for namespace:%u\n",
240e6ee272c07a2 Keith Busch 2020-06-29 97 le16_to_cpu(id->zoc), ns->head->ns_id);
a9e0e6bc728ebcf Christoph Hellwig 2021-04-07 98 status = -ENODEV;
240e6ee272c07a2 Keith Busch 2020-06-29 99 goto free_data;
240e6ee272c07a2 Keith Busch 2020-06-29 100 }
240e6ee272c07a2 Keith Busch 2020-06-29 101
240e6ee272c07a2 Keith Busch 2020-06-29 102 ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
240e6ee272c07a2 Keith Busch 2020-06-29 103 if (!is_power_of_2(ns->zsze)) {
240e6ee272c07a2 Keith Busch 2020-06-29 104 dev_warn(ns->ctrl->device,
240e6ee272c07a2 Keith Busch 2020-06-29 105 "invalid zone size:%llu for namespace:%u\n",
240e6ee272c07a2 Keith Busch 2020-06-29 106 ns->zsze, ns->head->ns_id);
a9e0e6bc728ebcf Christoph Hellwig 2021-04-07 107 status = -ENODEV;
240e6ee272c07a2 Keith Busch 2020-06-29 108 goto free_data;
240e6ee272c07a2 Keith Busch 2020-06-29 109 }
240e6ee272c07a2 Keith Busch 2020-06-29 110
6b2bd274744e645 Christoph Hellwig 2022-07-06 111 disk_set_zoned(ns->disk, BLK_ZONED_HM);
240e6ee272c07a2 Keith Busch 2020-06-29 112 blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
982977df48179c8 Christoph Hellwig 2022-07-06 113 disk_set_max_open_zones(ns->disk, le32_to_cpu(id->mor) + 1);
982977df48179c8 Christoph Hellwig 2022-07-06 114 disk_set_max_active_zones(ns->disk, le32_to_cpu(id->mar) + 1);
240e6ee272c07a2 Keith Busch 2020-06-29 115 free_data:
240e6ee272c07a2 Keith Busch 2020-06-29 116 kfree(id);
240e6ee272c07a2 Keith Busch 2020-06-29 117 return status;
240e6ee272c07a2 Keith Busch 2020-06-29 118 }
240e6ee272c07a2 Keith Busch 2020-06-29 119

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-16 03:52:30

by Keith Busch

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 02:44:04PM -0800, Yuanyuan Zhong wrote:
> On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <[email protected]> wrote:
> >
> > On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <[email protected]> wrote:
> > >
> > > Do you mean something like this? If not, can you please elaborate
> > > "when we need"?
> > > - struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > > + struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > > + ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > > already defined CSI?
> >
> > I suggest either re-assign the cached head->effects to one from a still
> > live controller when current path is removed, or move the saved effects
> > to the subsystem instead of the controller. All controllers in the
> > subsystem should be reporting the same effects log anyway, so
> Is it specified in spec that all controllers in the subsystem
> should be reporting the same effects log?

Yes, in section 5.16.1.6, "Commands Supported and Effects":

This log page is used to describe the commands that the controller
supports and the effects of those commands on the state of the NVM
subsystem.

Oddly enough, Figure 202 says the scope of the log page is "Controller"
rather than "Subsystem". Sounds like ECN potential. You can memcmp the
effects log from each controller for a sanity check if you think some
subsystem controllers messed that up.
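
A tiny illustration of the memcmp sanity check suggested above; the helper
name is made up and nothing like it exists in the driver:

/*
 * Warn if a freshly fetched effects log differs from the copy already
 * cached from another controller in the same subsystem.
 */
static void nvme_check_effects(struct nvme_ctrl *ctrl,
			       const struct nvme_effects_log *cached,
			       const struct nvme_effects_log *fresh)
{
	if (memcmp(cached, fresh, sizeof(*fresh)))
		dev_warn(ctrl->device,
			 "effects log differs between controllers in the subsystem\n");
}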

2023-11-16 19:13:28

by Yuanyuan Zhong

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 7:52 PM Keith Busch <[email protected]> wrote:
>
> On Wed, Nov 15, 2023 at 02:44:04PM -0800, Yuanyuan Zhong wrote:
> > On Wed, Nov 15, 2023 at 11:55 AM Keith Busch <[email protected]> wrote:
> > >
> > > On Wed, Nov 15, 2023 at 11:21:53AM -0800, Yuanyuan Zhong wrote:
> > > > On Wed, Nov 15, 2023 at 11:02 AM Jens Axboe <[email protected]> wrote:
> > > >
> > > > Do you mean something like this? If not, can you please elaborate
> > > > "when we need"?
> > > > - struct nvme_effects_log *cel = xa_load(&ctrl->cels, ns->head->ids.csi);
> > > > + struct nvme_effects_log *cel = (ns->head->ids.csi == NVME_CSI_NVM) ?
> > > > + ctrl->effects : xa_load(&ctrl->cels, ns->head->ids.csi);
> > > > Will it be good to change ctrl->effects to ctrl->effects[3] for
> > > > already defined CSI?
> > >
> > > I suggest either re-assign the cached head->effects to one from a still
> > > live controller when current path is removed, or move the saved effects
> > > to the subsystem instead of the controller. All controllers in the
> > > subsystem should be reporting the same effects log anyway, so
> > Is it specified in spec that all controllers in the subsystem
> > should be reporting the same effects log?
>
> Yes, in section 5.16.1.6, "Commands Supported and Effects":
>
> This log page is used to describe the commands that the controller
> supports and the effects of those commands on the state of the NVM
> subsystem.
>
> Oddly enough, Figure 202 says the scope of the log page is "Controller"
> rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> effects log from each controller for a sanity check if you think some
> subsystem controllers messed that up.

Yeah, it says the scope is controller.
I think it's valid to start upgrading one controller in the subsystem to
report a different effects log, e.g. adding or revoking CSUPP for some
opcode. If the saved effects log is kept in the subsystem, how would it
be refreshed for the whole subsystem?

2023-11-17 13:29:05

by Christoph Hellwig

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Wed, Nov 15, 2023 at 10:52:01PM -0500, Keith Busch wrote:
>
> Yes, in section 5.16.1.6, "Commands Supported and Effects":
>
> This log page is used to describe the commands that the controller
> supports and the effects of those commands on the state of the NVM
> subsystem.
>
> Oddly enough, Figure 202 says the scope of the log page is "Controller"
> rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> effects log from each controller for a sanity check if you think some
> subsystem controllers messed that up.

If we really want to be 111% sure we could read the effects for all
controllers and do a logical OR of them, but I think the reason for the
per-controller scope is that for odd subsystems where different
controllers don't actually access the same namespaces these flags
could be different, i.e. one that only does KV, one that does ZNS,
one that does NVM and one that is just an administrative controller.
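
For illustration, OR-merging two effects logs might look like the sketch
below; the helper name is made up and this is not part of any posted patch.
OR on the __le32 members is safe because the operation is purely bitwise.

/*
 * Merge the effects reported by two controllers for the same CSI so the
 * most cautious combination of bits wins.
 */
static void nvme_merge_effects(struct nvme_effects_log *dst,
			       const struct nvme_effects_log *src)
{
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(dst->acs); i++)
		dst->acs[i] |= src->acs[i];
	for (i = 0; i < ARRAY_SIZE(dst->iocs); i++)
		dst->iocs[i] |= src->iocs[i];
}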

2023-11-17 16:38:38

by Keith Busch

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Fri, Nov 17, 2023 at 02:28:46PM +0100, Christoph Hellwig wrote:
> On Wed, Nov 15, 2023 at 10:52:01PM -0500, Keith Busch wrote:
> >
> > Yes, in section 5.16.1.6, "Commands Supported and Effects":
> >
> > This log page is used to describe the commands that the controller
> > supports and the effects of those commands on the state of the NVM
> > subsystem.
> >
> > Oddly enough, Figure 202 says the scope of the log page is "Controller"
> > rather than "Subsystem". Sounds like ECN potential. You can memcmp the
> > effects log from each controller for a sanity check if you think some
> > subsystem controllers messed that up.
>
> If we really want to be 111% sure we could read the effects for all
> controllers and do a logical OR of them, but I think the reason for the
> per-controller scope is that for odd subsystems where different
> controllers don't actually access the same namespaces these flags
> could be different, i.e. one that only does KV, one that does ZNS,
> one that does NVM and one that is just an administrative controller.

The effects log is per-CSI so different command sets won't create
conflicts.

Namespaces that are not shared don't really matter here because this
problem is unique to multipath.

It doesn't make sense for effects logs to be different per-controller
for the same shared namespace. The spec doesn't seem to explicitly
prevent that, but hints that all hosts should be seeing the same thing
no matter which controller they're connected to:

"
If the namespace is attached to multiple controllers, the host(s)
associated with those controllers should coordinate their commands to
meet the Command Submission and Execution requirements
"

That couldn't be a reliable suggestion if the hosts observe diverging
effects. For the controllers that a host does see, though, I agree we
should use the most cautious combination of the reported effects, i.e.
the logical OR of them.

2023-11-19 19:31:22

by kernel test robot

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

Hi Yuanyuan,

kernel test robot noticed the following build errors:

[auto build test ERROR on hch-configfs/for-next]
[also build test ERROR on linus/master v6.7-rc1 next-20231117]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Yuanyuan-Zhong/nvme-core-remove-head-effects-to-fix-use-after-free/20231116-025616
base: git://git.infradead.org/users/hch/configfs.git for-next
patch link: https://lore.kernel.org/r/20231115185439.2616073-1-yzhong%40purestorage.com
patch subject: [PATCH] nvme-core: remove head->effects to fix use-after-free
config: powerpc-randconfig-r133-20231119 (https://download.01.org/0day-ci/archive/20231120/[email protected]/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20231120/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All errors (new ones prefixed by >>):

>> drivers/nvme/host/zns.c:50:43: error: no member named 'effects' in 'struct nvme_ns_head'
50 | struct nvme_effects_log *log = ns->head->effects;
| ~~~~~~~~ ^
1 error generated.


vim +50 drivers/nvme/host/zns.c

240e6ee272c07a Keith Busch 2020-06-29 47
d525c3c0232216 Christoph Hellwig 2020-08-20 48 int nvme_update_zone_info(struct nvme_ns *ns, unsigned lbaf)
240e6ee272c07a Keith Busch 2020-06-29 49 {
240e6ee272c07a Keith Busch 2020-06-29 @50 struct nvme_effects_log *log = ns->head->effects;
d525c3c0232216 Christoph Hellwig 2020-08-20 51 struct request_queue *q = ns->queue;
240e6ee272c07a Keith Busch 2020-06-29 52 struct nvme_command c = { };
240e6ee272c07a Keith Busch 2020-06-29 53 struct nvme_id_ns_zns *id;
240e6ee272c07a Keith Busch 2020-06-29 54 int status;
240e6ee272c07a Keith Busch 2020-06-29 55
240e6ee272c07a Keith Busch 2020-06-29 56 /* Driver requires zone append support */
2f4c9ba23b887e Javier González 2020-12-01 57 if ((le32_to_cpu(log->iocs[nvme_cmd_zone_append]) &
240e6ee272c07a Keith Busch 2020-06-29 58 NVME_CMD_EFFECTS_CSUPP)) {
2f4c9ba23b887e Javier González 2020-12-01 59 if (test_and_clear_bit(NVME_NS_FORCE_RO, &ns->flags))
240e6ee272c07a Keith Busch 2020-06-29 60 dev_warn(ns->ctrl->device,
2f4c9ba23b887e Javier González 2020-12-01 61 "Zone Append supported for zoned namespace:%d. Remove read-only mode\n",
2f4c9ba23b887e Javier González 2020-12-01 62 ns->head->ns_id);
2f4c9ba23b887e Javier González 2020-12-01 63 } else {
2f4c9ba23b887e Javier González 2020-12-01 64 set_bit(NVME_NS_FORCE_RO, &ns->flags);
2f4c9ba23b887e Javier González 2020-12-01 65 dev_warn(ns->ctrl->device,
2f4c9ba23b887e Javier González 2020-12-01 66 "Zone Append not supported for zoned namespace:%d. Forcing to read-only mode\n",
240e6ee272c07a Keith Busch 2020-06-29 67 ns->head->ns_id);
240e6ee272c07a Keith Busch 2020-06-29 68 }
240e6ee272c07a Keith Busch 2020-06-29 69
240e6ee272c07a Keith Busch 2020-06-29 70 /* Lazily query controller append limit for the first zoned namespace */
240e6ee272c07a Keith Busch 2020-06-29 71 if (!ns->ctrl->max_zone_append) {
240e6ee272c07a Keith Busch 2020-06-29 72 status = nvme_set_max_append(ns->ctrl);
240e6ee272c07a Keith Busch 2020-06-29 73 if (status)
240e6ee272c07a Keith Busch 2020-06-29 74 return status;
240e6ee272c07a Keith Busch 2020-06-29 75 }
240e6ee272c07a Keith Busch 2020-06-29 76
240e6ee272c07a Keith Busch 2020-06-29 77 id = kzalloc(sizeof(*id), GFP_KERNEL);
240e6ee272c07a Keith Busch 2020-06-29 78 if (!id)
240e6ee272c07a Keith Busch 2020-06-29 79 return -ENOMEM;
240e6ee272c07a Keith Busch 2020-06-29 80
240e6ee272c07a Keith Busch 2020-06-29 81 c.identify.opcode = nvme_admin_identify;
240e6ee272c07a Keith Busch 2020-06-29 82 c.identify.nsid = cpu_to_le32(ns->head->ns_id);
240e6ee272c07a Keith Busch 2020-06-29 83 c.identify.cns = NVME_ID_CNS_CS_NS;
240e6ee272c07a Keith Busch 2020-06-29 84 c.identify.csi = NVME_CSI_ZNS;
240e6ee272c07a Keith Busch 2020-06-29 85
240e6ee272c07a Keith Busch 2020-06-29 86 status = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, id, sizeof(*id));
240e6ee272c07a Keith Busch 2020-06-29 87 if (status)
240e6ee272c07a Keith Busch 2020-06-29 88 goto free_data;
240e6ee272c07a Keith Busch 2020-06-29 89
240e6ee272c07a Keith Busch 2020-06-29 90 /*
240e6ee272c07a Keith Busch 2020-06-29 91 * We currently do not handle devices requiring any of the zoned
240e6ee272c07a Keith Busch 2020-06-29 92 * operation characteristics.
240e6ee272c07a Keith Busch 2020-06-29 93 */
240e6ee272c07a Keith Busch 2020-06-29 94 if (id->zoc) {
240e6ee272c07a Keith Busch 2020-06-29 95 dev_warn(ns->ctrl->device,
240e6ee272c07a Keith Busch 2020-06-29 96 "zone operations:%x not supported for namespace:%u\n",
240e6ee272c07a Keith Busch 2020-06-29 97 le16_to_cpu(id->zoc), ns->head->ns_id);
a9e0e6bc728ebc Christoph Hellwig 2021-04-07 98 status = -ENODEV;
240e6ee272c07a Keith Busch 2020-06-29 99 goto free_data;
240e6ee272c07a Keith Busch 2020-06-29 100 }
240e6ee272c07a Keith Busch 2020-06-29 101
240e6ee272c07a Keith Busch 2020-06-29 102 ns->zsze = nvme_lba_to_sect(ns, le64_to_cpu(id->lbafe[lbaf].zsze));
240e6ee272c07a Keith Busch 2020-06-29 103 if (!is_power_of_2(ns->zsze)) {
240e6ee272c07a Keith Busch 2020-06-29 104 dev_warn(ns->ctrl->device,
240e6ee272c07a Keith Busch 2020-06-29 105 "invalid zone size:%llu for namespace:%u\n",
240e6ee272c07a Keith Busch 2020-06-29 106 ns->zsze, ns->head->ns_id);
a9e0e6bc728ebc Christoph Hellwig 2021-04-07 107 status = -ENODEV;
240e6ee272c07a Keith Busch 2020-06-29 108 goto free_data;
240e6ee272c07a Keith Busch 2020-06-29 109 }
240e6ee272c07a Keith Busch 2020-06-29 110
6b2bd274744e64 Christoph Hellwig 2022-07-06 111 disk_set_zoned(ns->disk, BLK_ZONED_HM);
240e6ee272c07a Keith Busch 2020-06-29 112 blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
982977df48179c Christoph Hellwig 2022-07-06 113 disk_set_max_open_zones(ns->disk, le32_to_cpu(id->mor) + 1);
982977df48179c Christoph Hellwig 2022-07-06 114 disk_set_max_active_zones(ns->disk, le32_to_cpu(id->mar) + 1);
240e6ee272c07a Keith Busch 2020-06-29 115 free_data:
240e6ee272c07a Keith Busch 2020-06-29 116 kfree(id);
240e6ee272c07a Keith Busch 2020-06-29 117 return status;
240e6ee272c07a Keith Busch 2020-06-29 118 }
240e6ee272c07a Keith Busch 2020-06-29 119

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-11-20 08:24:07

by Christoph Hellwig

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

On Fri, Nov 17, 2023 at 09:38:19AM -0700, Keith Busch wrote:
> The effects log is per-CSI so different command sets won't create
> conflicts.

True. But that wasn't the point anyway. It is that different
controllers might expose very different namespaces with different
capabilities. Maybe a controller with HDD namespaces vs flash might
be a better example.

> Namespaces that are not shared don't really matter here because this
> problem is unique to multipath.

Indeed.

> It doesn't make sense for effects logs to be different per-controller
> for the same shared namespace. The spec doesn't seem to explicitly
> prevent that, but hints that all hosts should be seeing the same thing
> no matter which controller they're connected to:

Also agreed as already indicated in the past mail.

2023-11-20 10:19:12

by Sagi Grimberg

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free



On 11/20/23 10:23, Christoph Hellwig wrote:
> On Fri, Nov 17, 2023 at 09:38:19AM -0700, Keith Busch wrote:
>> The effects log is per-CSI so different command sets won't create
>> conflicts.
>
> True. But that wasn't the point anyway. It is that different
> controllers might expose very different namespaces with different
> capabilities. Maybe a controller with HDD namespaces vs flash might
> be a better example.
>
>> Namespaces that are not shared don't really matter here because this
>> problem is unique to multipath.
>
> Indeed.
>
>> It doesn't make sense for effects logs to be different per-controller
>> for the same shared namespace. The spec doesn't seem to explicitly
>> prevent that, but hints that all hosts should be seeing the same thing
>> no matter which controller they're connected to:
>
> Also agreed as already indicated in the past mail.

Having every ns get its own effects log cache is another 4k per nshead.
Even if we restrict it only to iocs, it's 1k per nshead.

Would it make sense to have nvme_free_cels fence passthru commands
with an rcu instead?
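
A very rough sketch of the fencing idea, not a tested fix: it assumes
nvme_command_effects() in the passthrough path would look the log up and
dereference it inside a single rcu_read_lock() section, so a grace period
after unpublishing each entry is enough to fence in-flight commands.

/*
 * Illustration only: unpublish each cached log, wait for readers, then
 * free it.  One grace period per entry is acceptable on this teardown
 * path.
 */
static void nvme_free_cels(struct nvme_ctrl *ctrl)
{
	struct nvme_effects_log *cel;
	unsigned long i;

	xa_for_each(&ctrl->cels, i, cel) {
		xa_erase(&ctrl->cels, i);	/* new lookups no longer see it */
		synchronize_rcu();		/* wait for current readers */
		kfree(cel);
	}
	xa_destroy(&ctrl->cels);
}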

2023-11-20 23:34:02

by Randy Jennings

Subject: Re: [PATCH] nvme-core: remove head->effects to fix use-after-free

>> It doesn't make sense for effects logs to be different per-controller
>> for the same shared namespace. The spec doesn't seem to explicitly
>> prevent that, but hints that all hosts should be seeing the same thing
>> no matter which controller they're connected to:
>
> Also agreed as already indicated in the past mail.

Yuanyuan Zhong already pointed out a situation where the commands
supported portion of the log page could be different. When upgrading
the subsystem, some controllers may be upgraded sooner than others.
The upgrade could support new commands.

However, I would be surprised if the effects would be different for
currently supported commands, unless a controller was not reporting
effects before and starts reporting them.

Sincerely,
Randy Jennings