2022-06-27 07:51:48

by Li Zhijian

[permalink] [raw]
Subject: use-after-free in srpt_enable_tpg()

Hey folks

Another use-after-free was detected. This could be happens if i frequently abort (Ctrl-c) the blktests during it was running and then cause below error.
After digging to the coredump, i found that's because userspace API /sys/kernel/config/target/srpt/5054:00ff:fec9:7674/ still exists though the kernel
had released the srpt resource. So if the blktests access this userspace API, use-after-free(panic) will happen.

And i also noticed that similar issue had been fixed but reverted(see below) later years ago

commit 9b64f7d0bb0a8b5987f265756a563384765c7378
Author: Bart Van Assche <[email protected]>
Date: Mon Sep 30 16:17:07 2019 -0700

RDMA/srpt: Postpone HCA removal until after configfs directory removal

A shortcoming of the SCSI target core is that it does not have an API
for removing tpg or wwn objects. Wait until these directories have been
removed before allowing HCA removal to finish.

See also Bart Van Assche, "Re: Why using configfs as the only interface is
wrong for a storage target", 2011-02-07
(https://www.spinics.net/lists/linux-scsi/msg50248.html).


commit 77cf98d4ec90e8c48592c6537cfc2281c58f7ac3
Author: Bart Van Assche <[email protected]>
Date:   Fri Nov 1 13:47:56 2019 -0700

    Revert "RDMA/srpt: Postpone HCA removal until after configfs directory removal"

    Although the mentioned patch fixes a use-after-free bug, it introduces a
    hang during shutdown. Since the latter is worse, revert this patch.

    Link: https://lore.kernel.org/r/[email protected]
    Reported-by: Honggang Li <[email protected]>
    Fixes: 9b64f7d0bb0a ("RDMA/srpt: Postpone HCA removal until after configfs directory removal")
    Signed-off-by: Bart Van Assche <[email protected]>
    Acked-by: Honggang Li <[email protected]>
    Signed-off-by: Jason Gunthorpe <[email protected]>


So far, I doubt if the previous defect of configfs mentioned in 9b64f7d0b: "(RDMA/srpt: Postpone HCA removal until after configfs directory removal)"
has got a better solution. if not, i have no a clear mechanism to avoid it yet.

feedbacks are very welcome.

[  103.202207] ==================================================================
[  103.202213] BUG: KASAN: use-after-free in srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202230] Read of size 8 at addr ffff888107d3d120 by task check/1370
[  103.202233]
[  103.202234] CPU: 0 PID: 1370 Comm: check Not tainted 5.19.0-rc1-roce-flush+ #85
[  103.202238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[  103.202241] Call Trace:
[  103.202246]  <TASK>
[  103.202247]  dump_stack_lvl+0x34/0x44
[  103.202263]  print_report.cold+0x5e/0x5db
[  103.202277]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202287]  kasan_report+0xab/0x120
[  103.202297]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202307]  srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.202318]  target_fabric_tpg_base_enable_store+0xcc/0x130 [target_core_mod]
[  103.202364]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.202408]  ? rwsem_down_read_slowpath+0x730/0x730
[  103.202417]  ? policy_nodemask+0x14/0xa0
[  103.202423]  ? policy_node+0x59/0x80
[  103.202427]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.202471]  configfs_write_iter+0x15e/0x1e0
[  103.202482]  new_sync_write+0x206/0x310
[  103.202488]  ? new_sync_read+0x300/0x300
[  103.202493]  ? avc_policy_seqno+0x1d/0x30
[  103.202500]  ? selinux_file_permission+0x18b/0x1c0
[  103.202506]  vfs_write+0x33f/0x3e0
[  103.202510]  ksys_write+0xb4/0x150
[  103.202514]  ? __ia32_sys_read+0x40/0x40
[  103.202519]  do_syscall_64+0x3b/0x90
[  103.202524]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.202529] RIP: 0033:0x7f40fc2017a7
[  103.202534] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  103.202537] RSP: 002b:00007fff2c851918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  103.202545] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f40fc2017a7
[  103.202548] RDX: 0000000000000002 RSI: 00005577dfb42ab0 RDI: 0000000000000001
[  103.202550] RBP: 00005577dfb42ab0 R08: 000000000000000a R09: 00007f40fc2974e0
[  103.202552] R10: 00007f40fc2973e0 R11: 0000000000000246 R12: 0000000000000002
[  103.202556] R13: 00007f40fc2d4520 R14: 0000000000000002 R15: 00007f40fc2d4700
[  103.202559]  </TASK>
[  103.202562]
[  103.202565] Allocated by task 832((efault)):
[  103.202568]  kasan_save_stack+0x1e/0x40
[  103.202572]  __kasan_kmalloc+0x81/0xa0
[  103.202575]  srpt_add_one+0x3b/0x60 [ib_srpt]
[  103.202584]  add_client_context+0x23b/0x300 [ib_core]
[  103.202645]  ib_register_client+0x1d5/0x220 [ib_core]
[  103.202703]  0xffffffffc0878076
[  103.202705]  do_one_initcall+0x84/0x280
[  103.202711]  do_init_module+0xdf/0x2e0
[  103.202719]  load_module+0x2b9e/0x2c90
[  103.202722]  __do_sys_finit_module+0x111/0x190
[  103.202725]  do_syscall_64+0x3b/0x90
[  103.202728]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.202732]
[  103.202733] Freed by task 1307((efault)):
[  103.202736]  kasan_save_stack+0x1e/0x40
[  103.202739]  kasan_set_track+0x21/0x30
[  103.202742]  kasan_set_free_info+0x20/0x30
[  103.202746]  __kasan_slab_free+0x108/0x170
[  103.202749]  kfree+0x9a/0x320
[  103.202752]  srpt_remove_one.cold+0x23/0x1e1 [ib_srpt]
[  103.202762]  remove_client_context+0x8f/0xd0 [ib_core]
[  103.202819]  disable_device+0xee/0x1e0 [ib_core]
[  103.202877]  __ib_unregister_device+0x59/0xf0 [ib_core]
[  103.202935]  ib_unregister_device_and_put+0x3b/0x50 [ib_core]
[  103.202993]  nldev_dellink+0x126/0x1b0 [ib_core]
[  103.203120]  rdma_nl_rcv_msg+0x1cc/0x310 [ib_core]
[  103.203179]  rdma_nl_rcv+0x172/0x200 [ib_core]
[  103.203238]  netlink_unicast+0x36b/0x4a0
[  103.203247]  netlink_sendmsg+0x3a9/0x6d0
[  103.203250]  sock_sendmsg+0x91/0xa0
[  103.203256]  __sys_sendto+0x16f/0x210
[  103.203258]  __x64_sys_sendto+0x6f/0x80
[  103.203261]  do_syscall_64+0x3b/0x90
[  103.203264]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.203268]
[  103.203269] The buggy address belongs to the object at ffff888107d3d000
                which belongs to the cache kmalloc-2k of size 2048
[  103.203272] The buggy address is located 288 bytes inside of
                2048-byte region [ffff888107d3d000, ffff888107d3d800)
[  103.203275]

[  103.203275] The buggy address belongs to the physical page:
[  103.203277] page:00000000fcc35344 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888107d39000 pfn:0x107d38
[  103.203282] head:00000000fcc35344 order:3 compound_mapcount:0 compound_pincount:0
[  103.203285] flags: 0x100000000010200(slab|head|node=0|zone=2)
[  103.203290] raw: 0100000000010200 ffffea0004105e08 ffffea00040a8e08 ffff888100042000
[  103.203293] raw: ffff888107d39000 0000000000080005 00000001ffffffff 0000000000000000
[  103.203295] page dumped because: kasan: bad access detected
[  103.203296]
[  103.203296] Memory state around the buggy address:
[  103.203298]  ffff888107d3d000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203300]  ffff888107d3d080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203302] >ffff888107d3d100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203304]                                ^
[  103.203306]  ffff888107d3d180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203308]  ffff888107d3d200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  103.203311] ==================================================================
[  103.203386] Kernel panic - not syncing: kasan.fault=panic set ...
[  103.318885] CPU: 0 PID: 1370 Comm: check Not tainted 5.19.0-rc1-roce-flush+ #85
[  103.320272] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-27-g64f37cc530f1-prebuilt.qemu.org 04/01/2014
[  103.322308] Call Trace:
[  103.322910]  <TASK>
[  103.323542]  dump_stack_lvl+0x34/0x44
[  103.324321]  panic+0x19a/0x345
[  103.324981]  ? panic_print_sys_info.part.0+0x5b/0x5b
[  103.325893]  ? _raw_spin_unlock_irqrestore+0xd/0x40
[  103.326832]  ? preempt_count_sub+0xf/0xb0
[  103.327644]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.328523]  end_report.part.0+0x54/0x69
[  103.329320]  kasan_report+0xba/0x120
[  103.330124]  ? srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.331163]  srpt_enable_tpg+0x70/0xec [ib_srpt]
[  103.332083]  target_fabric_tpg_base_enable_store+0xcc/0x130 [target_core_mod]
[  103.333376]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.334878]  ? rwsem_down_read_slowpath+0x730/0x730
[  103.335808]  ? policy_nodemask+0x14/0xa0
[  103.336605]  ? policy_node+0x59/0x80
[  103.337378]  ? target_fabric_wwn_cmd_completion_affinity_show+0x40/0x40 [target_core_mod]
[  103.338875]  configfs_write_iter+0x15e/0x1e0
[  103.339733]  new_sync_write+0x206/0x310
[  103.340538]  ? new_sync_read+0x300/0x300
[  103.341347]  ? avc_policy_seqno+0x1d/0x30
[  103.342147]  ? selinux_file_permission+0x18b/0x1c0
[  103.344176]  vfs_write+0x33f/0x3e0
[  103.345204]  ksys_write+0xb4/0x150
[  103.345936]  ? __ia32_sys_read+0x40/0x40
[  103.346774]  do_syscall_64+0x3b/0x90
[  103.347548]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[  103.348460] RIP: 0033:0x7f40fc2017a7
[  103.349192] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[  103.352189] RSP: 002b:00007fff2c851918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[  103.353619] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f40fc2017a7
[  103.354857] RDX: 0000000000000002 RSI: 00005577dfb42ab0 RDI: 0000000000000001
[  103.356137] RBP: 00005577dfb42ab0 R08: 000000000000000a R09: 00007f40fc2974e0
[  103.357520] R10: 00007f40fc2973e0 R11: 0000000000000246 R12: 0000000000000002
[  103.358756] R13: 00007f40fc2d4520 R14: 0000000000000002 R15: 00007f40fc2d4700
[  103.360071]  </TASK>
[  103.360894] Kernel Offset: 0x30e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  103.362768] ---[ end Kernel panic - not syncing: kasan.fault=panic set ... ]---
(END)

Thanks
Zhijian


2022-06-27 17:04:53

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 6/27/22 00:09, [email protected] wrote:
> So far, I doubt if the previous defect of configfs mentioned in
> 9b64f7d0b: "(RDMA/srpt: Postpone HCA removal until after configfs
> directory removal)" has got a better solution. if not, i have no a
> clear mechanism to avoid it yet.
>
> feedbacks are very welcome.
Mike, are you perhaps aware of any plans to add functions to the LIO
core for removing tpg and wwn objects?

Thanks,

Bart.

2022-06-30 17:27:12

by Mike Christie

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 6/27/22 11:37 AM, Bart Van Assche wrote:
> On 6/27/22 00:09, [email protected] wrote:
>> So far, I doubt if the previous defect of configfs mentioned in
>> 9b64f7d0b: "(RDMA/srpt: Postpone HCA removal until after configfs
>> directory removal)" has got a better solution. if not, i have no a
>> clear mechanism to avoid it yet.
>>
>> feedbacks are very welcome.
> Mike, are you perhaps aware of any plans to add functions to the LIO core for removing tpg and wwn objects?
>

I don't know any work being done in this area. I was only working
on the refcounting/configfs parts for sessions in those configfs/sysfs
patchsets. However, I think I hit similar issues with the session.
I went around the world with solutions but didn't really like them
so I never pushed the patches for inclusion.

What was the hang caused by 9b64f7d0bb0a?

2022-06-30 19:14:57

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 6/30/22 09:40, Mike Christie wrote:
> On 6/27/22 11:37 AM, Bart Van Assche wrote:
>> On 6/27/22 00:09, [email protected] wrote:
>>> So far, I doubt if the previous defect of configfs mentioned in
>>> 9b64f7d0b: "(RDMA/srpt: Postpone HCA removal until after configfs
>>> directory removal)" has got a better solution. if not, i have no a
>>> clear mechanism to avoid it yet.
>>>
>>> feedbacks are very welcome.
>> Mike, are you perhaps aware of any plans to add functions to the LIO core for removing tpg and wwn objects?
>>
>
> I don't know any work being done in this area. I was only working
> on the refcounting/configfs parts for sessions in those configfs/sysfs
> patchsets. However, I think I hit similar issues with the session.
> I went around the world with solutions but didn't really like them
> so I never pushed the patches for inclusion.
>
> What was the hang caused by 9b64f7d0bb0a?

Hi Mike,

I have not been able to find Honggang's bug report that led to the
revert of that commit. I guess that the hang happened in the while-loop
in srpt_release_sport() that was modified by commit 9b64f7d0bb0a.

Thanks,

Bart.

2022-07-02 23:05:10

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 6/30/22 18:59, Hillf Danton wrote:
> That hang can be skipped by removing the wait loop in
> srpt_release_sport() - in the direction of 9b64f7d0bb0a, sdev will not
> go home if any sport's refcount does not drop on ground. To do that, add
> port refcount to sdev in the diff below in bid to resurrect 9b64f7d0bb0a.
>
> Then gc work can be added for dying sports to drop tpg after delaying a second.

I'm afraid that the patch from your email will lead to a use-after-free
of sdev->pd. As long as a session is live the ch->qp pointer may be
dereferenced. The sdev->pd pointer is stored in the pd member of struct
ib_qp and hence may be dereferenced by any function that uses ch->qp.

Thanks,

Bart.

2022-07-03 14:57:54

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 7/2/22 19:11, Hillf Danton wrote:
> On Sat, 2 Jul 2022 15:26:33 -0700 Bart Van Assche wrote:
>> As long as a session is live the ch->qp pointer may be
>> dereferenced. The sdev->pd pointer is stored in the pd member of struct
>> ib_qp and hence may be dereferenced by any function that uses ch->qp.
>
> If it is still an issue after ib_dealloc_pd(sdev->pd) then it goes beyond the
> aperture of my proposal and needs another fix.
>
> Hillf
>
> +++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
> @@ -3235,6 +3235,8 @@ static void srpt_remove_one(struct ib_de
>
> ib_set_client_data(device, &srpt_client, NULL);
>
> + /* make sdev survive ib_dealloc_pd(sdev->pd); */
> + atomic_inc(&sdev->port_refcount);
> /*
> * Unregistering a target must happen after destroying sdev->cm_id
> * such that no new SRP_LOGIN_REQ information units can arrive while
> @@ -3250,6 +3252,9 @@ static void srpt_remove_one(struct ib_de
> srpt_free_srq(sdev);
>
> ib_dealloc_pd(sdev->pd);
> +
> + if (0 == atomic_dec_return(&sdev->port_refcount))
> + kfree(sdev);
> }
>
> static struct ib_client srpt_client = {

Do you perhaps want to combine the above patch with the previous patch?
I don't think that any reference counting scheme can fix all
use-after-free issues related to srpt_remove_one(). Immediately after
srpt_remove_one() returns the caller of this function calls
ib_device_put() and ib_client_put(). These functions free data
structures that can be reached from the pointers that are stored in
struct ib_qp. Holding a reference on struct ib_device as long as any
session is live would allow to remove the while-loop from
srpt_release_sport(). However, I'm not sure that would make a
significant difference since there is a similar while-loop in one of the
callers of srpt_remove_one() (disable_device() in the RDMA core).

Bart.

2022-07-05 04:47:26

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 7/3/22 17:11, Hillf Danton wrote:
> On Sun, 3 Jul 2022 07:55:05 -0700 Bart Van Assche wrote:
>> However, I'm not sure that would make a
>> significant difference since there is a similar while-loop in one of the
>> callers of srpt_remove_one() (disable_device() in the RDMA core).
>
> Hehe... feel free to shed light on how the loop in RDMA core is currently
> making the loop in srpt more prone to uaf?

In my email I was referring to the following code in disable_device():

wait_for_completion(&device->unreg_completion);

I think that code shows that device removal by the RDMA core is
synchronous in nature. Even if the ib_srpt source code would be modified
such that the objects referred by that code live longer, the wait loop
in disable_device() would wait for the ib_device reference counts to
drop to zero.

So I do not expect that modifying object lifetimes in ib_srpt.c can lead
to a solution.

Removing configfs directories from inside srpt_release_sport() could be
a solution. However, configfs does not have any API to remove
directories and I'm not aware of any plans to add such an API.
Additionally, several kernel maintainers disagree with invoking the
rmdir system call from inside kernel code.

A potential solution could be to decouple the lifetimes of the data
structures used for configfs (struct se_wwn and struct srpt_tpg) and the
data structures associated with RDMA objects (struct srpt_port). If
nobody else beats me to this I will try to find the time to implement
this approach.

Bart.

2022-07-05 13:57:00

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On Mon, Jul 04, 2022 at 09:34:07PM -0700, Bart Van Assche wrote:
> On 7/3/22 17:11, Hillf Danton wrote:
> > On Sun, 3 Jul 2022 07:55:05 -0700 Bart Van Assche wrote:
> > > However, I'm not sure that would make a
> > > significant difference since there is a similar while-loop in one of the
> > > callers of srpt_remove_one() (disable_device() in the RDMA core).
> >
> > Hehe... feel free to shed light on how the loop in RDMA core is currently
> > making the loop in srpt more prone to uaf?
>
> In my email I was referring to the following code in disable_device():
>
> wait_for_completion(&device->unreg_completion);
>
> I think that code shows that device removal by the RDMA core is synchronous
> in nature. Even if the ib_srpt source code would be modified such that the
> objects referred by that code live longer, the wait loop in disable_device()
> would wait for the ib_device reference counts to drop to zero.

That is not really the "ib_device" reference count it is the
"registration" reference count.

IB has a system where drivers/ulp can create critical regions where
the ib device must be registered using the ib_device_try_get()/put
calls. "Must be registered" is useful in a number of places but should
not be held for a long period.

This is distinct from the normal struct device refcount that simply
keeps the ib_device memory alive.

Jason

2022-07-05 16:50:16

by Bart Van Assche

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

On 7/5/22 04:40, Hillf Danton wrote:
> If no compat devices can be added to ib_device with DEVICE_REGISTERED
> cleared then they can be removed without ib_device's refcount dropping
> to zero.
> Even if that is not strictly true, a new flag that marks ib device
> disabled and prevents new compact devices from being added can be added
> in bid to cut the wait for completion.
>
> Hillf
>
> +++ b/drivers/infiniband/core/device.c
> @@ -1265,6 +1265,7 @@ static void disable_device(struct ib_dev
>
> down_write(&devices_rwsem);
> xa_clear_mark(&devices, device->index, DEVICE_REGISTERED);
> + // device->disabled = true;
> up_write(&devices_rwsem);
>
> /*
> @@ -1282,17 +1283,10 @@ static void disable_device(struct ib_dev
> }
>
> ib_cq_pool_cleanup(device);
> + remove_compat_devs(device);
>
> /* Pairs with refcount_set in enable_device */
> ib_device_put(device);
> - wait_for_completion(&device->unreg_completion);
> -
> - /*
> - * compat devices must be removed after device refcount drops to zero.
> - * Otherwise init_net() may add more compatdevs after removing compat
> - * devices and before device is disabled.
> - */
> - remove_compat_devs(device);
> }

I'm not convinced the above patch is a step in the right direction nor
that it is correct. Anyway, since the RDMA maintainers know this code
better than I do I will let them comment on the above patch.

Bart.

2022-07-27 03:29:55

by Li Zhijian

[permalink] [raw]
Subject: Re: use-after-free in srpt_enable_tpg()

Bart,


On 05/07/2022 12:34, Bart Van Assche wrote:
>
> A potential solution could be to decouple the lifetimes of the data structures used for configfs (struct se_wwn and struct srpt_tpg) and the data structures associated with RDMA objects (struct srpt_port). If nobody else beats me to this I will try to find the time to implement this approach.

I saw your approach this morning, i will have a look and tests later.

Thanks
Zhijian