2022-08-09 21:19:49

by Waiman Long

Subject: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

A circular locking problem is reported by lockdep due to the following
circular locking dependency.

+--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
|                                                         |
+---------------------------------------------------------+

One way to break this circular locking chain is to avoid holding
cpu_hotplug_lock and slab_mutex while deleting the kobject in
sysfs_slab_unlink() which should be equivalent to doing a write_lock
and write_unlock pair of the kn->active virtual lock.

Since the kobject structures are not protected by slab_mutex or the
cpu_hotplug_lock, we can certainly release those locks before doing
the delete operation.

Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
created kmem_cache_release() and call it outside the slab_mutex &
cpu_hotplug_lock critical sections.

Signed-off-by: Waiman Long <[email protected]>
---
mm/slab_common.c | 48 +++++++++++++++++++++++++++++++-----------------
1 file changed, 31 insertions(+), 17 deletions(-)

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 17996649cfe3..9274fb03563e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -392,6 +392,30 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
 }
 EXPORT_SYMBOL(kmem_cache_create);

+#ifdef SLAB_SUPPORTS_SYSFS
+/*
+ * For a given kmem_cache, kmem_cache_destroy() should only be called
+ * once or there will be a use-after-free problem. The actual deletion
+ * and release of the kobject does not need slab_mutex or cpu_hotplug_lock
+ * protection. So they are now done without holding those locks.
+ */
+static void kmem_cache_release(struct kmem_cache *s, bool workfn)
+{
+        if (!workfn)
+                sysfs_slab_unlink(s);
+
+        if (workfn || !(s->flags & SLAB_TYPESAFE_BY_RCU))
+                sysfs_slab_release(s);
+        else
+                schedule_work(&slab_caches_to_rcu_destroy_work);
+}
+#else
+static inline void kmem_cache_release(struct kmem_cache *s, bool workfn)
+{
+        slab_kmem_cache_release(s);
+}
+#endif
+
 static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
 {
         LIST_HEAD(to_destroy);
@@ -418,11 +442,7 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
         list_for_each_entry_safe(s, s2, &to_destroy, list) {
                 debugfs_slab_release(s);
                 kfence_shutdown_cache(s);
-#ifdef SLAB_SUPPORTS_SYSFS
-                sysfs_slab_release(s);
-#else
-                slab_kmem_cache_release(s);
-#endif
+                kmem_cache_release(s, true);
         }
 }

@@ -437,20 +457,10 @@ static int shutdown_cache(struct kmem_cache *s)
         list_del(&s->list);

         if (s->flags & SLAB_TYPESAFE_BY_RCU) {
-#ifdef SLAB_SUPPORTS_SYSFS
-                sysfs_slab_unlink(s);
-#endif
                 list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
-                schedule_work(&slab_caches_to_rcu_destroy_work);
         } else {
                 kfence_shutdown_cache(s);
                 debugfs_slab_release(s);
-#ifdef SLAB_SUPPORTS_SYSFS
-                sysfs_slab_unlink(s);
-                sysfs_slab_release(s);
-#else
-                slab_kmem_cache_release(s);
-#endif
         }

         return 0;
@@ -465,14 +475,16 @@ void slab_kmem_cache_release(struct kmem_cache *s)

 void kmem_cache_destroy(struct kmem_cache *s)
 {
+        int refcnt;
+
         if (unlikely(!s) || !kasan_check_byte(s))
                 return;

         cpus_read_lock();
         mutex_lock(&slab_mutex);

-        s->refcount--;
-        if (s->refcount)
+        refcnt = --s->refcount;
+        if (refcnt)
                 goto out_unlock;

         WARN(shutdown_cache(s),
@@ -481,6 +493,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
 out_unlock:
         mutex_unlock(&slab_mutex);
         cpus_read_unlock();
+        if (!refcnt)
+                kmem_cache_release(s, false);
 }
 EXPORT_SYMBOL(kmem_cache_destroy);

--
2.31.1


2022-08-09 22:37:45

by Roman Gushchin

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On Tue, Aug 09, 2022 at 04:59:01PM -0400, Waiman Long wrote:
> A circular locking problem is reported by lockdep due to the following
> circular locking dependency.
>
> +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
> |                                                         |
> +---------------------------------------------------------+
>
> One way to break this circular locking chain is to avoid holding
> cpu_hotplug_lock and slab_mutex while deleting the kobject in
> sysfs_slab_unlink() which should be equivalent to doing a write_lock
> and write_unlock pair of the kn->active virtual lock.
>
> Since the kobject structures are not protected by slab_mutex or the
> cpu_hotplug_lock, we can certainly release those locks before doing
> the delete operation.
>
> Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
> created kmem_cache_release() and call it outside the slab_mutex &
> cpu_hotplug_lock critical sections.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> mm/slab_common.c | 48 +++++++++++++++++++++++++++++++-----------------
> 1 file changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 17996649cfe3..9274fb03563e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -392,6 +392,30 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
> }
> EXPORT_SYMBOL(kmem_cache_create);
>
> +#ifdef SLAB_SUPPORTS_SYSFS
> +/*
> + * For a given kmem_cache, kmem_cache_destroy() should only be called
> + * once or there will be a use-after-free problem. The actual deletion
> + * and release of the kobject does not need slab_mutex or cpu_hotplug_lock
> + * protection. So they are now done without holding those locks.
> + */
> +static void kmem_cache_release(struct kmem_cache *s, bool workfn)
> +{
> + if (!workfn)
> + sysfs_slab_unlink(s);
> +
> + if (workfn || !(s->flags & SLAB_TYPESAFE_BY_RCU))
> + sysfs_slab_release(s);
> + else
> + schedule_work(&slab_caches_to_rcu_destroy_work);
> +}
> +#else
> +static inline void kmem_cache_release(struct kmem_cache *s, bool workfn)
> +{
> + slab_kmem_cache_release(s);
> +}
> +#endif
> +
> static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> {
> LIST_HEAD(to_destroy);
> @@ -418,11 +442,7 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> list_for_each_entry_safe(s, s2, &to_destroy, list) {
> debugfs_slab_release(s);
> kfence_shutdown_cache(s);
> -#ifdef SLAB_SUPPORTS_SYSFS
> - sysfs_slab_release(s);
> -#else
> - slab_kmem_cache_release(s);
> -#endif
> + kmem_cache_release(s, true);

Hi Waiman!

As I understand, with SLAB_SUPPORTS_SYSFS kmem_cache_release() can effectively call
into itself: first it's called with workfn == false from shutdown_cache() and
then optionally it's scheduled to call itself from a work context with
workfn == true just to call sysfs_slab_release(). Is it right?

If !SLAB_SUPPORTS_SYSFS, shutdown_cache() optionally adds kmem_cache to the
slab_caches_to_rcu_destroy list and calls kmem_cache_release(s, false) ==
slab_kmem_cache_release(). How is it then removed from the list?

Overall the patch is a bit hard to follow (not like this code was easy to read
before, so can't blame the patch). But I wonder if it will make things simpler
to decouple kmem_cache_release(workfn == true) and kmem_cache_release(workfn == false)
into 2 different helpers? Or at least add a bold comment on how things are supposed
to work.
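
For illustration, the decoupled version could look something like the sketch
below (the helper names and the exact split are made up here, not taken from
your patch):

static void kmem_cache_release(struct kmem_cache *s)
{
        /* Direct path: called from kmem_cache_destroy() after the locks are dropped. */
        sysfs_slab_unlink(s);

        if (s->flags & SLAB_TYPESAFE_BY_RCU)
                schedule_work(&slab_caches_to_rcu_destroy_work);
        else
                sysfs_slab_release(s);
}

static void kmem_cache_release_from_work(struct kmem_cache *s)
{
        /* Work path: called only from slab_caches_to_rcu_destroy_workfn(). */
        sysfs_slab_release(s);
}

That way the work function never re-enters the helper used on the destroy path,
and the workfn flag goes away.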

Thanks!

2022-08-09 23:19:34

by Waiman Long

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On 8/9/22 18:25, Roman Gushchin wrote:
> On Tue, Aug 09, 2022 at 04:59:01PM -0400, Waiman Long wrote:
>> A circular locking problem is reported by lockdep due to the following
>> circular locking dependency.
>>
>> +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>> |                                                         |
>> +---------------------------------------------------------+
>>
>> One way to break this circular locking chain is to avoid holding
>> cpu_hotplug_lock and slab_mutex while deleting the kobject in
>> sysfs_slab_unlink() which should be equivalent to doing a write_lock
>> and write_unlock pair of the kn->active virtual lock.
>>
>> Since the kobject structures are not protected by slab_mutex or the
>> cpu_hotplug_lock, we can certainly release those locks before doing
>> the delete operation.
>>
>> Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
>> created kmem_cache_release() and call it outside the slab_mutex &
>> cpu_hotplug_lock critical sections.
>>
>> Signed-off-by: Waiman Long <[email protected]>
>> ---
>> mm/slab_common.c | 48 +++++++++++++++++++++++++++++++-----------------
>> 1 file changed, 31 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 17996649cfe3..9274fb03563e 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -392,6 +392,30 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
>> }
>> EXPORT_SYMBOL(kmem_cache_create);
>>
>> +#ifdef SLAB_SUPPORTS_SYSFS
>> +/*
>> + * For a given kmem_cache, kmem_cache_destroy() should only be called
>> + * once or there will be a use-after-free problem. The actual deletion
>> + * and release of the kobject does not need slab_mutex or cpu_hotplug_lock
>> + * protection. So they are now done without holding those locks.
>> + */
>> +static void kmem_cache_release(struct kmem_cache *s, bool workfn)
>> +{
>> + if (!workfn)
>> + sysfs_slab_unlink(s);
>> +
>> + if (workfn || !(s->flags & SLAB_TYPESAFE_BY_RCU))
>> + sysfs_slab_release(s);
>> + else
>> + schedule_work(&slab_caches_to_rcu_destroy_work);
>> +}
>> +#else
>> +static inline void kmem_cache_release(struct kmem_cache *s, bool workfn)
>> +{
>> + slab_kmem_cache_release(s);
>> +}
>> +#endif
>> +
>> static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
>> {
>> LIST_HEAD(to_destroy);
>> @@ -418,11 +442,7 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
>> list_for_each_entry_safe(s, s2, &to_destroy, list) {
>> debugfs_slab_release(s);
>> kfence_shutdown_cache(s);
>> -#ifdef SLAB_SUPPORTS_SYSFS
>> - sysfs_slab_release(s);
>> -#else
>> - slab_kmem_cache_release(s);
>> -#endif
>> + kmem_cache_release(s, true);
> Hi Waiman!
>
> As I understand, with SLAB_SUPPORTS_SYSFS kmem_cache_release() can effectively call
> into itself: first it's called with workfn == false from shutdown_cache() and
> then optionally it's scheduled to call itself from a work context with
> workfn == true just to call sysfs_slab_release(). Is it right?
>
> If !SLAB_SUPPORTS_SYSFS, shutdown_cache() optionally adds kmem_cache to the
> slab_caches_to_rcu_destroy list and calls kmem_cache_release(s, false) ==
> slab_kmem_cache_release(). How it's then removed from the list?
>
> Overall the patch is a bit hard to follow (not like this code was easy to read
> before, so can't blame the patch). But I wonder if it will make things simpler
> to decouple kmem_cache_release(workfn == true) and kmem_cache_release(workfn == false)
> into 2 different helpers? Or at least add a bold comment on how things are supposed
> to work.
>
> Thanks!

You are right. I agree that it can be hard to read. Simpler is always
better. Will post a v2 with the change suggested.

Thanks,
Longman

2022-08-10 10:13:04

by Vlastimil Babka

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On 8/9/22 22:59, Waiman Long wrote:
> A circular locking problem is reported by lockdep due to the following
> circular locking dependency.
>
> +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
> |                                                         |
> +---------------------------------------------------------+

This sounded familiar and I've found a thread from January:

https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/

But that seemed to be specific to RHEL-8 RT kernel and not reproduced
with mainline. Is it different this time? Can you share the splats?

> One way to break this circular locking chain is to avoid holding
> cpu_hotplug_lock and slab_mutex while deleting the kobject in
> sysfs_slab_unlink() which should be equivalent to doing a write_lock
> and write_unlock pair of the kn->active virtual lock.
>
> Since the kobject structures are not protected by slab_mutex or the
> cpu_hotplug_lock, we can certainly release those locks before doing
> the delete operation.
>
> Move sysfs_slab_unlink() and sysfs_slab_release() to the newly
> created kmem_cache_release() and call it outside the slab_mutex &
> cpu_hotplug_lock critical sections.
>
> Signed-off-by: Waiman Long <[email protected]>
> ---
> mm/slab_common.c | 48 +++++++++++++++++++++++++++++++-----------------
> 1 file changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 17996649cfe3..9274fb03563e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -392,6 +392,30 @@ kmem_cache_create(const char *name, unsigned int size, unsigned int align,
> }
> EXPORT_SYMBOL(kmem_cache_create);
>
> +#ifdef SLAB_SUPPORTS_SYSFS
> +/*
> + * For a given kmem_cache, kmem_cache_destroy() should only be called
> + * once or there will be a use-after-free problem. The actual deletion
> + * and release of the kobject does not need slab_mutex or cpu_hotplug_lock
> + * protection. So they are now done without holding those locks.
> + */
> +static void kmem_cache_release(struct kmem_cache *s, bool workfn)
> +{
> + if (!workfn)
> + sysfs_slab_unlink(s);
> +
> + if (workfn || !(s->flags & SLAB_TYPESAFE_BY_RCU))
> + sysfs_slab_release(s);
> + else
> + schedule_work(&slab_caches_to_rcu_destroy_work);
> +}
> +#else
> +static inline void kmem_cache_release(struct kmem_cache *s, bool workfn)
> +{
> + slab_kmem_cache_release(s);
> +}
> +#endif
> +
> static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> {
> LIST_HEAD(to_destroy);
> @@ -418,11 +442,7 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> list_for_each_entry_safe(s, s2, &to_destroy, list) {
> debugfs_slab_release(s);
> kfence_shutdown_cache(s);
> -#ifdef SLAB_SUPPORTS_SYSFS
> - sysfs_slab_release(s);
> -#else
> - slab_kmem_cache_release(s);
> -#endif
> + kmem_cache_release(s, true);
> }
> }
>
> @@ -437,20 +457,10 @@ static int shutdown_cache(struct kmem_cache *s)
> list_del(&s->list);
>
> if (s->flags & SLAB_TYPESAFE_BY_RCU) {
> -#ifdef SLAB_SUPPORTS_SYSFS
> - sysfs_slab_unlink(s);
> -#endif
> list_add_tail(&s->list, &slab_caches_to_rcu_destroy);
> - schedule_work(&slab_caches_to_rcu_destroy_work);
> } else {
> kfence_shutdown_cache(s);
> debugfs_slab_release(s);
> -#ifdef SLAB_SUPPORTS_SYSFS
> - sysfs_slab_unlink(s);
> - sysfs_slab_release(s);
> -#else
> - slab_kmem_cache_release(s);
> -#endif
> }
>
> return 0;
> @@ -465,14 +475,16 @@ void slab_kmem_cache_release(struct kmem_cache *s)
>
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> + int refcnt;
> +
> if (unlikely(!s) || !kasan_check_byte(s))
> return;
>
> cpus_read_lock();
> mutex_lock(&slab_mutex);
>
> - s->refcount--;
> - if (s->refcount)
> + refcnt = --s->refcount;
> + if (refcnt)
> goto out_unlock;
>
> WARN(shutdown_cache(s),
> @@ -481,6 +493,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> out_unlock:
> mutex_unlock(&slab_mutex);
> cpus_read_unlock();
> + if (!refcnt)
> + kmem_cache_release(s, false);
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
>

2022-08-10 14:22:09

by Waiman Long

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On 8/10/22 05:34, Vlastimil Babka wrote:
> On 8/9/22 22:59, Waiman Long wrote:
>> A circular locking problem is reported by lockdep due to the following
>> circular locking dependency.
>>
>>    +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>    |                                                         |
>>    +---------------------------------------------------------+
>
> This sounded familiar and I've found a thread from January:
>
> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>
>
> But that seemed to be specific to RHEL-8 RT kernel and not reproduced
> with mainline. Is it different this time? Can you share the splats?

I think this is easier to reproduce on an RT kernel, but it also happens
in a non-RT kernel. One example splat that I got was:

[ 1777.114757] ======================================================
[ 1777.121646] WARNING: possible circular locking dependency detected
[ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
[ 1777.134280] ------------------------------------------------------
[ 1777.141176] slub_cpu_partia/4113 is trying to acquire lock:
[ 1777.147395] ffffffffaef95b20 (cpu_hotplug_lock){++++}-{0:0}, at:
cpu_partial_store+0x47/0xa0
[ 1777.156837]
               but task is already holding lock:
[ 1777.163346] ffff88811bd17578 (kn->count#126){++++}-{0:0}, at:
kernfs_fop_write+0x1d0/0x410
[ 1777.172591]
               which lock already depends on the new lock.

[ 1777.181718]
               the existing dependency chain (in reverse order) is:
[ 1777.190070]
               -> #2 (kn->count#126){++++}-{0:0}:
[ 1777.196688]        lock_acquire+0x1db/0x8e0
[ 1777.201360]        __kernfs_remove+0x617/0x800
[ 1777.206320]        kernfs_remove+0x1d/0x30
[ 1777.210891]        kobject_del+0x8e/0x150
[ 1777.215360]        kmem_cache_destroy+0x10d/0x130
[ 1777.220615]        bioset_exit+0x1e3/0x2c0
[ 1777.225189]        blk_release_queue+0x20d/0x310
[ 1777.230336]        kobject_release+0x105/0x390
[ 1777.235296] scsi_device_dev_release_usercontext+0x5e3/0xd60
[ 1777.242198]        execute_in_process_context+0x27/0x130
[ 1777.248131]        device_release+0xa1/0x210
[ 1777.252899]        kobject_release+0x105/0x390
[ 1777.257857]        scsi_alloc_sdev+0xa1e/0xc80
[ 1777.262820]        scsi_probe_and_add_lun+0x809/0x2da0
[ 1777.268557]        __scsi_scan_target+0x1c1/0xb60
[ 1777.273801]        scsi_scan_channel+0xe2/0x150
[ 1777.278855]        scsi_scan_host_selected+0x1fb/0x2a0
[ 1777.284590]        do_scan_async+0x3e/0x440
[ 1777.289258]        async_run_entry_fn+0xd8/0x660
[ 1777.294413]        process_one_work+0x919/0x17c0
[ 1777.299568]        worker_thread+0x87/0xb40
[ 1777.304227]        kthread+0x30d/0x3c0
[ 1777.308411]        ret_from_fork+0x3a/0x50
[ 1777.312984]
               -> #1 (slab_mutex){+.+.}-{3:3}:
[ 1777.319309]        lock_acquire+0x1db/0x8e0
[ 1777.323978]        __mutex_lock+0x154/0x15d0
[ 1777.328745]        kmem_cache_destroy+0x3f/0x130
[ 1777.333898]        bioset_exit+0x1e3/0x2c0
[ 1777.338469]        blk_release_queue+0x20d/0x310
[ 1777.343623]        kobject_release+0x105/0x390
[ 1777.348582] scsi_device_dev_release_usercontext+0x5e3/0xd60
[ 1777.355481]        execute_in_process_context+0x27/0x130
[ 1777.361409]        device_release+0xa1/0x210
[ 1777.366175]        kobject_release+0x105/0x390
[ 1777.371135]        scsi_alloc_sdev+0xa1e/0xc80
[ 1777.376096]        scsi_probe_and_add_lun+0x809/0x2da0
[ 1777.381824]        __scsi_scan_target+0x1c1/0xb60
[ 1777.387073]        scsi_scan_channel+0xe2/0x150
[ 1777.392128]        scsi_scan_host_selected+0x1fb/0x2a0
[ 1777.397864]        do_scan_async+0x3e/0x440
[ 1777.402533]        async_run_entry_fn+0xd8/0x660
[ 1777.407688]        process_one_work+0x919/0x17c0
[ 1777.412841]        worker_thread+0x87/0xb40
[ 1777.417509]        kthread+0x30d/0x3c0
[ 1777.421693]        ret_from_fork+0x3a/0x50
[ 1777.426263]
               -> #0 (cpu_hotplug_lock){++++}-{0:0}:
[ 1777.433168]        check_prevs_add+0x3fa/0x18b0
[ 1777.438226]        __lock_acquire+0x21b3/0x2b80
[ 1777.443273]        lock_acquire+0x1db/0x8e0
[ 1777.447940]        cpus_read_lock+0x40/0xc0
[ 1777.452611]        cpu_partial_store+0x47/0xa0
[ 1777.457571]        slab_attr_store+0x20/0x30
[ 1777.462338]        kernfs_fop_write+0x255/0x410
[ 1777.467394]        vfs_write+0x157/0x460
[ 1777.471777]        ksys_write+0xb8/0x170
[ 1777.476155]        do_syscall_64+0xa5/0x450
[ 1777.480828]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 1777.487041]
               other info that might help us debug this:

[ 1777.495975] Chain exists of:
                 cpu_hotplug_lock --> slab_mutex --> kn->count#126

[ 1777.507443]  Possible unsafe locking scenario:

[ 1777.514049]        CPU0                    CPU1
[ 1777.519102]        ----                    ----
[ 1777.524155]   lock(kn->count#126);
[ 1777.527956]                                lock(slab_mutex);
[ 1777.534274]                                lock(kn->count#126);
[ 1777.540885]   lock(cpu_hotplug_lock);
[ 1777.544964]
                *** DEADLOCK ***

[ 1777.551570] 3 locks held by slub_cpu_partia/4113:
[ 1777.556820]  #0: ffff8881319884d0 (sb_writers#4){.+.+}-{0:0}, at:
vfs_write+0x31c/0x460
[ 1777.565771]  #1: ffff888161d39490 (&of->mutex){+.+.}-{3:3}, at:
kernfs_fop_write+0x1ad/0x410
[ 1777.575203]  #2: ffff88811bd17578 (kn->count#126){++++}-{0:0}, at:
kernfs_fop_write+0x1d0/0x410
[ 1777.584928]
               stack backtrace:
[ 1777.589792] CPU: 65 PID: 4113 Comm: slub_cpu_partia Kdump: loaded Not
tainted 4.18.0-403.el8.x86_64+debug #1
[ 1777.600767] Hardware name: Lenovo ThinkSystem SR635
-[7Y98XXXXXX]-/System Board, BIOS CFE103A 07/04/2019
[ 1777.611342] Call Trace:
[ 1777.614072]  dump_stack+0x5c/0x80
[ 1777.617777]  check_noncircular+0x27f/0x320
[ 1777.622352]  ? print_circular_bug+0x440/0x440
[ 1777.627215]  ? kernel_text_address+0x125/0x140
[ 1777.632170]  ? mark_lock.part.31+0xf4/0xd40
[ 1777.636841]  ? is_dynamic_key+0x220/0x220
[ 1777.641326]  check_prevs_add+0x3fa/0x18b0
[ 1777.645798]  ? sched_clock+0x5/0x10
[ 1777.649695]  ? sched_clock_cpu+0x18/0x1e0
[ 1777.654175]  ? check_irq_usage+0xa20/0xa20
[ 1777.658750]  ? find_held_lock+0x1c0/0x1c0
[ 1777.663225]  ? sched_clock_cpu+0x18/0x1e0
[ 1777.667705]  __lock_acquire+0x21b3/0x2b80
[ 1777.672192]  lock_acquire+0x1db/0x8e0
[ 1777.676279]  ? cpu_partial_store+0x47/0xa0
[ 1777.680854]  ? rcu_read_unlock+0x50/0x50
[ 1777.685239]  ? sysfs_file_ops+0x160/0x160
[ 1777.689718]  cpus_read_lock+0x40/0xc0
[ 1777.693806]  ? cpu_partial_store+0x47/0xa0
[ 1777.698380]  cpu_partial_store+0x47/0xa0
[ 1777.702761]  slab_attr_store+0x20/0x30
[ 1777.706944]  kernfs_fop_write+0x255/0x410
[ 1777.711424]  vfs_write+0x157/0x460
[ 1777.715224]  ksys_write+0xb8/0x170
[ 1777.719023]  ? __ia32_sys_read+0xb0/0xb0
[ 1777.723406]  ? lockdep_hardirqs_on_prepare+0x294/0x3e0
[ 1777.729144]  ? do_syscall_64+0x22/0x450
[ 1777.733429]  do_syscall_64+0xa5/0x450
[ 1777.737518]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 1777.743159] RIP: 0033:0x7f4fe9b2ead8
[ 1777.747153] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00
00 f3 0f 1e fa 48 8d 05 45 4c 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 1777.768112] RSP: 002b:00007ffcc2173d48 EFLAGS: 00000246 ORIG_RAX:
0000000000000001
[ 1777.776565] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
00007f4fe9b2ead8
[ 1777.784530] RDX: 0000000000000002 RSI: 000055d41293eb70 RDI:
0000000000000001
[ 1777.792495] RBP: 000055d41293eb70 R08: 000000000000000a R09:
0000000000000003
[ 1777.800459] R10: 000000000000000a R11: 0000000000000246 R12:
00007f4fe9dcf6e0
[ 1777.808425] R13: 0000000000000002 R14: 00007f4fe9dca860 R15:
0000000000000002

The cpu_partial_store() method calls flush_all(), which takes the
cpu_hotplug_lock, completing the cycle.
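
To spell out the three edges of the cycle (paraphrasing the splat above):

  cpu_hotplug_lock --> slab_mutex:  kmem_cache_destroy() takes cpus_read_lock()
                                    and then mutex_lock(&slab_mutex)
  slab_mutex --> kn->count:         still under both locks, kobject_del() waits
                                    for the sysfs file's active references to drain
  kn->count --> cpu_hotplug_lock:   a write to <cache>/cpu_partial holds the active
                                    reference and calls flush_all(), which takes
                                    cpus_read_lock()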

Cheers,
Longman

2022-08-22 12:40:43

by Vlastimil Babka

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On 8/10/22 16:08, Waiman Long wrote:
> On 8/10/22 05:34, Vlastimil Babka wrote:
>> On 8/9/22 22:59, Waiman Long wrote:
>>> A circular locking problem is reported by lockdep due to the following
>>> circular locking dependency.
>>>
>>>    +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>>    |                                                         |
>>>    +---------------------------------------------------------+
>>
>> This sounded familiar and I've found a thread from January:
>>
>> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>>
>> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
>> mainline. Is it different this time? Can you share the splats?
>
> I think this is easier to reproduce on a RT kernel, but it also happens in a
> non-RT kernel. One example splat that I got was
>
> [ 1777.114757] ======================================================
> [ 1777.121646] WARNING: possible circular locking dependency detected
> [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
> [ 1777.134280] ------------------------------------------------------

Yeah that's non-RT, but still 4.18 kernel, as in Xin Long's thread
referenced above. That wasn't reproducible in current mainline and I would
expect yours also isn't, because it would be reported by others too.

Also in both cases lockdep (in 4.18) seems to have an issue with
cpus_read_lock(), which is a rwsem taken for read, so not really exclusive
in a way that could cause the reported deadlock.

So I suspected lockdep was improved since 4.18 to not report a false
positive, but we never confirmed.

So I'm still not convinced mainline needs any fix right now, but I'll check
the v3 anyway in case it simplifies things wrt the kernfs locking, which
does complicate stuff in any case.

> [ 1777.141176] slub_cpu_partia/4113 is trying to acquire lock:
> [ 1777.147395] ffffffffaef95b20 (cpu_hotplug_lock){++++}-{0:0}, at:
> cpu_partial_store+0x47/0xa0
> [ 1777.156837]
>                but task is already holding lock:
> [ 1777.163346] ffff88811bd17578 (kn->count#126){++++}-{0:0}, at:
> kernfs_fop_write+0x1d0/0x410
> [ 1777.172591]
>                which lock already depends on the new lock.
>
> [ 1777.181718]
>                the existing dependency chain (in reverse order) is:
> [ 1777.190070]
>                -> #2 (kn->count#126){++++}-{0:0}:
> [ 1777.196688]        lock_acquire+0x1db/0x8e0
> [ 1777.201360]        __kernfs_remove+0x617/0x800
> [ 1777.206320]        kernfs_remove+0x1d/0x30
> [ 1777.210891]        kobject_del+0x8e/0x150
> [ 1777.215360]        kmem_cache_destroy+0x10d/0x130
> [ 1777.220615]        bioset_exit+0x1e3/0x2c0
> [ 1777.225189]        blk_release_queue+0x20d/0x310
> [ 1777.230336]        kobject_release+0x105/0x390
> [ 1777.235296] scsi_device_dev_release_usercontext+0x5e3/0xd60
> [ 1777.242198]        execute_in_process_context+0x27/0x130
> [ 1777.248131]        device_release+0xa1/0x210
> [ 1777.252899]        kobject_release+0x105/0x390
> [ 1777.257857]        scsi_alloc_sdev+0xa1e/0xc80
> [ 1777.262820]        scsi_probe_and_add_lun+0x809/0x2da0
> [ 1777.268557]        __scsi_scan_target+0x1c1/0xb60
> [ 1777.273801]        scsi_scan_channel+0xe2/0x150
> [ 1777.278855]        scsi_scan_host_selected+0x1fb/0x2a0
> [ 1777.284590]        do_scan_async+0x3e/0x440
> [ 1777.289258]        async_run_entry_fn+0xd8/0x660
> [ 1777.294413]        process_one_work+0x919/0x17c0
> [ 1777.299568]        worker_thread+0x87/0xb40
> [ 1777.304227]        kthread+0x30d/0x3c0
> [ 1777.308411]        ret_from_fork+0x3a/0x50
> [ 1777.312984]
>                -> #1 (slab_mutex){+.+.}-{3:3}:
> [ 1777.319309]        lock_acquire+0x1db/0x8e0
> [ 1777.323978]        __mutex_lock+0x154/0x15d0
> [ 1777.328745]        kmem_cache_destroy+0x3f/0x130
> [ 1777.333898]        bioset_exit+0x1e3/0x2c0
> [ 1777.338469]        blk_release_queue+0x20d/0x310
> [ 1777.343623]        kobject_release+0x105/0x390
> [ 1777.348582] scsi_device_dev_release_usercontext+0x5e3/0xd60
> [ 1777.355481]        execute_in_process_context+0x27/0x130
> [ 1777.361409]        device_release+0xa1/0x210
> [ 1777.366175]        kobject_release+0x105/0x390
> [ 1777.371135]        scsi_alloc_sdev+0xa1e/0xc80
> [ 1777.376096]        scsi_probe_and_add_lun+0x809/0x2da0
> [ 1777.381824]        __scsi_scan_target+0x1c1/0xb60
> [ 1777.387073]        scsi_scan_channel+0xe2/0x150
> [ 1777.392128]        scsi_scan_host_selected+0x1fb/0x2a0
> [ 1777.397864]        do_scan_async+0x3e/0x440
> [ 1777.402533]        async_run_entry_fn+0xd8/0x660
> [ 1777.407688]        process_one_work+0x919/0x17c0
> [ 1777.412841]        worker_thread+0x87/0xb40
> [ 1777.417509]        kthread+0x30d/0x3c0
> [ 1777.421693]        ret_from_fork+0x3a/0x50
> [ 1777.426263]
>                -> #0 (cpu_hotplug_lock){++++}-{0:0}:
> [ 1777.433168]        check_prevs_add+0x3fa/0x18b0
> [ 1777.438226]        __lock_acquire+0x21b3/0x2b80
> [ 1777.443273]        lock_acquire+0x1db/0x8e0
> [ 1777.447940]        cpus_read_lock+0x40/0xc0
> [ 1777.452611]        cpu_partial_store+0x47/0xa0
> [ 1777.457571]        slab_attr_store+0x20/0x30
> [ 1777.462338]        kernfs_fop_write+0x255/0x410
> [ 1777.467394]        vfs_write+0x157/0x460
> [ 1777.471777]        ksys_write+0xb8/0x170
> [ 1777.476155]        do_syscall_64+0xa5/0x450
> [ 1777.480828]        entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> [ 1777.487041]
>                other info that might help us debug this:
>
> [ 1777.495975] Chain exists of:
>                  cpu_hotplug_lock --> slab_mutex --> kn->count#126
>
> [ 1777.507443]  Possible unsafe locking scenario:
>
> [ 1777.514049]        CPU0                    CPU1
> [ 1777.519102]        ----                    ----
> [ 1777.524155]   lock(kn->count#126);
> [ 1777.527956]                                lock(slab_mutex);
> [ 1777.534274] lock(kn->count#126);
> [ 1777.540885]   lock(cpu_hotplug_lock);
> [ 1777.544964]
>                 *** DEADLOCK ***
>
> [ 1777.551570] 3 locks held by slub_cpu_partia/4113:
> [ 1777.556820]  #0: ffff8881319884d0 (sb_writers#4){.+.+}-{0:0}, at:
> vfs_write+0x31c/0x460
> [ 1777.565771]  #1: ffff888161d39490 (&of->mutex){+.+.}-{3:3}, at:
> kernfs_fop_write+0x1ad/0x410
> [ 1777.575203]  #2: ffff88811bd17578 (kn->count#126){++++}-{0:0}, at:
> kernfs_fop_write+0x1d0/0x410
> [ 1777.584928]
>                stack backtrace:
> [ 1777.589792] CPU: 65 PID: 4113 Comm: slub_cpu_partia Kdump: loaded Not
> tainted 4.18.0-403.el8.x86_64+debug #1
> [ 1777.600767] Hardware name: Lenovo ThinkSystem SR635 -[7Y98XXXXXX]-/System
> Board, BIOS CFE103A 07/04/2019
> [ 1777.611342] Call Trace:
> [ 1777.614072]  dump_stack+0x5c/0x80
> [ 1777.617777]  check_noncircular+0x27f/0x320
> [ 1777.622352]  ? print_circular_bug+0x440/0x440
> [ 1777.627215]  ? kernel_text_address+0x125/0x140
> [ 1777.632170]  ? mark_lock.part.31+0xf4/0xd40
> [ 1777.636841]  ? is_dynamic_key+0x220/0x220
> [ 1777.641326]  check_prevs_add+0x3fa/0x18b0
> [ 1777.645798]  ? sched_clock+0x5/0x10
> [ 1777.649695]  ? sched_clock_cpu+0x18/0x1e0
> [ 1777.654175]  ? check_irq_usage+0xa20/0xa20
> [ 1777.658750]  ? find_held_lock+0x1c0/0x1c0
> [ 1777.663225]  ? sched_clock_cpu+0x18/0x1e0
> [ 1777.667705]  __lock_acquire+0x21b3/0x2b80
> [ 1777.672192]  lock_acquire+0x1db/0x8e0
> [ 1777.676279]  ? cpu_partial_store+0x47/0xa0
> [ 1777.680854]  ? rcu_read_unlock+0x50/0x50
> [ 1777.685239]  ? sysfs_file_ops+0x160/0x160
> [ 1777.689718]  cpus_read_lock+0x40/0xc0
> [ 1777.693806]  ? cpu_partial_store+0x47/0xa0
> [ 1777.698380]  cpu_partial_store+0x47/0xa0
> [ 1777.702761]  slab_attr_store+0x20/0x30
> [ 1777.706944]  kernfs_fop_write+0x255/0x410
> [ 1777.711424]  vfs_write+0x157/0x460
> [ 1777.715224]  ksys_write+0xb8/0x170
> [ 1777.719023]  ? __ia32_sys_read+0xb0/0xb0
> [ 1777.723406]  ? lockdep_hardirqs_on_prepare+0x294/0x3e0
> [ 1777.729144]  ? do_syscall_64+0x22/0x450
> [ 1777.733429]  do_syscall_64+0xa5/0x450
> [ 1777.737518]  entry_SYSCALL_64_after_hwframe+0x6a/0xdf
> [ 1777.743159] RIP: 0033:0x7f4fe9b2ead8
> [ 1777.747153] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00
> f3 0f 1e fa 48 8d 05 45 4c 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48>
> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
> [ 1777.768112] RSP: 002b:00007ffcc2173d48 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000001
> [ 1777.776565] RAX: ffffffffffffffda RBX: 0000000000000002 RCX:
> 00007f4fe9b2ead8
> [ 1777.784530] RDX: 0000000000000002 RSI: 000055d41293eb70 RDI:
> 0000000000000001
> [ 1777.792495] RBP: 000055d41293eb70 R08: 000000000000000a R09:
> 0000000000000003
> [ 1777.800459] R10: 000000000000000a R11: 0000000000000246 R12:
> 00007f4fe9dcf6e0
> [ 1777.808425] R13: 0000000000000002 R14: 00007f4fe9dca860 R15:
> 0000000000000002
>
> The cpu_partial_store() method call flush_all() which takes the
> cpu_hotplug_lock completing the cycle.
>
> Cheers,
> Longman
>

2022-08-22 13:50:36

by Hyeonggon Yoo

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On Mon, Aug 22, 2022 at 02:03:33PM +0200, Vlastimil Babka wrote:
> On 8/10/22 16:08, Waiman Long wrote:
> > On 8/10/22 05:34, Vlastimil Babka wrote:
> >> On 8/9/22 22:59, Waiman Long wrote:
> >>> A circular locking problem is reported by lockdep due to the following
> >>> circular locking dependency.
> >>>
> >>>    +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
> >>>    |                                                         |
> >>>    +---------------------------------------------------------+
> >>
> >> This sounded familiar and I've found a thread from January:
> >>
> >> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
> >>
> >> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
> >> mainline. Is it different this time? Can you share the splats?
> >
> > I think this is easier to reproduce on a RT kernel, but it also happens in a
> > non-RT kernel. One example splat that I got was
> >
> > [ 1777.114757] ======================================================
> > [ 1777.121646] WARNING: possible circular locking dependency detected
> > [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
> > [ 1777.134280] ------------------------------------------------------
>
> Yeah that's non-RT, but still 4.18 kernel, as in Xin Long's thread
> referenced above. That wasn't reproducible in current mainline and I would
> expect yours also isn't, because it would be reported by others too.

I can confirm this splat is reproducible on 6.0-rc1 when the conditions below are met:
1) Lockdep is enabled
2) kmem_cache_destroy() is executed at least once (e.g. loading the slub_kunit module)
3) flush_all() is executed at least once (e.g. writing to /sys/kernel/slab/<cache>/cpu_partial)
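
For (3), a minimal userspace trigger could look like the sketch below (run as
root; the kmalloc-64 cache name and the value written are just an example):

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        /* A successful write to cpu_partial makes the kernel call flush_all(). */
        int fd = open("/sys/kernel/slab/kmalloc-64/cpu_partial", O_WRONLY);

        if (fd < 0)
                return 1;
        if (write(fd, "0", 1) < 0) {
                close(fd);
                return 1;
        }
        close(fd);
        return 0;
}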

[ 51.408901] ======================================================
[ 51.409925] WARNING: possible circular locking dependency detected
[ 51.410940] 6.0.0-rc1+ #1554 Tainted: G B N
[ 51.411845] ------------------------------------------------------
[ 51.412855] sh/578 is trying to acquire lock:
[ 51.413578] ffffffff8307d890 (cpu_hotplug_lock){++++}-{0:0}, at: cpu_partial_store+0x56/0xa0
[ 51.414948]
[ 51.414948] but task is already holding lock:
[ 51.415908] ffff888103166740 (kn->active#43){++++}-{0:0}, at: kernfs_fop_write_iter+0x118/0x210
[ 51.417270]
[ 51.417270] which lock already depends on the new lock.
[ 51.417270]
[ 51.417825]
[ 51.417825] the existing dependency chain (in reverse order) is:
[ 51.418336]
[ 51.418336] -> #2 (kn->active#43){++++}-{0:0}:
[ 51.418744] __kernfs_remove+0x2fe/0x3b0
[ 51.419054] kernfs_remove+0x36/0x50
[ 51.419337] __kobject_del+0x2e/0xd0
[ 51.419622] kobject_del+0xf/0x20
[ 51.419912] kmem_cache_destroy+0xd8/0x110
[ 51.420273] test_exit+0x1a/0x30
[ 51.420570] kunit_try_run_case+0x65/0x80
[ 51.420926] kunit_generic_run_threadfn_adapter+0x16/0x30
[ 51.421391] kthread+0xf2/0x120
[ 51.421699] ret_from_fork+0x22/0x30
[ 51.422042]
[ 51.422042] -> #1 (slab_mutex){+.+.}-{3:3}:
[ 51.422513] __mutex_lock+0xa5/0x9a0
[ 51.422858] kmem_cache_destroy+0x26/0x110
[ 51.423241] test_exit+0x1a/0x30
[ 51.423557] kunit_try_run_case+0x65/0x80
[ 51.423938] kunit_generic_run_threadfn_adapter+0x16/0x30
[ 51.424432] kthread+0xf2/0x120
[ 51.424739] ret_from_fork+0x22/0x30
[ 51.425080]
[ 51.425080] -> #0 (cpu_hotplug_lock){++++}-{0:0}:
[ 51.425593] __lock_acquire+0x1122/0x1dd0
[ 51.425971] lock_acquire+0xd5/0x300
[ 51.426312] cpus_read_lock+0x3c/0xd0
[ 51.426660] cpu_partial_store+0x56/0xa0
[ 51.427030] slab_attr_store+0x20/0x40
[ 51.427388] kernfs_fop_write_iter+0x15b/0x210
[ 51.427798] vfs_write+0x346/0x4c0
[ 51.428128] ksys_write+0x65/0xf0
[ 51.428457] do_syscall_64+0x5c/0x90
[ 51.428801] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 51.429262]
[ 51.429262] other info that might help us debug this:
[ 51.429262]
[ 51.429918] Chain exists of:
[ 51.429918] cpu_hotplug_lock --> slab_mutex --> kn->active#43
[ 51.429918]
[ 51.430763] Possible unsafe locking scenario:
[ 51.430763]
[ 51.431251]        CPU0                    CPU1
[ 51.431629]        ----                    ----
[ 51.432008]   lock(kn->active#43);
[ 51.432293]                                lock(slab_mutex);
[ 51.432754]                                lock(kn->active#43);
[ 51.433219]   lock(cpu_hotplug_lock);
[ 51.433508]
[ 51.433508] *** DEADLOCK ***
[ 51.433508]
[ 51.433963] 3 locks held by sh/578:
[ 51.434224] #0: ffff88811a982460 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0x65/0xf0
[ 51.434800] #1: ffff88811a8ea888 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0x110/0x210
[ 51.435478] #2: ffff888103166740 (kn->active#43){++++}-{0:0}, at: kernfs_fop_write_iter+0x118/0x210
[ 51.436174]
[ 51.436174] stack backtrace:
[ 51.436539] CPU: 4 PID: 578 Comm: sh Tainted: G B N 6.0.0-rc1+ #1554
[ 51.437130] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 51.437803] Call Trace:
[ 51.438008] <TASK>
[ 51.438187] dump_stack_lvl+0x56/0x73
[ 51.438493] check_noncircular+0x102/0x120
[ 51.438831] __lock_acquire+0x1122/0x1dd0
[ 51.439162] lock_acquire+0xd5/0x300
[ 51.439444] ? cpu_partial_store+0x56/0xa0
[ 51.439773] cpus_read_lock+0x3c/0xd0
[ 51.440063] ? cpu_partial_store+0x56/0xa0
[ 51.440402] cpu_partial_store+0x56/0xa0
[ 51.440730] slab_attr_store+0x20/0x40
[ 51.441039] kernfs_fop_write_iter+0x15b/0x210
[ 51.441387] vfs_write+0x346/0x4c0
[ 51.441666] ksys_write+0x65/0xf0
[ 51.441942] do_syscall_64+0x5c/0x90
[ 51.442237] ? asm_exc_page_fault+0x22/0x30
[ 51.442578] ? lockdep_hardirqs_on+0x7d/0x100
[ 51.442931] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 51.443359] RIP: 0033:0x7f6428c9415c
[ 51.443645] Code: 8b 7c 24 08 89 c5 e8 c5 ff ff ff 89 ef 89 44 24 08 e8 e8 13 03 00 8b 44 24 08 43
[ 51.445133] RSP: 002b:00007fff556ee9d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 51.445742] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f6428c9415c
[ 51.446316] RDX: 0000000000000002 RSI: 000055e65f1c17f0 RDI: 0000000000000001
[ 51.446862] RBP: 000055e65f1c17f0 R08: fefefefefefefeff R09: fffffeffffff0000
[ 51.447415] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000002
[ 51.447994] R13: 00007f6428c79690 R14: 0000000000000001 R15: 00007fff556eea48
[ 51.448552] </TASK>

> Also in both cases the lockdep (in 4.18) seems to have issue with
> cpus_read_lock() which is a rwsem taken for read, so not really exclusive in
> order to cause the reported deadlock.

Agreed.

> So I suspected lockdep was improved since 4.18 to not report a false
> positive, but we never confirmed.

It seems lockdep was not improved, as it still reports this on 6.0-rc1.
Maybe fix lockdep instead of fixing SLUB?

> So I'm still not convinced mainline needs any fix right now, but I'll check
> the v3 anyway in case it simplifies things wrt the kernfs locking, which
> does complicate stuff in any case.
>
> > [ quoted lockdep splat trimmed -- it is the same one shown in full earlier in the thread ]
>

--
Thanks,
Hyeonggon

2022-08-22 18:06:24

by Vlastimil Babka

Subject: Re: [PATCH] mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock

On 8/22/22 15:46, Hyeonggon Yoo wrote:
> On Mon, Aug 22, 2022 at 02:03:33PM +0200, Vlastimil Babka wrote:
>> On 8/10/22 16:08, Waiman Long wrote:
>>> On 8/10/22 05:34, Vlastimil Babka wrote:
>>>> On 8/9/22 22:59, Waiman Long wrote:
>>>>> A circular locking problem is reported by lockdep due to the following
>>>>> circular locking dependency.
>>>>>
>>>>>    +--> cpu_hotplug_lock --> slab_mutex --> kn->active#126 --+
>>>>>    |                                                         |
>>>>>    +---------------------------------------------------------+
>>>>
>>>> This sounded familiar and I've found a thread from January:
>>>>
>>>> https://lore.kernel.org/all/388098b2c03fbf0a732834fc01b2d875c335bc49.1642170196.git.lucien.xin@gmail.com/
>>>>
>>>> But that seemed to be specific to RHEL-8 RT kernel and not reproduced with
>>>> mainline. Is it different this time? Can you share the splats?
>>>
>>> I think this is easier to reproduce on a RT kernel, but it also happens in a
>>> non-RT kernel. One example splat that I got was
>>>
>>> [ 1777.114757] ======================================================
>>> [ 1777.121646] WARNING: possible circular locking dependency detected
>>> [ 1777.128544] 4.18.0-403.el8.x86_64+debug #1 Not tainted
>>> [ 1777.134280] ------------------------------------------------------
>>
>> Yeah that's non-RT, but still 4.18 kernel, as in Xin Long's thread
>> referenced above. That wasn't reproducible in current mainline and I would
>> expect yours also isn't, because it would be reported by others too.
>
> I can confirm this splat is reproducible on 6.0-rc1 when conditions below are met:
> 1) Lockdep is enabled
> 2) kmem_cache_destroy() is executed at least once (e.g. loading slub_kunit module)
> 3) flush_all() is executed at least once (e.g. writing to /sys/kernel/<slab>/cpu_partial)

Oh, great, that's useful, thanks!

...

>
>> Also in both cases the lockdep (in 4.18) seems to have issue with
>> cpus_read_lock() which is a rwsem taken for read, so not really exclusive in
>> order to cause the reported deadlock.
>
> Agreed.
>
>> So I suspected lockdep was improved since 4.18 to not report a false
>> positive, but we never confirmed.
>
> Seems not improved as it reports on 6.0-rc1.
> May fix lockdep instead of fixing SLUB?

So after discussing with PeterZ, the lockdep splat is legitimate,
because there could be a writer waiting on the first reader to finish,
and in that case rwsems block further readers so they don't starve the
writer, and thus the deadlock could happen.
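
To spell that scenario out, one possible sequence is (an illustration, not an
actual trace):

  T1 (sysfs write):  holds kn->active for <cache>/cpu_partial, and flush_all()
                     calls cpus_read_lock(), which queues behind T3's pending writer
  T2 (destroy):      holds cpu_hotplug_lock for read plus slab_mutex, and
                     kobject_del() waits for T1 to drop kn->active
  T3 (cpu hotplug):  cpus_write_lock() waits for T2 to drop its read lock, and
                     blocks any new readers, including T1

T1 waits on T3, T3 waits on T2, and T2 waits on T1, so none of them can make
progress.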