When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
kunit test case cause a use-after-free error:
BUG: KASAN: use-after-free in kobject_del+0x14/0x30
Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
print_address_description.constprop.0+0x87/0x2a5
print_report+0x103/0x1ed
kasan_report+0xb7/0x140
kobject_del+0x14/0x30
kmem_cache_destroy+0x130/0x170
test_exit+0x1a/0x30
kunit_try_run_case+0xad/0xc0
kunit_generic_run_threadfn_adapter+0x26/0x50
kthread+0x17b/0x1b0
</TASK>
The cause is inside kmem_cache_destroy():
kmem_cache_destroy
acquire lock/mutex
shutdown_cache
schedule_work(kmem_cache_release) (if RCU flag set)
release lock/mutex
kmem_cache_release (if RCU flag set)
in some certain timing, the scheduled work could be run before
the next RCU flag checking which will get a wrong state.
Fix it by caching the RCU flag inside protected area, just like 'refcnt'
Signed-off-by: Feng Tang <[email protected]>
---
note:
The error only happens on linux-next tree, and not in Linus' tree,
which already has Waiman's commit:
0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
without holding slab_mutex/cpu_hotplug_lock")
mm/slab_common.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 07b948288f84..ccc02573588f 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
void kmem_cache_destroy(struct kmem_cache *s)
{
int refcnt;
+ bool rcu_set;
if (unlikely(!s) || !kasan_check_byte(s))
return;
@@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
cpus_read_lock();
mutex_lock(&slab_mutex);
+ rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
+
refcnt = --s->refcount;
if (refcnt)
goto out_unlock;
@@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
out_unlock:
mutex_unlock(&slab_mutex);
cpus_read_unlock();
- if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
+ if (!refcnt && !rcu_set)
kmem_cache_release(s);
}
EXPORT_SYMBOL(kmem_cache_destroy);
--
2.34.1
On 9/19/22 05:12, Feng Tang wrote:
> When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> kunit test case cause a use-after-free error:
>
> BUG: KASAN: use-after-free in kobject_del+0x14/0x30
> Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
>
> CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x34/0x48
> print_address_description.constprop.0+0x87/0x2a5
> print_report+0x103/0x1ed
> kasan_report+0xb7/0x140
> kobject_del+0x14/0x30
> kmem_cache_destroy+0x130/0x170
> test_exit+0x1a/0x30
> kunit_try_run_case+0xad/0xc0
> kunit_generic_run_threadfn_adapter+0x26/0x50
> kthread+0x17b/0x1b0
> </TASK>
>
> The cause is inside kmem_cache_destroy():
>
> kmem_cache_destroy
> acquire lock/mutex
> shutdown_cache
> schedule_work(kmem_cache_release) (if RCU flag set)
> release lock/mutex
> kmem_cache_release (if RCU flag set)
^ not set
I've fixed that up.
>
> in some certain timing, the scheduled work could be run before
> the next RCU flag checking which will get a wrong state.
>
> Fix it by caching the RCU flag inside protected area, just like 'refcnt'
>
> Signed-off-by: Feng Tang <[email protected]>
Thanks!
> ---
>
> note:
>
> The error only happens on linux-next tree, and not in Linus' tree,
> which already has Waiman's commit:
> 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> without holding slab_mutex/cpu_hotplug_lock")
Actually that commit is already in Linus' rc5 too, so I will send your fix
this week too. Added a Fixes: 0495e337b703 (...) too.
> mm/slab_common.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 07b948288f84..ccc02573588f 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> int refcnt;
> + bool rcu_set;
>
> if (unlikely(!s) || !kasan_check_byte(s))
> return;
> @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> cpus_read_lock();
> mutex_lock(&slab_mutex);
>
> + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> +
> refcnt = --s->refcount;
> if (refcnt)
> goto out_unlock;
> @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> out_unlock:
> mutex_unlock(&slab_mutex);
> cpus_read_unlock();
> - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
> + if (!refcnt && !rcu_set)
> kmem_cache_release(s);
> }
> EXPORT_SYMBOL(kmem_cache_destroy);
On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
> On 9/19/22 05:12, Feng Tang wrote:
> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> > kunit test case cause a use-after-free error:
> >
If I'm not mistaken, I think the subject should be:
s/double free/use after free/g
> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30
> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
> >
> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x34/0x48
> > print_address_description.constprop.0+0x87/0x2a5
> > print_report+0x103/0x1ed
> > kasan_report+0xb7/0x140
> > kobject_del+0x14/0x30
> > kmem_cache_destroy+0x130/0x170
> > test_exit+0x1a/0x30
> > kunit_try_run_case+0xad/0xc0
> > kunit_generic_run_threadfn_adapter+0x26/0x50
> > kthread+0x17b/0x1b0
> > </TASK>
> >
> > The cause is inside kmem_cache_destroy():
> >
> > kmem_cache_destroy
> > acquire lock/mutex
> > shutdown_cache
> > schedule_work(kmem_cache_release) (if RCU flag set)
> > release lock/mutex
> > kmem_cache_release (if RCU flag set)
>
> ^ not set
>
> I've fixed that up.
>
> >
> > in some certain timing, the scheduled work could be run before
> > the next RCU flag checking which will get a wrong state.
> >
> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
Very nice catch, thanks!
Otherwise (and with Vlastimil's fix):
Looks good to me.
Reviewed-by: Hyeonggon Yoo <[email protected]>
> >
> > Signed-off-by: Feng Tang <[email protected]>
>
> Thanks!
>
> > ---
> >
> > note:
> >
> > The error only happens on linux-next tree, and not in Linus' tree,
> > which already has Waiman's commit:
> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> > without holding slab_mutex/cpu_hotplug_lock")
>
> Actually that commit is already in Linus' rc5 too, so I will send your fix
> this week too. Added a Fixes: 0495e337b703 (...) too.
>
> > mm/slab_common.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 07b948288f84..ccc02573588f 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
> > void kmem_cache_destroy(struct kmem_cache *s)
> > {
> > int refcnt;
> > + bool rcu_set;
> >
> > if (unlikely(!s) || !kasan_check_byte(s))
> > return;
> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > cpus_read_lock();
> > mutex_lock(&slab_mutex);
> >
> > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> > +
> > refcnt = --s->refcount;
> > if (refcnt)
> > goto out_unlock;
> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > out_unlock:
> > mutex_unlock(&slab_mutex);
> > cpus_read_unlock();
> > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
> > + if (!refcnt && !rcu_set)
> > kmem_cache_release(s);
> > }
> > EXPORT_SYMBOL(kmem_cache_destroy);
>
--
Thanks,
Hyeonggon
On Mon, Sep 19, 2022 at 02:03:15PM +0200, Vlastimil Babka wrote:
> On 9/19/22 13:56, Hyeonggon Yoo wrote:
> > On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
> >> On 9/19/22 05:12, Feng Tang wrote:
> >> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> >> > kunit test case cause a use-after-free error:
> >> >
> >
> > If I'm not mistaken, I think the subject should be:
> > s/double free/use after free/g
>
> Well, it's both AFAICS. By the initial use-after-free we can read a wrong
> s->flags that was modified since we freed for the first time, and it can
> lead to another kmem_cache_release() which is basically a double free.
>
Yeah, I realized that just after sending the mail ;)
it is use-after-free bug that can potentially lead to double free.
Thank you for correction!
> >> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30
> >> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
> >> >
> >> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> >> > Call Trace:
> >> > <TASK>
> >> > dump_stack_lvl+0x34/0x48
> >> > print_address_description.constprop.0+0x87/0x2a5
> >> > print_report+0x103/0x1ed
> >> > kasan_report+0xb7/0x140
> >> > kobject_del+0x14/0x30
> >> > kmem_cache_destroy+0x130/0x170
> >> > test_exit+0x1a/0x30
> >> > kunit_try_run_case+0xad/0xc0
> >> > kunit_generic_run_threadfn_adapter+0x26/0x50
> >> > kthread+0x17b/0x1b0
> >> > </TASK>
> >> >
> >> > The cause is inside kmem_cache_destroy():
> >> >
> >> > kmem_cache_destroy
> >> > acquire lock/mutex
> >> > shutdown_cache
> >> > schedule_work(kmem_cache_release) (if RCU flag set)
> >> > release lock/mutex
> >> > kmem_cache_release (if RCU flag set)
> >>
> >> ^ not set
> >>
> >> I've fixed that up.
> >>
> >> >
> >> > in some certain timing, the scheduled work could be run before
> >> > the next RCU flag checking which will get a wrong state.
> >> >
> >> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
> >
> > Very nice catch, thanks!
> >
> > Otherwise (and with Vlastimil's fix):
> >
> > Looks good to me.
> > Reviewed-by: Hyeonggon Yoo <[email protected]>
> >
> >> >
> >> > Signed-off-by: Feng Tang <[email protected]>
> >>
> >> Thanks!
> >>
> >> > ---
> >> >
> >> > note:
> >> >
> >> > The error only happens on linux-next tree, and not in Linus' tree,
> >> > which already has Waiman's commit:
> >> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> >> > without holding slab_mutex/cpu_hotplug_lock")
> >>
> >> Actually that commit is already in Linus' rc5 too, so I will send your fix
> >> this week too. Added a Fixes: 0495e337b703 (...) too.
> >>
> >> > mm/slab_common.c | 5 ++++-
> >> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> > index 07b948288f84..ccc02573588f 100644
> >> > --- a/mm/slab_common.c
> >> > +++ b/mm/slab_common.c
> >> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
> >> > void kmem_cache_destroy(struct kmem_cache *s)
> >> > {
> >> > int refcnt;
> >> > + bool rcu_set;
> >> >
> >> > if (unlikely(!s) || !kasan_check_byte(s))
> >> > return;
> >> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> > cpus_read_lock();
> >> > mutex_lock(&slab_mutex);
> >> >
> >> > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> >> > +
> >> > refcnt = --s->refcount;
> >> > if (refcnt)
> >> > goto out_unlock;
> >> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> > out_unlock:
> >> > mutex_unlock(&slab_mutex);
> >> > cpus_read_unlock();
> >> > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
> >> > + if (!refcnt && !rcu_set)
> >> > kmem_cache_release(s);
> >> > }
> >> > EXPORT_SYMBOL(kmem_cache_destroy);
> >>
> >
>
--
Thanks,
Hyeonggon
On 9/19/22 13:56, Hyeonggon Yoo wrote:
> On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
>> On 9/19/22 05:12, Feng Tang wrote:
>> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
>> > kunit test case cause a use-after-free error:
>> >
>
> If I'm not mistaken, I think the subject should be:
> s/double free/use after free/g
Well, it's both AFAICS. By the initial use-after-free we can read a wrong
s->flags that was modified since we freed for the first time, and it can
lead to another kmem_cache_release() which is basically a double free.
>> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30
>> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
>> >
>> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>> > Call Trace:
>> > <TASK>
>> > dump_stack_lvl+0x34/0x48
>> > print_address_description.constprop.0+0x87/0x2a5
>> > print_report+0x103/0x1ed
>> > kasan_report+0xb7/0x140
>> > kobject_del+0x14/0x30
>> > kmem_cache_destroy+0x130/0x170
>> > test_exit+0x1a/0x30
>> > kunit_try_run_case+0xad/0xc0
>> > kunit_generic_run_threadfn_adapter+0x26/0x50
>> > kthread+0x17b/0x1b0
>> > </TASK>
>> >
>> > The cause is inside kmem_cache_destroy():
>> >
>> > kmem_cache_destroy
>> > acquire lock/mutex
>> > shutdown_cache
>> > schedule_work(kmem_cache_release) (if RCU flag set)
>> > release lock/mutex
>> > kmem_cache_release (if RCU flag set)
>>
>> ^ not set
>>
>> I've fixed that up.
>>
>> >
>> > in some certain timing, the scheduled work could be run before
>> > the next RCU flag checking which will get a wrong state.
>> >
>> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
>
> Very nice catch, thanks!
>
> Otherwise (and with Vlastimil's fix):
>
> Looks good to me.
> Reviewed-by: Hyeonggon Yoo <[email protected]>
>
>> >
>> > Signed-off-by: Feng Tang <[email protected]>
>>
>> Thanks!
>>
>> > ---
>> >
>> > note:
>> >
>> > The error only happens on linux-next tree, and not in Linus' tree,
>> > which already has Waiman's commit:
>> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
>> > without holding slab_mutex/cpu_hotplug_lock")
>>
>> Actually that commit is already in Linus' rc5 too, so I will send your fix
>> this week too. Added a Fixes: 0495e337b703 (...) too.
>>
>> > mm/slab_common.c | 5 ++++-
>> > 1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slab_common.c b/mm/slab_common.c
>> > index 07b948288f84..ccc02573588f 100644
>> > --- a/mm/slab_common.c
>> > +++ b/mm/slab_common.c
>> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
>> > void kmem_cache_destroy(struct kmem_cache *s)
>> > {
>> > int refcnt;
>> > + bool rcu_set;
>> >
>> > if (unlikely(!s) || !kasan_check_byte(s))
>> > return;
>> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> > cpus_read_lock();
>> > mutex_lock(&slab_mutex);
>> >
>> > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
>> > +
>> > refcnt = --s->refcount;
>> > if (refcnt)
>> > goto out_unlock;
>> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> > out_unlock:
>> > mutex_unlock(&slab_mutex);
>> > cpus_read_unlock();
>> > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
>> > + if (!refcnt && !rcu_set)
>> > kmem_cache_release(s);
>> > }
>> > EXPORT_SYMBOL(kmem_cache_destroy);
>>
>
On Mon, Sep 19, 2022 at 05:12:38PM +0800, Vlastimil Babka wrote:
> On 9/19/22 05:12, Feng Tang wrote:
[...]
> > The cause is inside kmem_cache_destroy():
> >
> > kmem_cache_destroy
> > acquire lock/mutex
> > shutdown_cache
> > schedule_work(kmem_cache_release) (if RCU flag set)
> > release lock/mutex
> > kmem_cache_release (if RCU flag set)
>
> ^ not set
>
> I've fixed that up.
Oops.. Thanks for catching it!
> >
> > in some certain timing, the scheduled work could be run before
> > the next RCU flag checking which will get a wrong state.
> >
> > Fix it by caching the RCU flag inside protected area, just like 'refcnt'
> >
> > Signed-off-by: Feng Tang <[email protected]>
>
> Thanks!
>
> > ---
> >
> > note:
> >
> > The error only happens on linux-next tree, and not in Linus' tree,
> > which already has Waiman's commit:
> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> > without holding slab_mutex/cpu_hotplug_lock")
>
> Actually that commit is already in Linus' rc5 too, so I will send your fix
> this week too. Added a Fixes: 0495e337b703 (...) too.
Got it, thanks
- Feng
On 9/19/22 08:50, Feng Tang wrote:
> On Mon, Sep 19, 2022 at 05:12:38PM +0800, Vlastimil Babka wrote:
>> On 9/19/22 05:12, Feng Tang wrote:
> [...]
>>> The cause is inside kmem_cache_destroy():
>>>
>>> kmem_cache_destroy
>>> acquire lock/mutex
>>> shutdown_cache
>>> schedule_work(kmem_cache_release) (if RCU flag set)
>>> release lock/mutex
>>> kmem_cache_release (if RCU flag set)
>> ^ not set
>>
>> I've fixed that up.
> Oops.. Thanks for catching it!
>
>>> in some certain timing, the scheduled work could be run before
>>> the next RCU flag checking which will get a wrong state.
>>>
>>> Fix it by caching the RCU flag inside protected area, just like 'refcnt'
>>>
>>> Signed-off-by: Feng Tang <[email protected]>
>> Thanks!
>>
>>> ---
>>>
>>> note:
>>>
>>> The error only happens on linux-next tree, and not in Linus' tree,
>>> which already has Waiman's commit:
>>> 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
>>> without holding slab_mutex/cpu_hotplug_lock")
>> Actually that commit is already in Linus' rc5 too, so I will send your fix
>> this week too. Added a Fixes: 0495e337b703 (...) too.
> Got it, thanks
>
> - Feng
Thanks for catching this bug.
Reviewed-by: Waiman Long <[email protected]>
Cheers,
Longman