2009/8/17 Patrick McHardy <[email protected]>:
> Eric Dumazet wrote:
>> Zdenek Kabelac a ?crit :
>>> ?[<ffffffffa02c502f>] nf_conntrack_ftp_fini+0x2f/0x70 [nf_conntrack_ftp]
>>> ?[<ffffffff8027bcc5>] sys_delete_module+0x1a5/0x270
>>> ?[<ffffffff8020d329>] ? retint_swapgs+0xe/0x13
>>> ?[<ffffffff80271bf2>] ? trace_hardirqs_on_caller+0x162/0x1b0
>>> ?[<ffffffff80292121>] ? audit_syscall_entry+0x191/0x1c0
>>> ?[<ffffffff80526dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>> ?[<ffffffff8020c84b>] system_call_fastpath+0x16/0x1b
>>> Code: c6 00 00 0f 82 66 ff ff ff 49 8b 9e d8 05 00 00 48 85 db 75 16
>>> e9 8e 00 00 00 0f 1f 44 00 00 48 85 c0 0f 84 80 00 00 00 48 89 c3 <0f>
>>> b6 4b 37 48 8b 03 48 8d 14 cd 00 00 00 00 0f 18 08 48 29 ca
>>> RIP ?[<ffffffffa02b2c2c>] nf_conntrack_helper_unregister+0x16c/0x320
>>> [nf_conntrack]
>>> ?RSP <ffff88013982fe68>
>>> CR2: 0000000000000038
>>> ---[ end trace bc3a0ede3d0084db ]---
>>>
>> I am currently traveling and wont be able to help you before next week.
>>
>> I added netdev, Patrick, and netfilter-devel in CC so that more eyes can take a look.
>
> Thanks for the report, I'll have a look at this. Zdenek, please
> send me the nf_conntrack.ko file used in the above oops. Thanks.
>
Ok
I've found the solution for my problem.
http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483
I've made this small fix from this thread:
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
index b5869b9..68488f8 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1108,6 +1108,7 @@ static void nf_conntrack_cleanup_init_net(void)
{
nf_conntrack_helper_fini();
nf_conntrack_proto_fini();
+ rcu_barrier();
kmem_cache_destroy(nf_conntrack_cachep);
}
@@ -1266,7 +1267,7 @@ static int nf_conntrack_init_init_net(void)
nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
sizeof(struct nf_conn),
- 0, SLAB_DESTROY_BY_RCU, NULL);
+ 0, 0, NULL);
if (!nf_conntrack_cachep) {
printk(KERN_ERR "Unable to create nf_conn slab cache\n");
ret = -ENOMEM;
As the thread nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags
seems to be samewhat 'unfinished' and already a bit old and I've no
idea whether it actually fixes problem completely or just hides it in
my case - I'm leaving it to some RCU gurus to fix this issue.
All I could say is - this this extra rcu_barrier() and removal of
SLAB_DESTROY removes my GPF on reboot.
Zdenek
Zdenek Kabelac a ?crit :
> 2009/8/17 Patrick McHardy <[email protected]>:
>> Eric Dumazet wrote:
>>> Zdenek Kabelac a ?crit :
>>>> [<ffffffffa02c502f>] nf_conntrack_ftp_fini+0x2f/0x70 [nf_conntrack_ftp]
>>>> [<ffffffff8027bcc5>] sys_delete_module+0x1a5/0x270
>>>> [<ffffffff8020d329>] ? retint_swapgs+0xe/0x13
>>>> [<ffffffff80271bf2>] ? trace_hardirqs_on_caller+0x162/0x1b0
>>>> [<ffffffff80292121>] ? audit_syscall_entry+0x191/0x1c0
>>>> [<ffffffff80526dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>>> [<ffffffff8020c84b>] system_call_fastpath+0x16/0x1b
>>>> Code: c6 00 00 0f 82 66 ff ff ff 49 8b 9e d8 05 00 00 48 85 db 75 16
>>>> e9 8e 00 00 00 0f 1f 44 00 00 48 85 c0 0f 84 80 00 00 00 48 89 c3 <0f>
>>>> b6 4b 37 48 8b 03 48 8d 14 cd 00 00 00 00 0f 18 08 48 29 ca
>>>> RIP [<ffffffffa02b2c2c>] nf_conntrack_helper_unregister+0x16c/0x320
>>>> [nf_conntrack]
>>>> RSP <ffff88013982fe68>
>>>> CR2: 0000000000000038
>>>> ---[ end trace bc3a0ede3d0084db ]---
>>>>
>>> I am currently traveling and wont be able to help you before next week.
>>>
>>> I added netdev, Patrick, and netfilter-devel in CC so that more eyes can take a look.
>> Thanks for the report, I'll have a look at this. Zdenek, please
>> send me the nf_conntrack.ko file used in the above oops. Thanks.
>>
>
> Ok
>
> I've found the solution for my problem.
>
> http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483
>
> I've made this small fix from this thread:
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
> index b5869b9..68488f8 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1108,6 +1108,7 @@ static void nf_conntrack_cleanup_init_net(void)
> {
> nf_conntrack_helper_fini();
> nf_conntrack_proto_fini();
> + rcu_barrier();
> kmem_cache_destroy(nf_conntrack_cachep);
> }
>
> @@ -1266,7 +1267,7 @@ static int nf_conntrack_init_init_net(void)
>
> nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> sizeof(struct nf_conn),
> - 0, SLAB_DESTROY_BY_RCU, NULL);
> + 0, 0, NULL);
> if (!nf_conntrack_cachep) {
> printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> ret = -ENOMEM;
>
>
> As the thread nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags
> seems to be samewhat 'unfinished' and already a bit old and I've no
> idea whether it actually fixes problem completely or just hides it in
> my case - I'm leaving it to some RCU gurus to fix this issue.
>
> All I could say is - this this extra rcu_barrier() and removal of
> SLAB_DESTROY removes my GPF on reboot.
>
> Zdenek
Ouch..
Dont think such a patch makes your kernel better, it'll crash too.
You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.
2009/9/3 Eric Dumazet <[email protected]>:
> Zdenek Kabelac a ?crit :
>> 2009/8/17 Patrick McHardy <[email protected]>:
>>> Eric Dumazet wrote:
>>>> Zdenek Kabelac a ?crit :
>>>>> ?[<ffffffffa02c502f>] nf_conntrack_ftp_fini+0x2f/0x70 [nf_conntrack_ftp]
>>>>> ?[<ffffffff8027bcc5>] sys_delete_module+0x1a5/0x270
>>>>> ?[<ffffffff8020d329>] ? retint_swapgs+0xe/0x13
>>>>> ?[<ffffffff80271bf2>] ? trace_hardirqs_on_caller+0x162/0x1b0
>>>>> ?[<ffffffff80292121>] ? audit_syscall_entry+0x191/0x1c0
>>>>> ?[<ffffffff80526dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>>>>> ?[<ffffffff8020c84b>] system_call_fastpath+0x16/0x1b
>>>>> Code: c6 00 00 0f 82 66 ff ff ff 49 8b 9e d8 05 00 00 48 85 db 75 16
>>>>> e9 8e 00 00 00 0f 1f 44 00 00 48 85 c0 0f 84 80 00 00 00 48 89 c3 <0f>
>>>>> b6 4b 37 48 8b 03 48 8d 14 cd 00 00 00 00 0f 18 08 48 29 ca
>>>>> RIP ?[<ffffffffa02b2c2c>] nf_conntrack_helper_unregister+0x16c/0x320
>>>>> [nf_conntrack]
>>>>> ?RSP <ffff88013982fe68>
>>>>> CR2: 0000000000000038
>>>>> ---[ end trace bc3a0ede3d0084db ]---
>>>>>
>>>> I am currently traveling and wont be able to help you before next week.
>>>>
>>>> I added netdev, Patrick, and netfilter-devel in CC so that more eyes can take a look.
>>> Thanks for the report, I'll have a look at this. Zdenek, please
>>> send me the nf_conntrack.ko file used in the above oops. Thanks.
>>>
>>
>> Ok
>>
>> I've found the solution for my problem.
>>
>> http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483
>>
>> I've made this small fix from this thread:
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
>> index b5869b9..68488f8 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -1108,6 +1108,7 @@ static void nf_conntrack_cleanup_init_net(void)
>> ?{
>> ? ? ? ? nf_conntrack_helper_fini();
>> ? ? ? ? nf_conntrack_proto_fini();
>> + ? ? ? rcu_barrier();
>> ? ? ? ? kmem_cache_destroy(nf_conntrack_cachep);
>> ?}
>>
>> @@ -1266,7 +1267,7 @@ static int nf_conntrack_init_init_net(void)
>>
>> ? ? ? ? nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? sizeof(struct nf_conn),
>> - ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0, SLAB_DESTROY_BY_RCU, NULL);
>> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0, 0, NULL);
>> ? ? ? ? if (!nf_conntrack_cachep) {
>> ? ? ? ? ? ? ? ? printk(KERN_ERR "Unable to create nf_conn slab cache\n");
>> ? ? ? ? ? ? ? ? ret = -ENOMEM;
>>
>>
>> As the thread nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags
>> seems to be samewhat 'unfinished' ?and already a bit old and I've no
>> idea whether it actually fixes problem completely or just hides it in
>> my case - I'm leaving it to some RCU gurus to fix this issue.
>>
>> All I could say is - this this extra rcu_barrier() and removal of
>> SLAB_DESTROY removes my GPF on reboot.
>>
>> Zdenek
>
> Ouch..
>
> Dont think such a patch makes your kernel better, it'll crash too.
>
> You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.
>
Well I'm not noticing any ill behavior - also note - rcu_barrier() is
there before the cache is destroyed.
But as I said - it's just my shot into the dark - which seems to work for me...
Zdenek
Zdenek Kabelac a ?crit :
>
> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
> there before the cache is destroyed.
> But as I said - it's just my shot into the dark - which seems to work for me...
>
Reading again your traces, I do believe there are two bugs in slub
Maybe not explaining your problem, but worth to fix !
Thank you
[PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
slab padding, we call restore_bytes() on the whole slab, not only
on the padding.
kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
and *before* sysfs_slab_remove() or risk rcu_free_slab()
being called after kmem_cache is deleted (kfreed).
rmmod nf_conntrack can crash the machine because it has to
kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
Reported-by: Zdenek Kabelac <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/mm/slub.c b/mm/slub.c
index b9f1491..0ac839f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
print_section("Padding", end - remainder, remainder);
- restore_bytes(s, "slab padding", POISON_INUSE, start, end);
+ restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
return 0;
}
@@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
*/
void kmem_cache_destroy(struct kmem_cache *s)
{
- if (s->flags & SLAB_DESTROY_BY_RCU)
- rcu_barrier();
down_write(&slub_lock);
s->refcount--;
if (!s->refcount) {
@@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
"still has objects.\n", s->name, __func__);
dump_stack();
}
+ if (s->flags & SLAB_DESTROY_BY_RCU)
+ rcu_barrier();
sysfs_slab_remove(s);
} else
up_write(&slub_lock);
On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<[email protected]> wrote:
> Zdenek Kabelac a ?crit :
>>
>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>> there before the cache is destroyed.
>> But as I said - it's just my shot into the dark - which seems to work for me...
>>
>
> Reading again your traces, I do believe there are two bugs in slub
>
> Maybe not explaining your problem, but worth to fix !
>
> Thank you
>
> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>
> When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
> slab padding, we call restore_bytes() on the whole slab, not only
> on the padding.
>
> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
> and *before* sysfs_slab_remove() or risk rcu_free_slab()
> being called after kmem_cache is deleted (kfreed).
>
> rmmod nf_conntrack can crash the machine because it has to
> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
>
> Reported-by: Zdenek Kabelac <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> ---
> diff --git a/mm/slub.c b/mm/slub.c
> index b9f1491..0ac839f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
> ? ? ? ?slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
> ? ? ? ?print_section("Padding", end - remainder, remainder);
>
> - ? ? ? restore_bytes(s, "slab padding", POISON_INUSE, start, end);
> + ? ? ? restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
OK, makes sense.
> ? ? ? ?return 0;
> ?}
>
> @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
> ?*/
> ?void kmem_cache_destroy(struct kmem_cache *s)
> ?{
> - ? ? ? if (s->flags & SLAB_DESTROY_BY_RCU)
> - ? ? ? ? ? ? ? rcu_barrier();
> ? ? ? ?down_write(&slub_lock);
> ? ? ? ?s->refcount--;
> ? ? ? ?if (!s->refcount) {
> @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"still has objects.\n", s->name, __func__);
> ? ? ? ? ? ? ? ? ? ? ? ?dump_stack();
> ? ? ? ? ? ? ? ?}
> + ? ? ? ? ? ? ? if (s->flags & SLAB_DESTROY_BY_RCU)
> + ? ? ? ? ? ? ? ? ? ? ? rcu_barrier();
> ? ? ? ? ? ? ? ?sysfs_slab_remove(s);
> ? ? ? ?} else
> ? ? ? ? ? ? ? ?up_write(&slub_lock);
The rcu_barrier() call was added by this commit:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
I guess we should CC Paul as well.
Pekka Enberg a ?crit :
> On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<[email protected]> wrote:
>> Zdenek Kabelac a ?crit :
>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>> there before the cache is destroyed.
>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>
>> Reading again your traces, I do believe there are two bugs in slub
>>
>> Maybe not explaining your problem, but worth to fix !
>>
>> Thank you
>>
>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>
>> When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
>> slab padding, we call restore_bytes() on the whole slab, not only
>> on the padding.
>>
>> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
>> and *before* sysfs_slab_remove() or risk rcu_free_slab()
>> being called after kmem_cache is deleted (kfreed).
>>
>> rmmod nf_conntrack can crash the machine because it has to
>> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
>>
>> Reported-by: Zdenek Kabelac <[email protected]>
>> Signed-off-by: Eric Dumazet <[email protected]>
>> ---
>> diff --git a/mm/slub.c b/mm/slub.c
>> index b9f1491..0ac839f 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
>> slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
>> print_section("Padding", end - remainder, remainder);
>>
>> - restore_bytes(s, "slab padding", POISON_INUSE, start, end);
>> + restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
>
> OK, makes sense.
>
>> return 0;
>> }
>>
>> @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
>> */
>> void kmem_cache_destroy(struct kmem_cache *s)
>> {
>> - if (s->flags & SLAB_DESTROY_BY_RCU)
>> - rcu_barrier();
>> down_write(&slub_lock);
>> s->refcount--;
>> if (!s->refcount) {
>> @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> "still has objects.\n", s->name, __func__);
>> dump_stack();
>> }
>> + if (s->flags & SLAB_DESTROY_BY_RCU)
>> + rcu_barrier();
>> sysfs_slab_remove(s);
>> } else
>> up_write(&slub_lock);
>
> The rcu_barrier() call was added by this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
>
> I guess we should CC Paul as well.
Sure !
rcu_barrier() is definitly better than synchronize_rcu() in
kmem_cache_destroy()
But its location was not really right (for SLUB at least)
SLAB_DESTROY_BY_RCU means subsystem will call kfree(elems) without waiting RCU
grace period.
By the time subsystem calls kmem_cache_destroy(), all previously allocated
elems must have already be kfreed() by this subsystem.
We must however wait that all slabs, queued for freeing by rcu_free_slab(),
are indeed freed, since this freeing needs access to kmem_cache pointer.
As kmem_cache_close() might clean/purge the cache and call rcu_free_slab(),
we must call rcu_barrier() *after* kmem_cache_close(), and before kfree(kmem_cache *s)
Alternatively we could delay this final kfree(s) (with call_rcu()) but would
have to copy s->name in kmem_cache_create() instead of keeping a pointer to
a string that might be in a module, and freed at rmmod time.
Given that there is few uses in current tree that call kmem_cache_destroy()
on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
rcu_barrier() call, unless we want superfast reboot/halt sequences...
Hi Eric,
On Thu, Sep 3, 2009 at 10:38 AM, Eric Dumazet<[email protected]> wrote:
>> The rcu_barrier() call was added by this commit:
>>
>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
>>
>> I guess we should CC Paul as well.
>
> Sure !
>
> rcu_barrier() is definitly better than synchronize_rcu() in
> kmem_cache_destroy()
>
> But its location was not really right (for SLUB at least)
>
> SLAB_DESTROY_BY_RCU means subsystem will call kfree(elems) without waiting RCU
> grace period.
>
> By the time subsystem calls kmem_cache_destroy(), all previously allocated
> elems must have already be kfreed() by this subsystem.
>
> We must however wait that all slabs, queued for freeing by rcu_free_slab(),
> are indeed freed, since this freeing needs access to kmem_cache pointer.
>
> As kmem_cache_close() might clean/purge the cache and call rcu_free_slab(),
> we must call rcu_barrier() *after* kmem_cache_close(), and before kfree(kmem_cache *s)
>
> Alternatively we could delay this final kfree(s) (with call_rcu()) but would
> have to copy s->name in kmem_cache_create() instead of keeping a pointer to
> ?a string that might be in a module, and freed at rmmod time.
>
> Given that there is few uses in current tree that call kmem_cache_destroy()
> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> rcu_barrier() call, unless we want superfast reboot/halt sequences...
Oh, sure, the fix looks sane to me. It's just that I am a complete
coward when it comes to merging RCU related patches so I always try to
fish an Acked-by from Paul or Christoph ;).
Pekka
2009/9/3 Eric Dumazet <[email protected]>:
> Zdenek Kabelac a ?crit :
>>
>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>> there before the cache is destroyed.
>> But as I said - it's just my shot into the dark - which seems to work for me...
>>
>
> Reading again your traces, I do believe there are two bugs in slub
>
> Maybe not explaining your problem, but worth to fix !
>
> Thank you
>
> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>
Ok - I can confirm that this patch fixes my reboot problem.
Thanks
Zdenek
On Thu, Sep 03, 2009 at 09:31:01AM +0300, Pekka Enberg wrote:
> On Thu, Sep 3, 2009 at 4:04 AM, Eric Dumazet<[email protected]> wrote:
> > Zdenek Kabelac a ?crit :
> >>
> >> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
> >> there before the cache is destroyed.
> >> But as I said - it's just my shot into the dark - which seems to work for me...
> >>
> >
> > Reading again your traces, I do believe there are two bugs in slub
> >
> > Maybe not explaining your problem, but worth to fix !
> >
> > Thank you
> >
> > [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
> >
> > When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
> > slab padding, we call restore_bytes() on the whole slab, not only
> > on the padding.
> >
> > kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
> > and *before* sysfs_slab_remove() or risk rcu_free_slab()
> > being called after kmem_cache is deleted (kfreed).
> >
> > rmmod nf_conntrack can crash the machine because it has to
> > kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
> >
> > Reported-by: Zdenek Kabelac <[email protected]>
> > Signed-off-by: Eric Dumazet <[email protected]>
> > ---
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b9f1491..0ac839f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
> > ? ? ? ?slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
> > ? ? ? ?print_section("Padding", end - remainder, remainder);
> >
> > - ? ? ? restore_bytes(s, "slab padding", POISON_INUSE, start, end);
> > + ? ? ? restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
>
> OK, makes sense.
>
> > ? ? ? ?return 0;
> > ?}
> >
> > @@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
> > ?*/
> > ?void kmem_cache_destroy(struct kmem_cache *s)
> > ?{
> > - ? ? ? if (s->flags & SLAB_DESTROY_BY_RCU)
> > - ? ? ? ? ? ? ? rcu_barrier();
> > ? ? ? ?down_write(&slub_lock);
> > ? ? ? ?s->refcount--;
> > ? ? ? ?if (!s->refcount) {
> > @@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"still has objects.\n", s->name, __func__);
> > ? ? ? ? ? ? ? ? ? ? ? ?dump_stack();
> > ? ? ? ? ? ? ? ?}
> > + ? ? ? ? ? ? ? if (s->flags & SLAB_DESTROY_BY_RCU)
> > + ? ? ? ? ? ? ? ? ? ? ? rcu_barrier();
> > ? ? ? ? ? ? ? ?sysfs_slab_remove(s);
> > ? ? ? ?} else
> > ? ? ? ? ? ? ? ?up_write(&slub_lock);
>
> The rcu_barrier() call was added by this commit:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7ed9f7e5db58c6e8c2b4b738a75d5dcd8e17aad5
>
> I guess we should CC Paul as well.
I agree that moving the rcu_barrier() as you have done is the right
thing to do -- no point in doing the rcu_barrier() unless you actually
are destroying the kmem_cache!
Acked-by: Paul E. McKenney <[email protected]>
Zdenek Kabelac a ?crit :
> 2009/9/3 Eric Dumazet <[email protected]>:
>> Zdenek Kabelac a ?crit :
>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>> there before the cache is destroyed.
>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>
>> Reading again your traces, I do believe there are two bugs in slub
>>
>> Maybe not explaining your problem, but worth to fix !
>>
>> Thank you
>>
>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>
>
>
> Ok - I can confirm that this patch fixes my reboot problem.
>
> Thanks
>
> Zdenek
Good news !
Hmm... but you had a problem with a 2.6.29.something kernel if I remember well ?
At that time, conntrack did not use SLAB_DESTROY_BY_RCU yet.
Hi Christoph,
On Thu, 3 Sep 2009, Pekka Enberg wrote:
>> Oh, sure, the fix looks sane to me. It's just that I am a complete
>> coward when it comes to merging RCU related patches so I always try to
>> fish an Acked-by from Paul or Christoph ;).
On Thu, Sep 3, 2009 at 8:50 PM, Christoph
Lameter<[email protected]> wrote:
> I am fine with acking the poison piece.
Ok.
On Thu, Sep 3, 2009 at 8:50 PM, Christoph
Lameter<[email protected]> wrote:
> I did not ack the patch that added rcu to kmem_cache_destroy() and I
> likely wont ack that piece either.
Right. I didn't remember that I merged that over your NAK but I don't
share your view that kmem_cache_destroy() callers should do
rcu_barrier() or whatever is necessary if we can do it from
kmem_cache_destroy(). So I am happy to take both changes but I would
appreciate them being split into two separate patches.
Pekka
Christoph Lameter a ?crit :
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
>> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
>> rcu_barrier() call, unless we want superfast reboot/halt sequences...
>
> I stilll think that the action to quiesce rcu is something that the caller
> of kmem_cache_destroy must take care of.
Do you mean :
if (kmem_cache_shrink(s) == 0) {
rcu_barrier();
kmem_cache_destroy_no_rcu_barrier(s);
} else {
kmem_cache_destroy_with_rcu_barrier_because_SLAB_DESTROY_BY_RCU_cache(s);
}
What would be the point ?
>
> Could you split this into two patches: One that addresses the poison and
> another that deals with rcu?
>
Sure, here is the poison thing
[PATCH] slub: fix slab_pad_check()
When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
slab padding, we call restore_bytes() on the whole slab, not only
on the padding.
Reported-by: Zdenek Kabelac <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
---
diff --git a/mm/slub.c b/mm/slub.c
index b9f1491..0ac839f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -646,7 +646,7 @@ static int slab_pad_check(struct kmem_cache *s, struct page *page)
slab_err(s, page, "Padding overwritten. 0x%p-0x%p", fault, end - 1);
print_section("Padding", end - remainder, remainder);
- restore_bytes(s, "slab padding", POISON_INUSE, start, end);
+ restore_bytes(s, "slab padding", POISON_INUSE, end - remainder, end);
return 0;
}
Pekka Enberg a ?crit :
> Hi Christoph,
>
> On Thu, 3 Sep 2009, Pekka Enberg wrote:
>>> Oh, sure, the fix looks sane to me. It's just that I am a complete
>>> coward when it comes to merging RCU related patches so I always try to
>>> fish an Acked-by from Paul or Christoph ;).
>
> On Thu, Sep 3, 2009 at 8:50 PM, Christoph
> Lameter<[email protected]> wrote:
>> I am fine with acking the poison piece.
>
> Ok.
>
> On Thu, Sep 3, 2009 at 8:50 PM, Christoph
> Lameter<[email protected]> wrote:
>> I did not ack the patch that added rcu to kmem_cache_destroy() and I
>> likely wont ack that piece either.
>
> Right. I didn't remember that I merged that over your NAK but I don't
> share your view that kmem_cache_destroy() callers should do
> rcu_barrier() or whatever is necessary if we can do it from
> kmem_cache_destroy(). So I am happy to take both changes but I would
> appreciate them being split into two separate patches.
>
Here is the second patch (RCU thing). Stable candidate
[PATCH] slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
and *before* sysfs_slab_remove() or risk rcu_free_slab()
being called after kmem_cache is deleted (kfreed).
rmmod nf_conntrack can crash the machine because it has to
kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
Reported-by: Zdenek Kabelac <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
---
diff --git a/mm/slub.c b/mm/slub.c
index b9f1491..b627675 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2594,8 +2594,6 @@ static inline int kmem_cache_close(struct kmem_cache *s)
*/
void kmem_cache_destroy(struct kmem_cache *s)
{
- if (s->flags & SLAB_DESTROY_BY_RCU)
- rcu_barrier();
down_write(&slub_lock);
s->refcount--;
if (!s->refcount) {
@@ -2606,6 +2604,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
"still has objects.\n", s->name, __func__);
dump_stack();
}
+ if (s->flags & SLAB_DESTROY_BY_RCU)
+ rcu_barrier();
sysfs_slab_remove(s);
} else
up_write(&slub_lock);
2009/9/3 Eric Dumazet <[email protected]>:
> Zdenek Kabelac a ?crit :
>> 2009/9/3 Eric Dumazet <[email protected]>:
>>> Zdenek Kabelac a ?crit :
>>>> Well I'm not noticing any ill behavior - also note - rcu_barrier() is
>>>> there before the cache is destroyed.
>>>> But as I said - it's just my shot into the dark - which seems to work for me...
>>>>
>>> Reading again your traces, I do believe there are two bugs in slub
>>>
>>> Maybe not explaining your problem, but worth to fix !
>>>
>>> Thank you
>>>
>>> [PATCH] slub: fix slab_pad_check() and SLAB_DESTROY_BY_RCU
>>>
>>
>>
>> Ok - I can confirm that this patch fixes my reboot problem.
>>
>> Thanks
>>
>> Zdenek
>
> Good news !
>
> Hmm... but you had a problem with a 2.6.29.something kernel if I remember well ?
>
> At that time, conntrack did not use SLAB_DESTROY_BY_RCU yet.
Yes - but it was also giving different error message - which even not
loaded ftp conntrack module. Basically it has been fine until
conntrack started to use rcu. From that moment various errors started
to appear with slub and some kernel debugging options - slab was fine.
With your recent patch it's the first time I do not see oops with nf
after 2.6.29 kernel.
Zdenek
On Thu, Sep 03, 2009 at 12:45:43PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
> > on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> > rcu_barrier() call, unless we want superfast reboot/halt sequences...
>
> I stilll think that the action to quiesce rcu is something that the caller
> of kmem_cache_destroy must take care of.
Why?
Thanx, Paul
> Could you split this into two patches: One that addresses the poison and
> another that deals with rcu?
>
On Thu, Sep 03, 2009 at 01:38:50PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
> > Christoph Lameter a ?crit :
> > > On Thu, 3 Sep 2009, Eric Dumazet wrote:
> > >
> > >> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> > >> rcu_barrier() call, unless we want superfast reboot/halt sequences...
> > >
> > > I stilll think that the action to quiesce rcu is something that the caller
> > > of kmem_cache_destroy must take care of.
> >
> > Do you mean :
> >
> > if (kmem_cache_shrink(s) == 0) {
> > rcu_barrier();
> > kmem_cache_destroy_no_rcu_barrier(s);
> > } else {
> > kmem_cache_destroy_with_rcu_barrier_because_SLAB_DESTROY_BY_RCU_cache(s);
> > }
> >
> > What would be the point ?
>
> The above is port of slub?
>
> I mean that (in this case) the net subsystem would have to deal with RCU quietness
> before disposing of the slab cache. There may be multiple ways of dealing
> with RCU. The RCU barrier may be unnecessary for future uses. Typically
> one would expect that all deferred handling of structures must be complete
> for correctness before disposing of the whole cache.
Which is precisely the point of the rcu_barrier(), right?
Thanx, Paul
> > [PATCH] slub: fix slab_pad_check()
>
> Acked-by: Christoph Lameter <[email protected]>
> --
> To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Christoph Lameter a ?crit :
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
>> Christoph Lameter a ?crit :
>>> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>>>
>>>> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
>>>> rcu_barrier() call, unless we want superfast reboot/halt sequences...
>>> I stilll think that the action to quiesce rcu is something that the caller
>>> of kmem_cache_destroy must take care of.
>> Do you mean :
>>
>> if (kmem_cache_shrink(s) == 0) {
>> rcu_barrier();
>> kmem_cache_destroy_no_rcu_barrier(s);
>> } else {
>> kmem_cache_destroy_with_rcu_barrier_because_SLAB_DESTROY_BY_RCU_cache(s);
>> }
>>
>> What would be the point ?
>
> The above is port of slub?
No, I am trying to code what you suggest, and I could not find a clean way with
current API (SLAB/SLUB/SLOB)
>
> I mean that (in this case) the net subsystem would have to deal with RCU quietness
> before disposing of the slab cache. There may be multiple ways of dealing
> with RCU. The RCU barrier may be unnecessary for future uses. Typically
> one would expect that all deferred handling of structures must be complete
> for correctness before disposing of the whole cache.
Point is we cannot deal with RCU quietness before disposing the slab cache,
(if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
make call_rcu() calls when a full slab is freed/purged.
And when RCU grace period is elapsed, the callback *will* need access to
the cache we want to dismantle. Better to not have kfreed()/poisoned it...
I believe you mix two RCU uses here.
1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
(or kmalloc()), and use call_rcu(... kfree_something)
In this case, you are 100% right that the subsystem itself has
to call rcu_barrier() (or respect whatever self-synchro) itself,
before calling kmem_cache_destroy()
2) The SLAB_DESTROY_BY_RCU one.
Part of cache dismantle needs to call rcu_barrier() itself.
Caller doesnt have to use rcu_barrier(). It would be a waste of time,
as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
On Thu, Sep 03, 2009 at 02:24:17PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
> > Point is we cannot deal with RCU quietness before disposing the slab cache,
> > (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
> > make call_rcu() calls when a full slab is freed/purged.
>
> There is no need to do call_rcu calls for frees at that point since
> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
> for the final clearing of caches.
Suppose we have the following sequence of events:
1. CPU 0 is running a task that is using the slab cache.
This CPU does kmem_cache_free(), which happens to free up
some memory to the system. Because SLAB_DESTROY_BY_RCU is
set, an RCU callback is posted to do the actual freeing.
Please note that this RCU callback is internal to the slab,
so that the slab user cannot be aware of it. In fact, the
slab user isn't doing any call_rcu()s whatever.
2. CPU 0 discovers that the slab cache can now be destroyed.
It determines that there are no users, and has guaranteed
that there will be no future users. So it knows that it
can safely do kmem_cache_destroy().
3. In absence of rcu_barrier(), kmem_cache_destroy() would
immediately tear down the slab data structures.
4. At the end of the next grace period, the RCU callback posted
(again, internally by the slab cache) is invoked. It has a
coronary due to the slab data structures having already been
freed, and (worse yet) possibly reallocated for other uses.
Hence the need for the rcu_barrier() when tearing down SLAB_DESTROY_BY_RCU
slab caches.
> > And when RCU grace period is elapsed, the callback *will* need access to
> > the cache we want to dismantle. Better to not have kfreed()/poisoned it...
>
> But going through the RCU period is pointless since no user of the cache
> remains.
Which is irrelevant. The outstanding RCU callback was posted by the
slab cache itself, -not- by the user of the slab cache.
> > I believe you mix two RCU uses here.
> >
> > 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
> > (or kmalloc()), and use call_rcu(... kfree_something)
> >
> > In this case, you are 100% right that the subsystem itself has
> > to call rcu_barrier() (or respect whatever self-synchro) itself,
> > before calling kmem_cache_destroy()
> >
> > 2) The SLAB_DESTROY_BY_RCU one.
> >
> > Part of cache dismantle needs to call rcu_barrier() itself.
> > Caller doesnt have to use rcu_barrier(). It would be a waste of time,
> > as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
>
> The dismantling does not need RCU since there are no operations on the
> objects in progress. So simply switch DESTROY_BY_RCU off for close.
Unless I am missing something, this patch re-introduces the bug that
the rcu_barrier() was added to prevent. So, in absence of a better
explanation of what I am missing:
NACK.
Thanx, Paul
> ---
> mm/slub.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2009-09-03 10:14:51.000000000 -0500
> +++ linux-2.6/mm/slub.c 2009-09-03 10:18:32.000000000 -0500
> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
> */
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> - if (s->flags & SLAB_DESTROY_BY_RCU)
> - rcu_barrier();
> down_write(&slub_lock);
> + /* Stop deferring frees so that we can immediately free structures */
> + s->flags &= ~SLAB_DESTROY_BY_RCU;
> s->refcount--;
> if (!s->refcount) {
> list_del(&s->list);
On Thu, 3 Sep 2009, Eric Dumazet wrote:
> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> rcu_barrier() call, unless we want superfast reboot/halt sequences...
I stilll think that the action to quiesce rcu is something that the caller
of kmem_cache_destroy must take care of.
Could you split this into two patches: One that addresses the poison and
another that deals with rcu?
On Thu, 3 Sep 2009, Pekka Enberg wrote:
> Oh, sure, the fix looks sane to me. It's just that I am a complete
> coward when it comes to merging RCU related patches so I always try to
> fish an Acked-by from Paul or Christoph ;).
I am fine with acking the poison piece.
I did not ack the patch that added rcu to kmem_cache_destroy() and I
likely wont ack that piece either.
Christoph Lameter a ?crit :
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
>> Point is we cannot deal with RCU quietness before disposing the slab cache,
>> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
>> make call_rcu() calls when a full slab is freed/purged.
>
> There is no need to do call_rcu calls for frees at that point since
> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
> for the final clearing of caches.
>
>> And when RCU grace period is elapsed, the callback *will* need access to
>> the cache we want to dismantle. Better to not have kfreed()/poisoned it...
>
> But going through the RCU period is pointless since no user of the cache
> remains.
>
>> I believe you mix two RCU uses here.
>>
>> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
>> (or kmalloc()), and use call_rcu(... kfree_something)
>>
>> In this case, you are 100% right that the subsystem itself has
>> to call rcu_barrier() (or respect whatever self-synchro) itself,
>> before calling kmem_cache_destroy()
>>
>> 2) The SLAB_DESTROY_BY_RCU one.
>>
>> Part of cache dismantle needs to call rcu_barrier() itself.
>> Caller doesnt have to use rcu_barrier(). It would be a waste of time,
>> as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
>
> The dismantling does not need RCU since there are no operations on the
> objects in progress. So simply switch DESTROY_BY_RCU off for close.
>
>
> ---
> mm/slub.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6/mm/slub.c
> ===================================================================
> --- linux-2.6.orig/mm/slub.c 2009-09-03 10:14:51.000000000 -0500
> +++ linux-2.6/mm/slub.c 2009-09-03 10:18:32.000000000 -0500
> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
> */
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> - if (s->flags & SLAB_DESTROY_BY_RCU)
> - rcu_barrier();
> down_write(&slub_lock);
> + /* Stop deferring frees so that we can immediately free structures */
> + s->flags &= ~SLAB_DESTROY_BY_RCU;
> s->refcount--;
> if (!s->refcount) {
> list_del(&s->list);
It seems very smart, but needs review of all callers to make sure no slabs
are waiting for final freeing in call_rcu queue on some cpu.
I suspect most of them will then have to use rcu_barrier() before calling
kmem_cache_destroy(), so why not factorizing code in one place ?
net/dccp/ipv6.c:1145: .slab_flags = SLAB_DESTROY_BY_RCU,
net/dccp/ipv4.c:941: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv4/udp.c:1593: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv4/udplite.c:54: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv4/tcp_ipv4.c:2446: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv4/udp.c.orig:1587: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv6/udp.c:1274: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv6/udplite.c:52: .slab_flags = SLAB_DESTROY_BY_RCU,
net/ipv6/tcp_ipv6.c:2085: .slab_flags = SLAB_DESTROY_BY_RCU,
net/netfilter/nf_conntrack_core.c:1269: 0, SLAB_DESTROY_BY_RCU, NULL);
On Thu, Sep 03, 2009 at 12:17:43AM +0200, Eric Dumazet wrote:
> Zdenek Kabelac a ?crit :
> > 2009/8/17 Patrick McHardy <[email protected]>:
> >> Eric Dumazet wrote:
> >>> Zdenek Kabelac a ?crit :
> >>>> [<ffffffffa02c502f>] nf_conntrack_ftp_fini+0x2f/0x70 [nf_conntrack_ftp]
> >>>> [<ffffffff8027bcc5>] sys_delete_module+0x1a5/0x270
> >>>> [<ffffffff8020d329>] ? retint_swapgs+0xe/0x13
> >>>> [<ffffffff80271bf2>] ? trace_hardirqs_on_caller+0x162/0x1b0
> >>>> [<ffffffff80292121>] ? audit_syscall_entry+0x191/0x1c0
> >>>> [<ffffffff80526dae>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> >>>> [<ffffffff8020c84b>] system_call_fastpath+0x16/0x1b
> >>>> Code: c6 00 00 0f 82 66 ff ff ff 49 8b 9e d8 05 00 00 48 85 db 75 16
> >>>> e9 8e 00 00 00 0f 1f 44 00 00 48 85 c0 0f 84 80 00 00 00 48 89 c3 <0f>
> >>>> b6 4b 37 48 8b 03 48 8d 14 cd 00 00 00 00 0f 18 08 48 29 ca
> >>>> RIP [<ffffffffa02b2c2c>] nf_conntrack_helper_unregister+0x16c/0x320
> >>>> [nf_conntrack]
> >>>> RSP <ffff88013982fe68>
> >>>> CR2: 0000000000000038
> >>>> ---[ end trace bc3a0ede3d0084db ]---
> >>>>
> >>> I am currently traveling and wont be able to help you before next week.
> >>>
> >>> I added netdev, Patrick, and netfilter-devel in CC so that more eyes can take a look.
> >> Thanks for the report, I'll have a look at this. Zdenek, please
> >> send me the nf_conntrack.ko file used in the above oops. Thanks.
> >>
> >
> > Ok
> >
> > I've found the solution for my problem.
> >
> > http://thread.gmane.org/gmane.comp.security.firewalls.netfilter.devel/30483
> >
> > I've made this small fix from this thread:
> >
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core
> > index b5869b9..68488f8 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -1108,6 +1108,7 @@ static void nf_conntrack_cleanup_init_net(void)
> > {
> > nf_conntrack_helper_fini();
> > nf_conntrack_proto_fini();
> > + rcu_barrier();
> > kmem_cache_destroy(nf_conntrack_cachep);
> > }
> >
> > @@ -1266,7 +1267,7 @@ static int nf_conntrack_init_init_net(void)
> >
> > nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
> > sizeof(struct nf_conn),
> > - 0, SLAB_DESTROY_BY_RCU, NULL);
> > + 0, 0, NULL);
> > if (!nf_conntrack_cachep) {
> > printk(KERN_ERR "Unable to create nf_conn slab cache\n");
> > ret = -ENOMEM;
> >
> >
> > As the thread nf_conntrack: Use rcu_barrier() and fix kmem_cache_create flags
> > seems to be samewhat 'unfinished' and already a bit old and I've no
> > idea whether it actually fixes problem completely or just hides it in
> > my case - I'm leaving it to some RCU gurus to fix this issue.
> >
> > All I could say is - this this extra rcu_barrier() and removal of
> > SLAB_DESTROY removes my GPF on reboot.
> >
> > Zdenek
>
> Ouch..
>
> Dont think such a patch makes your kernel better, it'll crash too.
>
> You cannot remove SLAB_DESTROY_BY_RCU like this, it's there for very good reasons.
And if I understand correctly, this is more evidence that
kmem_cache_destroy() needs to do an rcu_barrier() in the
SLAB_DESTROY_BY_RCU case.
Thanx, Paul
On Thu, 3 Sep 2009, Eric Dumazet wrote:
> Christoph Lameter a ?crit :
> > On Thu, 3 Sep 2009, Eric Dumazet wrote:
> >
> >> on a SLAB_DESTROY_BY_RCU cache, there is no need to try to optimize this
> >> rcu_barrier() call, unless we want superfast reboot/halt sequences...
> >
> > I stilll think that the action to quiesce rcu is something that the caller
> > of kmem_cache_destroy must take care of.
>
> Do you mean :
>
> if (kmem_cache_shrink(s) == 0) {
> rcu_barrier();
> kmem_cache_destroy_no_rcu_barrier(s);
> } else {
> kmem_cache_destroy_with_rcu_barrier_because_SLAB_DESTROY_BY_RCU_cache(s);
> }
>
> What would be the point ?
The above is port of slub?
I mean that (in this case) the net subsystem would have to deal with RCU quietness
before disposing of the slab cache. There may be multiple ways of dealing
with RCU. The RCU barrier may be unnecessary for future uses. Typically
one would expect that all deferred handling of structures must be complete
for correctness before disposing of the whole cache.
> [PATCH] slub: fix slab_pad_check()
Acked-by: Christoph Lameter <[email protected]>
On Thu, Sep 3, 2009 at 8:59 PM, Eric Dumazet<[email protected]> wrote:
> Christoph Lameter a ?crit :
>> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>>
>>> Point is we cannot deal with RCU quietness before disposing the slab cache,
>>> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
>>> make call_rcu() calls when a full slab is freed/purged.
>>
>> There is no need to do call_rcu calls for frees at that point since
>> objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
>> for the final clearing of caches.
>>
>>> And when RCU grace period is elapsed, the callback *will* need access to
>>> the cache we want to dismantle. Better to not have kfreed()/poisoned it...
>>
>> But going through the RCU period is pointless since no user of the cache
>> remains.
>>
>>> I believe you mix two RCU uses here.
>>>
>>> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
>>> (or kmalloc()), and use call_rcu(... kfree_something)
>>>
>>> ? ?In this case, you are 100% right that the subsystem itself has
>>> ? ?to call rcu_barrier() (or respect whatever self-synchro) itself,
>>> ? ?before calling kmem_cache_destroy()
>>>
>>> 2) The SLAB_DESTROY_BY_RCU one.
>>>
>>> ? ?Part of cache dismantle needs to call rcu_barrier() itself.
>>> ? ?Caller doesnt have to use rcu_barrier(). It would be a waste of time,
>>> ? ?as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
>>
>> The dismantling does not need RCU since there are no operations on the
>> objects in progress. So simply switch DESTROY_BY_RCU off for close.
>>
>>
>> ---
>> ?mm/slub.c | ? ?4 ++--
>> ?1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6/mm/slub.c
>> ===================================================================
>> --- linux-2.6.orig/mm/slub.c ?2009-09-03 10:14:51.000000000 -0500
>> +++ linux-2.6/mm/slub.c ? ? ? 2009-09-03 10:18:32.000000000 -0500
>> @@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
>> ? */
>> ?void kmem_cache_destroy(struct kmem_cache *s)
>> ?{
>> - ? ? if (s->flags & SLAB_DESTROY_BY_RCU)
>> - ? ? ? ? ? ? rcu_barrier();
>> ? ? ? down_write(&slub_lock);
>> + ? ? /* Stop deferring frees so that we can immediately free structures */
>> + ? ? s->flags &= ~SLAB_DESTROY_BY_RCU;
>> ? ? ? s->refcount--;
>> ? ? ? if (!s->refcount) {
>> ? ? ? ? ? ? ? list_del(&s->list);
>
> It seems very smart, but needs review of all callers to make sure no slabs
> are waiting for final freeing in call_rcu queue on some cpu.
>
> I suspect most of them will then have to use rcu_barrier() before calling
> kmem_cache_destroy(), so why not factorizing code in one place ?
[snip]
Can someone please explain what's the upside in Christoph's approach?
Performance? Correctness? Something else entirely? We're looking at a
tested bug fix here and I don't understand why I shouldn't just go
ahead and merge it. Hmm?
Pekka
On Thu, 3 Sep 2009, Eric Dumazet wrote:
> Point is we cannot deal with RCU quietness before disposing the slab cache,
> (if SLAB_DESTROY_BY_RCU was set on the cache) since this disposing *will*
> make call_rcu() calls when a full slab is freed/purged.
There is no need to do call_rcu calls for frees at that point since
objects are no longer in use. We could simply disable SLAB_DESTROY_BY_RCU
for the final clearing of caches.
> And when RCU grace period is elapsed, the callback *will* need access to
> the cache we want to dismantle. Better to not have kfreed()/poisoned it...
But going through the RCU period is pointless since no user of the cache
remains.
> I believe you mix two RCU uses here.
>
> 1) The one we all know, is use normal caches (!SLAB_DESTROY_BY_RCU)
> (or kmalloc()), and use call_rcu(... kfree_something)
>
> In this case, you are 100% right that the subsystem itself has
> to call rcu_barrier() (or respect whatever self-synchro) itself,
> before calling kmem_cache_destroy()
>
> 2) The SLAB_DESTROY_BY_RCU one.
>
> Part of cache dismantle needs to call rcu_barrier() itself.
> Caller doesnt have to use rcu_barrier(). It would be a waste of time,
> as kmem_cache_destroy() will refill rcu wait queues with its own stuff.
The dismantling does not need RCU since there are no operations on the
objects in progress. So simply switch DESTROY_BY_RCU off for close.
---
mm/slub.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6/mm/slub.c
===================================================================
--- linux-2.6.orig/mm/slub.c 2009-09-03 10:14:51.000000000 -0500
+++ linux-2.6/mm/slub.c 2009-09-03 10:18:32.000000000 -0500
@@ -2594,9 +2594,9 @@ static inline int kmem_cache_close(struc
*/
void kmem_cache_destroy(struct kmem_cache *s)
{
- if (s->flags & SLAB_DESTROY_BY_RCU)
- rcu_barrier();
down_write(&slub_lock);
+ /* Stop deferring frees so that we can immediately free structures */
+ s->flags &= ~SLAB_DESTROY_BY_RCU;
s->refcount--;
if (!s->refcount) {
list_del(&s->list);
On Thu, Sep 3, 2009 at 5:08 PM, Eric Dumazet<[email protected]> wrote:
> Sure, here is the poison thing
>
> [PATCH] slub: fix slab_pad_check()
>
> When SLAB_POISON is used and slab_pad_check() finds an overwrite of the
> slab padding, we call restore_bytes() on the whole slab, not only
> on the padding.
>
> Reported-by: Zdenek Kabelac <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
Applied, thanks!
On Thu, Sep 3, 2009 at 5:18 PM, Eric Dumazet<[email protected]> wrote:
> Here is the second patch (RCU thing). Stable candidate
>
> [PATCH] slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
>
> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
> and *before* sysfs_slab_remove() or risk rcu_free_slab()
> being called after kmem_cache is deleted (kfreed).
>
> rmmod nf_conntrack can crash the machine because it has to
> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
Do we have a bugzilla URL for this?
> Reported-by: Zdenek Kabelac <[email protected]>
> Signed-off-by: Eric Dumazet <[email protected]>
> Acked-by: Paul E. McKenney <[email protected]>
OK, this is in for-next now and queued for 2.6.31. If you guys want to
fix this in a different way, lets do that in 2.6.32.
Pekka
Pekka Enberg a ?crit :
> On Thu, Sep 3, 2009 at 5:18 PM, Eric Dumazet<[email protected]> wrote:
>> Here is the second patch (RCU thing). Stable candidate
>>
>> [PATCH] slub: Fix kmem_cache_destroy() with SLAB_DESTROY_BY_RCU
>>
>> kmem_cache_destroy() should call rcu_barrier() *after* kmem_cache_close()
>> and *before* sysfs_slab_remove() or risk rcu_free_slab()
>> being called after kmem_cache is deleted (kfreed).
>>
>> rmmod nf_conntrack can crash the machine because it has to
>> kmem_cache_destroy() a SLAB_DESTROY_BY_RCU enabled cache.
>
> Do we have a bugzilla URL for this?
Well, I can crash my 2.6.30.5 machine just doing rmmod nf_conntrack
(You'll need CONFIG_SLUB_DEBUG_ON or equivalent)
Original Zdenek report : http://thread.gmane.org/gmane.linux.kernel/876016/focus=876086
>
>> Reported-by: Zdenek Kabelac <[email protected]>
>> Signed-off-by: Eric Dumazet <[email protected]>
>> Acked-by: Paul E. McKenney <[email protected]>
>
> OK, this is in for-next now and queued for 2.6.31. If you guys want to
> fix this in a different way, lets do that in 2.6.32.
Seems the right thing IMHO
On Thu, Sep 03, 2009 at 05:43:12PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
>
> > 2. CPU 0 discovers that the slab cache can now be destroyed.
> >
> > It determines that there are no users, and has guaranteed
> > that there will be no future users. So it knows that it
> > can safely do kmem_cache_destroy().
> >
> > 3. In absence of rcu_barrier(), kmem_cache_destroy() would
> > immediately tear down the slab data structures.
>
> Of course. This has been discussed before.
>
> You need to ensure that no objects are in use before destroying a slab. In
> case of DESTROY_BY_RCU you must ensure that there are no potential
> readers. So use a suitable rcu barrier or something else like a
> synchronize_rcu...
If by "you must ensure" you mean "kmem_cache_destroy must ensure", then
we are in complete agreement. Otherwise, not a chance.
> > > But going through the RCU period is pointless since no user of the cache
> > > remains.
> >
> > Which is irrelevant. The outstanding RCU callback was posted by the
> > slab cache itself, -not- by the user of the slab cache.
>
> There will be no rcu callbacks generated at kmem_cache_destroy with the
> patch I posted.
That is true. However, there well might be RCU callbacks generated by
kmem_cache_free() that have not yet been invoked. Since kmem_cache_free()
generated them, it is ridiculous to insist that the user account for them.
That responsibility must instead fall on kmem_cache_destroy().
> > > The dismantling does not need RCU since there are no operations on the
> > > objects in progress. So simply switch DESTROY_BY_RCU off for close.
> >
> > Unless I am missing something, this patch re-introduces the bug that
> > the rcu_barrier() was added to prevent. So, in absence of a better
> > explanation of what I am missing:
>
> The "fix" was ill advised. Slab users must ensure that no objects are in
> use before destroying a slab. Only the slab users know how the objects
> are being used. The slab allocator itself cannot know how to ensure that
> there are no pending references. Putting a rcu_barrier in there creates an
> inconsistency in the operation of kmem_cache_destroy() and an expectation
> of functionality that the function cannot provide.
If by "must ensure that no objects are in use", you mean "must have
no further references to them", then we are in agreement. And in my
scenario above, it is not the -user- who later references the memory,
but rather the slab code itself.
Put the rcu_barrier() in kmem_cache_free(). Please.
Thanx, Paul
Christoph Lameter a ?crit :
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
>
>> 2. CPU 0 discovers that the slab cache can now be destroyed.
>>
>> It determines that there are no users, and has guaranteed
>> that there will be no future users. So it knows that it
>> can safely do kmem_cache_destroy().
>>
>> 3. In absence of rcu_barrier(), kmem_cache_destroy() would
>> immediately tear down the slab data structures.
>
> Of course. This has been discussed before.
>
> You need to ensure that no objects are in use before destroying a slab. In
> case of DESTROY_BY_RCU you must ensure that there are no potential
> readers. So use a suitable rcu barrier or something else like a
> synchronize_rcu...
>
>>> But going through the RCU period is pointless since no user of the cache
>>> remains.
>> Which is irrelevant. The outstanding RCU callback was posted by the
>> slab cache itself, -not- by the user of the slab cache.
>
> There will be no rcu callbacks generated at kmem_cache_destroy with the
> patch I posted.
>
>>> The dismantling does not need RCU since there are no operations on the
>>> objects in progress. So simply switch DESTROY_BY_RCU off for close.
>> Unless I am missing something, this patch re-introduces the bug that
>> the rcu_barrier() was added to prevent. So, in absence of a better
>> explanation of what I am missing:
>
> The "fix" was ill advised. Slab users must ensure that no objects are in
> use before destroying a slab. Only the slab users know how the objects
> are being used. The slab allocator itself cannot know how to ensure that
> there are no pending references. Putting a rcu_barrier in there creates an
> inconsistency in the operation of kmem_cache_destroy() and an expectation
> of functionality that the function cannot provide.
>
Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing
Problem is that slub has some internal state, including some to-be-freed _slabs_,
that User have no control at all on it.
User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.
Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
be done *before*, but it gives no speedup, only potential bugs.
Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).
I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
given that almost nobody use it. We took almost one month to find out what the bug was in first
place...
Eric Dumazet a ?crit :
>
>
>
> Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
>
> Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
> This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing
>
> Problem is that slub has some internal state, including some to-be-freed _slabs_,
> that User have no control at all on it.
>
> User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.
>
> Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
>
> We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
> be done *before*, but it gives no speedup, only potential bugs.
>
> Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
> and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).
>
> I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
> given that almost nobody use it. We took almost one month to find out what the bug was in first
> place...
So maybe the safest thing would be to include the rcu_barrier() to insure all objects where freed
And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed
void kmem_cache_destroy(struct kmem_cache *s)
{
/*
* Make sure no objects are waiting in call_rcu queues to be freed
*/
rcu_barrier();
down_write(&slub_lock);
s->refcount--;
if (!s->refcount) {
list_del(&s->list);
up_write(&slub_lock);
if (kmem_cache_close(s)) {
printk(KERN_ERR "SLUB %s: %s called for cache that "
"still has objects.\n", s->name, __func__);
dump_stack();
}
/*
* Make sure no slabs are waiting in call_rcu queues to be freed
*/
if (s->flags & SLAB_DESTROY_BY_RCU)
rcu_barrier();
sysfs_slab_remove(s);
} else
up_write(&slub_lock);
}
On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> 2. CPU 0 discovers that the slab cache can now be destroyed.
>
> It determines that there are no users, and has guaranteed
> that there will be no future users. So it knows that it
> can safely do kmem_cache_destroy().
>
> 3. In absence of rcu_barrier(), kmem_cache_destroy() would
> immediately tear down the slab data structures.
Of course. This has been discussed before.
You need to ensure that no objects are in use before destroying a slab. In
case of DESTROY_BY_RCU you must ensure that there are no potential
readers. So use a suitable rcu barrier or something else like a
synchronize_rcu...
> > But going through the RCU period is pointless since no user of the cache
> > remains.
>
> Which is irrelevant. The outstanding RCU callback was posted by the
> slab cache itself, -not- by the user of the slab cache.
There will be no rcu callbacks generated at kmem_cache_destroy with the
patch I posted.
> > The dismantling does not need RCU since there are no operations on the
> > objects in progress. So simply switch DESTROY_BY_RCU off for close.
>
> Unless I am missing something, this patch re-introduces the bug that
> the rcu_barrier() was added to prevent. So, in absence of a better
> explanation of what I am missing:
The "fix" was ill advised. Slab users must ensure that no objects are in
use before destroying a slab. Only the slab users know how the objects
are being used. The slab allocator itself cannot know how to ensure that
there are no pending references. Putting a rcu_barrier in there creates an
inconsistency in the operation of kmem_cache_destroy() and an expectation
of functionality that the function cannot provide.
On Thu, 3 Sep 2009, Eric Dumazet wrote:
> It seems very smart, but needs review of all callers to make sure no slabs
> are waiting for final freeing in call_rcu queue on some cpu.
Yes. Again this is the first time we encounter a situation where a
DESTROY_BY_RCU slab has to be destroyed. So the review is quite short.
> I suspect most of them will then have to use rcu_barrier() before calling
> kmem_cache_destroy(), so why not factorizing code in one place ?
There are different forms of RCU which require different forms of
barriers. Its best to leave that up to the user. Again the user must make
sure that no objects are in use before a slab is destroyed. For
SLAB_DESTROY_BY_RCU this means that there are no potential outstanding
reads on the structure. You may need an rcu_barrier() to accomplish that.
Slight variations in the use of RCU could require different method. Better
reduce the entanglement of slabs to RCU to a mininum possible.
On Thu, Sep 03, 2009 at 05:44:54PM -0500, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Eric Dumazet wrote:
>
> > It seems very smart, but needs review of all callers to make sure no slabs
> > are waiting for final freeing in call_rcu queue on some cpu.
>
> Yes. Again this is the first time we encounter a situation where a
> DESTROY_BY_RCU slab has to be destroyed. So the review is quite short.
>
> > I suspect most of them will then have to use rcu_barrier() before calling
> > kmem_cache_destroy(), so why not factorizing code in one place ?
>
> There are different forms of RCU which require different forms of
> barriers. Its best to leave that up to the user. Again the user must make
> sure that no objects are in use before a slab is destroyed. For
> SLAB_DESTROY_BY_RCU this means that there are no potential outstanding
> reads on the structure. You may need an rcu_barrier() to accomplish that.
>
> Slight variations in the use of RCU could require different method. Better
> reduce the entanglement of slabs to RCU to a mininum possible.
If it were the user of the slab who was invoking some variant of
call_rcu(), then I would agree with you.
However, call_rcu() is instead being invoked by the slab itself in the
case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
Requiring that the user call rcu_barrier() is asking for subtle bugs.
Therefore, the best approach is to have kmem_cache_destroy() handle
the RCU cleanup, given that this cleanup is for actions taken by
kmem_cache_free(), not by the user.
Thanx, Paul
On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> > You need to ensure that no objects are in use before destroying a slab. In
> > case of DESTROY_BY_RCU you must ensure that there are no potential
> > readers. So use a suitable rcu barrier or something else like a
> > synchronize_rcu...
>
> If by "you must ensure" you mean "kmem_cache_destroy must ensure", then
> we are in complete agreement. Otherwise, not a chance.
Well then we are going down to a crappy interface and mixing rcu with slab
semantics. More bugs to follow in the future.
> If by "must ensure that no objects are in use", you mean "must have
> no further references to them", then we are in agreement. And in my
> scenario above, it is not the -user- who later references the memory,
> but rather the slab code itself.
The slab code itself is not referencing the later memory with my patch. It
can only be the user.
> Put the rcu_barrier() in kmem_cache_free(). Please.
Guess we are doing this ... Sigh. Are you going to add support other rcu
versions to slab as well as time permits and as the need arises? Should we
add you as a maintainer? ;-)
On Fri, 4 Sep 2009, Eric Dumazet wrote:
> Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
Why?
> Its true that when User calls kmem_cache_destroy(), all _objects_ were
> previously freed. This is mandatory, with or withou SLAB_DESTROY_BY_RCU
> thing
Right.
> Problem is that slub has some internal state, including some to-be-freed _slabs_,
> that User have no control at all on it.
Those are going to be freed without calls to rcu with my patch. The only
reason for earlier rcu frees are user calls to kfree.
> Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
Nope the user must follow RCU guidelines when using objects.
> We absolutely need a rcu_barrier() somewhere, believe it or not. You can
> argue that it should be done *before*, but it gives no speedup, only
> potential bugs.
I never said that you do not need an rcu_barrier() for this particular
situation? Why suggest such a thing?
The insertion of rcu stuff in the slab code will lead to future bugs since
now the slab logic is tied to the semantics of a particular rcu
implementatin.
> Only case User should do its rcu_barrier() itself is if it knows some
> call_rcu() are pending and are delaying _objects_ freeing (typical
> !SLAB_DESTROY_RCU usage in RCU algos).
Ok then the user already has to deal with the barriers. The API is
inconsistent if you put this into kmem_cache_destroy.
> I dont even understand why you care so much about
> kmem_cache_destroy(SLAB_DESTROY_BY_RCU), given that almost nobody use
> it. We took almost one month to find out what the bug was in first
> place...
This is already the second bug on this issue. Given the complexity of rcu
it is to be experted that inserting more RCU semantics into the slab
allocators is likely to cause future chains of new features and
bugs in slab allocators.
On Fri, 4 Sep 2009, Eric Dumazet wrote:
> And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed
Now its two rcu_barriers(). What I feared comes true.
On Thu, 3 Sep 2009, Paul E. McKenney wrote:
> If it were the user of the slab who was invoking some variant of
> call_rcu(), then I would agree with you.
The user already has to deal with it as explained by Eric.
> However, call_rcu() is instead being invoked by the slab itself in the
> case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
> Requiring that the user call rcu_barrier() is asking for subtle bugs.
> Therefore, the best approach is to have kmem_cache_destroy() handle
> the RCU cleanup, given that this cleanup is for actions taken by
> kmem_cache_free(), not by the user.
The user already has to properly handle barriers and rcu logic in order to
use objects handled with RCU properly. Moreover the user even has to check
that the object is not suddenly checked under it. Its already complex.
On Fri, Sep 04, 2009 at 12:17:34AM +0200, Eric Dumazet wrote:
> Eric Dumazet a ?crit :
> >
> >
> >
> > Problem is not _objects_ Christoph, but _slabs_, and your patch is not working.
> >
> > Its true that when User calls kmem_cache_destroy(), all _objects_ were previously freed.
> > This is mandatory, with or withou SLAB_DESTROY_BY_RCU thing
> >
> > Problem is that slub has some internal state, including some to-be-freed _slabs_,
> > that User have no control at all on it.
> >
> > User cannot "know" slabs are freed, inuse, or whatever state in cache or call_rcu queues.
> >
> > Face it, SLAB_DESTROY_BY_RCU is internal affair (to slub/slab/... allocators)
> >
> > We absolutely need a rcu_barrier() somewhere, believe it or not. You can argue that it should
> > be done *before*, but it gives no speedup, only potential bugs.
> >
> > Only case User should do its rcu_barrier() itself is if it knows some call_rcu() are pending
> > and are delaying _objects_ freeing (typical !SLAB_DESTROY_RCU usage in RCU algos).
> >
> > I dont even understand why you care so much about kmem_cache_destroy(SLAB_DESTROY_BY_RCU),
> > given that almost nobody use it. We took almost one month to find out what the bug was in first
> > place...
>
>
> So maybe the safest thing would be to include the rcu_barrier() to
> insure all objects where freed
I argue that the above is the user's responsibility. That said, I don't
see why the user would pass a SLAB_DESTROY_BY_RCU object to call_rcu().
So I would want to see an example of this before inflicting a pair
of rcu_barrier() calls on kmem_cache_destroy().
> And another one for SLAB_DESTROY_BY_RCU to make sure slabs where freed
This last is I believe kmem_cache's responsibility.
Thanx, Paul
> void kmem_cache_destroy(struct kmem_cache *s)
> {
> /*
> * Make sure no objects are waiting in call_rcu queues to be freed
> */
> rcu_barrier();
>
> down_write(&slub_lock);
> s->refcount--;
> if (!s->refcount) {
> list_del(&s->list);
> up_write(&slub_lock);
> if (kmem_cache_close(s)) {
> printk(KERN_ERR "SLUB %s: %s called for cache that "
> "still has objects.\n", s->name, __func__);
> dump_stack();
> }
> /*
> * Make sure no slabs are waiting in call_rcu queues to be freed
> */
> if (s->flags & SLAB_DESTROY_BY_RCU)
> rcu_barrier();
> sysfs_slab_remove(s);
> } else
> up_write(&slub_lock);
> }
On Fri, Sep 04, 2009 at 11:42:17AM -0400, Christoph Lameter wrote:
> On Thu, 3 Sep 2009, Paul E. McKenney wrote:
>
> > If it were the user of the slab who was invoking some variant of
> > call_rcu(), then I would agree with you.
>
> The user already has to deal with it as explained by Eric.
I didn't read his email that way. Perhaps I misinterpreted it.
> > However, call_rcu() is instead being invoked by the slab itself in the
> > case of SLAB_DESTROY_BY_RCU, so that there is no variation in usage.
> > Requiring that the user call rcu_barrier() is asking for subtle bugs.
> > Therefore, the best approach is to have kmem_cache_destroy() handle
> > the RCU cleanup, given that this cleanup is for actions taken by
> > kmem_cache_free(), not by the user.
>
> The user already has to properly handle barriers and rcu logic in order to
> use objects handled with RCU properly. Moreover the user even has to check
> that the object is not suddenly checked under it. Its already complex.
mm/slab.c has had RCU calls since 2.6.9, so this is not new.
> Guess we are doing this ... Sigh. Are you going to add support other rcu
> versions to slab as well as time permits and as the need arises? Should
> we add you as a maintainer? ;-)
I don't see any point in adding anything resembling SLAB_DESTROY_BY_RCU_BH,
SLAB_DESTROY_BY_RCU_SCHED, or SLAB_DESTROY_BY_SRCU unless and until
someone needs it. And I most especially don't see the point of adding
(say) SLAB_DESTROY_BY_RCU_BH_SCHED until someone convinces me of the
need for it. I would prefer to put the energy into further streamlining
the underlying RCU implementation, maybe someday collapsing RCU-BH back
into RCU. ;-)
We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
five years, so I think we are plenty fine with what we have. So, as
you say, "as the need arises".
I don't see any more need to add me as maintainer of slab and friends
than of btrfs, netfilter, selinux, decnet, afs, wireless, or any of a
number of other subsystems that use RCU.
Thanx, Paul
On Fri, 4 Sep 2009, Paul E. McKenney wrote:
> We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> five years, so I think we are plenty fine with what we have. So, as
> you say, "as the need arises".
These were the glory years where SLAB_DESTROY_BY_RCU was only used for
anonymous vmas. Now Eric has picked it up for the net subsystem. You may
see the RCU use proliferate.
The kmem_cache_destroy rcu barriers did not matter until
SLAB_DESTROY_BY_RCU spread.
On Tue, Sep 08, 2009 at 03:57:04PM -0400, Christoph Lameter wrote:
> On Fri, 4 Sep 2009, Paul E. McKenney wrote:
>
> > We have gotten along fine with only SLAB_DESTROY_BY_RCU for almost
> > five years, so I think we are plenty fine with what we have. So, as
> > you say, "as the need arises".
>
> These were the glory years where SLAB_DESTROY_BY_RCU was only used for
> anonymous vmas. Now Eric has picked it up for the net subsystem. You may
> see the RCU use proliferate.
>
> The kmem_cache_destroy rcu barriers did not matter until
> SLAB_DESTROY_BY_RCU spread.
Certainly it is true that increased use of RCU has resulted in new
requirements, which have in turn led to any number of changes over
the years.
Are you saying that people have already asked you for additional
variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
Otherwise, experience indicates that it is best to wait for the new uses,
because they usually aren't quite what one might expect.
Thanx, Paul
On Tue, 8 Sep 2009, Paul E. McKenney wrote:
> Are you saying that people have already asked you for additional
> variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
> Otherwise, experience indicates that it is best to wait for the new uses,
> because they usually aren't quite what one might expect.
No direct request but I have seen the network developers discover these
features and their caching benefits over the last year. It is likely that
they will try to push it into more components of the net subsystem.
On Tue, Sep 08, 2009 at 06:41:14PM -0400, Christoph Lameter wrote:
> On Tue, 8 Sep 2009, Paul E. McKenney wrote:
>
> > Are you saying that people have already asked you for additional
> > variants of SLAB_DESTROY_BY_RCU? If so, please don't keep them a secret!
> > Otherwise, experience indicates that it is best to wait for the new uses,
> > because they usually aren't quite what one might expect.
>
> No direct request but I have seen the network developers discover these
> features and their caching benefits over the last year. It is likely that
> they will try to push it into more components of the net subsystem.
So if they push it far enough, they might well decide that they need
a SLAB_DESTROY_BY_RCU_BH, for example. Looks like seven bits left,
so unless I am missing something, should not be a huge problem should
this need arise.
Thanx, Paul
On Tue, 8 Sep 2009, Paul E. McKenney wrote:
> > No direct request but I have seen the network developers discover these
> > features and their caching benefits over the last year. It is likely that
> > they will try to push it into more components of the net subsystem.
>
> So if they push it far enough, they might well decide that they need
> a SLAB_DESTROY_BY_RCU_BH, for example. Looks like seven bits left,
> so unless I am missing something, should not be a huge problem should
> this need arise.
I'd rather have the call_rcu in the slabs replaced by a function that
can be set by the user. Then we can remove all rcu barriers from the code
and the slabs could be used with any type of rcu functionality.
On Wed, Sep 09, 2009 at 10:04:20AM -0400, Christoph Lameter wrote:
> On Tue, 8 Sep 2009, Paul E. McKenney wrote:
>
> > > No direct request but I have seen the network developers discover these
> > > features and their caching benefits over the last year. It is likely that
> > > they will try to push it into more components of the net subsystem.
> >
> > So if they push it far enough, they might well decide that they need
> > a SLAB_DESTROY_BY_RCU_BH, for example. Looks like seven bits left,
> > so unless I am missing something, should not be a huge problem should
> > this need arise.
>
> I'd rather have the call_rcu in the slabs replaced by a function that
> can be set by the user. Then we can remove all rcu barriers from the code
> and the slabs could be used with any type of rcu functionality.
If the embedded guys are OK with an additional pointer in the slab data
structure, I have no objection to this approach. I am assuming that we
would use the usual ops-style structure full of pointers to functions
in order to avoid a pair of extra pointers.
Thanx, Paul
On Wed, 9 Sep 2009, Paul E. McKenney wrote:
> If the embedded guys are OK with an additional pointer in the slab data
> structure, I have no objection to this approach. I am assuming that we
> would use the usual ops-style structure full of pointers to functions
> in order to avoid a pair of extra pointers.
This would also allow us to get rid of the ctor pointer in
kmem_cache_create(). We can put it into the same structure.
On Wed, Sep 09, 2009 at 10:53:22AM -0400, Christoph Lameter wrote:
> On Wed, 9 Sep 2009, Paul E. McKenney wrote:
>
> > If the embedded guys are OK with an additional pointer in the slab data
> > structure, I have no objection to this approach. I am assuming that we
> > would use the usual ops-style structure full of pointers to functions
> > in order to avoid a pair of extra pointers.
>
> This would also allow us to get rid of the ctor pointer in
> kmem_cache_create(). We can put it into the same structure.
Sounds good -- that would result in no increase in size. Could even
add a dtor if people want it. ;-) (Sorry, couldn't resist...)
Thanx, Paul