Hi Ingo,
I encountered a BUG at mm/slab.c.
It tells me there is an inconsistency.
One of the panics clearly shows that the exclusive control in slab is not right.
The console log:
Unable to handle kernel paging request at ffff8108cf81d3c8 RIP:
[<ffffffff8029e92f>] kmem_cache_alloc+0x5e/0xdb
PGD 8063 PUD 0
Oops: 0000 [1] PREEMPT SMP
CPU 1
Modules linked in:
Pid: 32268, comm: dcachebench Not tainted 2.6.24.3-rt3 #3
RIP: 0010:[<ffffffff8029e92f>] [<ffffffff8029e92f>] kmem_cache_alloc+0x5e/0xdb
RSP: 0018:ffff810048895e58 EFLAGS: 00010097
RAX: 00000000ffffffff RBX: 0000000000000246 RCX: 0000000000000000
RDX: ffff8100cf81d3b8 RSI: 0000000000000000 RDI: ffff8100cf80ee40
RBP: ffff8100cf80ee40 R08: 0000000000000c62 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000000d0
R13: ffffffff802aa83d R14: 0000000000000006 R15: 0000000000000000
FS: 00002adc7b6686f0(0000) GS:ffff8100cf81f340(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffff8108cf81d3c8 CR3: 00000000b36e0000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dcachebench (pid: 32268, threadinfo ffff810048894000, task ffff8100cd248080)
Stack: 0000000000000000 ffff810097d1c000 0000000100000282 0000000000000006
00000000ffffff9c 0000000000000000 00000000005038b0 ffffffff802aa83d
ffff8100cf811080 0000000000000006 00000000ffffff9c 0000000000000000
Call Trace:
[<ffffffff802aa83d>] getname+0x1e/0x1f0
[<ffffffff802ac38c>] do_rmdir+0x17/0xfd
[<ffffffff8026fb5f>] audit_syscall_entry+0x163/0x189
[<ffffffff8020c2f3>] tracesys+0x71/0xe1
[<ffffffff8020c35e>] tracesys+0xdc/0xe1
---------------------------
| preempt count: 00000001 ]
| 1-level deep critical section nesting:
----------------------------------------
.. [<ffffffff80640737>] .... __spin_trylock+0xe/0x49
.....[<00000000>] .. ( <= 0x0)
Code: 48 8b 54 c2 18 53 9d 4c 89 e9 44 89 e6 48 89 ef e8 f1 e1 ff
RIP [<ffffffff8029e92f>] kmem_cache_alloc+0x5e/0xdb
RSP <ffff810048895e58>
The kernel dump:
(gdb) info thr
4 process 32262 flush_tlb_others (cpumask={bits = {4}}, mm=0xffff8100cd16d450,
va=46966178668544) at arch/x86/kernel/smp_64.c:195
3 process 32325 0xffffffff8020cad0 in invalidate_interrupt3 ()
2 process 32268 0xffffffff8029e92f in kmem_cache_alloc (cachep=0xffff8100cf80ee40,
flags=208) at mm/slab.c:3320
* 1 process 32326 0xffffffff806408c4 in __spin_lock (lock=0xffff8100cdc9a4d0)
at kernel/spinlock.c:333
(gdb) thr 2
[Switching to thread 2 (process 32268)]#0 0xffffffff8029e92f in kmem_cache_alloc (
cachep=0xffff8100cf80ee40, flags=208) at mm/slab.c:3320
3320 mm/slab.c: No such file or directory.
in mm/slab.c
0xffffffff8029e903 <kmem_cache_alloc+50>: callq 0xffffffff8029c292 <check_irq_off>
0xffffffff8029e908 <kmem_cache_alloc+55>: movslq 0x14(%rsp),%rax
0xffffffff8029e90d <kmem_cache_alloc+60>: mov 0x0(%rbp,%rax,8),%rdx
0xffffffff8029e912 <kmem_cache_alloc+65>: mov (%rdx),%eax
0xffffffff8029e914 <kmem_cache_alloc+67>: test %eax,%eax
0xffffffff8029e916 <kmem_cache_alloc+69>: je 0xffffffff8029e990 <kmem_cache_alloc+191>
At the 1st check, %eax (= ac->avail) is not 0.
0xffffffff8029e918 <kmem_cache_alloc+71>: lock incl 0xf8(%rbp)
0xffffffff8029e91f <kmem_cache_alloc+78>: movl $0x1,0xc(%rdx)
0xffffffff8029e926 <kmem_cache_alloc+85>: mov (%rdx),%eax
0xffffffff8029e928 <kmem_cache_alloc+87>: sub $0x1,%eax
0xffffffff8029e92b <kmem_cache_alloc+90>: mov %eax,(%rdx)
0xffffffff8029e92d <kmem_cache_alloc+92>: mov %eax,%eax
0xffffffff8029e92f <kmem_cache_alloc+94>: mov 0x18(%rdx,%rax,8),%rdx
At the 2nd read, %eax (= ac->avail after the decrement) is 0xffffffff, which causes this panic.
0xffffffff8029e934 <kmem_cache_alloc+99>: push %rbx
0xffffffff8029e935 <kmem_cache_alloc+100>: popfq
static inline void *
____cache_alloc(struct kmem_cache *cachep, gfp_t flags, int *this_cpu)
{
	void *objp;
	struct array_cache *ac;

	check_irq_off();
	ac = cpu_cache_get(cachep, *this_cpu);
	if (likely(ac->avail)) {		/* <---------------- 1st. check */
		STATS_INC_ALLOCHIT(cachep);
		ac->touched = 1;
		objp = ac->entry[--ac->avail];	/* <---------------- PANIC HERE */
	} else {
		STATS_INC_ALLOCMISS(cachep);
		objp = cache_alloc_refill(cachep, flags, this_cpu);
	}
	return objp;
}
It means that ac->avail was changed by another cpu.
So I looked for who touches another cpu's data, and found the root cause.
I think calling cache_grow() from cache_alloc_refill() can move the task to
another cpu, but cache_grow() doesn't update the this_cpu variable in spite
of the cpu change. That makes the kernel access another cpu's data in
cache_alloc_refill().
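Roughly, the window looks like this (a simplified sketch of the sequence,
not the exact source; assume the task starts on cpu0):

	slab_irq_enable_nort();		/* cpu0: irqs on, migration possible */
	/* ... sleep in the page allocator, wake up on cpu1 ... */
	slab_irq_disable_nort();	/* cpu1: irqs off, but *this_cpu == 0 */
	ac = cpu_cache_get(cachep, *this_cpu);	/* cpu0's array_cache! */
	objp = ac->entry[--ac->avail];	/* races with cpu0 => avail underflows */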
I'm testing the patch below now.
---
From: Hiroshi Shimamoto <[email protected]>
On !PREEMPT_RT, an inconsistency can occur.
After slab_irq_enable_nort() releases the cpu, the task can move to another cpu.
slab_irq_disable_nort() should pick up the new cpu id; otherwise it
corrupts another cpu's data.
Signed-off-by: Hiroshi Shimamoto <[email protected]>
---
mm/slab.c | 18 +++++++++---------
1 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index 4bf6ca8..7beb378 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -138,8 +138,8 @@
*
* (On PREEMPT_RT, these are NOPs, but we have to drop/get the irq locks.)
*/
-# define slab_irq_disable_nort() local_irq_disable()
-# define slab_irq_enable_nort() local_irq_enable()
+# define slab_irq_disable_nort(cpu) slab_irq_disable(cpu)
+# define slab_irq_enable_nort(cpu) slab_irq_enable(cpu)
# define slab_irq_disable_rt(flags) do { (void)(flags); } while (0)
# define slab_irq_enable_rt(flags) do { (void)(flags); } while (0)
# define slab_spin_lock_irq(lock, cpu) \
@@ -160,8 +160,8 @@ DEFINE_PER_CPU_LOCKED(int, slab_irq_locks) = { 0, };
do { slab_irq_enable(cpu); (void) (flags); } while (0)
# define slab_irq_disable_rt(cpu) slab_irq_disable(cpu)
# define slab_irq_enable_rt(cpu) slab_irq_enable(cpu)
-# define slab_irq_disable_nort() do { } while (0)
-# define slab_irq_enable_nort() do { } while (0)
+# define slab_irq_disable_nort(cpu) do { } while (0)
+# define slab_irq_enable_nort(cpu) do { } while (0)
# define slab_spin_lock_irq(lock, cpu) \
do { slab_irq_disable(cpu); spin_lock(lock); } while (0)
# define slab_spin_unlock_irq(lock, cpu) \
@@ -2899,7 +2899,7 @@ static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid,
offset *= cachep->colour_off;
if (local_flags & __GFP_WAIT)
- slab_irq_enable_nort();
+ slab_irq_enable_nort(*this_cpu);
slab_irq_enable_rt(*this_cpu);
/*
@@ -2932,7 +2932,7 @@ static int cache_grow(struct kmem_cache *cachep, gfp_t flags, int nodeid,
slab_irq_disable_rt(*this_cpu);
if (local_flags & __GFP_WAIT)
- slab_irq_disable_nort();
+ slab_irq_disable_nort(*this_cpu);
check_irq_off();
spin_lock(&l3->list_lock);
@@ -2948,7 +2948,7 @@ opps1:
failed:
slab_irq_disable_rt(*this_cpu);
if (local_flags & __GFP_WAIT)
- slab_irq_disable_nort();
+ slab_irq_disable_nort(*this_cpu);
return 0;
}
@@ -3397,7 +3397,7 @@ retry:
* set and go into memory reserves if necessary.
*/
if (local_flags & __GFP_WAIT)
- slab_irq_enable_nort();
+ slab_irq_enable_nort(*this_cpu);
slab_irq_enable_rt(*this_cpu);
kmem_flagcheck(cache, flags);
@@ -3405,7 +3405,7 @@ retry:
slab_irq_disable_rt(*this_cpu);
if (local_flags & __GFP_WAIT)
- slab_irq_disable_nort();
+ slab_irq_disable_nort(*this_cpu);
if (obj) {
/*
--
1.5.4.1
Hi Hiroshi,
Thanks for looking into it. I'll take a deeper look at it later, but
first I have a few nitpicks with the patch.
On Thu, 20 Mar 2008, Hiroshi Shimamoto wrote:
>
> Signed-off-by: Hiroshi Shimamoto <[email protected]>
> ---
> mm/slab.c | 18 +++++++++---------
> 1 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 4bf6ca8..7beb378 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -138,8 +138,8 @@
> *
> * (On PREEMPT_RT, these are NOPs, but we have to drop/get the irq locks.)
> */
> -# define slab_irq_disable_nort() local_irq_disable()
> -# define slab_irq_enable_nort() local_irq_enable()
> +# define slab_irq_disable_nort(cpu) slab_irq_disable(cpu)
> +# define slab_irq_enable_nort(cpu) slab_irq_enable(cpu)
This is the !PREEMPT_RT version. It should stay as is, with the
exception of ignoring the "cpu" variable [ (void)cpu; ? ]:
#define slab_irq_disable_nort(cpu) \
	do { \
		(void)cpu; \
		local_irq_disable(); \
	} while (0)
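and presumably the matching enable side would be:

#define slab_irq_enable_nort(cpu) \
	do { \
		(void)cpu; \
		local_irq_enable(); \
	} while (0)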
> # define slab_irq_disable_rt(flags) do { (void)(flags); } while (0)
> # define slab_irq_enable_rt(flags) do { (void)(flags); } while (0)
> # define slab_spin_lock_irq(lock, cpu) \
> @@ -160,8 +160,8 @@ DEFINE_PER_CPU_LOCKED(int, slab_irq_locks) = { 0, };
> do { slab_irq_enable(cpu); (void) (flags); } while (0)
> # define slab_irq_disable_rt(cpu) slab_irq_disable(cpu)
> # define slab_irq_enable_rt(cpu) slab_irq_enable(cpu)
> -# define slab_irq_disable_nort() do { } while (0)
> -# define slab_irq_enable_nort() do { } while (0)
> +# define slab_irq_disable_nort(cpu) do { } while (0)
> +# define slab_irq_enable_nort(cpu) do { } while (0)
And these are the PREEMPT_RT version. So basically, this patch is a nop
for PREEMPT_RT. I doubt it will solve your bug ;-)
-- Steve
Steven Rostedt wrote:
> Hi Hiroshi,
Hi Steven,
>
> Thanks for looking into it. I'll take a deeper look at it later, but
> first I have a few nitpicks with the patch.
thanks for comments.
>
> On Thu, 20 Mar 2008, Hiroshi Shimamoto wrote:
>> Signed-off-by: Hiroshi Shimamoto <[email protected]>
>> ---
>> mm/slab.c | 18 +++++++++---------
>> 1 files changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index 4bf6ca8..7beb378 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -138,8 +138,8 @@
>> *
>> * (On PREEMPT_RT, these are NOPs, but we have to drop/get the irq locks.)
>> */
>> -# define slab_irq_disable_nort() local_irq_disable()
>> -# define slab_irq_enable_nort() local_irq_enable()
>> +# define slab_irq_disable_nort(cpu) slab_irq_disable(cpu)
>> +# define slab_irq_enable_nort(cpu) slab_irq_enable(cpu)
>
> This is the !PREEMPT_RT version. It should stay as is, with the
> exception of ignoring the "cpu" variable [ (void)cpu; ? ]:
>
> #define slab_irq_disable_nort(cpu) \
> 	do { \
> 		(void)cpu; \
> 		local_irq_disable(); \
> 	} while (0)
I think "cpu" variable should be changed if current cpu is changed.
mm/slab.c:cache_alloc_refill()
	if (unlikely(!ac->avail)) {
		int x;
		x = cache_grow(cachep, flags | GFP_THISNODE,
			       cpu_to_node(*this_cpu), NULL, this_cpu);

		/* cache_grow can reenable interrupts, then ac could change. */
		ac = cpu_cache_get(cachep, *this_cpu);
The comment says "ac could change", but it never changes if *this_cpu stays the same.
In cache_alloc_refill(), cpu_cache_get() should be called with the valid cpu
id after cache_grow(), because on !PREEMPT_RT the array_cache is protected by
disabling irqs, so another cpu's array_cache must not be accessed.
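For reference, cpu_cache_get() in -rt just indexes the per-cpu array with
the id it is given, roughly like this (a sketch from memory, the details
may differ):

static inline struct array_cache *
cpu_cache_get(struct kmem_cache *cachep, int this_cpu)
{
	/* a stale this_cpu hands back another cpu's array_cache */
	return cachep->array[this_cpu];
}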
>
>> # define slab_irq_disable_rt(flags) do { (void)(flags); } while (0)
>> # define slab_irq_enable_rt(flags) do { (void)(flags); } while (0)
>> # define slab_spin_lock_irq(lock, cpu) \
>> @@ -160,8 +160,8 @@ DEFINE_PER_CPU_LOCKED(int, slab_irq_locks) = { 0, };
>> do { slab_irq_enable(cpu); (void) (flags); } while (0)
>> # define slab_irq_disable_rt(cpu) slab_irq_disable(cpu)
>> # define slab_irq_enable_rt(cpu) slab_irq_enable(cpu)
>> -# define slab_irq_disable_nort() do { } while (0)
>> -# define slab_irq_enable_nort() do { } while (0)
>> +# define slab_irq_disable_nort(cpu) do { } while (0)
>> +# define slab_irq_enable_nort(cpu) do { } while (0)
>
> And these are the PREEMPT_RT version. So basically, this patch is a nop
> for PREEMPT_RT. I doubt it will solve your bug ;-)
yes, it's a nop. I'm sorry, I didn't show my .config.
My kernel config is !PREEMPT_RT, the same as in the deadlock report.
I guess it's a problem only on !PREEMPT_RT.
thanks,
Hiroshi Shimamoto
On Thu, 2008-03-20 at 15:32 -0400, Steven Rostedt wrote:
> And these are the PREEMPT_RT version. So basically, this patch is a nop
> for PREEMPT_RT. I doubt it will solve your bug ;-)
He said it was a !PREEMPT_RT issue (in the -rt tree) ..
Daniel
On Thu, 20 Mar 2008, Hiroshi Shimamoto wrote:
>
> mm/slab.c:cache_alloc_refill()
> 	if (unlikely(!ac->avail)) {
> 		int x;
> 		x = cache_grow(cachep, flags | GFP_THISNODE,
> 			       cpu_to_node(*this_cpu), NULL, this_cpu);
>
> 		/* cache_grow can reenable interrupts, then ac could change. */
> 		ac = cpu_cache_get(cachep, *this_cpu);
>
> The comment says "ac could change", but it never changes if *this_cpu stays the same.
>
> In cache_alloc_refill(), cpu_cache_get() should be called with the valid cpu
> id after cache_grow(), because on !PREEMPT_RT the array_cache is protected by
> disabling irqs, so another cpu's array_cache must not be accessed.
Ah, sorry I missed the !PREEMPT_RT part.
>
> >
> >> # define slab_irq_disable_rt(flags) do { (void)(flags); } while (0)
> >> # define slab_irq_enable_rt(flags) do { (void)(flags); } while (0)
> >> # define slab_spin_lock_irq(lock, cpu) \
> >> @@ -160,8 +160,8 @@ DEFINE_PER_CPU_LOCKED(int, slab_irq_locks) = { 0, };
> >> do { slab_irq_enable(cpu); (void) (flags); } while (0)
> >> # define slab_irq_disable_rt(cpu) slab_irq_disable(cpu)
> >> # define slab_irq_enable_rt(cpu) slab_irq_enable(cpu)
> >> -# define slab_irq_disable_nort() do { } while (0)
> >> -# define slab_irq_enable_nort() do { } while (0)
> >> +# define slab_irq_disable_nort(cpu) do { } while (0)
> >> +# define slab_irq_enable_nort(cpu) do { } while (0)
> >
> > And these are the PREEMPT_RT version. So basically, this patch is a nop
> > for PREEMPT_RT. I doubt it will solve your bug ;-)
>
> yes, it's a nop. I'm sorry, I didn't show my .config.
> My kernel config is !PREEMPT_RT, the same as in the deadlock report.
> I guess it's a problem only on !PREEMPT_RT.
Hmm, I'll have to look deeper into this on Monday (I'm off today -
Friday).
Is this a bug in mainline? The !PREEMPT_RT case should be as close to
mainline as possible, with no actual changes in object code. If this is
not the case, then we need to fix that.
I'll try to remember to take a look at this on Monday.
Thanks,
-- Steve
Steven Rostedt wrote:
>
> On Thu, 20 Mar 2008, Hiroshi Shimamoto wrote:
>
>> mm/slab.c:cache_alloc_refill()
>> 	if (unlikely(!ac->avail)) {
>> 		int x;
>> 		x = cache_grow(cachep, flags | GFP_THISNODE,
>> 			       cpu_to_node(*this_cpu), NULL, this_cpu);
>>
>> 		/* cache_grow can reenable interrupts, then ac could change. */
>> 		ac = cpu_cache_get(cachep, *this_cpu);
>>
>> The comment says "ac could change", but it never changes if *this_cpu stays the same.
>>
>> In cache_alloc_refill(), cpu_cache_get() should be called with the valid cpu
>> id after cache_grow(), because on !PREEMPT_RT the array_cache is protected by
>> disabling irqs, so another cpu's array_cache must not be accessed.
>
> Ah, sorry I missed the !PREEMPT_RT part.
no problem. I also failed to say clearly that this issue is on !PREEMPT_RT :-)
>
>>>> # define slab_irq_disable_rt(flags) do { (void)(flags); } while (0)
>>>> # define slab_irq_enable_rt(flags) do { (void)(flags); } while (0)
>>>> # define slab_spin_lock_irq(lock, cpu) \
>>>> @@ -160,8 +160,8 @@ DEFINE_PER_CPU_LOCKED(int, slab_irq_locks) = { 0, };
>>>> do { slab_irq_enable(cpu); (void) (flags); } while (0)
>>>> # define slab_irq_disable_rt(cpu) slab_irq_disable(cpu)
>>>> # define slab_irq_enable_rt(cpu) slab_irq_enable(cpu)
>>>> -# define slab_irq_disable_nort() do { } while (0)
>>>> -# define slab_irq_enable_nort() do { } while (0)
>>>> +# define slab_irq_disable_nort(cpu) do { } while (0)
>>>> +# define slab_irq_enable_nort(cpu) do { } while (0)
>>> And these are the PREEMPT_RT version. So basically, this patch is a nop
>>> for PREEMPT_RT. I doubt it will solve your bug ;-)
>> yes, it's a nop. I'm sorry, I didn't show my .config.
>> My kernel config is !PREEMPT_RT, the same as in the deadlock report.
>> I guess it's a problem only on !PREEMPT_RT.
>
> Hmm, I'll have to look deeper into this on Monday (I'm off today -
> Friday).
thanks for taking time.
>
> Is this a bug in mainline? The !PREEMPT_RT case should be as close to
> mainline as possible, with no actual changes in object code. If this is
> not the case, then we need to fix that.
No, this issue is in -rt only.
thanks,
Hiroshi Shimamoto
On Fri, 21 Mar 2008, Hiroshi Shimamoto wrote:
>
> >
> > Is this a bug in mainline? The !PREEMPT_RT case should be as close to
> > mainline as possible, with no actual changes in object code. If this is
> > not the case, then we need to fix that.
>
> No, this issue is in -rt only.
>
Hi Hiroshi,
Thanks again for reporting this. I took a deeper look into this today and
came to the conclusion that we need to get rid of both
slab_irq_disable_nort and slab_irq_disable_rt, and simply use
slab_irq_disable ;-)
This is what your patch indirectly does. I'll write up another patch to fix
this.
Thanks,
-- Steve
P.S. I'm hoping to get -rt4 out today.
On Mon, 24 Mar 2008, Steven Rostedt wrote:
>
> On Fri, 21 Mar 2008, Hiroshi Shimamoto wrote:
> >
> > >
> > > Is this a bug in mainline? The !PREEMPT_RT case should be as close to
> > > mainline as possible, with no actual changes in object code. If this is
> > > not the case, then we need to fix that.
> >
> > No, this issue is in -rt only.
> >
>
> Hi Hiroshi,
>
> Thanks again for reporting this. I took a deeper look into this today and
> came to the conclusion that we need to get rid of both
> slab_irq_disable_nort and slab_irq_disable_rt, and simply use
> slab_irq_disable ;-)
>
> This is what your patch indirectly does. I'll write up another patch to fix
> this.
>
Actually, I'm going to take your patch.
Seems that the reason Ingo made the two separate is that the functions
called also call slab_irq_save(), which will retake the locks on
PREEMPT_RT. If we don't release the lock, we can deadlock.
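Roughly, the pattern would be (a sketch of the call chain, not an actual
trace):

	slab_irq_disable(cpu);	/* on PREEMPT_RT: takes this cpu's lock */
	cache_grow(...)
	  -> ... allocation path ...
	    -> slab_irq_save(flags, cpu);	/* would retake the same
						 * per-cpu lock => deadlock */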
Thanks,
-- Steve
Steven Rostedt wrote:
> On Fri, 21 Mar 2008, Hiroshi Shimamoto wrote:
>>> Is this a bug in mainline? The !PREEMPT_RT case should be as close to
>>> mainline as possible, with no actual changes in object code. If this is
>>> not the case, then we need to fix that.
>> No, this issue is in -rt only.
>>
>
> Hi Hiroshi,
>
> Thanks again for reporting this. I took a deeper look into this today and
> came to the conclusion that we need to get rid of both
> slab_irq_disable_nort and slab_irq_disable_rt, and simply use
> slab_irq_disable ;-)
thanks for taking time.
>
> This is what your patch indirectly does. I'll write up another patch to fix
> this.
Sounds smarter.
>
> Thanks,
>
> -- Steve
>
> P.S. I'm hoping to get -rt4 out today.
I'll test it on my box when released.
thanks,
Hiroshi Shimamoto
Steven Rostedt wrote:
> On Mon, 24 Mar 2008, Steven Rostedt wrote:
>
>> On Fri, 21 Mar 2008, Hiroshi Shimamoto wrote:
>>>> Is this a bug in mainline? The !PREEMPT_RT case should be as close to
>>>> mainline as possible, with no actual changes in object code. If this is
>>>> not the case, then we need to fix that.
>>> No, this issue is in -rt only.
>>>
>> Hi Hiroshi,
>>
>> Thanks again for reporting this. I took a deeper look into this today and
>> came to the conclusion that we need to get rid of both
>> slab_irq_disable_nort and slab_irq_disable_rt, and simply use
>> slab_irq_disable ;-)
>>
>> This is what your patch indirectly does. I'll write up another patch to fix
>> this.
>>
>
> Actually, I'm going to take your patch.
>
> Seems that the reason Ingo made the two separate is that the functions
> called also call slab_irq_save(), which will retake the locks on
> PREEMPT_RT. If we don't release the lock, we can deadlock.
>
OK, actually, I had an interest in your rework :-)
thanks,
Hiroshi Shimamoto