2008-06-12 06:31:27

by junjie cai

Subject: [rt] __reacquire_lock_kernel bug?

Hi, List

We hit a BUG_ON during some tests on 2.6.22.19-rt9;
the message is pasted at the end of this mail.

It seems that preemption happened in __reacquire_kernel_lock(),
so that the kernel_lock is released twice.

int __lockfunc __reacquire_kernel_lock(void)
{
	local_irq_enable();
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preempted here
	_raw_spin_lock(&kernel_flag);
	local_irq_disable();
	preempt_disable();
}
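
To spell out the concern: at the marked point the task does not hold
kernel_flag any more, yet its lock_depth is still >= 0. The worry is
that a preemption there leads to a nested pass through the scheduler,
which sees lock_depth >= 0 and calls __release_kernel_lock() a second
time, for a lock this task does not hold. Roughly, the scheduler-side
hooks look like this (paraphrased from memory of that era's
kernel/sched.c, not verbatim source):

static inline void release_kernel_lock(struct task_struct *task)
{
	/* on entry to schedule(): drop the BKL if this task holds it */
	if (unlikely(task->lock_depth >= 0))
		__release_kernel_lock();
}

static inline int reacquire_kernel_lock(struct task_struct *task)
{
	/* before schedule() returns: take the BKL back for the task */
	if (unlikely(task->lock_depth >= 0))
		return __reacquire_kernel_lock();
	return 0;
}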

After checking 2.6.25.4-rt6, I think the same problem exists.
Shouldn't lock_depth be set to -1 before local_irq_enable()?

int __lockfunc __reacquire_kernel_lock(void)
{
	struct task_struct *task = current;
	int saved_lock_depth = task->lock_depth;

	local_irq_enable();
	BUG_ON(saved_lock_depth < 0);
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preempted here
	task->lock_depth = -1;

	down(&kernel_sem);

	task->lock_depth = saved_lock_depth;

	local_irq_disable();

	return 0;
}
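
In other words, the change I have in mind is roughly this (untested
sketch; the BUG_ON is hoisted together with the assignment):

int __lockfunc __reacquire_kernel_lock(void)
{
	struct task_struct *task = current;
	int saved_lock_depth = task->lock_depth;

	BUG_ON(saved_lock_depth < 0);

	/*
	 * Clear lock_depth while interrupts are still disabled, so a
	 * preemption after local_irq_enable() no longer sees
	 * lock_depth >= 0 for a semaphore this task does not hold.
	 */
	task->lock_depth = -1;

	local_irq_enable();

	down(&kernel_sem);

	task->lock_depth = saved_lock_depth;

	local_irq_disable();

	return 0;
}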


OOPS message:
---------------------------------------------------------------------------------------------------------------------------------------------------
[<c0024a50>] (dump_stack+0x0/0x24) from [<c002cf44>] (__schedule_bug+0x48/0x54)
[<c002cefc>] (__schedule_bug+0x0/0x54) from [<c01991dc>] (__schedule+0x9c/0x510)
r6:000003ff r5:f70d0100 r4:c0199868
[<c0199140>] (__schedule+0x0/0x510) from [<c0199868>] (preempt_schedule_irq+0x48/0x70)
[<c0199820>] (preempt_schedule_irq+0x0/0x70) from [<c0020a08>] (svc_preempt+0x8/0x20)
r4:ffffffff
[<c019ba20>] (__lock_text_start+0x0/0x2c) from [<c01995cc>] (__schedule+0x48c/0x510)
[<c0199140>] (__schedule+0x0/0x510) from [<c01997e0>] (schedule+0xc8/0x108)
[<c0199718>] (schedule+0x0/0x108) from [<c00ce488>] (nfs_wait_schedule+0x34/0x40)
r4:00000000
[<c00ce454>] (nfs_wait_schedule+0x0/0x40) from [<c019a6cc>] (__wait_on_bit_lock+0x64/0x98)
r4:cb65bacc
[<c019a668>] (__wait_on_bit_lock+0x0/0x98) from [<c019a77c>] (out_of_line_wait_on_bit_lock+0x7c/0x88)
r8:cb8172ec r7:00000002 r6:c00ce454 r5:c0000490 r4:cb65bae0
[<c019a700>] (out_of_line_wait_on_bit_lock+0x0/0x88) from [<c00ce57c>] (__nfs_revalidate_inode+0xe8/0x2f4)
[<c00ce494>] (__nfs_revalidate_inode+0x0/0x2f4) from [<c00cc710>] (nfs_lookup_revalidate+0x1b8/0x430)
[<c00cc558>] (nfs_lookup_revalidate+0x0/0x430) from [<c00822c8>] (do_lookup+0x13c/0x184)
[<c008218c>] (do_lookup+0x0/0x184) from [<c0083d40>] (__link_path_walk+0x6fc/0x8cc)
[<c0083644>] (__link_path_walk+0x0/0x8cc) from [<c0083f90>] (link_path_walk+0x80/0x13c)
[<c0083f10>] (link_path_walk+0x0/0x13c) from [<c0084070>] (path_walk+0x24/0x28)
r8:cb656000 r7:cb65bf08 r6:00000001 r5:00000001 r4:cb9f22e0
[<c008404c>] (path_walk+0x0/0x28) from [<c00846a4>] (do_path_lookup+0x2b4/0x2e0)
[<c00843f0>] (do_path_lookup+0x0/0x2e0) from [<c00851a4>] (__path_lookup_intent_open+0x54/0x94)
r8:00000001 r7:cb65bf08 r6:00000001 r5:ffffffe9 r4:ffffff9c
[<c0085150>] (__path_lookup_intent_open+0x0/0x94) from [<c0085278>] (path_lookup_open+0x24/0x2c)
r7:00000000 r6:cb656000 r5:cb65bf08 r4:00000000
[<c0085254>] (path_lookup_open+0x0/0x2c) from [<c00852e0>] (open_namei+0x60/0x4d4)
[<c0085280>] (open_namei+0x0/0x4d4) from [<c0077ca0>] (do_filp_open+0x2c/0x4c)
[<c0077c74>] (do_filp_open+0x0/0x4c) from [<c0077d14>] (do_sys_open+0x54/0x98)
r5:00000003 r4:00000000
[<c0077cc0>] (do_sys_open+0x0/0x98) from [<c0077d90>] (sys_open+0x24/0x28)
r8:c0020f04 r7:00000005 r6:00009bfd r5:00070060 r4:00000004
[<c0077d6c>] (sys_open+0x0/0x28) from [<c0020d80>] (ret_fast_syscall+0x0/0x2c)



--
/*********************************************/
/* Junjie Cai ([email protected]) */
/*********************************************/


2008-06-12 12:06:49

by Thomas Gleixner

Subject: Re: [rt] __reacquire_lock_kernel bug?

On Thu, 12 Jun 2008, junjie cai wrote:
> Hi, List
>
> We hit a BUG_ON during some tests on 2.6.22.19-rt9;
> the message is pasted at the end of this mail.

There is no BUG_ON in the code path where your stack trace comes from.
Unfortunately you omitted the interesting part of the message, which
is printed _before_ the stack trace:

"BUG: scheduling while atomic: .....

> It seems that preemption happened in __reacquire_kernel_lock(),
> so that the kernel_lock is released twice.

Sorry, I do not understand what you mean.

> int __lockfunc __reacquire_kernel_lock(void)
> {
> local_irq_enable();
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> preempted here

When you use -rt you won't be in that code path: -rt uses the
preemptible BKL implementation (sketched after the quoted code below).

> _raw_spin_lock(&kernel_flag);
> local_irq_disable();
> preempt_disable();
> }
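
For reference, the preemptible BKL replaces the kernel_flag spinlock
with a semaphore plus the per-task lock_depth recursion count, so the
lock can be taken and dropped with preemption enabled. A from-memory
sketch (not verbatim lib/kernel_lock.c of that era):

void __lockfunc lock_kernel(void)
{
	struct task_struct *task = current;
	int depth = task->lock_depth + 1;

	/* only the outermost lock_kernel() takes the semaphore */
	if (likely(!depth))
		down(&kernel_sem);

	task->lock_depth = depth;
}

void __lockfunc unlock_kernel(void)
{
	struct task_struct *task = current;

	BUG_ON(task->lock_depth < 0);

	/* only the outermost unlock_kernel() releases it */
	if (likely(--task->lock_depth < 0))
		up(&kernel_sem);
}
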
>
> After checking 2.6.25.4-rt6, I think the same problem exists.
> Shouldn't lock_depth be set to -1 before local_irq_enable()?
>
> int __lockfunc __reacquire_kernel_lock(void)
> {
> struct task_struct *task = current;
> int saved_lock_depth = task->lock_depth;
>
> local_irq_enable();
> BUG_ON(saved_lock_depth < 0);
> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>preempted here

That's ok when the task is preempted here. lock_depth needs to be
set to -1 before we call down(), and it is restored afterwards.
preempt_schedule_irq() handles the BKL in the right way.
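
For completeness, a from-memory sketch of how preempt_schedule_irq()
does that (paraphrased, not verbatim kernel/sched.c of that era): it
stashes lock_depth and sets it to -1 around the nested schedule(), so
the nested pass neither auto-releases nor re-takes the BKL for the
preempted task.

asmlinkage void __sched preempt_schedule_irq(void)
{
	struct thread_info *ti = current_thread_info();
	struct task_struct *task = current;
	int saved_lock_depth;

	/* must be entered with irqs off and preemption otherwise enabled */
	BUG_ON(ti->preempt_count || !irqs_disabled());

	do {
		add_preempt_count(PREEMPT_ACTIVE);

		/*
		 * Hide the BKL state from the nested schedule(): with
		 * lock_depth == -1 it will neither release nor
		 * reacquire the kernel lock for this task.
		 */
		saved_lock_depth = task->lock_depth;
		task->lock_depth = -1;

		local_irq_enable();
		schedule();
		local_irq_disable();

		task->lock_depth = saved_lock_depth;
		sub_preempt_count(PREEMPT_ACTIVE);
	} while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
}

So a preemption at the point you marked is harmless: the nested
schedule() runs with lock_depth == -1 and leaves the BKL alone.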

Thanks,

tglx