2005-05-19 06:36:12

by kus Kusche Klaus

Subject: Resent: BUG in RT 45-01 when RT program dumps core

Quoting my mail from Apr 11th (no response received so far):
> When a process running with RT priority dumps core,
> I get the following BUG:
>
> Apr 11 13:44:23 OF455 kern.err kernel: BUG: rtc2:833 RT task yield()-ing!
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c026dad1>] yield+0x61/0x70 (8)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0151e49>] coredump_wait+0x79/0xc0 (20)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0151f83>] do_coredump+0xf3/0x200 (92)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0136789>] kmem_cache_free+0x49/0x120 (32)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c012abdb>] atomic_dec_and_spin_lock+0x3b/0x50 (24)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c011c9a5>] __dequeue_signal+0x105/0x160 (20)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c011e734>] get_signal_to_deliver+0x334/0x350 (48)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c01027f8>] do_signal+0x98/0x180 (44)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0106b56>] timer_interrupt+0x46/0x70 (108)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c012c9eb>] handle_IRQ_event+0x5b/0xe0 (8)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c012cba1>] __do_IRQ+0x111/0x190 (48)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c010d590>] do_page_fault+0x0/0x530 (16)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0102917>] do_notify_resume+0x37/0x3c (8)
> Apr 11 13:44:23 OF455 kern.warn kernel: [<c0102ae6>] work_notifysig+0x13/0x15 (8)

This is still reliably reproducible in RT 7.47-01,
with slight variations in the stack trace.

Is this something to worry about?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com


2005-05-19 11:56:22

by Steven Rostedt

Subject: Re: Resent: BUG in RT 45-01 when RT program dumps core

On Thu, 2005-05-19 at 08:36 +0200, kus Kusche Klaus wrote:
> Quoting my mail from Apr 11th (no response received so far):

Sorry, I must have missed it. I try to keep up with all mail related to
Ingo's RT kernel. Including Ingo in the recipients is the right thing to do.

> > When a process running with RT priority dumps core,
> > I get the following BUG:
> >
> > Apr 11 13:44:23 OF455 kern.err kernel: BUG: rtc2:833 RT task yield()-ing!

This is a check that flags an RT task calling yield. In itself this
may not be a bug, but it can be. There are places in the kernel that
call yield to wait for a bit to clear or a lock to become unlocked
(without grabbing the lock directly, to avoid deadlocking). This may be
OK for non-RT tasks, since other tasks will get a chance to run. But an
RT task's yield won't yield to any task of lower priority than itself.
So if an RT task yields to let a lower-priority task do something it
needs, it will in effect deadlock the system for all tasks of lower
priority than itself.
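To illustrate the priority argument, here is a toy user-space model of a
strict-priority scheduler. It is a sketch, not kernel code: `struct task`
and `pick_after_yield` are hypothetical, and the model assumes
SCHED_FIFO-like rules (the highest-priority runnable task always runs, and
yield only rotates the caller behind runnable tasks of equal priority).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task descriptor for this model only. */
struct task {
    const char *name;
    int prio;       /* higher value = higher priority in this model */
    bool runnable;
};

/* Return the index of the task that runs after task `cur` yields.
 * Only runnable tasks at the highest runnable priority are eligible;
 * among those, the first one after `cur` in round-robin order wins,
 * so equal-priority peers get a turn but lower-priority tasks never do. */
static size_t pick_after_yield(const struct task *t, size_t n, size_t cur)
{
    int best = -1;

    for (size_t i = 0; i < n; i++)
        if (t[i].runnable && t[i].prio > best)
            best = t[i].prio;
    assert(best >= 0); /* the model requires at least one runnable task */

    for (size_t k = 1; k <= n; k++) {
        size_t i = (cur + k) % n;
        if (t[i].runnable && t[i].prio == best)
            return i;
    }
    return cur; /* not reached: some task has priority `best` */
}
```

With an RT task at priority 50 and a helper at priority 0,
`pick_after_yield` hands the CPU straight back to the RT task, so a
`while (!done) yield();` loop in that task never lets the helper run. Only
a second priority-50 task would get the CPU instead.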

> This is still reliably reproducible in RT 7.47-01,
> with slight variations in the stack trace.
>
> Is this something to worry about?

I'll take a look into it.


Ingo,

Did you get my patch to fix the kstop_machine yielding problem?

-- Steve


2005-05-19 12:48:21

by Serge Noiraud

Subject: Re: Resent: BUG in RT 45-01 when RT program dumps core

On Thu, 19/05/2005 at 13:56, Steven Rostedt wrote:
> On Thu, 2005-05-19 at 08:36 +0200, kus Kusche Klaus wrote:
> > Quoting my mail from Apr 11th (no response received so far):
...
> > > Apr 11 13:44:23 OF455 kern.err kernel: BUG: rtc2:833 RT task yield()-ing!
>
> This is a check that flags an RT task calling yield. In itself this
> may not be a bug, but it can be. There are places in the kernel that
> call yield to wait for a bit to clear or a lock to become unlocked
> (without grabbing the lock directly, to avoid deadlocking). This may be
> OK for non-RT tasks, since other tasks will get a chance to run. But an
> RT task's yield won't yield to any task of lower priority than itself.
> So if an RT task yields to let a lower-priority task do something it
> needs, it will in effect deadlock the system for all tasks of lower
> priority than itself.
>
> > This is still reliably reproducible in RT 7.47-01,
> > with slight variations in the stack trace.
> >
> > Is this something to worry about?
>
> I'll take a look into it.
>
>
> Ingo,
>
> Did you get my patch to fix the kstop_machine yielding problem?
>
> -- Steve

Does it solve this problem? Is it the same one? I'm on RT 47-03.
If so, I'm interested in this patch.
...
May 16 09:59:52 dtb2 kernel: BUG: kstopmachine:1037 RT task yield()-ing!
May 16 09:59:52 dtb2 kernel: [dump_stack+35/48] (20)
May 16 09:59:52 dtb2 kernel: [<c01044b3>] (20)
May 16 09:59:52 dtb2 kernel: [yield+101/112] (20)
May 16 09:59:52 dtb2 kernel: [<c0344c55>] (20)
May 16 09:59:52 dtb2 kernel: [stop_machine+261/368] (40)
May 16 09:59:52 dtb2 kernel: [<c014c915>] (40)
May 16 09:59:52 dtb2 kernel: [do_stop+21/128] (20)
May 16 09:59:52 dtb2 kernel: [<c014c9b5>] (20)
May 16 09:59:52 dtb2 kernel: [kthread+182/256] (48)
May 16 09:59:52 dtb2 kernel: [<c013a576>] (48)
May 16 09:59:52 dtb2 kernel: [kernel_thread_helper+5/16] (140156956)
May 16 09:59:52 dtb2 kernel: [<c0101515>] (140156956)
May 16 09:59:53 dtb2 kernel: ts: Compaq touchscreen protocol output
May 16 09:59:53 dtb2 kernel: Generic RTC Driver v1.07
...


2005-05-19 13:13:14

by Steven Rostedt

Subject: Re: Resent: BUG in RT 45-01 when RT program dumps core

On Thu, 2005-05-19 at 14:38 +0200, Serge Noiraud wrote:
> > Ingo,
> >
> > Did you get my patch to fix the kstop_machine yielding problem?
> >
> > -- Steve
>
> Does it solve this problem? Is it the same one? I'm on RT 47-03.
> If so, I'm interested in this patch.
> ...

Yes, it does. All it actually does is allow kstopmachine to call yield
without the BUG message. I looked into the logic of kstopmachine, and
calling yield there is perfectly fine. So I added an rt_yield function
for places where it is OK for an RT task to call yield without
triggering that message.

I found my patch here (it's a -p0 patch):

http://seclists.org/lists/linux-kernel/2005/May/att-2111/rt-yeild.patch__charset_us-ascii
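The patch itself is at the link above and is not reproduced here. As a
rough user-space model of the idea only, with `struct model_task`,
`model_yield`, `model_rt_yield` and the `warnings` counter all hypothetical
stand-ins, a checked yield that complains for RT callers next to an
unchecked variant for audited call sites might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a task; `rt` marks a real-time policy. */
struct model_task {
    const char *comm;
    int pid;
    bool rt;
};

static int warnings; /* BUG messages emitted by this model */

/* Placeholder for the actual reschedule, which this model omits. */
static void do_yield(struct model_task *t)
{
    (void)t;
}

/* Checked variant: complain whenever an RT task yields. */
static void model_yield(struct model_task *t)
{
    assert(t != NULL);
    if (t->rt) {
        warnings++;
        printf("BUG: %s:%d RT task yield()-ing!\n", t->comm, t->pid);
    }
    do_yield(t);
}

/* Unchecked variant for call sites audited as safe for RT callers. */
static void model_rt_yield(struct model_task *t)
{
    assert(t != NULL);
    do_yield(t);
}
```

Audited call sites such as kstopmachine would use the unchecked variant,
while every other caller keeps the warning, so new RT yielders still show
up.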


-- Steve



2005-05-19 13:37:21

by Steven Rostedt

Subject: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

The function coredump_wait calls yield:

static void coredump_wait(struct mm_struct *mm)
{
	[...]
	/* give other threads a chance to run: */
	yield();

	zap_threads(mm);
	[...]

I don't see any reason for this. The comment says it gives other
threads a chance to run, but the zap_threads call right below it just
sends a kill signal to all tasks sharing the mm, and then this thread
waits for completion (if there are threads to wait on).
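The wait described here is already event-driven. A minimal pthread model
of the pattern might look like the following. It is a user-space sketch:
the `completion` type, `sibling`, `dump_with_siblings` and the counters are
hypothetical stand-ins for the kernel's completion-based bookkeeping, not
the actual exec.c code. Nothing in it needs a yield, because the dumper
sleeps until the last sibling wakes it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Minimal user-space analogue of a kernel completion. */
struct completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)                 /* blocks; no polling, no yield */
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* State shared by the threads of the "mm". */
static pthread_mutex_t mm_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t kill_cond = PTHREAD_COND_INITIALIZER;
static bool killed;                  /* set by the dumper: "SIGKILL" */
static int waiters;                  /* siblings that have not exited */
static struct completion all_gone = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false };

/* A sibling sleeps until killed; the last one to leave wakes the dumper. */
static void *sibling(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mm_lock);
    while (!killed)
        pthread_cond_wait(&kill_cond, &mm_lock);
    bool last = (--waiters == 0);
    pthread_mutex_unlock(&mm_lock);
    if (last)
        complete(&all_gone);
    return NULL;
}

/* Dumper: "zap" the siblings, then sleep until all have exited.
 * Returns the remaining waiter count (0), or -1 for a bad argument.
 * pthread_create error handling is omitted for brevity. */
static int dump_with_siblings(int nthreads)
{
    pthread_t tid[16];

    if (nthreads < 0 || nthreads > 16)
        return -1;
    waiters = nthreads;
    killed = false;
    all_gone.done = false;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, sibling, NULL);

    pthread_mutex_lock(&mm_lock);    /* the zap_threads() step */
    killed = true;
    pthread_cond_broadcast(&kill_cond);
    pthread_mutex_unlock(&mm_lock);

    if (nthreads > 0)                /* no siblings: just continue */
        wait_for_completion(&all_gone);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return waiters;
}
```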

If there are no other threads to wait on, it just continues. So is
there a real reason for this yield? Or is it just trying to be nice,
as in saying "I'm dying now and don't want to waste others' time"
(which I highly doubt is the case)?

I'm asking because RT tasks should not call yield: it is pretty much
meaningless for them, since an RT task won't yield to any task of lower
priority, and in Ingo's current kernel yield prints a BUG message when
called by an RT task.

Thanks,

-- Steve



2005-05-19 15:47:37

by Lee Revell

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

On Thu, 2005-05-19 at 09:37 -0400, Steven Rostedt wrote:
> If there are no other threads to wait on, it just continues. So is
> there a real reason for this yield? Or is it just trying to be nice,
> as in saying "I'm dying now and don't want to waste others' time"
> (which I highly doubt is the case)?

Why do you highly doubt this is the case? This is actually the behavior
I would expect.

Lee

2005-05-19 16:03:47

by Steven Rostedt

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

On Thu, 2005-05-19 at 11:47 -0400, Lee Revell wrote:
> On Thu, 2005-05-19 at 09:37 -0400, Steven Rostedt wrote:
> > If there are no other threads to wait on, it just continues. So is
> > there a real reason for this yield? Or is it just trying to be nice,
> > as in saying "I'm dying now and don't want to waste others' time"
> > (which I highly doubt is the case)?
>
> Why do you highly doubt this is the case? This is actually the behavior
> I would expect.

Because yield just doesn't cut it. Yes, OK, it gives up its time slice
at that moment, but it will still come back and preempt whoever is
running to finish the job. If it really wanted to say "I'm dying, let
others run", it should lower its priority or raise its nice value.
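For comparison, "changing its priority or nice value" looks like this from
user space with the standard `sched_setscheduler` and `setpriority` calls.
This is only an analogue, since the change discussed in this thread would
live in the kernel's core-dump path, and `drop_to_background` is a
hypothetical helper name.

```c
#include <sched.h>
#include <sys/resource.h>

/* Leave any real-time policy and take the weakest nice value.
 * Returns 0 on success, -1 (with errno set by the failing call) on error. */
static int drop_to_background(void)
{
    /* SCHED_OTHER requires a static priority of 0. */
    struct sched_param sp = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_OTHER, &sp) == -1)
        return -1;
    /* Raising the nice value needs no privileges; 19 is the lowest. */
    if (setpriority(PRIO_PROCESS, 0, 19) == -1)
        return -1;
    return 0;
}
```

Unlike a single yield, this changes how the scheduler treats every later
decision for the task, which is the behavior the "let others run" comment
seems to want.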

-- Steve


2005-05-19 17:26:23

by Daniel Walker

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

On Thu, 2005-05-19 at 09:37 -0400, Steven Rostedt wrote:

> I'm asking because RT tasks should not call yield: it is pretty much
> meaningless for them, since an RT task won't yield to any task of lower
> priority, and in Ingo's current kernel yield prints a BUG message when
> called by an RT task.

I've seen an RT yield warning on this yield while running the FUSYN
tests. I can't imagine why it's there either.

Daniel

2005-05-19 17:45:53

by Alan

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

On Thu, 2005-05-19 at 18:25, Daniel Walker wrote:
> I've seen an RT yield warning on this yield while running the FUSYN
> tests. I can't imagine why it's there either.

Wouldn't it make more sense to kick a task out of hard real time at
the point it begins dumping core? The core-dumping sequence was never
something that thread intended to execute at real-time priority.

2005-05-19 18:06:23

by Steven Rostedt

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

On Thu, 2005-05-19 at 18:43 +0100, Alan Cox wrote:
> On Thu, 2005-05-19 at 18:25, Daniel Walker wrote:
> > I've seen an RT yield warning on this yield while running the FUSYN
> > tests. I can't imagine why it's there either.
>
> Wouldn't it make more sense to kick a task out of hard real time at
> the point it begins dumping core? The core-dumping sequence was never
> something that thread intended to execute at real-time priority.
>

That's what I recommended in an earlier email; I figured I'd wait for
Ingo's response before sending him any patches. The drop from RT should
probably come after zap_threads, so that it can kill the tasks sharing
the same mm right away. Which also means we should get rid of that
yield.

-- Steve


2005-05-23 07:54:07

by Ingo Molnar

Subject: Re: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]


* Steven Rostedt <[email protected]> wrote:

> On Thu, 2005-05-19 at 18:43 +0100, Alan Cox wrote:
> > On Thu, 2005-05-19 at 18:25, Daniel Walker wrote:
> > > I've seen an RT yield warning on this yield while running the FUSYN
> > > tests. I can't imagine why it's there either.
> >
> > Wouldn't it make more sense to kick a task out of hard real time at
> > the point it begins dumping core? The core-dumping sequence was never
> > something that thread intended to execute at real-time priority.
> >
>
> That's what I recommended in an earlier email; I figured I'd wait for
> Ingo's response before sending him any patches. The drop from RT should
> probably come after zap_threads, so that it can kill the tasks sharing
> the same mm right away. Which also means we should get rid of that
> yield.

I think the yield() is bogus: all of coredumping is (or ought to be)
fully event-driven. I agree that coredumping itself does not need to
run with RT priority, but that does not change the fact that no kernel
code should break when executing with RT priority.

In my tree I removed one yield() from exec.c and changed the other one
to msleep(1).
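What `msleep(1)` buys over `yield()` is that sleeping actually blocks, so
lower-priority tasks can run while the caller waits. An RT task's yield
returns immediately when no other task of equal or higher priority is
runnable. A user-space analogue of such a polling loop, using `nanosleep`
in place of `sched_yield`, might look like the sketch below.
`poll_until_set`, `setter` and `demo` are hypothetical, and as noted above
an event-driven wait is still preferable to polling.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

static atomic_bool flag;

/* Poll `f` until it becomes true, sleeping 1 ms between checks instead
 * of calling sched_yield(): the sleep blocks, so even lower-priority
 * threads get CPU time to set the flag. Returns the number of sleeps. */
static long poll_until_set(atomic_bool *f)
{
    const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
    long sleeps = 0;

    while (!atomic_load(f)) {
        nanosleep(&one_ms, NULL);
        sleeps++;
    }
    return sleeps;
}

/* Another thread that sets the flag after simulated work. */
static void *setter(void *arg)
{
    const struct timespec five_ms = { .tv_sec = 0, .tv_nsec = 5000000L };

    nanosleep(&five_ms, NULL);
    atomic_store((atomic_bool *)arg, true);
    return NULL;
}

/* Run one poll against the setter; -1 if the thread cannot be created. */
static long demo(void)
{
    pthread_t t;

    atomic_store(&flag, false);
    if (pthread_create(&t, NULL, setter, &flag) != 0)
        return -1;
    long n = poll_until_set(&flag);
    pthread_join(t, NULL);
    return n;
}
```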

Ingo