LinuxLists.cc - RE: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

2005-05-19 14:23:34

Subject: RE: Why yield in coredump_wait? [was: Re: Resent: BUG in RT 45-01 when RT program dumps core]

> In the function coredump_wait there's a yield called:
>
> static void coredump_wait(struct mm_struct *mm)
> {
> [...]
> /* give other threads a chance to run: */
> yield();
>
> zap_threads(mm);
> [...]
>
> I don't see any reason for this. The comment says it's giving other
> threads a chance to run, but the zap_threads below it will just send a
> kill signal to all those sharing the mm, and then this thread will wait
> for completion (if there were threads to wait on).
>
> Now if there were no other threads to wait on it would just continue.
> So, is there some real reason that this yield is there? Or is it just
> trying to be nice, as in saying, "I'm dying now and just don't want to
> waste others' time" (which I highly doubt is the case).
>
> The reason I'm asking is that RT tasks should not call yield: it is
> pretty much meaningless, since an RT task won't yield to any task of
> lesser priority, and in Ingo's current kernel the yield will send a bug
> message if it was called by an RT task.
>
> Thanks,
>
> -- Steve
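
The point about RT tasks and yield can be illustrated from user space. The sketch below is only an analogue, not the kernel code under discussion: the helper names `caller_is_realtime` and `yield_unless_realtime` are hypothetical, and it uses the POSIX calls `sched_getscheduler` and `sched_yield` rather than the in-kernel `yield()`.

```c
#include <sched.h>

/* Hypothetical helper: returns 1 if the calling task runs under a
 * real-time policy (SCHED_FIFO or SCHED_RR), 0 otherwise.
 * A failed sched_getscheduler() (-1) is treated as "not real-time". */
int caller_is_realtime(void)
{
    int policy = sched_getscheduler(0);   /* 0 = the calling process */
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Hypothetical helper: yield only when yielding can have an effect.
 * Returns 1 if sched_yield() was called, 0 if it was skipped. */
int yield_unless_realtime(void)
{
    if (caller_is_realtime())
        return 0;   /* an RT task would not give way to lower-priority tasks */
    sched_yield();
    return 1;
}
```

Whether skipping the yield is the right fix for coredump_wait is exactly what the question above leaves open; the sketch only shows the check itself.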

Does that mean that the core dump is written
with the rt prio of the task which dumps?

I'm not sure if this is a good idea:
Dumping a big core might take *ages* (at least w.r.t. realtime),
especially because it usually goes to flash memory, a CF card,
or some other really slow device.

Doing that on a high rt prio is not nice; in an rt kernel,
it may even keep interrupt handlers from responding...
Is there any way to do it in the background / at low prio?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-05-19 14:37:28

by Steven Rostedt

On Thu, 2005-05-19 at 16:23 +0200, kus Kusche Klaus wrote:
> Does that mean that the core dump is written
> with the rt prio of the task which dumps?
>

Yes, since the crashed process itself is what writes the core.
So if an RT process crashes, it writes the core at whatever priority it had.

> I'm not sure if this is a good idea:
> Dumping a big core might take *ages* (at least w.r.t. realtime),
> especially because it usually goes to flash memory, a CF card,
> or some other really slow device.
>

This is interesting, since if an RT task is dumping core, that usually
means that it crashed, and therefore there's a bug in the system. Also,
unless the process is writing to something that requires a busy wait
(which the serial might do, and probably some flashes), this shouldn't
affect the system.

> Doing that on a high rt prio is not nice; in an rt kernel,
> it may even keep interrupt handlers from responding...
> Is there any way to do it in the background / at low prio?
>

What can easily be done is switch the task to a non RT priority on the
core dump, after it sends the kill signal to the other tasks sharing the
mm. This way it would not affect the other tasks (and interrupt threads)
so badly.

Ingo, what do you think? Should an RT task that is dumping core switch
its priority to a non-RT priority?
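
As a rough user-space illustration of the idea (not the kernel change itself), a task can move itself from a real-time policy to the default time-sharing policy with the POSIX `sched_setscheduler` call. The helper names `is_rt_policy` and `drop_to_normal_priority` are hypothetical, and an actual kernel patch would use in-kernel interfaces instead.

```c
#include <sched.h>

/* Returns 1 for the real-time policies, 0 for anything else. */
int is_rt_policy(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Hypothetical helper: move the calling process to SCHED_OTHER,
 * the default non-real-time policy. Returns -1 on failure, with
 * errno set by sched_setscheduler(). */
int drop_to_normal_priority(void)
{
    /* SCHED_OTHER requires a static priority of 0. */
    struct sched_param param = { .sched_priority = 0 };
    return sched_setscheduler(0, SCHED_OTHER, &param);
}
```

In the ordering described above, the dumping task would make this switch only after it has sent the kill signal to the other tasks sharing the mm, so the slow write happens at normal priority.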

-- Steve

2005-05-19 16:13:23

by Lee Revell

On Thu, 2005-05-19 at 10:37 -0400, Steven Rostedt wrote:
> On Thu, 2005-05-19 at 16:23 +0200, kus Kusche Klaus wrote:
> > Does that mean that the core dump is written
> > with the rt prio of the task which dumps?
> >
>
> Yes, since the crashed process itself is what writes the core.
> So if an RT process crashes, it writes the core at whatever priority it had.
>
> > I'm not sure if this is a good idea:
> > Dumping a big core might take *ages* (at least w.r.t. realtime),
> > especially because it usually goes to flash memory, a CF card,
> > or some other really slow device.
> >
>
> This is interesting, since if an RT task is dumping core, that usually
> means that it crashed, and therefore there's a bug in the system. Also,
> unless the process is writing to something that requires a busy wait
> (which the serial might do, and probably some flashes), this shouldn't
> affect the system.

Interesting indeed. This could be caused by (possibly transient)
hardware failure as well as a bug. How do mission-critical hard RT
applications typically handle disasters like the RT process dumping
core? Presumably you have a hardware or software watchdog, and drop
into some kind of safe mode. It seems that you would need redundant
systems if you wanted to continue to handle the RT constraint while
recovering.

Lee