2003-02-04 00:25:58

by Haoqiang Zheng

Subject: linux hangs with printk on schedule()

I found that Linux hangs when a printk is inserted into the function schedule().
Sure, it doesn't make much sense to add such a line to schedule(). But Linux
shouldn't hang anyway, right? It's assumed that printk can be safely
inserted anywhere. So, is it a bug in Linux?

The Linux kernel I am running is 2.4.18-14, the same version used by Red Hat 8.0.
The scheduler is Ingo's O(1) scheduler.


Here is a fragment of the code
****************************************************************
switch (prev->state) {
------
default:
        printk("deactivating task pid=%d comm=%s\n", prev->pid, prev->comm);
        deactivate_task(prev, rq);
}
******************************************************************


2003-02-04 00:34:16

by Robert Love

Subject: Re: linux hangs with printk on schedule()

On Mon, 2003-02-03 at 19:35, Haoqiang Zheng wrote:

> I found that Linux hangs when a printk is inserted into the function schedule().
> Sure, it doesn't make much sense to add such a line to schedule(). But Linux
> shouldn't hang anyway, right? It's assumed that printk can be safely
> inserted anywhere. So, is it a bug in Linux?

It's a known deadlock in 2.4:

schedule -> printk() -> dmesg output -> klogd wakes up -> repeat

It is not a hard fix and it's basically one of a few places where you
cannot call printk(), which is otherwise a very robust function.

Robert Love

2003-02-04 00:39:00

by Andi Kleen

Subject: Re: linux hangs with printk on schedule()

"Haoqiang Zheng" writes:

> I found that Linux hangs when a printk is inserted into the function schedule().
> Sure, it doesn't make much sense to add such a line to schedule(). But Linux
> shouldn't hang anyway, right? It's assumed that printk can be safely
> inserted anywhere. So, is it a bug in Linux?
>
> The Linux kernel I am running is 2.4.18-14, the same version used by Red Hat 8.0.
> The scheduler is Ingo's O(1) scheduler.

printk can call wake_up to wake up the klogd daemon. This will deadlock
on acquiring the scheduler lock of the local run queue.

One way to avoid it is to wrap it like this:

oops_in_progress++;
printk(...);
oops_in_progress--;

And no, it's not a bug in Linux.

-Andi
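
To make the deadlock and the suggested workaround concrete, here is a small
userspace C model. It is only a sketch, not kernel code: rq_lock,
model_schedule(), model_printk() and model_wake_up_klogd() are hypothetical
stand-ins, and the model assumes that printk skips the klogd wakeup whenever
oops_in_progress is nonzero. It says nothing about whether the workaround is
sufficient on a real kernel.

```c
/* Userspace model only. Every name except oops_in_progress is hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag rq_lock = ATOMIC_FLAG_INIT; /* stands in for the run queue lock */
static int oops_in_progress;                   /* same name as the kernel flag */
static int would_deadlock;                     /* wakeups attempted with the lock held */

/* Models wake_up(): waking klogd needs the run queue lock. */
static void model_wake_up_klogd(void)
{
    if (atomic_flag_test_and_set(&rq_lock)) {
        /* The caller already holds the lock; a real spinlock would spin forever. */
        would_deadlock++;
        return;
    }
    atomic_flag_clear(&rq_lock);
}

/* Models printk(): log, then wake klogd unless oops_in_progress is set. */
static void model_printk(void)
{
    if (!oops_in_progress)
        model_wake_up_klogd();
}

/* Models schedule(): take the run queue lock, then log while holding it.
 * Returns how many times the model would have deadlocked. */
static int model_schedule(bool guard)
{
    bool was_locked = atomic_flag_test_and_set(&rq_lock);
    assert(!was_locked);
    (void)was_locked;

    would_deadlock = 0;
    if (guard)
        oops_in_progress++;
    model_printk();
    if (guard)
        oops_in_progress--;

    atomic_flag_clear(&rq_lock);
    return would_deadlock;
}
```

In this model, model_schedule(false) reports one would-be deadlock and
model_schedule(true) reports none, which is the behaviour the wrapper above
is aiming for.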

2003-02-05 02:52:48

by Haoqiang Zheng

Subject: Re: linux hangs with printk on schedule()

> oops_in_progress++;
> printk(...);
> oops_in_progress--;

Thanks Robert and Andi for your help.
But the trick (avoiding the klogd wakeup by setting oops_in_progress) doesn't
seem to work for me.

I did notice the code:
*********************************************
if (must_wake_klogd && !oops_in_progress)
        wake_up_interruptible(&log_wait);
*****************************************
But it simply still doesn't work. :-(

I am working on implementing a new SMP scheduler. It's an OS research
project. Without "printk" in the scheduler, it's really very hard to do the
debugging. I don't know how others handle this case. Are you guys better
equipped than me? I mean, is debugging with gdb running on another machine
(connected via a serial port) a common technique? I am not sure whether it's
necessary to set up an environment like that.

Haoqiang

2003-02-06 02:09:36

by Rick Lindsley

Subject: Re: linux hangs with printk on schedule()

> I am working on implementing a new SMP scheduler. It's an OS research
> project. Without "printk" in the scheduler, it's really very hard to do the
> debugging. I don't know how others handle this case. Are you guys better
> equipped than me? I mean, is debugging with gdb running on another machine
> (connected via a serial port) a common technique? I am not sure whether it's
> necessary to set up an environment like that.

Depending on what your needs are, could you simply note or count events
and do something about them later? I've a patch which inserts counters
into schedule() (basically a very focused code coverage) so that you
can determine later what decisions were taken (and how many times they
were taken). If knowing the frequency at which a path was taken is
more important than knowing specific values each time a path was hit,
this patch might do it for you.

http://eaglet.rain.com/rick/linux/schedstat
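
The counting idea can be sketched in a few lines of plain C. This is not the
schedstat patch linked above, just a minimal illustration of the approach:
the event names, ev_hit() and ev_report() are all hypothetical, and a real
SMP scheduler would presumably want per-CPU counters rather than one shared
array.

```c
/* Illustrative event counters. All names here are hypothetical. */
#include <assert.h>
#include <stdio.h>

enum sched_event { EV_SCHEDULE_CALLS, EV_DEACTIVATE, EV_SWITCH, EV_MAX };

static const char *const ev_names[EV_MAX] = {
    "schedule_calls", "deactivate", "switch"
};
static unsigned long ev_count[EV_MAX];

/* Called on the hot path: a single increment, no locking, no output. */
static void ev_hit(enum sched_event ev)
{
    assert((int)ev >= 0 && ev < EV_MAX);
    ev_count[ev]++;
}

/* Called later, outside the hot path: format all counters into buf.
 * Returns the number of bytes written, or -1 if buf is too small. */
static int ev_report(char *buf, size_t len)
{
    size_t used = 0;

    for (int i = 0; i < EV_MAX; i++) {
        int n = snprintf(buf + used, len - used, "%s %lu\n",
                         ev_names[i], ev_count[i]);
        if (n < 0 || (size_t)n >= len - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

The point of the split is that ev_hit() is cheap enough to sit in the path
being studied, while the expensive formatting in ev_report() runs only when
someone asks for the numbers.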

If both the details and sequence are important, since this is debugging,
you might try creating a, oh, 5000 member array, treated either as
a circular buffer or a simple recording of the first 5000 events, in
which you record your interesting event for later retrieval via
(fancy) /proc or (more basic) your favorite debugger. I wouldn't
advocate this for finished code, and be warned that too much
intrusive debugging in a path like the scheduler can skew what
you're hoping to observe. (printk's, for instance, can
be very slow compared to what's happening in the scheduler.)

Rick
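
The circular-buffer suggestion might look roughly like the following. This is
a minimal sketch under stated assumptions: struct trace_entry, trace_record(),
trace_count() and trace_get() are hypothetical names, the recorded fields
(pid and an event code) are just examples, and nothing here is SMP-safe. In a
real scheduler you would probably want one buffer per CPU, or to record only
while the run queue lock is held.

```c
/* Illustrative 5000-entry circular event log. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define TRACE_SLOTS 5000

struct trace_entry {
    unsigned long seq;  /* global sequence number of the event */
    int pid;            /* task the event refers to */
    int event;          /* caller-defined event code */
};

static struct trace_entry trace_buf[TRACE_SLOTS];
static unsigned long trace_next;    /* total number of events recorded so far */

/* Record one event, overwriting the oldest entry once the buffer is full. */
static void trace_record(int pid, int event)
{
    struct trace_entry *e = &trace_buf[trace_next % TRACE_SLOTS];

    e->seq = trace_next;
    e->pid = pid;
    e->event = event;
    trace_next++;
}

/* Number of entries currently retained (at most TRACE_SLOTS). */
static size_t trace_count(void)
{
    return trace_next < TRACE_SLOTS ? (size_t)trace_next : (size_t)TRACE_SLOTS;
}

/* Return the i-th oldest retained entry, or NULL if i is out of range. */
static const struct trace_entry *trace_get(size_t i)
{
    size_t n = trace_count();
    unsigned long first;
    const struct trace_entry *e;

    if (i >= n)
        return NULL;
    first = trace_next - n;     /* sequence number of the oldest entry */
    e = &trace_buf[(first + i) % TRACE_SLOTS];
    assert(e->seq == first + i);
    return e;
}
```

Recording is a handful of stores, so it perturbs the scheduler far less than
a printk would; the entries can then be walked with trace_get() from a /proc
handler or inspected directly in a debugger.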