[ This patch is added to the series since the wakeup timings trace
may lockup without it. ]
I thought that one could place a printk anywhere without worrying.
But it seems that it is not wise to place a printk where the runqueue
lock is held.
I just spent two hours debugging why some of my code was locking up,
to find that the lockup was caused by some debugging printk's that
I had in the scheduler. The printk's were only in rare paths so
they shouldn't be too much of a problem, but after I hit the printk
the system locked up.
Thinking that it was locking up on my code I went looking down the
wrong path. I finally found (after examining an NMI dump) that
the lockup happened because printk was trying to wakeup the klogd
daemon, which caused a deadlock when the try_to_wakeup code tries
to grab the runqueue lock.
Since printks are seldom called with interrupts disabled, we can
hold off the waking of klogd if they are. We don't have access to
the runqueue locks from printk, but those locks need interrupts
disabled in order to be held.
Calling printk with interrupts disabled should only be done for
emergencies and debugging anyway.
And with this patch, my code ran fine ;-)
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/printk.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
Index: linux-mcount.git/kernel/printk.c
===================================================================
--- linux-mcount.git.orig/kernel/printk.c 2008-01-18 06:29:15.000000000 -0500
+++ linux-mcount.git/kernel/printk.c 2008-01-18 06:32:38.000000000 -0500
@@ -595,9 +595,11 @@ static int have_callable_console(void)
* @fmt: format string
*
* This is printk(). It can be called from any context. We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
+ *
+ * Note: if printk() is called with interrupts disabled, it will not wake
+ * up the klogd. This is to avoid a deadlock from calling printk() in schedule
+ * with the runqueue lock held and having the wake_up grab the runqueue lock
+ * as well.
*
* We try to grab the console_sem. If we succeed, it's easy - we log the output and
* call the console drivers. If we fail to get the semaphore we place the output
@@ -978,7 +980,13 @@ void release_console_sem(void)
console_locked = 0;
up(&console_sem);
spin_unlock_irqrestore(&logbuf_lock, flags);
- if (wake_klogd)
+ /*
+ * If we try to wake up klogd while printing with the runqueue lock
+ * held, this will deadlock. We don't have access to the runqueue
+ * lock from here, but just checking for interrupts disabled
+ * should be enough.
+ */
+ if (!irqs_disabled() && wake_klogd)
wake_up_klogd();
}
EXPORT_SYMBOL(release_console_sem);
--
On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
> + if (!irqs_disabled() && wake_klogd)
> wake_up_klogd();
This causes a regression .. When printk is called during an OOPS in
kernels without this change then the OOPS will get logged, since the
logging process (klogd) is woken to handle the messages.. If you apply
this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
you remove the wakeup here you have to add it someplace else to maintain
the logging ..
(I'm not theorizing here, I have defects logged against this specific
piece of code..)
Daniel
On Wed, 2008-01-23 at 10:50 -0800, Daniel Walker wrote:
> On Wed, 2008-01-23 at 13:43 -0500, Steven Rostedt wrote:
> > On Wed, 23 Jan 2008, Daniel Walker wrote:
> >
> > >
> > > On Wed, 2008-01-23 at 13:18 -0500, Steven Rostedt wrote:
> > >
> > > > The wakeup hook in schedule is when we find out that we hit our max. I
> > > > could postpone that somehow, but that would require more glue code than I
> > > > would like to add.
> > >
> > > Which hook specifically ?
> >
> > The wakeup tracer does a print on max when the new task is about to be
> > switched to (in context_switch).
>
> You can't move that hook to when the task actually starts running (i.e.
> at the end of schedule() )..
I mean , why can't you?
Daniel
On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> On Wed, 2008-01-23 at 13:18 -0500, Steven Rostedt wrote:
>
> > The wakeup hook in schedule is when we find out that we hit our max. I
> > could postpone that somehow, but that would require more glue code than I
> > would like to add.
>
> Which hook specifically ?
The wakeup tracer does a print on max when the new task is about to be
switched to (in context_switch).
>
> > As for -rt, we've disabled printk for consoles not ATOMIC safe. So the
> > only consoles that will print in atomic sections so far are, earlyprintk
> > and the VGA.
>
> Is this code you added recently for tracing?
Nope, it's been there for a few releases and had nothing to do with the
latency tracer.
<looks at code>
Hmm, seems that only the "hack" fix is in upstream. The Red Hat version
has the better fix, but mainline didn't get it. OK, I'll add that to the
next -rt release.
-- Steve
On Wed, 2008-01-23 at 13:43 -0500, Steven Rostedt wrote:
> On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> >
> > On Wed, 2008-01-23 at 13:18 -0500, Steven Rostedt wrote:
> >
> > > The wakeup hook in schedule is when we find out that we hit our max. I
> > > could postpone that somehow, but that would require more glue code than I
> > > would like to add.
> >
> > Which hook specifically ?
>
> The wakeup tracer does a print on max when the new task is about to be
> switched to (in context_switch).
You can't move that hook to when the task actually starts running (i.e.
at the end of schedule() )..
> >
> > > As for -rt, we've disabled printk for consoles not ATOMIC safe. So the
> > > only consoles that will print in atomic sections so far are, earlyprintk
> > > and the VGA.
> >
> > Is this code you added recently for tracing?
>
> Nope, it's been there for a few releases and had nothing to do with the
> latency tracer.
>
> <looks at code>
>
> Hmm, seems that only the "hack" fix is in upstream. The Red Hat version
> has the better fix, but mainline didn't get it. OK, I'll add that to the
> next -rt release.
mainline ? The problem I'm thinking of was directly tracing related..
Daniel
On Wed, 2008-01-23 at 13:18 -0500, Steven Rostedt wrote:
> The wakeup hook in schedule is when we find out that we hit our max. I
> could postpone that somehow, but that would require more glue code than I
> would like to add.
Which hook specifically ?
> As for -rt, we've disabled printk for consoles not ATOMIC safe. So the
> only consoles that will print in atomic sections so far are, earlyprintk
> and the VGA.
Is this code you added recently for tracing?
Daniel
On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> > if (wake_klogd && !runqueue_is_locked())
> > wake_up_klogd();
> >
> > This probably is the cleanest solution since it simply prevents the
> > deadlock from occurring.
>
> Do you really need to call it with the runqueue lock held .. There are
> other issue with the calls at that level.. For instance in -rt these
> call can actually hang the system in the console laying from inside
> printk (something I reported more than 6 months ago) .. It would be
> better to move them around the scheduler rather than inside it..
The wakeup hook in schedule is when we find out that we hit our max. I
could postpone that somehow, but that would require more glue code than I
would like to add.
As for -rt, we've disabled printk for consoles not ATOMIC safe. So the
only consoles that will print in atomic sections so far are, earlyprintk
and the VGA.
Tim, could you send me your "postponed print" patches. That sounds like
something the -rt patch could use.
Thanks,
-- Steve
> if (wake_klogd && !runqueue_is_locked())
> wake_up_klogd();
>
> This probably is the cleanest solution since it simply prevents the
> deadlock from occurring.
Do you really need to call it with the runqueue lock held .. There are
other issue with the calls at that level.. For instance in -rt these
call can actually hang the system in the console laying from inside
printk (something I reported more than 6 months ago) .. It would be
better to move them around the scheduler rather than inside it..
Daniel
On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
>
> > + if (!irqs_disabled() && wake_klogd)
> > wake_up_klogd();
>
> This causes a regression .. When printk is called during an OOPS in
> kernels without this change then the OOPS will get logged, since the
> logging process (klogd) is woken to handle the messages.. If you apply
> this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
> you remove the wakeup here you have to add it someplace else to maintain
> the logging ..
>
> (I'm not theorizing here, I have defects logged against this specific
> piece of code..)
It wont get woken up anyway. Did you look at wake_up_klogd?
void wake_up_klogd(void)
{
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
}
So if oops_in_progress is set, then it still wont get woken. Perhaps it
got woken some other way? Or is oops_in_progress not set in these oops?
One other solution is to make the runqueue locks visible externally. Like:
in sched.c:
int runqueue_is_locked(void)
{
int cpu = get_cpu();
struct rq *rq = cpu_rq(cpu);
int ret;
ret = spin_is_locked(&rq->lock);
put_cpu();
return ret;
}
And in printk we could do:
if (wake_klogd && !runqueue_is_locked())
wake_up_klogd();
This probably is the cleanest solution since it simply prevents the
deadlock from occurring.
-- Steve
* Daniel Walker ([email protected]) wrote:
>
> On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
>
> > + if (!irqs_disabled() && wake_klogd)
> > wake_up_klogd();
>
> This causes a regression .. When printk is called during an OOPS in
> kernels without this change then the OOPS will get logged, since the
> logging process (klogd) is woken to handle the messages.. If you apply
> this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
> you remove the wakeup here you have to add it someplace else to maintain
> the logging ..
>
> (I'm not theorizing here, I have defects logged against this specific
> piece of code..)
>
Can we change this for :
if (!(irqs_disabled() && !oops_in_progress) && wake_klogd)
wake_up_klogd();
?
(totally untested)
> Daniel
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
Mathieu Desnoyers wrote:
> Can we change this for :
> if (!(irqs_disabled() && !oops_in_progress) && wake_klogd)
> wake_up_klogd();
!oops_in_progress is already inside wake_up_klogd().
Also, I think you'd want that first '!' inside the parens.
I've been working on a system to handle deferred displaying
of printk messages, when printk is called in interrupt context.
Currently on a serial console if printk is called with
ints disabled, it can hold interrupts off for up to 20
milliseconds.
My solution was to set up a worker thread to handle deferred
printk activity. I don't think this works for the oops case,
because you don't want to defer the output. However, I can
post this code if you think it would help with this problem.
I hesitate to just drop the call to wake_up_klogd() without
some kind of mechanism call it later. Waiting until the
next printk will work to drain the buffer, but could lead
to indefinite intervals with printks waiting to be logged.
-- Tim
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================
* Steven Rostedt ([email protected]) wrote:
>
> On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> >
> > On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
> >
> > > + if (!irqs_disabled() && wake_klogd)
> > > wake_up_klogd();
> >
> > This causes a regression .. When printk is called during an OOPS in
> > kernels without this change then the OOPS will get logged, since the
> > logging process (klogd) is woken to handle the messages.. If you apply
> > this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
> > you remove the wakeup here you have to add it someplace else to maintain
> > the logging ..
> >
> > (I'm not theorizing here, I have defects logged against this specific
> > piece of code..)
>
> It wont get woken up anyway. Did you look at wake_up_klogd?
>
> void wake_up_klogd(void)
> {
> if (!oops_in_progress && waitqueue_active(&log_wait))
> wake_up_interruptible(&log_wait);
> }
>
>
> So if oops_in_progress is set, then it still wont get woken. Perhaps it
> got woken some other way? Or is oops_in_progress not set in these oops?
>
> One other solution is to make the runqueue locks visible externally. Like:
>
> in sched.c:
>
> int runqueue_is_locked(void)
> {
> int cpu = get_cpu();
> struct rq *rq = cpu_rq(cpu);
> int ret;
>
> ret = spin_is_locked(&rq->lock);
> put_cpu();
> return ret;
> }
>
> And in printk we could do:
>
> if (wake_klogd && !runqueue_is_locked())
> wake_up_klogd();
>
> This probably is the cleanest solution since it simply prevents the
> deadlock from occurring.
>
FYI :
kernel/panic.c
NORET_TYPE void panic(const char * fmt, ...)
{
....
bust_spinlocks(1);
...
bust_spinlocks(0);
...
In bust_spinlocks :
void __attribute__((weak)) bust_spinlocks(int yes)
{
if (yes) {
++oops_in_progress;
} else {
#ifdef CONFIG_VT
unblank_screen();
#endif
if (--oops_in_progress == 0)
wake_up_klogd();
}
}
Where the final wake_up happens while oops_in_progress is 0, but
interrupts are still disabled.
So about my previous email, proposing testing oops_in_progress, it just
won't work.
Mathieu
> -- Steve
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
On Wed 2008-01-23 12:27:12, Steven Rostedt wrote:
>
> On Wed, 23 Jan 2008, Daniel Walker wrote:
>
> >
> > On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
> >
> > > + if (!irqs_disabled() && wake_klogd)
> > > wake_up_klogd();
> >
> > This causes a regression .. When printk is called during an OOPS in
> > kernels without this change then the OOPS will get logged, since the
> > logging process (klogd) is woken to handle the messages.. If you apply
> > this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
> > you remove the wakeup here you have to add it someplace else to maintain
> > the logging ..
> >
> > (I'm not theorizing here, I have defects logged against this specific
> > piece of code..)
>
> It wont get woken up anyway. Did you look at wake_up_klogd?
>
> void wake_up_klogd(void)
> {
> if (!oops_in_progress && waitqueue_active(&log_wait))
> wake_up_interruptible(&log_wait);
> }
>
>
> So if oops_in_progress is set, then it still wont get woken. Perhaps it
> got woken some other way? Or is oops_in_progress not set in these oops?
>
> One other solution is to make the runqueue locks visible externally. Like:
>
> in sched.c:
>
> int runqueue_is_locked(void)
> {
> int cpu = get_cpu();
> struct rq *rq = cpu_rq(cpu);
> int ret;
>
> ret = spin_is_locked(&rq->lock);
> put_cpu();
> return ret;
> }
>
> And in printk we could do:
>
> if (wake_klogd && !runqueue_is_locked())
> wake_up_klogd();
>
> This probably is the cleanest solution since it simply prevents the
> deadlock from occurring.
Yes please... this looks like a solution to me.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
On Wed 2008-01-23 12:25:09, Mathieu Desnoyers wrote:
> * Daniel Walker ([email protected]) wrote:
> >
> > On Wed, 2008-01-23 at 11:02 -0500, Steven Rostedt wrote:
> >
> > > + if (!irqs_disabled() && wake_klogd)
> > > wake_up_klogd();
> >
> > This causes a regression .. When printk is called during an OOPS in
> > kernels without this change then the OOPS will get logged, since the
> > logging process (klogd) is woken to handle the messages.. If you apply
> > this change klogd doesn't wakeup, and hence doesn't log the oops.. So if
> > you remove the wakeup here you have to add it someplace else to maintain
> > the logging ..
> >
> > (I'm not theorizing here, I have defects logged against this specific
> > piece of code..)
> >
>
> Can we change this for :
> if (!(irqs_disabled() && !oops_in_progress) && wake_klogd)
> wake_up_klogd();
>
> ?
That's wrong, too.
Just make wake_up_klogd do trylock, if it finds that it can't get
neccessary locks, printk(KERN_ALERT) so at least console gets the
message, but proceed without the lock.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Steven Rostedt wrote:
> Tim, could you send me your "postponed print" patches. That sounds like
> something the -rt patch could use.
Sure. See below.
This patch changes the way the printk code processes output to the
console buffer. On some platforms, especially those with a serial
console, the way printk output the messages to console caused significant
interrupt-off periods. This code introduces two config options to
improve that handling.
Without this change, test code run on a 178 MHZ ARM platform
showed an interrupt off period for a particularly long printk message
of 24 milliseconds. With this patch applied, and the options set
to 'Y', the same message only had an interrupt-off period of
482 microseconds.
Portions of this (some parts that reorder locking and irq management
primitives) were taken from the RT-preempt patch.
The patch does several things:
1. It moves the location of the irqrestore operations in the
printk code, so that the console writing can occur with interrupts enabled
2. It changes the code so that characters are written to the console
one-at-a-time. The serial console code holds interrupts disabled for the
duration of its output to the serial port. To decrease the interrupt-off
time, only a small amount of data is written at a time. (This could be tuned
by writing chunks larger than 1 character at a time, but the current solution
was easiest, and gives the lowest interrupt-off times.)
3. The printk code was changed so that if it is entered with interrupts
disabled, it does not output to console immediately. Rather, a workqueue
routine is scheduled to process the console output later (when interrupts
are enabled).
4. An init-time routine was created, to prevent trying to schedule the
workqueue routine too early in the system boot sequence.
Here's the patch:
Signed-off-by: Tim Bird [email protected]
---
kernel/printk.c | 92 85 + 7 - 0 !
lib/Kconfig.debug | 42 42 + 0 - 0 !
2 files changed, 127 insertions(+), 7 deletions(-)
Index: linux-2.6.24-rc8/kernel/printk.c
===================================================================
--- linux-2.6.24-rc8.orig/kernel/printk.c
+++ linux-2.6.24-rc8/kernel/printk.c
@@ -33,11 +33,14 @@
#include <linux/bootmem.h>
#include <linux/syscalls.h>
#include <linux/jiffies.h>
+#include <linux/workqueue.h>
#include <asm/uaccess.h>
#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+#define PRINTK_BUF_SIZE 1024
+
/* printk's without a loglevel use this.. */
#define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */
@@ -47,6 +50,10 @@
DECLARE_WAIT_QUEUE_HEAD(log_wait);
+static void printk_console_flush(struct work_struct *ignored);
+static int have_callable_console(void);
+static DECLARE_WORK(printk_console_work, printk_console_flush);
+
int console_printk[4] = {
DEFAULT_CONSOLE_LOGLEVEL, /* console_loglevel */
DEFAULT_MESSAGE_LOGLEVEL, /* default_message_loglevel */
@@ -220,6 +227,43 @@ static inline void boot_delay_msec(void)
#endif
/*
+ * We can't schedule printk console flushing during early boot.
+ * This flag holds it off until after we've initialized some
+ * important stuff.
+ */
+static int pcf_ok = 0;
+
+static int __init init_printk_console_flush(void)
+{
+ pcf_ok = 1;
+ return 0;
+}
+__initcall(init_printk_console_flush);
+
+void printk_console_flush(struct work_struct *ignored)
+{
+ unsigned long flags;
+ int chars_to_flush;
+
+ spin_lock_irqsave(&logbuf_lock, flags);
+ chars_to_flush = log_end - con_start;
+ spin_unlock_irqrestore(&logbuf_lock, flags);
+
+ if (chars_to_flush) {
+ acquire_console_sem();
+
+ if ((cpu_online(smp_processor_id()) ||
+ have_callable_console())) {
+ console_may_schedule = 0;
+ release_console_sem();
+ } else {
+ console_locked = 0;
+ up(&console_sem);
+ }
+ }
+}
+
+/*
* Return the number of unread characters in the log buffer.
*/
int log_buf_get_len(void)
@@ -436,8 +480,16 @@ static void __call_console_drivers(unsig
for (con = console_drivers; con; con = con->next) {
if ((con->flags & CON_ENABLED) && con->write &&
(cpu_online(smp_processor_id()) ||
- (con->flags & CON_ANYTIME)))
+ (con->flags & CON_ANYTIME))) {
+#ifdef CONFIG_PRINTK_SHORT_IRQ_DISABLE
+ int i;
+ for(i=0; i<end-start; i++) {
+ con->write(con, &LOG_BUF(start+i), 1);
+ }
+#else
con->write(con, &LOG_BUF(start), end - start);
+#endif
+ }
}
}
@@ -633,8 +685,9 @@ asmlinkage int vprintk(const char *fmt,
unsigned long flags;
int printed_len;
char *p;
- static char printk_buf[1024];
+ static char printk_buf[PRINTK_BUF_SIZE];
static int log_level_unknown = 1;
+ int defer;
boot_delay_msec();
@@ -649,6 +702,7 @@ asmlinkage int vprintk(const char *fmt,
lockdep_off();
spin_lock(&logbuf_lock);
printk_cpu = smp_processor_id();
+ preempt_enable();
/* Emit the output into the temporary buffer */
printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
@@ -718,6 +772,19 @@ asmlinkage int vprintk(const char *fmt,
console_locked = 1;
printk_cpu = UINT_MAX;
spin_unlock(&logbuf_lock);
+ lockdep_on();
+ local_irq_restore(flags);
+
+ defer = 0;
+#ifdef CONFIG_PRINTK_DEFER_IN_IRQ
+ /*
+ * If we're in an interrupt and it's not an emergency,
+ * don't emit to console just yet.
+ */
+ if ((irqs_disabled() || in_interrupt()) && !oops_in_progress) {
+ defer = 1;
+ }
+#endif
/*
* Console drivers may assume that per-cpu resources have
@@ -725,7 +792,8 @@ asmlinkage int vprintk(const char *fmt,
* being able to cope (CON_ANYTIME) don't call them until
* this CPU is officially up.
*/
- if (cpu_online(smp_processor_id()) || have_callable_console()) {
+ if (!defer && (cpu_online(smp_processor_id()) ||
+ have_callable_console())) {
console_may_schedule = 0;
release_console_sem();
} else {
@@ -733,8 +801,6 @@ asmlinkage int vprintk(const char *fmt,
console_locked = 0;
up(&console_sem);
}
- lockdep_on();
- raw_local_irq_restore(flags);
} else {
/*
* Someone else owns the drivers. We drop the spinlock, which
@@ -747,7 +813,11 @@ asmlinkage int vprintk(const char *fmt,
raw_local_irq_restore(flags);
}
- preempt_enable();
+ /* didn't flush console all the way yet? - schedule for later */
+ if ((log_end - con_start) && pcf_ok) {
+ schedule_work(&printk_console_work);
+ }
+
return printed_len;
}
EXPORT_SYMBOL(printk);
@@ -971,13 +1041,21 @@ void release_console_sem(void)
_con_start = con_start;
_log_end = log_end;
con_start = log_end; /* Flush */
+
+#ifdef CONFIG_PRINTK_SHORT_IRQ_DISABLE
+ /* allow irqs during console out */
+ spin_unlock_irqrestore(&logbuf_lock, flags);
+ call_console_drivers(_con_start, _log_end);
+#else
spin_unlock(&logbuf_lock);
call_console_drivers(_con_start, _log_end);
local_irq_restore(flags);
+#endif
}
console_locked = 0;
- up(&console_sem);
spin_unlock_irqrestore(&logbuf_lock, flags);
+ up(&console_sem);
+
if (wake_klogd)
wake_up_klogd();
}
Index: linux-2.6.24-rc8/lib/Kconfig.debug
===================================================================
--- linux-2.6.24-rc8.orig/lib/Kconfig.debug
+++ linux-2.6.24-rc8/lib/Kconfig.debug
@@ -17,6 +17,48 @@ config ENABLE_WARN_DEPRECATED
Disable this to suppress the "warning: 'foo' is deprecated
(declared at kernel/power/somefile.c:1234)" messages.
+config PRINTK_SHORT_IRQ_DISABLE
+ bool "Enable interrupts for shorter times during printk output"
+ default n
+ depends on PRINTK
+ help
+ Selecting this option will cause the kernel to write the
+ contents of the printk buffer to the console with interrupts
+ enabled, and in small increments.
+
+ With the default setting of 'n', the kernel attempts to
+ send printk messages to the console with interrupts disabled,
+ and it sends the entire contents of the printk buffer in one go.
+
+ For some consoles (notably the serial console, which is commonly
+ used in embedded products), this results in long interrupt-off
+ periods.
+
+ Users interested in good real-time scheduling should set this
+ to 'y'.
+
+config PRINTK_DEFER_IN_IRQ
+ bool "Defer writing printk output to console during interrupts"
+ default n
+ depends on PRINTK && PRINTK_SHORT_IRQ_DISABLE
+ help
+ Selecting this option will cause the kernel to hold off writing
+ printk messages to the console until interrupts are enabled.
+ On some systems, printk buffer and console handling can take
+ a long time. If done with interrupts off for the entire duration
+ of the operation, this results in long interrupt-off periods.
+
+ With the default setting of 'n', the kernel attempts to
+ send printk messages to the console whether it is in interrupt
+ context or not.
+
+ With a setting of 'y', when printk is called with interrupts
+ disabled, the messages are buffered in the kernel's
+ log buffer until later. If you are debugging the kernel
+ with printks in interrupt routines, you should set this
+ to 'n'. Users interested in good real-time scheduling
+ should set this to 'y'.
+
config ENABLE_MUST_CHECK
bool "Enable __must_check logic"
default y
On Thu, 2008-01-24 at 14:42 -0800, Tim Bird wrote:
> Steven Rostedt wrote:
> > Tim, could you send me your "postponed print" patches. That sounds like
> > something the -rt patch could use.
>
> Sure. See below.
>
> This patch changes the way the printk code processes output to the
> console buffer. On some platforms, especially those with a serial
> console, the way printk output the messages to console caused significant
> interrupt-off periods. This code introduces two config options to
> improve that handling.
>
> Without this change, test code run on a 178 MHZ ARM platform
> showed an interrupt off period for a particularly long printk message
> of 24 milliseconds. With this patch applied, and the options set
> to 'Y', the same message only had an interrupt-off period of
> 482 microseconds.
>
> Portions of this (some parts that reorder locking and irq management
> primitives) were taken from the RT-preempt patch.
>
> The patch does several things:
>
> 1. It moves the location of the irqrestore operations in the
> printk code, so that the console writing can occur with interrupts enabled
>
> 2. It changes the code so that characters are written to the console
> one-at-a-time. The serial console code holds interrupts disabled for the
> duration of its output to the serial port. To decrease the interrupt-off
> time, only a small amount of data is written at a time. (This could be tuned
> by writing chunks larger than 1 character at a time, but the current solution
> was easiest, and gives the lowest interrupt-off times.)
>
> 3. The printk code was changed so that if it is entered with interrupts
> disabled, it does not output to console immediately. Rather, a workqueue
> routine is scheduled to process the console output later (when interrupts
> are enabled).
>
> 4. An init-time routine was created, to prevent trying to schedule the
> workqueue routine too early in the system boot sequence.
>
> Here's the patch:
I suspect these features reduce the change a crash messages makes it out
onto the console, but fail to spot any of the copious text mention this
critical issue.
> Signed-off-by: Tim Bird [email protected]
> ---
> kernel/printk.c | 92 85 + 7 - 0 !
> lib/Kconfig.debug | 42 42 + 0 - 0 !
> 2 files changed, 127 insertions(+), 7 deletions(-)
>
> Index: linux-2.6.24-rc8/kernel/printk.c
> ===================================================================
> --- linux-2.6.24-rc8.orig/kernel/printk.c
> +++ linux-2.6.24-rc8/kernel/printk.c
> @@ -33,11 +33,14 @@
> #include <linux/bootmem.h>
> #include <linux/syscalls.h>
> #include <linux/jiffies.h>
> +#include <linux/workqueue.h>
>
> #include <asm/uaccess.h>
>
> #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
>
> +#define PRINTK_BUF_SIZE 1024
> +
> /* printk's without a loglevel use this.. */
> #define DEFAULT_MESSAGE_LOGLEVEL 4 /* KERN_WARNING */
>
> @@ -47,6 +50,10 @@
>
> DECLARE_WAIT_QUEUE_HEAD(log_wait);
>
> +static void printk_console_flush(struct work_struct *ignored);
> +static int have_callable_console(void);
> +static DECLARE_WORK(printk_console_work, printk_console_flush);
> +
> int console_printk[4] = {
> DEFAULT_CONSOLE_LOGLEVEL, /* console_loglevel */
> DEFAULT_MESSAGE_LOGLEVEL, /* default_message_loglevel */
> @@ -220,6 +227,43 @@ static inline void boot_delay_msec(void)
> #endif
>
> /*
> + * We can't schedule printk console flushing during early boot.
> + * This flag holds it off until after we've initialized some
> + * important stuff.
> + */
> +static int pcf_ok = 0;
> +
> +static int __init init_printk_console_flush(void)
> +{
> + pcf_ok = 1;
> + return 0;
> +}
> +__initcall(init_printk_console_flush);
> +
> +void printk_console_flush(struct work_struct *ignored)
> +{
> + unsigned long flags;
> + int chars_to_flush;
> +
> + spin_lock_irqsave(&logbuf_lock, flags);
> + chars_to_flush = log_end - con_start;
> + spin_unlock_irqrestore(&logbuf_lock, flags);
> +
> + if (chars_to_flush) {
> + acquire_console_sem();
> +
> + if ((cpu_online(smp_processor_id()) ||
> + have_callable_console())) {
> + console_may_schedule = 0;
> + release_console_sem();
> + } else {
> + console_locked = 0;
> + up(&console_sem);
> + }
> + }
> +}
> +
> +/*
> * Return the number of unread characters in the log buffer.
> */
> int log_buf_get_len(void)
> @@ -436,8 +480,16 @@ static void __call_console_drivers(unsig
> for (con = console_drivers; con; con = con->next) {
> if ((con->flags & CON_ENABLED) && con->write &&
> (cpu_online(smp_processor_id()) ||
> - (con->flags & CON_ANYTIME)))
> + (con->flags & CON_ANYTIME))) {
> +#ifdef CONFIG_PRINTK_SHORT_IRQ_DISABLE
> + int i;
> + for(i=0; i<end-start; i++) {
> + con->write(con, &LOG_BUF(start+i), 1);
> + }
> +#else
> con->write(con, &LOG_BUF(start), end - start);
> +#endif
> + }
> }
> }
>
> @@ -633,8 +685,9 @@ asmlinkage int vprintk(const char *fmt,
> unsigned long flags;
> int printed_len;
> char *p;
> - static char printk_buf[1024];
> + static char printk_buf[PRINTK_BUF_SIZE];
> static int log_level_unknown = 1;
> + int defer;
>
> boot_delay_msec();
>
> @@ -649,6 +702,7 @@ asmlinkage int vprintk(const char *fmt,
> lockdep_off();
> spin_lock(&logbuf_lock);
> printk_cpu = smp_processor_id();
> + preempt_enable();
>
> /* Emit the output into the temporary buffer */
> printed_len = vscnprintf(printk_buf, sizeof(printk_buf), fmt, args);
> @@ -718,6 +772,19 @@ asmlinkage int vprintk(const char *fmt,
> console_locked = 1;
> printk_cpu = UINT_MAX;
> spin_unlock(&logbuf_lock);
> + lockdep_on();
> + local_irq_restore(flags);
> +
> + defer = 0;
> +#ifdef CONFIG_PRINTK_DEFER_IN_IRQ
> + /*
> + * If we're in an interrupt and it's not an emergency,
> + * don't emit to console just yet.
> + */
> + if ((irqs_disabled() || in_interrupt()) && !oops_in_progress) {
> + defer = 1;
> + }
> +#endif
>
> /*
> * Console drivers may assume that per-cpu resources have
> @@ -725,7 +792,8 @@ asmlinkage int vprintk(const char *fmt,
> * being able to cope (CON_ANYTIME) don't call them until
> * this CPU is officially up.
> */
> - if (cpu_online(smp_processor_id()) || have_callable_console()) {
> + if (!defer && (cpu_online(smp_processor_id()) ||
> + have_callable_console())) {
> console_may_schedule = 0;
> release_console_sem();
> } else {
> @@ -733,8 +801,6 @@ asmlinkage int vprintk(const char *fmt,
> console_locked = 0;
> up(&console_sem);
> }
> - lockdep_on();
> - raw_local_irq_restore(flags);
> } else {
> /*
> * Someone else owns the drivers. We drop the spinlock, which
> @@ -747,7 +813,11 @@ asmlinkage int vprintk(const char *fmt,
> raw_local_irq_restore(flags);
> }
>
> - preempt_enable();
> + /* didn't flush console all the way yet? - schedule for later */
> + if ((log_end - con_start) && pcf_ok) {
> + schedule_work(&printk_console_work);
> + }
> +
> return printed_len;
> }
> EXPORT_SYMBOL(printk);
> @@ -971,13 +1041,21 @@ void release_console_sem(void)
> _con_start = con_start;
> _log_end = log_end;
> con_start = log_end; /* Flush */
> +
> +#ifdef CONFIG_PRINTK_SHORT_IRQ_DISABLE
> + /* allow irqs during console out */
> + spin_unlock_irqrestore(&logbuf_lock, flags);
> + call_console_drivers(_con_start, _log_end);
> +#else
> spin_unlock(&logbuf_lock);
> call_console_drivers(_con_start, _log_end);
> local_irq_restore(flags);
> +#endif
> }
> console_locked = 0;
> - up(&console_sem);
> spin_unlock_irqrestore(&logbuf_lock, flags);
> + up(&console_sem);
> +
> if (wake_klogd)
> wake_up_klogd();
> }
> Index: linux-2.6.24-rc8/lib/Kconfig.debug
> ===================================================================
> --- linux-2.6.24-rc8.orig/lib/Kconfig.debug
> +++ linux-2.6.24-rc8/lib/Kconfig.debug
> @@ -17,6 +17,48 @@ config ENABLE_WARN_DEPRECATED
> Disable this to suppress the "warning: 'foo' is deprecated
> (declared at kernel/power/somefile.c:1234)" messages.
>
> +config PRINTK_SHORT_IRQ_DISABLE
> + bool "Enable interrupts for shorter times during printk output"
> + default n
> + depends on PRINTK
> + help
> + Selecting this option will cause the kernel to write the
> + contents of the printk buffer to the console with interrupts
> + enabled, and in small increments.
> +
> + With the default setting of 'n', the kernel attempts to
> + send printk messages to the console with interrupts disabled,
> + and it sends the entire contents of the printk buffer in one go.
> +
> + For some consoles (notably the serial console, which is commonly
> + used in embedded products), this results in long interrupt-off
> + periods.
> +
> + Users interested in good real-time scheduling should set this
> + to 'y'.
> +
> +config PRINTK_DEFER_IN_IRQ
> + bool "Defer writing printk output to console during interrupts"
> + default n
> + depends on PRINTK && PRINTK_SHORT_IRQ_DISABLE
> + help
> + Selecting this option will cause the kernel to hold off writing
> + printk messages to the console until interrupts are enabled.
> + On some systems, printk buffer and console handling can take
> + a long time. If done with interrupts off for the entire duration
> + of the operation, this results in long interrupt-off periods.
> +
> + With the default setting of 'n', the kernel attempts to
> + send printk messages to the console whether it is in interrupt
> + context or not.
> +
> + With a setting of 'y', when printk is called with interrupts
> + disabled, the messages are buffered in the kernel's
> + log buffer until later. If you are debugging the kernel
> + with printks in interrupt routines, you should set this
> + to 'n'. Users interested in good real-time scheduling
> + should set this to 'y'.
> +
> config ENABLE_MUST_CHECK
> bool "Enable __must_check logic"
> default y
>
>
Peter Zijlstra wrote:
> I suspect these features reduce the chance a crash messages makes it out
> onto the console, but fail to spot any of the copious text mention this
> critical issue.
Anything not in interrupts would not be deferred.
The following code is meant to avoid deferring in oops as well.
I'm not sure if that covers all crash cases or not.
If you know of other cases, and ways to detect them, please advise.
>> +#ifdef CONFIG_PRINTK_DEFER_IN_IRQ
>> + /*
>> + * If we're in an interrupt and it's not an emergency,
>> + * don't emit to console just yet.
>> + */
>> + if ((irqs_disabled() || in_interrupt()) && !oops_in_progress) {
>> + defer = 1;
>> + }
>> +#endif
I made the code conditional because printk messages are pretty
critical for debugging, and I wanted someone to be aware if they
altered the behavior in any way. I suspect that only in embedded,
when people are having realtime performance problems, would this
change be of interest.
Please recognize that I'm not submitting this for kernel inclusion
at this time. Please see the larger thread for context. That said,
I appreciate the feedback on the code.
-- Tim
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================
> On Thu, 24 Jan 2008 15:23:03 -0800 Tim Bird <[email protected]> wrote:
> Peter Zijlstra wrote:
> > I suspect these features reduce the chance a crash messages makes it out
> > onto the console, but fail to spot any of the copious text mention this
> > critical issue.
>
> Anything not in interrupts would not be deferred.
> The following code is meant to avoid deferring in oops as well.
> I'm not sure if that covers all crash cases or not.
> If you know of other cases, and ways to detect them, please advise.
>
Yes, using oops_in_progress seems an appropriate fix for that.
I assume that another downside is that printk-from-interrupt will now get
jumbled up with an interrupted pritk-from-non-interrupt.
Please use checkpatch.
Andrew Morton wrote:
>> On Thu, 24 Jan 2008 15:23:03 -0800 Tim Bird <[email protected]> wrote:
>> Peter Zijlstra wrote:
>>> I suspect these features reduce the chance a crash messages makes it out
>>> onto the console, but fail to spot any of the copious text mention this
>>> critical issue.
>> Anything not in interrupts would not be deferred.
>> The following code is meant to avoid deferring in oops as well.
>> I'm not sure if that covers all crash cases or not.
>> If you know of other cases, and ways to detect them, please advise.
>>
>
> Yes, using oops_in_progress seems an appropriate fix for that.
>
> I assume that another downside is that printk-from-interrupt will now get
> jumbled up with an interrupted pritk-from-non-interrupt.
No. The order of printk messages (whether printk'ed
from interrupt context or not) should not be affected.
This patch should only affect the timing of when the
messages are output to console.
Note that an oops will flush any buffered messages
ahead of the oops messages themselves.
Some messages, not in oops, that were printk'ed in
interrupt context, may still be in the log buffer
but not have made it to console if the machine hangs
hard. Because of this (IMO slight) downgrade in
debugging utility, I chose to make the changes
configurable.
> Please use checkpatch.
Will do from now on! Thanks!
-- Tim
=============================
Tim Bird
Architecture Group Chair, CE Linux Forum
Senior Staff Engineer, Sony Corporation of America
=============================