2008-03-24 12:39:30

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH 0/2] printk vs rq->lock and xtime lock

Hi Linus,

As to the regression reported by Marcin; what happens is that we invoke
printk() while holding the xtime lock for writing. printk() will call
wake_up_klogd() which tries to enqueue klogd on some rq.

The known deadlock here is calling printk() while holding rq->lock, which
would then try to recusively lock the rq again when trying to wake klogd.

The new deadlock is due to task enqueues setting an hrtimer, which requires
reading the time, which will result in a live-lock when the printk() call-
site is holding the xtime lock for writing.

Thomas would like to preserve the printk() information if possible, hence my
proposal of printk_nowakeup(). It will do everything printk() normally does,
except ommit to wake up of klogd. The call is explicitly not EXPORTed so that
its use is confined to core kernel code.

Marcin, could you please test these two patches to confirm they do indeed
solve your issue as well?

Peter



2008-03-24 14:32:38

by Marcin Ślusarz

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Mon, Mar 24, 2008 at 01:24:24PM +0100, Peter Zijlstra wrote:
> Hi Linus,
>
> As to the regression reported by Marcin; what happens is that we invoke
> printk() while holding the xtime lock for writing. printk() will call
> wake_up_klogd() which tries to enqueue klogd on some rq.
>
> The known deadlock here is calling printk() while holding rq->lock, which
> would then try to recusively lock the rq again when trying to wake klogd.
>
> The new deadlock is due to task enqueues setting an hrtimer, which requires
> reading the time, which will result in a live-lock when the printk() call-
> site is holding the xtime lock for writing.
>
> Thomas would like to preserve the printk() information if possible, hence my
> proposal of printk_nowakeup(). It will do everything printk() normally does,
> except ommit to wake up of klogd. The call is explicitly not EXPORTed so that
> its use is confined to core kernel code.
>
> Marcin, could you please test these two patches to confirm they do indeed
> solve your issue as well?

I've successfully tested these patches. Thanks.
Tested-by: Marcin Slusarz <[email protected]>

Marcin

2008-03-24 18:00:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Mon, 24 Mar 2008, Peter Zijlstra wrote:
>
> As to the regression reported by Marcin; what happens is that we invoke
> printk() while holding the xtime lock for writing. printk() will call
> wake_up_klogd() which tries to enqueue klogd on some rq.
>
> The known deadlock here is calling printk() while holding rq->lock, which
> would then try to recusively lock the rq again when trying to wake klogd.

Ok.

Right now, however, I think that for 2.6.25 I'll just remove the printk.

And for the long haul, I really don't think the solution is
"printk_nowakup()", because this is going to happen again when somebody
doesn't realize the code is called with the rq lock held, and it's going
to be a bitch to debug.

I just don't think this is maintainable.

Linus

2008-03-24 18:16:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Mon, 2008-03-24 at 10:58 -0700, Linus Torvalds wrote:
>
> On Mon, 24 Mar 2008, Peter Zijlstra wrote:
> >
> > As to the regression reported by Marcin; what happens is that we invoke
> > printk() while holding the xtime lock for writing. printk() will call
> > wake_up_klogd() which tries to enqueue klogd on some rq.
> >
> > The known deadlock here is calling printk() while holding rq->lock, which
> > would then try to recusively lock the rq again when trying to wake klogd.
>
> Ok.
>
> Right now, however, I think that for 2.6.25 I'll just remove the printk.
>
> And for the long haul, I really don't think the solution is
> "printk_nowakup()", because this is going to happen again when somebody
> doesn't realize the code is called with the rq lock held, and it's going
> to be a bitch to debug.

Yeah, we get the printk vs rq->lock thing on regular basis, the xtime
lock is new.

If the NMI watchdog works its rather easy to debug.

> I just don't think this is maintainable.

I'm afraid I'll have to agree.

How about I use the lockdep infrastructure to check if printk() is
invoked whole holding either xtime or rq lock, and then avoid calling
wake_up_klogd(). That way, we at least get sane debug output when the
lock debugging infrastructure is enabled?

As for removing the printk(), Thomas, do you see any other sane way to
relay that information?


2008-03-24 18:17:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Mon, 24 Mar 2008, Linus Torvalds wrote:
>
> Right now, however, I think that for 2.6.25 I'll just remove the printk.

So I just committed that patch, but I was also looking at just making
'printk()' itself more robust against this kind of thing.

Quite frankly, the printk() console semaphore etc handling is very
confusing, and it has obviously evolved over time. Here's a suggested
cleanup patch that doesn't actually *fix* anything, but it makes the
resulting code flow a heck of a lot more readable.

In particular, it would now easily and readably allow us to add other
rules inside the new "can_use_console()" that disable the console flushing
if (for example) the request queue lock is held, or xlock is held for
writing.

[ The patch adds more lines than it removes, but that is largely due to
the comments and the fact that it moves some of the code into new helper
functions, which invariably adds a few lines for the function definition
etc.

There are actually fewer lines of actual code, because it ends up using
the already-existing "try_acquire_console_sem()", and it avoids one ugly
special case, and removes some trivial code duplication about releasing
the logbuf_lock by reorganizing things a bit ]

Comments?

Linus
---
kernel/printk.c | 83 +++++++++++++++++++++++++++++++-----------------------
1 files changed, 48 insertions(+), 35 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 9adc2a4..c46a20a 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -616,6 +616,40 @@ asmlinkage int printk(const char *fmt, ...)
/* cpu currently holding logbuf_lock */
static volatile unsigned int printk_cpu = UINT_MAX;

+/*
+ * Can we actually use the console at this time on this cpu?
+ *
+ * Console drivers may assume that per-cpu resources have
+ * been allocated. So unless they're explicitly marked as
+ * being able to cope (CON_ANYTIME) don't call them until
+ * this CPU is officially up.
+ */
+static inline int can_use_console(unsigned int cpu)
+{
+ return cpu_online(cpu) || have_callable_console();
+}
+
+/*
+ * Try to get console ownership to actually show the kernel
+ * messages from a 'printk'. Return true (and with the
+ * console_semaphore held, and 'console_locked' set) if it
+ * is successful, false otherwise.
+ *
+ * This gets called with the 'logbuf_lock' spinlock held and
+ * interrupts disabled. It should return with 'lockbuf_lock'
+ * released but interrupts still disabled.
+ */
+static int acquire_console_semaphore_for_printk(unsigned int cpu)
+{
+ int retval = 0;
+
+ if (can_use_console(cpu))
+ retval = !try_acquire_console_sem();
+ printk_cpu = UINT_MAX;
+ spin_unlock(&logbuf_lock);
+ return retval;
+}
+
const char printk_recursion_bug_msg [] =
KERN_CRIT "BUG: recent printk recursion!\n";
static int printk_recursion_bug;
@@ -725,43 +759,22 @@ asmlinkage int vprintk(const char *fmt, va_list args)
log_level_unknown = 1;
}

- if (!down_trylock(&console_sem)) {
- /*
- * We own the drivers. We can drop the spinlock and
- * let release_console_sem() print the text, maybe ...
- */
- console_locked = 1;
- printk_cpu = UINT_MAX;
- spin_unlock(&logbuf_lock);
+ /*
+ * Try to acquire and then immediately release the
+ * console semaphore. The release will do all the
+ * actual magic (print out buffers, wake up klogd,
+ * etc).
+ *
+ * The acquire_console_semaphore_for_printk() function
+ * will release 'logbuf_lock' regardless of whether it
+ * actually gets the semaphore or not.
+ */
+ if (acquire_console_semaphore_for_printk(this_cpu))
+ release_console_sem();

- /*
- * Console drivers may assume that per-cpu resources have
- * been allocated. So unless they're explicitly marked as
- * being able to cope (CON_ANYTIME) don't call them until
- * this CPU is officially up.
- */
- if (cpu_online(smp_processor_id()) || have_callable_console()) {
- console_may_schedule = 0;
- release_console_sem();
- } else {
- /* Release by hand to avoid flushing the buffer. */
- console_locked = 0;
- up(&console_sem);
- }
- lockdep_on();
- raw_local_irq_restore(flags);
- } else {
- /*
- * Someone else owns the drivers. We drop the spinlock, which
- * allows the semaphore holder to proceed and to call the
- * console drivers with the output which we just produced.
- */
- printk_cpu = UINT_MAX;
- spin_unlock(&logbuf_lock);
- lockdep_on();
+ lockdep_on();
out_restore_irqs:
- raw_local_irq_restore(flags);
- }
+ raw_local_irq_restore(flags);

preempt_enable();
return printed_len;

2008-03-24 18:58:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Mon, 24 Mar 2008 19:15:47 +0100
Peter Zijlstra <[email protected]> wrote:

> How about I use the lockdep infrastructure to check if printk() is
> invoked whole holding either xtime or rq lock, and then avoid calling
> wake_up_klogd(). That way, we at least get sane debug output when the
> lock debugging infrastructure is enabled?

The core problem seems to be that printk shouldn't be calling wake_up().
Can we fix that?

I expect it would be acceptable to do it from the timer interrupt instead.
For NOHZ kernels a poll when we enter the idle loop would also be needed.

But does that cover everything? Is it possible for a CPU to run 100% busy
while not receiving timer interrupts? I guess so. To receive no
interrupts at all? Also possible.

2008-08-08 13:31:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Mon, 2008-03-24 at 11:57 -0700, Andrew Morton wrote:
> On Mon, 24 Mar 2008 19:15:47 +0100
> Peter Zijlstra <[email protected]> wrote:
>
> > How about I use the lockdep infrastructure to check if printk() is
> > invoked whole holding either xtime or rq lock, and then avoid calling
> > wake_up_klogd(). That way, we at least get sane debug output when the
> > lock debugging infrastructure is enabled?
>
> The core problem seems to be that printk shouldn't be calling wake_up().
> Can we fix that?
>
> I expect it would be acceptable to do it from the timer interrupt instead.
> For NOHZ kernels a poll when we enter the idle loop would also be needed.

Something along the lines of the below patch?

> But does that cover everything? Is it possible for a CPU to run 100% busy
> while not receiving timer interrupts? I guess so. To receive no
> interrupts at all? Also possible.

local_irq_disable(); while (1);

But I guess you have more pressing issues when that happens..

---
Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/kernel.h | 4 ++++
include/linux/seqlock.h | 5 +++++
kernel/printk.c | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
5 files changed, 57 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}

+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_locked(&sl->lock);
+}
+
static inline int write_tryseqlock(seqlock_t *sl)
{
int ret = spin_trylock(&sl->lock);
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/time.h>

#include <asm/uaccess.h>

@@ -982,12 +983,56 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+static int printk_pending;
+
+void __wake_up_klogd(void)
{
+ if (printk_pending)
+ printk_pending = 0;
+
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
}

+int printk_needs_cpu(int cpu)
+{
+ if (!printk_pending)
+ return 0;
+
+ /*
+ * Stop the last awake CPU from entering NOHZ state when there still
+ * is a klogd to kick.
+ */
+ return (cpu_weight(cpu_online_map) - cpus_weight(nohz_cpu_mask)) == 1;
+}
+
+void printk_tick(void)
+{
+ if (unlikely(printk_pending))
+ __wake_up_klogd();
+}
+
+static int printk_do_wakeup(void)
+{
+ if (irqs_disabled())
+ return 0;
+
+#ifdef CONFIG_HRTICK
+ if (seq_is_writelocked(&xtime_lock))
+ return 0;
+#endif
+
+ return 1;
+}
+
+void wake_up_klogd(void)
+{
+ if (printk_do_wakeup())
+ __wake_up_klogd();
+ else
+ printk_pending = 1;
+}
+
/**
* release_console_sem - unlock the console system
*
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
next_jiffies = get_next_timer_interrupt(last_jiffies);
delta_jiffies = next_jiffies - last_jiffies;

- if (rcu_needs_cpu(cpu))
+ if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
delta_jiffies = 1;
/*
* Do not stop the tick, if we are only one off
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -978,6 +978,7 @@ void update_process_times(int user_tick)
run_local_timers();
if (rcu_pending(cpu))
rcu_check_callbacks(cpu, user_tick);
+ printk_tick();
scheduler_tick();
run_posix_cpu_timers(p);
}

2008-08-08 13:47:25

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

/me tatoos on forehead: 'quilt refresh' before posting!!

On Fri, 2008-08-08 at 15:30 +0200, Peter Zijlstra wrote:
> On Mon, 2008-03-24 at 11:57 -0700, Andrew Morton wrote:
> > On Mon, 24 Mar 2008 19:15:47 +0100
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > How about I use the lockdep infrastructure to check if printk() is
> > > invoked whole holding either xtime or rq lock, and then avoid calling
> > > wake_up_klogd(). That way, we at least get sane debug output when the
> > > lock debugging infrastructure is enabled?
> >
> > The core problem seems to be that printk shouldn't be calling wake_up().
> > Can we fix that?
> >
> > I expect it would be acceptable to do it from the timer interrupt instead.
> > For NOHZ kernels a poll when we enter the idle loop would also be needed.
>
> Something along the lines of the below patch?
>
> > But does that cover everything? Is it possible for a CPU to run 100% busy
> > while not receiving timer interrupts? I guess so. To receive no
> > interrupts at all? Also possible.
>
> local_irq_disable(); while (1);
>
> But I guess you have more pressing issues when that happens..
>
> ---
Subject: printk: robustify printk wakeup behaviour

The klogd wakeup in the printk patch can cause deadlocks when holding the
rq->lock and or xtime_lock for writing.

Avoid doing the wakeup under certain conditions and delay it to the next jiffy
tick.

Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/kernel.h | 4 +++
include/linux/seqlock.h | 5 ++++
kernel/printk.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-
kernel/time/tick-sched.c | 2 -
kernel/timer.c | 1
5 files changed, 58 insertions(+), 2 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/include/linux/seqlock.h
===================================================================
--- linux-2.6.orig/include/linux/seqlock.h
+++ linux-2.6/include/linux/seqlock.h
@@ -71,6 +71,11 @@ static inline void write_sequnlock(seqlo
spin_unlock(&sl->lock);
}

+static inline int seq_is_writelocked(seqlock_t *sl)
+{
+ return spin_is_locked(&sl->lock);
+}
+
static inline int write_tryseqlock(seqlock_t *sl)
{
int ret = spin_trylock(&sl->lock);
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -32,6 +32,8 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/cpumask.h>
+#include <linux/sched.h>

#include <asm/uaccess.h>

@@ -982,12 +984,56 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+static int printk_pending;
+
+void __wake_up_klogd(void)
{
+ if (printk_pending)
+ printk_pending = 0;
+
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
}

+int printk_needs_cpu(int cpu)
+{
+ if (!printk_pending)
+ return 0;
+
+ /*
+ * Stop the last awake CPU from entering NOHZ state when there still
+ * is a klogd to kick.
+ */
+ return (cpus_weight(cpu_online_map) - cpus_weight(nohz_cpu_mask)) == 1;
+}
+
+void printk_tick(void)
+{
+ if (unlikely(printk_pending))
+ __wake_up_klogd();
+}
+
+static int printk_do_wakeup(void)
+{
+ if (irqs_disabled())
+ return 0;
+
+#ifdef CONFIG_HRTICK
+ if (seq_is_writelocked(&xtime_lock))
+ return 0;
+#endif
+
+ return 1;
+}
+
+void wake_up_klogd(void)
+{
+ if (printk_do_wakeup())
+ __wake_up_klogd();
+ else
+ printk_pending = 1;
+}
+
/**
* release_console_sem - unlock the console system
*
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
next_jiffies = get_next_timer_interrupt(last_jiffies);
delta_jiffies = next_jiffies - last_jiffies;

- if (rcu_needs_cpu(cpu))
+ if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
delta_jiffies = 1;
/*
* Do not stop the tick, if we are only one off
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -978,6 +978,7 @@ void update_process_times(int user_tick)
run_local_timers();
if (rcu_pending(cpu))
rcu_check_callbacks(cpu, user_tick);
+ printk_tick();
scheduler_tick();
run_posix_cpu_timers(p);
}

2008-08-08 16:43:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Fri, 8 Aug 2008, Peter Zijlstra wrote:
>
> Something along the lines of the below patch?

Could we not literally just make this a RCU event? Unconditionally too?

Linus

2008-08-08 17:11:10

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Fri, 2008-08-08 at 09:41 -0700, Linus Torvalds wrote:
>
> On Fri, 8 Aug 2008, Peter Zijlstra wrote:
> >
> > Something along the lines of the below patch?
>
> Could we not literally just make this a RCU event? Unconditionally too?

Sure, but the RCU callback period is at least 3 jiffies and much longer
when busy - I'm not sure how long before we force a grace period, we do
that to avoid DoS, right Paul?

So this version would have a much higher risk of overflowing the console
buffer and making klogd miss bits. Then again, I really don't care about
klogd at _all_, I've been running with the wakeup patched out for ages.


Gah, the below doesn't boot - because I guess we start using rcu before
its properly set up.. should I poke at it more?

---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..3d80e30 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,38 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+void __wake_up_klogd(struct rcu_head *head);
+
+static struct {
+ struct rcu_head head;
+ spinlock_t lock;
+ int pending;
+} klogd_wakeup_state = {
+ .lock = __SPIN_LOCK_UNLOCKED(klogd_wakeup_state.lock),
+};
+
+void __wake_up_klogd(struct rcu_head *head)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ BUG_ON(!klogd_wakeup_state.pending);
if (!oops_in_progress && waitqueue_active(&log_wait))
wake_up_interruptible(&log_wait);
+ klogd_wakeup_state.pending = 0;
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
+}
+
+void wake_up_klogd(void)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&klogd_wakeup_state.lock, flags);
+ if (!klogd_wakeup_state.pending) {
+ call_rcu(&klogd_wakeup_state.head, __wake_up_klogd);
+ klogd_wakeup_state.pending = 1;
+ }
+ spin_unlock_irqrestore(&klogd_wakeup_state.lock, flags);
}

/**

2008-08-08 17:26:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Fri, 8 Aug 2008, Peter Zijlstra wrote:
>
> Sure, but the RCU callback period is at least 3 jiffies and much longer
> when busy - I'm not sure how long before we force a grace period, we do
> that to avoid DoS, right Paul?

I really don't think it matters. klogd is going to write the thing to
_disk_ (or network), and three jiffies really don't matter. If we can fill
the buffer in that kind of time, we're screwed for other reasons anyway.

> So this version would have a much higher risk of overflowing the console
> buffer and making klogd miss bits. Then again, I really don't care about
> klogd at _all_, I've been running with the wakeup patched out for ages.

Well, I'd care a _bit_ about klogd, but not enough to worry about a couple
of jiffies. We want to wake it up at some point, but...

> Gah, the below doesn't boot - because I guess we start using rcu before
> its properly set up.. should I poke at it more?

I'd certainly prefer this kind of approach. However, may I suggest:

- doing the "waitqueue_active(&log_wait)" before even bothering to do the
RCU call. That, btw, will automatically mean that we wouldn't ever call
the RCU code before anything is initialized.

- get rid of the "oops_in_progress" thing, since I think the whole point
of that was to avoid getting the lock recursively in the first place.

- I'd worry about the "spin_lock_irqsave(&klogd_wakeup_state.lock)". What
if the printk happens from call_rcu()? This is exactly what we're
trying to get away from - having some parts of the kernel not able to
printk() because of subtle locking issues.

For that last thing, maybe we can just make it a percpu thing and just
disable irq's?

Linus

2008-08-08 17:41:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock

On Fri, 2008-08-08 at 10:25 -0700, Linus Torvalds wrote:
>
> On Fri, 8 Aug 2008, Peter Zijlstra wrote:
> >
> > Sure, but the RCU callback period is at least 3 jiffies and much longer
> > when busy - I'm not sure how long before we force a grace period, we do
> > that to avoid DoS, right Paul?
>
> I really don't think it matters. klogd is going to write the thing to
> _disk_ (or network), and three jiffies really don't matter. If we can fill
> the buffer in that kind of time, we're screwed for other reasons anyway.
>
> > So this version would have a much higher risk of overflowing the console
> > buffer and making klogd miss bits. Then again, I really don't care about
> > klogd at _all_, I've been running with the wakeup patched out for ages.
>
> Well, I'd care a _bit_ about klogd, but not enough to worry about a couple
> of jiffies. We want to wake it up at some point, but...
>
> > Gah, the below doesn't boot - because I guess we start using rcu before
> > its properly set up.. should I poke at it more?
>
> I'd certainly prefer this kind of approach. However, may I suggest:
>
> - doing the "waitqueue_active(&log_wait)" before even bothering to do the
> RCU call. That, btw, will automatically mean that we wouldn't ever call
> the RCU code before anything is initialized.
>
> - get rid of the "oops_in_progress" thing, since I think the whole point
> of that was to avoid getting the lock recursively in the first place.
>
> - I'd worry about the "spin_lock_irqsave(&klogd_wakeup_state.lock)". What
> if the printk happens from call_rcu()? This is exactly what we're
> trying to get away from - having some parts of the kernel not able to
> printk() because of subtle locking issues.
>
> For that last thing, maybe we can just make it a percpu thing and just
> disable irq's?

You're _so_ right! :-)

This is much prettier and boots to boot.


Signed-off-by: Peter Zijlstra <[email protected]>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..10830d8 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,43 @@ int is_console_locked(void)
return console_locked;
}

+void __wake_up_klogd(struct rcu_head *head);
+
+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ unsigned long flags;
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ local_irq_save(flags);
+ BUG_ON(!kws->pending);
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+ local_irq_restore(flags);
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ call_rcu(&kws->head, __wake_up_klogd);
+ kws->pending = 1;
+ }
+ local_irq_restore(flags);
}

/**

2008-08-08 17:49:35

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Fri, 8 Aug 2008, Peter Zijlstra wrote:
> +
> + local_irq_save(flags);
> + kws = &__get_cpu_var(kws);
> + if (!kws->pending) {
> + call_rcu(&kws->head, __wake_up_klogd);
> + kws->pending = 1;

In case we have a recursive oops in the RCU paths, I'd think we'd be
*much* better off with

if (!kws->pending) {
kws->pending = 1;
call_rcu(&kws->head, __wake_up_klogd);
}

in order to not re-enter the RCU paths again and causing endless oopses
along with list corruption.

Hmm?

Oh, and I don't see the point of the irqsave/restore in __wake_up_klogd().
Or the BUG_ON(), for that matter (even if it ever were to trigger, we'd be
better off _not_ triggering it - I agree with it in a "verifying the new
implementation works at all" sense, but not in a "this is actually for
merging" kind of patch).

Linus

2008-08-08 17:52:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/2] printk vs rq->lock and xtime lock



On Fri, 8 Aug 2008, Peter Zijlstra wrote:
> >
> > For that last thing, maybe we can just make it a percpu thing and just
> > disable irq's?
>
> You're _so_ right! :-)
>
> This is much prettier and boots to boot.
>
>
> Signed-off-by: Peter Zijlstra <[email protected]>

If this makes any difference:

Acked-by: Steven Rostedt <[email protected]>

-- Steve

2008-08-08 18:15:56

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
the klogd wakeup.

Signed-off-by: Peter Zijlstra <[email protected]>
Acked-by: Steven Rostedt <[email protected]>
---
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..8870c2e 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -32,6 +32,7 @@
#include <linux/security.h>
#include <linux/bootmem.h>
#include <linux/syscalls.h>
+#include <linux/rcupdate.h>

#include <asm/uaccess.h>

@@ -982,10 +983,37 @@ int is_console_locked(void)
return console_locked;
}

+struct klogd_wakeup_state {
+ struct rcu_head head;
+ int pending;
+};
+
+DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
+
+void __wake_up_klogd(struct rcu_head *head)
+{
+ struct klogd_wakeup_state *kws =
+ container_of(head, struct klogd_wakeup_state, head);
+
+ wake_up_interruptible(&log_wait);
+ kws->pending = 0;
+}
+
void wake_up_klogd(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
- wake_up_interruptible(&log_wait);
+ unsigned long flags;
+ struct klogd_wakeup_state *kws;
+
+ if (!waitqueue_active(&log_wait))
+ return;
+
+ local_irq_save(flags);
+ kws = &__get_cpu_var(kws);
+ if (!kws->pending) {
+ kws->pending = 1;
+ call_rcu(&kws->head, __wake_up_klogd);
+ }
+ local_irq_restore(flags);
}

/**

2008-08-08 18:31:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk



On Fri, 8 Aug 2008, Peter Zijlstra wrote:
>
> Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
> the klogd wakeup.

Ok, I can't see anything wrong with this from the patch. I do hate that
deadlock, so I think I'll apply it. But I'll try to hold off for a while
in case somebody has any last-minute comments..

Linus

2008-08-08 18:34:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 2008-08-08 at 11:30 -0700, Linus Torvalds wrote:
>
> On Fri, 8 Aug 2008, Peter Zijlstra wrote:
> >
> > Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
> > the klogd wakeup.
>
> Ok, I can't see anything wrong with this from the patch. I do hate that
> deadlock, so I think I'll apply it. But I'll try to hold off for a while
> in case somebody has any last-minute comments..

Hehe, we could do the boring conventional thing and stick it in -mm and
-next and such :-)

2008-08-08 19:16:05

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 08 Aug 2008 20:14:28 +0200
Peter Zijlstra <[email protected]> wrote:

> void wake_up_klogd(void)
> {
> - if (!oops_in_progress && waitqueue_active(&log_wait))
> - wake_up_interruptible(&log_wait);
> + unsigned long flags;
> + struct klogd_wakeup_state *kws;
> +
> + if (!waitqueue_active(&log_wait))
> + return;
> +
> + local_irq_save(flags);
> + kws = &__get_cpu_var(kws);
> + if (!kws->pending) {
> + kws->pending = 1;
> + call_rcu(&kws->head, __wake_up_klogd);
> + }
> + local_irq_restore(flags);
> }

Note that kernel/rcupreempt.c's flavour of call_rcu() takes
RCU_DATA_ME().lock, so there are still code sites from which a printk
can deadlock. Only now, it is config-dependent.

>From a quick look it appears that large amounts of kernel/rcupreempt.c
are now a printk-free zone.

2008-08-08 19:21:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 2008-08-08 at 12:14 -0700, Andrew Morton wrote:
> On Fri, 08 Aug 2008 20:14:28 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > void wake_up_klogd(void)
> > {
> > - if (!oops_in_progress && waitqueue_active(&log_wait))
> > - wake_up_interruptible(&log_wait);
> > + unsigned long flags;
> > + struct klogd_wakeup_state *kws;
> > +
> > + if (!waitqueue_active(&log_wait))
> > + return;
> > +
> > + local_irq_save(flags);
> > + kws = &__get_cpu_var(kws);
> > + if (!kws->pending) {
> > + kws->pending = 1;
> > + call_rcu(&kws->head, __wake_up_klogd);
> > + }
> > + local_irq_restore(flags);
> > }
>
> Note that kernel/rcupreempt.c's flavour of call_rcu() takes
> RCU_DATA_ME().lock, so there are still code sites from which a printk
> can deadlock. Only now, it is config-dependent.
>
> From a quick look it appears that large amounts of kernel/rcupreempt.c
> are now a printk-free zone.

Drad, missed that bit, I did look at the calling end, but forgot the
call_rcu() end :-/

The initial printk_tick() based implementation didn't suffer this
problem, should we revert to that scheme?

2008-08-08 19:39:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 08 Aug 2008 21:21:08 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, 2008-08-08 at 12:14 -0700, Andrew Morton wrote:
> > On Fri, 08 Aug 2008 20:14:28 +0200
> > Peter Zijlstra <[email protected]> wrote:
> >
> > > void wake_up_klogd(void)
> > > {
> > > - if (!oops_in_progress && waitqueue_active(&log_wait))
> > > - wake_up_interruptible(&log_wait);
> > > + unsigned long flags;
> > > + struct klogd_wakeup_state *kws;
> > > +
> > > + if (!waitqueue_active(&log_wait))
> > > + return;
> > > +
> > > + local_irq_save(flags);
> > > + kws = &__get_cpu_var(kws);
> > > + if (!kws->pending) {
> > > + kws->pending = 1;
> > > + call_rcu(&kws->head, __wake_up_klogd);
> > > + }
> > > + local_irq_restore(flags);
> > > }
> >
> > Note that kernel/rcupreempt.c's flavour of call_rcu() takes
> > RCU_DATA_ME().lock, so there are still code sites from which a printk
> > can deadlock. Only now, it is config-dependent.
> >
> > From a quick look it appears that large amounts of kernel/rcupreempt.c
> > are now a printk-free zone.
>
> Drad, missed that bit, I did look at the calling end, but forgot the
> call_rcu() end :-/
>
> The initial printk_tick() based implementation didn't suffer this
> problem, should we revert to that scheme?

Dunno. Perhaps we could convert RCU_DATA_ME's spinlock_t into an
rwlock and do read_lock() in call_rcu()? Then we can should be able to
call printk from inside that read_lock(), but not inside write_lock(),
which, with suitable warning comments might be acceptable.

afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
local_irq_save(). rcu_ctrlblk.completed and rcu_flipped need some
protection, but a) rdp->lock isn't sufficient anyway and b)
read_lock protection would suffice. Maybe other CPUs can alter *rdp
while __rcu_advance_callbacks() is running.

Anyway, that's all handwaving. My point is that making rcupreempt.c
more robust and more concurrent might be an alternative fix, and might
be beneficial in its own right. Working out the details is what we
have Pauls for ;)

2008-08-08 19:47:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:

> The initial printk_tick() based implementation didn't suffer this
> problem, should we revert to that scheme?

Just in case people care..

---
Subject: printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <[email protected]>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)

Index: linux-2.6/include/linux/kernel.h
===================================================================
--- linux-2.6.orig/include/linux/kernel.h
+++ linux-2.6/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_rat
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void)
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
Index: linux-2.6/kernel/printk.c
===================================================================
--- linux-2.6.orig/kernel/printk.c
+++ linux-2.6/kernel/printk.c
@@ -982,10 +982,25 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+static DEFINE_PER_CPU(int, printk_pending);
+
+void printk_tick(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
+ if (__get_cpu_var(printk_pending)) {
+ __get_cpu_var(printk_pending) = 0;
wake_up_interruptible(&log_wait);
+ }
+}
+
+int printk_needs_cpu(int cpu)
+{
+ return per_cpu(printk_pending, cpu);
+}
+
+void wake_up_klogd(void)
+{
+ if (waitqueue_active(&log_wait))
+ __get_cpu_var(printk_pending) = 1;
}

/**
Index: linux-2.6/kernel/time/tick-sched.c
===================================================================
--- linux-2.6.orig/kernel/time/tick-sched.c
+++ linux-2.6/kernel/time/tick-sched.c
@@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
next_jiffies = get_next_timer_interrupt(last_jiffies);
delta_jiffies = next_jiffies - last_jiffies;

- if (rcu_needs_cpu(cpu))
+ if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
delta_jiffies = 1;
/*
* Do not stop the tick, if we are only one off
Index: linux-2.6/kernel/timer.c
===================================================================
--- linux-2.6.orig/kernel/timer.c
+++ linux-2.6/kernel/timer.c
@@ -978,6 +978,7 @@ void update_process_times(int user_tick)
run_local_timers();
if (rcu_pending(cpu))
rcu_check_callbacks(cpu, user_tick);
+ printk_tick();
scheduler_tick();
run_posix_cpu_timers(p);
}

2008-08-08 19:49:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 2008-08-08 at 12:37 -0700, Andrew Morton wrote:
> On Fri, 08 Aug 2008 21:21:08 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2008-08-08 at 12:14 -0700, Andrew Morton wrote:
> > > On Fri, 08 Aug 2008 20:14:28 +0200
> > > Peter Zijlstra <[email protected]> wrote:
> > >
> > > > void wake_up_klogd(void)
> > > > {
> > > > - if (!oops_in_progress && waitqueue_active(&log_wait))
> > > > - wake_up_interruptible(&log_wait);
> > > > + unsigned long flags;
> > > > + struct klogd_wakeup_state *kws;
> > > > +
> > > > + if (!waitqueue_active(&log_wait))
> > > > + return;
> > > > +
> > > > + local_irq_save(flags);
> > > > + kws = &__get_cpu_var(kws);
> > > > + if (!kws->pending) {
> > > > + kws->pending = 1;
> > > > + call_rcu(&kws->head, __wake_up_klogd);
> > > > + }
> > > > + local_irq_restore(flags);
> > > > }
> > >
> > > Note that kernel/rcupreempt.c's flavour of call_rcu() takes
> > > RCU_DATA_ME().lock, so there are still code sites from which a printk
> > > can deadlock. Only now, it is config-dependent.
> > >
> > > From a quick look it appears that large amounts of kernel/rcupreempt.c
> > > are now a printk-free zone.
> >
> > Drad, missed that bit, I did look at the calling end, but forgot the
> > call_rcu() end :-/
> >
> > The initial printk_tick() based implementation didn't suffer this
> > problem, should we revert to that scheme?
>
> Dunno. Perhaps we could convert RCU_DATA_ME's spinlock_t into an
> rwlock and do read_lock() in call_rcu()? Then we can should be able to
> call printk from inside that read_lock(), but not inside write_lock(),
> which, with suitable warning comments might be acceptable.
>
> afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
> local_irq_save(). rcu_ctrlblk.completed and rcu_flipped need some
> protection, but a) rdp->lock isn't sufficient anyway and b)
> read_lock protection would suffice. Maybe other CPUs can alter *rdp
> while __rcu_advance_callbacks() is running.
>
> Anyway, that's all handwaving. My point is that making rcupreempt.c
> more robust and more concurrent might be an alternative fix, and might
> be beneficial in its own right. Working out the details is what we
> have Pauls for ;)

Hehe :-)

Just in case Paul shows he's human and cannot work his way around it, I
just posted a new version of the printk_tick() stuff..

2008-08-08 20:20:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, Aug 08, 2008 at 08:14:28PM +0200, Peter Zijlstra wrote:
> Avoid deadlocks against rq->lock and xtime_lock by using RCU to defer
> the klogd wakeup.

Longer term, I will have one of the people here (who just happens to have
made the mistake of expressing an interest in hacking on RCU) look into
permitting invocation of RCU earlier during boot, as in before rcu_init()
has been called. Should be possible, in fact should be possible without
common-case performance penalty. ;-)

In the meantime, looks good, so:

Acked-by: Paul E. McKenney <[email protected]>

> Signed-off-by: Peter Zijlstra <[email protected]>
> Acked-by: Steven Rostedt <[email protected]>
> ---
> diff --git a/kernel/printk.c b/kernel/printk.c
> index b51b156..8870c2e 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -32,6 +32,7 @@
> #include <linux/security.h>
> #include <linux/bootmem.h>
> #include <linux/syscalls.h>
> +#include <linux/rcupdate.h>
>
> #include <asm/uaccess.h>
>
> @@ -982,10 +983,37 @@ int is_console_locked(void)
> return console_locked;
> }
>
> +struct klogd_wakeup_state {
> + struct rcu_head head;
> + int pending;
> +};
> +
> +DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
> +
> +void __wake_up_klogd(struct rcu_head *head)
> +{
> + struct klogd_wakeup_state *kws =
> + container_of(head, struct klogd_wakeup_state, head);
> +
> + wake_up_interruptible(&log_wait);
> + kws->pending = 0;
> +}
> +
> void wake_up_klogd(void)
> {
> - if (!oops_in_progress && waitqueue_active(&log_wait))
> - wake_up_interruptible(&log_wait);
> + unsigned long flags;
> + struct klogd_wakeup_state *kws;
> +
> + if (!waitqueue_active(&log_wait))
> + return;
> +
> + local_irq_save(flags);
> + kws = &__get_cpu_var(kws);
> + if (!kws->pending) {
> + kws->pending = 1;
> + call_rcu(&kws->head, __wake_up_klogd);
> + }
> + local_irq_restore(flags);
> }
>
> /**
>
>

2008-08-08 20:30:59

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, Aug 08, 2008 at 12:14:28PM -0700, Andrew Morton wrote:
> On Fri, 08 Aug 2008 20:14:28 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > void wake_up_klogd(void)
> > {
> > - if (!oops_in_progress && waitqueue_active(&log_wait))
> > - wake_up_interruptible(&log_wait);
> > + unsigned long flags;
> > + struct klogd_wakeup_state *kws;
> > +
> > + if (!waitqueue_active(&log_wait))
> > + return;
> > +
> > + local_irq_save(flags);
> > + kws = &__get_cpu_var(kws);
> > + if (!kws->pending) {
> > + kws->pending = 1;
> > + call_rcu(&kws->head, __wake_up_klogd);
> > + }
> > + local_irq_restore(flags);
> > }
>
> Note that kernel/rcupreempt.c's flavour of call_rcu() takes
> RCU_DATA_ME().lock, so there are still code sites from which a printk
> can deadlock. Only now, it is config-dependent.

The RCU callbacks are (and must be) invoked without holding any RCU
locks, so printk()s in RCU callbacks are perfectly permissible. It would
not be -that- hard to eliminate the lock in call_rcu(), but yeccch...
It would be far easier to debug RCU without the benefit printk()s than
to deal with a preemptable RCU implementation that was based solely on
atomic instructions!!!

> >From a quick look it appears that large amounts of kernel/rcupreempt.c
> are now a printk-free zone.

The only printk() is in preemptable RCU's rcu_init() function, whose
only purpose is to mark the dmesg appropriately, and which should not
be a problem.

And that would be because I do a fair amount of prototyping and debugging
in user space, where one instead uses printf(). ;-)

Thanx, Paul

2008-08-08 20:33:19

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, Aug 08, 2008 at 12:37:47PM -0700, Andrew Morton wrote:
> On Fri, 08 Aug 2008 21:21:08 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2008-08-08 at 12:14 -0700, Andrew Morton wrote:
> > > On Fri, 08 Aug 2008 20:14:28 +0200
> > > Peter Zijlstra <[email protected]> wrote:
> > >
> > > > void wake_up_klogd(void)
> > > > {
> > > > - if (!oops_in_progress && waitqueue_active(&log_wait))
> > > > - wake_up_interruptible(&log_wait);
> > > > + unsigned long flags;
> > > > + struct klogd_wakeup_state *kws;
> > > > +
> > > > + if (!waitqueue_active(&log_wait))
> > > > + return;
> > > > +
> > > > + local_irq_save(flags);
> > > > + kws = &__get_cpu_var(kws);
> > > > + if (!kws->pending) {
> > > > + kws->pending = 1;
> > > > + call_rcu(&kws->head, __wake_up_klogd);
> > > > + }
> > > > + local_irq_restore(flags);
> > > > }
> > >
> > > Note that kernel/rcupreempt.c's flavour of call_rcu() takes
> > > RCU_DATA_ME().lock, so there are still code sites from which a printk
> > > can deadlock. Only now, it is config-dependent.
> > >
> > > From a quick look it appears that large amounts of kernel/rcupreempt.c
> > > are now a printk-free zone.
> >
> > Drad, missed that bit, I did look at the calling end, but forgot the
> > call_rcu() end :-/
> >
> > The initial printk_tick() based implementation didn't suffer this
> > problem, should we revert to that scheme?
>
> Dunno. Perhaps we could convert RCU_DATA_ME's spinlock_t into an
> rwlock and do read_lock() in call_rcu()? Then we can should be able to
> call printk from inside that read_lock(), but not inside write_lock(),
> which, with suitable warning comments might be acceptable.
>
> afacit everything in call_rcu()'s *rdp is cpu-local and is protected by
> local_irq_save(). rcu_ctrlblk.completed and rcu_flipped need some
> protection, but a) rdp->lock isn't sufficient anyway and b)
> read_lock protection would suffice. Maybe other CPUs can alter *rdp
> while __rcu_advance_callbacks() is running.
>
> Anyway, that's all handwaving. My point is that making rcupreempt.c
> more robust and more concurrent might be an alternative fix, and might
> be beneficial in its own right. Working out the details is what we
> have Pauls for ;)

How about if I instead add comments warning people not to put printk()
in the relevant RCU-implementation code? That way I can be not only lazy,
but cowardly as well! ;-)

Thanx, Paul

2008-08-08 20:38:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 2008-08-08 at 13:32 -0700, Paul E. McKenney wrote:

> How about if I instead add comments warning people not to put printk()
> in the relevant RCU-implementation code? That way I can be not only lazy,
> but cowardly as well! ;-)

Well, it does spoil the party a little, we are now trading one deadlock
for another :-/


2008-08-08 20:50:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk



On Fri, 8 Aug 2008, Peter Zijlstra wrote:

> On Fri, 2008-08-08 at 13:32 -0700, Paul E. McKenney wrote:
>
> > How about if I instead add comments warning people not to put printk()
> > in the relevant RCU-implementation code? That way I can be not only lazy,
> > but cowardly as well! ;-)
>
> Well, it does spoil the party a little, we are now trading one deadlock
> for another :-/

Yeah, but only Paul and his clones would need printks in preempt RCU.

The rest of us mortals now have the rest of the kernel to play with.

-- Steve

2008-08-08 20:52:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 08 Aug 2008 22:37:39 +0200
Peter Zijlstra <[email protected]> wrote:

> On Fri, 2008-08-08 at 13:32 -0700, Paul E. McKenney wrote:
>
> > How about if I instead add comments warning people not to put printk()
> > in the relevant RCU-implementation code? That way I can be not only lazy,
> > but cowardly as well! ;-)
>
> Well, it does spoil the party a little, we are now trading one deadlock
> for another :-/
>

Yup. We could just as easily put those comments into the scheduler
wakeup code.

Why are we fixing this, btw? The problem has been there forever and
people who hack the wakeup code could/should know about it anyway. All
they need to do is to disable klogd during development. Did the
problem recently become worse for some reason?

2008-08-08 21:03:50

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk



On Fri, 8 Aug 2008, Andrew Morton wrote:
>
> Why are we fixing this, btw? The problem has been there forever and
> people who hack the wakeup code could/should know about it anyway. All
> they need to do is to disable klogd during development. Did the
> problem recently become worse for some reason?

It hasn't beemn there forever at all.

Yes, there used to be reliance on the actual _scheduler_ locks. Doign a
wake_up() would cause runqueue locks etc to be taken.

But the xtime deadlock is fairly recent, and only happened with CFQ, I
think.

And _that_ is the irritating one. I personally wouldn't mind at all if
there is some printk() dependency on the core runqueue rq->lock or on the
RCU locking thing. But look at xtime_lock. THAT is a disaster.

Just grep for it.

So I personally actually like the RCU version best. Yes, it still depends
on really core locking. But it's really core and low-level and _confined_
locking, where a comment in a single place would probably suffice. Compare
that to all the places where we take the xtime_lock for writing!

Linus

2008-08-08 21:14:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, 8 Aug 2008 13:57:26 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Fri, 8 Aug 2008, Andrew Morton wrote:
> >
> > Why are we fixing this, btw? The problem has been there forever and
> > people who hack the wakeup code could/should know about it anyway. All
> > they need to do is to disable klogd during development. Did the
> > problem recently become worse for some reason?
>
> It hasn't beemn there forever at all.
>
> Yes, there used to be reliance on the actual _scheduler_ locks. Doign a
> wake_up() would cause runqueue locks etc to be taken.
>
> But the xtime deadlock is fairly recent, and only happened with CFQ, I
> think.
>
> And _that_ is the irritating one. I personally wouldn't mind at all if
> there is some printk() dependency on the core runqueue rq->lock or on the
> RCU locking thing. But look at xtime_lock. THAT is a disaster.
>
> Just grep for it.
>
>

<actually reads stuff>

Yes, not being able to do printk inside xtime_lock would be a disaster.
We decided that about 1.5 years ago last time we added then fixed this
bug (at the time I think I identified multiple already-present printks
inside xtime_lock, on error paths). Did we go and re-add this bug recently
or did we just never fix it? Doesn't matter, I guess.

> So I personally actually like the RCU version best. Yes, it still depends
> on really core locking. But it's really core and low-level and _confined_
> locking, where a comment in a single place would probably suffice. Compare
> that to all the places where we take the xtime_lock for writing!

Sure, the cant-printk-in-rcupreempt.c limitation should be quite
acceptable.

2008-08-08 21:35:34

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

Peter Zijlstra <[email protected]> writes:

Hopefully nobody puts a printk into call_rcu now...
Also I'm not sure if there wouldn't be livelocks if someone puts
printk into some other parts of RCU (e.g. each RCU adding more
RCU events -- haven't checked that in detail so I might be wrong)

So you just trade one "hands off" part to another.


> + kws = &__get_cpu_var(kws);
> + if (!kws->pending) {
> + kws->pending = 1;
> + call_rcu(&kws->head, __wake_up_klogd);


How about a big comment describing why you do this?
And you should add comments to all the new hands-off parts
where printk is not allowed now.

-Andi

2008-08-08 23:03:08

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

From: Peter Zijlstra <[email protected]>
Date: Fri, 08 Aug 2008 20:14:28 +0200

> +struct klogd_wakeup_state {
> + struct rcu_head head;
> + int pending;
> +};
> +
> +DEFINE_PER_CPU(struct klogd_wakeup_state, kws);

Since there is not RCU_HEAD_INIT() made on these per-cpu objects, this code
is depending upon the fact that currently the RCU head is just initialized
to all zeros.

2008-08-09 00:18:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Fri, Aug 08, 2008 at 04:02:54PM -0700, David Miller wrote:
> From: Peter Zijlstra <[email protected]>
> Date: Fri, 08 Aug 2008 20:14:28 +0200
>
> > +struct klogd_wakeup_state {
> > + struct rcu_head head;
> > + int pending;
> > +};
> > +
> > +DEFINE_PER_CPU(struct klogd_wakeup_state, kws);
>
> Since there is not RCU_HEAD_INIT() made on these per-cpu objects, this code
> is depending upon the fact that currently the RCU head is just initialized
> to all zeros.

RCU_HEAD_INIT() is only needed for debugging purposes, when you want
to be sure that any garbage in the rcu_head is due to corruption or
some such. The call_rcu() function always completely initializes the
rcu_head structure.

Thanx, Paul

2008-08-11 10:45:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


* Peter Zijlstra <[email protected]> wrote:

> On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:
>
> > The initial printk_tick() based implementation didn't suffer this
> > problem, should we revert to that scheme?
>
> Just in case people care..
>
> ---
> Subject: printk: robustify printk
>
> Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> wakeup by polling from the timer tick.

i missed most of the discussion, but this seems like the simplest (and
hence ultimately the best) approach to me.

Coupling printk with RCU, albeit elegant, does not seem like the right
choice to me in this specific case: printk as an essential debug
mechanism should be as decoupled as possible.

Also, once we accept the possibility of async klogd completion, we might
as well do it all the time.

i have only one sidenote:

> --- linux-2.6.orig/kernel/time/tick-sched.c
> +++ linux-2.6/kernel/time/tick-sched.c
> @@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
> next_jiffies = get_next_timer_interrupt(last_jiffies);
> delta_jiffies = next_jiffies - last_jiffies;
>
> - if (rcu_needs_cpu(cpu))
> + if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
> delta_jiffies = 1;

this change made a previous design quirks even more visible: these are
items that are not purely event driven but need some polling component.
RCU is one, and now printk is another.

We could clean this up further by integrating the rcu_needs_cpu() and
printk_needs_cpu() into a softirq mechanism. We already check for
pending softirqs in tick-sched.c, so the above complication would go
away completely.

( But that's for a separate cleanup patch i think. )

No strong feelings though. Peter, which one do you prefer?

Ingo

2008-08-11 11:03:53

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

Ingo Molnar <[email protected]> writes:

> * Peter Zijlstra <[email protected]> wrote:
>
>> On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:
>>
>> > The initial printk_tick() based implementation didn't suffer this
>> > problem, should we revert to that scheme?
>>
>> Just in case people care..
>>
>> ---
>> Subject: printk: robustify printk
>>
>> Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
>> wakeup by polling from the timer tick.
>
> i missed most of the discussion, but this seems like the simplest (and
> hence ultimately the best) approach to me.

The problem is that it means any printk data output that is more
than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose data.
It loses the natural adaption to higher printk rates that you
got previously.

Now we could say that for debugging etc. people should switch
to other mechanisms like relayfs, but I would still worry about
some corner cases where losing printk data that wasn't lost before
could be a severe regression (e.g. consider firewall log messages
or similar)

Essentially it makes printk (much?) less reliable than it was before
in the general case. Not sure that's a good thing. So the patch
title is definitely misleading.

As Linus pointed out earlier we've survived with this restriction
(not doing printk in the scheduler) for a long time, so is there
really a that pressing need to change that?

> Coupling printk with RCU, albeit elegant, does not seem like the right
> choice to me in this specific case: printk as an essential debug
> mechanism should be as decoupled as possible.

RCU coupling has actually the same problem.

-Andi

2008-08-11 11:04:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, 2008-08-11 at 12:45 +0200, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:
> >
> > > The initial printk_tick() based implementation didn't suffer this
> > > problem, should we revert to that scheme?
> >
> > Just in case people care..
> >
> > ---
> > Subject: printk: robustify printk
> >
> > Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> > wakeup by polling from the timer tick.
>
> i missed most of the discussion, but this seems like the simplest (and
> hence ultimately the best) approach to me.
>
> Coupling printk with RCU, albeit elegant, does not seem like the right
> choice to me in this specific case: printk as an essential debug
> mechanism should be as decoupled as possible.
>
> Also, once we accept the possibility of async klogd completion, we might
> as well do it all the time.
>
> i have only one sidenote:
>
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
> > next_jiffies = get_next_timer_interrupt(last_jiffies);
> > delta_jiffies = next_jiffies - last_jiffies;
> >
> > - if (rcu_needs_cpu(cpu))
> > + if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
> > delta_jiffies = 1;
>
> this change made a previous design quirks even more visible: these are
> items that are not purely event driven but need some polling component.
> RCU is one, and now printk is another.
>
> We could clean this up further by integrating the rcu_needs_cpu() and
> printk_needs_cpu() into a softirq mechanism. We already check for
> pending softirqs in tick-sched.c, so the above complication would go
> away completely.

RCU depends on the polling to advance the state machine, if you want an
event driven state machine, you'd have to drive it from
rcu_read_unlock() adding overhead there - and I'm pretty sure you don't
want to do that.

So while its a tad ugly to poll for these states, I'm not too worried in
these two cases - of course every additional poll needs good
justification.

> ( But that's for a separate cleanup patch i think. )
>
> No strong feelings though. Peter, which one do you prefer?

I personally prefer this printk_tick() driven one over the RCU driven
one because it doesn't trade deadlocks.

2008-08-11 11:22:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, 2008-08-11 at 13:03 +0200, Andi Kleen wrote:
> Ingo Molnar <[email protected]> writes:
>
> > * Peter Zijlstra <[email protected]> wrote:
> >
> >> On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:
> >>
> >> > The initial printk_tick() based implementation didn't suffer this
> >> > problem, should we revert to that scheme?
> >>
> >> Just in case people care..
> >>
> >> ---
> >> Subject: printk: robustify printk
> >>
> >> Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> >> wakeup by polling from the timer tick.
> >
> > i missed most of the discussion, but this seems like the simplest (and
> > hence ultimately the best) approach to me.
>
> The problem is that it means any printk data output that is more
> than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose data.
> It loses the natural adaption to higher printk rates that you
> got previously.
>
> Now we could say that for debugging etc. people should switch
> to other mechanisms like relayfs, but I would still worry about
> some corner cases where losing printk data that wasn't lost before
> could be a severe regression (e.g. consider firewall log messages
> or similar)

You only loose the msgs with klogd, console still gets everything. If
firewalls are generating that much data, perhaps its time to think about
alternative ways to channel that.

> Essentially it makes printk (much?) less reliable than it was before
> in the general case. Not sure that's a good thing. So the patch
> title is definitely misleading.

Depends, I don't give a rats arse about klogd - I get everything through
serial onto another machine.

> As Linus pointed out earlier we've survived with this restriction
> (not doing printk in the scheduler) for a long time, so is there
> really a that pressing need to change that?

Why not fix it if its acceptable - the deadlock is just ugly.

2008-08-11 11:41:47

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, Aug 11, 2008 at 01:22:06PM +0200, Peter Zijlstra wrote:
> You only loose the msgs with klogd, console still gets everything. If
> firewalls are generating that much data, perhaps its time to think about
> alternative ways to channel that.

Yes, and netfilter has them in fact, but it's clearly still a regression for
people who rely on klogd for this today.

Also firewall is just an example. Other cases might be people relying
on these selinux messages. Or some other kernel messages.

>
> > Essentially it makes printk (much?) less reliable than it was before
> > in the general case. Not sure that's a good thing. So the patch
> > title is definitely misleading.
>
> Depends, I don't give a rats arse about klogd - I get everything through
> serial onto another machine.

The question of interest is not how you personally configure your systems,
but what the userbase uses.

> > As Linus pointed out earlier we've survived with this restriction
> > (not doing printk in the scheduler) for a long time, so is there
> > really a that pressing need to change that?
>
> Why not fix it if its acceptable - the deadlock is just ugly.

Well you fix one thing and you break another thing (high rate
printk). It's not clear to me that the trade off is a good
one in this case. I suppose far more people care about
high rate printk than the number of people who put
printk into the scheduler.

-Andi

2008-08-11 11:51:43

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


* Peter Zijlstra <[email protected]> wrote:

> > We could clean this up further by integrating the rcu_needs_cpu()
> > and printk_needs_cpu() into a softirq mechanism. We already check
> > for pending softirqs in tick-sched.c, so the above complication
> > would go away completely.
>
> RCU depends on the polling to advance the state machine, if you want
> an event driven state machine, you'd have to drive it from
> rcu_read_unlock() adding overhead there - and I'm pretty sure you
> don't want to do that.
>
> So while its a tad ugly to poll for these states, I'm not too worried
> in these two cases - of course every additional poll needs good
> justification.

ok - i guess we can declare stop_tick a slowpath as well (we are on the
way towards idling).

> > ( But that's for a separate cleanup patch i think. )
> >
> > No strong feelings though. Peter, which one do you prefer?
>
> I personally prefer this printk_tick() driven one over the RCU driven
> one because it doesn't trade deadlocks.

i've started testing it in tip/core/printk to give it some track record
- thanks Peter.

Linus, any strong opinion against Peter's patch below (or in favor of
the RCU patch)? Tentatively for v2.6.28 i'd say, to give people time to
object to the async behavior.

Ingo

----------------->
>From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <[email protected]>
Date: Fri, 8 Aug 2008 21:47:09 +0200
Subject: [PATCH] printk: robustify printk

Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
wakeup by polling from the timer tick.

Signed-off-by: Peter Zijlstra <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/kernel.h | 4 ++++
kernel/printk.c | 19 +++++++++++++++++--
kernel/time/tick-sched.c | 2 +-
kernel/timer.c | 1 +
4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index aaa998f..113ac8d 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -200,6 +200,8 @@ extern struct ratelimit_state printk_ratelimit_state;
extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
+extern void printk_tick(void);
+extern int printk_needs_cpu(int);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -211,6 +213,8 @@ static inline int printk_ratelimit(void) { return 0; }
static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
+static inline void printk_tick(void) { }
+static inline int printk_needs_cpu(int) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))
diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..655cc2c 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -982,10 +982,25 @@ int is_console_locked(void)
return console_locked;
}

-void wake_up_klogd(void)
+static DEFINE_PER_CPU(int, printk_pending);
+
+void printk_tick(void)
{
- if (!oops_in_progress && waitqueue_active(&log_wait))
+ if (__get_cpu_var(printk_pending)) {
+ __get_cpu_var(printk_pending) = 0;
wake_up_interruptible(&log_wait);
+ }
+}
+
+int printk_needs_cpu(int cpu)
+{
+ return per_cpu(printk_pending, cpu);
+}
+
+void wake_up_klogd(void)
+{
+ if (waitqueue_active(&log_wait))
+ __get_cpu_var(printk_pending) = 1;
}

/**
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 825b4c0..c13d4f1 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidle)
next_jiffies = get_next_timer_interrupt(last_jiffies);
delta_jiffies = next_jiffies - last_jiffies;

- if (rcu_needs_cpu(cpu))
+ if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
delta_jiffies = 1;
/*
* Do not stop the tick, if we are only one off
diff --git a/kernel/timer.c b/kernel/timer.c
index 03bc7f1..510fe69 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -978,6 +978,7 @@ void update_process_times(int user_tick)
run_local_timers();
if (rcu_pending(cpu))
rcu_check_callbacks(cpu, user_tick);
+ printk_tick();
scheduler_tick();
run_posix_cpu_timers(p);
}

2008-08-11 12:03:36

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


* Peter Zijlstra <[email protected]> wrote:

> > The problem is that it means any printk data output that is more
> > than DMESG-BUFFER-SIZE bytes during one clock tick is going to lose
> > data. It loses the natural adaption to higher printk rates that you
> > got previously.
> >
> > Now we could say that for debugging etc. people should switch to
> > other mechanisms like relayfs, but I would still worry about some
> > corner cases where losing printk data that wasn't lost before could
> > be a severe regression (e.g. consider firewall log messages or
> > similar)

That's a rather misleading argument, because klogd as it stands today is
already lossy: you can overflow it with enough printk data. (It's rather
easy to trigger it as well, so nobody sane relies on it as a reliable
channel.)

Using relayfs and dynamic buffers would be a much worse approach because
if we are out of buffer space it would be a robustness disaster to start
allocating more RAM. (we want less coupling of printk to other kernel
subsystems, not more coupling) Preallocating wont help either because it
doesnt protect against sudden spikes.

> You only loose the msgs with klogd, console still gets everything. If
> firewalls are generating that much data, perhaps its time to think
> about alternative ways to channel that.

yeah, and note that klogd as it stands today is a lossy channel no
matter what - there's nothing that keeps a really verbose kernel from
flooding the buffer.

The issue Andi raises is largely independent of this change and there's
three clean options to deal with it:

- increase buffering (already possible, see CONFIG_LOG_BUF_SHIFT)

- create a more reliable klogd channel with a dynamically
shrinking/growing buffer (i very much doubt the complexity is
worth it, and this one wont be 100% loss-less either)

- use a reliable console

#1 is what distros use to maximize the yield of user bugreports, and #3
is what everyone who ultimately cares about massive printk output and
reliable logging uses.

Ingo

2008-08-11 12:13:27

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

> That's a rather misleading argument, because klogd as it stands today is
> already lossy: you can overflow it with enough printk data. (It's rather

That's true, but with at least the timer tick version it would become much
worse I think, because there would be a rather low upper limit on
the maximum frequency of klogd wakeups.

Not fully sure about the RCU version, i guess it would depend on
how many CPUs you have (but that one had the problem of
"cannot put printks into RCU anymore")

BTW if you only care about the console you can always use early_printk()
at least for serial/vga/usb debug on x86 -- it will never lose anything (at
least as long as you can read the output fast enough) and also
works in nearly any context. But of course since it also doesn't
log anything to klogd it's probably not a good idea outside debugging.

-Andi

2008-08-11 12:37:50

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


* Ingo Molnar <[email protected]> wrote:

> > I personally prefer this printk_tick() driven one over the RCU
> > driven one because it doesn't trade deadlocks.
>
> i've started testing it in tip/core/printk to give it some track
> record - thanks Peter.

small build fixlet below.

Ingo

-------------->
>From 99fc8ce41d639ce46f32c4e8618e8a571858e5aa Mon Sep 17 00:00:00 2001
From: Ingo Molnar <[email protected]>
Date: Mon, 11 Aug 2008 14:38:12 +0200
Subject: [PATCH] printk: robustify printk, fix
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

fix:

include/linux/kernel.h: In function ‘printk_needs_cpu':
include/linux/kernel.h:217: error: parameter name omitted

Signed-off-by: Ingo Molnar <[email protected]>
---
include/linux/kernel.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index c9ed24c..a012579 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,7 +201,7 @@ extern int printk_ratelimit(void);
extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
unsigned int interval_msec);
extern void printk_tick(void);
-extern int printk_needs_cpu(int);
+extern int printk_needs_cpu(int cpu);
#else
static inline int vprintk(const char *s, va_list args)
__attribute__ ((format (printf, 1, 0)));
@@ -214,7 +214,7 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies, \
unsigned int interval_msec) \
{ return false; }
static inline void printk_tick(void) { }
-static inline int printk_needs_cpu(int) { return 0; }
+static inline int printk_needs_cpu(int cpu) { return 0; }
#endif

extern void asmlinkage __attribute__((format(printf, 1, 2)))

2008-08-11 13:22:53

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, Aug 11, 2008 at 12:45:26PM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Fri, 2008-08-08 at 21:21 +0200, Peter Zijlstra wrote:
> >
> > > The initial printk_tick() based implementation didn't suffer this
> > > problem, should we revert to that scheme?
> >
> > Just in case people care..
> >
> > ---
> > Subject: printk: robustify printk
> >
> > Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> > wakeup by polling from the timer tick.
>
> i missed most of the discussion, but this seems like the simplest (and
> hence ultimately the best) approach to me.
>
> Coupling printk with RCU, albeit elegant, does not seem like the right
> choice to me in this specific case: printk as an essential debug
> mechanism should be as decoupled as possible.
>
> Also, once we accept the possibility of async klogd completion, we might
> as well do it all the time.
>
> i have only one sidenote:
>
> > --- linux-2.6.orig/kernel/time/tick-sched.c
> > +++ linux-2.6/kernel/time/tick-sched.c
> > @@ -255,7 +255,7 @@ void tick_nohz_stop_sched_tick(int inidl
> > next_jiffies = get_next_timer_interrupt(last_jiffies);
> > delta_jiffies = next_jiffies - last_jiffies;
> >
> > - if (rcu_needs_cpu(cpu))
> > + if (rcu_needs_cpu(cpu) || printk_needs_cpu(cpu))
> > delta_jiffies = 1;
>
> this change made a previous design quirks even more visible: these are
> items that are not purely event driven but need some polling component.
> RCU is one, and now printk is another.
>
> We could clean this up further by integrating the rcu_needs_cpu() and
> printk_needs_cpu() into a softirq mechanism. We already check for
> pending softirqs in tick-sched.c, so the above complication would go
> away completely.

I am missing something here. Are you suggesting that RCU call out
when a given CPU has nothing to do, rather than the current behavior
where rcu_needs_cpu() is invoked when a CPU is being considered for
dynticks idle mode? My concern with this approach would be races that
are currently avoided by the fact that calls to rcu_needs_cpu() are
performed with hardirqs disabled.

Thanx, Paul

> ( But that's for a separate cleanup patch i think. )
>
> No strong feelings though. Peter, which one do you prefer?
>
> Ingo

2008-08-11 14:17:30

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, 11 Aug 2008 13:42:43 +0200, Andi Kleen said:
> On Mon, Aug 11, 2008 at 01:22:06PM +0200, Peter Zijlstra wrote:
> > You only loose the msgs with klogd, console still gets everything. If
> > firewalls are generating that much data, perhaps its time to think about
> > alternative ways to channel that.
>
> Yes, and netfilter has them in fact, but it's clearly still a regression for
> people who rely on klogd for this today.
>
> Also firewall is just an example. Other cases might be people relying
> on these selinux messages. Or some other kernel messages.

Similar to "netfilter has other ways", the SELinux messages have another
defined way as well. If your site is high-security and *depends* on the
messages, the *right* answer is to run auditd to catch and log the messages,
and configure auditd to bring the system to a screeching halt if the
log space fills up or there's other problems (yes, some sites would rather
have the machine drop dead than possibly miss an SELinux message).


Attachments:
(No filename) (226.00 B)

2008-08-11 14:28:09

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

$INSERT-ANY-OTHER-PRINTK-USE$

The point was that printk may have been perfectly adequate for
them with its existing buffer sizes and dynamic wake up frequency,
and might not anymore with the timer wakeup change. Essentially
it is making a widely used kernel facility more fragile.

-Andi

2008-08-11 14:55:52

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


On Mon, 11 Aug 2008, Andi Kleen wrote:

> $INSERT-ANY-OTHER-PRINTK-USE$
>
> The point was that printk may have been perfectly adequate for
> them with its existing buffer sizes and dynamic wake up frequency,
> and might not anymore with the timer wakeup change. Essentially
> it is making a widely used kernel facility more fragile.

I've measured several latencies in the kernel that would cause klogd to
take serveral jiffies to wake up. I highly doubt that adding one jiffy
will break anything. And if it did, then it would show a bug in their
system. If adding one jiffy causes lost data, then the system
administrators are relying on a utility (klogd) that can easily fail them
without these patches.


If we are making a highly fragile setup a bit more fragile, maybe it will
wake the IT guys up to change their setup.

-- Steve

2008-08-11 16:10:17

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, Aug 11, 2008 at 01:04:00PM +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-11 at 12:45 +0200, Ingo Molnar wrote:
> > ( But that's for a separate cleanup patch i think. )
> >
> > No strong feelings though. Peter, which one do you prefer?
>
> I personally prefer this printk_tick() driven one over the RCU driven
> one because it doesn't trade deadlocks.

One way to break the deadlock within the RCU subsystem would be something
similar to the following untested (known not to compile) patch. The
idea is that RCU detects that call_rcu() is being called from printk(), and
simply enqueues the callback in this case. For this to really work, RCU
needs something exported from printk() to allow it to make this
decision. I chose the static variable printk_cpu below just to present
the general idea.

Thoughts?

Buggy, but otherwise:

Signed-off-by: Paul E. McKenney <[email protected]>
---

rcupreempt.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcupreempt.c linux-2.6.27-rc1-printk/kernel/rcupreempt.c
--- linux-2.6.27-rc1/kernel/rcupreempt.c 2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-printk/kernel/rcupreempt.c 2008-08-11 09:02:03.000000000 -0700
@@ -1118,17 +1118,22 @@ void call_rcu(struct rcu_head *head, voi
{
unsigned long flags;
struct rcu_data *rdp;
+ int this_cpu = smp_processor_id();

head->func = func;
head->next = NULL;
local_irq_save(flags);
rdp = RCU_DATA_ME();
- spin_lock(&rdp->lock);
- __rcu_advance_callbacks(rdp);
+ if (this_cpu == printk_cpu) {
+ spin_lock(&rdp->lock);
+ __rcu_advance_callbacks(rdp);
+ }
*rdp->nexttail = head;
rdp->nexttail = &head->next;
RCU_TRACE_RDP(rcupreempt_trace_next_add, rdp);
- spin_unlock_irqrestore(&rdp->lock, flags);
+ if (this_cpu == printk_cpu) {
+ spin_unlock_irqrestore(&rdp->lock, flags);
+ }
}
EXPORT_SYMBOL_GPL(call_rcu);

2008-08-20 12:42:42

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Mon, 11 Aug 2008, Ingo Molnar wrote:

> ----------------->
> >From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
> From: Peter Zijlstra <[email protected]>
> Date: Fri, 8 Aug 2008 21:47:09 +0200
> Subject: [PATCH] printk: robustify printk
>
> Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> wakeup by polling from the timer tick.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>

Then I think we also want the patch below on top of that, right?


From: Jiri Kosina <[email protected]>
printk: robustify printk, fix #3

Remove the comment describing the possibility of printk() deadlocking on
runqueue lock.

Signed-off-by: Jiri Kosina <[email protected]>

---

kernel/printk.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index b51b156..9fbf391 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -577,9 +577,6 @@ static int have_callable_console(void)
* @fmt: format string
*
* This is printk(). It can be called from any context. We want it to work.
- * Be aware of the fact that if oops_in_progress is not set, we might try to
- * wake klogd up which could deadlock on runqueue lock if printk() is called
- * from scheduler code.
*
* We try to grab the console_sem. If we succeed, it's easy - we log the output and
* call the console drivers. If we fail to get the semaphore we place the output

2008-08-20 12:43:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk

On Wed, 2008-08-20 at 14:40 +0200, Jiri Kosina wrote:
> On Mon, 11 Aug 2008, Ingo Molnar wrote:
>
> > ----------------->
> > >From b845b517b5e3706a3729f6ea83b88ab85f0725b0 Mon Sep 17 00:00:00 2001
> > From: Peter Zijlstra <[email protected]>
> > Date: Fri, 8 Aug 2008 21:47:09 +0200
> > Subject: [PATCH] printk: robustify printk
> >
> > Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
> > wakeup by polling from the timer tick.
> >
> > Signed-off-by: Peter Zijlstra <[email protected]>
> > Signed-off-by: Ingo Molnar <[email protected]>
>
> Then I think we also want the patch below on top of that, right?

Indeed, Thanks!

>
> From: Jiri Kosina <[email protected]>
> printk: robustify printk, fix #3
>
> Remove the comment describing the possibility of printk() deadlocking on
> runqueue lock.
>
> Signed-off-by: Jiri Kosina <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
> ---
>
> kernel/printk.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/printk.c b/kernel/printk.c
> index b51b156..9fbf391 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -577,9 +577,6 @@ static int have_callable_console(void)
> * @fmt: format string
> *
> * This is printk(). It can be called from any context. We want it to work.
> - * Be aware of the fact that if oops_in_progress is not set, we might try to
> - * wake klogd up which could deadlock on runqueue lock if printk() is called
> - * from scheduler code.
> *
> * We try to grab the console_sem. If we succeed, it's easy - we log the output and
> * call the console drivers. If we fail to get the semaphore we place the output

2008-08-20 13:41:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: robustify printk


* Peter Zijlstra <[email protected]> wrote:

> > Remove the comment describing the possibility of printk() deadlocking on
> > runqueue lock.
> >
> > Signed-off-by: Jiri Kosina <[email protected]>
> Acked-by: Peter Zijlstra <[email protected]>

applied to tip/core/printk, thanks.

Ingo