2014-04-23 16:55:02

by Rik van Riel

[permalink] [raw]
Subject: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

Echoing values into /proc/sysrq-trigger seems to be a popular way to
get information out of the kernel. However, dumping information about
thousands of processes, or hundreds of CPUs to serial console can
result in IRQs being blocked for minutes, resulting in various kinds
of cascade failures.

The most common failure is due to interrupts being blocked for a very
long time. This can lead to things like failed IO requests, and other
things the system cannot easily recover from.

This problem is easily fixable by making __handle_sysrq use RCU
instead of spin_lock_irqsave.

This leaves the warning that RCU grace periods have not elapsed for a
long time, but the system will come back from that automatically.

It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
but that is probably desired since people want that to work in situations
where the system is already hosed.

The callers of register_sysrq_key and unregister_sysrq_key appear to be
capable of sleeping.

Signed-off-by: Rik van Riel <[email protected]>
Reported-by: Madper Xie <[email protected]>
---
drivers/tty/sysrq.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index ce396ec..3c61e9b 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -510,9 +510,8 @@ void __handle_sysrq(int key, bool check_mask)
struct sysrq_key_op *op_p;
int orig_log_level;
int i;
- unsigned long flags;

- spin_lock_irqsave(&sysrq_key_table_lock, flags);
+ rcu_read_lock();
/*
* Raise the apparent loglevel to maximum so that the sysrq header
* is shown to provide the user with positive feedback. We do not
@@ -554,7 +553,7 @@ void __handle_sysrq(int key, bool check_mask)
printk("\n");
console_loglevel = orig_log_level;
}
- spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
+ rcu_read_unlock();
}

void handle_sysrq(int key)
@@ -1043,16 +1042,23 @@ static int __sysrq_swap_key_ops(int key, struct sysrq_key_op *insert_op_p,
struct sysrq_key_op *remove_op_p)
{
int retval;
- unsigned long flags;

- spin_lock_irqsave(&sysrq_key_table_lock, flags);
+ spin_lock(&sysrq_key_table_lock);
if (__sysrq_get_key_op(key) == remove_op_p) {
__sysrq_put_key_op(key, insert_op_p);
retval = 0;
} else {
retval = -1;
}
- spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
+ spin_unlock(&sysrq_key_table_lock);
+
+ /*
+ * A concurrent __handle_sysrq eitehr got the old op or the new op.
+ * Wait for it to go away before returning, so the code for an old
+ * op is not freed (eg. on module unload) while it is in use.
+ */
+ synchronize_rcu();
+
return retval;
}


2014-04-23 20:04:57

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014 12:53:52 -0400 Rik van Riel <[email protected]> wrote:

> Echoing values into /proc/sysrq-trigger seems to be a popular way to
> get information out of the kernel. However, dumping information about
> thousands of processes, or hundreds of CPUs to serial console can
> result in IRQs being blocked for minutes, resulting in various kinds
> of cascade failures.
>
> The most common failure is due to interrupts being blocked for a very
> long time. This can lead to things like failed IO requests, and other
> things the system cannot easily recover from.

I bet nobody wants that console output anyway. You do the sysrq then
run dmesg or look in /var/log/messages to see what happened. People
who are experiencing problems such as this should run `dmesg -n 1'
before writing to sysrq-trigger.

Maybe what we can do is to arrange for the sysrq-trigger output to not
go to the console at all. Add /proc/sysrq-trigger-ng or add a sysrq
command to [un]silence the console or something.

> This problem is easily fixable by making __handle_sysrq use RCU
> instead of spin_lock_irqsave.
>
> This leaves the warning that RCU grace periods have not elapsed for a
> long time, but the system will come back from that automatically.
>
> It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
> but that is probably desired since people want that to work in situations
> where the system is already hosed.
>
> The callers of register_sysrq_key and unregister_sysrq_key appear to be
> capable of sleeping.

unregister_sysrq_key() is basically never used - a couple of scruffy
drivers during rmmod. We hardly need any locking in there at all. I
guess using simple RCU is better than just removing it though.

> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -510,9 +510,8 @@ void __handle_sysrq(int key, bool check_mask)
> struct sysrq_key_op *op_p;
> int orig_log_level;
> int i;
> - unsigned long flags;
>
> - spin_lock_irqsave(&sysrq_key_table_lock, flags);
> + rcu_read_lock();
> /*
> * Raise the apparent loglevel to maximum so that the sysrq header
> * is shown to provide the user with positive feedback. We do not
> @@ -554,7 +553,7 @@ void __handle_sysrq(int key, bool check_mask)
> printk("\n");
> console_loglevel = orig_log_level;
> }
> - spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
> + rcu_read_unlock();
> }
>
> void handle_sysrq(int key)
> @@ -1043,16 +1042,23 @@ static int __sysrq_swap_key_ops(int key, struct sysrq_key_op *insert_op_p,
> struct sysrq_key_op *remove_op_p)
> {
> int retval;
> - unsigned long flags;
>
> - spin_lock_irqsave(&sysrq_key_table_lock, flags);
> + spin_lock(&sysrq_key_table_lock);
> if (__sysrq_get_key_op(key) == remove_op_p) {
> __sysrq_put_key_op(key, insert_op_p);
> retval = 0;
> } else {
> retval = -1;
> }
> - spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
> + spin_unlock(&sysrq_key_table_lock);
> +
> + /*
> + * A concurrent __handle_sysrq eitehr got the old op or the new op.

yuo cnat spel

> + * Wait for it to go away before returning, so the code for an old
> + * op is not freed (eg. on module unload) while it is in use.
> + */
> + synchronize_rcu();
> +
> return retval;
> }

2014-04-23 20:44:38

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On 04/23/2014 04:04 PM, Andrew Morton wrote:
> On Wed, 23 Apr 2014 12:53:52 -0400 Rik van Riel <[email protected]> wrote:
>
>> Echoing values into /proc/sysrq-trigger seems to be a popular way to
>> get information out of the kernel. However, dumping information about
>> thousands of processes, or hundreds of CPUs to serial console can
>> result in IRQs being blocked for minutes, resulting in various kinds
>> of cascade failures.
>>
>> The most common failure is due to interrupts being blocked for a very
>> long time. This can lead to things like failed IO requests, and other
>> things the system cannot easily recover from.
>
> I bet nobody wants that console output anyway. You do the sysrq then
> run dmesg or look in /var/log/messages to see what happened. People
> who are experiencing problems such as this should run `dmesg -n 1'
> before writing to sysrq-trigger.

I'm not sure about that. I know of a few hundred QA people who
gather the bulk of their logs through serial console, and they
do appear interested in sysrq output :)

>> It also leaves sysrq-from-irq-context when the sysrq keys are pressed,
>> but that is probably desired since people want that to work in situations
>> where the system is already hosed.
>>
>> The callers of register_sysrq_key and unregister_sysrq_key appear to be
>> capable of sleeping.
>
> unregister_sysrq_key() is basically never used - a couple of scruffy
> drivers during rmmod. We hardly need any locking in there at all. I
> guess using simple RCU is better than just removing it though.

Yeah, I went with the "solve the easy 90%" aspect with this
patch. I am not convinced that we want to complicate the
sysrq code to better support a fringe use case, but if we
can fix the big without increasing the code maintenance
burden in the future, why not?

>> --- a/drivers/tty/sysrq.c
>> +++ b/drivers/tty/sysrq.c
>> @@ -510,9 +510,8 @@ void __handle_sysrq(int key, bool check_mask)
>> struct sysrq_key_op *op_p;
>> int orig_log_level;
>> int i;
>> - unsigned long flags;
>>
>> - spin_lock_irqsave(&sysrq_key_table_lock, flags);
>> + rcu_read_lock();
>> /*
>> * Raise the apparent loglevel to maximum so that the sysrq header
>> * is shown to provide the user with positive feedback. We do not
>> @@ -554,7 +553,7 @@ void __handle_sysrq(int key, bool check_mask)
>> printk("\n");
>> console_loglevel = orig_log_level;
>> }
>> - spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
>> + rcu_read_unlock();
>> }
>>
>> void handle_sysrq(int key)
>> @@ -1043,16 +1042,23 @@ static int __sysrq_swap_key_ops(int key, struct sysrq_key_op *insert_op_p,
>> struct sysrq_key_op *remove_op_p)
>> {
>> int retval;
>> - unsigned long flags;
>>
>> - spin_lock_irqsave(&sysrq_key_table_lock, flags);
>> + spin_lock(&sysrq_key_table_lock);
>> if (__sysrq_get_key_op(key) == remove_op_p) {
>> __sysrq_put_key_op(key, insert_op_p);
>> retval = 0;
>> } else {
>> retval = -1;
>> }
>> - spin_unlock_irqrestore(&sysrq_key_table_lock, flags);
>> + spin_unlock(&sysrq_key_table_lock);
>> +
>> + /*
>> + * A concurrent __handle_sysrq eitehr got the old op or the new op.
>
> yuo cnat spel

I can fix that for version 2, assuming people are interested it a v2 :)

>> + * Wait for it to go away before returning, so the code for an old
>> + * op is not freed (eg. on module unload) while it is in use.
>> + */
>> + synchronize_rcu();
>> +
>> return retval;
>> }
>


--
All rights reversed

2014-04-23 21:38:01

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014, Rik van Riel wrote:

> Echoing values into /proc/sysrq-trigger seems to be a popular way to
> get information out of the kernel. However, dumping information about
> thousands of processes, or hundreds of CPUs to serial console can
> result in IRQs being blocked for minutes, resulting in various kinds
> of cascade failures.
>
> The most common failure is due to interrupts being blocked for a very
> long time. This can lead to things like failed IO requests, and other
> things the system cannot easily recover from.
>
> This problem is easily fixable by making __handle_sysrq use RCU
> instead of spin_lock_irqsave.
>
> This leaves the warning that RCU grace periods have not elapsed for a
> long time, but the system will come back from that automatically.

This, however, will make RCU stall detector to send NMI to all online CPUs
so that they can dump their stacks.

IOW, this might actually make the whole sysrq dump last for much longer,
and have the log polluted with all-CPU dumps for no good reason.

I wonder whether explicitly setting rcu_cpu_stall_suppress during sysrq
handling might be a viable workaround for this.

--
Jiri Kosina
SUSE Labs

2014-04-23 21:39:45

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014, Andrew Morton wrote:

> > Echoing values into /proc/sysrq-trigger seems to be a popular way to
> > get information out of the kernel. However, dumping information about
> > thousands of processes, or hundreds of CPUs to serial console can
> > result in IRQs being blocked for minutes, resulting in various kinds
> > of cascade failures.
> >
> > The most common failure is due to interrupts being blocked for a very
> > long time. This can lead to things like failed IO requests, and other
> > things the system cannot easily recover from.
>
> I bet nobody wants that console output anyway. You do the sysrq then
> run dmesg or look in /var/log/messages to see what happened. People
> who are experiencing problems such as this should run `dmesg -n 1'
> before writing to sysrq-trigger.

I don't agree. I have used sysrq-t multiple times in situations where
userspace was already dead, but sysrq was still able to provide valuable
information about the state of the kernel.

--
Jiri Kosina
SUSE Labs

2014-04-23 21:41:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014 23:39:43 +0200 (CEST) Jiri Kosina <[email protected]> wrote:

> On Wed, 23 Apr 2014, Andrew Morton wrote:
>
> > > Echoing values into /proc/sysrq-trigger seems to be a popular way to
> > > get information out of the kernel. However, dumping information about
> > > thousands of processes, or hundreds of CPUs to serial console can
> > > result in IRQs being blocked for minutes, resulting in various kinds
> > > of cascade failures.
> > >
> > > The most common failure is due to interrupts being blocked for a very
> > > long time. This can lead to things like failed IO requests, and other
> > > things the system cannot easily recover from.
> >
> > I bet nobody wants that console output anyway. You do the sysrq then
> > run dmesg or look in /var/log/messages to see what happened. People
> > who are experiencing problems such as this should run `dmesg -n 1'
> > before writing to sysrq-trigger.
>
> I don't agree. I have used sysrq-t multiple times in situations where
> userspace was already dead, but sysrq was still able to provide valuable
> information about the state of the kernel.
>

I'm talking about /proc/sysrq-trigger, not the magic key combo.

2014-04-23 21:42:42

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On 04/23/2014 05:37 PM, Jiri Kosina wrote:
> On Wed, 23 Apr 2014, Rik van Riel wrote:
>
>> Echoing values into /proc/sysrq-trigger seems to be a popular way to
>> get information out of the kernel. However, dumping information about
>> thousands of processes, or hundreds of CPUs to serial console can
>> result in IRQs being blocked for minutes, resulting in various kinds
>> of cascade failures.
>>
>> The most common failure is due to interrupts being blocked for a very
>> long time. This can lead to things like failed IO requests, and other
>> things the system cannot easily recover from.
>>
>> This problem is easily fixable by making __handle_sysrq use RCU
>> instead of spin_lock_irqsave.
>>
>> This leaves the warning that RCU grace periods have not elapsed for a
>> long time, but the system will come back from that automatically.
>
> This, however, will make RCU stall detector to send NMI to all online CPUs
> so that they can dump their stacks.

It already does that, since several of the longer-running
sysrq handlers already grab rcu_read_lock(), for example
show_state().

> IOW, this might actually make the whole sysrq dump last for much longer,
> and have the log polluted with all-CPU dumps for no good reason.
>
> I wonder whether explicitly setting rcu_cpu_stall_suppress during sysrq
> handling might be a viable workaround for this.

I suppose that would do the trick.

--
All rights reversed

2014-04-23 21:44:50

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014, Andrew Morton wrote:

> > > > Echoing values into /proc/sysrq-trigger seems to be a popular way to
> > > > get information out of the kernel. However, dumping information about
> > > > thousands of processes, or hundreds of CPUs to serial console can
> > > > result in IRQs being blocked for minutes, resulting in various kinds
> > > > of cascade failures.
> > > >
> > > > The most common failure is due to interrupts being blocked for a very
> > > > long time. This can lead to things like failed IO requests, and other
> > > > things the system cannot easily recover from.
> > >
> > > I bet nobody wants that console output anyway. You do the sysrq then
> > > run dmesg or look in /var/log/messages to see what happened. People
> > > who are experiencing problems such as this should run `dmesg -n 1'
> > > before writing to sysrq-trigger.
> >
> > I don't agree. I have used sysrq-t multiple times in situations where
> > userspace was already dead, but sysrq was still able to provide valuable
> > information about the state of the kernel.
> >
>
> I'm talking about /proc/sysrq-trigger, not the magic key combo.

At the end of the day, that reaches the same __handle_sysrq() codepath,
no?

--
Jiri Kosina
SUSE Labs

2014-04-23 21:49:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014 23:44:47 +0200 (CEST) Jiri Kosina <[email protected]> wrote:

> On Wed, 23 Apr 2014, Andrew Morton wrote:
>
> > > > > Echoing values into /proc/sysrq-trigger seems to be a popular way to
> > > > > get information out of the kernel. However, dumping information about
> > > > > thousands of processes, or hundreds of CPUs to serial console can
> > > > > result in IRQs being blocked for minutes, resulting in various kinds
> > > > > of cascade failures.
> > > > >
> > > > > The most common failure is due to interrupts being blocked for a very
> > > > > long time. This can lead to things like failed IO requests, and other
> > > > > things the system cannot easily recover from.
> > > >
> > > > I bet nobody wants that console output anyway. You do the sysrq then
> > > > run dmesg or look in /var/log/messages to see what happened. People
> > > > who are experiencing problems such as this should run `dmesg -n 1'
> > > > before writing to sysrq-trigger.
> > >
> > > I don't agree. I have used sysrq-t multiple times in situations where
> > > userspace was already dead, but sysrq was still able to provide valuable
> > > information about the state of the kernel.
> > >
> >
> > I'm talking about /proc/sysrq-trigger, not the magic key combo.
>
> At the end of the day, that reaches the same __handle_sysrq() codepath,
> no?

What I'm proposing is that we provide a way in which console output may be
suppressed during /proc/sysrq-trigger writes. I'm not suggesting that the
implementation be buggy ;)

But clueful people can run `dmesg -1' beforehand, so I wonder if the
patch really has much value?

2014-04-23 21:51:58

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 Apr 2014, Rik van Riel wrote:

> >> Echoing values into /proc/sysrq-trigger seems to be a popular way to
> >> get information out of the kernel. However, dumping information about
> >> thousands of processes, or hundreds of CPUs to serial console can
> >> result in IRQs being blocked for minutes, resulting in various kinds
> >> of cascade failures.
> >>
> >> The most common failure is due to interrupts being blocked for a very
> >> long time. This can lead to things like failed IO requests, and other
> >> things the system cannot easily recover from.
> >>
> >> This problem is easily fixable by making __handle_sysrq use RCU
> >> instead of spin_lock_irqsave.
> >>
> >> This leaves the warning that RCU grace periods have not elapsed for a
> >> long time, but the system will come back from that automatically.
> >
> > This, however, will make RCU stall detector to send NMI to all online CPUs
> > so that they can dump their stacks.
>
> It already does that, since several of the longer-running
> sysrq handlers already grab rcu_read_lock(), for example
> show_state().
>
> > IOW, this might actually make the whole sysrq dump last for much longer,
> > and have the log polluted with all-CPU dumps for no good reason.
> >
> > I wonder whether explicitly setting rcu_cpu_stall_suppress during sysrq
> > handling might be a viable workaround for this.
>
> I suppose that would do the trick.

I can imagine Paul opposing this though ... this variable is supposed to
be changed only by cmdline/modparam, not really flipped during runtime as
a bandaid ... let's add Paul to CC.

--
Jiri Kosina
SUSE Labs

2014-04-24 00:53:41

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, 23 April 2014 23:37:54 +0200, Jiri Kosina wrote:
>
> This, however, will make RCU stall detector to send NMI to all online CPUs
> so that they can dump their stacks.
>
> IOW, this might actually make the whole sysrq dump last for much longer,
> and have the log polluted with all-CPU dumps for no good reason.

I use the patch below for some time now. While it doesn't avoid the
log pollution in the first place, it lessens the impact somewhat.

Jörn

--
The cost of changing business rules is much more expensive for software
than for a secretaty.
-- unknown

>From 8286b521865503f92837a036fecdb51f41c0141c Mon Sep 17 00:00:00 2001
From: Joern Engel <[email protected]>
Date: Wed, 20 Mar 2013 09:47:52 -0700
Subject: [PATCH] printk: Print cpu number along with time

Sometimes the printk log is heavily interleaving between different cpus.
This is particularly bad when you have two backtraces at the same time,
but can be annoying in other cases as well. With an explicit cpu
number, a simple grep can disentangle the mess for you.

Signed-off-by: Joern Engel <[email protected]>
---
kernel/printk.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index abbdd9e..c5f6dda 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -210,6 +210,7 @@ struct log {
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
+ u16 cpu; /* cpu the message was generated on */
u8 facility; /* syslog facility */
u8 flags:5; /* internal record flags */
u8 level:3; /* syslog level */
@@ -356,6 +357,7 @@ static void log_store(int facility, int level,
msg->facility = facility;
msg->level = level & 7;
msg->flags = flags & 0x1f;
+ msg->cpu = smp_processor_id();
if (ts_nsec > 0)
msg->ts_nsec = ts_nsec;
else
@@ -863,7 +865,7 @@ static bool printk_time;
#endif
module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);

-static size_t print_time(u64 ts, char *buf)
+static size_t print_time(u64 ts, u16 cpu, char *buf)
{
unsigned long rem_nsec;

@@ -871,12 +873,12 @@ static size_t print_time(u64 ts, char *buf)
return 0;

rem_nsec = do_div(ts, 1000000000);
-
if (!buf)
- return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
+ return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
+ (unsigned long)ts, cpu);

- return sprintf(buf, "[%5lu.%06lu] ",
- (unsigned long)ts, rem_nsec / 1000);
+ return sprintf(buf, "[%5lu.%06lu,%02x] ",
+ (unsigned long)ts, rem_nsec / 1000, cpu);
}

static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
@@ -898,7 +900,7 @@ static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
}
}

- len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
+ len += print_time(msg->ts_nsec, msg->cpu, buf ? buf + len : NULL);
return len;
}

@@ -1396,6 +1398,7 @@ static struct cont {
size_t cons; /* bytes written to console */
struct task_struct *owner; /* task of first print*/
u64 ts_nsec; /* time of first print */
+ u16 cpu; /* cpu the message was generated on */
u8 level; /* log level of first message */
u8 facility; /* log level of first message */
enum log_flags flags; /* prefix, newline flags */
@@ -1445,6 +1448,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.facility = facility;
cont.level = level;
cont.owner = current;
+ cont.cpu = smp_processor_id();
cont.ts_nsec = local_clock();
cont.flags = 0;
cont.cons = 0;
@@ -1466,7 +1470,7 @@ static size_t cont_print_text(char *text, size_t size)
size_t len;

if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
- textlen += print_time(cont.ts_nsec, text);
+ textlen += print_time(cont.ts_nsec, cont.cpu, text);
size -= textlen;
}

--
1.7.10.4

2014-04-24 01:54:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq: rcu-ify __handle_sysrq

On Wed, Apr 23, 2014 at 11:51:55PM +0200, Jiri Kosina wrote:
> On Wed, 23 Apr 2014, Rik van Riel wrote:
>
> > >> Echoing values into /proc/sysrq-trigger seems to be a popular way to
> > >> get information out of the kernel. However, dumping information about
> > >> thousands of processes, or hundreds of CPUs to serial console can
> > >> result in IRQs being blocked for minutes, resulting in various kinds
> > >> of cascade failures.
> > >>
> > >> The most common failure is due to interrupts being blocked for a very
> > >> long time. This can lead to things like failed IO requests, and other
> > >> things the system cannot easily recover from.
> > >>
> > >> This problem is easily fixable by making __handle_sysrq use RCU
> > >> instead of spin_lock_irqsave.
> > >>
> > >> This leaves the warning that RCU grace periods have not elapsed for a
> > >> long time, but the system will come back from that automatically.
> > >
> > > This, however, will make RCU stall detector to send NMI to all online CPUs
> > > so that they can dump their stacks.

Hey, if dumping the stacks once is a good idea, dumping them twice
must be twice as good, right? ;-)

> > It already does that, since several of the longer-running
> > sysrq handlers already grab rcu_read_lock(), for example
> > show_state().
> >
> > > IOW, this might actually make the whole sysrq dump last for much longer,
> > > and have the log polluted with all-CPU dumps for no good reason.
> > >
> > > I wonder whether explicitly setting rcu_cpu_stall_suppress during sysrq
> > > handling might be a viable workaround for this.
> >
> > I suppose that would do the trick.
>
> I can imagine Paul opposing this though ... this variable is supposed to
> be changed only by cmdline/modparam, not really flipped during runtime as
> a bandaid ... let's add Paul to CC.

Well, we already crowbar it to 1 when panic starts, see rcu_panic().

How about something like the following?

void rcu_sysrq_start(void)
{
rcu_cpu_stall_suppress = 2;
}

void rcu_sysrq_end(void)
{
if (rcu_cpu_stall_suppress == 2)
rcu_cpu_stall_suppress = 0;
}

If there get to be too many more different reasons for temporarily
suppressing RCU CPU stall warnings, I can then swap out to a better
implementation, for some definition or another of "better".

Thanx, Paul

2014-04-24 13:05:29

by Rik van Riel

[permalink] [raw]
Subject: [PATCH RFC] sysrq,rcu: suppress RCU stall warnings while sysrq runs

As suggested by Paul McKenney...

This seems small and simple enough that it should not introduce any
extra maintenance work in the future.

---8<---

Some sysrq handlers can run for a long time, because they dump a lot
of data onto a serial console. Having RCU stall warnings pop up in
the middle of them only makes the problem worse.

This patch temporarily disables RCU stall warnings while a sysrq
request is handled.

Signed-off-by: Rik van Riel <[email protected]>
Suggested-by: Paul McKenney <[email protected]>
---
drivers/tty/sysrq.c | 2 ++
include/linux/rcupdate.h | 2 ++
kernel/rcu/update.c | 11 +++++++++++
3 files changed, 15 insertions(+)

diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
index 3c61e9b..dd53af9 100644
--- a/drivers/tty/sysrq.c
+++ b/drivers/tty/sysrq.c
@@ -511,6 +511,7 @@ void __handle_sysrq(int key, bool check_mask)
int orig_log_level;
int i;

+ rcu_sysrq_start();
rcu_read_lock();
/*
* Raise the apparent loglevel to maximum so that the sysrq header
@@ -554,6 +555,7 @@ void __handle_sysrq(int key, bool check_mask)
console_loglevel = orig_log_level;
}
rcu_read_unlock();
+ rcu_sysrq_end();
}

void handle_sysrq(int key)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 00a7fd6..b9a95f8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -227,6 +227,8 @@ void rcu_idle_enter(void);
void rcu_idle_exit(void);
void rcu_irq_enter(void);
void rcu_irq_exit(void);
+void rcu_sysrq_start(void);
+void rcu_sysrq_end(void);

#ifdef CONFIG_RCU_USER_QS
void rcu_user_enter(void);
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index 4c0a9b0..2ac3289 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -320,6 +320,17 @@ int rcu_jiffies_till_stall_check(void)
return till_stall_check * HZ + RCU_STALL_DELAY_DELTA;
}

+void rcu_sysrq_start(void)
+{
+ rcu_cpu_stall_suppress = 2;
+}
+
+void rcu_sysrq_end(void)
+{
+ if (rcu_cpu_stall_suppress == 2)
+ rcu_cpu_stall_suppress = 0;
+}
+
static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
{
rcu_cpu_stall_suppress = 1;

2014-04-24 16:33:06

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq,rcu: suppress RCU stall warnings while sysrq runs

On Thu, Apr 24, 2014 at 09:04:52AM -0400, Rik van Riel wrote:
> As suggested by Paul McKenney...
>
> This seems small and simple enough that it should not introduce any
> extra maintenance work in the future.
>
> ---8<---
>
> Some sysrq handlers can run for a long time, because they dump a lot
> of data onto a serial console. Having RCU stall warnings pop up in
> the middle of them only makes the problem worse.
>
> This patch temporarily disables RCU stall warnings while a sysrq
> request is handled.
>
> Signed-off-by: Rik van Riel <[email protected]>
> Suggested-by: Paul McKenney <[email protected]>

I guess this means you like the idea. ;-)

Reviewed-by: Paul E. McKenney <[email protected]>

> ---
> drivers/tty/sysrq.c | 2 ++
> include/linux/rcupdate.h | 2 ++
> kernel/rcu/update.c | 11 +++++++++++
> 3 files changed, 15 insertions(+)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 3c61e9b..dd53af9 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -511,6 +511,7 @@ void __handle_sysrq(int key, bool check_mask)
> int orig_log_level;
> int i;
>
> + rcu_sysrq_start();
> rcu_read_lock();
> /*
> * Raise the apparent loglevel to maximum so that the sysrq header
> @@ -554,6 +555,7 @@ void __handle_sysrq(int key, bool check_mask)
> console_loglevel = orig_log_level;
> }
> rcu_read_unlock();
> + rcu_sysrq_end();
> }
>
> void handle_sysrq(int key)
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 00a7fd6..b9a95f8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -227,6 +227,8 @@ void rcu_idle_enter(void);
> void rcu_idle_exit(void);
> void rcu_irq_enter(void);
> void rcu_irq_exit(void);
> +void rcu_sysrq_start(void);
> +void rcu_sysrq_end(void);
>
> #ifdef CONFIG_RCU_USER_QS
> void rcu_user_enter(void);
> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 4c0a9b0..2ac3289 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -320,6 +320,17 @@ int rcu_jiffies_till_stall_check(void)
> return till_stall_check * HZ + RCU_STALL_DELAY_DELTA;
> }
>
> +void rcu_sysrq_start(void)
> +{
> + rcu_cpu_stall_suppress = 2;
> +}
> +
> +void rcu_sysrq_end(void)
> +{
> + if (rcu_cpu_stall_suppress == 2)
> + rcu_cpu_stall_suppress = 0;
> +}
> +
> static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
> {
> rcu_cpu_stall_suppress = 1;
>

2014-04-24 19:41:20

by Jörn Engel

[permalink] [raw]
Subject: [PATCH] printk: Print cpu number along with time

On Wed, 23 April 2014 20:52:47 -0400, Jörn Engel wrote:
>
> I use the patch below for some time now. While it doesn't avoid the
> log pollution in the first place, it lessens the impact somewhat.

Added a config option and ported it to current -linus. Andrew, would
you take this patch?

---

Sometimes the printk log is heavily interleaving between different cpus.
This is particularly bad when you have two backtraces at the same time,
but can be annoying in other cases as well. With an explicit cpu
number, a simple grep can disentangle the mess for you.

Signed-off-by: Joern Engel <[email protected]>
---
kernel/printk/printk.c | 19 ++++++++++++++++---
lib/Kconfig.debug | 9 +++++++++
2 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a45b50962295..b9e464924825 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -200,6 +200,7 @@ struct printk_log {
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
+ u16 cpu; /* cpu the message was generated on */
u8 facility; /* syslog facility */
u8 flags:5; /* internal record flags */
u8 level:3; /* syslog level */
@@ -346,6 +347,7 @@ static void log_store(int facility, int level,
msg->facility = facility;
msg->level = level & 7;
msg->flags = flags & 0x1f;
+ msg->cpu = smp_processor_id();
if (ts_nsec > 0)
msg->ts_nsec = ts_nsec;
else
@@ -859,7 +861,7 @@ static bool printk_time;
#endif
module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);

-static size_t print_time(u64 ts, char *buf)
+static size_t print_time(u64 ts, u16 cpu, char *buf)
{
unsigned long rem_nsec;

@@ -868,11 +870,20 @@ static size_t print_time(u64 ts, char *buf)

rem_nsec = do_div(ts, 1000000000);

+#ifdef CONFIG_PRINTK_CPU
+ if (!buf)
+ return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
+ (unsigned long)ts, cpu);
+
+ return sprintf(buf, "[%5lu.%06lu,%02x] ",
+ (unsigned long)ts, rem_nsec / 1000, cpu);
+#else
if (!buf)
return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);

return sprintf(buf, "[%5lu.%06lu] ",
(unsigned long)ts, rem_nsec / 1000);
+#endif
}

static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
@@ -894,7 +905,7 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
}
}

- len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
+ len += print_time(msg->ts_nsec, msg->cpu, buf ? buf + len : NULL);
return len;
}

@@ -1391,6 +1402,7 @@ static struct cont {
size_t cons; /* bytes written to console */
struct task_struct *owner; /* task of first print*/
u64 ts_nsec; /* time of first print */
+ u16 cpu; /* cpu the message was generated on */
u8 level; /* log level of first message */
u8 facility; /* log level of first message */
enum log_flags flags; /* prefix, newline flags */
@@ -1440,6 +1452,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
cont.facility = facility;
cont.level = level;
cont.owner = current;
+ cont.cpu = smp_processor_id();
cont.ts_nsec = local_clock();
cont.flags = 0;
cont.cons = 0;
@@ -1461,7 +1474,7 @@ static size_t cont_print_text(char *text, size_t size)
size_t len;

if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
- textlen += print_time(cont.ts_nsec, text);
+ textlen += print_time(cont.ts_nsec, cont.cpu, text);
size -= textlen;
}

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 819ac51202c0..bdec67eca521 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -15,6 +15,15 @@ config PRINTK_TIME
The behavior is also controlled by the kernel command line
parameter printk.time=1. See Documentation/kernel-parameters.txt

+config PRINTK_CPU
+ bool "Show CPU number along with timing information"
+ depends on PRINTK_TIME
+ help
+ Selecting this option adds the CPU number to be added to
+ timestamps of printk() messages. This can be useful when
+ messages from several CPUs are interleaved, in particular
+ for stack traces.
+
config DEFAULT_MESSAGE_LOGLEVEL
int "Default message log level (1-7)"
range 1 7
--
2.0.0.rc0.1.g7b2ba98

2014-04-24 19:55:30

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Thu, Apr 24, 2014 at 03:40:24PM -0400, J?rn Engel wrote:
> On Wed, 23 April 2014 20:52:47 -0400, J?rn Engel wrote:
> >
> > I use the patch below for some time now. While it doesn't avoid the
> > log pollution in the first place, it lessens the impact somewhat.
>
> Added a config option and ported it to current -linus. Andrew, would
> you take this patch?
>
> ---
>
> Sometimes the printk log is heavily interleaving between different cpus.
> This is particularly bad when you have two backtraces at the same time,
> but can be annoying in other cases as well. With an explicit cpu
> number, a simple grep can disentangle the mess for you.
>
> Signed-off-by: Joern Engel <[email protected]>
> ---
> kernel/printk/printk.c | 19 ++++++++++++++++---
> lib/Kconfig.debug | 9 +++++++++
> 2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a45b50962295..b9e464924825 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -200,6 +200,7 @@ struct printk_log {
> u16 len; /* length of entire record */
> u16 text_len; /* length of text buffer */
> u16 dict_len; /* length of dictionary buffer */
> + u16 cpu; /* cpu the message was generated on */
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> @@ -346,6 +347,7 @@ static void log_store(int facility, int level,
> msg->facility = facility;
> msg->level = level & 7;
> msg->flags = flags & 0x1f;
> + msg->cpu = smp_processor_id();
> if (ts_nsec > 0)
> msg->ts_nsec = ts_nsec;
> else
> @@ -859,7 +861,7 @@ static bool printk_time;
> #endif
> module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>
> -static size_t print_time(u64 ts, char *buf)
> +static size_t print_time(u64 ts, u16 cpu, char *buf)
> {
> unsigned long rem_nsec;
>
> @@ -868,11 +870,20 @@ static size_t print_time(u64 ts, char *buf)
>
> rem_nsec = do_div(ts, 1000000000);
>
> +#ifdef CONFIG_PRINTK_CPU
> + if (!buf)
> + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",

%02x for a cpu? What happens on machines with 8k cpus?

And is this really an issue? Debugging by using printk is fun, but not
really something that people need to add a cpu number to. Why not just
use a tracepoint in your code to get the needed information instead?

thanks,

greg k-h

2014-04-24 21:24:09

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Thu, 24 April 2014 12:58:21 -0700, Greg Kroah-Hartman wrote:
> > @@ -868,11 +870,20 @@ static size_t print_time(u64 ts, char *buf)
> >
> > rem_nsec = do_div(ts, 1000000000);
> >
> > +#ifdef CONFIG_PRINTK_CPU
> > + if (!buf)
> > + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
>
> %02x for a cpu? What happens on machines with 8k cpus?

You should get something like:
Apr 23 10:34:03 [ 228.950926,1a] Call Trace:
Apr 23 10:34:03 [ 228.950926,201] Call Trace:
Apr 23 10:34:03 [ 228.950928,1a] [<ffffffff815e0f29>] schedule+0x29/0x70
...

That said, I don't have access to hardware with >256 cpus and haven't
actually tested this.

> And is this really an issue? Debugging by using printk is fun, but not
> really something that people need to add a cpu number to. Why not just
> use a tracepoint in your code to get the needed information instead?

Yes, this is an issue. There have been several instances in my life
when I would have had to guess which function belonged to which
backtrace without this patch. Good guesses take a long time, bad
guesses are nearly useless and giving up means you cannot debug the
issue.

I don't have a good example at hand, but here is something from a
quick grep for illustration.
Feb 22 19:04:46 [ 11.642470,2e] Call Trace:
Feb 22 19:04:46 [ 11.642472,16] Call Trace:
Feb 22 19:04:46 [ 11.642476,29] [<ffffffff81115bf8>] pcpu_alloc+0x988/0xa20
Feb 22 19:04:46 [ 11.642481,01] [<ffffffff810992ad>] ? find_symbol+0x3d/0xb0
Feb 22 19:04:46 [ 11.642489,05] [<ffffffff81115bf8>] pcpu_alloc+0x988/0xa20
Feb 22 19:04:46 [ 11.642493,12] [<ffffffff81115bf8>] pcpu_alloc+0x988/0xa20
Feb 22 19:04:46 [ 11.642498,08] [<ffffffff81115bf8>] pcpu_alloc+0x988/0xa20
Feb 22 19:04:46 [ 11.642504,21] [<ffffffff810992ad>] ? find_symbol+0x3d/0xb0

And are you seriously suggesting I use a tracepoint for kernel panics? ;)

Jörn

--
There's nothing better for promoting creativity in a medium than
making an audience feel "Hmm ­ I could do better than that!"
-- Douglas Adams in a slashdot interview

2014-04-24 22:12:13

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Thu, 24 Apr 2014, Greg Kroah-Hartman wrote:

> > +#ifdef CONFIG_PRINTK_CPU
> > + if (!buf)
> > + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
>
> %02x for a cpu? What happens on machines with 8k cpus?

Ummm ... what issue do you see here, Greg? It'll print 0x1f40, no?

> And is this really an issue? Debugging by using printk is fun, but not
> really something that people need to add a cpu number to. Why not just
> use a tracepoint in your code to get the needed information instead?

Well, if you have dmesg dump from panic that happens every other year, and
you have to do post-mortem analysis on it, I am pretty sure you would love
to be able to figure out how the stack traces would look like without
inter-CPU interleaving. And I am pretty sure you wouldn't want to
insert/enable a tracepoint and wait another two years for the bug to
trigger again.

--
Jiri Kosina
SUSE Labs

2014-04-24 22:17:57

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Fri, Apr 25, 2014 at 12:12:06AM +0200, Jiri Kosina wrote:
> On Thu, 24 Apr 2014, Greg Kroah-Hartman wrote:
>
> > > +#ifdef CONFIG_PRINTK_CPU
> > > + if (!buf)
> > > + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
> >
> > %02x for a cpu? What happens on machines with 8k cpus?
>
> Ummm ... what issue do you see here, Greg? It'll print 0x1f40, no?

Yes, just not be as "pretty" and aligned properly :)

> > And is this really an issue? Debugging by using printk is fun, but not
> > really something that people need to add a cpu number to. Why not just
> > use a tracepoint in your code to get the needed information instead?
>
> Well, if you have dmesg dump from panic that happens every other year, and
> you have to do post-mortem analysis on it, I am pretty sure you would love
> to be able to figure out how the stack traces would look like without
> inter-CPU interleaving. And I am pretty sure you wouldn't want to
> insert/enable a tracepoint and wait another two years for the bug to
> trigger again.

Fair enough, and the multi-oops message drives that home.

But does this break things like 'dmesg --human' mode that is in
util-linux?

thanks,

greg k-h

2014-04-24 22:19:00

by David Rientjes

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Fri, 25 Apr 2014, Jiri Kosina wrote:

> > > +#ifdef CONFIG_PRINTK_CPU
> > > + if (!buf)
> > > + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
> >
> > %02x for a cpu? What happens on machines with 8k cpus?
>
> Ummm ... what issue do you see here, Greg? It'll print 0x1f40, no?
>

I think he's referring to the alignment with %02x.

> > And is this really an issue? Debugging by using printk is fun, but not
> > really something that people need to add a cpu number to. Why not just
> > use a tracepoint in your code to get the needed information instead?
>
> Well, if you have dmesg dump from panic that happens every other year, and
> you have to do post-mortem analysis on it, I am pretty sure you would love
> to be able to figure out how the stack traces would look like without
> inter-CPU interleaving. And I am pretty sure you wouldn't want to
> insert/enable a tracepoint and wait another two years for the bug to
> trigger again.
>

Sounds like the appropriate fix would be to serialize stack dumping to the
kernel log.

2014-04-24 22:21:29

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Thu, 24 Apr 2014, David Rientjes wrote:

> > Well, if you have dmesg dump from panic that happens every other year, and
> > you have to do post-mortem analysis on it, I am pretty sure you would love
> > to be able to figure out how the stack traces would look like without
> > inter-CPU interleaving. And I am pretty sure you wouldn't want to
> > insert/enable a tracepoint and wait another two years for the bug to
> > trigger again.
>
> Sounds like the appropriate fix would be to serialize stack dumping to the
> kernel log.

That's rather difficult, as it can be done both from NMI and non-NMI
contexts, and those are very hard to synchronize.

--
Jiri Kosina
SUSE Labs

2014-04-24 23:30:18

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Fri, 25 April 2014 00:21:20 +0200, Jiri Kosina wrote:
> On Thu, 24 Apr 2014, David Rientjes wrote:
>
> > > Well, if you have dmesg dump from panic that happens every other year, and
> > > you have to do post-mortem analysis on it, I am pretty sure you would love
> > > to be able to figure out how the stack traces would look like without
> > > inter-CPU interleaving. And I am pretty sure you wouldn't want to
> > > insert/enable a tracepoint and wait another two years for the bug to
> > > trigger again.
> >
> > Sounds like the appropriate fix would be to serialize stack dumping to the
> > kernel log.
>
> That's rather difficult, as it can be done both from NMI and non-NMI
> contexts, and those are very hard to synchronize.

It is also easy to get wrong and cause a panic() that doesn't reboot
but instead gets stuck somewhere.

Plus, you might want similar serialization for other multi-line debug
messages. Scsi error messages come to mind, but we have many more
examples.

Jörn

--
Any headline which ends in a question mark can be answered by the word 'no'.
-- Betteridge's Law

2014-04-25 05:35:58

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH RFC] sysrq,rcu: suppress RCU stall warnings while sysrq runs

On Thu, 2014-04-24 at 09:04 -0400, Rik van Riel wrote:

> diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> index 4c0a9b0..2ac3289 100644
> --- a/kernel/rcu/update.c
> +++ b/kernel/rcu/update.c
> @@ -320,6 +320,17 @@ int rcu_jiffies_till_stall_check(void)
> return till_stall_check * HZ + RCU_STALL_DELAY_DELTA;
> }
>
> +void rcu_sysrq_start(void)
> +{
> + rcu_cpu_stall_suppress = 2;
> +}

Monster box folks seem to routinely kill that at boot, probably won't
want it coming back to life at the poke of a button.

-Mike

2014-04-28 23:41:39

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Thu, 24 April 2014 15:40:24 -0400, Jörn Engel wrote:
> On Wed, 23 April 2014 20:52:47 -0400, Jörn Engel wrote:
> >
> > I use the patch below for some time now. While it doesn't avoid the
> > log pollution in the first place, it lessens the impact somewhat.
>
> Added a config option and ported it to current -linus. Andrew, would
> you take this patch?

Andrew? Did you dislike this patch for some reason or just miss it in
the thread?

> ---
>
> Sometimes the printk log is heavily interleaving between different cpus.
> This is particularly bad when you have two backtraces at the same time,
> but can be annoying in other cases as well. With an explicit cpu
> number, a simple grep can disentangle the mess for you.
>
> Signed-off-by: Joern Engel <[email protected]>
> ---
> kernel/printk/printk.c | 19 ++++++++++++++++---
> lib/Kconfig.debug | 9 +++++++++
> 2 files changed, 25 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index a45b50962295..b9e464924825 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -200,6 +200,7 @@ struct printk_log {
> u16 len; /* length of entire record */
> u16 text_len; /* length of text buffer */
> u16 dict_len; /* length of dictionary buffer */
> + u16 cpu; /* cpu the message was generated on */
> u8 facility; /* syslog facility */
> u8 flags:5; /* internal record flags */
> u8 level:3; /* syslog level */
> @@ -346,6 +347,7 @@ static void log_store(int facility, int level,
> msg->facility = facility;
> msg->level = level & 7;
> msg->flags = flags & 0x1f;
> + msg->cpu = smp_processor_id();
> if (ts_nsec > 0)
> msg->ts_nsec = ts_nsec;
> else
> @@ -859,7 +861,7 @@ static bool printk_time;
> #endif
> module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>
> -static size_t print_time(u64 ts, char *buf)
> +static size_t print_time(u64 ts, u16 cpu, char *buf)
> {
> unsigned long rem_nsec;
>
> @@ -868,11 +870,20 @@ static size_t print_time(u64 ts, char *buf)
>
> rem_nsec = do_div(ts, 1000000000);
>
> +#ifdef CONFIG_PRINTK_CPU
> + if (!buf)
> + return snprintf(NULL, 0, "[%5lu.000000,%02x] ",
> + (unsigned long)ts, cpu);
> +
> + return sprintf(buf, "[%5lu.%06lu,%02x] ",
> + (unsigned long)ts, rem_nsec / 1000, cpu);
> +#else
> if (!buf)
> return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
>
> return sprintf(buf, "[%5lu.%06lu] ",
> (unsigned long)ts, rem_nsec / 1000);
> +#endif
> }
>
> static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
> @@ -894,7 +905,7 @@ static size_t print_prefix(const struct printk_log *msg, bool syslog, char *buf)
> }
> }
>
> - len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
> + len += print_time(msg->ts_nsec, msg->cpu, buf ? buf + len : NULL);
> return len;
> }
>
> @@ -1391,6 +1402,7 @@ static struct cont {
> size_t cons; /* bytes written to console */
> struct task_struct *owner; /* task of first print*/
> u64 ts_nsec; /* time of first print */
> + u16 cpu; /* cpu the message was generated on */
> u8 level; /* log level of first message */
> u8 facility; /* log level of first message */
> enum log_flags flags; /* prefix, newline flags */
> @@ -1440,6 +1452,7 @@ static bool cont_add(int facility, int level, const char *text, size_t len)
> cont.facility = facility;
> cont.level = level;
> cont.owner = current;
> + cont.cpu = smp_processor_id();
> cont.ts_nsec = local_clock();
> cont.flags = 0;
> cont.cons = 0;
> @@ -1461,7 +1474,7 @@ static size_t cont_print_text(char *text, size_t size)
> size_t len;
>
> if (cont.cons == 0 && (console_prev & LOG_NEWLINE)) {
> - textlen += print_time(cont.ts_nsec, text);
> + textlen += print_time(cont.ts_nsec, cont.cpu, text);
> size -= textlen;
> }
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 819ac51202c0..bdec67eca521 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -15,6 +15,15 @@ config PRINTK_TIME
> The behavior is also controlled by the kernel command line
> parameter printk.time=1. See Documentation/kernel-parameters.txt
>
> +config PRINTK_CPU
> + bool "Show CPU number along with timing information"
> + depends on PRINTK_TIME
> + help
> + Selecting this option adds the CPU number to be added to
> + timestamps of printk() messages. This can be useful when
> + messages from several CPUs are interleaved, in particular
> + for stack traces.
> +
> config DEFAULT_MESSAGE_LOGLEVEL
> int "Default message log level (1-7)"
> range 1 7
> --
> 2.0.0.rc0.1.g7b2ba98
>

Jörn

--
The grand essentials of happiness are: something to do, something to
love, and something to hope for.
-- Allan K. Chalmers

2014-04-29 00:20:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Mon, 28 Apr 2014 19:40:39 -0400 J__rn Engel <[email protected]> wrote:

> On Thu, 24 April 2014 15:40:24 -0400, J__rn Engel wrote:
> > On Wed, 23 April 2014 20:52:47 -0400, J__rn Engel wrote:
> > >
> > > I use the patch below for some time now. While it doesn't avoid the
> > > log pollution in the first place, it lessens the impact somewhat.
> >
> > Added a config option and ported it to current -linus. Andrew, would
> > you take this patch?
>
> Andrew? Did you dislike this patch for some reason or just miss it in
> the thread?

Neither ;) I try to respond in some way to all patches unless I think it's clear
to the originator why I took no action. And I think I'm pretty good at
not losing stuff.

otoh, it has only been four days, three of which I spent offline...

2014-06-04 23:17:01

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Mon, 28 April 2014 17:22:19 -0700, Andrew Morton wrote:
> On Mon, 28 Apr 2014 19:40:39 -0400 J__rn Engel <[email protected]> wrote:
> > On Thu, 24 April 2014 15:40:24 -0400, J__rn Engel wrote:
> > > On Wed, 23 April 2014 20:52:47 -0400, J__rn Engel wrote:
> > > >
> > > > I use the patch below for some time now. While it doesn't avoid the
> > > > log pollution in the first place, it lessens the impact somewhat.
> > >
> > > Added a config option and ported it to current -linus. Andrew, would
> > > you take this patch?
> >
> > Andrew? Did you dislike this patch for some reason or just miss it in
> > the thread?
>
> Neither ;) I try to respond in some way to all patches unless I think it's clear
> to the originator why I took no action. And I think I'm pretty good at
> not losing stuff.
>
> otoh, it has only been four days, three of which I spent offline...

Ping. It is closer to 40 days now. Can I fault this patch back into
your memory?

Jörn

--
Good warriors cause others to come to them and do not go to others.
-- Sun Tzu

2014-06-04 23:28:35

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Wed, 4 Jun 2014 19:15:06 -0400 J__rn Engel <[email protected]> wrote:

> On Mon, 28 April 2014 17:22:19 -0700, Andrew Morton wrote:
> > On Mon, 28 Apr 2014 19:40:39 -0400 J__rn Engel <[email protected]> wrote:
> > > On Thu, 24 April 2014 15:40:24 -0400, J__rn Engel wrote:
> > > > On Wed, 23 April 2014 20:52:47 -0400, J__rn Engel wrote:
> > > > >
> > > > > I use the patch below for some time now. While it doesn't avoid the
> > > > > log pollution in the first place, it lessens the impact somewhat.
> > > >
> > > > Added a config option and ported it to current -linus. Andrew, would
> > > > you take this patch?
> > >
> > > Andrew? Did you dislike this patch for some reason or just miss it in
> > > the thread?
> >
> > Neither ;) I try to respond in some way to all patches unless I think it's clear
> > to the originator why I took no action. And I think I'm pretty good at
> > not losing stuff.
> >
> > otoh, it has only been four days, three of which I spent offline...
>
> Ping. It is closer to 40 days now. Can I fault this patch back into
> your memory?
>

Sorry, it seems I'm better at losing stuff than I thought I was.

I've restored the patch to the input queue.

2014-06-04 23:51:25

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH] printk: Print cpu number along with time

On Wed, 4 June 2014 16:28:32 -0700, Andrew Morton wrote:
>
> Sorry, it seems I'm better at losing stuff than I thought I was.

The trouble with forgetting stuff is that few people remember ever
doing so. But they do remember other people forgetting stuff.

> I've restored the patch to the input queue.

Thanks!

Jörn

--
Maintenance in other professions and of other articles is concerned with
the return of the item to its original state; in Software, maintenance
is concerned with moving an item away from its original state.
-- Les Belady