2012-06-21 23:52:08

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH] printk: Add printk_flush() to force buffered text to console

Fengguang Wu had a config that caused the system to lockup. It reported:

[ 6.086395] type=2000 audit(1339501575.085:1): initialized
[ 6.116356] Kprobe smoke test started
[ 6.125915] Kprobe smoke test passed successfully
[ 6.127478] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0
+fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0

and then froze. So naturally, the suspected bug was with rcu-torture.
Fengguang did a git bisect and discovered that the crash came with a
function trace update. He also noticed that if he reverted that update,
the system got farther and showed:

[ 1.611901] Testing tracer function: PASSED

His time was wasted by the fact that the function tracing self test
first prints:

"Testing tracer function: "

then runs the test, and if it succeeds, it prints "PASSED", giving us
the nice output you see above.

But with the new printk() changes, text without a newline gets buffered
and does not print out to the console at the location of the printk.
This caused the "Testing tracer function: " not to print out and because
the test caused the kernel to lock up, we are clueless to the fact that
the problem was with the function tracer test and not the rcu_torture
test. As it made sense that the rcu_torture test could lock up the
system, many kernel developer hours were wasted chasing the wrong wild
goose.

If the "Testing tracer function: " had printed out in the first place,
we would have caught the correct wild goose and saved precious kernel
developer's time.

Thus a need for the following utility. That is to add a 'printk_flush()'
that acts like a fflush() in userspace to flush out the buffers used by
the printing facility so we don't have unexpected hunts for nature
roaming fowl.

I hooked onto the 'facility' infrastructure and used '0x1ffc' (the max
number) as a way to communicate printk() commands to the msg_print_text()
which is performed at a later time (console_unlock()).

I tested this out, (adding pr_flush() after my printk) and now the
lockup shows:

[ 9.018231] Kprobe smoke test passed successfully
[ 9.023084] rcu-torture:--- Start of test: nreaders=4 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_hold
off=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
[ 9.066065] Testing tracer function:

To prevent the flush to cause the next printk to have a timestamp that
would produce the following:

[ 6.834073] Testing tracer function: [ 7.136194] PASSED

I made the KERN_CONT ("<c>") use the facility as well to pass info to
not print the timestamp. This fixes the abve issue to print:

[ 1.291025] Testing tracer function: PASSED

Link: http://lkml.kernel.org/r/[email protected]

Cc: Linus Torvalds <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "kay.sievers" <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
include/linux/printk.h | 8 ++++
kernel/printk.c | 104 ++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 101 insertions(+), 11 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 1bec2f7..b3317bf3 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -111,6 +111,8 @@ asmlinkage int printk_emit(int facility, int level,
asmlinkage __printf(1, 2) __cold
int printk(const char *fmt, ...);

+void printk_flush(void);
+
/*
* Special printk facility for scheduler use only, _DO_NOT_USE_ !
*/
@@ -158,6 +160,10 @@ static inline bool printk_timed_ratelimit(unsigned long *caller_jiffies,
return false;
}

+static inline void printk_flush(void)
+{
+}
+
static inline void log_buf_kexec_setup(void)
{
}
@@ -190,6 +196,8 @@ extern void dump_stack(void) __cold;
printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
#define pr_cont(fmt, ...) \
printk(KERN_CONT fmt, ##__VA_ARGS__)
+#define pr_flush() \
+ printk_flush()

/* pr_devel() should produce zero code unless DEBUG is defined */
#ifdef DEBUG
diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..ca1f5ff 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -237,6 +237,24 @@ static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
static char *log_buf = __log_buf;
static u32 log_buf_len = __LOG_BUF_LEN;

+/*
+ * The max number for facility is 0xffff >> 3, or 0x1fff.
+ * As log facilities count upward from 1, we have the kernel
+ * keep the top numbers or itself.
+ *
+ * 0x1ffc is the code for kernel processing in msg->level (facility)
+ * bit 0 - do not print prefix
+ * bit 1 - do not print newline
+ */
+#define LOG_KERNEL 0x1ffc
+#define LOG_NONL_NOPREFIX_MSK 3
+#define LOG_NONL_SET 2
+#define LOG_NOPREFIX_SET 1
+
+#define LOG_NONL_NOPREFIX (LOG_KERNEL | LOG_NONL_NOPREFIX_MSK)
+#define LOG_NONL (LOG_KERNEL | LOG_NONL_SET)
+#define LOG_NOPREFIX (LOG_KERNEL | LOG_NOPREFIX_SET)
+
/* cpu currently holding logbuf_lock */
static volatile unsigned int logbuf_cpu = UINT_MAX;

@@ -836,14 +854,36 @@ static size_t msg_print_text(const struct log *msg, bool syslog,
}

if (buf) {
- if (print_prefix(msg, syslog, NULL) +
- text_len + 1>= size - len)
- break;
+ int newline = 1;
+ int prefix = 1;
+ int facility = msg->level >> 3;
+
+ /*
+ * The kernel sends some commands via the facility.
+ * To do so, a high number mask is used (LOG_KERNEL)
+ * and the low bits of the mask hold the command bits
+ * that the kernel printk() will use to state how the
+ * msg will be printed.
+ */
+ if ((facility & LOG_KERNEL) == LOG_KERNEL) {
+ if (facility & LOG_NOPREFIX_SET)
+ prefix = 0;
+ if (facility & LOG_NONL_SET)
+ newline = 0;
+ }
+
+ if (prefix) {
+ if (print_prefix(msg, syslog, NULL) +
+ text_len + 1 >= size - len)
+ break;
+
+ len += print_prefix(msg, syslog, buf + len);
+ }

- len += print_prefix(msg, syslog, buf + len);
memcpy(buf + len, text, text_len);
len += text_len;
- buf[len++] = '\n';
+ if (newline)
+ buf[len++] = '\n';
} else {
/* SYSLOG_ACTION_* buffer size only calculation */
len += print_prefix(msg, syslog, NULL);
@@ -1267,6 +1307,7 @@ asmlinkage int vprintk_emit(int facility, int level,
static char cont_buf[LOG_LINE_MAX];
static size_t cont_len;
static int cont_level;
+ static bool cont_prefix;
static struct task_struct *cont_task;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
@@ -1275,6 +1316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
int this_cpu;
bool newline = false;
bool prefix = false;
+ bool flush = false;
int printed_len = 0;

boot_delay_msec();
@@ -1339,18 +1381,37 @@ asmlinkage int vprintk_emit(int facility, int level,
case 'c': /* KERN_CONT */
text += 3;
text_len -= 3;
+ break;
+ case 'f': /* KERN_FLUSH - used internally */
+ flush = true;
}
}

- if (level == -1)
- level = default_message_loglevel;
+ if (!flush) {
+ if (level == -1)
+ level = default_message_loglevel;

- if (dict) {
- prefix = true;
- newline = true;
+ if (dict) {
+ prefix = true;
+ newline = true;
+ }
}

- if (!newline) {
+ if (flush) {
+ if (cont_len) {
+ int code = LOG_NONL;
+
+ /*
+ * If the buffered string was KERN_CONT,
+ * do not print prefix.
+ */
+ if (!cont_prefix)
+ code = LOG_NONL_NOPREFIX;
+ log_store(code, cont_level, NULL, 0, cont_buf, cont_len);
+ cont_len = 0;
+ }
+
+ } else if (!newline) {
if (cont_len && (prefix || cont_task != current)) {
/*
* Flush earlier buffer, which is either from a
@@ -1363,6 +1424,7 @@ asmlinkage int vprintk_emit(int facility, int level,
if (!cont_len) {
cont_level = level;
cont_task = current;
+ cont_prefix = prefix;
}

/* buffer or append to earlier buffer from the same thread */
@@ -1395,6 +1457,8 @@ asmlinkage int vprintk_emit(int facility, int level,
printed_len = cont_len;
} else {
/* ordinary single and terminated line */
+ if (!prefix && !facility && cont_task == current)
+ facility = LOG_NOPREFIX;
log_store(facility, level,
dict, dictlen, text, text_len);
printed_len = text_len;
@@ -1483,6 +1547,24 @@ asmlinkage int printk(const char *fmt, ...)
}
EXPORT_SYMBOL(printk);

+/**
+ * printk_flush - flush out any buffered text
+ *
+ * printk() will buffer text and not write it out to the console
+ * if the text was missing a newline. If it is required to get text
+ * out to the console without a newline, use printk_flush() and it
+ * will do that. This is useful when running self tests, where you
+ * have a line that prints what is being tested, and then if it
+ * passed or failed after the test, and you want this all done on
+ * a single line. Without flushing, if the test crashes, you may
+ * never see what was being tested.
+ */
+void printk_flush(void)
+{
+ printk("<f>");
+}
+EXPORT_SYMBOL(printk_flush);
+
#else

#define LOG_LINE_MAX 0
--
1.7.3.4



2012-06-22 07:17:25

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console


* Steven Rostedt <[email protected]> wrote:

> Fengguang Wu had a config that caused the system to lockup. It reported:
>
> [ 6.086395] type=2000 audit(1339501575.085:1): initialized
> [ 6.116356] Kprobe smoke test started
> [ 6.125915] Kprobe smoke test passed successfully
> [ 6.127478] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0
> +fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
>
> and then froze. So naturally, the suspected bug was with rcu-torture.
> Fengguang did a git bisect and discovered that the crash came with a
> function trace update. He also noticed that if he reverted that update,
> the system got farther and showed:
>
> [ 1.611901] Testing tracer function: PASSED
>
> His time was wasted by the fact that the function tracing self test
> first prints:
>
> "Testing tracer function: "
>
> then runs the test, and if it succeeds, it prints "PASSED", giving us
> the nice output you see above.
>
> But with the new printk() changes, text without a newline gets buffered
> and does not print out to the console at the location of the printk.
> This caused the "Testing tracer function: " not to print out and because
> the test caused the kernel to lock up, we are clueless to the fact that
> the problem was with the function tracer test and not the rcu_torture
> test. As it made sense that the rcu_torture test could lock up the
> system, many kernel developer hours were wasted chasing the wrong wild
> goose.
>
> If the "Testing tracer function: " had printed out in the first place,
> we would have caught the correct wild goose and saved precious kernel
> developer's time.
>
> Thus a need for the following utility. That is to add a 'printk_flush()'
> that acts like a fflush() in userspace to flush out the buffers used by
> the printing facility so we don't have unexpected hunts for nature
> roaming fowl.
>
> I hooked onto the 'facility' infrastructure and used '0x1ffc' (the max
> number) as a way to communicate printk() commands to the msg_print_text()
> which is performed at a later time (console_unlock()).
>
> I tested this out, (adding pr_flush() after my printk) and now the
> lockup shows:
>
> [ 9.018231] Kprobe smoke test passed successfully
> [ 9.023084] rcu-torture:--- Start of test: nreaders=4 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_hold
> off=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
> [ 9.066065] Testing tracer function:
>
> To prevent the flush to cause the next printk to have a timestamp that
> would produce the following:
>
> [ 6.834073] Testing tracer function: [ 7.136194] PASSED
>
> I made the KERN_CONT ("<c>") use the facility as well to pass info to
> not print the timestamp. This fixes the abve issue to print:
>
> [ 1.291025] Testing tracer function: PASSED
>
> Link: http://lkml.kernel.org/r/[email protected]
>
> Cc: Linus Torvalds <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: "kay.sievers" <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Cc: Fengguang Wu <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Signed-off-by: Steven Rostedt <[email protected]>
> ---
> include/linux/printk.h | 8 ++++
> kernel/printk.c | 104 ++++++++++++++++++++++++++++++++++++++++++-----
> 2 files changed, 101 insertions(+), 11 deletions(-)

Maybe also add the actual usage sites, to demonstrate how it's
going to be used?

Thanks,

Ingo

2012-06-22 08:24:12

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Thu, 2012-06-21 at 19:52 -0400, Steven Rostedt wrote:
> Fengguang Wu had a config that caused the system to lockup. It reported:
[]

It'd have been nicer if you'd have cc'd me.

What I submitted is far simpler and it allows a knob
to control the printk buffering on a global basis.

Did you try it? It seems to work here, but I don't
have your other patches to rcu-torture with inserted
printk_flush patches to verify it. Can I have a
copy of those please.



2012-06-22 10:45:43

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Fri, 2012-06-22 at 09:17 +0200, Ingo Molnar wrote:

> Maybe also add the actual usage sites, to demonstrate how it's
> going to be used?

Here's the places that I added (and tested) the pr_flush().

I even added 'set_current_task(TASK_UNINTERRUPTIBLE);
schedule_timeout(5*HZ);' to see if it does flush, and it worked fine.

-- Steve

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 49249c2..f7f2cd0 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -825,6 +825,7 @@ int register_tracer(struct tracer *type)

/* the test is responsible for initializing and enabling */
pr_info("Testing tracer %s: ", type->name);
+ pr_flush();
ret = type->selftest(type, tr);
/* the test is responsible for resetting too */
current_trace = saved_tracer;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 29111da..84b674e 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -1606,6 +1606,7 @@ static __init void event_trace_self_tests(void)
#endif

pr_info("Testing event %s: ", call->name);
+ pr_flush();

/*
* If an event is already enabled, someone is using
@@ -1635,6 +1636,7 @@ static __init void event_trace_self_tests(void)
continue;

pr_info("Testing event system %s: ", system->name);
+ pr_flush();

ret = __ftrace_set_clr_event(NULL, system->name, NULL, 1);
if (WARN_ON_ONCE(ret)) {
@@ -1657,6 +1659,7 @@ static __init void event_trace_self_tests(void)

pr_info("Running tests on all trace events:\n");
pr_info("Testing all events: ");
+ pr_flush();

ret = __ftrace_set_clr_event(NULL, NULL, NULL, 1);
if (WARN_ON_ONCE(ret)) {
diff --git a/kernel/trace/trace_selftest.c b/kernel/trace/trace_selftest.c
index 288541f..9638639 100644
--- a/kernel/trace/trace_selftest.c
+++ b/kernel/trace/trace_selftest.c
@@ -161,6 +161,7 @@ static void print_counts(void)
trace_selftest_test_probe3_cnt,
trace_selftest_test_global_cnt,
trace_selftest_test_dyn_cnt);
+ pr_flush();
}

static void reset_counts(void)
@@ -184,6 +185,7 @@ static int trace_selftest_ops(int cnt)

printk(KERN_CONT "PASSED\n");
pr_info("Testing dynamic ftrace ops #%d: ", cnt);
+ pr_flush();

ftrace_enabled = 1;
reset_counts();
@@ -315,6 +317,7 @@ int trace_selftest_startup_dynamic_tracing(struct tracer *trace,
/* The ftrace test PASSED */
printk(KERN_CONT "PASSED\n");
pr_info("Testing dynamic ftrace: ");
+ pr_flush();

/* enable tracing, and record the filter function */
ftrace_enabled = 1;

2012-06-22 10:48:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Fri, 2012-06-22 at 01:24 -0700, Joe Perches wrote:
> On Thu, 2012-06-21 at 19:52 -0400, Steven Rostedt wrote:
> > Fengguang Wu had a config that caused the system to lockup. It reported:
> []
>
> It'd have been nicer if you'd have cc'd me.

I didn't mean to be rude. I had pulled in my original patch which kept
the Cc list, and did a git --amend after making my changes and testing.
I just did a cut and paste to put back the Cc list.

>
> What I submitted is far simpler and it allows a knob
> to control the printk buffering on a global basis.

But it did not handle the \n nor the timestamp prefix.

>
> Did you try it?

No, because it did not do what I needed.

> It seems to work here, but I don't
> have your other patches to rcu-torture with inserted
> printk_flush patches to verify it. Can I have a
> copy of those please.

I just posted the places that I add it. There may be more places coming
too.

-- Steve

2012-06-22 21:54:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Thu, 21 Jun 2012 19:52:03 -0400
Steven Rostedt <[email protected]> wrote:

> But with the new printk() changes, text without a newline gets buffered
> and does not print out to the console at the location of the printk.

uh, how about we fix that? The old behaviour was good, the new
behaviour is noxious.

Please idenfity these "new printk() changes". Was the new noxiousness
an unavoidable effect of them?

2012-06-22 23:41:08

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Fri, 2012-06-22 at 14:54 -0700, Andrew Morton wrote:
> On Thu, 21 Jun 2012 19:52:03 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > But with the new printk() changes, text without a newline gets buffered
> > and does not print out to the console at the location of the printk.
>
> uh, how about we fix that? The old behaviour was good, the new
> behaviour is noxious.
>
> Please idenfity these "new printk() changes". Was the new noxiousness
> an unavoidable effect of them?

See commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length
record buffer") and related commits.

But that said, there may be a way that I can make it still always flush
and not add a new API. We can flush on partial writes, and keep track of
the current task (as it already does). If a new task comes in, we can
then force a newline before printing the content of the old task (if
there wasn't a newline printed before).

This is basically what it does now, except that it buffers the data. If
a new task were to do a print in between the two partial writes, it
flushes what was buffered and adds a newline before printing the new
text.

I think I may be able to implement the same behavior, except that it
wont buffer. It would just keep track of the state of the last write.

-- Steve


2012-06-22 23:49:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Fri, 22 Jun 2012 19:41:04 -0400
Steven Rostedt <[email protected]> wrote:

> On Fri, 2012-06-22 at 14:54 -0700, Andrew Morton wrote:
> > On Thu, 21 Jun 2012 19:52:03 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > But with the new printk() changes, text without a newline gets buffered
> > > and does not print out to the console at the location of the printk.
> >
> > uh, how about we fix that? The old behaviour was good, the new
> > behaviour is noxious.
> >
> > Please idenfity these "new printk() changes". Was the new noxiousness
> > an unavoidable effect of them?
>
> See commit 7ff9554bb578 ("printk: convert byte-buffer to variable-length
> record buffer") and related commits.
>
> But that said, there may be a way that I can make it still always flush
> and not add a new API. We can flush on partial writes, and keep track of
> the current task (as it already does). If a new task comes in, we can
> then force a newline before printing the content of the old task (if
> there wasn't a newline printed before).
>
> This is basically what it does now, except that it buffers the data. If
> a new task were to do a print in between the two partial writes, it
> flushes what was buffered and adds a newline before printing the new
> text.

If a driver does

printk("testing the frobnozzle ...");
do_test();
printk(" OK\n");

and do_test() hangs up, we really really want the user to know that
there was a frobnozzle testing problem. Please tell me this isn't
broken.

2012-06-22 23:56:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Fri, 2012-06-22 at 16:49 -0700, Andrew Morton wrote:

> If a driver does
>
> printk("testing the frobnozzle ...");
> do_test();
> printk(" OK\n");
>
> and do_test() hangs up, we really really want the user to know that
> there was a frobnozzle testing problem. Please tell me this isn't
> broken.

OK, then I wont tell you :-) But this is exactly what happened to me.
There was a config that caused the function tracer self test to triple
fault the machine. The test does exactly what you just described:

printk("Testing trace %s: ", tracer->name);
ret = run_tests(tracer);
if (ret == 0)
printk("PASSED!\n");
else
printk("FAILED!\n");

But because of this new buffering, that didn't print. And unfortunately,
the previous printk was the start of the rcu torture test. Thus when
Fengguang saw that the last thing printed before his machine crashed was
the rcu torture tests starting, he obviously (but incorrectly) blamed it
on the rcu torture test.

Yes, I think this is broken by design. But I have an idea for a fix.
I'll work on it on Monday, and it wont require adding a pr_flush() like
I did in this patch set.

-- Steve


2012-06-23 06:13:20

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console


* Andrew Morton <[email protected]> wrote:

> On Thu, 21 Jun 2012 19:52:03 -0400
> Steven Rostedt <[email protected]> wrote:
>
> > But with the new printk() changes, text without a newline
> > gets buffered and does not print out to the console at the
> > location of the printk.
>
> uh, how about we fix that? The old behaviour was good, the
> new behaviour is noxious.

Absolutely.

pr_flush() really seems to be a workaround.

> Please idenfity these "new printk() changes". Was the new
> noxiousness an unavoidable effect of them?

Fundamentally:

e2ae715d66bf kmsg - kmsg_dump() use iterator to receive log buffer content
c313af145b9b printk() - isolate KERN_CONT users from ordinary complete lines
3ce9a7c0ac28 printk() - restore prefix/timestamp printing for multi-newline strings
649e6ee33f73 printk() - restore timestamp printing at console output
5c5d5ca51abd printk() - do not merge continuation lines of different threads
7f3a781d6fd8 printk - fix compilation for CONFIG_PRINTK=n
5fc3249068c1 kmsg: use do_div() to divide 64bit integer
e11fea92e13f kmsg: export printk records to the /dev/kmsg interface
7ff9554bb578 printk: convert byte-buffer to variable-length record buffer

Should we revert them or can they be fixed sanely? Kay seems to
be busy with other things so I guess a revert is the best we can
do. Greg, Kay?

Thanks,

Ingo

2012-06-23 07:44:27

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Sat, 2012-06-23 at 08:13 +0200, Ingo Molnar wrote:
> * Andrew Morton <[email protected]> wrote:
>
> > On Thu, 21 Jun 2012 19:52:03 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > But with the new printk() changes, text without a newline
> > > gets buffered and does not print out to the console at the
> > > location of the printk.
> >
> > uh, how about we fix that? The old behaviour was good, the
> > new behaviour is noxious.
>
> Absolutely.
>
> pr_flush() really seems to be a workaround.

I think the new behavior an overall improvement.
It helps avoid multiple thread interleaving.

I suggested a fix to enable a printk buffered/non-buffered
mechanism with an optional printk_flush/pr_flush.

https://lkml.org/lkml/2012/6/21/305

2012-06-23 11:47:45

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Sat, 2012-06-23 at 08:13 +0200, Ingo Molnar wrote:
> * Andrew Morton <[email protected]> wrote:
>
> > On Thu, 21 Jun 2012 19:52:03 -0400
> > Steven Rostedt <[email protected]> wrote:
> >
> > > But with the new printk() changes, text without a newline
> > > gets buffered and does not print out to the console at the
> > > location of the printk.
> >
> > uh, how about we fix that? The old behaviour was good, the
> > new behaviour is noxious.
>
> Absolutely.
>
> pr_flush() really seems to be a workaround.
>
> > Please idenfity these "new printk() changes". Was the new
> > noxiousness an unavoidable effect of them?
>
> Fundamentally:
>
> e2ae715d66bf kmsg - kmsg_dump() use iterator to receive log buffer content
> c313af145b9b printk() - isolate KERN_CONT users from ordinary complete lines
> 3ce9a7c0ac28 printk() - restore prefix/timestamp printing for multi-newline strings
> 649e6ee33f73 printk() - restore timestamp printing at console output
> 5c5d5ca51abd printk() - do not merge continuation lines of different threads
> 7f3a781d6fd8 printk - fix compilation for CONFIG_PRINTK=n
> 5fc3249068c1 kmsg: use do_div() to divide 64bit integer
> e11fea92e13f kmsg: export printk records to the /dev/kmsg interface
> 7ff9554bb578 printk: convert byte-buffer to variable-length record buffer
>
> Should we revert them or can they be fixed sanely? Kay seems to
> be busy with other things so I guess a revert is the best we can
> do. Greg, Kay?

I just don't have a better idea than Joe or Steven.

The conversion from the unreliable byte-buffer to records changes the
behavior a little in some areas. We are atomic at the line level now and
not at an sometimes unpredictable byte stream position which can just be
moved forward by adding another few bytes.

The usual continuation line user should all be buffered to make things
generally more reliable and less racy. And we really had these problems
in the past, and substantially improved the common use-cases now.

But yes, it seems to come into our way with the current output format of
the kernel self-tests, and we need to do something about it. But I don't
think it should be going back to the much-too-simple and unreliable
byte-stream buffer.

We need make some sort of a compromise for the self-tests here, I guess.
I think the trade of using one of these options for the self-tests to
cope with the new model is justified:
- flush explicitly when it is needed, by a global flag or with a
message prefix
- printing a second full line that says: test foo OK, instead
of appending the OK to the former line
- printing one full line, and suppress the later OK entirely. The OK
might not be needed at all; we will find out that the machine
has not crashed at that point and a message in case of an error
could be sufficient in most cases.

Reverting the changes now would remove:
- proper device identifiers attached to a message, which allows
tools for the first time to know the contexts of the kernel messages
- 100% message integrity of structured messages, and single line
prints, and if only one continuation line user prints at a time.
We never mix continuation users with full line users, like the
old buffer did.
- very reliable continuation line prints, because of the buffering.
The only race that can still happen is several cont users racing
against each other, which is not too likely
- 100% granularity at line level during userspace buffer access,
crash dumping; we never copy around half or overwritten lines,
like the old buffer
- the ring buffer prunes only entire messages not just the half
of a line like the old byte buffer
- sequence numbers allow us to track the state of the read and write
position in the buffer, so we never copy in-the-meantime already
overwritten messages around
- sequence numbers which allow userspace to reconnect to the old reading
position and process only new messages, and find if messages got
overwritten
- support for multiple concurrent readers with live update of the kernel
log stream

This seems like a trade of a rather cosmetic output-format issue for
kernel self-tests, against general reliability, integrity and usefulness
of the rest of the kernel message users.

The record buffer is a huge step forward for the system logging, and it
allows us for the first time to make the kernel log useful to consumers
other than humans, and is key part of providing important information to
the system management.

An explicit flush for the self-test like users is unfortunate, and
inconvenient, and surely not nice, but I think it's still the better
overall deal.

Thanks,
Kay

2012-06-23 12:04:40

by Fengguang Wu

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

> - printing one full line, and suppress the later OK entirely. The OK
> might not be needed at all; we will find out that the machine
> has not crashed at that point and a message in case of an error
> could be sufficient in most cases.

That looks neat and feasible. We not only get less output, but also
eliminate the problem of flushing:

[ 16.309109] Testing tracer irqsoff: PASSED
[ 17.184240] Testing tracer wakeup: PASSED
[ 18.060218] Testing tracer wakeup_rt: PASSED
[ 18.729222] Testing tracer function_graph: PASSED
...thousands more lines
=>
[ 16.309109] Testing tracer irqsoff
[ 17.184240] Testing tracer wakeup
[ 18.060218] Testing tracer wakeup_rt
[ 18.729222] Testing tracer function_graph
...thousands more lines

However these two special lines will have to be broken up into more lines:

[ 14.947553] Testing dynamic ftrace ops #1: (1 0 1 1 0) (1 1 2 1 0) (2 1 3 1 26) (2 2 4 1 43) PASSED
[ 15.740993] Testing dynamic ftrace ops #2: (1 0 1 26 0) (1 1 2 43 0) (2 1 3 1 15) (2 2 4 18 32) PASSED

Thanks,
Fengguang

2012-06-23 15:28:13

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Sat, 2012-06-23 at 13:47 +0200, Kay Sievers wrote:
[]
> We need make some sort of a compromise for the self-tests here, I guess.
> I think the trade of using one of these options for the self-tests to
> cope with the new model is justified:
> - flush explicitly when it is needed, by a global flag or with a
> message prefix

> - printing a second full line that says: test foo OK, instead
> of appending the OK to the former line

This is probably the easiest actual fix.

> - printing one full line, and suppress the later OK entirely. The OK
> might not be needed at all; we will find out that the machine
> has not crashed at that point and a message in case of an error
> could be sufficient in most cases.

I think the added timing information useful.

> Reverting the changes now would remove:
> - proper device identifiers attached to a message, which allows
> tools for the first time to know the contexts of the kernel messages

I think this is not particularly true nor really useful at
present and the [v]printk_emit parts should be removed
until a the benefit of such a capability more proven.

I'm still concerned someone is going to say that the
because you say these things, someone else is going to
declare that the content of the [v]printk_emit bits is
effectively an ABI (inviolate and immutable) like the
seq_ functions. I think that'd make changing any of the
logging much more difficult and should be avoided.

> - 100% message integrity of structured messages, and single line
> prints, and if only one continuation line user prints at a time.
> We never mix continuation users with full line users, like the
> old buffer did.
> - very reliable continuation line prints, because of the buffering.
> The only race that can still happen is several cont users racing
> against each other, which is not too likely

It's hard to say that it's not too likely. I think it
depends on cpu count. More cpus, more likely.

Still, it's decidedly better than it was.

> - 100% granularity at line level during userspace buffer access,
> crash dumping; we never copy around half or overwritten lines,
> like the old buffer
> - the ring buffer prunes only entire messages not just the half
> of a line like the old byte buffer
> - sequence numbers allow us to track the state of the read and write
> position in the buffer, so we never copy in-the-meantime already
> overwritten messages around
> - sequence numbers which allow userspace to reconnect to the old reading
> position and process only new messages, and find if messages got
> overwritten
> - support for multiple concurrent readers with live update of the kernel
> log stream

Good features list and it was well implemented too.

I think the uses of the direct emit model should simply
be updated to full lines as you suggested. There are not
too many files that need touching.

2012-06-23 16:57:13

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Sat, Jun 23, 2012 at 5:28 PM, Joe Perches <[email protected]> wrote:

>> - proper device identifiers attached to a message, which allows
>>   tools for the first time to know the contexts of the kernel messages
>
> I think this is not particularly true nor really useful at
> present and the [v]printk_emit parts should be removed
> until a the benefit of such a capability more proven.

We track all 'interesting' devices in userspace. So you can ask the
system management about a mountpoint or a network interface and it
will show all messages that got emitted during that bootup from the
underlying devices in that context. With it's udev integration we can
also include all the userspace data about the devices and walk up the
parent devices and include the parent device information into the
output.

For example, you ask for the state of a mountpoint and the system will
return you all messages that got emitted from the underlying block
devices/disks, then you can broaden the match and include all messages
from the storage host controller where the disks are connected, and so
on.

Similarly you can efficiently retrieve the logged messages for a
network interface, including the parent device, which is usually a PCI
card.

That's just how stuff should work today, and the consumer side of it
is almost ready to ship, it's just the reliable device identifier in
the kernel log that is missing today, and which we wait for, to be
generally available.

The idea is similar how you can ask today for "give me all log
messages of a certain service (note, not a PID):
https://plus.google.com/108087225644395745666/posts/iuFEgBZT3co

The whole point of attaching the context, where a message was
generated from, in a sane format, is to replace the former unparsable
and unrecognizable prefixes like:
cdc_acm 2-1.4:1.1: ...
iwlwifi 0000:03:00.0: ...
usb usb2: ...

which now are:
SUBSYSTEM=usb
DEVICE=+usb:2-1.4:1.1

SUBSYSTEM=pci
DEVICE=+pci:0000:03:00.0

SUBSYSTEM=usb
DEVICE=c189:1

>From that angle, this just makes things possible which have always
been intended to be possible, but which have just been insufficiently
implemented for dev_printk().

> I'm still concerned someone is going to say that the
> because you say these things, someone else is going to
> declare that the content of the [v]printk_emit bits is
> effectively an ABI (inviolate and immutable) like the
> seq_ functions.  I think that'd make changing any of the
> logging much more difficult and should be avoided.

Nothing should really change to what we do already today, dev_printk()
was supposed to be recognizable, it just wasn't good enough, it wasn't
intentionally insufficiently implemented.

We do not give the messages texts any more meaning than they already
carry, we can just properly *filter* for them. The importance or not
of the actual text stays pretty much the same. People today already
run magic regexes on the kernel log to guess what's happened in there,
and the attached context to allow reliable filtering and
identification does not change it.

What we did in the old logging was to throw away all of the context
the kernel has at logging time, log the pure text with a pretty much
unrecognizable prefix, and later start guessing games to restore the
context again. That's really a pretty amateurish way of doing
software. Tools in "the stuff mostly works, or otherwise humans can
probably make sense of it" category might be fine or even unavoidable
in some areas of software development, but it is surely not how the
basic building blocks of an operating system should work today. :)

Kay

2012-06-25 08:45:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console


* Joe Perches <[email protected]> wrote:

> On Sat, 2012-06-23 at 08:13 +0200, Ingo Molnar wrote:
> > * Andrew Morton <[email protected]> wrote:
> >
> > > On Thu, 21 Jun 2012 19:52:03 -0400
> > > Steven Rostedt <[email protected]> wrote:
> > >
> > > > But with the new printk() changes, text without a newline
> > > > gets buffered and does not print out to the console at the
> > > > location of the printk.
> > >
> > > uh, how about we fix that? The old behaviour was good, the
> > > new behaviour is noxious.
> >
> > Absolutely.
> >
> > pr_flush() really seems to be a workaround.
>
> I think the new behavior an overall improvement.
> It helps avoid multiple thread interleaving.
>
> I suggested a fix to enable a printk buffered/non-buffered
> mechanism with an optional printk_flush/pr_flush.
>
> https://lkml.org/lkml/2012/6/21/305

Ugh, so you suggest a global toggle to turn buffering on/off?

That approach really misses thole whole *point* of printk() and
debug logging in particular. In most cases when users are
triggering and reporting problems they don't go around toggling
various rarely used switches to see whether they make a
difference. They and kernel developers just take printk output
at face value and trust it.

Even if people are aware of this global toggle there are
frequent cases where a hang is probabilistic and occurs only
very rarely. If a once a year bug triggers then it's obviously
useful to have good debug output on the screen or in the serial
log and not wait another year with the debug option turned on
...

If I cannot trust printk output anymore and would have to toggle
global switches to make it work then printk() has lost value for
debugging kernel problems and that would be sad.

So NAK on your patch, really. We should either fix this
regression in behavior, or, as a second best alternative we
should at least give a deterministic way to flush the printk
output, which works all the time.

Thanks,

Ingo

2012-06-25 09:09:43

by Ingo Molnar

[permalink] [raw]
Subject: [PATCH] printk: Revert the buffered-printk() changes for now


* Kay Sievers <[email protected]> wrote:

> On Sat, 2012-06-23 at 08:13 +0200, Ingo Molnar wrote:
> > * Andrew Morton <[email protected]> wrote:
> >
> > > On Thu, 21 Jun 2012 19:52:03 -0400
> > > Steven Rostedt <[email protected]> wrote:
> > >
> > > > But with the new printk() changes, text without a newline
> > > > gets buffered and does not print out to the console at the
> > > > location of the printk.
> > >
> > > uh, how about we fix that? The old behaviour was good, the
> > > new behaviour is noxious.
> >
> > Absolutely.
> >
> > pr_flush() really seems to be a workaround.
> >
> > > Please idenfity these "new printk() changes". Was the new
> > > noxiousness an unavoidable effect of them?
> >
> > Fundamentally:
> >
> > e2ae715d66bf kmsg - kmsg_dump() use iterator to receive log buffer content
> > c313af145b9b printk() - isolate KERN_CONT users from ordinary complete lines
> > 3ce9a7c0ac28 printk() - restore prefix/timestamp printing for multi-newline strings
> > 649e6ee33f73 printk() - restore timestamp printing at console output
> > 5c5d5ca51abd printk() - do not merge continuation lines of different threads
> > 7f3a781d6fd8 printk - fix compilation for CONFIG_PRINTK=n
> > 5fc3249068c1 kmsg: use do_div() to divide 64bit integer
> > e11fea92e13f kmsg: export printk records to the /dev/kmsg interface
> > 7ff9554bb578 printk: convert byte-buffer to variable-length record buffer
> >
> > Should we revert them or can they be fixed sanely? Kay seems to
> > be busy with other things so I guess a revert is the best we can
> > do. Greg, Kay?
>
> I just don't have a better idea than Joe or Steven.

Then pick the fix you see as the best solution, post it and push
the fix to Linus, don't just sit there passive-aggressively
leaving a regression you introduced unresolved ...

I think Steve's fix would be OK as a workaround, if no-one can
think of anything better.

The thing is, this bug, despite being reported early on, has
been left unresolved for weeks and it's -rc4 time now. Time is
running out and frankly, I've been watching this thread, and the
only reason you appear to be taking this bug somewhat seriously
now is because Andrew and me complained. That is sad.

Kernel policy is that kernel bugs introduced during the merge
window are fixed by those who introduced them, or the changes
get reverted. The kernel project uses a very user-friendly
process to resolve regressions and in the worst case you as a
developer have to suffer your changes reverted. Obviously timely
fixes are preferred over invasive reverts.

It is not *Steve's* job to fix this bug. That he actually posted
a fix is nice from him, it makes your life easier (and you can
still write some other fix) but *you* should be driving the
resolution here, not Steve, me or Andrew.

Below is a (very crude and totally untested) roll-up revert of
all the buffering changes to printk - it's obviously not what I
want to see happen, but it's obviously *possible*.

Thanks,

Ingo

----------------->
Subject: printk: Revert the buffered-printk() changes for now
From: Ingo Molnar <[email protected]>
Date: Mon Jun 25 10:50:44 CEST 2012

They are not yet cooked and fixes have not arrived in time.
We can re-try them in v3.6.

Not-Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/base/core.c | 51 -
drivers/char/mem.c | 42 +
include/linux/printk.h | 12
kernel/printk.c | 1599 +++++++++++++------------------------------------
4 files changed, 484 insertions(+), 1220 deletions(-)

Index: linux/drivers/base/core.c
===================================================================
--- linux.orig/drivers/base/core.c
+++ linux/drivers/base/core.c
@@ -1844,60 +1844,15 @@ void device_shutdown(void)
*/

#ifdef CONFIG_PRINTK
+
int __dev_printk(const char *level, const struct device *dev,
struct va_format *vaf)
{
- char dict[128];
- size_t dictlen = 0;
- const char *subsys;
-
if (!dev)
return printk("%s(NULL device *): %pV", level, vaf);

- if (dev->class)
- subsys = dev->class->name;
- else if (dev->bus)
- subsys = dev->bus->name;
- else
- goto skip;
-
- dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
- "SUBSYSTEM=%s", subsys);
-
- /*
- * Add device identifier DEVICE=:
- * b12:8 block dev_t
- * c127:3 char dev_t
- * n8 netdev ifindex
- * +sound:card0 subsystem:devname
- */
- if (MAJOR(dev->devt)) {
- char c;
-
- if (strcmp(subsys, "block") == 0)
- c = 'b';
- else
- c = 'c';
- dictlen++;
- dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
- "DEVICE=%c%u:%u",
- c, MAJOR(dev->devt), MINOR(dev->devt));
- } else if (strcmp(subsys, "net") == 0) {
- struct net_device *net = to_net_dev(dev);
-
- dictlen++;
- dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
- "DEVICE=n%u", net->ifindex);
- } else {
- dictlen++;
- dictlen += snprintf(dict + dictlen, sizeof(dict) - dictlen,
- "DEVICE=+%s:%s", subsys, dev_name(dev));
- }
-skip:
- return printk_emit(0, level[1] - '0',
- dictlen ? dict : NULL, dictlen,
- "%s %s: %pV",
- dev_driver_string(dev), dev_name(dev), vaf);
+ return printk("%s%s %s: %pV",
+ level, dev_driver_string(dev), dev_name(dev), vaf);
}
EXPORT_SYMBOL(__dev_printk);

Index: linux/drivers/char/mem.c
===================================================================
--- linux.orig/drivers/char/mem.c
+++ linux/drivers/char/mem.c
@@ -807,6 +807,44 @@ static const struct file_operations oldm
};
#endif

+static ssize_t kmsg_writev(struct kiocb *iocb, const struct iovec *iv,
+ unsigned long count, loff_t pos)
+{
+ char *line, *p;
+ int i;
+ ssize_t ret = -EFAULT;
+ size_t len = iov_length(iv, count);
+
+ line = kmalloc(len + 1, GFP_KERNEL);
+ if (line == NULL)
+ return -ENOMEM;
+
+ /*
+ * copy all vectors into a single string, to ensure we do
+ * not interleave our log line with other printk calls
+ */
+ p = line;
+ for (i = 0; i < count; i++) {
+ if (copy_from_user(p, iv[i].iov_base, iv[i].iov_len))
+ goto out;
+ p += iv[i].iov_len;
+ }
+ p[0] = '\0';
+
+ ret = printk("%s", line);
+ /* printk can add a prefix */
+ if (ret > len)
+ ret = len;
+out:
+ kfree(line);
+ return ret;
+}
+
+static const struct file_operations kmsg_fops = {
+ .aio_write = kmsg_writev,
+ .llseek = noop_llseek,
+};
+
static const struct memdev {
const char *name;
umode_t mode;
@@ -825,9 +863,7 @@ static const struct memdev {
[7] = { "full", 0666, &full_fops, NULL },
[8] = { "random", 0666, &random_fops, NULL },
[9] = { "urandom", 0666, &urandom_fops, NULL },
-#ifdef CONFIG_PRINTK
- [11] = { "kmsg", 0644, &kmsg_fops, NULL },
-#endif
+ [11] = { "kmsg", 0, &kmsg_fops, NULL },
#ifdef CONFIG_CRASH_DUMP
[12] = { "oldmem", 0, &oldmem_fops, NULL },
#endif
Index: linux/include/linux/printk.h
===================================================================
--- linux.orig/include/linux/printk.h
+++ linux/include/linux/printk.h
@@ -95,19 +95,9 @@ extern int printk_needs_cpu(int cpu);
extern void printk_tick(void);

#ifdef CONFIG_PRINTK
-asmlinkage __printf(5, 0)
-int vprintk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, va_list args);
-
asmlinkage __printf(1, 0)
int vprintk(const char *fmt, va_list args);

-asmlinkage __printf(5, 6) __cold
-asmlinkage int printk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, ...);
-
asmlinkage __printf(1, 2) __cold
int printk(const char *fmt, ...);

@@ -300,8 +290,6 @@ extern void dump_stack(void) __cold;
no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#endif

-extern const struct file_operations kmsg_fops;
-
enum {
DUMP_PREFIX_NONE,
DUMP_PREFIX_ADDRESS,
Index: linux/kernel/printk.c
===================================================================
--- linux.orig/kernel/printk.c
+++ linux/kernel/printk.c
@@ -41,7 +41,6 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/rculist.h>
-#include <linux/poll.h>

#include <asm/uaccess.h>

@@ -55,6 +54,8 @@ void asmlinkage __attribute__((weak)) ea
{
}

+#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
+
/* printk's without a loglevel use this.. */
#define DEFAULT_MESSAGE_LOGLEVEL CONFIG_DEFAULT_MESSAGE_LOGLEVEL

@@ -98,6 +99,24 @@ EXPORT_SYMBOL_GPL(console_drivers);
static int console_locked, console_suspended;

/*
+ * logbuf_lock protects log_buf, log_start, log_end, con_start and logged_chars
+ * It is also used in interesting ways to provide interlocking in
+ * console_unlock();.
+ */
+static DEFINE_RAW_SPINLOCK(logbuf_lock);
+
+#define LOG_BUF_MASK (log_buf_len-1)
+#define LOG_BUF(idx) (log_buf[(idx) & LOG_BUF_MASK])
+
+/*
+ * The indices into log_buf are not constrained to log_buf_len - they
+ * must be masked before subscripting
+ */
+static unsigned log_start; /* Index into log_buf: next char to be read by syslog() */
+static unsigned con_start; /* Index into log_buf: next char to be sent to consoles */
+static unsigned log_end; /* Index into log_buf: most-recently-written-char + 1 */
+
+/*
* If exclusive_console is non-NULL then only this console is to be printed to.
*/
static struct console *exclusive_console;
@@ -126,493 +145,13 @@ EXPORT_SYMBOL(console_set_on_cmdline);
/* Flag: console code may call schedule() */
static int console_may_schedule;

-/*
- * The printk log buffer consists of a chain of concatenated variable
- * length records. Every record starts with a record header, containing
- * the overall length of the record.
- *
- * The heads to the first and last entry in the buffer, as well as the
- * sequence numbers of these both entries are maintained when messages
- * are stored..
- *
- * If the heads indicate available messages, the length in the header
- * tells the start next message. A length == 0 for the next message
- * indicates a wrap-around to the beginning of the buffer.
- *
- * Every record carries the monotonic timestamp in microseconds, as well as
- * the standard userspace syslog level and syslog facility. The usual
- * kernel messages use LOG_KERN; userspace-injected messages always carry
- * a matching syslog facility, by default LOG_USER. The origin of every
- * message can be reliably determined that way.
- *
- * The human readable log message directly follows the message header. The
- * length of the message text is stored in the header, the stored message
- * is not terminated.
- *
- * Optionally, a message can carry a dictionary of properties (key/value pairs),
- * to provide userspace with a machine-readable message context.
- *
- * Examples for well-defined, commonly used property names are:
- * DEVICE=b12:8 device identifier
- * b12:8 block dev_t
- * c127:3 char dev_t
- * n8 netdev ifindex
- * +sound:card0 subsystem:devname
- * SUBSYSTEM=pci driver-core subsystem name
- *
- * Valid characters in property names are [a-zA-Z0-9.-_]. The plain text value
- * follows directly after a '=' character. Every property is terminated by
- * a '\0' character. The last property is not terminated.
- *
- * Example of a message structure:
- * 0000 ff 8f 00 00 00 00 00 00 monotonic time in nsec
- * 0008 34 00 record is 52 bytes long
- * 000a 0b 00 text is 11 bytes long
- * 000c 1f 00 dictionary is 23 bytes long
- * 000e 03 00 LOG_KERN (facility) LOG_ERR (level)
- * 0010 69 74 27 73 20 61 20 6c "it's a l"
- * 69 6e 65 "ine"
- * 001b 44 45 56 49 43 "DEVIC"
- * 45 3d 62 38 3a 32 00 44 "E=b8:2\0D"
- * 52 49 56 45 52 3d 62 75 "RIVER=bu"
- * 67 "g"
- * 0032 00 00 00 padding to next message header
- *
- * The 'struct log' buffer header must never be directly exported to
- * userspace, it is a kernel-private implementation detail that might
- * need to be changed in the future, when the requirements change.
- *
- * /dev/kmsg exports the structured data in the following line format:
- * "level,sequnum,timestamp;<message text>\n"
- *
- * The optional key/value pairs are attached as continuation lines starting
- * with a space character and terminated by a newline. All possible
- * non-prinatable characters are escaped in the "\xff" notation.
- *
- * Users of the export format should ignore possible additional values
- * separated by ',', and find the message after the ';' character.
- */
-
-struct log {
- u64 ts_nsec; /* timestamp in nanoseconds */
- u16 len; /* length of entire record */
- u16 text_len; /* length of text buffer */
- u16 dict_len; /* length of dictionary buffer */
- u16 level; /* syslog level + facility */
-};
-
-/*
- * The logbuf_lock protects kmsg buffer, indices, counters. It is also
- * used in interesting ways to provide interlocking in console_unlock();
- */
-static DEFINE_RAW_SPINLOCK(logbuf_lock);
-
-/* the next printk record to read by syslog(READ) or /proc/kmsg */
-static u64 syslog_seq;
-static u32 syslog_idx;
-
-/* index and sequence number of the first record stored in the buffer */
-static u64 log_first_seq;
-static u32 log_first_idx;
-
-/* index and sequence number of the next record to store in the buffer */
-static u64 log_next_seq;
#ifdef CONFIG_PRINTK
-static u32 log_next_idx;
-
-/* the next printk record to read after the last 'clear' command */
-static u64 clear_seq;
-static u32 clear_idx;

-#define LOG_LINE_MAX 1024
-
-/* record buffer */
-#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
-#define LOG_ALIGN 4
-#else
-#define LOG_ALIGN __alignof__(struct log)
-#endif
-#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
-static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);
+static char __log_buf[__LOG_BUF_LEN];
static char *log_buf = __log_buf;
-static u32 log_buf_len = __LOG_BUF_LEN;
-
-/* cpu currently holding logbuf_lock */
-static volatile unsigned int logbuf_cpu = UINT_MAX;
-
-/* human readable text of the record */
-static char *log_text(const struct log *msg)
-{
- return (char *)msg + sizeof(struct log);
-}
-
-/* optional key/value pair dictionary attached to the record */
-static char *log_dict(const struct log *msg)
-{
- return (char *)msg + sizeof(struct log) + msg->text_len;
-}
-
-/* get record by index; idx must point to valid msg */
-static struct log *log_from_idx(u32 idx)
-{
- struct log *msg = (struct log *)(log_buf + idx);
-
- /*
- * A length == 0 record is the end of buffer marker. Wrap around and
- * read the message at the start of the buffer.
- */
- if (!msg->len)
- return (struct log *)log_buf;
- return msg;
-}
-
-/* get next record; idx must point to valid msg */
-static u32 log_next(u32 idx)
-{
- struct log *msg = (struct log *)(log_buf + idx);
-
- /* length == 0 indicates the end of the buffer; wrap */
- /*
- * A length == 0 record is the end of buffer marker. Wrap around and
- * read the message at the start of the buffer as *this* one, and
- * return the one after that.
- */
- if (!msg->len) {
- msg = (struct log *)log_buf;
- return msg->len;
- }
- return idx + msg->len;
-}
-
-/* insert record into the buffer, discard old ones, update heads */
-static void log_store(int facility, int level,
- const char *dict, u16 dict_len,
- const char *text, u16 text_len)
-{
- struct log *msg;
- u32 size, pad_len;
-
- /* number of '\0' padding bytes to next message */
- size = sizeof(struct log) + text_len + dict_len;
- pad_len = (-size) & (LOG_ALIGN - 1);
- size += pad_len;
-
- while (log_first_seq < log_next_seq) {
- u32 free;
-
- if (log_next_idx > log_first_idx)
- free = max(log_buf_len - log_next_idx, log_first_idx);
- else
- free = log_first_idx - log_next_idx;
-
- if (free > size + sizeof(struct log))
- break;
-
- /* drop old messages until we have enough contiuous space */
- log_first_idx = log_next(log_first_idx);
- log_first_seq++;
- }
-
- if (log_next_idx + size + sizeof(struct log) >= log_buf_len) {
- /*
- * This message + an additional empty header does not fit
- * at the end of the buffer. Add an empty header with len == 0
- * to signify a wrap around.
- */
- memset(log_buf + log_next_idx, 0, sizeof(struct log));
- log_next_idx = 0;
- }
-
- /* fill message */
- msg = (struct log *)(log_buf + log_next_idx);
- memcpy(log_text(msg), text, text_len);
- msg->text_len = text_len;
- memcpy(log_dict(msg), dict, dict_len);
- msg->dict_len = dict_len;
- msg->level = (facility << 3) | (level & 7);
- msg->ts_nsec = local_clock();
- memset(log_dict(msg) + dict_len, 0, pad_len);
- msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
-
- /* insert message */
- log_next_idx += msg->len;
- log_next_seq++;
-}
-
-/* /dev/kmsg - userspace message inject/listen interface */
-struct devkmsg_user {
- u64 seq;
- u32 idx;
- struct mutex lock;
- char buf[8192];
-};
-
-static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,
- unsigned long count, loff_t pos)
-{
- char *buf, *line;
- int i;
- int level = default_message_loglevel;
- int facility = 1; /* LOG_USER */
- size_t len = iov_length(iv, count);
- ssize_t ret = len;
-
- if (len > LOG_LINE_MAX)
- return -EINVAL;
- buf = kmalloc(len+1, GFP_KERNEL);
- if (buf == NULL)
- return -ENOMEM;
-
- line = buf;
- for (i = 0; i < count; i++) {
- if (copy_from_user(line, iv[i].iov_base, iv[i].iov_len))
- goto out;
- line += iv[i].iov_len;
- }
-
- /*
- * Extract and skip the syslog prefix <[0-9]*>. Coming from userspace
- * the decimal value represents 32bit, the lower 3 bit are the log
- * level, the rest are the log facility.
- *
- * If no prefix or no userspace facility is specified, we
- * enforce LOG_USER, to be able to reliably distinguish
- * kernel-generated messages from userspace-injected ones.
- */
- line = buf;
- if (line[0] == '<') {
- char *endp = NULL;
-
- i = simple_strtoul(line+1, &endp, 10);
- if (endp && endp[0] == '>') {
- level = i & 7;
- if (i >> 3)
- facility = i >> 3;
- endp++;
- len -= endp - line;
- line = endp;
- }
- }
- line[len] = '\0';
-
- printk_emit(facility, level, NULL, 0, "%s", line);
-out:
- kfree(buf);
- return ret;
-}
-
-static ssize_t devkmsg_read(struct file *file, char __user *buf,
- size_t count, loff_t *ppos)
-{
- struct devkmsg_user *user = file->private_data;
- struct log *msg;
- u64 ts_usec;
- size_t i;
- size_t len;
- ssize_t ret;
-
- if (!user)
- return -EBADF;
-
- ret = mutex_lock_interruptible(&user->lock);
- if (ret)
- return ret;
- raw_spin_lock(&logbuf_lock);
- while (user->seq == log_next_seq) {
- if (file->f_flags & O_NONBLOCK) {
- ret = -EAGAIN;
- raw_spin_unlock(&logbuf_lock);
- goto out;
- }
-
- raw_spin_unlock(&logbuf_lock);
- ret = wait_event_interruptible(log_wait,
- user->seq != log_next_seq);
- if (ret)
- goto out;
- raw_spin_lock(&logbuf_lock);
- }
-
- if (user->seq < log_first_seq) {
- /* our last seen message is gone, return error and reset */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- ret = -EPIPE;
- raw_spin_unlock(&logbuf_lock);
- goto out;
- }
-
- msg = log_from_idx(user->idx);
- ts_usec = msg->ts_nsec;
- do_div(ts_usec, 1000);
- len = sprintf(user->buf, "%u,%llu,%llu;",
- msg->level, user->seq, ts_usec);
-
- /* escape non-printable characters */
- for (i = 0; i < msg->text_len; i++) {
- unsigned char c = log_text(msg)[i];
-
- if (c < ' ' || c >= 128)
- len += sprintf(user->buf + len, "\\x%02x", c);
- else
- user->buf[len++] = c;
- }
- user->buf[len++] = '\n';
-
- if (msg->dict_len) {
- bool line = true;
-
- for (i = 0; i < msg->dict_len; i++) {
- unsigned char c = log_dict(msg)[i];
-
- if (line) {
- user->buf[len++] = ' ';
- line = false;
- }
-
- if (c == '\0') {
- user->buf[len++] = '\n';
- line = true;
- continue;
- }
-
- if (c < ' ' || c >= 128) {
- len += sprintf(user->buf + len, "\\x%02x", c);
- continue;
- }
-
- user->buf[len++] = c;
- }
- user->buf[len++] = '\n';
- }
-
- user->idx = log_next(user->idx);
- user->seq++;
- raw_spin_unlock(&logbuf_lock);
-
- if (len > count) {
- ret = -EINVAL;
- goto out;
- }
-
- if (copy_to_user(buf, user->buf, len)) {
- ret = -EFAULT;
- goto out;
- }
- ret = len;
-out:
- mutex_unlock(&user->lock);
- return ret;
-}
-
-static loff_t devkmsg_llseek(struct file *file, loff_t offset, int whence)
-{
- struct devkmsg_user *user = file->private_data;
- loff_t ret = 0;
-
- if (!user)
- return -EBADF;
- if (offset)
- return -ESPIPE;
-
- raw_spin_lock(&logbuf_lock);
- switch (whence) {
- case SEEK_SET:
- /* the first record */
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- break;
- case SEEK_DATA:
- /*
- * The first record after the last SYSLOG_ACTION_CLEAR,
- * like issued by 'dmesg -c'. Reading /dev/kmsg itself
- * changes no global state, and does not clear anything.
- */
- user->idx = clear_idx;
- user->seq = clear_seq;
- break;
- case SEEK_END:
- /* after the last record */
- user->idx = log_next_idx;
- user->seq = log_next_seq;
- break;
- default:
- ret = -EINVAL;
- }
- raw_spin_unlock(&logbuf_lock);
- return ret;
-}
-
-static unsigned int devkmsg_poll(struct file *file, poll_table *wait)
-{
- struct devkmsg_user *user = file->private_data;
- int ret = 0;
-
- if (!user)
- return POLLERR|POLLNVAL;
-
- poll_wait(file, &log_wait, wait);
-
- raw_spin_lock(&logbuf_lock);
- if (user->seq < log_next_seq) {
- /* return error when data has vanished underneath us */
- if (user->seq < log_first_seq)
- ret = POLLIN|POLLRDNORM|POLLERR|POLLPRI;
- ret = POLLIN|POLLRDNORM;
- }
- raw_spin_unlock(&logbuf_lock);
-
- return ret;
-}
-
-static int devkmsg_open(struct inode *inode, struct file *file)
-{
- struct devkmsg_user *user;
- int err;
-
- /* write-only does not need any file context */
- if ((file->f_flags & O_ACCMODE) == O_WRONLY)
- return 0;
-
- err = security_syslog(SYSLOG_ACTION_READ_ALL);
- if (err)
- return err;
-
- user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
- if (!user)
- return -ENOMEM;
-
- mutex_init(&user->lock);
-
- raw_spin_lock(&logbuf_lock);
- user->idx = log_first_idx;
- user->seq = log_first_seq;
- raw_spin_unlock(&logbuf_lock);
-
- file->private_data = user;
- return 0;
-}
-
-static int devkmsg_release(struct inode *inode, struct file *file)
-{
- struct devkmsg_user *user = file->private_data;
-
- if (!user)
- return 0;
-
- mutex_destroy(&user->lock);
- kfree(user);
- return 0;
-}
-
-const struct file_operations kmsg_fops = {
- .open = devkmsg_open,
- .read = devkmsg_read,
- .aio_write = devkmsg_writev,
- .llseek = devkmsg_llseek,
- .poll = devkmsg_poll,
- .release = devkmsg_release,
-};
+static int log_buf_len = __LOG_BUF_LEN;
+static unsigned logged_chars; /* Number of chars produced since last read+clear operation */
+static int saved_console_loglevel = -1;

#ifdef CONFIG_KEXEC
/*
@@ -626,9 +165,9 @@ const struct file_operations kmsg_fops =
void log_buf_kexec_setup(void)
{
VMCOREINFO_SYMBOL(log_buf);
+ VMCOREINFO_SYMBOL(log_end);
VMCOREINFO_SYMBOL(log_buf_len);
- VMCOREINFO_SYMBOL(log_first_idx);
- VMCOREINFO_SYMBOL(log_next_idx);
+ VMCOREINFO_SYMBOL(logged_chars);
}
#endif

@@ -652,6 +191,7 @@ early_param("log_buf_len", log_buf_len_s
void __init setup_log_buf(int early)
{
unsigned long flags;
+ unsigned start, dest_idx, offset;
char *new_log_buf;
int free;

@@ -679,8 +219,20 @@ void __init setup_log_buf(int early)
log_buf_len = new_log_buf_len;
log_buf = new_log_buf;
new_log_buf_len = 0;
- free = __LOG_BUF_LEN - log_next_idx;
- memcpy(log_buf, __log_buf, __LOG_BUF_LEN);
+ free = __LOG_BUF_LEN - log_end;
+
+ offset = start = min(con_start, log_start);
+ dest_idx = 0;
+ while (start != log_end) {
+ unsigned log_idx_mask = start & (__LOG_BUF_LEN - 1);
+
+ log_buf[dest_idx] = __log_buf[log_idx_mask];
+ start++;
+ dest_idx++;
+ }
+ log_start -= offset;
+ con_start -= offset;
+ log_end -= offset;
raw_spin_unlock_irqrestore(&logbuf_lock, flags);

pr_info("log_buf_len: %d\n", log_buf_len);
@@ -780,207 +332,11 @@ static int check_syslog_permissions(int
return 0;
}

-#if defined(CONFIG_PRINTK_TIME)
-static bool printk_time = 1;
-#else
-static bool printk_time;
-#endif
-module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
-
-static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
-{
- size_t len = 0;
-
- if (syslog) {
- if (buf) {
- len += sprintf(buf, "<%u>", msg->level);
- } else {
- len += 3;
- if (msg->level > 9)
- len++;
- if (msg->level > 99)
- len++;
- }
- }
-
- if (printk_time) {
- if (buf) {
- unsigned long long ts = msg->ts_nsec;
- unsigned long rem_nsec = do_div(ts, 1000000000);
-
- len += sprintf(buf + len, "[%5lu.%06lu] ",
- (unsigned long) ts, rem_nsec / 1000);
- } else {
- len += 15;
- }
- }
-
- return len;
-}
-
-static size_t msg_print_text(const struct log *msg, bool syslog,
- char *buf, size_t size)
-{
- const char *text = log_text(msg);
- size_t text_size = msg->text_len;
- size_t len = 0;
-
- do {
- const char *next = memchr(text, '\n', text_size);
- size_t text_len;
-
- if (next) {
- text_len = next - text;
- next++;
- text_size -= next - text;
- } else {
- text_len = text_size;
- }
-
- if (buf) {
- if (print_prefix(msg, syslog, NULL) +
- text_len + 1>= size - len)
- break;
-
- len += print_prefix(msg, syslog, buf + len);
- memcpy(buf + len, text, text_len);
- len += text_len;
- buf[len++] = '\n';
- } else {
- /* SYSLOG_ACTION_* buffer size only calculation */
- len += print_prefix(msg, syslog, NULL);
- len += text_len + 1;
- }
-
- text = next;
- } while (text);
-
- return len;
-}
-
-static int syslog_print(char __user *buf, int size)
-{
- char *text;
- struct log *msg;
- int len;
-
- text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
- if (!text)
- return -ENOMEM;
-
- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
- /* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- }
- msg = log_from_idx(syslog_idx);
- len = msg_print_text(msg, true, text, LOG_LINE_MAX);
- syslog_idx = log_next(syslog_idx);
- syslog_seq++;
- raw_spin_unlock_irq(&logbuf_lock);
-
- if (len > size)
- len = -EINVAL;
- else if (len > 0 && copy_to_user(buf, text, len))
- len = -EFAULT;
-
- kfree(text);
- return len;
-}
-
-static int syslog_print_all(char __user *buf, int size, bool clear)
-{
- char *text;
- int len = 0;
-
- text = kmalloc(LOG_LINE_MAX, GFP_KERNEL);
- if (!text)
- return -ENOMEM;
-
- raw_spin_lock_irq(&logbuf_lock);
- if (buf) {
- u64 next_seq;
- u64 seq;
- u32 idx;
-
- if (clear_seq < log_first_seq) {
- /* messages are gone, move to first available one */
- clear_seq = log_first_seq;
- clear_idx = log_first_idx;
- }
-
- /*
- * Find first record that fits, including all following records,
- * into the user-provided buffer for this dump.
- */
- seq = clear_seq;
- idx = clear_idx;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
-
- len += msg_print_text(msg, true, NULL, 0);
- idx = log_next(idx);
- seq++;
- }
-
- /* move first record forward until length fits into the buffer */
- seq = clear_seq;
- idx = clear_idx;
- while (len > size && seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
-
- len -= msg_print_text(msg, true, NULL, 0);
- idx = log_next(idx);
- seq++;
- }
-
- /* last message fitting into this dump */
- next_seq = log_next_seq;
-
- len = 0;
- while (len >= 0 && seq < next_seq) {
- struct log *msg = log_from_idx(idx);
- int textlen;
-
- textlen = msg_print_text(msg, true, text, LOG_LINE_MAX);
- if (textlen < 0) {
- len = textlen;
- break;
- }
- idx = log_next(idx);
- seq++;
-
- raw_spin_unlock_irq(&logbuf_lock);
- if (copy_to_user(buf + len, text, textlen))
- len = -EFAULT;
- else
- len += textlen;
- raw_spin_lock_irq(&logbuf_lock);
-
- if (seq < log_first_seq) {
- /* messages are gone, move to next one */
- seq = log_first_seq;
- idx = log_first_idx;
- }
- }
- }
-
- if (clear) {
- clear_seq = log_next_seq;
- clear_idx = log_next_idx;
- }
- raw_spin_unlock_irq(&logbuf_lock);
-
- kfree(text);
- return len;
-}
-
int do_syslog(int type, char __user *buf, int len, bool from_file)
{
- bool clear = false;
- static int saved_console_loglevel = -1;
- static DEFINE_MUTEX(syslog_mutex);
+ unsigned i, j, limit, count;
+ int do_clear = 0;
+ char c;
int error;

error = check_syslog_permissions(type, from_file);
@@ -1007,21 +363,29 @@ int do_syslog(int type, char __user *buf
error = -EFAULT;
goto out;
}
- error = mutex_lock_interruptible(&syslog_mutex);
- if (error)
- goto out;
error = wait_event_interruptible(log_wait,
- syslog_seq != log_next_seq);
- if (error) {
- mutex_unlock(&syslog_mutex);
+ (log_start - log_end));
+ if (error)
goto out;
+ i = 0;
+ raw_spin_lock_irq(&logbuf_lock);
+ while (!error && (log_start != log_end) && i < len) {
+ c = LOG_BUF(log_start);
+ log_start++;
+ raw_spin_unlock_irq(&logbuf_lock);
+ error = __put_user(c,buf);
+ buf++;
+ i++;
+ cond_resched();
+ raw_spin_lock_irq(&logbuf_lock);
}
- error = syslog_print(buf, len);
- mutex_unlock(&syslog_mutex);
+ raw_spin_unlock_irq(&logbuf_lock);
+ if (!error)
+ error = i;
break;
/* Read/clear last kernel messages */
case SYSLOG_ACTION_READ_CLEAR:
- clear = true;
+ do_clear = 1;
/* FALL THRU */
/* Read last kernel messages */
case SYSLOG_ACTION_READ_ALL:
@@ -1035,11 +399,52 @@ int do_syslog(int type, char __user *buf
error = -EFAULT;
goto out;
}
- error = syslog_print_all(buf, len, clear);
+ count = len;
+ if (count > log_buf_len)
+ count = log_buf_len;
+ raw_spin_lock_irq(&logbuf_lock);
+ if (count > logged_chars)
+ count = logged_chars;
+ if (do_clear)
+ logged_chars = 0;
+ limit = log_end;
+ /*
+ * __put_user() could sleep, and while we sleep
+ * printk() could overwrite the messages
+ * we try to copy to user space. Therefore
+ * the messages are copied in reverse. <manfreds>
+ */
+ for (i = 0; i < count && !error; i++) {
+ j = limit-1-i;
+ if (j + log_buf_len < log_end)
+ break;
+ c = LOG_BUF(j);
+ raw_spin_unlock_irq(&logbuf_lock);
+ error = __put_user(c,&buf[count-1-i]);
+ cond_resched();
+ raw_spin_lock_irq(&logbuf_lock);
+ }
+ raw_spin_unlock_irq(&logbuf_lock);
+ if (error)
+ break;
+ error = i;
+ if (i != count) {
+ int offset = count-error;
+ /* buffer overflow during copy, correct user buffer. */
+ for (i = 0; i < error; i++) {
+ if (__get_user(c,&buf[i+offset]) ||
+ __put_user(c,&buf[i])) {
+ error = -EFAULT;
+ break;
+ }
+ cond_resched();
+ }
+ }
break;
/* Clear ring buffer */
case SYSLOG_ACTION_CLEAR:
- syslog_print_all(NULL, 0, true);
+ logged_chars = 0;
+ break;
/* Disable logging to console */
case SYSLOG_ACTION_CONSOLE_OFF:
if (saved_console_loglevel == -1)
@@ -1067,35 +472,7 @@ int do_syslog(int type, char __user *buf
break;
/* Number of chars in the log buffer */
case SYSLOG_ACTION_SIZE_UNREAD:
- raw_spin_lock_irq(&logbuf_lock);
- if (syslog_seq < log_first_seq) {
- /* messages are gone, move to first one */
- syslog_seq = log_first_seq;
- syslog_idx = log_first_idx;
- }
- if (from_file) {
- /*
- * Short-cut for poll(/"proc/kmsg") which simply checks
- * for pending data, not the size; return the count of
- * records, not the length.
- */
- error = log_next_idx - syslog_idx;
- } else {
- u64 seq;
- u32 idx;
-
- error = 0;
- seq = syslog_seq;
- idx = syslog_idx;
- while (seq < log_next_seq) {
- struct log *msg = log_from_idx(idx);
-
- error += msg_print_text(msg, true, NULL, 0);
- idx = log_next(idx);
- seq++;
- }
- }
- raw_spin_unlock_irq(&logbuf_lock);
+ error = log_end - log_start;
break;
/* Size of the log buffer */
case SYSLOG_ACTION_SIZE_BUFFER:
@@ -1124,11 +501,29 @@ void kdb_syslog_data(char *syslog_data[4
{
syslog_data[0] = log_buf;
syslog_data[1] = log_buf + log_buf_len;
- syslog_data[2] = log_buf + log_first_idx;
- syslog_data[3] = log_buf + log_next_idx;
+ syslog_data[2] = log_buf + log_end -
+ (logged_chars < log_buf_len ? logged_chars : log_buf_len);
+ syslog_data[3] = log_buf + log_end;
}
#endif /* CONFIG_KGDB_KDB */

+/*
+ * Call the console drivers on a range of log_buf
+ */
+static void __call_console_drivers(unsigned start, unsigned end)
+{
+ struct console *con;
+
+ for_each_console(con) {
+ if (exclusive_console && con != exclusive_console)
+ continue;
+ if ((con->flags & CON_ENABLED) && con->write &&
+ (cpu_online(smp_processor_id()) ||
+ (con->flags & CON_ANYTIME)))
+ con->write(con, &LOG_BUF(start), end - start);
+ }
+}
+
static bool __read_mostly ignore_loglevel;

static int __init ignore_loglevel_setup(char *str)
@@ -1145,33 +540,142 @@ MODULE_PARM_DESC(ignore_loglevel, "ignor
"print all kernel messages to the console.");

/*
+ * Write out chars from start to end - 1 inclusive
+ */
+static void _call_console_drivers(unsigned start,
+ unsigned end, int msg_log_level)
+{
+ trace_console(&LOG_BUF(0), start, end, log_buf_len);
+
+ if ((msg_log_level < console_loglevel || ignore_loglevel) &&
+ console_drivers && start != end) {
+ if ((start & LOG_BUF_MASK) > (end & LOG_BUF_MASK)) {
+ /* wrapped write */
+ __call_console_drivers(start & LOG_BUF_MASK,
+ log_buf_len);
+ __call_console_drivers(0, end & LOG_BUF_MASK);
+ } else {
+ __call_console_drivers(start, end);
+ }
+ }
+}
+
+/*
+ * Parse the syslog header <[0-9]*>. The decimal value represents 32bit, the
+ * lower 3 bit are the log level, the rest are the log facility. In case
+ * userspace passes usual userspace syslog messages to /dev/kmsg or
+ * /dev/ttyprintk, the log prefix might contain the facility. Printk needs
+ * to extract the correct log level for in-kernel processing, and not mangle
+ * the original value.
+ *
+ * If a prefix is found, the length of the prefix is returned. If 'level' is
+ * passed, it will be filled in with the log level without a possible facility
+ * value. If 'special' is passed, the special printk prefix chars are accepted
+ * and returned. If no valid header is found, 0 is returned and the passed
+ * variables are not touched.
+ */
+static size_t log_prefix(const char *p, unsigned int *level, char *special)
+{
+ unsigned int lev = 0;
+ char sp = '\0';
+ size_t len;
+
+ if (p[0] != '<' || !p[1])
+ return 0;
+ if (p[2] == '>') {
+ /* usual single digit level number or special char */
+ switch (p[1]) {
+ case '0' ... '7':
+ lev = p[1] - '0';
+ break;
+ case 'c': /* KERN_CONT */
+ case 'd': /* KERN_DEFAULT */
+ sp = p[1];
+ break;
+ default:
+ return 0;
+ }
+ len = 3;
+ } else {
+ /* multi digit including the level and facility number */
+ char *endp = NULL;
+
+ lev = (simple_strtoul(&p[1], &endp, 10) & 7);
+ if (endp == NULL || endp[0] != '>')
+ return 0;
+ len = (endp + 1) - p;
+ }
+
+ /* do not accept special char if not asked for */
+ if (sp && !special)
+ return 0;
+
+ if (special) {
+ *special = sp;
+ /* return special char, do not touch level */
+ if (sp)
+ return len;
+ }
+
+ if (level)
+ *level = lev;
+ return len;
+}
+
+/*
* Call the console drivers, asking them to write out
* log_buf[start] to log_buf[end - 1].
* The console_lock must be held.
*/
-static void call_console_drivers(int level, const char *text, size_t len)
+static void call_console_drivers(unsigned start, unsigned end)
{
- struct console *con;
+ unsigned cur_index, start_print;
+ static int msg_level = -1;

- trace_console(text, 0, len, len);
+ BUG_ON(((int)(start - end)) > 0);

- if (level >= console_loglevel && !ignore_loglevel)
- return;
- if (!console_drivers)
- return;
-
- for_each_console(con) {
- if (exclusive_console && con != exclusive_console)
- continue;
- if (!(con->flags & CON_ENABLED))
- continue;
- if (!con->write)
- continue;
- if (!cpu_online(smp_processor_id()) &&
- !(con->flags & CON_ANYTIME))
- continue;
- con->write(con, text, len);
+ cur_index = start;
+ start_print = start;
+ while (cur_index != end) {
+ if (msg_level < 0 && ((end - cur_index) > 2)) {
+ /* strip log prefix */
+ cur_index += log_prefix(&LOG_BUF(cur_index), &msg_level, NULL);
+ start_print = cur_index;
+ }
+ while (cur_index != end) {
+ char c = LOG_BUF(cur_index);
+
+ cur_index++;
+ if (c == '\n') {
+ if (msg_level < 0) {
+ /*
+ * printk() has already given us loglevel tags in
+ * the buffer. This code is here in case the
+ * log buffer has wrapped right round and scribbled
+ * on those tags
+ */
+ msg_level = default_message_loglevel;
+ }
+ _call_console_drivers(start_print, cur_index, msg_level);
+ msg_level = -1;
+ start_print = cur_index;
+ break;
+ }
+ }
}
+ _call_console_drivers(start_print, end, msg_level);
+}
+
+static void emit_log_char(char c)
+{
+ LOG_BUF(log_end) = c;
+ log_end++;
+ if (log_end - log_start > log_buf_len)
+ log_start = log_end - log_buf_len;
+ if (log_end - con_start > log_buf_len)
+ con_start = log_end - log_buf_len;
+ if (logged_chars < log_buf_len)
+ logged_chars++;
}

/*
@@ -1196,6 +700,16 @@ static void zap_locks(void)
sema_init(&console_sem, 1);
}

+#if defined(CONFIG_PRINTK_TIME)
+static bool printk_time = 1;
+#else
+static bool printk_time = 0;
+#endif
+module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
+
+static bool always_kmsg_dump;
+module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
+
/* Check if we have any console registered that can be called early in boot. */
static int have_callable_console(void)
{
@@ -1208,6 +722,51 @@ static int have_callable_console(void)
return 0;
}

+/**
+ * printk - print a kernel message
+ * @fmt: format string
+ *
+ * This is printk(). It can be called from any context. We want it to work.
+ *
+ * We try to grab the console_lock. If we succeed, it's easy - we log the output and
+ * call the console drivers. If we fail to get the semaphore we place the output
+ * into the log buffer and return. The current holder of the console_sem will
+ * notice the new output in console_unlock(); and will send it to the
+ * consoles before releasing the lock.
+ *
+ * One effect of this deferred printing is that code which calls printk() and
+ * then changes console_loglevel may break. This is because console_loglevel
+ * is inspected when the actual printing occurs.
+ *
+ * See also:
+ * printf(3)
+ *
+ * See the vsnprintf() documentation for format string extensions over C99.
+ */
+
+asmlinkage int printk(const char *fmt, ...)
+{
+ va_list args;
+ int r;
+
+#ifdef CONFIG_KGDB_KDB
+ if (unlikely(kdb_trap_printk)) {
+ va_start(args, fmt);
+ r = vkdb_printf(fmt, args);
+ va_end(args);
+ return r;
+ }
+#endif
+ va_start(args, fmt);
+ r = vprintk(fmt, args);
+ va_end(args);
+
+ return r;
+}
+
+/* cpu currently holding logbuf_lock */
+static volatile unsigned int printk_cpu = UINT_MAX;
+
/*
* Can we actually use the console at this time on this cpu?
*
@@ -1251,12 +810,17 @@ static int console_trylock_for_printk(un
retval = 0;
}
}
- logbuf_cpu = UINT_MAX;
+ printk_cpu = UINT_MAX;
if (wake)
up(&console_sem);
raw_spin_unlock(&logbuf_lock);
return retval;
}
+static const char recursion_bug_msg [] =
+ KERN_CRIT "BUG: recent printk recursion!\n";
+static int recursion_bug;
+static int new_text_line = 1;
+static char printk_buf[1024];

int printk_delay_msec __read_mostly;

@@ -1272,23 +836,15 @@ static inline void printk_delay(void)
}
}

-asmlinkage int vprintk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, va_list args)
-{
- static int recursion_bug;
- static char cont_buf[LOG_LINE_MAX];
- static size_t cont_len;
- static int cont_level;
- static struct task_struct *cont_task;
- static char textbuf[LOG_LINE_MAX];
- char *text = textbuf;
- size_t text_len;
+asmlinkage int vprintk(const char *fmt, va_list args)
+{
+ int printed_len = 0;
+ int current_log_level = default_message_loglevel;
unsigned long flags;
int this_cpu;
- bool newline = false;
- bool prefix = false;
- int printed_len = 0;
+ char *p;
+ size_t plen;
+ char special;

boot_delay_msec();
printk_delay();
@@ -1300,7 +856,7 @@ asmlinkage int vprintk_emit(int facility
/*
* Ouch, printk recursed into itself!
*/
- if (unlikely(logbuf_cpu == this_cpu)) {
+ if (unlikely(printk_cpu == this_cpu)) {
/*
* If a crash is occurring during printk() on this CPU,
* then try to get the crash message out but make sure
@@ -1317,110 +873,97 @@ asmlinkage int vprintk_emit(int facility

lockdep_off();
raw_spin_lock(&logbuf_lock);
- logbuf_cpu = this_cpu;
+ printk_cpu = this_cpu;

if (recursion_bug) {
- static const char recursion_msg[] =
- "BUG: recent printk recursion!";
-
recursion_bug = 0;
- printed_len += strlen(recursion_msg);
- /* emit KERN_CRIT message */
- log_store(0, 2, NULL, 0, recursion_msg, printed_len);
- }
-
- /*
- * The printf needs to come first; we need the syslog
- * prefix which might be passed-in as a parameter.
- */
- text_len = vscnprintf(text, sizeof(textbuf), fmt, args);
-
- /* mark and strip a trailing newline */
- if (text_len && text[text_len-1] == '\n') {
- text_len--;
- newline = true;
+ strcpy(printk_buf, recursion_bug_msg);
+ printed_len = strlen(recursion_bug_msg);
}
-
- /* strip syslog prefix and extract log level or control flags */
- if (text[0] == '<' && text[1] && text[2] == '>') {
- switch (text[1]) {
- case '0' ... '7':
- if (level == -1)
- level = text[1] - '0';
- case 'd': /* KERN_DEFAULT */
- prefix = true;
- case 'c': /* KERN_CONT */
- text += 3;
- text_len -= 3;
+ /* Emit the output into the temporary buffer */
+ printed_len += vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf) - printed_len, fmt, args);
+
+ p = printk_buf;
+
+ /* Read log level and handle special printk prefix */
+ plen = log_prefix(p, &current_log_level, &special);
+ if (plen) {
+ p += plen;
+
+ switch (special) {
+ case 'c': /* Strip <c> KERN_CONT, continue line */
+ plen = 0;
+ break;
+ case 'd': /* Strip <d> KERN_DEFAULT, start new line */
+ plen = 0;
+ default:
+ if (!new_text_line) {
+ emit_log_char('\n');
+ new_text_line = 1;
+ }
}
}

- if (level == -1)
- level = default_message_loglevel;
-
- if (dict) {
- prefix = true;
- newline = true;
- }
-
- if (!newline) {
- if (cont_len && (prefix || cont_task != current)) {
- /*
- * Flush earlier buffer, which is either from a
- * different thread, or when we got a new prefix.
- */
- log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
- cont_len = 0;
- }
-
- if (!cont_len) {
- cont_level = level;
- cont_task = current;
- }
+ /*
+ * Copy the output into log_buf. If the caller didn't provide
+ * the appropriate log prefix, we insert them here
+ */
+ for (; *p; p++) {
+ if (new_text_line) {
+ new_text_line = 0;
+
+ if (plen) {
+ /* Copy original log prefix */
+ int i;
+
+ for (i = 0; i < plen; i++)
+ emit_log_char(printk_buf[i]);
+ printed_len += plen;
+ } else {
+ /* Add log prefix */
+ emit_log_char('<');
+ emit_log_char(current_log_level + '0');
+ emit_log_char('>');
+ printed_len += 3;
+ }

- /* buffer or append to earlier buffer from the same thread */
- if (cont_len + text_len > sizeof(cont_buf))
- text_len = sizeof(cont_buf) - cont_len;
- memcpy(cont_buf + cont_len, text, text_len);
- cont_len += text_len;
- } else {
- if (cont_len && cont_task == current) {
- if (prefix) {
- /*
- * New prefix from the same thread; flush. We
- * either got no earlier newline, or we race
- * with an interrupt.
- */
- log_store(facility, cont_level,
- NULL, 0, cont_buf, cont_len);
- cont_len = 0;
+ if (printk_time) {
+ /* Add the current time stamp */
+ char tbuf[50], *tp;
+ unsigned tlen;
+ unsigned long long t;
+ unsigned long nanosec_rem;
+
+ t = cpu_clock(printk_cpu);
+ nanosec_rem = do_div(t, 1000000000);
+ tlen = sprintf(tbuf, "[%5lu.%06lu] ",
+ (unsigned long) t,
+ nanosec_rem / 1000);
+
+ for (tp = tbuf; tp < tbuf + tlen; tp++)
+ emit_log_char(*tp);
+ printed_len += tlen;
}

- /* append to the earlier buffer and flush */
- if (cont_len + text_len > sizeof(cont_buf))
- text_len = sizeof(cont_buf) - cont_len;
- memcpy(cont_buf + cont_len, text, text_len);
- cont_len += text_len;
- log_store(facility, cont_level,
- NULL, 0, cont_buf, cont_len);
- cont_len = 0;
- cont_task = NULL;
- printed_len = cont_len;
- } else {
- /* ordinary single and terminated line */
- log_store(facility, level,
- dict, dictlen, text, text_len);
- printed_len = text_len;
+ if (!*p)
+ break;
}
+
+ emit_log_char(*p);
+ if (*p == '\n')
+ new_text_line = 1;
}

/*
- * Try to acquire and then immediately release the console semaphore.
- * The release will print out buffers and wake up /dev/kmsg and syslog()
- * users.
+ * Try to acquire and then immediately release the
+ * console semaphore. The release will do all the
+ * actual magic (print out buffers, wake up klogd,
+ * etc).
*
- * The console_trylock_for_printk() function will release 'logbuf_lock'
- * regardless of whether it actually gets the console semaphore or not.
+ * The console_trylock_for_printk() function
+ * will release 'logbuf_lock' regardless of whether it
+ * actually gets the semaphore or not.
*/
if (console_trylock_for_printk(this_cpu))
console_unlock();
@@ -1431,81 +974,16 @@ out_restore_irqs:

return printed_len;
}
-EXPORT_SYMBOL(vprintk_emit);
-
-asmlinkage int vprintk(const char *fmt, va_list args)
-{
- return vprintk_emit(0, -1, NULL, 0, fmt, args);
-}
+EXPORT_SYMBOL(printk);
EXPORT_SYMBOL(vprintk);

-asmlinkage int printk_emit(int facility, int level,
- const char *dict, size_t dictlen,
- const char *fmt, ...)
-{
- va_list args;
- int r;
-
- va_start(args, fmt);
- r = vprintk_emit(facility, level, dict, dictlen, fmt, args);
- va_end(args);
-
- return r;
-}
-EXPORT_SYMBOL(printk_emit);
+#else

-/**
- * printk - print a kernel message
- * @fmt: format string
- *
- * This is printk(). It can be called from any context. We want it to work.
- *
- * We try to grab the console_lock. If we succeed, it's easy - we log the
- * output and call the console drivers. If we fail to get the semaphore, we
- * place the output into the log buffer and return. The current holder of
- * the console_sem will notice the new output in console_unlock(); and will
- * send it to the consoles before releasing the lock.
- *
- * One effect of this deferred printing is that code which calls printk() and
- * then changes console_loglevel may break. This is because console_loglevel
- * is inspected when the actual printing occurs.
- *
- * See also:
- * printf(3)
- *
- * See the vsnprintf() documentation for format string extensions over C99.
- */
-asmlinkage int printk(const char *fmt, ...)
+static void call_console_drivers(unsigned start, unsigned end)
{
- va_list args;
- int r;
-
-#ifdef CONFIG_KGDB_KDB
- if (unlikely(kdb_trap_printk)) {
- va_start(args, fmt);
- r = vkdb_printf(fmt, args);
- va_end(args);
- return r;
- }
-#endif
- va_start(args, fmt);
- r = vprintk_emit(0, -1, NULL, 0, fmt, args);
- va_end(args);
-
- return r;
}
-EXPORT_SYMBOL(printk);

-#else
-
-#define LOG_LINE_MAX 0
-static struct log *log_from_idx(u32 idx) { return NULL; }
-static u32 log_next(u32 idx) { return 0; }
-static void call_console_drivers(int level, const char *text, size_t len) {}
-static size_t msg_print_text(const struct log *msg, bool syslog,
- char *buf, size_t size) { return 0; }
-
-#endif /* CONFIG_PRINTK */
+#endif

static int __add_preferred_console(char *name, int idx, char *options,
char *brl_options)
@@ -1739,7 +1217,7 @@ int is_console_locked(void)
}

/*
- * Delayed printk version, for scheduler-internal messages:
+ * Delayed printk facility, for scheduler-internal messages:
*/
#define PRINTK_BUF_SIZE 512

@@ -1775,10 +1253,6 @@ void wake_up_klogd(void)
this_cpu_or(printk_pending, PRINTK_PENDING_WAKEUP);
}

-/* the next printk record to write to the console */
-static u64 console_seq;
-static u32 console_idx;
-
/**
* console_unlock - unlock the console system
*
@@ -1789,16 +1263,15 @@ static u32 console_idx;
* by printk(). If this is the case, console_unlock(); emits
* the output prior to releasing the lock.
*
- * If there is output waiting, we wake /dev/kmsg and syslog() users.
+ * If there is output waiting for klogd, we wake it up.
*
* console_unlock(); may be called from any context.
*/
void console_unlock(void)
{
- static u64 seen_seq;
unsigned long flags;
- bool wake_klogd = false;
- bool retry;
+ unsigned _con_start, _log_end;
+ unsigned wake_klogd = 0, retry = 0;

if (console_suspended) {
up(&console_sem);
@@ -1808,38 +1281,17 @@ void console_unlock(void)
console_may_schedule = 0;

again:
- for (;;) {
- struct log *msg;
- static char text[LOG_LINE_MAX];
- size_t len;
- int level;
-
+ for ( ; ; ) {
raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (seen_seq != log_next_seq) {
- wake_klogd = true;
- seen_seq = log_next_seq;
- }
-
- if (console_seq < log_first_seq) {
- /* messages are gone, move to first one */
- console_seq = log_first_seq;
- console_idx = log_first_idx;
- }
-
- if (console_seq == log_next_seq)
- break;
-
- msg = log_from_idx(console_idx);
- level = msg->level & 7;
-
- len = msg_print_text(msg, false, text, sizeof(text));
-
- console_idx = log_next(console_idx);
- console_seq++;
+ wake_klogd |= log_start - log_end;
+ if (con_start == log_end)
+ break; /* Nothing to print */
+ _con_start = con_start;
+ _log_end = log_end;
+ con_start = log_end; /* Flush */
raw_spin_unlock(&logbuf_lock);
-
stop_critical_timings(); /* don't trace print latency */
- call_console_drivers(level, text, len);
+ call_console_drivers(_con_start, _log_end);
start_critical_timings();
local_irq_restore(flags);
}
@@ -1860,7 +1312,8 @@ again:
* flush, no worries.
*/
raw_spin_lock(&logbuf_lock);
- retry = console_seq != log_next_seq;
+ if (con_start != log_end)
+ retry = 1;
raw_spin_unlock_irqrestore(&logbuf_lock, flags);

if (retry && console_trylock())
@@ -2096,8 +1549,7 @@ void register_console(struct console *ne
* for us.
*/
raw_spin_lock_irqsave(&logbuf_lock, flags);
- console_seq = syslog_seq;
- console_idx = syslog_idx;
+ con_start = log_start;
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
/*
* We're about to replay the log buffer. Only do this to the
@@ -2306,217 +1758,50 @@ int kmsg_dump_unregister(struct kmsg_dum
}
EXPORT_SYMBOL_GPL(kmsg_dump_unregister);

-static bool always_kmsg_dump;
-module_param_named(always_kmsg_dump, always_kmsg_dump, bool, S_IRUGO | S_IWUSR);
-
/**
* kmsg_dump - dump kernel log to kernel message dumpers.
* @reason: the reason (oops, panic etc) for dumping
*
- * Call each of the registered dumper's dump() callback, which can
- * retrieve the kmsg records with kmsg_dump_get_line() or
- * kmsg_dump_get_buffer().
+ * Iterate through each of the dump devices and call the oops/panic
+ * callbacks with the log buffer.
*/
void kmsg_dump(enum kmsg_dump_reason reason)
{
+ unsigned long end;
+ unsigned chars;
struct kmsg_dumper *dumper;
+ const char *s1, *s2;
+ unsigned long l1, l2;
unsigned long flags;

if ((reason > KMSG_DUMP_OOPS) && !always_kmsg_dump)
return;

- rcu_read_lock();
- list_for_each_entry_rcu(dumper, &dump_list, list) {
- if (dumper->max_reason && reason > dumper->max_reason)
- continue;
-
- /* initialize iterator with data about the stored records */
- dumper->active = true;
-
- raw_spin_lock_irqsave(&logbuf_lock, flags);
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-
- /* invoke dumper which will iterate over records */
- dumper->dump(dumper, reason);
-
- /* reset iterator */
- dumper->active = false;
- }
- rcu_read_unlock();
-}
-
-/**
- * kmsg_dump_get_line - retrieve one kmsg log line
- * @dumper: registered kmsg dumper
- * @syslog: include the "<4>" prefixes
- * @line: buffer to copy the line to
- * @size: maximum size of the buffer
- * @len: length of line placed into buffer
- *
- * Start at the beginning of the kmsg buffer, with the oldest kmsg
- * record, and copy one record into the provided buffer.
- *
- * Consecutive calls will return the next available record moving
- * towards the end of the buffer with the youngest messages.
- *
- * A return value of FALSE indicates that there are no more records to
- * read.
- */
-bool kmsg_dump_get_line(struct kmsg_dumper *dumper, bool syslog,
- char *line, size_t size, size_t *len)
-{
- unsigned long flags;
- struct log *msg;
- size_t l = 0;
- bool ret = false;
-
- if (!dumper->active)
- goto out;
-
+ /* Theoretically, the log could move on after we do this, but
+ there's not a lot we can do about that. The new messages
+ will overwrite the start of what we dump. */
raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (dumper->cur_seq < log_first_seq) {
- /* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
- }
-
- /* last entry */
- if (dumper->cur_seq >= log_next_seq) {
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
- goto out;
- }
-
- msg = log_from_idx(dumper->cur_idx);
- l = msg_print_text(msg, syslog,
- line, size);
-
- dumper->cur_idx = log_next(dumper->cur_idx);
- dumper->cur_seq++;
- ret = true;
+ end = log_end & LOG_BUF_MASK;
+ chars = logged_chars;
raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-out:
- if (len)
- *len = l;
- return ret;
-}
-EXPORT_SYMBOL_GPL(kmsg_dump_get_line);

-/**
- * kmsg_dump_get_buffer - copy kmsg log lines
- * @dumper: registered kmsg dumper
- * @syslog: include the "<4>" prefixes
- * @line: buffer to copy the line to
- * @size: maximum size of the buffer
- * @len: length of line placed into buffer
- *
- * Start at the end of the kmsg buffer and fill the provided buffer
- * with as many of the the *youngest* kmsg records that fit into it.
- * If the buffer is large enough, all available kmsg records will be
- * copied with a single call.
- *
- * Consecutive calls will fill the buffer with the next block of
- * available older records, not including the earlier retrieved ones.
- *
- * A return value of FALSE indicates that there are no more records to
- * read.
- */
-bool kmsg_dump_get_buffer(struct kmsg_dumper *dumper, bool syslog,
- char *buf, size_t size, size_t *len)
-{
- unsigned long flags;
- u64 seq;
- u32 idx;
- u64 next_seq;
- u32 next_idx;
- size_t l = 0;
- bool ret = false;
-
- if (!dumper->active)
- goto out;
+ if (chars > end) {
+ s1 = log_buf + log_buf_len - chars + end;
+ l1 = chars - end;

- raw_spin_lock_irqsave(&logbuf_lock, flags);
- if (dumper->cur_seq < log_first_seq) {
- /* messages are gone, move to first available one */
- dumper->cur_seq = log_first_seq;
- dumper->cur_idx = log_first_idx;
- }
-
- /* last entry */
- if (dumper->cur_seq >= dumper->next_seq) {
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
- goto out;
- }
-
- /* calculate length of entire buffer */
- seq = dumper->cur_seq;
- idx = dumper->cur_idx;
- while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
-
- l += msg_print_text(msg, true, NULL, 0);
- idx = log_next(idx);
- seq++;
- }
-
- /* move first record forward until length fits into the buffer */
- seq = dumper->cur_seq;
- idx = dumper->cur_idx;
- while (l > size && seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
-
- l -= msg_print_text(msg, true, NULL, 0);
- idx = log_next(idx);
- seq++;
- }
-
- /* last message in next interation */
- next_seq = seq;
- next_idx = idx;
-
- l = 0;
- while (seq < dumper->next_seq) {
- struct log *msg = log_from_idx(idx);
-
- l += msg_print_text(msg, syslog,
- buf + l, size - l);
+ s2 = log_buf;
+ l2 = end;
+ } else {
+ s1 = "";
+ l1 = 0;

- idx = log_next(idx);
- seq++;
+ s2 = log_buf + end - chars;
+ l2 = chars;
}

- dumper->next_seq = next_seq;
- dumper->next_idx = next_idx;
- ret = true;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
-out:
- if (len)
- *len = l;
- return ret;
-}
-EXPORT_SYMBOL_GPL(kmsg_dump_get_buffer);
-
-/**
- * kmsg_dump_rewind - reset the interator
- * @dumper: registered kmsg dumper
- *
- * Reset the dumper's iterator so that kmsg_dump_get_line() and
- * kmsg_dump_get_buffer() can be called again and used multiple
- * times within the same dumper.dump() callback.
- */
-void kmsg_dump_rewind(struct kmsg_dumper *dumper)
-{
- unsigned long flags;
-
- raw_spin_lock_irqsave(&logbuf_lock, flags);
- dumper->cur_seq = clear_seq;
- dumper->cur_idx = clear_idx;
- dumper->next_seq = log_next_seq;
- dumper->next_idx = log_next_idx;
- raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+ rcu_read_lock();
+ list_for_each_entry_rcu(dumper, &dump_list, list)
+ dumper->dump(dumper, reason);
+ rcu_read_unlock();
}
-EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
#endif

2012-06-25 10:07:05

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now

On Mon, Jun 25, 2012 at 11:09 AM, Ingo Molnar <[email protected]> wrote:
> * Kay Sievers <[email protected]> wrote:

>> I just don't have a better idea than Joe or Steven.
>
> Then pick the fix you see as the best solution, post it and push
> the fix to Linus, don't just sit there passive-aggressively
> leaving a regression you introduced unresolved ...

Yeah, sometimes we can't have everything at the same time, and we
fixed serious reliability and integrity bugs for the cost of changing
the behaviour of: debug-printk + continuation line + crash the box. If
the box does not crash, the output does not change at all from before.

> I think Steve's fix would be OK as a workaround, if no-one can
> think of anything better.
>
> The thing is, this bug, despite being reported early on, has
> been left unresolved for weeks and it's -rc4 time now. Time is
> running out and frankly, I've been watching this thread, and the
> only reason you appear to be taking this bug somewhat seriously
> now is because Andrew and me complained. That is sad.

Steven said he would try something else on Monday, that's why I'm
waiting. I'm not happy with any of the current
optimize-continuation-lines-for-kernel-crashes patches, so I wanted to
see what he has in mind.

> Kernel policy is that kernel bugs introduced during the merge
> window are fixed by those who introduced them, or the changes
> get reverted. The kernel project uses a very user-friendly
> process to resolve regressions and in the worst case you as a
> developer have to suffer your changes reverted. Obviously timely
> fixes are preferred over invasive reverts.

That's true. this change is a trade, and the kernel self-tests
print-continuation-line-and-let-the-kernel-crash is currently affected
by the hugely improved integrity and reliability of all the "normal"
users.

> It is not *Steve's* job to fix this bug. That he actually posted
> a fix is nice from him, it makes your life easier (and you can
> still write some other fix) but *you* should be driving the
> resolution here, not Steve, me or Andrew.

Let's see what other idea Steven has, he wrote "But I have an idea for
a fix. I'll work on it on Monday, and it wont require adding a
pr_flush() like I did in this patch set." before pointing fingers here
please.

If we find a way to solve that cleanly, I'm all for it. But honestly,
just letting these very special-case users print fully terminated
lines instead of continuation lines seems like an option to me too. We
really should not optimize for cosmetics (full lines work reliably,
they are not buffered) of self-tests, for the price of the reliability
and integrity of all other users.

The self-test users would need to be changed anyway to use "flush",
why don't we just print a full line, omit the "PASSED" in case all
went well, and only print in case of an error? I really do not see the
need for continuation lines for the crash-sensitive self-tests; people
will find out (without PASSED on the same line), that we passed the
test when the box is still alive.

Kay

2012-06-25 13:42:40

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now

On Mon, 2012-06-25 at 12:06 +0200, Kay Sievers wrote:
> On Mon, Jun 25, 2012 at 11:09 AM, Ingo Molnar <[email protected]> wrote:
> > * Kay Sievers <[email protected]> wrote:
>
> >> I just don't have a better idea than Joe or Steven.
> >
> > Then pick the fix you see as the best solution, post it and push
> > the fix to Linus, don't just sit there passive-aggressively
> > leaving a regression you introduced unresolved ...
>
> Yeah, sometimes we can't have everything at the same time, and we
> fixed serious reliability and integrity bugs for the cost of changing
> the behaviour of: debug-printk + continuation line + crash the box. If
> the box does not crash, the output does not change at all from before.

But if the box crashes, then we get lies about where it crashed. That's
an important part.

>
> > I think Steve's fix would be OK as a workaround, if no-one can
> > think of anything better.
> >
> > The thing is, this bug, despite being reported early on, has
> > been left unresolved for weeks and it's -rc4 time now. Time is
> > running out and frankly, I've been watching this thread, and the
> > only reason you appear to be taking this bug somewhat seriously
> > now is because Andrew and me complained. That is sad.
>
> Steven said he would try something else on Monday, that's why I'm
> waiting. I'm not happy with any of the current
> optimize-continuation-lines-for-kernel-crashes patches, so I wanted to
> see what he has in mind.

I just posted it. You may not like it either, but it removes the needed
printk_flush(), and doesn't touch any of your 'facility' work. Besides
adding its own 0x1ffc tag.


>
> > Kernel policy is that kernel bugs introduced during the merge
> > window are fixed by those who introduced them, or the changes
> > get reverted. The kernel project uses a very user-friendly
> > process to resolve regressions and in the worst case you as a
> > developer have to suffer your changes reverted. Obviously timely
> > fixes are preferred over invasive reverts.
>
> That's true. this change is a trade, and the kernel self-tests
> print-continuation-line-and-let-the-kernel-crash is currently affected
> by the hugely improved integrity and reliability of all the "normal"
> users.

I'm hoping that my last patch can satisfy both users.

>
> > It is not *Steve's* job to fix this bug. That he actually posted
> > a fix is nice from him, it makes your life easier (and you can
> > still write some other fix) but *you* should be driving the
> > resolution here, not Steve, me or Andrew.
>
> Let's see what other idea Steven has, he wrote "But I have an idea for
> a fix. I'll work on it on Monday, and it wont require adding a
> pr_flush() like I did in this patch set." before pointing fingers here
> please.

It's out, please continue to point ;-)

>
> If we find a way to solve that cleanly, I'm all for it. But honestly,
> just letting these very special-case users print fully terminated
> lines instead of continuation lines seems like an option to me too. We
> really should not optimize for cosmetics (full lines work reliably,
> they are not buffered) of self-tests, for the price of the reliability
> and integrity of all other users.

The thing is, you made a fundamental change to the way printk() has
worked from day one. This will give lots of developers a "surprise",
that will last for a long time.

>
> The self-test users would need to be changed anyway to use "flush",
> why don't we just print a full line, omit the "PASSED" in case all
> went well, and only print in case of an error? I really do not see the
> need for continuation lines for the crash-sensitive self-tests; people
> will find out (without PASSED on the same line), that we passed the
> test when the box is still alive.

Well, it was a way my scripts would check if the test passed or not.

-- Steve

2012-06-25 14:07:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now


* Kay Sievers <[email protected]> wrote:

> > Kernel policy is that kernel bugs introduced during the
> > merge window are fixed by those who introduced them, or the
> > changes get reverted. The kernel project uses a very
> > user-friendly process to resolve regressions and in the
> > worst case you as a developer have to suffer your changes
> > reverted. Obviously timely fixes are preferred over invasive
> > reverts.
>
> That's true. this change is a trade, and the kernel self-tests
> print-continuation-line-and-let-the-kernel-crash is currently
> affected by the hugely improved integrity and reliability of
> all the "normal" users.

Sigh, which part of the "no regressions" policy did you not
understand?

Even if we ignored regressions (which we don't), you'd *STILL*
be wrong: using printk to debug crashes (and to develop code in
general) is one of its earliest and I'd say most important role.

And when do we need printk() output? Most of the time only when
there's a problem with the system - such as a crash.

System logging is an arguably secondary role, and it should not
degrade printk()s primary role.

Your flippant attitude towards printk quality is really sad.

Thanks,

Ingo

2012-06-25 14:48:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now

On Mon, 2012-06-25 at 16:07 +0200, Ingo Molnar wrote:

> System logging is an arguably secondary role, and it should not
> degrade printk()s primary role.

I would argue that printk() should not play the role of normal system
logging. It's main role should be for boot up and crashes. If devices
need to log information to userspace, it should really use some other
means. What was /sys made for anyway?

-- Steve

2012-06-25 16:40:41

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now

On Mon, Jun 25, 2012 at 10:48:54AM -0400, Steven Rostedt wrote:
> On Mon, 2012-06-25 at 16:07 +0200, Ingo Molnar wrote:
>
> > System logging is an arguably secondary role, and it should not
> > degrade printk()s primary role.
>
> I would argue that printk() should not play the role of normal system
> logging. It's main role should be for boot up and crashes. If devices
> need to log information to userspace, it should really use some other
> means. What was /sys made for anyway?

Specifically not for logging. See the very old discussions of this a
long time ago (back in the 2.5 days), if you are curious.

printk() is the best thing we have for logging as everyone uses it and
the information in it is exactly what userspace wants to know about.
Because of that, why wouldn't we use it?

Anyway, your "never buffer printk data" patch looks like the right
solution here, I'm guessing you are going to respin it based on the
feedback so far, right?

thanks,

greg k-h

2012-06-25 16:53:30

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] printk: Add printk_flush() to force buffered text to console

On Mon, 2012-06-25 at 10:45 +0200, Ingo Molnar wrote:
> * Joe Perches <[email protected]> wrote:
>
> > On Sat, 2012-06-23 at 08:13 +0200, Ingo Molnar wrote:
> > > * Andrew Morton <[email protected]> wrote:
> > >
> > > > On Thu, 21 Jun 2012 19:52:03 -0400
> > > > Steven Rostedt <[email protected]> wrote:
> > > >
> > > > > But with the new printk() changes, text without a newline
> > > > > gets buffered and does not print out to the console at the
> > > > > location of the printk.
> > > >
> > > > uh, how about we fix that? The old behaviour was good, the
> > > > new behaviour is noxious.
> > >
> > > Absolutely.
> > >
> > > pr_flush() really seems to be a workaround.
> >
> > I think the new behavior an overall improvement.
> > It helps avoid multiple thread interleaving.
> >
> > I suggested a fix to enable a printk buffered/non-buffered
> > mechanism with an optional printk_flush/pr_flush.
> >
> > https://lkml.org/lkml/2012/6/21/305
>
> Ugh, so you suggest a global toggle to turn buffering on/off?

<shrug>

Rather than adding a bunch of printk_flush() calls around
the possibly many sites per file that need them, I'd rather
see a code block surrounded by buffer(false)/buffer(true).

I'd actually rather see the existing partial line uses like:

printk("doing test_func: ");
printk("%s\n", test_func(foo) ? "successful", "failed");

and

printk("step 1")
bar();
printk(" - step 2");
baz();
printk("completed\n");

changed to whole lines rather than adding a bunch of
printk_flush() sites or the printk_buffering(on|off)
blocks.

Anyway, I suggested it shouldn't be merged awhile ago.

https://lkml.org/lkml/2012/5/22/439

Making systemic printk changes is a slow business.

2012-06-27 05:53:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now


* Greg Kroah-Hartman <[email protected]> wrote:

> On Mon, Jun 25, 2012 at 10:48:54AM -0400, Steven Rostedt wrote:
> > On Mon, 2012-06-25 at 16:07 +0200, Ingo Molnar wrote:
> >
> > > System logging is an arguably secondary role, and it
> > > should not degrade printk()s primary role.
> >
> > I would argue that printk() should not play the role of
> > normal system logging. It's main role should be for boot up
> > and crashes. If devices need to log information to
> > userspace, it should really use some other means. What was
> > /sys made for anyway?
>
> Specifically not for logging. See the very old discussions of
> this a long time ago (back in the 2.5 days), if you are
> curious.
>
> printk() is the best thing we have for logging as everyone
> uses it and the information in it is exactly what userspace
> wants to know about. Because of that, why wouldn't we use it?

I agree with Greg's point there: the reality is that we have
over 50,000 printk() sites in the kernel, which is a heck of a
good source of system logging information, which we are not
going to change over to some other facility, even if we had some
marginal reasons to do it.

Just consider the life time of printk() call sites: in most
cases it gets added as a debugging printout, as a: "Hey, I just
finished reading Linux Device Drivers, 3rd Edition, and this
best ever Linux driver is now ALIVE!!" tag of success.

Then it gets extended with a few more printouts of unexpected
behavior: "So, if you got here the hardware must be buggy or you
must be doing something stupid, as the driver code sure as heck
is perfect".

Most of the printk()s get removed during productization, but
some actually make sense and remain (and some don't make sense
but just never trigger).

Very few people add printk()s as "inform the system logging
daemon about an event". The prevailing mindset is that perfect
code does not need any logging, so what is left are over 50,000
call sites of bragging and debugging code, occasionally massaged
to be somewhat user friendly.

System logging and tracing in particular can hook off this
mechanism, but we probably aren't going to change the social
role of printk()s overnight (or ever).

> Anyway, your "never buffer printk data" patch looks like the
> right solution here, I'm guessing you are going to respin it
> based on the feedback so far, right?

Acked-by: Ingo Molnar <[email protected]>

Thanks,

Ingo

2012-06-27 05:59:50

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH] printk: Revert the buffered-printk() changes for now

On Wed, 2012-06-27 at 07:52 +0200, Ingo Molnar wrote:
> what is left are over 50,000
> call sites of bragging and debugging code, occasionally massaged
> to be somewhat user friendly.

Bragging. Silly. A _very_ small percent of
printks are "look at me" messages so I think that's
a ridiculously poor argument.

Most of the printks are announcement and error
logging notifications.