2012-06-25 19:05:47

by Steven Rostedt

[permalink] [raw]
Subject: [PATCH v3] printk: Have printk() never buffer its data


Fengguang Wu had a config that caused the system to lockup. It reported:

[ 6.086395] type=2000 audit(1339501575.085:1): initialized
[ 6.116356] Kprobe smoke test started
[ 6.125915] Kprobe smoke test passed successfully
[ 6.127478] rcu-torture:--- Start of test: nreaders=2 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0
+fqs_holdoff=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0

and then froze. So naturally, the suspected bug was with rcu-torture.
Fengguang did a git bisect and discovered that the crash came with a
function trace update. He also noticed that if he reverted that update,
the system got farther and showed:

[ 1.611901] Testing tracer function: PASSED

His time was wasted by the fact that the function tracing self test
first prints:

"Testing tracer function: "

then runs the test, and if it succeeds, it prints "PASSED", giving us
the nice output you see above.

But with the new printk() changes, text without a newline gets buffered
and does not print out to the console at the location of the printk.
This caused the "Testing tracer function: " not to print out and because
the test caused the kernel to lock up, we are clueless to the fact that
the problem was with the function tracer test and not the rcu_torture
test. As it made sense that the rcu_torture test could lock up the
system, many kernel developer hours were wasted chasing the wrong wild
goose.

If the "Testing tracer function: " had printed out in the first place,
we would have caught the correct wild goose and saved precious kernel
developer's time.

This brings back the old way of printk() that always prints to the console
and does not buffer the data. As printk_emit() is used by other 'facilities'
this only affects printk(), but keeps pretty much the new behavior.

The printk() state of the last print is kept around to know how to print
the new lines. If the printk does not have a new line, the state is
stored in msg_print_text() and if a prefix is to be printed next, it
forces a newline. This keeps the new behavior when multiple partial prints are
happening across CPUs. Each one will have its own line (like it does today).

I hooked onto the 'msg->level' infrastructure, which already is used for
both the facility and level. I reserved the top 4 bits as flags for printk_emit()
to pass information to the msg_print_text() function which is called at a later
time via console_unlock(). The facility is limited to 0x1ff which should be
more than enough.

I tested this out, and now the lockup shows:

[ 9.018231] Kprobe smoke test passed successfully
[ 9.023084] rcu-torture:--- Start of test: nreaders=4 nfakewriters=4 stat_interval=0 verbose=0 test_no_idle_hz=0 shuffle_interval=3 stutter=5 irqreader=1 fqs_duration=0 fqs_hold
off=0 fqs_stutter=3 test_boost=1/0 test_boost_interval=7 test_boost_duration=4 shutdown_secs=0 onoff_interval=0 onoff_holdoff=0
[ 9.066065] Testing tracer function:

To prevent the flush to cause the next printk to have a timestamp that
would produce the following:

[ 6.834073] Testing tracer function: [ 7.136194] PASSED

I made the KERN_CONT ("<c>") use the facility as well to pass info to
not print the timestamp. This fixes the abve issue to print:

[ 1.291025] Testing tracer function: PASSED

When multiple SMP calls take place, this happens:

[ 1.310024] Testing tracer function: PASSED
[ 1.648504] Testing dynamic ftrace: PASSED
[ 2.099752] Testing dynamic ftrace ops #1: (1 0 1 1 0) (1 1 2 1 0)
[ 2.247911] Refined TSC clocksource calibration: 2659.999 MHz.
[ 2.254229] Switching to clocksource tsc
(2 1 3 1 22) (2 2 4 1 4044) PASSED
[ 2.572013] Testing dynamic ftrace ops #2: (1 0 1 11 0) (1 1 2 4623 0) (2 1 3 10096 170) (2 2 4 16097 6162) PASSED
[ 3.106015] Testing tracer irqsoff: PASSED

Link: http://lkml.kernel.org/r/[email protected]

Cc: Linus Torvalds <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "kay.sievers" <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Fengguang Wu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Joe Perches <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
---
kernel/printk.c | 117 ++++++++++++++++++++++++++++++++++++++++++------------
1 files changed, 91 insertions(+), 26 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 32462d2..e4da1a0 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -140,9 +140,9 @@ static int console_may_schedule;
* indicates a wrap-around to the beginning of the buffer.
*
* Every record carries the monotonic timestamp in microseconds, as well as
- * the standard userspace syslog level and syslog facility. The usual
- * kernel messages use LOG_KERN; userspace-injected messages always carry
- * a matching syslog facility, by default LOG_USER. The origin of every
+ * the standard userspace syslog level, kernel flags and syslog facility.
+ * The usual kernel messages use LOG_KERN; userspace-injected messages always
+ * carry a matching syslog facility, by default LOG_USER. The origin of every
* message can be reliably determined that way.
*
* The human readable log message directly follows the message header. The
@@ -198,9 +198,21 @@ struct log {
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
- u16 level; /* syslog level + facility */
+ u16 level; /* syslog level + facility + flags */
};

+#define FACILITY_MAX 0x1ff
+#define FACILITY_SHIFT 3
+#define FLAGS_SHIFT 12
+
+#define MSG_FACILITY(msg) (((msg) >> FACILITY_SHIFT) & FACILITY_MAX)
+#define MSG_FLAGS(msg) ((msg) >> FLAGS_SHIFT)
+#define MSG_LEVEL(msg) ((msg) & 7)
+
+#define LOG_NONL_NOPREFIX 3
+#define LOG_NONL 2
+#define LOG_NOPREFIX 1
+
/*
* The logbuf_lock protects kmsg buffer, indices, counters. It is also
* used in interesting ways to provide interlocking in console_unlock();
@@ -285,7 +297,7 @@ static u32 log_next(u32 idx)
}

/* insert record into the buffer, discard old ones, update heads */
-static void log_store(int facility, int level,
+static void log_store(int facility, int level, int flags,
const char *dict, u16 dict_len,
const char *text, u16 text_len)
{
@@ -329,7 +341,8 @@ static void log_store(int facility, int level,
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
msg->dict_len = dict_len;
- msg->level = (facility << 3) | (level & 7);
+ msg->level = (flags << FLAGS_SHIFT) |
+ ((facility & FACILITY_MAX) << FACILITY_SHIFT) | (level & 7);
msg->ts_nsec = local_clock();
memset(log_dict(msg) + dict_len, 0, pad_len);
msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
@@ -385,9 +398,9 @@ static ssize_t devkmsg_writev(struct kiocb *iocb, const struct iovec *iv,

i = simple_strtoul(line+1, &endp, 10);
if (endp && endp[0] == '>') {
- level = i & 7;
- if (i >> 3)
- facility = i >> 3;
+ level = MSG_LEVEL(i);
+ if (MSG_FACILITY(i))
+ facility = MSG_FACILITY(i);
endp++;
len -= endp - line;
line = endp;
@@ -819,6 +832,7 @@ static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
static size_t msg_print_text(const struct log *msg, bool syslog,
char *buf, size_t size)
{
+ static bool last_newline = true;
const char *text = log_text(msg);
size_t text_size = msg->text_len;
size_t len = 0;
@@ -836,14 +850,35 @@ static size_t msg_print_text(const struct log *msg, bool syslog,
}

if (buf) {
- if (print_prefix(msg, syslog, NULL) +
- text_len + 1>= size - len)
- break;
+ bool newline = true;
+ bool prefix = true;
+ int flags = MSG_FLAGS(msg->level);
+
+ if (flags & LOG_NOPREFIX)
+ prefix = false;
+ if (flags & LOG_NONL)
+ newline = false;
+
+ if (prefix) {
+ /*
+ * Force newline, for last line if this line
+ * is printing out a prefix.
+ */
+ if (!last_newline)
+ buf[len++] = '\n';
+
+ if (print_prefix(msg, syslog, NULL) +
+ text_len + 1 >= size - len)
+ break;
+
+ len += print_prefix(msg, syslog, buf + len);
+ }

- len += print_prefix(msg, syslog, buf + len);
memcpy(buf + len, text, text_len);
len += text_len;
- buf[len++] = '\n';
+ if (newline)
+ buf[len++] = '\n';
+ last_newline = newline;
} else {
/* SYSLOG_ACTION_* buffer size only calculation */
len += print_prefix(msg, syslog, NULL);
@@ -1267,6 +1302,7 @@ asmlinkage int vprintk_emit(int facility, int level,
static char cont_buf[LOG_LINE_MAX];
static size_t cont_len;
static int cont_level;
+ static bool cont_prefix;
static struct task_struct *cont_task;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
@@ -1275,8 +1311,12 @@ asmlinkage int vprintk_emit(int facility, int level,
int this_cpu;
bool newline = false;
bool prefix = false;
+ bool flush;
int printed_len = 0;

+ /* output from printk() always flush to console (no line buffering) */
+ flush = (facility == 0);
+
boot_delay_msec();
printk_delay();

@@ -1313,7 +1353,7 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 0;
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
- log_store(0, 2, NULL, 0, recursion_msg, printed_len);
+ log_store(0, 2, 0, NULL, 0, recursion_msg, printed_len);
}

/*
@@ -1350,26 +1390,47 @@ asmlinkage int vprintk_emit(int facility, int level,
newline = true;
}

+ /* Force prefix for new task */
+ if (cont_task != current)
+ prefix = true;
+
if (!newline) {
- if (cont_len && (prefix || cont_task != current)) {
+
+ if (cont_len && prefix) {
/*
* Flush earlier buffer, which is either from a
* different thread, or when we got a new prefix.
*/
- log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
+ log_store(facility, cont_level, 0, NULL, 0, cont_buf, cont_len);
cont_len = 0;
}

if (!cont_len) {
cont_level = level;
cont_task = current;
+ cont_prefix = prefix;
}

- /* buffer or append to earlier buffer from the same thread */
- if (cont_len + text_len > sizeof(cont_buf))
- text_len = sizeof(cont_buf) - cont_len;
- memcpy(cont_buf + cont_len, text, text_len);
- cont_len += text_len;
+ /* For printk() callers, always output to the console */
+ if (flush) {
+ int flag = LOG_NONL;
+
+ /*
+ * If the buffered string was KERN_CONT,
+ * do not print prefix.
+ */
+ if (!prefix)
+ flag = LOG_NONL_NOPREFIX;
+
+ log_store(0, cont_level, flag, NULL, 0, text, text_len);
+ } else {
+
+ /* buffer or append to earlier buffer from the same thread */
+ if (cont_len + text_len > sizeof(cont_buf))
+ text_len = sizeof(cont_buf) - cont_len;
+ memcpy(cont_buf + cont_len, text, text_len);
+ cont_len += text_len;
+ }
} else {
if (cont_len && cont_task == current) {
if (prefix) {
@@ -1378,7 +1439,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* either got no earlier newline, or we race
* with an interrupt.
*/
- log_store(facility, cont_level,
+ log_store(facility, cont_level, 0,
NULL, 0, cont_buf, cont_len);
cont_len = 0;
}
@@ -1388,14 +1449,18 @@ asmlinkage int vprintk_emit(int facility, int level,
text_len = sizeof(cont_buf) - cont_len;
memcpy(cont_buf + cont_len, text, text_len);
cont_len += text_len;
- log_store(facility, cont_level,
+ log_store(facility, cont_level, 0,
NULL, 0, cont_buf, cont_len);
cont_len = 0;
cont_task = NULL;
printed_len = cont_len;
} else {
+ int flag = 0;
+
/* ordinary single and terminated line */
- log_store(facility, level,
+ if (!prefix && !facility && cont_task == current)
+ flag = LOG_NOPREFIX;
+ log_store(facility, level, flag,
dict, dictlen, text, text_len);
printed_len = text_len;
}
@@ -1817,7 +1882,7 @@ again:
break;

msg = log_from_idx(console_idx);
- level = msg->level & 7;
+ level = MSG_LEVEL(msg->level);

len = msg_print_text(msg, false, text, sizeof(text));

--
1.7.3.4



2012-06-25 22:07:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, 25 Jun 2012 15:05:42 -0400
Steven Rostedt <[email protected]> wrote:

>
> This brings back the old way of printk() that always prints to the console
> and does not buffer the data. As printk_emit() is used by other 'facilities'
> this only affects printk(), but keeps pretty much the new behavior.

If you say so. The core printk code is starting to make one think of
things like "cleansing with fire".

Why did that logging code need to futz with this printk behaviour in
the first place?

I don't think the thing which you're fixing here was a real
showstopper. If a non-newline-terminated printk gets delayed, then so
be it - we identify and fix all callers and remember the new rule and
move on, although I still don't know why this change was deemed
necessary. If that results in a cleaner, clearer and more reliable
core printk() then we win.

And we do need a cleaner, clearer and more reliable core printk() :(
wtf have you guys been up to??

<wonders why vprintk_emit() is asmlinkage, global, exported to modules
and has no callers outside printk.c>

> /* This stops the holder of console_sem just where we want him */

This poor comment is now stranded half a page away from the code it's
documenting. Please check that it's still true and then relocate it?

>
> ...
>

must.. stop.. thinking.. about.. fire..

2012-06-25 23:55:37

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, Jun 25, 2012 at 03:07:22PM -0700, Andrew Morton wrote:
> On Mon, 25 Jun 2012 15:05:42 -0400
> Steven Rostedt <[email protected]> wrote:
>
> >
> > This brings back the old way of printk() that always prints to the console
> > and does not buffer the data. As printk_emit() is used by other 'facilities'
> > this only affects printk(), but keeps pretty much the new behavior.
>
> If you say so. The core printk code is starting to make one think of
> things like "cleansing with fire".
>
> Why did that logging code need to futz with this printk behaviour in
> the first place?

Because, as Kay, and I think Linus, showed a while ago, the behavior
really wasn't all that good and it wasn't doing what people thought it
was doing, especially when printks from multiple cpus happened at the
same time. There were other cases that had problems as well, the long
thread a few months ago detailed it all, which drove the creation of
these changes.

> I don't think the thing which you're fixing here was a real
> showstopper. If a non-newline-terminated printk gets delayed, then so
> be it - we identify and fix all callers and remember the new rule and
> move on, although I still don't know why this change was deemed
> necessary. If that results in a cleaner, clearer and more reliable
> core printk() then we win.

I'm beginning to agree here as well.

Actually, didn't I say that at the beginning of this? :)

Stephen and Ingo, I understand that your tests now would require
multiple printk() lines, but this affects what, 10 boxes in the world
that run these tests (I'm not trying to be mean, just understand the
issues). The fixes that now are in place fix problems for many more
systems, and provide the infrastructure for proper logging that people
have been screaming at us for over 10 years to accomplish.

I'll be glad to make the patch to fix up these test printks, (remember,
as a bonus, you now get a timestamp showing exactly how long your tests
take, which you didn't get before, how can you pass up an offer like
that?) I think that's the best solution here, and easiest for everyone
involved (your tests then properly show what failed, you get bonus
timestamps, multi-cpu printks happen properly, proper logging messages
get sent to userspace, everone wins.)

> And we do need a cleaner, clearer and more reliable core printk() :(
> wtf have you guys been up to??

See, Andrew even agrees with me :)

thanks,

greg k-h

2012-06-26 00:01:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, Jun 25, 2012 at 4:55 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Stephen and Ingo, I understand that your tests now would require
> multiple printk() lines, but this affects what, 10 boxes in the world
> that run these tests (I'm not trying to be mean, just understand the
> issues). ?The fixes that now are in place fix problems for many more
> systems, and provide the infrastructure for proper logging that people
> have been screaming at us for over 10 years to accomplish.

I disagree violently.

I think we absolutely should apply Steven's patch.

Why? Because the buffering does not help *anything*, and it's
surprising, and it breaks one of our main debugging tools. There's no
upside to it.

The fact that we found *one* case where it broke within days of it
being introduced is not the issue. Fixing that one case is irrelevant.
It's the unknown number of other cases that did similar thngs that
matter.

If there are other places that print out partial lines, they may have
this problem too. Don't buffer.

And if there are *not* other places that print out partial lines, then
buffering doesn't help. Don't buffer.

Notice? Buffering partial lines is never *ever* the right thing to do
for something like printk.

If you want to merge the partial lines, do it at the *logging* stage,
not at the printout stage. Nobody cares if you buffer the stuff that
actually makes it to "dmesg". But buffering the stuff before it makes
it to the screen is just wrong.

Linus

2012-06-26 00:18:36

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, 2012-06-25 at 15:07 -0700, Andrew Morton wrote:
> The core printk code is starting to make one think of
> things like "cleansing with fire".

I've been waiting to have the code settle a bit before
restructuring/refactoring it.

I've got a tree with printk.c moved to kernel/printk/ and
broken up into what I think are more sensible bits.

Maybe I'll submit it in awhile. I think it's probably
better to wait until after 3.6-rc though.

> must.. stop.. thinking.. about.. fire..

Please remember to aim before firing.

cheers, Joe

2012-06-26 00:23:14

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, Jun 25, 2012 at 05:01:11PM -0700, Linus Torvalds wrote:
> On Mon, Jun 25, 2012 at 4:55 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> >
> > Stephen and Ingo, I understand that your tests now would require
> > multiple printk() lines, but this affects what, 10 boxes in the world
> > that run these tests (I'm not trying to be mean, just understand the
> > issues). ?The fixes that now are in place fix problems for many more
> > systems, and provide the infrastructure for proper logging that people
> > have been screaming at us for over 10 years to accomplish.
>
> I disagree violently.
>
> I think we absolutely should apply Steven's patch.
>
> Why? Because the buffering does not help *anything*, and it's
> surprising, and it breaks one of our main debugging tools. There's no
> upside to it.

Ok, but I thought you wanted the "properly handle continuations" that we
now have in the kernel. I must be mistaken.

> The fact that we found *one* case where it broke within days of it
> being introduced is not the issue. Fixing that one case is irrelevant.
> It's the unknown number of other cases that did similar thngs that
> matter.
>
> If there are other places that print out partial lines, they may have
> this problem too. Don't buffer.
>
> And if there are *not* other places that print out partial lines, then
> buffering doesn't help. Don't buffer.
>
> Notice? Buffering partial lines is never *ever* the right thing to do
> for something like printk.
>
> If you want to merge the partial lines, do it at the *logging* stage,
> not at the printout stage. Nobody cares if you buffer the stuff that
> actually makes it to "dmesg". But buffering the stuff before it makes
> it to the screen is just wrong.

Ok, Kay, does Stephen's patch work for you as well? I'll go boot a box
with it and look at the output, but that will take 30 minutes or so...

greg k-h

2012-06-26 00:41:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, Jun 25, 2012 at 5:23 PM, Greg Kroah-Hartman
<[email protected]> wrote:
>
> Ok, but I thought you wanted the "properly handle continuations" that we
> now have in the kernel. ?I must be mistaken.

We had that before too. It has nothing to do with merging the data,
and the problem was just an outright *bug* in the initial record-based
implementation. Nothing to do with buffering.

Line continuations should happen if the previous printk didn't end
with a '\n', and if the next one doesn't have a level. That was always
the rule. It's a simple rule, and it works.

Now, we can (and Key did) tweak the rule a little bit for cases that
never actually happen in practice.

So sure, add the "we only continue on the same line if we're the same
process as the last process that did the printk" rule to the above
rule. In practice it really doesn't matter, since partial lines are
rare to begin with. And taking interrupt depth into account is another
nice tweak. But it's a tweak, it's not important. Nobody really cares,
but you can try to do a bit better.

And none of the above has to do with *buffering*. The above rules are
100% the same whether you buffer or not.

Sure, with buffering, you can handle the insane cases and try to get
interspersed partial lines to do the right thing. And Kay did that.
And it turns out to (a) never matter and (b) cause problems for
debugging. So just saying "stop doing it" is the no-brainer solution.
Don't buffer printouts to the display, because bad things may happen
before the buffer is flushed.

Linus

2012-06-26 00:56:40

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 2:40 AM, Linus Torvalds
<[email protected]> wrote:
> On Mon, Jun 25, 2012 at 5:23 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
>>
>> Ok, but I thought you wanted the "properly handle continuations" that we
>> now have in the kernel.  I must be mistaken.
>
> We had that before too. It has nothing to do with merging the data,
> and the problem was just an outright *bug* in the initial record-based
> implementation. Nothing to do with buffering.
>
> Line continuations should happen if the previous printk didn't end
> with a '\n', and if the next one doesn't have a level. That was always
> the rule. It's a simple rule, and it works.
>
> Now, we can (and Key did) tweak the rule a little bit for cases that
> never actually happen in practice.
>
> So sure, add the "we only continue on the same line if we're the same
> process as the last process that did the printk" rule to the above
> rule. In practice it really doesn't matter, since partial lines are
> rare to begin with. And taking interrupt depth into account is another
> nice tweak. But it's a tweak, it's not important. Nobody really cares,
> but you can try to do a bit better.
>
> And none of the above has to do with *buffering*. The above rules are
> 100% the same whether you buffer or not.
>
> Sure, with buffering, you can handle the insane cases and try to get
> interspersed partial lines to do the right thing. And Kay did that.
> And it turns out to (a) never matter and (b) cause problems for
> debugging. So just saying "stop doing it" is the no-brainer solution.
> Don't buffer printouts to the display, because bad things may happen
> before the buffer is flushed.

Buffering has nice effects though:
It makes continuation lines appear as one record in the buffer, not as
n individual prints with n headers.

It fully isolates the race-sensitive continuation print users from the
normal terminated print users. With buffering, races against each
other are a lot more unlikely to happen, because the normal terminated
line prints never affect the sensitive continuation users, and we
would need two racy continuation users to race against each other. And
we did see the interleaving all the time with the old printk.

Here is what our test module has produced. And that could only be
achieved by doing buffering:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c313af145b9bc4fb8e8e0c83b8cfc10e1b894a50

I'm not arguing anymore, but I have to defend that buffering produces
far better results than without. Granted, besides for continuation
users that crash the kernel and want to see the partial line they
printed.

Kay

2012-06-26 01:41:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, Jun 25, 2012 at 5:56 PM, Kay Sievers <[email protected]> wrote:
>
> Buffering has nice effects though:
> It makes continuation lines appear as one record in the buffer, not as
> n individual prints with n headers.

As I already mentioned, buffering for *logging* is different from
buffering for *printing*.

I think it might be a great idea to buffer for logging in order to
generate one individual buffer record there.

But it needs to be printed as it is generated.

The printing and the recording really should be thought of as separate
issues. You already record things that aren't printed, so they already
aren't 1:1 relationships.

Linus

2012-06-26 16:07:50

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Mon, 2012-06-25 at 18:40 -0700, Linus Torvalds wrote:
> On Mon, Jun 25, 2012 at 5:56 PM, Kay Sievers <[email protected]> wrote:
> >
> > Buffering has nice effects though:
> > It makes continuation lines appear as one record in the buffer, not as
> > n individual prints with n headers.
>
> As I already mentioned, buffering for *logging* is different from
> buffering for *printing*.
>
> I think it might be a great idea to buffer for logging in order to
> generate one individual buffer record there.
>
> But it needs to be printed as it is generated.

That's a good idea.

Something like this could work - only minimally tested at this moment.

Thanks,
Kay


From: Kay Sievers <[email protected]>
Subject: printk: flush continuation lines immediately to console

Continuation lines are buffered internally to merge the chunked
printk()s into a single record, and to isolate potentially racy
continuation users from usual terminated line users.

This though, has the effect that partial lines are not printed to
the console in the moment they are emitted. In case the kernel
crashes in the meantime, the potentially interesting printed
information will never reach the consoles.

Here we share the continuation buffer with the console copy logic,
and partial lines are immediately flushed to the consoles, but still
buffered internally to improve the integrity of the messages.
---

kernel/printk.c | 129 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 101 insertions(+), 28 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index a2276b9..92c34bd1 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -193,12 +193,19 @@ static int console_may_schedule;
* separated by ',', and find the message after the ';' character.
*/

+enum log_flags {
+ LOG_DEFAULT = 0,
+ LOG_NOCONS = 1, /* already flushed, do not print to console */
+};
+
struct log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
- u16 level; /* syslog level + facility */
+ u8 facility; /* syslog facility */
+ u8 flags:5; /* internal record flags */
+ u8 level:3; /* syslog level */
};

/*
@@ -286,6 +293,7 @@ static u32 log_next(u32 idx)

/* insert record into the buffer, discard old ones, update heads */
static void log_store(int facility, int level,
+ enum log_flags flags, u64 ts_nsec,
const char *dict, u16 dict_len,
const char *text, u16 text_len)
{
@@ -329,8 +337,13 @@ static void log_store(int facility, int level,
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
msg->dict_len = dict_len;
- msg->level = (facility << 3) | (level & 7);
- msg->ts_nsec = local_clock();
+ msg->facility = facility;
+ msg->level = level & 7;
+ msg->flags = flags & 0x1f;
+ if (ts_nsec > 0)
+ msg->ts_nsec = ts_nsec;
+ else
+ msg->ts_nsec = local_clock();
memset(log_dict(msg) + dict_len, 0, pad_len);
msg->len = sizeof(struct log) + text_len + dict_len + pad_len;

@@ -446,7 +459,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
ts_usec = msg->ts_nsec;
do_div(ts_usec, 1000);
len = sprintf(user->buf, "%u,%llu,%llu;",
- msg->level, user->seq, ts_usec);
+ (msg->facility << 3) | msg->level, user->seq, ts_usec);

/* escape non-printable characters */
for (i = 0; i < msg->text_len; i++) {
@@ -787,6 +800,21 @@ static bool printk_time;
#endif
module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);

+static size_t print_time(u64 ts, char *buf)
+{
+ unsigned long rem_nsec;
+
+ if (!printk_time)
+ return 0;
+
+ if (!buf)
+ return 15;
+
+ rem_nsec = do_div(ts, 1000000000);
+ return sprintf(buf, "[%5lu.%06lu] ",
+ (unsigned long)ts, rem_nsec / 1000);
+}
+
static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
{
size_t len = 0;
@@ -803,18 +831,7 @@ static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
}
}

- if (printk_time) {
- if (buf) {
- unsigned long long ts = msg->ts_nsec;
- unsigned long rem_nsec = do_div(ts, 1000000000);
-
- len += sprintf(buf + len, "[%5lu.%06lu] ",
- (unsigned long) ts, rem_nsec / 1000);
- } else {
- len += 15;
- }
- }
-
+ len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
return len;
}

@@ -1272,15 +1289,25 @@ static inline void printk_delay(void)
}
}

+/*
+ * Continuation lines are buffered, and not committed to the record buffer
+ * until the line is complete, or a race forces a flush. The line fragments
+ * though, are printed immediately to the consoles to ensure everything has
+ * reached the console in case of a kernel crash.
+ */
+static char cont_buf[LOG_LINE_MAX];
+static size_t cont_len;
+static size_t cont_cons;
+static size_t cont_cons_len;
+static int cont_level;
+static u64 cont_ts_nsec;
+static struct task_struct *cont_task;
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
{
static int recursion_bug;
- static char cont_buf[LOG_LINE_MAX];
- static size_t cont_len;
- static int cont_level;
- static struct task_struct *cont_task;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
size_t text_len;
@@ -1326,7 +1353,8 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 0;
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
- log_store(0, 2, NULL, 0, recursion_msg, printed_len);
+ log_store(0, 2, LOG_DEFAULT, 0,
+ NULL, 0, recursion_msg, printed_len);
}

/*
@@ -1369,13 +1397,17 @@ asmlinkage int vprintk_emit(int facility, int level,
* Flush earlier buffer, which is either from a
* different thread, or when we got a new prefix.
*/
- log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
+ log_store(facility, cont_level,
+ LOG_NOCONS, cont_ts_nsec,
+ NULL, 0, cont_buf, cont_len);
cont_len = 0;
}

if (!cont_len) {
+ cont_cons = 0;
cont_level = level;
cont_task = current;
+ cont_ts_nsec = local_clock();
}

/* buffer or append to earlier buffer from the same thread */
@@ -1383,6 +1415,8 @@ asmlinkage int vprintk_emit(int facility, int level,
text_len = sizeof(cont_buf) - cont_len;
memcpy(cont_buf + cont_len, text, text_len);
cont_len += text_len;
+ cont_cons_len = cont_len;
+ printed_len = text_len;
} else {
if (cont_len && cont_task == current) {
if (prefix) {
@@ -1392,6 +1426,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* with an interrupt.
*/
log_store(facility, cont_level,
+ LOG_NOCONS, cont_ts_nsec,
NULL, 0, cont_buf, cont_len);
cont_len = 0;
}
@@ -1401,14 +1436,15 @@ asmlinkage int vprintk_emit(int facility, int level,
text_len = sizeof(cont_buf) - cont_len;
memcpy(cont_buf + cont_len, text, text_len);
cont_len += text_len;
+ cont_cons_len = cont_len;
log_store(facility, cont_level,
+ LOG_NOCONS, cont_ts_nsec,
NULL, 0, cont_buf, cont_len);
cont_len = 0;
- cont_task = NULL;
- printed_len = cont_len;
+ printed_len = text_len;
} else {
/* ordinary single and terminated line */
- log_store(facility, level,
+ log_store(facility, level, LOG_DEFAULT, 0,
dict, dictlen, text, text_len);
printed_len = text_len;
}
@@ -1795,6 +1831,7 @@ static u32 console_idx;
*/
void console_unlock(void)
{
+ static char text[LOG_LINE_MAX];
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
@@ -1807,10 +1844,37 @@ void console_unlock(void)

console_may_schedule = 0;

+ /* flush buffered message fragment immediately to console */
+ raw_spin_lock_irqsave(&logbuf_lock, flags);
+ if (cont_cons_len && (cont_cons < cont_cons_len || !cont_len)) {
+ size_t textlen = 0;
+ size_t len;
+
+ len = cont_cons_len - cont_cons;
+ if (!cont_cons)
+ textlen = print_time(cont_ts_nsec, text);
+ memcpy(text + textlen, cont_buf + cont_cons, len);
+ textlen += len;
+ cont_cons = cont_cons_len;
+
+ if (!cont_len) {
+ /* last chunk of line; terminate */
+ text[textlen++] = '\n';
+ cont_cons_len = 0;
+ }
+ raw_spin_unlock(&logbuf_lock);
+
+ stop_critical_timings();
+ call_console_drivers(cont_level, text, textlen);
+ start_critical_timings();
+
+ local_irq_restore(flags);
+ } else
+ raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+
again:
for (;;) {
struct log *msg;
- static char text[LOG_LINE_MAX];
size_t len;
int level;

@@ -1825,13 +1889,22 @@ again:
console_seq = log_first_seq;
console_idx = log_first_idx;
}
-
+skip:
if (console_seq == log_next_seq)
break;

msg = log_from_idx(console_idx);
- level = msg->level & 7;
+ if (msg->flags & LOG_NOCONS) {
+ /*
+ * Skip record we have buffered and already printed
+ * directly to the console when we received it.
+ */
+ console_idx = log_next(console_idx);
+ console_seq++;
+ goto skip;
+ }

+ level = msg->level;
len = msg_print_text(msg, false, text, sizeof(text));

console_idx = log_next(console_idx);

2012-06-26 16:30:57

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, 2012-06-26 at 18:07 +0200, Kay Sievers wrote:
[]

Just a couple trivial notes:

> diff --git a/kernel/printk.c b/kernel/printk.c
[]
> @@ -1795,6 +1831,7 @@ static u32 console_idx;
> */
> void console_unlock(void)
> {
> + static char text[LOG_LINE_MAX];

Is this static to avoid excessive stack?
Might be useful to note that.

> static u64 seen_seq;
> unsigned long flags;
> bool wake_klogd = false;
> @@ -1807,10 +1844,37 @@ void console_unlock(void)
>
> console_may_schedule = 0;
>
> + /* flush buffered message fragment immediately to console */
> + raw_spin_lock_irqsave(&logbuf_lock, flags);
> + if (cont_cons_len && (cont_cons < cont_cons_len || !cont_len)) {
> + size_t textlen = 0;
> + size_t len;
> +
> + len = cont_cons_len - cont_cons;
> + if (!cont_cons)
> + textlen = print_time(cont_ts_nsec, text);
> + memcpy(text + textlen, cont_buf + cont_cons, len);

This could overrun the 512 bytes of text yes?

2012-06-26 16:59:02

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 06:07:36PM +0200, Kay Sievers wrote:
> On Mon, 2012-06-25 at 18:40 -0700, Linus Torvalds wrote:
> > On Mon, Jun 25, 2012 at 5:56 PM, Kay Sievers <[email protected]> wrote:
> > >
> > > Buffering has nice effects though:
> > > It makes continuation lines appear as one record in the buffer, not as
> > > n individual prints with n headers.
> >
> > As I already mentioned, buffering for *logging* is different from
> > buffering for *printing*.
> >
> > I think it might be a great idea to buffer for logging in order to
> > generate one individual buffer record there.
> >
> > But it needs to be printed as it is generated.
>
> That's a good idea.
>
> Something like this could work - only minimally tested at this moment.

Hm, this doesn't boot for me, just hangs at startup :(

2012-06-26 17:01:00

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 6:58 PM, Greg Kroah-Hartman
<[email protected]> wrote:
> On Tue, Jun 26, 2012 at 06:07:36PM +0200, Kay Sievers wrote:
>> On Mon, 2012-06-25 at 18:40 -0700, Linus Torvalds wrote:
>> > On Mon, Jun 25, 2012 at 5:56 PM, Kay Sievers <[email protected]> wrote:
>> > >
>> > > Buffering has nice effects though:
>> > > It makes continuation lines appear as one record in the buffer, not as
>> > > n individual prints with n headers.
>> >
>> > As I already mentioned, buffering for *logging* is different from
>> > buffering for *printing*.
>> >
>> > I think it might be a great idea to buffer for logging in order to
>> > generate one individual buffer record there.
>> >
>> > But it needs to be printed as it is generated.
>>
>> That's a good idea.
>>
>> Something like this could work - only minimally tested at this moment.
>
> Hm, this doesn't boot for me, just hangs at startup :(

You have that patch against your tree? Let me try that, maybe it
conflicts with one of the patches there ...

Kay

2012-06-26 17:02:38

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 07:00:38PM +0200, Kay Sievers wrote:
> On Tue, Jun 26, 2012 at 6:58 PM, Greg Kroah-Hartman
> <[email protected]> wrote:
> > On Tue, Jun 26, 2012 at 06:07:36PM +0200, Kay Sievers wrote:
> >> On Mon, 2012-06-25 at 18:40 -0700, Linus Torvalds wrote:
> >> > On Mon, Jun 25, 2012 at 5:56 PM, Kay Sievers <[email protected]> wrote:
> >> > >
> >> > > Buffering has nice effects though:
> >> > > It makes continuation lines appear as one record in the buffer, not as
> >> > > n individual prints with n headers.
> >> >
> >> > As I already mentioned, buffering for *logging* is different from
> >> > buffering for *printing*.
> >> >
> >> > I think it might be a great idea to buffer for logging in order to
> >> > generate one individual buffer record there.
> >> >
> >> > But it needs to be printed as it is generated.
> >>
> >> That's a good idea.
> >>
> >> Something like this could work - only minimally tested at this moment.
> >
> > Hm, this doesn't boot for me, just hangs at startup :(
>
> You have that patch against your tree? Let me try that, maybe it
> conflicts with one of the patches there ...

No, I applied to to a clean 3.5-rc4 (which boots fine on my laptop).

greg k-h

2012-06-26 18:34:17

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 07:00:38PM +0200, Kay Sievers wrote:
> > Hm, this doesn't boot for me, just hangs at startup :(

Booting in a qemu image I get the following, it just hangs here:

[ 2.683675] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
Started File System Check on Root Device [ OK ]
Starting LSB: setup hostname and yp...
[ 2.732973] systemd-fsck[108]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
Starting Remount Root FS...
systemd-fsck[107]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
[ 2.746312] EXT4-fs (sda2): re-mounted. Opts: acl,user_xattr
Started Remount Root FS [ OK ]
Starting Load Random Seed...
Started Load Random Seed [ OK ]
[ 2.797945] boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
[ 2.898718] boot.localnet[133]: Setting up hostname 'pistachio'..done
boot.localnet[133]: Setting up hostname 'pistachio'..done
boot.localnet[ 2.973681] boot.localnet[133]: Setting up loopback interface RTNETLINK answers: File exists
[133]: Setting up loopback interface RTNETLINK answers: File exists
boot.localnet[133]: ..done
[ 2.979959] boot.localnet[108]: ..done
Started LSB: setup hostname and yp [ OK ]
Started Setup Virtual Console [ OK ]
[ 3.056710] input: PC Speaker as /devices/platform/pcspkr/input/input3


any ideas?

2012-06-26 18:38:34

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 11:34:05AM -0700, Greg Kroah-Hartman wrote:
> On Tue, Jun 26, 2012 at 07:00:38PM +0200, Kay Sievers wrote:
> > > Hm, this doesn't boot for me, just hangs at startup :(
>
> Booting in a qemu image I get the following, it just hangs here:
>
> [ 2.683675] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
> Started File System Check on Root Device [ OK ]
> Starting LSB: setup hostname and yp...
> [ 2.732973] systemd-fsck[108]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
> Starting Remount Root FS...
> systemd-fsck[107]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
> [ 2.746312] EXT4-fs (sda2): re-mounted. Opts: acl,user_xattr
> Started Remount Root FS [ OK ]
> Starting Load Random Seed...
> Started Load Random Seed [ OK ]
> [ 2.797945] boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
> boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
> [ 2.898718] boot.localnet[133]: Setting up hostname 'pistachio'..done
> boot.localnet[133]: Setting up hostname 'pistachio'..done
> boot.localnet[ 2.973681] boot.localnet[133]: Setting up loopback interface RTNETLINK answers: File exists
> [133]: Setting up loopback interface RTNETLINK answers: File exists
> boot.localnet[133]: ..done
> [ 2.979959] boot.localnet[108]: ..done
> Started LSB: setup hostname and yp [ OK ]
> Started Setup Virtual Console [ OK ]
> [ 3.056710] input: PC Speaker as /devices/platform/pcspkr/input/input3

Ok, without your patch I get the same hang within qemu, so odds are it's
not your fault here, but booting still fails. I wonder if it's due to
whatever oddness suse does with it's initrd, let me go try to get a
initrd-free boot and see if that works...

greg k-h

2012-06-26 18:48:23

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, Jun 26, 2012 at 11:38:28AM -0700, Greg Kroah-Hartman wrote:
> On Tue, Jun 26, 2012 at 11:34:05AM -0700, Greg Kroah-Hartman wrote:
> > On Tue, Jun 26, 2012 at 07:00:38PM +0200, Kay Sievers wrote:
> > > > Hm, this doesn't boot for me, just hangs at startup :(
> >
> > Booting in a qemu image I get the following, it just hangs here:
> >
> > [ 2.683675] BIOS EDD facility v0.16 2004-Jun-25, 1 devices found
> > Started File System Check on Root Device [ OK ]
> > Starting LSB: setup hostname and yp...
> > [ 2.732973] systemd-fsck[108]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
> > Starting Remount Root FS...
> > systemd-fsck[107]: root: clean, 244481/1313280 files, 2189115/5242880 blocks
> > [ 2.746312] EXT4-fs (sda2): re-mounted. Opts: acl,user_xattr
> > Started Remount Root FS [ OK ]
> > Starting Load Random Seed...
> > Started Load Random Seed [ OK ]
> > [ 2.797945] boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
> > boot.localnet[133]: Using boot-specified hostname 'pistachio.kroah.org'
> > [ 2.898718] boot.localnet[133]: Setting up hostname 'pistachio'..done
> > boot.localnet[133]: Setting up hostname 'pistachio'..done
> > boot.localnet[ 2.973681] boot.localnet[133]: Setting up loopback interface RTNETLINK answers: File exists
> > [133]: Setting up loopback interface RTNETLINK answers: File exists
> > boot.localnet[133]: ..done
> > [ 2.979959] boot.localnet[108]: ..done
> > Started LSB: setup hostname and yp [ OK ]
> > Started Setup Virtual Console [ OK ]
> > [ 3.056710] input: PC Speaker as /devices/platform/pcspkr/input/input3
>
> Ok, without your patch I get the same hang within qemu, so odds are it's
> not your fault here, but booting still fails. I wonder if it's due to
> whatever oddness suse does with it's initrd, let me go try to get a
> initrd-free boot and see if that works...

Doh, stupid 'quiet' boot mode, it was hiding where things were hanging.

I get the following as the last line on the terminal before it locks up
(I sent you the full image off-list):
Booting Node 0, Processors #1 #2 #3

And I have 4 processors in this laptop, so odds are that's the first
time the continuation printk logic kicks in.

Any ideas?

greg k-h

2012-06-27 15:14:04

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Tue, 2012-06-26 at 18:07 +0200, Kay Sievers wrote:

> From: Kay Sievers <[email protected]>
> Subject: printk: flush continuation lines immediately to console
>
> Continuation lines are buffered internally to merge the chunked
> printk()s into a single record, and to isolate potentially racy
> continuation users from usual terminated line users.
>
> This though, has the effect that partial lines are not printed to
> the console in the moment they are emitted. In case the kernel
> crashes in the meantime, the potentially interesting printed
> information will never reach the consoles.
>
> Here we share the continuation buffer with the console copy logic,
> and partial lines are immediately flushed to the consoles, but still
> buffered internally to improve the integrity of the messages.

I just tested this patch and it boots fine for me. Better still, I put
in my 'delay' to make sure the output shows up as it is printed, and it
seems to do so.

Tested-by: Steven Rostedt <[email protected]>

-- Steve

> ---
>
> kernel/printk.c | 129 +++++++++++++++++++++++++++++++++++++++++++------------
> 1 file changed, 101 insertions(+), 28 deletions(-)
>
> diff --git a/kernel/printk.c b/kernel/printk.c
> index a2276b9..92c34bd1 100644
> --- a/kernel/printk.c
> +++ b/kernel/printk.c
> @@ -193,12 +193,19 @@ static int console_may_schedule;
> * separated by ',', and find the message after the ';' character.
> */
>
> +enum log_flags {
> + LOG_DEFAULT = 0,
> + LOG_NOCONS = 1, /* already flushed, do not print to console */
> +};
> +
> struct log {
> u64 ts_nsec; /* timestamp in nanoseconds */
> u16 len; /* length of entire record */
> u16 text_len; /* length of text buffer */
> u16 dict_len; /* length of dictionary buffer */
> - u16 level; /* syslog level + facility */
> + u8 facility; /* syslog facility */
> + u8 flags:5; /* internal record flags */
> + u8 level:3; /* syslog level */
> };
>
> /*
> @@ -286,6 +293,7 @@ static u32 log_next(u32 idx)
>
> /* insert record into the buffer, discard old ones, update heads */
> static void log_store(int facility, int level,
> + enum log_flags flags, u64 ts_nsec,
> const char *dict, u16 dict_len,
> const char *text, u16 text_len)
> {
> @@ -329,8 +337,13 @@ static void log_store(int facility, int level,
> msg->text_len = text_len;
> memcpy(log_dict(msg), dict, dict_len);
> msg->dict_len = dict_len;
> - msg->level = (facility << 3) | (level & 7);
> - msg->ts_nsec = local_clock();
> + msg->facility = facility;
> + msg->level = level & 7;
> + msg->flags = flags & 0x1f;
> + if (ts_nsec > 0)
> + msg->ts_nsec = ts_nsec;
> + else
> + msg->ts_nsec = local_clock();
> memset(log_dict(msg) + dict_len, 0, pad_len);
> msg->len = sizeof(struct log) + text_len + dict_len + pad_len;
>
> @@ -446,7 +459,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
> ts_usec = msg->ts_nsec;
> do_div(ts_usec, 1000);
> len = sprintf(user->buf, "%u,%llu,%llu;",
> - msg->level, user->seq, ts_usec);
> + (msg->facility << 3) | msg->level, user->seq, ts_usec);
>
> /* escape non-printable characters */
> for (i = 0; i < msg->text_len; i++) {
> @@ -787,6 +800,21 @@ static bool printk_time;
> #endif
> module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);
>
> +static size_t print_time(u64 ts, char *buf)
> +{
> + unsigned long rem_nsec;
> +
> + if (!printk_time)
> + return 0;
> +
> + if (!buf)
> + return 15;
> +
> + rem_nsec = do_div(ts, 1000000000);
> + return sprintf(buf, "[%5lu.%06lu] ",
> + (unsigned long)ts, rem_nsec / 1000);
> +}
> +
> static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
> {
> size_t len = 0;
> @@ -803,18 +831,7 @@ static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
> }
> }
>
> - if (printk_time) {
> - if (buf) {
> - unsigned long long ts = msg->ts_nsec;
> - unsigned long rem_nsec = do_div(ts, 1000000000);
> -
> - len += sprintf(buf + len, "[%5lu.%06lu] ",
> - (unsigned long) ts, rem_nsec / 1000);
> - } else {
> - len += 15;
> - }
> - }
> -
> + len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
> return len;
> }
>
> @@ -1272,15 +1289,25 @@ static inline void printk_delay(void)
> }
> }
>
> +/*
> + * Continuation lines are buffered, and not committed to the record buffer
> + * until the line is complete, or a race forces a flush. The line fragments
> + * though, are printed immediately to the consoles to ensure everything has
> + * reached the console in case of a kernel crash.
> + */
> +static char cont_buf[LOG_LINE_MAX];
> +static size_t cont_len;
> +static size_t cont_cons;
> +static size_t cont_cons_len;
> +static int cont_level;
> +static u64 cont_ts_nsec;
> +static struct task_struct *cont_task;
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const char *dict, size_t dictlen,
> const char *fmt, va_list args)
> {
> static int recursion_bug;
> - static char cont_buf[LOG_LINE_MAX];
> - static size_t cont_len;
> - static int cont_level;
> - static struct task_struct *cont_task;
> static char textbuf[LOG_LINE_MAX];
> char *text = textbuf;
> size_t text_len;
> @@ -1326,7 +1353,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> recursion_bug = 0;
> printed_len += strlen(recursion_msg);
> /* emit KERN_CRIT message */
> - log_store(0, 2, NULL, 0, recursion_msg, printed_len);
> + log_store(0, 2, LOG_DEFAULT, 0,
> + NULL, 0, recursion_msg, printed_len);
> }
>
> /*
> @@ -1369,13 +1397,17 @@ asmlinkage int vprintk_emit(int facility, int level,
> * Flush earlier buffer, which is either from a
> * different thread, or when we got a new prefix.
> */
> - log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
> + log_store(facility, cont_level,
> + LOG_NOCONS, cont_ts_nsec,
> + NULL, 0, cont_buf, cont_len);
> cont_len = 0;
> }
>
> if (!cont_len) {
> + cont_cons = 0;
> cont_level = level;
> cont_task = current;
> + cont_ts_nsec = local_clock();
> }
>
> /* buffer or append to earlier buffer from the same thread */
> @@ -1383,6 +1415,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> text_len = sizeof(cont_buf) - cont_len;
> memcpy(cont_buf + cont_len, text, text_len);
> cont_len += text_len;
> + cont_cons_len = cont_len;
> + printed_len = text_len;
> } else {
> if (cont_len && cont_task == current) {
> if (prefix) {
> @@ -1392,6 +1426,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> * with an interrupt.
> */
> log_store(facility, cont_level,
> + LOG_NOCONS, cont_ts_nsec,
> NULL, 0, cont_buf, cont_len);
> cont_len = 0;
> }
> @@ -1401,14 +1436,15 @@ asmlinkage int vprintk_emit(int facility, int level,
> text_len = sizeof(cont_buf) - cont_len;
> memcpy(cont_buf + cont_len, text, text_len);
> cont_len += text_len;
> + cont_cons_len = cont_len;
> log_store(facility, cont_level,
> + LOG_NOCONS, cont_ts_nsec,
> NULL, 0, cont_buf, cont_len);
> cont_len = 0;
> - cont_task = NULL;
> - printed_len = cont_len;
> + printed_len = text_len;
> } else {
> /* ordinary single and terminated line */
> - log_store(facility, level,
> + log_store(facility, level, LOG_DEFAULT, 0,
> dict, dictlen, text, text_len);
> printed_len = text_len;
> }
> @@ -1795,6 +1831,7 @@ static u32 console_idx;
> */
> void console_unlock(void)
> {
> + static char text[LOG_LINE_MAX];
> static u64 seen_seq;
> unsigned long flags;
> bool wake_klogd = false;
> @@ -1807,10 +1844,37 @@ void console_unlock(void)
>
> console_may_schedule = 0;
>
> + /* flush buffered message fragment immediately to console */
> + raw_spin_lock_irqsave(&logbuf_lock, flags);
> + if (cont_cons_len && (cont_cons < cont_cons_len || !cont_len)) {
> + size_t textlen = 0;
> + size_t len;
> +
> + len = cont_cons_len - cont_cons;
> + if (!cont_cons)
> + textlen = print_time(cont_ts_nsec, text);
> + memcpy(text + textlen, cont_buf + cont_cons, len);
> + textlen += len;
> + cont_cons = cont_cons_len;
> +
> + if (!cont_len) {
> + /* last chunk of line; terminate */
> + text[textlen++] = '\n';
> + cont_cons_len = 0;
> + }
> + raw_spin_unlock(&logbuf_lock);
> +
> + stop_critical_timings();
> + call_console_drivers(cont_level, text, textlen);
> + start_critical_timings();
> +
> + local_irq_restore(flags);
> + } else
> + raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +
> again:
> for (;;) {
> struct log *msg;
> - static char text[LOG_LINE_MAX];
> size_t len;
> int level;
>
> @@ -1825,13 +1889,22 @@ again:
> console_seq = log_first_seq;
> console_idx = log_first_idx;
> }
> -
> +skip:
> if (console_seq == log_next_seq)
> break;
>
> msg = log_from_idx(console_idx);
> - level = msg->level & 7;
> + if (msg->flags & LOG_NOCONS) {
> + /*
> + * Skip record we have buffered and already printed
> + * directly to the console when we received it.
> + */
> + console_idx = log_next(console_idx);
> + console_seq++;
> + goto skip;
> + }
>
> + level = msg->level;
> len = msg_print_text(msg, false, text, sizeof(text));
>
> console_idx = log_next(console_idx);
>

2012-06-27 15:18:07

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Wed, 2012-06-27 at 11:13 -0400, Steven Rostedt wrote:

> I just tested this patch and it boots fine for me.

Of course after sending this, I rebooted to the same kernel and it
locked up :-p

Ends with:

[ 47.966454] microcode: Microcode Update Driver: v2.00 <[email protected]>, Peter Oruba
[ 48.188791] kvm: disabled by bios
[ 48.352547] snd_hda_intel 0000:00:1b.0: irq 44 for MSI/MSI-X
[ 48.410944] ALSA hda_auto_parser.c:322 autoconfig: line_outs=4 (0xd/0xf/0x10/0x11/0x0) type:line
[ 48.419774] ALSA hda_auto_parser.c:326 speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 48.427012] ALSA hda_auto_parser.c:330 hp_outs=1 (0xa/0x0/0x0/0x0/0x0)
[ 48.433848] ALSA hda_auto_parser.c:331 mono: mono_out=0x0
[ 48.439523] ALSA hda_auto_parser.c:334 dig-out=0x21/0x0
[ 48.445023] ALSA hda_auto_parser.c:335 inputs:[ 48.449562] ALSA hda_auto_parser.c:339 Front Mic=0xb[ 48.454451] ALSA hda_auto_parser.c:339 Rear Mic=0xe[ 48.459258] ALSA hda_auto_parser.c:339 Line=0xc

-- Steve

> Better still, I put
> in my 'delay' to make sure the output shows up as it is printed, and it
> seems to do so.
>
> Tested-by: Steven Rostedt <[email protected]>

2012-06-27 15:26:53

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Wed, Jun 27, 2012 at 11:18 AM, Steven Rostedt <[email protected]> wrote:
> On Wed, 2012-06-27 at 11:13 -0400, Steven Rostedt wrote:
>
>> I just tested this patch and it boots fine for me.
>
> Of course after sending this, I rebooted to the same kernel and it
> locked up :-p

Thanks for testing it. Greg had some issues too, which I can't
reproduce it on any of my boxes. I'll have a look ...

Thanks again,
Kay

2012-06-28 01:39:01

by Kay Sievers

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Wed, 2012-06-27 at 11:26 -0400, Kay Sievers wrote:
> On Wed, Jun 27, 2012 at 11:18 AM, Steven Rostedt <[email protected]> wrote:
> > On Wed, 2012-06-27 at 11:13 -0400, Steven Rostedt wrote:
> >
> >> I just tested this patch and it boots fine for me.
> >
> > Of course after sending this, I rebooted to the same kernel and it
> > locked up :-p
>
> Thanks for testing it. Greg had some issues too, which I can't
> reproduce it on any of my boxes. I'll have a look ...

Here is an updated version, that should be look a bit cleaner.

It passes all my tests here, including the fake kernel module that has
threads racing against each other. With that, the console output gets a
bit disturbed now without the buffering, but the recorded log still
looks all fine.

Thanks,
Kay


From: Kay Sievers <[email protected]>
Subject: [PATCH] printk: flush continuation lines immediately to console

Continuation lines are buffered internally, intended to merge the
chunked printk()s into a single record, and to isolate potentially
racy continuation users from usual terminated line users.

This though, has the effect that partial lines are not printed to
the console in the moment they are emitted. In case the kernel
crashes in the meantime, the potentially interesting printed
information would never reach the consoles.

Here we share the continuation buffer with the console copy logic,
and partial lines are always immediately flushed to the available
consoles. They are still buffered internally to improve the
readability and integrity of the messages and minimize the amount
of needed record headers to store.

Signed-off-by: Kay Sievers <[email protected]>
---

kernel/printk.c | 244 +++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 176 insertions(+), 68 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index cdfba44..fbf4d0b 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -193,12 +193,19 @@ static int console_may_schedule;
* separated by ',', and find the message after the ';' character.
*/

+enum log_flags {
+ LOG_DEFAULT = 0,
+ LOG_NOCONS = 1, /* already flushed, do not print to console */
+};
+
struct log {
u64 ts_nsec; /* timestamp in nanoseconds */
u16 len; /* length of entire record */
u16 text_len; /* length of text buffer */
u16 dict_len; /* length of dictionary buffer */
- u16 level; /* syslog level + facility */
+ u8 facility; /* syslog facility */
+ u8 flags:5; /* internal record flags */
+ u8 level:3; /* syslog level */
};

/*
@@ -286,6 +293,7 @@ static u32 log_next(u32 idx)

/* insert record into the buffer, discard old ones, update heads */
static void log_store(int facility, int level,
+ enum log_flags flags, u64 ts_nsec,
const char *dict, u16 dict_len,
const char *text, u16 text_len)
{
@@ -329,8 +337,13 @@ static void log_store(int facility, int level,
msg->text_len = text_len;
memcpy(log_dict(msg), dict, dict_len);
msg->dict_len = dict_len;
- msg->level = (facility << 3) | (level & 7);
- msg->ts_nsec = local_clock();
+ msg->facility = facility;
+ msg->level = level & 7;
+ msg->flags = flags & 0x1f;
+ if (ts_nsec > 0)
+ msg->ts_nsec = ts_nsec;
+ else
+ msg->ts_nsec = local_clock();
memset(log_dict(msg) + dict_len, 0, pad_len);
msg->len = sizeof(struct log) + text_len + dict_len + pad_len;

@@ -446,7 +459,7 @@ static ssize_t devkmsg_read(struct file *file, char __user *buf,
ts_usec = msg->ts_nsec;
do_div(ts_usec, 1000);
len = sprintf(user->buf, "%u,%llu,%llu;",
- msg->level, user->seq, ts_usec);
+ (msg->facility << 3) | msg->level, user->seq, ts_usec);

/* escape non-printable characters */
for (i = 0; i < msg->text_len; i++) {
@@ -787,6 +800,21 @@ static bool printk_time;
#endif
module_param_named(time, printk_time, bool, S_IRUGO | S_IWUSR);

+static size_t print_time(u64 ts, char *buf)
+{
+ unsigned long rem_nsec;
+
+ if (!printk_time)
+ return 0;
+
+ if (!buf)
+ return 15;
+
+ rem_nsec = do_div(ts, 1000000000);
+ return sprintf(buf, "[%5lu.%06lu] ",
+ (unsigned long)ts, rem_nsec / 1000);
+}
+
static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
{
size_t len = 0;
@@ -803,18 +831,7 @@ static size_t print_prefix(const struct log *msg, bool syslog, char *buf)
}
}

- if (printk_time) {
- if (buf) {
- unsigned long long ts = msg->ts_nsec;
- unsigned long rem_nsec = do_div(ts, 1000000000);
-
- len += sprintf(buf + len, "[%5lu.%06lu] ",
- (unsigned long) ts, rem_nsec / 1000);
- } else {
- len += 15;
- }
- }
-
+ len += print_time(msg->ts_nsec, buf ? buf + len : NULL);
return len;
}

@@ -1294,15 +1311,92 @@ static inline void printk_delay(void)
}
}

+/*
+ * Continuation lines are buffered, and not committed to the record buffer
+ * until the line is complete, or a race forces it. The line fragments
+ * though, are printed immediately to the consoles to ensure everything has
+ * reached the console in case of a kernel crash.
+ */
+static struct cont {
+ char buf[LOG_LINE_MAX];
+ size_t len; /* length == 0 means unused buffer */
+ size_t cons; /* bytes written to console */
+ struct task_struct *owner; /* task of first print*/
+ u64 ts_nsec; /* time of first print */
+ u8 level; /* log level of first message */
+ u8 facility; /* log level of first message */
+ bool flushed:1; /* buffer sealed and committed */
+} cont;
+
+static void cont_flush(void)
+{
+ if (cont.flushed)
+ return;
+ if (cont.len == 0)
+ return;
+
+ log_store(cont.facility, cont.level, LOG_NOCONS, cont.ts_nsec,
+ NULL, 0, cont.buf, cont.len);
+
+ cont.flushed = true;
+}
+
+static bool cont_add(int facility, int level, const char *text, size_t len)
+{
+ if (cont.len && cont.flushed)
+ return false;
+
+ if (cont.len + len > sizeof(cont.buf)) {
+ cont_flush();
+ return false;
+ }
+
+ if (!cont.len) {
+ cont.facility = facility;
+ cont.level = level;
+ cont.owner = current;
+ cont.ts_nsec = local_clock();
+ cont.cons = 0;
+ cont.flushed = false;
+ }
+
+ memcpy(cont.buf + cont.len, text, len);
+ cont.len += len;
+ return true;
+}
+
+static size_t cont_print_text(char *text, size_t size)
+{
+ size_t textlen = 0;
+ size_t len;
+
+ if (cont.cons == 0) {
+ textlen += print_time(cont.ts_nsec, text);
+ size -= textlen;
+ }
+
+ len = cont.len - cont.cons;
+ if (len > 0) {
+ if (len+1 > size)
+ len = size-1;
+ memcpy(text + textlen, cont.buf + cont.cons, len);
+ textlen += len;
+ cont.cons = cont.len;
+ }
+
+ if (cont.flushed) {
+ text[textlen++] = '\n';
+ /* got everything, release buffer */
+ cont.len = 0;
+ }
+ return textlen;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const char *dict, size_t dictlen,
const char *fmt, va_list args)
{
static int recursion_bug;
- static char cont_buf[LOG_LINE_MAX];
- static size_t cont_len;
- static int cont_level;
- static struct task_struct *cont_task;
static char textbuf[LOG_LINE_MAX];
char *text = textbuf;
size_t text_len;
@@ -1348,7 +1442,8 @@ asmlinkage int vprintk_emit(int facility, int level,
recursion_bug = 0;
printed_len += strlen(recursion_msg);
/* emit KERN_CRIT message */
- log_store(0, 2, NULL, 0, recursion_msg, printed_len);
+ log_store(0, 2, LOG_DEFAULT, 0,
+ NULL, 0, recursion_msg, printed_len);
}

/*
@@ -1386,55 +1481,38 @@ asmlinkage int vprintk_emit(int facility, int level,
}

if (!newline) {
- if (cont_len && (prefix || cont_task != current)) {
- /*
- * Flush earlier buffer, which is either from a
- * different thread, or when we got a new prefix.
- */
- log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
- cont_len = 0;
- }
-
- if (!cont_len) {
- cont_level = level;
- cont_task = current;
- }
+ /*
+ * Flush the conflicting buffer. An earlier newline was missing,
+ * or another task also prints continuation lines.
+ */
+ if (cont.len && (prefix || cont.owner != current))
+ cont_flush();

- /* buffer or append to earlier buffer from the same thread */
- if (cont_len + text_len > sizeof(cont_buf))
- text_len = sizeof(cont_buf) - cont_len;
- memcpy(cont_buf + cont_len, text, text_len);
- cont_len += text_len;
+ /* buffer line if possible, otherwise store it right away */
+ if (!cont_add(facility, level, text, text_len))
+ log_store(facility, level, LOG_DEFAULT, 0,
+ dict, dictlen, text, text_len);
} else {
- if (cont_len && cont_task == current) {
- if (prefix) {
- /*
- * New prefix from the same thread; flush. We
- * either got no earlier newline, or we race
- * with an interrupt.
- */
- log_store(facility, cont_level,
- NULL, 0, cont_buf, cont_len);
- cont_len = 0;
- }
+ bool stored = false;

- /* append to the earlier buffer and flush */
- if (cont_len + text_len > sizeof(cont_buf))
- text_len = sizeof(cont_buf) - cont_len;
- memcpy(cont_buf + cont_len, text, text_len);
- cont_len += text_len;
- log_store(facility, cont_level,
- NULL, 0, cont_buf, cont_len);
- cont_len = 0;
- cont_task = NULL;
- printed_len = cont_len;
- } else {
- /* ordinary single and terminated line */
- log_store(facility, level,
- dict, dictlen, text, text_len);
- printed_len = text_len;
+ /*
+ * Flush the conflicting buffer. An earlier newline was missing,
+ * or we race with a continuation line from an interrupt.
+ */
+ if (cont.len && prefix && cont.owner == current)
+ cont_flush();
+
+ /* Merge with our buffer if possible; flush it in any case */
+ if (cont.len && cont.owner == current) {
+ stored = cont_add(facility, level, text, text_len);
+ cont_flush();
}
+
+ if (!stored)
+ log_store(facility, level, LOG_DEFAULT, 0,
+ dict, dictlen, text, text_len);
}
+ printed_len += text_len;

/*
* Try to acquire and then immediately release the console semaphore.
@@ -1521,11 +1599,18 @@ EXPORT_SYMBOL(printk);
#else

#define LOG_LINE_MAX 0
+static struct cont {
+ size_t len;
+ size_t cons;
+ u8 level;
+ bool flushed:1;
+} cont;
static struct log *log_from_idx(u32 idx) { return NULL; }
static u32 log_next(u32 idx) { return 0; }
static void call_console_drivers(int level, const char *text, size_t len) {}
static size_t msg_print_text(const struct log *msg, bool syslog,
char *buf, size_t size) { return 0; }
+static size_t cont_print_text(char *text, size_t size) { return 0; }

#endif /* CONFIG_PRINTK */

@@ -1817,6 +1902,7 @@ static u32 console_idx;
*/
void console_unlock(void)
{
+ static char text[LOG_LINE_MAX];
static u64 seen_seq;
unsigned long flags;
bool wake_klogd = false;
@@ -1829,10 +1915,23 @@ void console_unlock(void)

console_may_schedule = 0;

+ /* flush buffered message fragment immediately to console */
+ raw_spin_lock_irqsave(&logbuf_lock, flags);
+ if (cont.len && (cont.cons < cont.len || cont.flushed)) {
+ size_t len;
+
+ len = cont_print_text(text, sizeof(text));
+ raw_spin_unlock(&logbuf_lock);
+ stop_critical_timings();
+ call_console_drivers(cont.level, text, len);
+ start_critical_timings();
+ local_irq_restore(flags);
+ } else
+ raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+
again:
for (;;) {
struct log *msg;
- static char text[LOG_LINE_MAX];
size_t len;
int level;

@@ -1847,13 +1946,22 @@ again:
console_seq = log_first_seq;
console_idx = log_first_idx;
}
-
+skip:
if (console_seq == log_next_seq)
break;

msg = log_from_idx(console_idx);
- level = msg->level & 7;
+ if (msg->flags & LOG_NOCONS) {
+ /*
+ * Skip record we have buffered and already printed
+ * directly to the console when we received it.
+ */
+ console_idx = log_next(console_idx);
+ console_seq++;
+ goto skip;
+ }

+ level = msg->level;
len = msg_print_text(msg, false, text, sizeof(text));

console_idx = log_next(console_idx);

2012-06-28 01:48:29

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Thu, Jun 28, 2012 at 09:38:53AM +0200, Kay Sievers wrote:
> On Wed, 2012-06-27 at 11:26 -0400, Kay Sievers wrote:
> > On Wed, Jun 27, 2012 at 11:18 AM, Steven Rostedt <[email protected]> wrote:
> > > On Wed, 2012-06-27 at 11:13 -0400, Steven Rostedt wrote:
> > >
> > >> I just tested this patch and it boots fine for me.
> > >
> > > Of course after sending this, I rebooted to the same kernel and it
> > > locked up :-p
> >
> > Thanks for testing it. Greg had some issues too, which I can't
> > reproduce it on any of my boxes. I'll have a look ...
>
> Here is an updated version, that should be look a bit cleaner.
>
> It passes all my tests here, including the fake kernel module that has
> threads racing against each other. With that, the console output gets a
> bit disturbed now without the buffering, but the recorded log still
> looks all fine.

And this solves the boot problem I was seeing, thanks a lot for fixing
that issue.

Steven, any objections to me taking this patch?

thanks,

greg k-h

2012-06-28 01:54:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Wed, 2012-06-27 at 18:48 -0700, Greg Kroah-Hartman wrote:

> And this solves the boot problem I was seeing, thanks a lot for fixing
> that issue.
>
> Steven, any objections to me taking this patch?
>

My box crashed on the earlier version too. Let me run some tests on it
first, and then I'll give my Tested-by and Acked-by if it succeeds, and
I don't see anything wrong with it as I give it further review.

-- Steve

2012-06-28 02:55:33

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Thu, 2012-06-28 at 09:38 +0200, Kay Sievers wrote:

> +static void cont_flush(void)
> +{
> + if (cont.flushed)
> + return;
> + if (cont.len == 0)
> + return;
> +
> + log_store(cont.facility, cont.level, LOG_NOCONS, cont.ts_nsec,
> + NULL, 0, cont.buf, cont.len);
> +
> + cont.flushed = true;
> +}
> +
> +static bool cont_add(int facility, int level, const char *text, size_t len)
> +{
> + if (cont.len && cont.flushed)
> + return false;
> +
> + if (cont.len + len > sizeof(cont.buf)) {
> + cont_flush();
> + return false;
> + }
> +
> + if (!cont.len) {
> + cont.facility = facility;
> + cont.level = level;
> + cont.owner = current;
> + cont.ts_nsec = local_clock();
> + cont.cons = 0;
> + cont.flushed = false;
> + }
> +
> + memcpy(cont.buf + cont.len, text, len);
> + cont.len += len;
> + return true;
> +}
> +
> +static size_t cont_print_text(char *text, size_t size)
> +{
> + size_t textlen = 0;
> + size_t len;
> +
> + if (cont.cons == 0) {
> + textlen += print_time(cont.ts_nsec, text);
> + size -= textlen;
> + }
> +
> + len = cont.len - cont.cons;
> + if (len > 0) {
> + if (len+1 > size)
> + len = size-1;
> + memcpy(text + textlen, cont.buf + cont.cons, len);
> + textlen += len;
> + cont.cons = cont.len;
> + }
> +
> + if (cont.flushed) {
> + text[textlen++] = '\n';
> + /* got everything, release buffer */
> + cont.len = 0;
> + }
> + return textlen;
> +}
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const char *dict, size_t dictlen,
> const char *fmt, va_list args)
> {
> static int recursion_bug;
> - static char cont_buf[LOG_LINE_MAX];
> - static size_t cont_len;
> - static int cont_level;
> - static struct task_struct *cont_task;
> static char textbuf[LOG_LINE_MAX];
> char *text = textbuf;
> size_t text_len;
> @@ -1348,7 +1442,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> recursion_bug = 0;
> printed_len += strlen(recursion_msg);
> /* emit KERN_CRIT message */
> - log_store(0, 2, NULL, 0, recursion_msg, printed_len);
> + log_store(0, 2, LOG_DEFAULT, 0,
> + NULL, 0, recursion_msg, printed_len);
> }
>
> /*
> @@ -1386,55 +1481,38 @@ asmlinkage int vprintk_emit(int facility, int level,
> }
>
> if (!newline) {
> - if (cont_len && (prefix || cont_task != current)) {
> - /*
> - * Flush earlier buffer, which is either from a
> - * different thread, or when we got a new prefix.
> - */
> - log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
> - cont_len = 0;
> - }
> -
> - if (!cont_len) {
> - cont_level = level;
> - cont_task = current;
> - }
> + /*
> + * Flush the conflicting buffer. An earlier newline was missing,
> + * or another task also prints continuation lines.
> + */
> + if (cont.len && (prefix || cont.owner != current))
> + cont_flush();
>
> - /* buffer or append to earlier buffer from the same thread */
> - if (cont_len + text_len > sizeof(cont_buf))
> - text_len = sizeof(cont_buf) - cont_len;
> - memcpy(cont_buf + cont_len, text, text_len);
> - cont_len += text_len;
> + /* buffer line if possible, otherwise store it right away */
> + if (!cont_add(facility, level, text, text_len))
> + log_store(facility, level, LOG_DEFAULT, 0,
> + dict, dictlen, text, text_len);
> } else {
> - if (cont_len && cont_task == current) {
> - if (prefix) {
> - /*
> - * New prefix from the same thread; flush. We
> - * either got no earlier newline, or we race
> - * with an interrupt.
> - */
> - log_store(facility, cont_level,
> - NULL, 0, cont_buf, cont_len);
> - cont_len = 0;
> - }
> + bool stored = false;
>
> - /* append to the earlier buffer and flush */
> - if (cont_len + text_len > sizeof(cont_buf))
> - text_len = sizeof(cont_buf) - cont_len;
> - memcpy(cont_buf + cont_len, text, text_len);
> - cont_len += text_len;
> - log_store(facility, cont_level,
> - NULL, 0, cont_buf, cont_len);
> - cont_len = 0;
> - cont_task = NULL;
> - printed_len = cont_len;
> - } else {
> - /* ordinary single and terminated line */
> - log_store(facility, level,
> - dict, dictlen, text, text_len);
> - printed_len = text_len;
> + /*
> + * Flush the conflicting buffer. An earlier newline was missing,
> + * or we race with a continuation line from an interrupt.
> + */
> + if (cont.len && prefix && cont.owner == current)
> + cont_flush();
> +
> + /* Merge with our buffer if possible; flush it in any case */
> + if (cont.len && cont.owner == current) {
> + stored = cont_add(facility, level, text, text_len);
> + cont_flush();
> }


I wonder if it would be better to do the following for the above two
ifs:

if (cont.len && cont.owner == current) {
if (!prefix)
stored = cont_add(facility, level, text, text_len);
cont_flush();
}

If the prefix was true, then the cont.flush would be set when cont_add()
is called, and the first thing that cont_add() does:

if (cont.len && cont.flushed)
return false;

which would always be true (returning false) if prefix was set.

And the second cont_flush() is a nop due to it doing:

if (cont.flushed)
return;

-- Steve


> +
> + if (!stored)
> + log_store(facility, level, LOG_DEFAULT, 0,
> + dict, dictlen, text, text_len);
> }
> + printed_len += text_len;
>
> /*
> * Try to acquire and then immediately release the console semaphore.
> @@ -1521,11 +1599,18 @@ EXPORT_SYMBOL(printk);
> #else
>
> #define LOG_LINE_MAX 0
> +static struct cont {
> + size_t len;
> + size_t cons;
> + u8 level;
> + bool flushed:1;
> +} cont;
> static struct log *log_from_idx(u32 idx) { return NULL; }
> static u32 log_next(u32 idx) { return 0; }
> static void call_console_drivers(int level, const char *text, size_t len) {}
> static size_t msg_print_text(const struct log *msg, bool syslog,
> char *buf, size_t size) { return 0; }
> +static size_t cont_print_text(char *text, size_t size) { return 0; }
>
> #endif /* CONFIG_PRINTK */
>
> @@ -1817,6 +1902,7 @@ static u32 console_idx;
> */
> void console_unlock(void)
> {
> + static char text[LOG_LINE_MAX];
> static u64 seen_seq;
> unsigned long flags;
> bool wake_klogd = false;
> @@ -1829,10 +1915,23 @@ void console_unlock(void)
>
> console_may_schedule = 0;
>
> + /* flush buffered message fragment immediately to console */
> + raw_spin_lock_irqsave(&logbuf_lock, flags);
> + if (cont.len && (cont.cons < cont.len || cont.flushed)) {
> + size_t len;
> +
> + len = cont_print_text(text, sizeof(text));
> + raw_spin_unlock(&logbuf_lock);
> + stop_critical_timings();
> + call_console_drivers(cont.level, text, len);
> + start_critical_timings();
> + local_irq_restore(flags);
> + } else
> + raw_spin_unlock_irqrestore(&logbuf_lock, flags);
> +
> again:
> for (;;) {
> struct log *msg;
> - static char text[LOG_LINE_MAX];
> size_t len;
> int level;
>
> @@ -1847,13 +1946,22 @@ again:
> console_seq = log_first_seq;
> console_idx = log_first_idx;
> }
> -
> +skip:
> if (console_seq == log_next_seq)
> break;
>
> msg = log_from_idx(console_idx);
> - level = msg->level & 7;
> + if (msg->flags & LOG_NOCONS) {
> + /*
> + * Skip record we have buffered and already printed
> + * directly to the console when we received it.
> + */
> + console_idx = log_next(console_idx);
> + console_seq++;
> + goto skip;
> + }
>
> + level = msg->level;
> len = msg_print_text(msg, false, text, sizeof(text));
>
> console_idx = log_next(console_idx);
>

2012-06-28 05:00:17

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Thu, 2012-06-28 at 09:38 +0200, Kay Sievers wrote:
> Here we share the continuation buffer with the console copy logic,
> and partial lines are always immediately flushed to the available
> consoles. They are still buffered internally to improve the
> readability and integrity of the messages and minimize the amount
> of needed record headers to store.

trivia:

> diff --git a/kernel/printk.c b/kernel/printk.c

> @@ -329,8 +337,13 @@ static void log_store(int facility, int level,
> msg->text_len = text_len;
> memcpy(log_dict(msg), dict, dict_len);
> msg->dict_len = dict_len;
> - msg->level = (facility << 3) | (level & 7);
> - msg->ts_nsec = local_clock();
> + msg->facility = facility;
> + msg->level = level & 7;

Doesn't need & 7

> + msg->flags = flags & 0x1f;
> + if (ts_nsec > 0)
> + msg->ts_nsec = ts_nsec;
> + else
> + msg->ts_nsec = local_clock();

Why local_clock? ts_nsec really can be 0.
[]

> @@ -1294,15 +1311,92 @@ static inline void printk_delay(void)
> }
> }
>
> +/*
> + * Continuation lines are buffered, and not committed to the record buffer
> + * until the line is complete, or a race forces it. The line fragments
> + * though, are printed immediately to the consoles to ensure everything has
> + * reached the console in case of a kernel crash.
> + */
> +static struct cont {
> + char buf[LOG_LINE_MAX];
> + size_t len; /* length == 0 means unused buffer */
> + size_t cons; /* bytes written to console */
> + struct task_struct *owner; /* task of first print*/
> + u64 ts_nsec; /* time of first print */
> + u8 level; /* log level of first message */
> + u8 facility; /* log level of first message */
> + bool flushed:1; /* buffer sealed and committed */

bool flushed:1 seems unnecessary.
bool flushed seems more appropriate.

> +} cont;
> +
> +static void cont_flush(void)
> +{
> + if (cont.flushed)
> + return;
> + if (cont.len == 0)
> + return;
> +
> + log_store(cont.facility, cont.level, LOG_NOCONS, cont.ts_nsec,
> + NULL, 0, cont.buf, cont.len);
> +
> + cont.flushed = true;
> +}
> +
> +static bool cont_add(int facility, int level, const char *text, size_t len)
> +{
> + if (cont.len && cont.flushed)
> + return false;
> +
> + if (cont.len + len > sizeof(cont.buf)) {
> + cont_flush();
> + return false;
> + }
> +
> + if (!cont.len) {
> + cont.facility = facility;
> + cont.level = level;
> + cont.owner = current;
> + cont.ts_nsec = local_clock();
> + cont.cons = 0;
> + cont.flushed = false;
> + }
> +
> + memcpy(cont.buf + cont.len, text, len);

Looks like you can still overrun cont.buf.

2012-06-29 05:30:35

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Wed, Jun 27, 2012 at 10:55:29PM -0400, Steven Rostedt wrote:
> On Thu, 2012-06-28 at 09:38 +0200, Kay Sievers wrote:
>
> > +static void cont_flush(void)
> > +{
> > + if (cont.flushed)
> > + return;
> > + if (cont.len == 0)
> > + return;
> > +
> > + log_store(cont.facility, cont.level, LOG_NOCONS, cont.ts_nsec,
> > + NULL, 0, cont.buf, cont.len);
> > +
> > + cont.flushed = true;
> > +}
> > +
> > +static bool cont_add(int facility, int level, const char *text, size_t len)
> > +{
> > + if (cont.len && cont.flushed)
> > + return false;
> > +
> > + if (cont.len + len > sizeof(cont.buf)) {
> > + cont_flush();
> > + return false;
> > + }
> > +
> > + if (!cont.len) {
> > + cont.facility = facility;
> > + cont.level = level;
> > + cont.owner = current;
> > + cont.ts_nsec = local_clock();
> > + cont.cons = 0;
> > + cont.flushed = false;
> > + }
> > +
> > + memcpy(cont.buf + cont.len, text, len);
> > + cont.len += len;
> > + return true;
> > +}
> > +
> > +static size_t cont_print_text(char *text, size_t size)
> > +{
> > + size_t textlen = 0;
> > + size_t len;
> > +
> > + if (cont.cons == 0) {
> > + textlen += print_time(cont.ts_nsec, text);
> > + size -= textlen;
> > + }
> > +
> > + len = cont.len - cont.cons;
> > + if (len > 0) {
> > + if (len+1 > size)
> > + len = size-1;
> > + memcpy(text + textlen, cont.buf + cont.cons, len);
> > + textlen += len;
> > + cont.cons = cont.len;
> > + }
> > +
> > + if (cont.flushed) {
> > + text[textlen++] = '\n';
> > + /* got everything, release buffer */
> > + cont.len = 0;
> > + }
> > + return textlen;
> > +}
> > +
> > asmlinkage int vprintk_emit(int facility, int level,
> > const char *dict, size_t dictlen,
> > const char *fmt, va_list args)
> > {
> > static int recursion_bug;
> > - static char cont_buf[LOG_LINE_MAX];
> > - static size_t cont_len;
> > - static int cont_level;
> > - static struct task_struct *cont_task;
> > static char textbuf[LOG_LINE_MAX];
> > char *text = textbuf;
> > size_t text_len;
> > @@ -1348,7 +1442,8 @@ asmlinkage int vprintk_emit(int facility, int level,
> > recursion_bug = 0;
> > printed_len += strlen(recursion_msg);
> > /* emit KERN_CRIT message */
> > - log_store(0, 2, NULL, 0, recursion_msg, printed_len);
> > + log_store(0, 2, LOG_DEFAULT, 0,
> > + NULL, 0, recursion_msg, printed_len);
> > }
> >
> > /*
> > @@ -1386,55 +1481,38 @@ asmlinkage int vprintk_emit(int facility, int level,
> > }
> >
> > if (!newline) {
> > - if (cont_len && (prefix || cont_task != current)) {
> > - /*
> > - * Flush earlier buffer, which is either from a
> > - * different thread, or when we got a new prefix.
> > - */
> > - log_store(facility, cont_level, NULL, 0, cont_buf, cont_len);
> > - cont_len = 0;
> > - }
> > -
> > - if (!cont_len) {
> > - cont_level = level;
> > - cont_task = current;
> > - }
> > + /*
> > + * Flush the conflicting buffer. An earlier newline was missing,
> > + * or another task also prints continuation lines.
> > + */
> > + if (cont.len && (prefix || cont.owner != current))
> > + cont_flush();
> >
> > - /* buffer or append to earlier buffer from the same thread */
> > - if (cont_len + text_len > sizeof(cont_buf))
> > - text_len = sizeof(cont_buf) - cont_len;
> > - memcpy(cont_buf + cont_len, text, text_len);
> > - cont_len += text_len;
> > + /* buffer line if possible, otherwise store it right away */
> > + if (!cont_add(facility, level, text, text_len))
> > + log_store(facility, level, LOG_DEFAULT, 0,
> > + dict, dictlen, text, text_len);
> > } else {
> > - if (cont_len && cont_task == current) {
> > - if (prefix) {
> > - /*
> > - * New prefix from the same thread; flush. We
> > - * either got no earlier newline, or we race
> > - * with an interrupt.
> > - */
> > - log_store(facility, cont_level,
> > - NULL, 0, cont_buf, cont_len);
> > - cont_len = 0;
> > - }
> > + bool stored = false;
> >
> > - /* append to the earlier buffer and flush */
> > - if (cont_len + text_len > sizeof(cont_buf))
> > - text_len = sizeof(cont_buf) - cont_len;
> > - memcpy(cont_buf + cont_len, text, text_len);
> > - cont_len += text_len;
> > - log_store(facility, cont_level,
> > - NULL, 0, cont_buf, cont_len);
> > - cont_len = 0;
> > - cont_task = NULL;
> > - printed_len = cont_len;
> > - } else {
> > - /* ordinary single and terminated line */
> > - log_store(facility, level,
> > - dict, dictlen, text, text_len);
> > - printed_len = text_len;
> > + /*
> > + * Flush the conflicting buffer. An earlier newline was missing,
> > + * or we race with a continuation line from an interrupt.
> > + */
> > + if (cont.len && prefix && cont.owner == current)
> > + cont_flush();
> > +
> > + /* Merge with our buffer if possible; flush it in any case */
> > + if (cont.len && cont.owner == current) {
> > + stored = cont_add(facility, level, text, text_len);
> > + cont_flush();
> > }
>
>
> I wonder if it would be better to do the following for the above two
> ifs:
>
> if (cont.len && cont.owner == current) {
> if (!prefix)
> stored = cont_add(facility, level, text, text_len);
> cont_flush();
> }
>
> If the prefix was true, then the cont.flush would be set when cont_add()
> is called, and the first thing that cont_add() does:
>
> if (cont.len && cont.flushed)
> return false;
>
> which would always be true (returning false) if prefix was set.
>
> And the second cont_flush() is a nop due to it doing:
>
> if (cont.flushed)
> return;

It might be "better", and this would be a nice optimization, but is it
needed right now? In other words, I'd like to get this patch into
linux-next soon to get testing to get to Linus before 3.5-final comes
out, don't you?

thanks,

greg k-h

2012-06-29 11:18:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Fri, 2012-06-29 at 01:30 -0400, Greg Kroah-Hartman wrote:
>
> > I wonder if it would be better to do the following for the above two
> > ifs:
> >
> > if (cont.len && cont.owner == current) {
> > if (!prefix)
> > stored = cont_add(facility, level, text, text_len);
> > cont_flush();
> > }
> >
> > If the prefix was true, then the cont.flush would be set when cont_add()
> > is called, and the first thing that cont_add() does:
> >
> > if (cont.len && cont.flushed)
> > return false;
> >
> > which would always be true (returning false) if prefix was set.
> >
> > And the second cont_flush() is a nop due to it doing:
> >
> > if (cont.flushed)
> > return;
>
> It might be "better", and this would be a nice optimization, but is it
> needed right now? In other words, I'd like to get this patch into
> linux-next soon to get testing to get to Linus before 3.5-final comes
> out, don't you?

Sure, pull it as is, and you can add my Tested-by, and Acked-by tags
(Steven Rostedt <[email protected]>).

I'll send you a patch to do this update that you can queue for 3.6.
OK?

-- Steve

2012-06-29 15:40:30

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v3] printk: Have printk() never buffer its data

On Fri, Jun 29, 2012 at 07:18:38AM -0400, Steven Rostedt wrote:
> On Fri, 2012-06-29 at 01:30 -0400, Greg Kroah-Hartman wrote:
> >
> > > I wonder if it would be better to do the following for the above two
> > > ifs:
> > >
> > > if (cont.len && cont.owner == current) {
> > > if (!prefix)
> > > stored = cont_add(facility, level, text, text_len);
> > > cont_flush();
> > > }
> > >
> > > If the prefix was true, then the cont.flush would be set when cont_add()
> > > is called, and the first thing that cont_add() does:
> > >
> > > if (cont.len && cont.flushed)
> > > return false;
> > >
> > > which would always be true (returning false) if prefix was set.
> > >
> > > And the second cont_flush() is a nop due to it doing:
> > >
> > > if (cont.flushed)
> > > return;
> >
> > It might be "better", and this would be a nice optimization, but is it
> > needed right now? In other words, I'd like to get this patch into
> > linux-next soon to get testing to get to Linus before 3.5-final comes
> > out, don't you?
>
> Sure, pull it as is, and you can add my Tested-by, and Acked-by tags
> (Steven Rostedt <[email protected]>).
>
> I'll send you a patch to do this update that you can queue for 3.6.
> OK?

Sounds good to me, thanks,

greg k-h