LinuxLists.cc - oops pauser.

2006-01-05 04:52:21

Subject: oops pauser.

In my quest to get better debug data from users in Fedora bug reports,
I came up with this patch. A majority of users don't have serial
consoles, so when an oops scrolls off the top of the screen,
and locks up, they usually end up reporting a 2nd (or later) oops
that isn't particularly helpful (or worse, some inconsequential
info like 'sleeping whilst atomic' warnings)

With this patch, if we oops, there's a pause for two minutes,
which hopefully gives people enough time to grab a digital camera
to take a screenshot of the oops.

It has an on-screen timer so the user knows what's going on,
(and that it's going to come back to life [maybe] after the oops).

The one case this doesn't catch is the problem of oopses whilst
in X. Previously a non-fatal oops would stall X momentarily,
and then things continue. Now those cases will lock up completely
for two minutes. Future patches could add some additional feedback
during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.

Signed-off-by: Dave Jones <[email protected]>

--- vanilla/arch/i386/kernel/traps.c 2006-01-02 22:21:10.000000000 -0500
+++ linux-2.6.15/arch/i386/kernel/traps.c 2006-01-04 23:42:46.000000000 -0500
@@ -256,6 +271,15 @@ void show_registers(struct pt_regs *regs
 		}
 	}
 	printk("\n");
+	{
+		int i;
+		for (i=120;i>0;i--) {
+			mdelay(1000);
+			touch_nmi_watchdog();
+			printk("Continuing in %d seconds. \r", i);
+		}
+		printk("\n");
+	}
 }

 static void handle_BUG(struct pt_regs *regs)

2006-01-05 06:10:26

by Randy Dunlap

Subject: Re: oops pauser. / boot_delayer

On Wed, 4 Jan 2006 23:52:12 -0500 Dave Jones wrote:

> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch. A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)
>
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
>
> It has an on-screen timer so the user knows what's going on,
> (and that it's going to come back to life [maybe] after the oops).
>
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.

That's nice. Here's another patch^w hack.

This one delays each printk() during boot by a variable time
(from kernel command line), while system_state == SYSTEM_BOOTING.
Caveat: it's not terribly SMP safe or SMP nice.
Any ideas for improvements (esp. in the SMP area) are appreciated.

---

From: Randy Dunlap <[email protected]>

Optionally add a boot delay after each kernel printk() call,
crudely measured in milliseconds, with a maximum delay of
10 seconds per printk.

Enable CONFIG_BOOT_DELAY=y and then add (e.g.):
"lpj=loops_per_jiffy boot_delay=100"
to the kernel command line.

Signed-off-by: Randy Dunlap <[email protected]>
---

init/calibrate.c | 2 +-
init/main.c | 25 +++++++++++++++++++++++++
kernel/printk.c | 33 +++++++++++++++++++++++++++++++++
lib/Kconfig.debug | 18 ++++++++++++++++++
4 files changed, 77 insertions(+), 1 deletion(-)

--- linux-2615-work.orig/init/main.c
+++ linux-2615-work/init/main.c
@@ -557,6 +557,31 @@ static int __init initcall_debug_setup(c
}
__setup("initcall_debug", initcall_debug_setup);

+#ifdef CONFIG_BOOT_DELAY
+
+unsigned int boot_delay = 0; /* msecs delay after each printk during bootup */
+extern long preset_lpj;
+unsigned long long printk_delay_msec = 0; /* per msec, based on boot_delay */
+
+static int __init boot_delay_setup(char *str)
+{
+ unsigned long lpj = preset_lpj ? preset_lpj : 1000000; /* some guess */
+ unsigned long long loops_per_msec = lpj / 1000 * CONFIG_HZ;
+
+ get_option(&str, &boot_delay);
+ if (boot_delay > 10 * 1000)
+ boot_delay = 0;
+
+ printk_delay_msec = loops_per_msec;
+ printk("boot_delay: %u, preset_lpj: %ld, lpj: %lu, CONFIG_HZ: %d, printk_delay_msec: %llu\n",
+ boot_delay, preset_lpj, lpj, CONFIG_HZ, printk_delay_msec);
+
+ return 1;
+}
+__setup("boot_delay=", boot_delay_setup);
+
+#endif
+
struct task_struct *child_reaper = &init_task;

extern initcall_t __initcall_start[], __initcall_end[];
--- linux-2615-work.orig/init/calibrate.c
+++ linux-2615-work/init/calibrate.c
@@ -10,7 +10,7 @@

#include <asm/timex.h>

-static unsigned long preset_lpj;
+unsigned long preset_lpj;
static int __init lpj_setup(char *str)
{
preset_lpj = simple_strtoul(str,NULL,0);
--- linux-2615-work.orig/kernel/printk.c
+++ linux-2615-work/kernel/printk.c
@@ -23,6 +23,7 @@
#include <linux/smp_lock.h>
#include <linux/console.h>
#include <linux/init.h>
+#include <linux/jiffies.h>
#include <linux/module.h>
#include <linux/interrupt.h> /* For in_interrupt() */
#include <linux/config.h>
@@ -201,6 +202,33 @@ out:

__setup("log_buf_len=", log_buf_len_setup);

+#ifdef CONFIG_BOOT_DELAY
+
+extern unsigned int boot_delay; /* msecs to delay after each printk during bootup */
+extern long preset_lpj;
+extern unsigned long long printk_delay_msec;
+
+static void boot_delay_msec(int millisecs)
+{
+ unsigned long long k = printk_delay_msec * millisecs;
+ unsigned long timeout;
+
+ timeout = jiffies + msecs_to_jiffies(millisecs);
+ while (k) {
+ k--;
+ rep_nop();
+ /*
+ * use (volatile) jiffies to prevent
+ * compiler reduction; loop termination via jiffies
+ * is secondary and may or may not happen.
+ */
+ if (time_after(jiffies, timeout))
+ break;
+ }
+}
+
+#endif
+
/*
* Commands to do_syslog:
*
@@ -520,6 +548,11 @@ asmlinkage int printk(const char *fmt, .
r = vprintk(fmt, args);
va_end(args);

+#ifdef CONFIG_BOOT_DELAY
+ if (boot_delay && system_state == SYSTEM_BOOTING)
+ boot_delay_msec(boot_delay);
+#endif
+
return r;
}

--- linux-2615-work.orig/lib/Kconfig.debug
+++ linux-2615-work/lib/Kconfig.debug
@@ -186,6 +186,24 @@ config FRAME_POINTER
some architectures or if you use external debuggers.
If you don't debug the kernel, you can say N.

+config BOOT_DELAY
+ bool "Delay each boot message by N milliseconds"
+ depends on DEBUG_KERNEL
+ help
+ This build option allows you to read kernel boot messages
+ by inserting a short delay after each one. The delay is
+ specified in milliseconds on the kernel command line,
+ using "boot_delay=N".
+
+ It is likely that you would also need to use "lpj=M" to preset
+ the "loops per jiffie" value.
+ See a previous boot log for the "lpj" value to use for your
+ system, and then set "lpj=M" before setting "boot_delay=N".
+ NOTE: Using this option may adversely affect SMP systems.
+ I.e., processors other than the first one may not boot up.
+ BOOT_DELAY also may cause DETECT_SOFTLOCKUP to detect
+ what it believes to be lockup conditions.
+
config RCU_TORTURE_TEST
tristate "torture tests for RCU"
depends on DEBUG_KERNEL

2006-01-05 07:30:19

Subject: Re: oops pauser. / boot_delayer

Randy.Dunlap <[email protected]> wrote:
> This one delays each printk() during boot by a variable time
> (from kernel command line), while system_state == SYSTEM_BOOTING.

This sounds a bit like an April Fools' joke. What is it meant to do? You can
read the messages in the boot log and use the scrollback keys, no?

Gruss
Bernd

2006-01-05 08:07:49

by Jan Engelhardt

Subject: Re: oops pauser. / boot_delayer

>> This one delays each printk() during boot by a variable time
>> (from kernel command line), while system_state == SYSTEM_BOOTING.
>
>This sounds a bit like a aprils fool joke, what it is meant to do? You can
>read the messages in the bootlog and use the scrollback keys, no?
>
If the end result is a PANIC, then no, the scrollback keys do not work.
Also note that the kernel generates a lot of noise^W text. If the start
scripts from $YOUR_FAVORITE_DISTRO add their output as well, I can barely
scroll back to the top of the kernel log, where it says
Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006

Plus, if you happen to oops away, panic away or just get a "VFS root
unmountable" during kernel _boot_, you cannot use scrollback either.

That is to say, scrollback only starts working (for me) once INIT is spawned.

Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2006-01-05 08:15:10

by Jan Engelhardt

Subject: Re: oops pauser.

>In my quest to get better debug data from users in Fedora bug reports,
>I came up with this patch. A majority of users don't have serial
>consoles, so when an oops scrolls off the top of the screen,
>and locks up, they usually end up reporting a 2nd (or later) oops
>that isn't particularly helpful (or worse, some inconsequential
>info like 'sleeping whilst atomic' warnings)

Here's something interesting too:
Sometimes, an oops is even longer than 25 rows, and the usual user
does not have
- VGA mode with a lot of lines (because it's hard to read)
- FB mode with a lot of lines (slow, and it's also hard to read)

Would it be possible to change the VGA mode to 80x43/80x50/80x60
while in protected mode?

>With this patch, if we oops, there's a pause for a two minutes..
>which hopefully gives people enough time to grab a digital camera
>to take a screenshot of the oops.
>
It would be ideal to have something like BSD's "dump to predefined
block device on oops", so extraction of oops logs requires neither
pen-and-paper nor a digital camera. Requires another partition that
can be used for it, though.

>The one case this doesn't catch is the problem of oopses whilst
>in X. Previously a non-fatal oops would stall X momentarily,
>and then things continue. Now those cases will lock up completely
>for two minutes. Future patches could add some additional feedback
>during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
Oh yes, include Stas Sergeev's PCSP patch and play a WAV telling "your box
just crashed, wait two minutes for uh ... an oops you can't grab
either"(*).

(*) If the oops is longer than 25 lines, ... you can't even use scrollback
because scrollback is cleared when you change consoles. X runs by default
on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
tty7 directly, you would not see it.)

Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2006-01-05 09:25:31

Subject: Re: oops pauser. / boot_delayer

On Thu, 05 Jan 2006 08:30:16 +0100, [email protected] (Bernd Eckenfels) wrote:

>Randy.Dunlap <[email protected]> wrote:
>> This one delays each printk() during boot by a variable time
>> (from kernel command line), while system_state == SYSTEM_BOOTING.
>
>This sounds a bit like a aprils fool joke, what it is meant to do? You can
>read the messages in the bootlog and use the scrollback keys, no?

No, after oops, console dead, very dead . . . no scrollback :(

Just the image on the screen, until one hits the power or reset
button.

Very sad... You want a kernel monitor to babysit the boot process?

Grant.

2006-01-05 10:33:48

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 09:15:02AM +0100, Jan Engelhardt wrote:

> Here's something interesting too:
> Sometimes, an oops is even longer than 25 rows, and the usual user
> does not have
> - VGA mode with a lot of lines (because it's hard to read)
> - FB mode with a lot of lines (slow, and it's also hard to read)

See the other patch I sent which halves the number of lines needed
for a backtrace on i386 (like x86-64 uses). This helps too.

> Is it be possible to change the VGA mode to 80x43/80x50/80x60
> during protected mode?

After an oops, we can't really rely on anything. What if the
oops came from the console layer, or a framebuffer driver?

> >With this patch, if we oops, there's a pause for a two minutes..
> >which hopefully gives people enough time to grab a digital camera
> >to take a screenshot of the oops.
> >
> It would be ideal to have something like BSD's "dump to predefined
> block device on oops", so extraction of oops logs requires neither
> pen-and-paper nor a digital camera. Requires another partition that
> can be used for it, though.

I dislike most of the disk dump patches that I've seen out there
because most of them rely on the system being in a decent enough
state to be able to write out blocks of data.

If I had any faith in the sturdiness of the floppy driver, I'd
recommend someone look into a 'dump oops to floppy' patch, but
it too relies on a large part of the system being in a sane
enough state to write blocks out to disk.

> (*) If the oops is longer than 25 lines, ... you can't even use scrollback
> because scrollback is cleared when you change consoles. X runs by default
> on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
> tty7 directly, you would not see it.)

What to do about oopses whilst in X has been the subject of much
head-scratching for years now. It's come up at least at the
last two kernel summits, and I'll hazard a guess it'll come up
again this year. The amount of work necessary to make it all
work on both kernel side and X side isn't insubstantial however,
so I wouldn't count on it working too soon.

Hmm, SuSE/Novell folks, doesn't NLKD take over an X display?
ISTR during a demo at last year's OLS the presenter was flipping
in/out of the debugger between slides. Is there anything
useful there?

Dave

2006-01-05 11:05:18

by Jan Engelhardt

Subject: Re: oops pauser.

>See the other patch I sent which halves the amount of lines needed
>for a backtrace on i386 (like x86-64 uses). This helps too.
>
.oO( Compress the oops, encode it base64 and display that instead )Oo. :-)

> > Is it be possible to change the VGA mode to 80x43/80x50/80x60
> > during protected mode?
>
>After an oops, we can't really rely on anything. What if the
>oops came from the console layer, or a framebuffer driver?
>
Well, setting the video mode can be done (on x86, ugh) with a BIOS call, so
we would not need to run through oops-affected code. But that was the
question, if this int 0x10 call was possible at all. Think of VBE -
VBE3 is the first version that can be done in protected mode.

>If I had any faith in the sturdyness of the floppy driver, I'd
>recommend someone looked into a 'dump oops to floppy' patch, but
>it too relies on a large part of the system being in a sane
>enough state to write blocks out to disk.
>
Right, sad world. (I look forward to the day someone writes a Morse encoder
that writes the oops to the keyboard LEDs.)

Jan Engelhardt
--

2006-01-05 11:11:18

Subject: Re: oops pauser. / boot_delayer

On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
> Randy.Dunlap <[email protected]> wrote:
> > This one delays each printk() during boot by a variable time
> > (from kernel command line), while system_state == SYSTEM_BOOTING.
>
> This sounds a bit like a aprils fool joke, what it is meant to do? You can
> read the messages in the bootlog and use the scrollback keys, no?

Could be handy for those 'I see a few messages that scroll, and the
box instantly reboots' bugs. Quite rare, but they do happen.

Dave

2006-01-05 12:05:17

Subject: Re: oops pauser.

Jan Engelhardt (on Thu, 5 Jan 2006 12:05:08 +0100 (MET)) wrote:
>>
>Right, sad world. (With fun I await the day someone writes a morse encoder
>that writes oops to keyboard leds.)

It's already been done, for both LEDs and PC speaker. http://kerneltrap.org/node/575/2355

2006-01-05 13:36:00

Subject: Re: oops pauser.

On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.

This appears to reduce the amount of information available, as an oops,
instead of spewing to the log and continuing, will generally hang the box,
stopping the scroll keys or dmesg from being used to get the data
out.

Who is going to wait two minutes for an oops when for most users it's
their only box? Instead of pasting reports people will now reboot, or
perhaps send you the half a report they can see (which, because we dump
too much info by default to fit the screen, is also useless).

> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes.

The console has awareness of graphic/text mode at all times and knows
what is going on. Why not use that information if you must go this way ?

Alan

2006-01-05 13:46:54

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 05:33:39AM -0500, Dave Jones took 0 lines to write:
>
> If I had any faith in the sturdyness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.

Not to mention that an increasing number of systems ship without a
floppy drive.

Kurt
--
If you perceive that there are four possible ways in which a procedure
can go wrong, and circumvent these, then a fifth way will promptly
develop.

2006-01-05 13:58:56

by Avishay Traeger

Subject: Re: oops pauser.

Some comments:
1. I think this is a good idea, since serial consoles can also change
timings. I have seen several race conditions where the problem goes
away once I add a serial console.
2. Should this be a separate debugging option?
3. Shouldn't you have KERN____ in your printk statements?
4. Wouldn't printing out the message every second make the oops scroll
off the screen, defeating the purpose of the patch?

Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/

On Wed, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch. A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)
>
> With this patch, if we oops, there's a pause for a two minutes..
> which hopefully gives people enough time to grab a digital camera
> to take a screenshot of the oops.
>
> It has an on-screen timer so the user knows what's going on,
> (and that it's going to come back to life [maybe] after the oops).
>
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
> Signed-off-by: Dave Jones <[email protected]>
>
> --- vanilla/arch/i386/kernel/traps.c 2006-01-02 22:21:10.000000000 -0500
> +++ linux-2.6.15/arch/i386/kernel/traps.c 2006-01-04 23:42:46.000000000 -0500
> @@ -256,6 +271,15 @@ void show_registers(struct pt_regs *regs
> }
> }
> printk("\n");
> + {
> + int i;
> + for (i=120;i>0;i--) {
> + mdelay(1000);
> + touch_nmi_watchdog();
> + printk("Continuing in %d seconds. \r", i);
> + }
> + printk("\n");
> + }
> }
>
> static void handle_BUG(struct pt_regs *regs)

2006-01-05 14:40:15

by Kyle McMartin

Subject: Re: oops pauser.

On Wed, Jan 04, 2006 at 11:52:12PM -0500, Dave Jones wrote:
> printk("\n");
> + {
> + int i;
> + for (i=120;i>0;i--) {
> + mdelay(1000);
> + touch_nmi_watchdog();
> + printk("Continuing in %d seconds. \r", i);
> + }
> + printk("\n");
> + }
>

Nice, this is cool. Though, perhaps it would be better if the loop length
was a command line argument like with panic_timeout?

Cheers,
Kyle

2006-01-05 15:17:36

Subject: Re: oops pauser.

On 1/5/06, Jan Engelhardt <[email protected]> wrote:
>
> >See the other patch I sent which halves the amount of lines needed
> >for a backtrace on i386 (like x86-64 uses). This helps too.
> >
> .oO( Compress the oops, encode it base64 and display that instead )Oo. :-)
>
Not really something we want to do at Oops time, and even if the kernel
was in a sane enough state to actually do it, you've just increased the
amount of work needed to decode the Oops for everyone
receiving/wanting to read it.

I think a better idea is to try and move things around so the most
useful pieces of information are on the last lines of the Oops output
(most likely to not have scrolled off the screen) and also work to
eliminate lines that are not really useful/helpful and maybe try to
cram more info from multiple short lines into a single line.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-01-05 15:32:14

Subject: Re: oops pauser. / boot_delayer

Grant Coady wrote:
>
> No, after oops, console dead, very dead . . . no scrollback :(

This mis-feature is beginning to annoy more and more.

I seem to recall that "in the old days" (1990s),
this was NOT the case: scrollback still worked from oops.

I wonder if perhaps a better feature here would be to fix that again?

Cheers

2006-01-05 15:39:09

by Avishay Traeger

Subject: Re: oops pauser. / boot_delayer

On Thu, 2006-01-05 at 10:31 -0500, Mark Lord wrote:
> Grant Coady wrote:
> >
> > No, after oops, console dead, very dead . . . no scrollback :(
>
> This mis-feature is beginning to annoy more and more.
>
> I seem to recall that "in the old days" (1990s),
> this was NOT the case: scrollback still worked from oops.

I am able to scroll up on the console for most regular oopses, but not
panics. Am I missing something here?

Avishay Traeger
http://www.fsl.cs.sunysb.edu/~avishay/

2006-01-05 19:15:52

Subject: Re: oops pauser. / boot_delayer

Avishay Traeger wrote:
> On Thu, 2006-01-05 at 10:31 -0500, Mark Lord wrote:
>
>>Grant Coady wrote:
>>
>>>No, after oops, console dead, very dead . . . no scrollback :(
>>
>>This mis-feature is beginning to annoy more and more.
>>
>>I seem to recall that "in the old days" (1990s),
>>this was NOT the case: scrollback still worked from oops.

s/oops/panic/

> I am able to scroll up on the console for most regular oopses, but not
> panics. Am I missing something here?

No, I meant "panics".

Cheers

2006-01-06 00:20:16

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
> Some comments:
> 1. I think this is a good idea, since serial consoles can also change
> timings. I have seen several race conditions where the problem goes
> away once I add a serial console.

Agreed.

> 2. Should this be a separate debugging option?

Agreed.

> 3. Shouldn't you have KERN____ in your printk statements?

That's something to watch out for... If you have, say:

printk(KERN_DEBUG "fooo.....");
do_foo();
printk(KERN_DEBUG "done.\n");

Then, you'll get the extra "<7>" on the screen and in the logs (assuming
you set the printk levels to display KERN_DEBUG).

Now, I'm not 100% sure about '\r', but I suspect it does the same thing.

> 4. Wouldn't printing out the message every second make the oops scroll
> off the screen, defeating the purpose of the patch?

No, read the patch carefully: it uses '\r' to go back to the beginning of
the line and overwrites the message.

Jeff.

2006-01-06 01:13:03

Subject: Re: oops pauser.

Josef Sipek <[email protected]> wrote:
> That's something to watch out for...If you say have:
>
> printk(KERN_DEBUG "fooo.....");
> do_foo();
> printk(KERN_DEBUG "done.\n");

Don't do it. It is better to have the time stamps for both and to have atomic
prints. In fact I would disallow this and add automatic linebreaks.

Gruss
Bernd

2006-01-06 01:24:15

Subject: Re: oops pauser.

On Thu, 5 Jan 2006, Dave Jones wrote:

> > (*) If the oops is longer than 25 lines, ... you can't even use scrollback
> > because scrollback is cleared when you change consoles. X runs by default
> > on tty7, and the kernel dumps it somewhere else. (And even if it dumped to
> > tty7 directly, you would not see it.)
>
> What to do about oopses whilst in X has been the subject of much
> head-scratching for years now. It's come up at least at the
> last two kernel summits, and I'll hazard a guess it'll come up
> again this year. The amount of work necessary to make it all
> work on both kernel side and X side isn't unsubstantial however,
> so I wouldn't count on it working too soon.

Hmm, if you can hope that someone will grab a camera to report an oops,
how about them grabbing a tape recorder/mp3 recorder to record audio from
the speaker? It's not fast, but you don't have that much data to output.
Do it in Morse (with an audio explanation of what's going to happen
first).

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-06 01:29:13

Subject: Re: oops pauser. / boot_delayer

On Thu, 5 Jan 2006, Jan Engelhardt wrote:

> Also note that the kernel generates a lot of noise^W text - if now the
> start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> the top of the kernel when it says
> Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006

Enable a few different types of encryption and you have to enlarge the
buffer (by quite a bit). The fact that all the encryption tests print
several lines each and can't be turned off (short of a quiet boot
where you lose everything) is one of the more annoying things to me right
now.

This large boot message issue also slows your boot significantly if you
have a fast box that has a serial console; it takes a long time to dump
all that info out the serial port.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-06 01:35:32

Subject: Re: oops pauser.

On Fri, Jan 06, 2006 at 02:12:59AM +0100, Bernd Eckenfels wrote:
> Josef Sipek <[email protected]> wrote:
> > That's something to watch out for...If you say have:
> >
> > printk(KERN_DEBUG "fooo.....");
> > do_foo();
> > printk(KERN_DEBUG "done.\n");
>
> dont do it. It is better to have the time stamps for both and to have atomic
> prints.

First of all, the above code is just to illustrate a point. As a matter of
fact, it may not even work: if some other kernel thread prints something while
do_foo() is executing, the whole thing will get screwed up.

If I remember correctly, the second line of the "sample" code will _NOT_
produce a timestamp. So, the output will be:

[1234567.123456] fooo.....<7>done.

where, the timestamp is that of the first printk.

> In fact I would disallow this and add automatic linebreaks.

I wouldn't go that far. I'd just let the kernel janitors have fun with
the existing code :)

Jeff.

2006-01-06 01:41:50

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 05:24:01PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Dave Jones wrote:
>
> >> (*) If the oops is longer than 25 lines, ... you can't even use
> >scrollback
> >> because scrollback is cleared when you change consoles. X runs by default
> >> on tty7, and the kernel dumps it somewhere else. (And even if it dumped
> >to
> >> tty7 directly, you would not see it.)
> >
> >What to do about oopses whilst in X has been the subject of much
> >head-scratching for years now. It's come up at least at the
> >last two kernel summits, and I'll hazard a guess it'll come up
> >again this year. The amount of work necessary to make it all
> >work on both kernel side and X side isn't unsubstantial however,
> >so I wouldn't count on it working too soon.
>
> hmm, if you can hope that someone will grab a camera to report an oops,
> how about them grabbing a tape recorder/mp3 recorder to record audio from
> the speaker. it's not fast, but you don't have that much data to output,
> do it in morse (with the audio explination of what's going to happen
> first)

There is a patch somewhere that uses the keyboard lights to "display" panics,
and a comment that the PC speaker implementation is left up to the reader :)

It shouldn't be hard to do; then all you need is just one printk telling the user
to record it :)

Jeff.

2006-01-06 02:07:23

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 08:58:53AM -0500, Avishay Traeger wrote:
> Some comments:
> 1. I think this is a good idea, since serial consoles can also change
> timings. I have seen several race conditions where the problem goes
> away once I add a serial console.
> 2. Should this be a separate debugging option?

maybe

> 3. Shouldn't you have KERN____ in your printk statements?

doesn't make a great deal of difference in this context.

> 4. Wouldn't printing out the message every second make the oops scroll
> off the screen, defeating the purpose of the patch?

no. that's why it uses \r instead of \n.

Dave

2006-01-06 02:07:21

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 01:37:33PM +0000, Alan Cox wrote:
> On Mer, 2006-01-04 at 23:52 -0500, Dave Jones wrote:
> > With this patch, if we oops, there's a pause for a two minutes..
> > which hopefully gives people enough time to grab a digital camera
> > to take a screenshot of the oops.
>
> This appears to reduce the amount of information available as an oops
> instead of spewing to the log

A huge number of oopses never hit the logs.
They either hit early in boot before syslog is even running, or
they kill the box.

> and continuing generally will hang the box
> stopping the scroll keys being used or dmesg being used to get the data
> out.

This is exactly the problem this patch addresses.
The 'scroll keys' do not work in cases where we lock up after an oops.

If the useful parts of the oops scrolled off the top of the screen, we've
lost any chance of debugging whatever just happened.

> Who is going to wait two minutes for an oops when for most users its
> their only box.

The real world disagrees with you. In the few weeks it's been in Fedora,
several previously undiagnosable oopses were caught, and even *users*
agreed it was a useful addition. If the two minutes is excessive, we can
lower it, or even make it a boot-option.

Another possibility is instantly continuing after a keypress.

> Instead of pasting reports people will now reboot, or
> perhaps send you the half a report they can see (which because we dump
> too much info by default to fit the screen is also useless).

See the other patch which halves the number of lines needed for a backtrace.
With that, even if the user is running 25 line high displays, we've
a pretty good chance it'll fit except for really long backtraces,
and if that's the case, we can ask users to try to reproduce after
booting with vga=1 (or better, vga=791, for example).

> > The one case this doesn't catch is the problem of oopses whilst
> > in X. Previously a non-fatal oops would stall X momentarily,
> > and then things continue. Now those cases will lock up completely
> > for two minutes.
>
> The console has awareness of graphic/text mode at all times and knows
> what is going on. Why not use that information if you must go this way ?

If we've just oopsed, the console may have no awareness of what day it is,
let alone anything about video modes. I'm not entirely sure what you're
suggesting, but it gives me the creeps. Are you talking about switching
away from X back to a tty when we oops?

Dave

2006-01-06 02:21:35

Subject: Re: oops pauser.

Josef Sipek <[email protected]> wrote:
> First of all, the above code is to just illustrate a point. And as a matter of
> fact it may not even work if some other kernel thread prints something while
> do_foo() is executing, the whole thing will get screwed up.

That's another reason not to do it. To me this means we do not need to
support or optimize for this kind of printk abuse.

> If I remember correctly, I the second line of the "sample" code, will _NOT_
> produce a timestamp. So, the output will be:
>
> [1234567.123456] fooo.....<7>done.

> where, the timestamp is that of the first printk.

Yes, that's the other problem: you miss the timestamp for the end of a
long-running operation. That's why it is better to have that in two lines
(maybe the second line with lower severity).

Gruss
Bernd

2006-01-06 05:36:33

Subject: Re: oops pauser. / boot_delayer

On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> On Thu, 5 Jan 2006, Jan Engelhardt wrote:
>
> >Also note that the kernel generates a lot of noise^W text - if now the
> >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> >the top of the kernel when it says
> > Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
>
> enable a few different types of encryption and you have to enlarge the
> buffer (by quite a bit). the fact that all the encryption tests print
> several lines each out and can't be turned off (short of a quiet boot
> where you loose everything) is one of the more annoying things to me right
> now.
>
> this large boot message issue also slows your boot significantly if you
> have a fast box that has a serial console, it takes a long time to dump
> all that info out the serial port.

So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.

Dave

2006-01-06 05:53:32

by Robert Hancock

Subject: Re: oops pauser.

Dave Jones wrote:
> After an oops, we can't really rely on anything. What if the
> oops came from the console layer, or a framebuffer driver?

In this case, maybe it wouldn't work, but I think it would be helpful
the vast majority of the time.

Surely there must be a way this could be done. After all, Windows seems
to manage to switch the display into VGA 80x50 mode and put up the BSOD
with error dump information fairly reliably.

--
Robert Hancock Saskatoon, SK, Canada
To email, remove "nospam" from [email protected]
Home Page: http://www.roberthancock.com/

2006-01-06 07:00:45

Subject: Re: oops pauser. / boot_delayer

On Fri, 6 Jan 2006, Dave Jones wrote:

> On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> >
> > >Also note that the kernel generates a lot of noise^W text - if now the
> > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > >the top of the kernel when it says
> > > Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> >
> > enable a few different types of encryption and you have to enlarge the
> > buffer (by quite a bit). the fact that all the encryption tests print
> > several lines each out and can't be turned off (short of a quiet boot
> > where you loose everything) is one of the more annoying things to me right
> > now.
> >
> > this large boot message issue also slows your boot significantly if you
> > have a fast box that has a serial console, it takes a long time to dump
> > all that info out the serial port.
>
> So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
>

I've looked for such a config option and not found it in menuconfig. I'll
take another look.

Ok, I found it. The help isn't clear about exactly what this does. Adding
a blurb that you probably want it off unless you are developing a crypto
module, or that it's intended as a debugging tool, would help clarify it.

Thanks.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-06 07:06:18

by Jan Engelhardt

Subject: Re: oops pauser.

>> After an oops, we can't really rely on anything. What if the
>> oops came from the console layer, or a framebuffer driver?

How about this?:

Put an "emergency kernel" into a memory location that is being protected in
some way (i.e. writing there even from kernel space generates an oops).
Upon oops, it gets unlocked and we do some sort of kexec() to it.
Of course, this probably requires that the unlocking must not be done
with help of the standard page mappings.

Jan Engelhardt
--

2006-01-06 07:36:55

by Jan Engelhardt

Subject: Re: oops pauser. / boot_delayer

> this large boot message issue also slows your boot significantly if you have a
> fast box that has a serial console, it takes a long time to dump all that info
> out the serial port.

Don't blame the kernel that serial is slow.

Jan Engelhardt
--

2006-01-06 07:47:31

by Randy Dunlap

Subject: Re: oops pauser.

On Fri, 6 Jan 2006 08:06:12 +0100 (MET) Jan Engelhardt wrote:

> >> After an oops, we can't really rely on anything. What if the
> >> oops came from the console layer, or a framebuffer driver?
>
> How about this?:
>
> Put an "emergency kernel" into a memory location that is being protected in
> some way (i.e. writing there even from kernel space generates an oops).
> Upon oops, it gets unlocked and we do some sort of kexec() to it.
> Of course, this probably requires that the unlocking must not be done
> with help of the standard page mappings.

This is what kexec + kdump is.

---
~Randy

2006-01-06 08:34:03

Subject: Re: oops pauser. / boot_delayer

On Fri, 6 Jan 2006, Jan Engelhardt wrote:

>> this large boot message issue also slows your boot significantly if you have a
>> fast box that has a serial console, it takes a long time to dump all that info
>> out the serial port.
>
> Don't blame the kernel that serial is slow.

The complaint wasn't that the serial was slow. It was a comment on the
amount of data being displayed during a boot (which turned out to be in
large part because I had a verbose config option turned on).

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-06 08:58:41

Subject: Re: oops pauser.

On Fri, Jan 06, 2006 at 08:06:12AM +0100, Jan Engelhardt wrote:
> >> After an oops, we can't really rely on anything. What if the
> >> oops came from the console layer, or a framebuffer driver?
>
> How about this?:
>
> Put an "emergency kernel" into a memory location that is being protected in
> some way (i.e. writing there even from kernel space generates an oops).
> Upon oops, it gets unlocked and we do some sort of kexec() to it.

You just reinvented 'kdump' :)
There's been ongoing work in this area for a while.

Dave

2006-01-06 13:29:37

Subject: Re: oops pauser.

On Iau, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
> The huge number of oopses never hit the logs.
> They either hit early in boot before syslog is even running, or
> they kill the box.

So you don't need a two minute delay for those because as you said it
froze the box
>
> > and continuing generally will hang the box
> > stopping the scroll keys being used or dmesg being used to get the data
> > out.
>
> This is exactly the problem this patch addresses.
> The 'scroll keys' do not work in cases where we lock up after an oops.

And in those cases the 2 minute freeze is meaningless

> The real-world disagrees with you. In the few weeks it's been in Fedora,
> several previously undiagnosable oopses were caught, and even *users*
> agreed it was a useful addition. If the two minutes is excessive, we can
> lower it, or even make it a boot-option.

Any change will capture different oopses. A boot option isn't a bad idea,
or for that matter also truncating the call trace to the *top* few entries
(or, as Bryce suggested on irc, reversing the printing order).

> Another possibility is instantly continuing after a keypress.

If the input layer is running that would be sensible.

> > The console has awareness of graphic/text mode at all times and knows
> > what is going on. Why not use that information if you must go this way ?
>
> If we've just oopsed, the console may have no awareness of what day it is,
> yet alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?

Well you could try and do that but I was more thinking that if the
console has been told we are in graphics mode then the 2 minute delay
shouldn't occur.

Alan

2006-01-06 15:22:25

by Pavel Machek

Subject: Re: oops pauser.

Hi!

> > > The one case this doesn't catch is the problem of oopses whilst
> > > in X. Previously a non-fatal oops would stall X momentarily,
> > > and then things continue. Now those cases will lock up completely
> > > for two minutes.
> >
> > The console has awareness of graphic/text mode at all times and knows
> > what is going on. Why not use that information if you must go this way ?
>
> If we've just oopsed, the console may have no awareness of what day it is,
> yet alone anything about video modes. I'm not entirely sure what you're
> suggesting, but it gives me the creeps. Are you talking about switching
> away from X back to a tty when we oops?

No.

But you _know_ if user is running X or not -- notice that kernel does
not attempt to printk() when X is running, because that could lock up
the box.

If user is running X, you don't need the delay.

if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
delay(10sec)
}

or something like that should do the trick.
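
A compilable user-space sketch of that check, for illustration only. KD_TEXT and KD_GRAPHICS carry the values from <linux/kd.h>; struct vc_state and oops_pause_seconds() are hypothetical stand-ins (the real code would look at struct vc_data and CON_IS_VISIBLE()), and the 10 seconds is just the number from the snippet above.

```c
#include <stdbool.h>

/* Values as in <linux/kd.h>, redefined so the sketch builds in user space. */
#define KD_TEXT     0x00
#define KD_GRAPHICS 0x01

/* Hypothetical stand-in for the two pieces of console state that matter. */
struct vc_state {
    bool visible;  /* what CON_IS_VISIBLE(vc) would report */
    int  vc_mode;  /* KD_TEXT or KD_GRAPHICS */
};

/*
 * Seconds to pause after an oops. Only pause when the oops text can
 * actually be seen: the console is visible and in text mode. Under X
 * (KD_GRAPHICS) nobody can read it, so return 0 and continue at once.
 */
int oops_pause_seconds(const struct vc_state *vc, int pause_secs)
{
    if (vc->visible && vc->vc_mode == KD_TEXT)
        return pause_secs;
    return 0;
}
```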
Pavel

--
Thanks, Sharp!

2006-01-06 19:06:42

by Jan Engelhardt

Subject: Re: oops pauser.

>No.
>
>But you _know_ if user is running X or not -- notice that kernel does
>not attempt to printk() when X is running, because that could lock up
>the box.
>
>If user is running X, you don't need the delay.
>
>if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {

Does framebuffer fall under KD_TEXT?

Jan Engelhardt
--

2006-01-06 20:34:10

Subject: Re: oops pauser.

On Fri, Jan 06, 2006 at 01:31:10PM +0000, Alan Cox wrote:
> On Iau, 2006-01-05 at 15:52 -0500, Dave Jones wrote:
> > The huge number of oopses never hit the logs.
> > They either hit early in boot before syslog is even running, or
> > they kill the box.
>
> So you don't need a two minute delay for those because as you said it
> froze the box

it froze *AFTER* the oops had scrolled off the top of the screen.

The sequence of events before

oops
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG

with this patch..

oops
*pause for two minutes whilst user takes a picture/scribbles it down*
scrolly scrolly
random crap about sleeping whilst atomic or the like
scrolly scrolly
HANG

> > > and continuing generally will hang the box
> > > stopping the scroll keys being used or dmesg being used to get the data
> > > out.
> >
> > This is exactly the problem this patch addresses.
> > The 'scroll keys' do not work in cases where we lock up after an oops.
>
> And in those cases the 2 minute freeze is meaningless

It isn't meaningless if it stops the oops scrolling off the screen for
long enough to capture it.

> > Another possibility is instantly continuing after a keypress.
> If the input layer is running that would be sensible.

Yeah, questionable. And polling hardware won't work due to usb keyboards.

> > If we've just oopsed, the console may have no awareness of what day it is,
> > yet alone anything about video modes. I'm not entirely sure what you're
> > suggesting, but it gives me the creeps. Are you talking about switching
> > away from X back to a tty when we oops?
>
> Well you could try and do that but I was more thinking that if the
> console has been told we are in graphics mode then the 2 minute delay
> shouldn't occur.

Hmm. I'll look into that.
Any pointers ? (I don't want to spend longer than necessary looking
in that code :-)

Dave

2006-01-06 22:34:49

by Pavel Machek

Subject: Re: oops pauser.

On Pá 06-01-06 20:06:36, Jan Engelhardt wrote:
> >No.
> >
> >But you _know_ if user is running X or not -- notice that kernel does
> >not attempt to printk() when X is running, because that could lock up
> >the box.
> >
> >If user is running X, you don't need the delay.
> >
> >if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
>
> Does framebuffer fall under KD_TEXT?

I think so.

--
Thanks, Sharp!

2006-01-06 22:48:27

Subject: Re: oops pauser.

On Fri, Jan 06, 2006 at 04:22:03PM +0100, Pavel Machek wrote:
> Hi!
>
> > > > The one case this doesn't catch is the problem of oopses whilst
> > > > in X. Previously a non-fatal oops would stall X momentarily,
> > > > and then things continue. Now those cases will lock up completely
> > > > for two minutes.
> > >
> > > The console has awareness of graphic/text mode at all times and knows
> > > what is going on. Why not use that information if you must go this way ?
> >
> > If we've just oopsed, the console may have no awareness of what day it is,
> > yet alone anything about video modes. I'm not entirely sure what you're
> > suggesting, but it gives me the creeps. Are you talking about switching
> > away from X back to a tty when we oops?
>
> No.
>
> But you _know_ if user is running X or not -- notice that kernel does
> not attempt to printk() when X is running, because that could lock up
> the box.
>
> If user is running X, you don't need the delay.
>
> if (CON_IS_VISIBLE(vc) && vc->vc_mode == KD_TEXT) {
> delay(10sec)
> }

From this context though, we don't have a 'vc' to reference,
so we'll need to find out from the console layer somehow, which
is the current vc.
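
One possible shape for that lookup, as a user-space mock. As far as I know the VT layer keeps the foreground console index in fg_console and the per-console data in vc_cons[]; the mock below only borrows those names, so treat the layout as an assumption to be checked against the tree. current_vc() and foreground_is_text() are hypothetical helpers.

```c
#include <stddef.h>

#define MAX_NR_CONSOLES 63
#define KD_TEXT     0x00
#define KD_GRAPHICS 0x01

/* Mock types: only the fields this sketch needs. */
struct vc_data { int vc_mode; };
struct vc { struct vc_data *d; };

struct vc vc_cons[MAX_NR_CONSOLES];  /* mock of the per-console table */
int fg_console;                      /* mock: index of the foreground console */

/* Return the foreground console's data, or NULL if it was never allocated. */
struct vc_data *current_vc(void)
{
    if (fg_console < 0 || fg_console >= MAX_NR_CONSOLES)
        return NULL;
    return vc_cons[fg_console].d;
}

/* True only when a foreground console exists and is in text mode. */
int foreground_is_text(void)
{
    struct vc_data *vc = current_vc();

    return vc != NULL && vc->vc_mode == KD_TEXT;
}
```

The NULL check matters here: after an oops nothing guarantees the console was ever set up, so the safe default is "not text mode".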

Dave

2006-01-07 21:44:47

by Kurtis D. Rader

Subject: Re: oops pauser. / boot_delayer

On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
> On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
> > Randy.Dunlap <[email protected]> wrote:
> > > This one delays each printk() during boot by a variable time
> > > (from kernel command line), while system_state == SYSTEM_BOOTING.
> >
> > This sounds a bit like a aprils fool joke, what it is meant to do? You can
> > read the messages in the bootlog and use the scrollback keys, no?
>
> could be handy for those 'I see a few messages that scroll, and the
> box instantly reboots' bugs. Quite rare, but they do happen.

Another very common situation is a system which fails to boot due to
failures to find the root filesystem. This can happen because of device name
slippage, root disk not being found, the proper HBA driver isn't present in
the initrd image, etc. The customer calls us and reports the last thing they
see on the screen:

Mounting root filesystem
Kmod : failed to exec /sbin/modprobe -s -k block-major-8 , error = 2
mount : error 6 mounting ext3
pivotroot : pivot_root(/sysroot,.sysroot/initrd) failed : 2
Freeing unused memory
Kernel panic : No init found . Try passing init= option to kernel

Great! Only problem is the info we really need has already scrolled off the
screen. An option to pause briefly after each boot time printk would be very
useful.
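
A rough user-space sketch of what such an option does, assuming the behaviour Randy described (delay each message only while system_state == SYSTEM_BOOTING). This is not his patch: boot_delay_ms, boot_print() and busy_wait_ms() are hypothetical names, and the busy-wait stands in for the kernel's mdelay().

```c
#include <stdio.h>
#include <time.h>

enum sys_state { SYSTEM_BOOTING, SYSTEM_RUNNING };

enum sys_state system_state = SYSTEM_BOOTING;
unsigned int boot_delay_ms;  /* hypothetical: would be set from the command line */

/* Spin until roughly `ms` milliseconds of CPU time have passed. */
static void busy_wait_ms(unsigned int ms)
{
    clock_t end = clock() + (clock_t)ms * CLOCKS_PER_SEC / 1000;

    while (clock() < end)
        ;
}

/*
 * Print one message, then pause if we are still booting and a delay
 * was requested. Returns the delay applied, in milliseconds.
 */
unsigned int boot_print(const char *msg)
{
    fputs(msg, stdout);
    fflush(stdout);

    if (system_state != SYSTEM_BOOTING || boot_delay_ms == 0)
        return 0;

    busy_wait_ms(boot_delay_ms);
    return boot_delay_ms;
}
```

Once the state leaves SYSTEM_BOOTING the delay stops by itself, so a forgotten option does not slow down a running system.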

2006-01-07 21:48:17

by Arjan van de Ven

Subject: Re: oops pauser. / boot_delayer

On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
> On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
> > On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
> > > Randy.Dunlap <[email protected]> wrote:
> > > > This one delays each printk() during boot by a variable time
> > > > (from kernel command line), while system_state == SYSTEM_BOOTING.
> > >
> > > This sounds a bit like a aprils fool joke, what it is meant to do? You can
> > > read the messages in the bootlog and use the scrollback keys, no?
> >
> > could be handy for those 'I see a few messages that scroll, and the
> > box instantly reboots' bugs. Quite rare, but they do happen.
>
> Another very common situation is a system which fails to boot due to
> failures to find the root filesystem. This can happen because of device name
> slippage, root disk not being found, the proper HBA driver isn't present in

mount by label fixes some of that but not all

> the initrd image, etc. The customer calls us and reports the last thing they
> see on the screen:

fwiw it would make sense (at least for distros) to make this print a
more helpful text about potential causes etc, rather than just making
people say "the kernel panicked".

2006-01-07 22:00:15

by Kurtis D. Rader

Subject: Re: oops pauser. / boot_delayer

On Sat, 2006-01-07 22:48:08, Arjan van de Ven wrote:
> On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
> >
> > Another very common situation is a system which fails to boot due to
> > failures to find the root filesystem. This can happen because of device name
> > slippage, root disk not being found, the proper HBA driver isn't present in
>
> mount by label fixes some of that but not all

The "not all" case is important. Especially since the potential causes
of being unable to find the root filesystem keep increasing with each
new capability. And it isn't just failures involving finding the rootfs
that can be problematic to debug without more context than is on the
final screen image.

> > the initrd image, etc. The customer calls us and reports the last thing they
> > see on the screen:
>
> fwiw it would make sense (at least for distros) to make this print a
> more helpful text about potential causes etc, rather than just making
> people say "the kernel paniced".

That might be useful for people who don't have support contracts. It
wouldn't help customer support teams like the one I'm a member of. We know what
those potential reasons are. The challenge is having enough context to
quickly determine which possible explanation accounts for the failure.
Ideally every customer would have a serial console configured. But a)
most customers don't/won't/can't configure one and b) on many systems a
serial console is not available.

2006-01-07 22:27:17

by Bernd Eckenfels

Subject: Re: oops pauser. / boot_delayer

On Sat, Jan 07, 2006 at 01:44:39PM -0800, Kurtis D. Rader wrote:
> Great! Only problem is the info we really need has already scrolled of the
> screen. An option to pause briefly after each boot time printk would be very
> useful.

I don't think so. It is too much to read to a support person over the phone,
and somebody who can diagnose that themselves knows exactly where the root is
searched for in their config. After all it can only be a hardware or driver problem.

I think it makes much more sense to allow scrollback than to delay
printouts. (And I am quite sure scrollback works in this case)

Gruss
Bernd
--
(OO) -- Bernd_Eckenfels@Mörscher_Strasse_8.76185Karlsruhe.de --
( .. ) ecki@{inka.de,linux.de,debian.org} http://www.eckes.org/
o--o 1024D/E383CD7E eckes@IRCNet v:+497211603874 f:+49721151516129
(O____O) When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl!

2006-01-08 13:38:26

Subject: Re: oops pauser.

On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
>
> If I had any faith in the sturdyness of the floppy driver, I'd
> recommend someone looked into a 'dump oops to floppy' patch, but
> it too relies on a large part of the system being in a sane
> enough state to write blocks out to disk.

I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
minimal 16-bit floppy driver to save the oops dump.

Kmsgdump has been around for ages and still works with 2.6.x. I almost
always use it (all of my boxes still have floppy drives.)

-- v --

[email protected]

2006-01-08 13:46:04

by Pavel Machek

Subject: Re: oops pauser. / boot_delayer

On Pá 06-01-06 00:36:09, Dave Jones wrote:
> On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> >
> > >Also note that the kernel generates a lot of noise^W text - if now the
> > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > >the top of the kernel when it says
> > > Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> >
> > enable a few different types of encryption and you have to enlarge the
> > buffer (by quite a bit). the fact that all the encryption tests print
> > several lines each out and can't be turned off (short of a quiet boot
> > where you loose everything) is one of the more annoying things to me right
> > now.
> >
> > this large boot message issue also slows your boot significantly if you
> > have a fast box that has a serial console, it takes a long time to dump
> > all that info out the serial port.
>
> So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.

Maybe even with CRYPTO_TEST enabled we could only report _failures_?
Pavel
--
Thanks, Sharp!

2006-01-08 13:53:28

by Randy Dunlap

Subject: Re: oops pauser.

On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:

> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> >
> > If I had any faith in the sturdyness of the floppy driver, I'd
> > recommend someone looked into a 'dump oops to floppy' patch, but
> > it too relies on a large part of the system being in a sane
> > enough state to write blocks out to disk.
>
> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> minimal 16-bit floppy driver to save the oops dump.

It just switches to real mode and uses BIOS calls.

> Kmsgdump has been around for ages and still works with 2.6.x. I almost
> always use it (all of my boxes still have floppy drives.)

---
~Randy

2006-01-08 19:30:33

Subject: Re: oops pauser. / boot_delayer

On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> > >
> > > >Also note that the kernel generates a lot of noise^W text - if now the
> > > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > > >the top of the kernel when it says
> > > > Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> > > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> > >
> > > enable a few different types of encryption and you have to enlarge the
> > > buffer (by quite a bit). the fact that all the encryption tests print
> > > several lines each out and can't be turned off (short of a quiet boot
> > > where you loose everything) is one of the more annoying things to me right
> > > now.
> > >
> > > this large boot message issue also slows your boot significantly if you
> > > have a fast box that has a serial console, it takes a long time to dump
> > > all that info out the serial port.
> >
> > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
>
> Maybe even with CRYPTO_TEST enabled we could only report _failures_?

Why? As far as I know, it is intended for developers as a regression test. I say
if you don't like the output, make the thing a module or don't compile it at all.

Jeff.

2006-01-08 19:35:11

by Jan Engelhardt

Subject: Re: oops pauser.

>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump.
>
>It just switches to real mode and uses BIOS calls.
>

This technique btw is what I suggested (switch to 80x50 vga mode
(if not in X)) in case of a longer oops trace.

Jan Engelhardt
--

2006-01-08 19:41:05

Subject: Re: oops pauser.

On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <[email protected]> wrote:

>On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
>
>> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
>> >
>> > If I had any faith in the sturdyness of the floppy driver, I'd
>> > recommend someone looked into a 'dump oops to floppy' patch, but
>> > it too relies on a large part of the system being in a sane
>> > enough state to write blocks out to disk.
>>
>> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
>> minimal 16-bit floppy driver to save the oops dump.
>
>It just switches to real mode and uses BIOS calls.

So would it be viable to take over the screen in similar fashion?

Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
screen, or Poops for short :o)

Grant.

2006-01-08 23:08:41

by Pavel Machek

Subject: Re: oops pauser. / boot_delayer

On Ne 08-01-06 14:30:00, Josef Sipek wrote:
> On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> > On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > > On Thu, Jan 05, 2006 at 05:28:59PM -0800, David Lang wrote:
> > > > On Thu, 5 Jan 2006, Jan Engelhardt wrote:
> > > >
> > > > >Also note that the kernel generates a lot of noise^W text - if now the
> > > > >start scripts from $YOUR_FAVORITE_DISTRO also fill up, I can barely reach
> > > > >the top of the kernel when it says
> > > > > Linux version 2.6.15 ([email protected]) (gcc version 4.0.2
> > > > > 20050901 (prerelease) (SUSE Linux)) #1 Tue Jan 3 09:21:27 CET 2006
> > > >
> > > > enable a few different types of encryption and you have to enlarge the
> > > > buffer (by quite a bit). the fact that all the encryption tests print
> > > > several lines each out and can't be turned off (short of a quiet boot
> > > > where you loose everything) is one of the more annoying things to me right
> > > > now.
> > > >
> > > > this large boot message issue also slows your boot significantly if you
> > > > have a fast box that has a serial console, it takes a long time to dump
> > > > all that info out the serial port.
> > >
> > > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
> >
> > Maybe even with CRYPTO_TEST enabled we could only report _failures_?
>
> Why? As far as I know, it is intended for developers as a regression test. I say
> if you don't like the output, make the thing a module or don't compile it at all.

I don't like the output, but if it only reported failures, I could
leave it running and potentially catch some strange failures. Is
reporting successes actually useful?
Pavel

--
Thanks, Sharp!

2006-01-08 23:30:04

Subject: Re: oops pauser. / boot_delayer

On Sat, 7 Jan 2006, Arjan van de Ven wrote:

> On Sat, 2006-01-07 at 13:44 -0800, Kurtis D. Rader wrote:
>> On Thu, 2006-01-05 06:11:05, Dave Jones wrote:
>>> On Thu, Jan 05, 2006 at 08:30:16AM +0100, Bernd Eckenfels wrote:
>>> > Randy.Dunlap <[email protected]> wrote:
>>> >> This one delays each printk() during boot by a variable time
>>> >> (from kernel command line), while system_state == SYSTEM_BOOTING.
>>> >
>>> > This sounds a bit like a aprils fool joke, what it is meant to do? You can
>>> > read the messages in the bootlog and use the scrollback keys, no?
>>>
>>> could be handy for those 'I see a few messages that scroll, and the
>>> box instantly reboots' bugs. Quite rare, but they do happen.
>>
>> Another very common situation is a system which fails to boot due to
>> failures to find the root filesystem. This can happen because of device name
>> slippage, root disk not being found, the proper HBA driver isn't present in
>
> mount by label fixes some of that but not all

there appears to be a limit on how many disks get checked for their label.
I've got one system where I've got 2xraid cards each with 8 drives on them
and then another raid card with my boot disk on it.

depending on how I have the two raid cards the boot disk can be anything
from sdc to sdq, mounting by label works for sdc, but not for sdq.

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

2006-01-08 23:39:48

Subject: Re: oops pauser. / boot_delayer

On Mon, Jan 09, 2006 at 12:08:27AM +0100, Pavel Machek wrote:
> On Ne 08-01-06 14:30:00, Josef Sipek wrote:
> > On Sun, Jan 08, 2006 at 02:21:32PM +0100, Pavel Machek wrote:
> > > On Pá 06-01-06 00:36:09, Dave Jones wrote:
> > > > So disable CONFIG_CRYPTO_TEST. There's no reason to test this stuff every boot.
> > >
> > > Maybe even with CRYPTO_TEST enabled we could only report _failures_?
> >
> > Why? As far as I know, it is intended for developers as a regression test. I say
> > if you don't like the output, make the thing a module or don't compile it at all.
>
> I don't like the output, but if it only reported failures, I could
> leave it running and potentially catch some strange failures.

I agree that it is useful to know about strange failures, however I still maintain
that _if_ the module is intended as a regression test for developers, then the
excessive (?) output is fair. I think that the most logical course of action is to
have a verbosity module parameter which defaults to displaying errors only, but still
allows developers to get all the information they need.
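
Roughly what I have in mind, sketched in user space. In a real module the knob would be exposed with module_param(); here verbosity is a plain global, and report_test() plus the message format are hypothetical, not what the crypto test code prints today.

```c
#include <stdio.h>

enum { VERBOSE_ERRORS = 0, VERBOSE_ALL = 1 };

/* Hypothetical knob; defaults to reporting failures only. */
int verbosity = VERBOSE_ERRORS;

/*
 * Report one test result. Failures are always printed; successes only
 * when the developer asked for full output.
 * Returns 1 if a line was printed, 0 if it was suppressed.
 */
int report_test(const char *name, int passed)
{
    if (passed && verbosity < VERBOSE_ALL)
        return 0;

    printf("crypto test %s: %s\n", name, passed ? "passed" : "FAILED");
    return 1;
}
```

With the default setting a clean boot prints nothing from the tests, while a developer can raise verbosity and get the full log back.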

> Is reporting successes actually useful?

Then I propose: :)

diff -r b4fca0ece97f kernel/sys.c
--- a/kernel/sys.c Sat Oct 22 19:24:10 2005 +0300
+++ b/kernel/sys.c Sun Jan 8 18:26:49 2006 -0500
@@ -436,7 +436,6 @@
void kernel_halt(void)
{
kernel_halt_prepare();
- printk(KERN_EMERG "System halted.\n");
machine_halt();
}
EXPORT_SYMBOL_GPL(kernel_halt);

Jeff.

2006-01-09 01:43:17

by Randy Dunlap

Subject: Re: oops pauser.

On Sun, 8 Jan 2006 20:35:08 +0100 (MET) Jan Engelhardt wrote:

> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump.
> >
> >It just switches to real mode and uses BIOS calls.
> >
>
> This technique btw is what I suggested (switch to 80x50 vga mode
> (if not in X)) in case of a longer oops trace.

kmsgdump already shows all of the kernel log buffer that is in
memory (has not been written to disk, basically).

If I (or we) had some time and motivation, I have a
contributed patch to kmsgdump that:

a. saves and dumps all of the kernel log buffer
(reminder: current dump targets are display, parallel port
printer, and legacy floppy disk)
b. adds a hard disk dump target and attempts to make this safe
by pre-reserving and writing each block of it with a
signature + block number (and maybe more, I'm not sure
right now)
c. adds x86-64 support

but I have not merged this code into kmsgdump yet, nor have
I even tested it. I can't test the x86-64 support since I
don't (yet) have an x86-64 system available for this.
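
The pre-reserved block scheme in item (b) could look roughly like the
following sketch. This is not the contributed patch itself; the signature
value, the header layout, and the names `dump_block_hdr`, `DUMP_SIGNATURE`,
`dump_block_init` and `dump_block_is_reserved` are all assumptions made for
illustration. The idea is that the dump code overwrites a disk block only if
it still carries the expected signature and block number.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DUMP_BLOCK_SIZE 512u
#define DUMP_SIGNATURE  0x4b4d5347u	/* hypothetical magic value */

/* Hypothetical header stored at the start of every reserved block. */
struct dump_block_hdr {
	uint32_t signature;
	uint32_t block_no;
};

/* Pre-reserve one block: fill it with zeros and stamp the header. */
void dump_block_init(uint8_t blk[DUMP_BLOCK_SIZE], uint32_t block_no)
{
	struct dump_block_hdr hdr = { DUMP_SIGNATURE, block_no };

	memset(blk, 0, DUMP_BLOCK_SIZE);
	memcpy(blk, &hdr, sizeof(hdr));
}

/* At dump time: only overwrite a block that still looks reserved. */
bool dump_block_is_reserved(const uint8_t blk[DUMP_BLOCK_SIZE],
			    uint32_t expected_no)
{
	struct dump_block_hdr hdr;

	memcpy(&hdr, blk, sizeof(hdr));
	return hdr.signature == DUMP_SIGNATURE && hdr.block_no == expected_no;
}
```

A block whose header no longer matches (for example because the file that
reserved it was moved or rewritten) would simply be skipped instead of being
overwritten.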

If anyone wants to work on this, I'll put the additional
code on the web.

---
~Randy

2006-01-09 01:45:10

by Randy Dunlap

Subject: Re: oops pauser.

On Mon, 09 Jan 2006 06:40:57 +1100 Grant Coady wrote:

> On Sun, 8 Jan 2006 05:53:22 -0800, "Randy.Dunlap" <[email protected]> wrote:
>
> >On Sun, 8 Jan 2006 15:38:22 +0200 Ville Herva wrote:
> >
> >> On Thu, Jan 05, 2006 at 05:33:39AM -0500, you [Dave Jones] wrote:
> >> >
> >> > If I had any faith in the sturdyness of the floppy driver, I'd
> >> > recommend someone looked into a 'dump oops to floppy' patch, but
> >> > it too relies on a large part of the system being in a sane
> >> > enough state to write blocks out to disk.
> >>
> >> I believe kmsgdump (http://www.xenotime.net/linux/kmsgdump/) uses its own
> >> minimal 16-bit floppy driver to save the oops dump.
> >
> >It just switches to real mode and uses BIOS calls.
>
> So would it be viable to take over the screen in similar fashion?
>
> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> screen, or Poops for short :o)

It does take over the screen. 80x50 isn't needed since it knows how
to scroll the kernel log buffer on 80x25.

---
~Randy

2006-01-09 16:16:03

by Jan Engelhardt

Subject: Re: oops pauser.

>> So would it be viable to take over the screen in similar fashion?
>>
>> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
>> screen, or Poops for short :o)
>
>It does take over the screen. 80x50 isn't needed since it knows how
>to scroll the kernel log buffer on 80x25.

It's needed because scrolling back might be impossible (shift-up in panic
= no-go), not because it knows how to scroll.

Jan Engelhardt
--
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/

2006-01-09 16:25:15

Subject: Re: oops pauser.

On Mon, Jan 09, 2006 at 05:15:55PM +0100, you [Jan Engelhardt] wrote:
> >> So would it be viable to take over the screen in similar fashion?
> >>
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> >> screen, or Poops for short :o)
> >
> >It does take over the screen. 80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
>
> It's needed because scrolling back might be impossible (shift-up in panic
> = no-go), not because it knows how to scroll.

Please try kmsgdump.

It has its own real-mode terminal (with scrolling) to which it switches on
oops. Hung kernel console doesn't affect it.

-- v --

2006-01-09 16:39:40

by Randy Dunlap

Subject: Re: oops pauser.

On Mon, 9 Jan 2006, Jan Engelhardt wrote:

> >> So would it be viable to take over the screen in similar fashion?
> >>
> >> Set it to 80x50 in BIOS and dump there --> call it the Penguin Oops
> >> screen, or Poops for short :o)
> >
> >It does take over the screen. 80x50 isn't needed since it knows how
> >to scroll the kernel log buffer on 80x25.
>
> It's needed because scrolling back might be impossible (shift-up in panic
> = no-go), not because it knows how to scroll.

Oh, I see. You are talking about the kernel message(s), not
kmsgdump. Sorry, I switched to kmsgdump there somehow.
Yes, more info on the screen from the kernel would be good.

--
~Randy

2006-01-09 18:43:50

Subject: Console debugging wishlist was: Re: oops pauser.

Dave Jones writes:

> In my quest to get better debug data from users in Fedora bug reports,
> I came up with this patch. A majority of users don't have serial
> consoles, so when an oops scrolls off the top of the screen,
> and locks up, they usually end up reporting a 2nd (or later) oops
> that isn't particularly helpful (or worse, some inconsequential
> info like 'sleeping whilst atomic' warnings)

Ok - here's my personal wishlist. If someone is interested ...

What I would like to have is a "more" option for the kernel that makes
it page kernel output like "more" and asks you before scrolling
to the next page.
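
To make the idea concrete, here is a minimal user-space sketch of the paging
logic. It only models the line counting; the function name `pager_feed`, the
`struct pager` type, and the page height are assumptions, it ignores line
wrapping, and a real implementation would have to hook into the console output
path and wait for a keypress, which this sketch does not attempt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pager state: rows per page and rows printed so far. */
struct pager {
	unsigned int rows_per_page;
	unsigned int rows_used;
};

/*
 * Account for one chunk of console output. Returns true when the
 * chunk filled the current page (minus one row kept for the prompt),
 * i.e. when the caller should stop and ask before scrolling further.
 */
bool pager_feed(struct pager *p, const char *text, size_t len)
{
	bool page_full = false;

	for (size_t i = 0; i < len; i++) {
		if (text[i] != '\n')
			continue;
		if (++p->rows_used >= p->rows_per_page - 1) {
			p->rows_used = 0;	/* start counting a new page */
			page_full = true;
		}
	}
	return page_full;
}
```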

What would be also cool would be to fix the VGA console to have
a larger scroll back buffer. The standard kernel boot output
is far larger than the default scrollback, so if you get a hang
late you have no way to look back to all the earlier
messages.

(it is hard to understand that with 128MB+ graphic cards and 512+MB
computers the scroll back must be still so short...)

And fixing sysrq to work after panics would be also nice.

And maybe a sysrq key to switch the font to the smallest one available
so as much as possible would fit onto a digital photo.
>
> The one case this doesn't catch is the problem of oopses whilst
> in X. Previously a non-fatal oops would stall X momentarily,
> and then things continue. Now those cases will lock up completely
> for two minutes. Future patches could add some additional feedback
> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.

That's the killer issue that makes this patch a bad idea.

-Andi

2006-01-10 20:25:59

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>Ok - here's my personal wishlist. If someone is interested ...
>
>What I would like to have is a "more" option for the kernel that makes
>it page kernel output like "more" and asks you before scrolling
>to the next page.

An oops is usually a condition you can recover from in some/most/depends
cases (e.g. a null deref in a filesystem "only" makes that vfsmount
(filesystem at all?) blocked), so if the kernel is waiting for user input
on a non-panic condition, this means userspace stops too, which is not
too good if the kernel is still 'alive'.
It's like we are entering kdb although everything is fine enough to go
through a proper `init 6`.

>What would be also cool would be to fix the VGA console to have
>a larger scroll back buffer. The standard kernel boot output
>is far larger than the default scrollback, so if you get a hang
>late you have no way to look back to all the earlier
>messages.
>
>(it is hard to understand that with 128MB+ graphic cards and 512+MB
>computers the scroll back must be still so short...)

I doubt this scrollback buffer is implemented as part of the video cards.
It is rather a kernel invention, and therefore uses standard RAM. But the
idea is good, preferably make it a CONFIG_ option.

>And fixing sysrq to work after panics would be also nice.

I am not sure, but would enabling interrupts be enough?

>And maybe a sysrq key to switch the font to the smallest one available
>so as much as possible would fit onto a digital photo.

And analog photos? ;)

>> The one case this doesn't catch is the problem of oopses whilst
>> in X. Previously a non-fatal oops would stall X momentarily,
>> and then things continue. Now those cases will lock up completely
>> for two minutes. Future patches could add some additional feedback
>> during this 'stall' such as the blinky keyboard leds, or periodic speaker beeps.
>
>That's the killer issues why this patch is a bad idea.
>

Whilst little can be done in X situations, let's at least improve consoles.

Jan Engelhardt
--

2006-01-10 20:29:34

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Tue, Jan 10, 2006 at 09:25:46PM +0100, Jan Engelhardt wrote:
> >What would be also cool would be to fix the VGA console to have
> >a larger scroll back buffer. The standard kernel boot output
> >is far larger than the default scrollback, so if you get a hang
> >late you have no way to look back to all the earlier
> >messages.
> >
> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
> >computers the scroll back must be still so short...)
>
> I doubt this scrollback buffer is implemented as part of the video cards.
> It is rather a kernel invention, and therefore uses standard RAM. But the
> idea is good, preferably make it a CONFIG_ option.

There is a config option that lets you specify the size of this buffer:
CONFIG_LOG_BUF_SHIFT

Jeff.

2006-01-10 20:44:55

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>> I doubt this scrollback buffer is implemented as part of the video cards.
>> It is rather a kernel invention, and therefore uses standard RAM. But the
>> idea is good, preferably make it a CONFIG_ option.
>
>There is a config option that lets you specify the size of this buffer:
>CONFIG_LOG_BUF_SHIFT

menuconfig help says

"Select kernel log buffer size as a power of 2."

That does not sound like "console scroll buffer".
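
For reference, the option is a shift count, so the kernel log (dmesg) buffer
size is 1 << CONFIG_LOG_BUF_SHIFT bytes. A small sketch of the arithmetic; the
helper name `log_buf_len` is made up for this example:

```c
#include <stddef.h>

/* Kernel log (dmesg) buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT. */
size_t log_buf_len(unsigned int shift)
{
	return (size_t)1 << shift;
}
```

A shift of 14 gives a 16 KB log buffer and a shift of 17 gives 128 KB. None of
this changes how far back the console itself can scroll.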

Jan Engelhardt
--

2006-01-10 20:46:49

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Tuesday 10 January 2006 21:25, Jan Engelhardt wrote:
> An oops is usually a condition you can recover from in some/most/depends
> cases (e.g. a null deref in a filesystem "only" makes that vfsmount
> (filesystem at all?) blocked), so if the kernel is waiting for user input
> on a non-panic condition, this means userspace stops too, which is not
> too good if the kernel is still 'alive'.
> It's like we are entering kdb although everything is fine enough to go
> through a proper `init 6`.

-ENOPARSE

>
> >What would be also cool would be to fix the VGA console to have
> >a larger scroll back buffer. The standard kernel boot output
> >is far larger than the default scrollback, so if you get a hang
> >late you have no way to look back to all the earlier
> >messages.
> >
> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
> >computers the scroll back must be still so short...)
>
> I doubt this scrollback buffer is implemented as part of the video cards.
> It is rather a kernel invention, and therefore uses standard RAM. But the
> idea is good, preferably make it a CONFIG_ option.

At least long ago (when I last looked) it was in video RAM.

>
> >And fixing sysrq to work after panics would be also nice.
>
> I am not sure, but would enabling interrupts be enough?

Interrupts are already enabled, but no - it's not.

Thank you for a useful contribution to the thread.

-Andi

2006-01-10 20:46:50

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Tuesday 10 January 2006 21:29, Josef Sipek wrote:

> There is a config option that lets you specify the size of this buffer:
> CONFIG_LOG_BUF_SHIFT

That is the dmesg buffer, not the scroll back buffer. Completely different
things.

-Andi

2006-01-10 21:06:49

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>-ENOPARSE

Try the oops.ko from http://jengelh.hopto.org/f/oops_ko.tbz2. It won't kill
your system, you can continue to work.

If you now had a kernel-level pager that would jump in every time an oops
happened, control would normally not be given back to userspace unless we quit
the pager. kdb has a similar behavior: it "stops" userspace until someone
chooses to "c"ontinue.
Therefore this pager would not be too good. In a panic, yes, it would be
perfect.

I hope this makes it a little bit clearer, if not, -EAGAIN.

>> >(it is hard to understand that with 128MB+ graphic cards and 512+MB
>> >computers the scroll back must be still so short...)
>>
>> I doubt this scrollback buffer is implemented as part of the video cards.
>> It is rather a kernel invention, and therefore uses standard RAM. But the
>> idea is good, preferably make it a CONFIG_ option.
>
>At least long ago (when I last looked) it was in video RAM.

Let's put it from another POV: if the scrollback buffer was somewhere within
the video card, it would usually not be cleared when you change from one
console tty to another. Currently, doing this switch clears the buffer (can we
do anything about that? - would be great)

Jan Engelhardt
--

2006-01-10 21:18:47

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Tuesday 10 January 2006 22:06, Jan Engelhardt wrote:

> If you now had a kernel-level pager that would jump in everytime an oops
> happened, control would normally not be given back to userspace unless we quit
> the pager. kdb has a similar behavior: it "stops" userspace until someone
> chooses to "c"ontinue.
> Therefore this pager would not be too good. In a panic, yes, it would be
> perfect.

First, for a recoverable oops there is no reason you couldn't use
schedule_timeout(). And for those you don't need it anyway,
because you can just as well use dmesg. For the others you can use poll loops.

But that wasn't actually my point. If you get
a problem during bootup - not necessarily an oops, but it could
also be a no-root panic or your SCSI controller not working or
something else - and you can reproduce it, it's a PITA to examine
the earlier kernel output because there is no way to get
enough scrollback. For the oops itself it's not needed - it typically
fits on the screen. But if it happens every boot it would be nice
if you could just boot with "more" and then page through
the kernel output and check what's going on.

The feature would be mainly useful for problems during kernel bootup,
although it might sometimes be useful later too, e.g. when user space
hangs but you want to page through the hotkey process dump,
which might be longer than the console scrollback.

Just more scrollback does not necessarily replace this, because
sometimes you end up with so much output so quickly (e.g. some errors
are very verbose) that any scrollback buffer would overflow.

Now the only issue would be to work out when to use schedule_timeout()
and when to use a delay, but that can all be distinguished with some code.

Anyway, mind you - I suspect actually implementing this would be somewhat
ugly, so the chances of it actually getting in would likely be slim.
Still, it would often be useful.

-Andi

2006-01-10 21:30:40

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>But it wasn't actually my point. If you get
>an problem during bootup - not necessarily an oops, but could
>be also a no root panic or your SCSI controller not working or
>something else - and you can reproduce it it's a PITA to examine
>the kernel output before because there is no way to get
>enough scrollback. For the oops itself it's not needed - it typically
>fits on the screen. But if it happens every boot it would be nice
>if you could just boot with "more" and then page through
>the kernel output and check what's going on.

Ah yes, I had not considered boot oopses/panics. My bad.

Jan Engelhardt
--

2006-01-10 22:55:08

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Tue, Jan 10, 2006 at 09:44:43PM +0100, Jan Engelhardt wrote:
>
> >> I doubt this scrollback buffer is implemented as part of the video cards.
> >> It is rather a kernel invention, and therefore uses standard RAM. But the
> >> idea is good, preferably make it a CONFIG_ option.
> >
> >There is a config option that lets you specify the size of this buffer:
> >CONFIG_LOG_BUF_SHIFT
>
> menuconfig help says
>
> "Select kernel log buffer size as a power of 2."
>
> That does not sound like "console scroll buffer".

True. I should think more about what I say before I say it.

Jeff.

2006-01-11 12:24:24

by Antonino A. Daplas

Subject: Re: Console debugging wishlist was: Re: oops pauser.

Jan Engelhardt wrote:
>> Ok - here's my personal wishlist. If someone is interested ...
>>
>> What I would like to have is a "more" option for the kernel that makes
>> it page kernel output like "more" and asks you before scrolling
>> to the next page.
>
> An oops is usually a condition you can recover from in some/most/depends
> cases (e.g. a null deref in a filesystem "only" makes that vfsmount
> (filesystem at all?) blocked), so if the kernel is waiting for user input
> on a non-panic condition, this means userspace stops too, which is not
> too good if the kernel is still 'alive'.
> It's like we are entering kdb although everything is fine enough to go
> through a proper `init 6`.
>
>> What would be also cool would be to fix the VGA console to have
>> a larger scroll back buffer. The standard kernel boot output
>> is far larger than the default scrollback, so if you get a hang
>> late you have no way to look back to all the earlier
>> messages.
>>
>> (it is hard to understand that with 128MB+ graphic cards and 512+MB
>> computers the scroll back must be still so short...)
>
> I doubt this scrollback buffer is implemented as part of the video cards.
> It is rather a kernel invention, and therefore uses standard RAM. But the
> idea is good, preferably make it a CONFIG_ option.

In the VGA console, all buffers, including scrollback, are in video RAM, but
the size is fixed and is very small.

With the framebuffer console, you can increase the size of the scrollback
buffer with the boot option:

fbcon=scrollback:<n> (default is 32K)

Tony

2006-01-11 12:35:47

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:

> In the VGA console, all buffers, including scrollback is in video RAM, but
> the size is fixed and is very small.

I wonder if that can be fixed.

> With the framebuffer console, you can increase the size of the scrollback
> buffer with the boot option:
>
> fbcon=scrollback:<n> (default is 32K)

On x86-64 vesafb is unusably slow because it does CPU scrolling, since
it can't use the VESA BIOS - and the others don't work everywhere. So I don't
think fbcon is a usable replacement.

-Andi

2006-01-11 13:05:54

by Antonino A. Daplas

Subject: Re: Console debugging wishlist was: Re: oops pauser.

Andi Kleen wrote:
> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
>
>> In the VGA console, all buffers, including scrollback is in video RAM, but
>> the size is fixed and is very small.
>
> I wonder if that can be fixed.

It can be done, but it will affect VGA console performance.

>
>> With the framebuffer console, you can increase the size of the scrollback
>> buffer with the boot option:
>>
>> fbcon=scrollback:<n> (default is 32K)
>
> On x86-64 vesafb is unusable slow because it does CPU scrolling cause
> it can't use the vesa BIOS - and the others don't work everywhere. So I don't
> think fbcon is an usable replacement.

How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
increase performance significantly.

Tony

2006-01-11 13:17:29

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
> Andi Kleen wrote:
> > On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
> >
> >> In the VGA console, all buffers, including scrollback is in video RAM, but
> >> the size is fixed and is very small.
> >
> > I wonder if that can be fixed.
>
> It can be done, but it will affect VGA console performance.

By how much? As long as it still scrolls reasonably fast it would be ok for me.

>
> >
> >> With the framebuffer console, you can increase the size of the scrollback
> >> buffer with the boot option:
> >>
> >> fbcon=scrollback:<n> (default is 32K)
> >
> > On x86-64 vesafb is unusable slow because it does CPU scrolling cause
> > it can't use the vesa BIOS - and the others don't work everywhere. So I don't
> > think fbcon is an usable replacement.
>
> How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
> increase performance significantly.

I can try it.

-Andi

2006-01-11 13:43:16

by Antonino A. Daplas

Subject: Re: Console debugging wishlist was: Re: oops pauser.

Andi Kleen wrote:
> On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
>> Andi Kleen wrote:
>>> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
>>>
>>>> In the VGA console, all buffers, including scrollback is in video RAM, but
>>>> the size is fixed and is very small.
>>> I wonder if that can be fixed.
>> It can be done, but it will affect VGA console performance.
>
> By how much? As long as it still scrolls reasonably fast it would be ok for me.

Each character will need to be written twice, once to VGA RAM and once to
the shadow/scrollback buffer in system RAM. It would still be reasonably fast.

Perhaps I can implement this for vgacon.
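
A rough sketch of that double write, in plain C rather than actual vgacon
code. Every name here (`shadow_con`, `shadow_putc`, the buffer sizes) is
hypothetical, and `vga` simply stands in for the memory-mapped text buffer.
Each character cell is stored once in the visible screen and once in a ring
buffer in ordinary RAM, which can be made much larger than the VGA text area.

```c
#include <stdint.h>

#define SCREEN_CELLS (80 * 25)			/* one 80x25 text page */
#define SHADOW_CELLS (SCREEN_CELLS * 16)	/* hypothetical scrollback size */

struct shadow_con {
	uint16_t vga[SCREEN_CELLS];	/* stand-in for VGA text memory */
	uint16_t shadow[SHADOW_CELLS];	/* scrollback copy in system RAM */
	unsigned int head;		/* next free slot in the ring */
};

/* Write one cell (character + attribute) to both buffers. */
void shadow_putc(struct shadow_con *c, unsigned int pos, char ch, uint8_t attr)
{
	uint16_t cell = (uint16_t)((attr << 8) | (uint8_t)ch);

	c->vga[pos % SCREEN_CELLS] = cell;	/* first write: the screen */
	c->shadow[c->head] = cell;		/* second write: the shadow */
	c->head = (c->head + 1) % SHADOW_CELLS;
}
```

The extra cost per character is one more 16-bit store to system RAM, which is
why the slowdown should stay small.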

Tony

2006-01-11 13:54:46

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Wednesday 11 January 2006 14:43, Antonino A. Daplas wrote:
> Andi Kleen wrote:
> > On Wednesday 11 January 2006 14:05, Antonino A. Daplas wrote:
> >> Andi Kleen wrote:
> >>> On Wednesday 11 January 2006 13:24, Antonino A. Daplas wrote:
> >>>
> >>>> In the VGA console, all buffers, including scrollback is in video RAM, but
> >>>> the size is fixed and is very small.
> >>> I wonder if that can be fixed.
> >> It can be done, but it will affect VGA console performance.
> >
> > By how much? As long as it still scrolls reasonably fast it would be ok for me.
>
> Each character will need to be written twice, one to VGA RAM and another to
> the shadow/scrollback buffer in system RAM.

That should be basically unnoticeable.

> It would still be reasonably fast.
>
> Perhaps I can implement this for vgacon.

Please do. And increase the default scrollback please or make it a CONFIG.

Thanks,
-Andi

2006-01-11 18:34:10

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>>> With the framebuffer console, you can increase the size of the scrollback
>>> buffer with the boot option:
>>>
>>> fbcon=scrollback:<n> (default is 32K)
>>
>> On x86-64 vesafb is unusable slow because it does CPU scrolling cause
>> it can't use the vesa BIOS - and the others don't work everywhere. So I don't
>> think fbcon is an usable replacement.
>
>How about vga16fb + fbcon? If scrolling is slow in vga16fb, fbset -vyres 800 should
>increase performance significantly.
>

Benchmarks first.

Jan Engelhardt
--

2006-01-15 16:43:11

Subject: Re: Console debugging wishlist was: Re: oops pauser.

Andi Kleen wrote:

> (it is hard to understand that with 128MB+ graphic cards and 512+MB
> computers the scroll back must be still so short...)

The VGA scrollback buffer is limited by the text area of the video RAM.
The text area is in DOS memory at segment 0xB800 (or 0xB000) and extends
32 KB (or, in the case of MDA, 4 KB). Each character uses 2 bytes.
Therefore you can store up to 16,000 characters or 4 pages of text.
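
The arithmetic can be checked with a few lines of C. The 32 KB window and the
2 bytes per cell are the figures given here; the helper names are made up for
this sketch. Note that the page count depends on the text mode: 16,384 cells
is 8 whole pages at 80x25 but 4 pages at 80x50.

```c
#include <stddef.h>

#define VGA_TEXT_WINDOW (32u * 1024u)	/* bytes in the text window */
#define BYTES_PER_CELL  2u		/* character byte + attribute byte */

/* Number of character cells that fit in the text window. */
size_t vga_text_cells(void)
{
	return VGA_TEXT_WINDOW / BYTES_PER_CELL;
}

/* Number of whole pages for a given text mode. */
size_t vga_text_pages(unsigned int cols, unsigned int rows)
{
	return vga_text_cells() / ((size_t)cols * rows);
}
```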

--
I thank GMX for sabotaging the use of my addresses by means of
lies spread via SPF.

2006-01-15 17:25:35

Subject: Re: Console debugging wishlist was: Re: oops pauser.

On Sunday 15 January 2006 17:48, Bodo Eggert wrote:
> Andi Kleen <[email protected]> wrote:
>
> > (it is hard to understand that with 128MB+ graphic cards and 512+MB
> > computers the scroll back must be still so short...)
>
> The VGA scrollback buffer is limited by the text area of the video RAM.
> The text area is in the DOS memory at 0xB800 (or 0xB000) and extends
> 32 KB (or in case of MDA, 4 KB). Each character will use 2 Bytes.
> Therefore you can store up to 16,000 characters or 4 pages of text.

It was a rhetorical question.

-Andi

2006-01-15 20:51:28

by Jan Engelhardt

Subject: Re: Console debugging wishlist was: Re: oops pauser.

>> > (it is hard to understand that with 128MB+ graphic cards and 512+MB
>> > computers the scroll back must be still so short...)
>>
>> The VGA scrollback buffer is limited by the text area of the video RAM.
>> The text area is in the DOS memory at 0xB800 (or 0xB000) and extends
>> 32 KB (or in case of MDA, 4 KB). Each character will use 2 Bytes.
>> Therefore you can store up to 16,000 characters or 4 pages of text.
>
>It was a rhetorical question.
>
And I assumed that scrollback was stored in some regular kmalloc()ed page(s).

Jan Engelhardt
--