2006-02-24 10:41:20

by Andres Salomon

Subject: [PATCH] x86_64 stack trace cleanup

Hi,

This patch cleans up the clutter of x86_64 stack traces, making the
output closer to what i386 and sparc64 stack traces look like. It uses
print_symbol instead of resolving the symbols manually, and prints one
frame per line instead of displaying multiple frames per line. I left
the other stuff in the stack dump alone; this affects only the frame
list.

I know this has been brought up before
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0602.0/2238.html,
although I noticed a slight problem w/ that patch, as __print_symbol
returns void); however, for people that don't spend all their time
looking at x86_64 backtraces, I think this consistency shouldn't be
scoffed at. When you switch back and forth between different archs,
x86_64's backtrace is cluttered and confusing in comparison.

With this patch, traces end up looking as follows:

getty S ffff81001e7d1db8 0 3812 1 3814 3811 (NOTLB)
ffff81001e7d1db8 0000000000000008 ffffffff80240323 0000000000001d93
ffff81001f82a2d8 ffff81001f82a0c0 ffff81001f68b0c0 0000000000000000
ffff81001fa841fc ffff81001e7d0000
Call Trace:
[<ffffffff80240323>] do_con_write+0x1853/0x1890
[<ffffffff80164daa>] __pagevec_free+0x2a/0x40
[<ffffffff802e0e55>] schedule_timeout+0x25/0xd0
[<ffffffff80134868>] release_console_sem+0x1a8/0x220
[<ffffffff8014b93c>] add_wait_queue+0x1c/0x60
[<ffffffff8023441f>] read_chan+0x48f/0x6e2
[<ffffffff8012e7e0>] default_wake_function+0x0/0x10
[<ffffffff8022f292>] tty_read+0xa2/0x110
[<ffffffff80185eff>] vfs_read+0xdf/0x1a0
[<ffffffff80186b63>] sys_read+0x53/0x90
[<ffffffff8010afee>] system_call+0x7e/0x83


The old-style trace:

pdflush D ffff810037a417a0 0 4293 11 4294 4292 (L-TLB)
ffff810031fb3d68 0000000000000046 ffff8100107b1d78 0000000000000f60
0000000000000160 ffff810037a419b8 ffff810037a417a0 ffff810037a40300
ffff810001707d80 ffffffff80146b79
Call Trace:<ffffffff80146b79>{lock_timer_base+41}
<ffffffff8014785d>{__mod_timer+189}
<ffffffff80154690>{keventd_create_kthread+0}
<ffffffff8030f96a>{schedule_timeout+154}
<ffffffff80147110>{process_timeout+0}
<ffffffff8030e621>{__sched_text_start+49}
<ffffffff801fd279>{blk_congestion_wait+153}
<ffffffff80154bc0>{autoremove_wake_function+0}
<ffffffff801b4051>{writeback_inodes+177}
<ffffffff8016d2d5>{background_writeout+165}
<ffffffff8016df40>{pdflush+0} <ffffffff8016e095>{pdflush+341}
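
To make the difference between the two layouts concrete, here is a minimal user-space sketch that rewrites old-style frames into one frame per line with a hex offset. It is only an illustration, not the attached patch: the function name convert_old_trace and the regular expression are assumptions made for this example, and the "/0xsize" part of the new format is left out because the old output does not carry the symbol size.

```python
import re

# Old x86_64 frame layout as shown in the trace above:
#   <16-hex-digit address>{symbol+decimal_offset}
OLD_FRAME = re.compile(r"<([0-9a-f]{16})>\{([^+}]+)\+(\d+)\}")


def convert_old_trace(text):
    """Return one '[<addr>] symbol+0xoff' string per frame found in text."""
    lines = []
    for addr, symbol, offset in OLD_FRAME.findall(text):
        # The old format prints the offset in decimal; convert it to hex.
        lines.append(f"[<{addr}>] {symbol}+{int(offset):#x}")
    return lines


old = """Call Trace:<ffffffff80146b79>{lock_timer_base+41}
<ffffffff8016df40>{pdflush+0} <ffffffff8016e095>{pdflush+341}"""

for line in convert_old_trace(old):
    print(line)
```

Lines that hold two frames, such as the final pdflush line, come out as two separate entries.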




Signed-off-by: Andres Salomon <[email protected]>


Attachments:
x86_64-stack-trace.patch (3.18 kB)

2006-02-24 10:47:12

by Andi Kleen

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Friday 24 February 2006 11:41, Andres Salomon wrote:
> Hi,
>
> This patch cleans up the clutter of x86_64 stack traces, making the
> output closer to what i386 and sparc64 stack traces look like. It uses
> print_symbol instead of resolving the symbols manually, and prints one
> frame per line instead of displaying multiple frames per line. I left
> the other stuff in the stack dump alone; this affects only the frame
> list.
>
> I know this has been brought up before
> (http://www.uwsg.iu.edu/hypermail/linux/kernel/0602.0/2238.html,
> although I noticed a slight problem w/ that patch, as __print_symbol
> returns void); however, for people that don't spend all their time
> looking at x86_64 backtraces, I think this consistency shouldn't be
> scoffed at. When you switch back and forth between different archs,
> x86_64's backtrace is cluttered and confusing in comparison.

If the formatting of the oopses is your only problem you are a
lucky man.

The problem is your new format uses more screen estate, which is precious
after an oops because the VGA scrollback is so small.
That is why I rejected the earlier attempts at changing this.

I can offer you a deal though: if you fix VGA scrollback to have
at least 1000 lines by default we can change the oops formatting too.

-Andi
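
The screen-space argument can be put in numbers with a rough sketch. It assumes frames are packed greedily, separated by a single space, into rows of a fixed width; the kernel's actual wrapping logic is not shown in this thread, so rows_packed is a hypothetical stand-in, and the 80-column default is the classic VGA text width.

```python
def rows_one_per_line(frames):
    """Each frame gets its own row, as in the proposed format."""
    return len(frames)


def rows_packed(frames, width=80):
    """Greedily pack frames, separated by one space, into rows of `width` columns."""
    rows = 0
    used = 0  # columns already filled in the current row
    for frame in frames:
        if used and used + 1 + len(frame) > width:
            rows += 1          # current row is full, start a new one
            used = len(frame)
        else:
            used = len(frame) if used == 0 else used + 1 + len(frame)
    return rows + (1 if used else 0)


frames = [
    "<ffffffff8016d2d5>{background_writeout+165}",
    "<ffffffff8016df40>{pdflush+0}",
    "<ffffffff8016e095>{pdflush+341}",
]
print(rows_one_per_line(frames), rows_packed(frames))
```

Passing a larger width models a wider console; how much packing saves depends on symbol lengths, since two long frames often do not fit side by side in 80 columns.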

2006-02-24 11:29:11

by Andres Salomon

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Fri, 2006-02-24 at 11:47 +0100, Andi Kleen wrote:
> On Friday 24 February 2006 11:41, Andres Salomon wrote:
> > Hi,
> >
> > This patch cleans up the clutter of x86_64 stack traces, making the
> > output closer to what i386 and sparc64 stack traces look like. It uses
> > print_symbol instead of resolving the symbols manually, and prints one
> > frame per line instead of displaying multiple frames per line. I left
> > the other stuff in the stack dump alone; this affects only the frame
> > list.
> >
> > I know this has been brought up before
> > (http://www.uwsg.iu.edu/hypermail/linux/kernel/0602.0/2238.html,
> > although I noticed a slight problem w/ that patch, as __print_symbol
> > returns void); however, for people that don't spend all their time
> > looking at x86_64 backtraces, I think this consistency shouldn't be
> > scoffed at. When you switch back and forth between different archs,
> > x86_64's backtrace is cluttered and confusing in comparison.
>
> If the formatting of the oopses is your only problem you are a
> lucky man.
>

That would be nice. Unfortunately, I'm trying to figure out why my dual
opteron box likes to push the load up to 15 and then hang while doing
i/o to the 3ware 9500S-8 card. Looks like the load/d-state processes
are caused by a whole lot (well, MAX_PDFLUSH_THREADS) of pdflush
processes spinning on base->lock in lock_timer_base(); not sure if
that's intentional or not, but it seems rather odd. Whether the hanging
is related to the high load remains to be seen.


> The problem is your new format uses more screen estate, which is precious
> after an oops because the VGA scrollback is so small.
> That is why i rejected the earlier attempts at changing this.
>

I don't see why this is a problem. Other architectures have done this
for ages, without problems. I suspect most people get their backtraces
from either serial console or logs, as copying them down from the screen
or taking a picture of the panic is a rather large pain. It seems like
you're penalizing everyone for a few select use cases.

Of course, this is all opinion.



2006-02-24 12:22:34

by Andi Kleen

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Friday 24 February 2006 12:29, Andres Salomon wrote:

> That would be nice. Unfortunately, I'm trying to figure out why my dual
> opteron box likes to push the load up to 15 and then hang while doing
> i/o to the 3ware 9500S-8 card. Looks like the load/d-state processes
> are caused by a whole lot (well, MAX_PDFLUSH_THREADS) of pdflush
> processes spinning on base->lock in lock_timer_base(); not sure if
> that's intentional or not, but it seems rather odd. Whether the hanging
> is related to the high load remains to be seen.

Sounds like some timer handler is broken. You have to find out which
one it is.


> I don't see why this is a problem. Other architectures have done this
> for ages, without problems. I suspect most people get their backtraces
> from either serial console or logs, as copying them down from the screen
> or taking a picture of the panic is a rather large pain. It seems like
> you're penalizing everyone for a few select use cases.

People submitting jpegs of photographed oopses or even badly scribbled
down oopses is quite common. Serial consoles are only used by a small
elite.

-Andi

2006-02-24 12:51:30

by Andrew Morton

Subject: Re: [PATCH] x86_64 stack trace cleanup

Andi Kleen <[email protected]> wrote:
>
> I can offer you a deal though: if you fix VGA scrollback to have
> at least 1000 lines by default we can change the oops formatting too.

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/broken-out/vgacon-add-support-for-soft-scrollback.patch

Problem is, scrollback doesn't work after panic(). I don't know why..

2006-02-24 13:00:48

by Andi Kleen

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Friday 24 February 2006 13:50, Andrew Morton wrote:
> Andi Kleen <[email protected]> wrote:
> >
> > I can offer you a deal though: if you fix VGA scrollback to have
> > at least 1000 lines by default we can change the oops formatting too.
>
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/broken-out/vgacon-add-support-for-soft-scrollback.patch

Once that is in and works we can consider changing the oopses.

> Problem is, scrollback doesn't work after panic(). I don't know why..

Someone claimed it was related to the panic keyboard blinking. Never verified
though. But without it working we still can't change the oops.

-Andi

2006-02-24 13:13:47

by Andrew Morton

Subject: Re: [PATCH] x86_64 stack trace cleanup

Andi Kleen <[email protected]> wrote:
>
> On Friday 24 February 2006 13:50, Andrew Morton wrote:
> > Andi Kleen <[email protected]> wrote:
> > >
> > > I can offer you a deal though: if you fix VGA scrollback to have
> > > at least 1000 lines by default we can change the oops formatting too.
> >
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/broken-out/vgacon-add-support-for-soft-scrollback.patch
>
> Once that is in and works we can consider changing the oopses.
>

I don't think we should change the oops format.

Apart from no longer printing a hex-base+decimal-offset, which is braindead.

> > Problem is, scrollback doesn't work after panic(). I don't know why..
>
> Someone claimed it was related to the panic keyboard blinking.
>

Strange. It looks pretty harmless.

2006-02-24 13:20:19

by Andi Kleen

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Friday 24 February 2006 14:13, Andrew Morton wrote:


>
> > > Problem is, scrollback doesn't work after panic(). I don't know why..
> >
> > Someone claimed it was related to the panic keyboard blinking.
> >
>
> Strange. It looks pretty harmless.

[just speculation, haven't examined the code in detail]

One credible theory also was that the keyboard or console driver does too much
work in workqueues, which need the scheduler and scheduling doesn't work anymore
after panic. I remember hacking around a problem with this long ago on 2.4.
The hack was to just check after_panic and call the function directly.

-Andi
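
The workaround described above (check an after-panic flag and call the function directly instead of deferring it) can be sketched as a user-space analogy. This is not kernel code: the DeferredWork class, its submit and run_pending methods, and the deque standing in for a workqueue are all hypothetical names chosen for illustration, and the theory itself is flagged as speculation in the message.

```python
from collections import deque


class DeferredWork:
    """Toy stand-in for a workqueue that depends on a running scheduler."""

    def __init__(self):
        self.after_panic = False   # hypothetical flag, named after the one in the message
        self.pending = deque()     # work waiting for the "scheduler" to run it

    def submit(self, func, *args):
        if self.after_panic:
            # Nothing will service the queue any more, so call directly.
            func(*args)
        else:
            self.pending.append((func, args))

    def run_pending(self):
        """What the scheduler would normally do: drain the queue."""
        while self.pending:
            func, args = self.pending.popleft()
            func(*args)


work = DeferredWork()
log = []
work.submit(log.append, "scroll up")   # deferred until run_pending()
work.run_pending()
work.after_panic = True
work.submit(log.append, "scroll up")   # runs immediately
print(log)
```

If the theory holds, work queued after a panic would sit in the queue forever, which would match the symptom of scrollback not responding.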

2006-02-24 13:35:15

by Jesper Juhl

Subject: Re: [PATCH] x86_64 stack trace cleanup

On 2/24/06, Andres Salomon <[email protected]> wrote:
> On Fri, 2006-02-24 at 11:47 +0100, Andi Kleen wrote:
[snip]
>
> > The problem is your new format uses more screen estate, which is precious
> > after an oops because the VGA scrollback is so small.
> > That is why i rejected the earlier attempts at changing this.
> >
>
> I don't see why this is a problem. Other architectures have done this
> for ages, without problems. I suspect most people get their backtraces
> from either serial console or logs, as copying them down from the screen
> or taking a picture of the panic is a rather large pain. It seems like
> you're penalizing everyone for a few select use cases.
>

Some of us don't have a digital camera for taking a picture (and
besides, being able to take a picture doesn't fix the problem of oops
output scrolling out of the visible screen area).
Some of us also don't have a second PC on which to capture logs via
netconsole or serial console.
Copying oopses down by hand from screen to paper sure is a pain (I
know, I've had to do it quite a few times), but for some it's the only
option and then we generally want as much info as possible on-screen
to copy down.

And btw, multi-column oops output has recently become an option for
i386 as well - in my opinion a good thing.

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html

2006-02-24 15:54:03

by Randy Dunlap

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Fri, 24 Feb 2006, Andrew Morton wrote:

> Andi Kleen <[email protected]> wrote:
> >
> > On Friday 24 February 2006 13:50, Andrew Morton wrote:
> > > Andi Kleen <[email protected]> wrote:
> > > >
> > > > I can offer you a deal though: if you fix VGA scrollback to have
> > > > at least 1000 lines by default we can change the oops formatting too.
> > >
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.16-rc4/2.6.16-rc4-mm2/broken-out/vgacon-add-support-for-soft-scrollback.patch
> >
> > Once that is in and works we can consider changing the oopses.
> >
>
> I don't think we should change the oops format.
>
> Apart from no longer printing a hex-base+decimal-offset, which is braindead.

strongly agree with the hex/decimal braindead part.

> > > Problem is, scrollback doesn't work after panic(). I don't know why..
> >
> > Someone claimed it was related to the panic keyboard blinking.
> >
>
> Strange. It looks pretty harmless.

--
~Randy

2006-02-24 19:29:48

by Alistair John Strachan

Subject: Re: [PATCH] x86_64 stack trace cleanup

On Friday 24 February 2006 12:22, Andi Kleen wrote:
> On Friday 24 February 2006 12:29, Andres Salomon wrote:
> > That would be nice. Unfortunately, I'm trying to figure out why my dual
> > opteron box likes to push the load up to 15 and then hang while doing
> > i/o to the 3ware 9500S-8 card. Looks like the load/d-state processes
> > are caused by a whole lot (well, MAX_PDFLUSH_THREADS) of pdflush
> > processes spinning on base->lock in lock_timer_base(); not sure if
> > that's intentional or not, but it seems rather odd. Whether the hanging
> > is related to the high load remains to be seen.
>
> Sounds like some timer handler is broken. You have to find out which
> one it is.
>
> > I don't see why this is a problem. Other architectures have done this
> > for ages, without problems. I suspect most people get their backtraces
> > from either serial console or logs, as copying them down from the screen
> > or taking a picture of the panic is a rather large pain. It seems like
> > you're penalizing everyone for a few select use cases.
>
> People submitting jpegs of photographed oopses or even badly scribbled
> down oopses is quite common. Serial consoles are only used by a small
> elite.

I agree, I've had to report using a JPEG file on multiple occasions, because
my mainboard has no serial ports. However, if you're using a 1280x1024
vesafb, which is supported by most systems, you can get a lot of lines on
screen at once..

--
Cheers,
Alistair.

'No sense being pessimistic, it probably wouldn't work anyway.'
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.

2006-02-24 19:40:14

by Jesper Juhl

Subject: Re: [PATCH] x86_64 stack trace cleanup

On 2/24/06, Alistair John Strachan <[email protected]> wrote:
> On Friday 24 February 2006 12:22, Andi Kleen wrote:
> > On Friday 24 February 2006 12:29, Andres Salomon wrote:
> > > That would be nice. Unfortunately, I'm trying to figure out why my dual
> > > opteron box likes to push the load up to 15 and then hang while doing
> > > i/o to the 3ware 9500S-8 card. Looks like the load/d-state processes
> > > are caused by a whole lot (well, MAX_PDFLUSH_THREADS) of pdflush
> > > processes spinning on base->lock in lock_timer_base(); not sure if
> > > that's intentional or not, but it seems rather odd. Whether the hanging
> > > is related to the high load remains to be seen.
> >
> > Sounds like some timer handler is broken. You have to find out which
> > one it is.
> >
> > > I don't see why this is a problem. Other architectures have done this
> > > for ages, without problems. I suspect most people get their backtraces
> > > from either serial console or logs, as copying them down from the screen
> > > or taking a picture of the panic is a rather large pain. It seems like
> > > you're penalizing everyone for a few select use cases.
> >
> > People submitting jpegs of photographed oopses or even badly scribbled
> > down oopses is quite common. Serial consoles are only used by a small
> > elite.
>
> I agree, I've had to report using a JPEG file on multiple occasions, because
> my mainboard has no serial ports. However, if you're using a 1280x1024
> vesafb, which is supported by most systems, you can get a lot of lines on
> screen at once..
>

true, but still, if you have two columns of output you get even more
lines on-screen (and in the cases where the oops is long that's IMHO a
good thing).

--
Jesper Juhl <[email protected]>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html