2014-04-26 18:10:40

by Richard Yao

[permalink] [raw]
Subject: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

Stack traces are generated by scanning the stack and interpeting
anything that looks like it could be a pointer to something. We do not
need to do this when we have frame pointers, but we do it anyway, with
the distinction that we use the return pointers to mark actual frames by
the absence of a question mark.

The additional verbosity of stack scanning tends to bombard us with
walls of text for no gain in practice, so lets switch to printing only
stack frames when frame pointers are available. That we can spend less
time reading stack traces and more time looking at code.

Signed-off-by: Richard Yao <[email protected]>
---
arch/x86/kernel/dumpstack.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d9c12d3..94ffe06 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
static const struct stacktrace_ops print_trace_ops = {
.stack = print_trace_stack,
.address = print_trace_address,
+#ifdef CONFIG_FRAME_POINTER
+ .walk_stack = print_context_stack_bp,
+#else
.walk_stack = print_context_stack,
+#endif
};

void
--
1.8.3.2


2014-04-27 12:08:28

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers


* Richard Yao <[email protected]> wrote:

> Stack traces are generated by scanning the stack and interpeting
> anything that looks like it could be a pointer to something. We do
> not need to do this when we have frame pointers, but we do it
> anyway, with the distinction that we use the return pointers to mark
> actual frames by the absence of a question mark.
>
> The additional verbosity of stack scanning tends to bombard us with
> walls of text for no gain in practice, so lets switch to printing
> only stack frames when frame pointers are available. That we can
> spend less time reading stack traces and more time looking at code.
>
> Signed-off-by: Richard Yao <[email protected]>
> ---
> arch/x86/kernel/dumpstack.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index d9c12d3..94ffe06 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
> static const struct stacktrace_ops print_trace_ops = {
> .stack = print_trace_stack,
> .address = print_trace_address,
> +#ifdef CONFIG_FRAME_POINTER
> + .walk_stack = print_context_stack_bp,
> +#else
> .walk_stack = print_context_stack,
> +#endif
> };

I don't really like this patch, and this question comes up often, so
let me try to explain it more verbosely.

Lets take a real-life example of a BP-only trace about a lockup:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] prepare_to_wait+0x18/0x57
[<792b7158>] __wait_on_bit+0x20/0x5f
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<792b9d04>] syscall_call+0x7/0xb

What does this call trace tell us? That we locked up somewhere in the
ext3 code. That's all that we know.

Now lets take a look at a 'verbose' entry of the same lockup, that
also outputs other entries that 'look like' kernel text addresses:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] ? prepare_to_wait+0x18/0x57
[<790391e8>] prepare_to_wait+0x18/0x57
[<7914a320>] ? generic_make_request+0x80/0xb5
[<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[<792b7158>] __wait_on_bit+0x20/0x5f
[<7914a427>] ? submit_bio+0xd2/0xfb
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
[<79039086>] ? autoremove_wake_function+0x31/0x31
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<7903f844>] ? lg_local_unlock+0x31/0x47
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<790c3c47>] ? __fput+0x187/0x1a0
[<792b9d37>] ? restore_all+0xf/0xf
[<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
[<792b9d04>] syscall_call+0x7/0xb

Firstly, it's still easy to tell the true backtrace: ignore all the
'?' entries.

But the '?' entries also tell us something more: that recently this
CPU submitted IO. That might or might not be suspected from the
original trace, but from the 'extended' trace it's really obvious that
IO activity happened before the lockup as well.

There were many other examples in the past where '?' entries gave us
clues about the nature of a bug: they represent a poor man's trace of
what happened recently on that CPU.

So it's useful information for hairy bugs and it would be sad to
remove them.

Having said that, your complaint that '?' entries can make reading of
back traces more difficult is valid as well - so maybe we can do
something about that.

For example if we used more horizontal separation in the output
format:

Call Trace:
[<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
[<7916a34c>] do_raw_spin_lock+0xb7/0xe8
[<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
[<790391e8>] # ? prepare_to_wait+0x18/0x57
[<790391e8>] prepare_to_wait+0x18/0x57
[<7914a320>] # ? generic_make_request+0x80/0xb5
[<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
[<792b7158>] __wait_on_bit+0x20/0x5f
[<7914a427>] # ? submit_bio+0xd2/0xfb
[<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
[<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
[<79039086>] # ? autoremove_wake_function+0x31/0x31
[<790e4eff>] __wait_on_buffer+0x1b/0x1e
[<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
[<7911015f>] ext3_iget+0x45/0x346
[<79114e03>] ext3_lookup+0x97/0xa6
[<790c9c0e>] lookup_real+0x20/0x35
[<790ca1d2>] __lookup_hash+0x2a/0x31
[<790caa0e>] lookup_slow+0x36/0x8c
[<790cbdda>] path_lookupat+0xf9/0x5c5
[<790cc2c5>] filename_lookup+0x1f/0x84
[<790cde98>] user_path_at_empty+0x3f/0x65
[<790cdecb>] user_path_at+0xd/0xf
[<790c6585>] vfs_fstatat+0x40/0x88
[<7903f844>] # ? lg_local_unlock+0x31/0x47
[<790c65f8>] vfs_stat+0x13/0x15
[<790c67e2>] sys_stat64+0x11/0x22
[<790c3c47>] # ? __fput+0x187/0x1a0
[<792b9d37>] # ? restore_all+0xf/0xf
[<79165bb4>] # ? trace_hardirqs_on_thunk+0xc/0x10
[<792b9d04>] syscall_call+0x7/0xb

then the information would still be there, it would still be in the
right place, but it would also be much easier to ignore the right side
column it visually. (The '# ?' prefix would also make it clear that
this information is not so important as the left hand side column.)

Or something like that. Assuming this doesn't confuse tools and such.

Thanks,

Ingo

2014-04-27 19:42:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Sun, Apr 27, 2014 at 02:08:20PM +0200, Ingo Molnar wrote:
> For example if we used more horizontal separation in the output
> format:
>
> Call Trace:
> [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
> [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
> [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
> [<790391e8>] # ? prepare_to_wait+0x18/0x57
> [<790391e8>] prepare_to_wait+0x18/0x57
> [<7914a320>] # ? generic_make_request+0x80/0xb5
> [<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
> [<792b7158>] __wait_on_bit+0x20/0x5f
> [<7914a427>] # ? submit_bio+0xd2/0xfb
> [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
> [<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
> [<79039086>] # ? autoremove_wake_function+0x31/0x31
> [<790e4eff>] __wait_on_buffer+0x1b/0x1e
> [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
> [<7911015f>] ext3_iget+0x45/0x346
> [<79114e03>] ext3_lookup+0x97/0xa6
> [<790c9c0e>] lookup_real+0x20/0x35
> [<790ca1d2>] __lookup_hash+0x2a/0x31
> [<790caa0e>] lookup_slow+0x36/0x8c
> [<790cbdda>] path_lookupat+0xf9/0x5c5
> [<790cc2c5>] filename_lookup+0x1f/0x84
> [<790cde98>] user_path_at_empty+0x3f/0x65
> [<790cdecb>] user_path_at+0xd/0xf
> [<790c6585>] vfs_fstatat+0x40/0x88
> [<7903f844>] # ? lg_local_unlock+0x31/0x47
> [<790c65f8>] vfs_stat+0x13/0x15
> [<790c67e2>] sys_stat64+0x11/0x22
> [<790c3c47>] # ? __fput+0x187/0x1a0
> [<792b9d37>] # ? restore_all+0xf/0xf
> [<79165bb4>] # ? trace_hardirqs_on_thunk+0xc/0x10
> [<792b9d04>] syscall_call+0x7/0xb
>
> then the information would still be there, it would still be in the
> right place, but it would also be much easier to ignore the right side
> column it visually. (The '# ?' prefix would also make it clear that
> this information is not so important as the left hand side column.)

Might be my brain being weird, but I find the above actually harder to
parse.

2014-04-27 19:52:10

by Richard Yao

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

I happened to write this with an email client that used Rich Text, which the LKML rejected, so I am resending it in plaintext format. My future replies will be made in plaintext, which is what my normal development system?s email client does.

On Apr 27, 2014, at 8:08 AM, Ingo Molnar <[email protected]> wrote:

>
> * Richard Yao <[email protected]> wrote:
>
>> Stack traces are generated by scanning the stack and interpeting
>> anything that looks like it could be a pointer to something. We do
>> not need to do this when we have frame pointers, but we do it
>> anyway, with the distinction that we use the return pointers to mark
>> actual frames by the absence of a question mark.
>>
>> The additional verbosity of stack scanning tends to bombard us with
>> walls of text for no gain in practice, so lets switch to printing
>> only stack frames when frame pointers are available. That we can
>> spend less time reading stack traces and more time looking at code.
>>
>> Signed-off-by: Richard Yao <[email protected]>
>> ---
>> arch/x86/kernel/dumpstack.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
>> index d9c12d3..94ffe06 100644
>> --- a/arch/x86/kernel/dumpstack.c
>> +++ b/arch/x86/kernel/dumpstack.c
>> @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
>> static const struct stacktrace_ops print_trace_ops = {
>> .stack = print_trace_stack,
>> .address = print_trace_address,
>> +#ifdef CONFIG_FRAME_POINTER
>> + .walk_stack = print_context_stack_bp,
>> +#else
>> .walk_stack = print_context_stack,
>> +#endif
>> };
>
> I don't really like this patch, and this question comes up often, so
> let me try to explain it more verbosely.
>
> Lets take a real-life example of a BP-only trace about a lockup:
>
> Call Trace:
> [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
> [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
> [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
> [<790391e8>] prepare_to_wait+0x18/0x57
> [<792b7158>] __wait_on_bit+0x20/0x5f
> [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
> [<790e4eff>] __wait_on_buffer+0x1b/0x1e
> [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
> [<7911015f>] ext3_iget+0x45/0x346
> [<79114e03>] ext3_lookup+0x97/0xa6
> [<790c9c0e>] lookup_real+0x20/0x35
> [<790ca1d2>] __lookup_hash+0x2a/0x31
> [<790caa0e>] lookup_slow+0x36/0x8c
> [<790cbdda>] path_lookupat+0xf9/0x5c5
> [<790cc2c5>] filename_lookup+0x1f/0x84
> [<790cde98>] user_path_at_empty+0x3f/0x65
> [<790cdecb>] user_path_at+0xd/0xf
> [<790c6585>] vfs_fstatat+0x40/0x88
> [<790c65f8>] vfs_stat+0x13/0x15
> [<790c67e2>] sys_stat64+0x11/0x22
> [<792b9d04>] syscall_call+0x7/0xb
>
> What does this call trace tell us? That we locked up somewhere in the
> ext3 code. That's all that we know.

Actually, we know some other things:

1. Something should have set the bit and started IO.
2. The callback meant to clear that bit has been executed yet.
3. We want to find out who is responsible for that callback. i.e. We want stack traces of all other threads.

It is possible that the thread handling the callback deadlocked. In that case, backtraces of the other threads should suggest the possibility. Reviewing the backtraces is *much* easier without the added noise of the historical stack frames and in all likelihood, the person doing that review would catch the problem faster when they do reveal the issue. There is also no possibility of it being present in those traces, but missed because of the noise from these frames.

>
> Now lets take a look at a 'verbose' entry of the same lockup, that
> also outputs other entries that 'look like' kernel text addresses:
>
> Call Trace:
> [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
> [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
> [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
> [<790391e8>] ? prepare_to_wait+0x18/0x57
> [<790391e8>] prepare_to_wait+0x18/0x57
> [<7914a320>] ? generic_make_request+0x80/0xb5
> [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
> [<792b7158>] __wait_on_bit+0x20/0x5f
> [<7914a427>] ? submit_bio+0xd2/0xfb
> [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
> [<790e4f30>] ? unmap_underlying_metadata+0x2e/0x2e
> [<79039086>] ? autoremove_wake_function+0x31/0x31
> [<790e4eff>] __wait_on_buffer+0x1b/0x1e
> [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
> [<7911015f>] ext3_iget+0x45/0x346
> [<79114e03>] ext3_lookup+0x97/0xa6
> [<790c9c0e>] lookup_real+0x20/0x35
> [<790ca1d2>] __lookup_hash+0x2a/0x31
> [<790caa0e>] lookup_slow+0x36/0x8c
> [<790cbdda>] path_lookupat+0xf9/0x5c5
> [<790cc2c5>] filename_lookup+0x1f/0x84
> [<790cde98>] user_path_at_empty+0x3f/0x65
> [<790cdecb>] user_path_at+0xd/0xf
> [<790c6585>] vfs_fstatat+0x40/0x88
> [<7903f844>] ? lg_local_unlock+0x31/0x47
> [<790c65f8>] vfs_stat+0x13/0x15
> [<790c67e2>] sys_stat64+0x11/0x22
> [<790c3c47>] ? __fput+0x187/0x1a0
> [<792b9d37>] ? restore_all+0xf/0xf
> [<79165bb4>] ? trace_hardirqs_on_thunk+0xc/0x10
> [<792b9d04>] syscall_call+0x7/0xb
>
> Firstly, it's still easy to tell the true backtrace: ignore all the
> '?' entries.
>
> But the '?' entries also tell us something more: that recently this
> CPU submitted IO. That might or might not be suspected from the
> original trace, but from the 'extended' trace it's really obvious that
> IO activity happened before the lockup as well.
>
> There were many other examples in the past where '?' entries gave us
> clues about the nature of a bug: they represent a poor man's trace of
> what happened recently on that CPU.
>
> So it's useful information for hairy bugs and it would be sad to
> remove them.

It is true that this thread did block IO at some point in the past. It is also possible that block IO operation is related to this stack trace, but the many ways that those frames could get there makes it impossible to conclude that the operation is related. A few ways that occur to me are:

1. This could have gone onto the stack as a result of directory reclaim.
2. A previous syscall by user land could have triggered block IO on something unrelated.
3. If this had been on a kernel thread that has no user land stack like kswapd, then this information can be expected to be there.

Upon seeing the stack trace you provided, I would want to look at all threads on the system under the assumption that some thread started IO and some other thread (or maybe even an interrupt handler) should be responsible for finishing it. I suspect that is a more effective way to get a handle on the problem. If that does not work, then I likely want to attach kgdb or at the very least, start looking at code paths that could start the IO. In such a case, being told that the current thread might have started IO is not really that useful to me. I still need to consider all possible points where this IO could have started.

In this context, having the historical stack frames in the backtrace seems to be a hinderance, rather than a help. They serve to slow down manual inspection of the actual stacks and can obscure useful information that would have been caught had they never been printed. They also can waste valuable screen space and cause the cause of useful information by overflowing scroll back buffers.

Reviewing the historical stack frames under the assumption that they might be useful encourages one to think that the absence of such frames would be useful give that the presence might be useful, but that could not be further than the truth. Having them appear depends on the kernel .config, the toolchain used, the amount of memory pressure (direct reclaim), the call paths taken, etcetera. In short, it is mostly a matter of luck and I find their usefulness when they do appear in backtraces to be very questionable.

In addition, the lack of determines from non-useful information in backtraces makes it more difficult to catch issues with the debug information itself. This is often the case in binary distributions, which often strip kernel modules by passing --strip-uneeded to strip. That creates modules where `file` reports that they are unstripped, but in actuality, they are stripped of all static symbols. In such cases, the stack frames are often garbage, but someone trying to make sense of historical stack frames might not notice because they are used to seeing strange things. In such cases, trying to interpret these frames is definitely a waste of time, but there is nothing (e.g. a taint message?) to inform us of this.

While I believe that the the historical stack frames have been useful in the past for some difficult bugs (when the stars aligned properly) and can still be useful in such instances, I consider the consequences of printing them to outweigh the benefits. In the common case where these frames are not useful, their presence wastes time that could have gone to things that would actually be useful. In the rare cases where they are useful, people relying on them spend their time on one-off solutions. Spending time on many of these one-off solutions is a less efficient use of time than spending time on tools to catch such issues out of necessity.

In both cases, there seems to plenty of time that would be better spent by not printing these frames. Some possibilities that occur to me are better static analysis tools, more widespread adoption of crash dumps and more time spent on dynamic tracing tools. Any one of them should ultimately be beneficial to the overall bug churn rate and consequently, get more eyeballs on difficult to debug problems, such as the rare ones where these historical stack frames are useful.

> Having said that, your complaint that '?' entries can make reading of
> back traces more difficult is valid as well - so maybe we can do
> something about that.
>
> For example if we used more horizontal separation in the output
> format:
>
> Call Trace:
> [<79018d24>] arch_trigger_all_cpu_backtrace+0x3c/0x64
> [<7916a34c>] do_raw_spin_lock+0xb7/0xe8
> [<792b9412>] _raw_spin_lock_irqsave+0x35/0x3e
> [<790391e8>] # ? prepare_to_wait+0x18/0x57
> [<790391e8>] prepare_to_wait+0x18/0x57
> [<7914a320>] # ? generic_make_request+0x80/0xb5
> [<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
> [<792b7158>] __wait_on_bit+0x20/0x5f
> [<7914a427>] # ? submit_bio+0xd2/0xfb
> [<792b71eb>] out_of_line_wait_on_bit+0x54/0x5c
> [<790e4f30>] # ? unmap_underlying_metadata+0x2e/0x2e
> [<79039086>] # ? autoremove_wake_function+0x31/0x31
> [<790e4eff>] __wait_on_buffer+0x1b/0x1e
> [<7910f615>] __ext3_get_inode_loc+0x1ef/0x254
> [<7911015f>] ext3_iget+0x45/0x346
> [<79114e03>] ext3_lookup+0x97/0xa6
> [<790c9c0e>] lookup_real+0x20/0x35
> [<790ca1d2>] __lookup_hash+0x2a/0x31
> [<790caa0e>] lookup_slow+0x36/0x8c
> [<790cbdda>] path_lookupat+0xf9/0x5c5
> [<790cc2c5>] filename_lookup+0x1f/0x84
> [<790cde98>] user_path_at_empty+0x3f/0x65
> [<790cdecb>] user_path_at+0xd/0xf
> [<790c6585>] vfs_fstatat+0x40/0x88
> [<7903f844>] # ? lg_local_unlock+0x31/0x47
> [<790c65f8>] vfs_stat+0x13/0x15
> [<790c67e2>] sys_stat64+0x11/0x22
> [<790c3c47>] # ? __fput+0x187/0x1a0
> [<792b9d37>] # ? restore_all+0xf/0xf
> [<79165bb4>] # ? trace_hardirqs_on_thunk+0xc/0x10
> [<792b9d04>] syscall_call+0x7/0xb
>
> then the information would still be there, it would still be in the
> right place, but it would also be much easier to ignore the right side
> column it visually. (The '# ?' prefix would also make it clear that
> this information is not so important as the left hand side column.)
>
> Or something like that. Assuming this doesn't confuse tools and such.

If we do something like this, it would be best to make it configurable via a sysctl. We could have settings for something like the behavior you proposed, the original behavior we have now and the behavior I would like to see.

That being said, I would like to know if you still feel this information is useful after hearing my thoughts on it.-

2014-04-27 20:08:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Sun, Apr 27, 2014 at 5:08 AM, Ingo Molnar <[email protected]> wrote:
>
> So it's useful information for hairy bugs and it would be sad to
> remove them.

I tend to agree. I've often found the left-overs to be good clues
about what just got called. Although equally often it's another kind
of clue entirely: that the stack frame of some of the functions
involved in the real call frame is much too big, leaving room for that
stale information to lay around.

> Having said that, your complaint that '?' entries can make reading of
> back traces more difficult is valid as well - so maybe we can do
> something about that.

Quite frankly, I'd much rather just remove the annoying hex numbers
that are imnsho *much* more distracting. Possibly even the "/0xsize"
part (although that is at least somewhat useful to judge where in the
function it is).

And while it would be horrible for readability, it might also be a
good idea to replace the newlines with something like " -> " instead,
because we are quite often vertically challenged. But that could
really make things pretty unreadable.

So to take your example, it might be something like this

arch_trigger_all_cpu_backtrace+0x3c -> do_raw_spin_lock+0xb7
-> _raw_spin_lock_irqsave+0x35 -> ? prepare_to_wait+0x18
-> prepare_to_wait+0x18 -> ? generic_make_request+0x80
-> ? unmap_underlying_metadata+0x2e -> __wait_on_bit+0x20
-> ? submit_bio+0xd2 -> out_of_line_wait_on_bit+0x54
-> ? unmap_underlying_metadata+0x2e -> ? autoremove_wake_function+0x31
-> __wait_on_buffer+0x1b -> __ext3_get_inode_loc+0x1ef -> ext3_iget+0x45
-> ext3_lookup+0x97 -> lookup_real+0x20 -> __lookup_hash+0x2a
-> lookup_slow+0x36 -> path_lookupat+0xf9 -> filename_lookup+0x1f
-> user_path_at_empty+0x3f -> user_path_at+0xd -> vfs_fstatat+0x40
-> ? lg_local_unlock+0x31 -> vfs_stat+0x13 -> sys_stat64+0x11
-> ? __fput+0x187 -> ? restore_all+0xf -> ? trace_hardirqs_on_thunk+0xc
-> syscall_call+0x7

which is admittedly complete line noise, but is just 13 lines rather
than 31. That can sometimes be a really big deal.

Also, we might want to cap the number of lines regardless. It is true
that sometimes the really deep call chains can be interesting, but
equally often they make other important stuff scroll off the screen
(oopses that don't get caught in /sys/log/messages because they kill
the machine are the worst to debug, and we still end up having people
send pictures taken with digital cameras of them), so it's a "win
some, lose some" kind of thing.

Of course, the questionable stale entries on the stack can (and do)
make the whole "scroll off the screen" thing worse. So I dunno.

Linus

2014-04-27 20:36:51

by Richard Yao

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Apr 27, 2014, at 4:08 PM, Linus Torvalds <[email protected]> wrote:

> On Sun, Apr 27, 2014 at 5:08 AM, Ingo Molnar <[email protected]> wrote:
>>
>> So it's useful information for hairy bugs and it would be sad to
>> remove them.
>
> I tend to agree. I've often found the left-overs to be good clues
> about what just got called. Although equally often it's another kind
> of clue entirely: that the stack frame of some of the functions
> involved in the real call frame is much too big, leaving room for that
> stale information to lay around.

Getting information about the frames being too big is a useful side effect. I don?t think the omission of historical frames from the stack traces needs to eliminate that. In particular, we could print the stack depth, frame size and number of possible pointers to kernel code spotted in the frame alongside the stack frames that we obtain from unwinding the stack.

>> Having said that, your complaint that '?' entries can make reading of
>> back traces more difficult is valid as well - so maybe we can do
>> something about that.
>
> Quite frankly, I'd much rather just remove the annoying hex numbers
> that are imnsho *much* more distracting. Possibly even the "/0xsize"
> part (although that is at least somewhat useful to judge where in the
> function it is).
>
> And while it would be horrible for readability, it might also be a
> good idea to replace the newlines with something like " -> " instead,
> because we are quite often vertically challenged. But that could
> really make things pretty unreadable.
>
> So to take your example, it might be something like this
>
> arch_trigger_all_cpu_backtrace+0x3c -> do_raw_spin_lock+0xb7
> -> _raw_spin_lock_irqsave+0x35 -> ? prepare_to_wait+0x18
> -> prepare_to_wait+0x18 -> ? generic_make_request+0x80
> -> ? unmap_underlying_metadata+0x2e -> __wait_on_bit+0x20
> -> ? submit_bio+0xd2 -> out_of_line_wait_on_bit+0x54
> -> ? unmap_underlying_metadata+0x2e -> ? autoremove_wake_function+0x31
> -> __wait_on_buffer+0x1b -> __ext3_get_inode_loc+0x1ef -> ext3_iget+0x45
> -> ext3_lookup+0x97 -> lookup_real+0x20 -> __lookup_hash+0x2a
> -> lookup_slow+0x36 -> path_lookupat+0xf9 -> filename_lookup+0x1f
> -> user_path_at_empty+0x3f -> user_path_at+0xd -> vfs_fstatat+0x40
> -> ? lg_local_unlock+0x31 -> vfs_stat+0x13 -> sys_stat64+0x11
> -> ? __fput+0x187 -> ? restore_all+0xf -> ? trace_hardirqs_on_thunk+0xc
> -> syscall_call+0x7
>
> which is admittedly complete line noise, but is just 13 lines rather
> than 31. That can sometimes be a really big deal.
>
> Also, we might want to cap the number of lines regardless. It is true
> that sometimes the really deep call chains can be interesting, but
> equally often they make other important stuff scroll off the screen
> (oopses that don't get caught in /sys/log/messages because they kill
> the machine are the worst to debug, and we still end up having people
> send pictures taken with digital cameras of them), so it's a "win
> some, lose some" kind of thing.
>
> Of course, the questionable stale entries on the stack can (and do)
> make the whole "scroll off the screen" thing worse. So I dunno.
>
> Linus

I suppose one option would be to write a patch to make this configurable via sysctl, throw it into staging so only kernel developers have access to it (I hope) and then try out each of these variations on our own machines to see which one we like best. The patch I submitted to the list is one that I am using myself and I have come to really like having only stack frames described by frame pointers printed. I think others who try more condensed stacks will come to like them too.

If you are interested in doing that experiment, I could put a patch together to do it.-

2014-04-30 21:56:14

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Sun, Apr 27, 2014 at 02:08:20PM +0200, Ingo Molnar wrote:
>
> * Richard Yao <[email protected]> wrote:
>
> > Stack traces are generated by scanning the stack and interpeting
> > anything that looks like it could be a pointer to something. We do
> > not need to do this when we have frame pointers, but we do it
> > anyway, with the distinction that we use the return pointers to mark
> > actual frames by the absence of a question mark.
> >
> > The additional verbosity of stack scanning tends to bombard us with
> > walls of text for no gain in practice, so lets switch to printing
> > only stack frames when frame pointers are available. That we can
> > spend less time reading stack traces and more time looking at code.
> >
> > Signed-off-by: Richard Yao <[email protected]>
> > ---
> > arch/x86/kernel/dumpstack.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> > index d9c12d3..94ffe06 100644
> > --- a/arch/x86/kernel/dumpstack.c
> > +++ b/arch/x86/kernel/dumpstack.c
> > @@ -162,7 +162,11 @@ static void print_trace_address(void *data, unsigned long addr, int reliable)
> > static const struct stacktrace_ops print_trace_ops = {
> > .stack = print_trace_stack,
> > .address = print_trace_address,
> > +#ifdef CONFIG_FRAME_POINTER
> > + .walk_stack = print_context_stack_bp,
> > +#else
> > .walk_stack = print_context_stack,
> > +#endif
> > };

Besides the complementary informations brought by the full stack walk,
another big argument toward keeping full stack walk is that if your frame
pointer is screwed for whatever reason, you still have a useful stack trace.

I have seen and fixed several broken frame links in x86-64 by the past. Those
are very subtle and often hardly visible issues because, if they are easily spotted
on common frame scenarios like : task > irq, they are much harder to find on trickier,
rarer frame scenarios such as: task -> softirq -> irq -> nmi -> debug exception ->....

For example before a2bbe75089d5eb9a3a46d50dd5c215e213790288
("x86: Don't use frame pointer to save old stack on irq entry"), we were missing
entire stack frames on nesting irqs (hardirq on softirqs) while using pure frame
pointer based unwinding.

Who knows if we have other remaining issues like this? Especially given the high
possible number of frame combinations between task, irq, softirq, nmi and exceptions.
Multiply the contexts possibility by the number of possible archs out there and their
stack switch implementations.

Also further frame links breakages, we have many other possibilities to end up
with misleading frame pointers. Relying on that source alone definetly reduce the
reliability of our stacktraces.

So this goes way beyond just missing complementary informations. Debugging robustness
itself is actually very concerned here if we remove the full stack walk.

2014-06-06 08:17:52

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Wed, May 07, 2014 at 06:40:14PM +0200, Ingo Molnar wrote:
> Agreed, that's a very good point.
>
> Also, consider the following holistic argument, what is easier to
> achieve, when looking at an oops and not seeing the bug:
>
> - if only I had more information
> - if only I had less information
>
> we cannot put in information that we cut out, but it's not
> particularly hard to skip overly verbose information in most cases.
>
> Yes, there's a line to be drawn with verbosity: scroll-off is a
> concern when the oops does not make it to a log file.
>
> So I don't really know.

So it really stinks its so hard to debug laptops, and it doesn't help
the ehci debug dongle costs an arm and a leg.

Hopefully the EFI pstore thing will work out, but I've already heard
that can wear out very quickly and render your machine a brick :-(

Me, I simply refuse to work on machines if they don't have a serial
line, but in general that's not a tenable position I suppose :/


Attachments:
(No filename) (983.00 B)
(No filename) (836.00 B)
Download all attachments

2014-06-06 08:24:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Fri, Jun 06, 2014 at 10:17:26AM +0200, Peter Zijlstra wrote:
> Hopefully the EFI pstore thing will work out, but I've already heard
> that can wear out very quickly and render your machine a brick :-(

And not only that - write speed is awfully low. I hear you might be ok
to record an oops or so. Continuous debug output, not so much.

> Me, I simply refuse to work on machines if they don't have a serial
> line, but in general that's not a tenable position I suppose :/

Well, with all the other crap they put in modern machines, how hard it
is to put a goddam stupid serial line out everywhere...? WTF do I need a
goddam adaptive keyboard and other idiotic bling-bling if I can't have a
simple serial line for debugging... Blergh.

2014-06-06 09:36:22

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

On Fri, Jun 06, 2014 at 10:24:20AM +0200, Borislav Petkov wrote:
> Well, with all the other crap they put in modern machines, how hard it
> is to put a goddam stupid serial line out everywhere...? WTF do I need a
> goddam adaptive keyboard and other idiotic bling-bling if I can't have a
> simple serial line for debugging... Blergh.

This DE-9 is not esthetically pleasing :-) They could of course use a
different connector, its only 5 wires or so iirc.



Attachments:
(No filename) (457.00 B)
(No filename) (836.00 B)
Download all attachments

2014-06-07 03:09:51

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

Yes, but now you need a custom adapter. Some have used a phono plug for the absolute minimum 3 wires. However, no standards...

Sent from my tablet, pardon any formatting problems.

> On Jun 6, 2014, at 2:35, Peter Zijlstra <[email protected]> wrote:
>
>> On Fri, Jun 06, 2014 at 10:24:20AM +0200, Borislav Petkov wrote:
>> Well, with all the other crap they put in modern machines, how hard it
>> is to put a goddam stupid serial line out everywhere...? WTF do I need a
>> goddam adaptive keyboard and other idiotic bling-bling if I can't have a
>> simple serial line for debugging... Blergh.
>
> This DE-9 is not esthetically pleasing :-) They could of course use a
> different connector, its only 5 wires or so iirc.
>
>

2014-06-07 03:10:52

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/dumpstack: Walk frames when built with frame pointers

The USB debug port is nice once you get it to work. Assumes they haven't done anything clever like hide the special port behind a hub...

Sent from my tablet, pardon any formatting problems.

> On Jun 6, 2014, at 1:24, Borislav Petkov <[email protected]> wrote:
>
>> On Fri, Jun 06, 2014 at 10:17:26AM +0200, Peter Zijlstra wrote:
>> Hopefully the EFI pstore thing will work out, but I've already heard
>> that can wear out very quickly and render your machine a brick :-(
>
> And not only that - write speed is awfully low. I hear you might be ok
> to record an oops or so. Continuous debug output, not so much.
>
>> Me, I simply refuse to work on machines if they don't have a serial
>> line, but in general that's not a tenable position I suppose :/
>
> Well, with all the other crap they put in modern machines, how hard it
> is to put a goddam stupid serial line out everywhere...? WTF do I need a
> goddam adaptive keyboard and other idiotic bling-bling if I can't have a
> simple serial line for debugging... Blergh.