2023-03-21 06:29:26

by Tiezhu Yang

Subject: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

We can see the following messages with CONFIG_PROVE_LOCKING=y on
LoongArch:

  BUG: MAX_STACK_TRACE_ENTRIES too low!
  turning off the locking correctness validator.

This is because stack_trace_save() returns a large value after calling
arch_stack_walk(). Here is the call trace:

  save_trace()
    stack_trace_save()
      arch_stack_walk()
        stack_trace_consume_entry()

arch_stack_walk() should return immediately if unwind_next_frame()
fails. There is no need to keep looping, which only increases the value
of c->len in stack_trace_consume_entry(). Returning early fixes the
problem above.

Reported-by: Guenter Roeck <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Signed-off-by: Tiezhu Yang <[email protected]>
---
 arch/loongarch/kernel/stacktrace.c      | 3 ++-
 arch/loongarch/kernel/unwind.c          | 1 +
 arch/loongarch/kernel/unwind_prologue.c | 4 +++-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
index 3a690f9..7c15ba5 100644
--- a/arch/loongarch/kernel/stacktrace.c
+++ b/arch/loongarch/kernel/stacktrace.c
@@ -30,7 +30,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,

        regs->regs[1] = 0;
        for (unwind_start(&state, task, regs);
-             !unwind_done(&state); unwind_next_frame(&state)) {
+            !unwind_done(&state) && !unwind_error(&state);
+            unwind_next_frame(&state)) {
                addr = unwind_get_return_address(&state);
                if (!addr || !consume_entry(cookie, addr))
                        break;
diff --git a/arch/loongarch/kernel/unwind.c b/arch/loongarch/kernel/unwind.c
index a463d69..ba324ba 100644
--- a/arch/loongarch/kernel/unwind.c
+++ b/arch/loongarch/kernel/unwind.c
@@ -28,5 +28,6 @@ bool default_next_frame(struct unwind_state *state)

        } while (!get_stack_info(state->sp, state->task, info));

+       state->error = true;
        return false;
 }
diff --git a/arch/loongarch/kernel/unwind_prologue.c b/arch/loongarch/kernel/unwind_prologue.c
index 9095fde..55afc27 100644
--- a/arch/loongarch/kernel/unwind_prologue.c
+++ b/arch/loongarch/kernel/unwind_prologue.c
@@ -211,7 +211,7 @@ static bool next_frame(struct unwind_state *state)
                        pc = regs->csr_era;

                        if (user_mode(regs) || !__kernel_text_address(pc))
-                               return false;
+                               goto out;

                        state->first = true;
                        state->pc = pc;
@@ -226,6 +226,8 @@ static bool next_frame(struct unwind_state *state)

        } while (!get_stack_info(state->sp, state->task, info));

+out:
+       state->error = true;
        return false;
 }

--
2.1.0
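
For readers following the thread, here is a minimal userspace sketch of the failure mode the commit message describes. It is not the kernel code: every `model_*` name is hypothetical, and it assumes that a failed unwind step leaves the state neither done nor advanced, so a loop that ignores the error keeps feeding the same stale address to the consumer until the buffer is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the unwinder state. */
struct model_state {
	unsigned long pc;   /* current return address */
	int frames_left;    /* valid frames remaining beyond the first one */
	bool done;          /* would be set on a clean end of stack (never here) */
	bool error;         /* set when the unwinder gives up */
};

/* Hypothetical stand-in for the cookie used by the trace consumer. */
struct model_cookie {
	unsigned long *store;
	size_t size;
	size_t len;
};

/* Same shape as a "store until full" consumer: false means stop walking. */
static bool model_consume(struct model_cookie *c, unsigned long addr)
{
	assert(c->store != NULL);
	if (c->len >= c->size)
		return false;
	c->store[c->len++] = addr;
	return c->len < c->size;
}

/*
 * Assumed failure mode: once the valid frames run out, pc stays stale and
 * done stays false. The error flag is only raised when set_error is true,
 * which models the behaviour the patch adds.
 */
static bool model_next_frame(struct model_state *s, bool set_error)
{
	if (s->frames_left > 0) {
		s->frames_left--;
		s->pc += 4;
		return true;
	}
	if (set_error)
		s->error = true;
	return false;
}

/* Walk the modelled stack; check_error selects the patched loop condition. */
static size_t model_walk(int extra_frames, unsigned long *buf, size_t size,
			 bool check_error)
{
	struct model_state s = { .pc = 0x1000, .frames_left = extra_frames };
	struct model_cookie c = { .store = buf, .size = size, .len = 0 };

	for (; !s.done && !(check_error && s.error);
	     model_next_frame(&s, check_error)) {
		if (!s.pc || !model_consume(&c, s.pc))
			break;
	}
	return c.len;
}
```

In this model, with three frames beyond the starting one and a 64-entry buffer, `model_walk(3, buf, 64, false)` returns 64 because the buffer fills up with stale entries, while `model_walk(3, buf, 64, true)` returns 4.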



2023-03-21 12:35:42

by Xi Ruoyao

Subject: Re: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> We can see the following messages with CONFIG_PROVE_LOCKING=y on
> LoongArch:
>
>   BUG: MAX_STACK_TRACE_ENTRIES too low!
>   turning off the locking correctness validator.
>
> This is because stack_trace_save() returns a big value after call
> arch_stack_walk(), here is the call trace:
>
>   save_trace()
>     stack_trace_save()
>       arch_stack_walk()
>         stack_trace_consume_entry()
>
> arch_stack_walk() should return immediately if unwind_next_frame()
> failed, no need to do the useless loops to increase the value of
> c->len in stack_trace_consume_entry(), then we can fix the above
> problem.
>
> Reported-by: Guenter Roeck <[email protected]>
> Link: https://lore.kernel.org/all/[email protected]/
> Signed-off-by: Tiezhu Yang <[email protected]>

The fix makes sense, but I'm asking the same question again (sorry if
it's noisy): should we Cc [email protected] and/or make a PR for
6.3?

To me, bug fixes should be backported into all stable branches affected
by the bug, unless there is some serious difficulty. As the 6.3 release
will work on already-launched 3A5000 boards out of the box, people may
want to stop staying on the leading edge and use an LTS/stable release
series. We can't just say (or behave like) "we don't backport, please use
latest mainline" IMO :).

> ---
>  arch/loongarch/kernel/stacktrace.c      | 3 ++-
>  arch/loongarch/kernel/unwind.c          | 1 +
>  arch/loongarch/kernel/unwind_prologue.c | 4 +++-
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/loongarch/kernel/stacktrace.c b/arch/loongarch/kernel/stacktrace.c
> index 3a690f9..7c15ba5 100644
> --- a/arch/loongarch/kernel/stacktrace.c
> +++ b/arch/loongarch/kernel/stacktrace.c
> @@ -30,7 +30,8 @@ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie,
>  
>         regs->regs[1] = 0;
>         for (unwind_start(&state, task, regs);
> -             !unwind_done(&state); unwind_next_frame(&state)) {
> +            !unwind_done(&state) && !unwind_error(&state);
> +            unwind_next_frame(&state)) {
>                 addr = unwind_get_return_address(&state);
>                 if (!addr || !consume_entry(cookie, addr))
>                         break;
> diff --git a/arch/loongarch/kernel/unwind.c b/arch/loongarch/kernel/unwind.c
> index a463d69..ba324ba 100644
> --- a/arch/loongarch/kernel/unwind.c
> +++ b/arch/loongarch/kernel/unwind.c
> @@ -28,5 +28,6 @@ bool default_next_frame(struct unwind_state *state)
>  
>         } while (!get_stack_info(state->sp, state->task, info));
>  
> +       state->error = true;
>         return false;
>  }
> diff --git a/arch/loongarch/kernel/unwind_prologue.c b/arch/loongarch/kernel/unwind_prologue.c
> index 9095fde..55afc27 100644
> --- a/arch/loongarch/kernel/unwind_prologue.c
> +++ b/arch/loongarch/kernel/unwind_prologue.c
> @@ -211,7 +211,7 @@ static bool next_frame(struct unwind_state *state)
>                         pc = regs->csr_era;
>  
>                         if (user_mode(regs) || !__kernel_text_address(pc))
> -                               return false;
> +                               goto out;
>  
>                         state->first = true;
>                         state->pc = pc;
> @@ -226,6 +226,8 @@ static bool next_frame(struct unwind_state *state)
>  
>         } while (!get_stack_info(state->sp, state->task, info));
>  
> +out:
> +       state->error = true;
>         return false;
>  }
>  

--
Xi Ruoyao <[email protected]>
School of Aerospace Science and Technology, Xidian University

2023-03-21 14:26:40

by Guenter Roeck

Subject: Re: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> > We can see the following messages with CONFIG_PROVE_LOCKING=y on
> > LoongArch:
> >
> >   BUG: MAX_STACK_TRACE_ENTRIES too low!
> >   turning off the locking correctness validator.
> >
> > This is because stack_trace_save() returns a big value after call
> > arch_stack_walk(), here is the call trace:
> >
> >   save_trace()
> >     stack_trace_save()
> >       arch_stack_walk()
> >         stack_trace_consume_entry()
> >
> > arch_stack_walk() should return immediately if unwind_next_frame()
> > failed, no need to do the useless loops to increase the value of
> > c->len in stack_trace_consume_entry(), then we can fix the above
> > problem.
> >
> > Reported-by: Guenter Roeck <[email protected]>
> > Link: https://lore.kernel.org/all/[email protected]/
> > Signed-off-by: Tiezhu Yang <[email protected]>
>
> The fix makes sense, but I'm asking the same question again (sorry if
> it's noisy): should we Cc [email protected] and/or make a PR for
> 6.3?
>
> To me a bug fixes should be backported into all stable branches affected
> by the bug, unless there is some serious difficulty. As 6.3 release
> will work on launched 3A5000 boards out-of-box, people may want to stop
> staying on the leading edge and use a LTS/stable release series. We
> can't just say (or behave like) "we don't backport, please use latest
> mainline" IMO :).

It is a bug fix, isn't it? It should be backported to v6.1+. Otherwise,
if your policy is to not backport bug fixes, I might as well stop testing
loongarch on all but the most recent kernel branch. Let me know if this is
what you want. If so, I think you should let all other regression testers
know that they should only test loongarch on mainline and possibly on
linux-next.

Thanks,
Guenter

2023-03-22 00:51:58

by Huacai Chen

Subject: Re: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

On Tue, Mar 21, 2023 at 10:25 PM Guenter Roeck <[email protected]> wrote:
>
> On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> > On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> > > We can see the following messages with CONFIG_PROVE_LOCKING=y on
> > > LoongArch:
> > >
> > > >   BUG: MAX_STACK_TRACE_ENTRIES too low!
> > > >   turning off the locking correctness validator.
> > >
> > > This is because stack_trace_save() returns a big value after call
> > > arch_stack_walk(), here is the call trace:
> > >
> > > >   save_trace()
> > > >     stack_trace_save()
> > > >       arch_stack_walk()
> > > >         stack_trace_consume_entry()
> > >
> > > arch_stack_walk() should return immediately if unwind_next_frame()
> > > failed, no need to do the useless loops to increase the value of
> > > c->len in stack_trace_consume_entry(), then we can fix the above
> > > problem.
> > >
> > > Reported-by: Guenter Roeck <[email protected]>
> > > Link: https://lore.kernel.org/all/[email protected]/
> > > Signed-off-by: Tiezhu Yang <[email protected]>
> >
> > The fix makes sense, but I'm asking the same question again (sorry if
> > it's noisy): should we Cc [email protected] and/or make a PR for
> > 6.3?
> >
> > To me a bug fixes should be backported into all stable branches affected
> > by the bug, unless there is some serious difficulty. As 6.3 release
> > will work on launched 3A5000 boards out-of-box, people may want to stop
> > staying on the leading edge and use a LTS/stable release series. We
> > can't just say (or behave like) "we don't backport, please use latest
> > mainline" IMO :).
>
> It is a bug fix, isn't it ? It should be backported to v6.1+. Otherwise,
> if your policy is to not backport bug fixes, I might as well stop testing
> loongarch on all but the most recent kernel branch. Let me know if this is
> what you want. If so, I think you should let all other regression testers
> know that they should only test loongarch on mainline and possibly on
> linux-next.
This is of course a bug fix, but should Tiezhu resend this patch? Or is
just replying to this message with Cc: [email protected] enough?

Huacai
>
> Thanks,
> Guenter

2023-03-22 02:26:25

by Guenter Roeck

Subject: Re: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

On Wed, Mar 22, 2023 at 08:50:07AM +0800, Huacai Chen wrote:
> On Tue, Mar 21, 2023 at 10:25 PM Guenter Roeck <[email protected]> wrote:
> >
> > On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> > > On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> > > > We can see the following messages with CONFIG_PROVE_LOCKING=y on
> > > > LoongArch:
> > > >
> > > > >   BUG: MAX_STACK_TRACE_ENTRIES too low!
> > > > >   turning off the locking correctness validator.
> > > >
> > > > This is because stack_trace_save() returns a big value after call
> > > > arch_stack_walk(), here is the call trace:
> > > >
> > > > >   save_trace()
> > > > >     stack_trace_save()
> > > > >       arch_stack_walk()
> > > > >         stack_trace_consume_entry()
> > > >
> > > > arch_stack_walk() should return immediately if unwind_next_frame()
> > > > failed, no need to do the useless loops to increase the value of
> > > > c->len in stack_trace_consume_entry(), then we can fix the above
> > > > problem.
> > > >
> > > > Reported-by: Guenter Roeck <[email protected]>
> > > > Link: https://lore.kernel.org/all/[email protected]/
> > > > Signed-off-by: Tiezhu Yang <[email protected]>
> > >
> > > The fix makes sense, but I'm asking the same question again (sorry if
> > > it's noisy): should we Cc [email protected] and/or make a PR for
> > > 6.3?
> > >
> > > To me a bug fixes should be backported into all stable branches affected
> > > by the bug, unless there is some serious difficulty. As 6.3 release
> > > will work on launched 3A5000 boards out-of-box, people may want to stop
> > > staying on the leading edge and use a LTS/stable release series. We
> > > can't just say (or behave like) "we don't backport, please use latest
> > > mainline" IMO :).
> >
> > It is a bug fix, isn't it ? It should be backported to v6.1+. Otherwise,
> > if your policy is to not backport bug fixes, I might as well stop testing
> > loongarch on all but the most recent kernel branch. Let me know if this is
> > what you want. If so, I think you should let all other regression testers
> > know that they should only test loongarch on mainline and possibly on
> > linux-next.
> This is of course a bug fix, but should Tiezhu resend this patch? Or
> just replying to this message with CC [email protected] is
> enough?
>

Normally the maintainer, before sending a pull request to Linus, would add
"Cc: [email protected]" to the patch. Actually sending the patch to
the stable@ mailing list is only necessary if it was applied to the
upstream kernel without "Cc: stable@" in the commit message.

Guenter

2023-03-23 01:36:50

by Huacai Chen

Subject: Re: [PATCH] LoongArch: Check unwind_error() in arch_stack_walk()

OK, thanks.

Huacai

On Wed, Mar 22, 2023 at 10:20 AM Guenter Roeck <[email protected]> wrote:
>
> On Wed, Mar 22, 2023 at 08:50:07AM +0800, Huacai Chen wrote:
> > On Tue, Mar 21, 2023 at 10:25 PM Guenter Roeck <[email protected]> wrote:
> > >
> > > On Tue, Mar 21, 2023 at 08:35:34PM +0800, Xi Ruoyao wrote:
> > > > On Tue, 2023-03-21 at 14:29 +0800, Tiezhu Yang wrote:
> > > > > We can see the following messages with CONFIG_PROVE_LOCKING=y on
> > > > > LoongArch:
> > > > >
> > > > > >   BUG: MAX_STACK_TRACE_ENTRIES too low!
> > > > > >   turning off the locking correctness validator.
> > > > >
> > > > > This is because stack_trace_save() returns a big value after call
> > > > > arch_stack_walk(), here is the call trace:
> > > > >
> > > > > >   save_trace()
> > > > > >     stack_trace_save()
> > > > > >       arch_stack_walk()
> > > > > >         stack_trace_consume_entry()
> > > > >
> > > > > arch_stack_walk() should return immediately if unwind_next_frame()
> > > > > failed, no need to do the useless loops to increase the value of
> > > > > c->len in stack_trace_consume_entry(), then we can fix the above
> > > > > problem.
> > > > >
> > > > > Reported-by: Guenter Roeck <[email protected]>
> > > > > Link: https://lore.kernel.org/all/[email protected]/
> > > > > Signed-off-by: Tiezhu Yang <[email protected]>
> > > >
> > > > The fix makes sense, but I'm asking the same question again (sorry if
> > > > it's noisy): should we Cc [email protected] and/or make a PR for
> > > > 6.3?
> > > >
> > > > To me a bug fixes should be backported into all stable branches affected
> > > > by the bug, unless there is some serious difficulty. As 6.3 release
> > > > will work on launched 3A5000 boards out-of-box, people may want to stop
> > > > staying on the leading edge and use a LTS/stable release series. We
> > > > can't just say (or behave like) "we don't backport, please use latest
> > > > mainline" IMO :).
> > >
> > > It is a bug fix, isn't it ? It should be backported to v6.1+. Otherwise,
> > > if your policy is to not backport bug fixes, I might as well stop testing
> > > loongarch on all but the most recent kernel branch. Let me know if this is
> > > what you want. If so, I think you should let all other regression testers
> > > know that they should only test loongarch on mainline and possibly on
> > > linux-next.
> > This is of course a bug fix, but should Tiezhu resend this patch? Or
> > just replying to this message with CC [email protected] is
> > enough?
> >
>
> Normally the maintainer, before sending a pull request to Linus, would add
> "Cc: [email protected]" to the patch. Actually sending the patch to
> the stable@ mailing list is only necessary if it was applied to the
> upstream kernel without Cc: stable@ in the commit message.
>
> Guenter