2022-07-12 02:20:56

by Li Huafei

[permalink] [raw]
Subject: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks

The current ARM implementation of save_stack_trace_tsk() does not allow
saving stack trace for non-current tasks, which may limit the scenarios
in which stack_trace_save_tsk() can be used. Like other architectures,
or like ARM's unwind_backtrace(), we can leave it up to the caller to
ensure that the task that needs to be unwound is not running.

Signed-off-by: Li Huafei <[email protected]>
---
arch/arm/kernel/stacktrace.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/arm/kernel/stacktrace.c b/arch/arm/kernel/stacktrace.c
index 3acf51ee46bb..836420c00938 100644
--- a/arch/arm/kernel/stacktrace.c
+++ b/arch/arm/kernel/stacktrace.c
@@ -171,19 +171,11 @@ static noinline void __save_stack_trace(struct task_struct *tsk,
data.no_sched_functions = nosched;

if (tsk != current) {
-#ifdef CONFIG_SMP
- /*
- * What guarantees do we have here that 'tsk' is not
- * running on another CPU? For now, ignore it as we
- * can't guarantee we won't explode.
- */
- return;
-#else
+ /* task blocked in __switch_to */
frame.fp = thread_saved_fp(tsk);
frame.sp = thread_saved_sp(tsk);
frame.lr = 0; /* recovered from the stack */
frame.pc = thread_saved_pc(tsk);
-#endif
} else {
/* We don't want this function nor the caller */
data.skip += 2;
--
2.17.1


2022-07-18 09:15:32

by Linus Walleij

[permalink] [raw]
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks

On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <[email protected]> wrote:

> The current ARM implementation of save_stack_trace_tsk() does not allow
> saving stack trace for non-current tasks, which may limit the scenarios
> in which stack_trace_save_tsk() can be used. Like other architectures,
> or like ARM's unwind_backtrace(), we can leave it up to the caller to
> ensure that the task that needs to be unwound is not running.
>
> Signed-off-by: Li Huafei <[email protected]>

That sounds good, but:

> if (tsk != current) {
> -#ifdef CONFIG_SMP
> - /*
> - * What guarantees do we have here that 'tsk' is not
> - * running on another CPU? For now, ignore it as we
> - * can't guarantee we won't explode.
> - */
> - return;
> -#else
> + /* task blocked in __switch_to */

The commit text is not consistent with the comment you are removing.

The commit is talking about "non-current" tasks which is one thing,
but the code is avoiding any tasks under SMP because they may be
running on another CPU. So you need to update the commit
message to say something like "non-current or running on another CPU".

If this condition will be checked at call sites in following patches,
then mention
that in the commit as well, so we know the end result is that we do
not break it,

I think Russell want to check this commit as well,

Yours,
Linus Walleij

2022-07-26 09:38:13

by Li Huafei

[permalink] [raw]
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks



On 2022/7/18 17:07, Linus Walleij wrote:
> On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <[email protected]> wrote:
>
>> The current ARM implementation of save_stack_trace_tsk() does not allow
>> saving stack trace for non-current tasks, which may limit the scenarios
>> in which stack_trace_save_tsk() can be used. Like other architectures,
>> or like ARM's unwind_backtrace(), we can leave it up to the caller to
>> ensure that the task that needs to be unwound is not running.
>>
>> Signed-off-by: Li Huafei <[email protected]>
>
> That sounds good, but:
>
>> if (tsk != current) {
>> -#ifdef CONFIG_SMP
>> - /*
>> - * What guarantees do we have here that 'tsk' is not
>> - * running on another CPU? For now, ignore it as we
>> - * can't guarantee we won't explode.
>> - */
>> - return;
>> -#else
>> + /* task blocked in __switch_to */
>
> The commit text is not consistent with the comment you are removing.
>
> The commit is talking about "non-current" tasks which is one thing,
> but the code is avoiding any tasks under SMP because they may be
> running on another CPU. So you need to update the commit
> message to say something like "non-current or running on another CPU".
>
> If this condition will be checked at call sites in following patches,
> then mention
> that in the commit as well, so we know the end result is that we do
> not break it,

The generic code stack_trace_save_tsk() does not have this check, and by
'caller' I mean the caller of stack_trace_save_tsk(), expecting the
'caller' to ensure that the task is not running. So in effect this check
has been dropped and there is no more guarantee. Sorry for not
clarifying the change here.

But can we assume that the user should know that the stacktrace is
unreliable for a task that is running on another CPU? If not, I should
remove this patch and keep the check.

Thanks,
Huafei
>
> I think Russell want to check this commit as well,
>
> Yours,
> Linus Walleij
> .
>

2022-07-26 10:00:48

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks

On Tue, Jul 26, 2022 at 05:12:39PM +0800, Li Huafei wrote:
>
>
> On 2022/7/18 17:07, Linus Walleij wrote:
> > On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <[email protected]> wrote:
> >
> >> The current ARM implementation of save_stack_trace_tsk() does not allow
> >> saving stack trace for non-current tasks, which may limit the scenarios
> >> in which stack_trace_save_tsk() can be used. Like other architectures,
> >> or like ARM's unwind_backtrace(), we can leave it up to the caller to
> >> ensure that the task that needs to be unwound is not running.
> >>
> >> Signed-off-by: Li Huafei <[email protected]>
> >
> > That sounds good, but:
> >
> >> if (tsk != current) {
> >> -#ifdef CONFIG_SMP
> >> - /*
> >> - * What guarantees do we have here that 'tsk' is not
> >> - * running on another CPU? For now, ignore it as we
> >> - * can't guarantee we won't explode.
> >> - */
> >> - return;
> >> -#else
> >> + /* task blocked in __switch_to */
> >
> > The commit text is not consistent with the comment you are removing.
> >
> > The commit is talking about "non-current" tasks which is one thing,
> > but the code is avoiding any tasks under SMP because they may be
> > running on another CPU. So you need to update the commit
> > message to say something like "non-current or running on another CPU".
> >
> > If this condition will be checked at call sites in following patches,
> > then mention
> > that in the commit as well, so we know the end result is that we do
> > not break it,
>
> The generic code stack_trace_save_tsk() does not have this check, and by
> 'caller' I mean the caller of stack_trace_save_tsk(), expecting the
> 'caller' to ensure that the task is not running. So in effect this check
> has been dropped and there is no more guarantee. Sorry for not
> clarifying the change here.

Can you prove in every case that the thread we're being asked to unwind
is not running? I don't think you can.

There are things like proc_pid_stack() in procfs and the stack traces
in sysrq-t which have attempted to unwind everything whether it's
running or not.

So no, there is no guarantee that the thread is blocked in
__switch_to().

> But can we assume that the user should know that the stacktrace is
> unreliable for a task that is running on another CPU? If not, I should
> remove this patch and keep the check.

It's not about "unreliable" stack traces, it's about the unwinder
killing the kernel.

The hint is this:

frame.fp = thread_saved_fp(tsk);
frame.sp = thread_saved_sp(tsk);
frame.lr = 0; /* recovered from the stack */
frame.pc = thread_saved_pc(tsk);

These access the context saved by the scheduler when the task is
sleeping. When the thread is running, these saved values will be
the state when the thread last slept. However, with the thread
running, the stack could now contain any data what so ever, and
could change at any moment.

Whether the unwind-table unwinder is truely safe in such a
situation is unknown - we try to ensure that it won't do anything
stupid, but proving that is a hard task, and we've recently had
issues with the unwinder even without that.

So, allowing this feels like we're opening the door to DoS attacks
from userspace, where userspace sits there reading /proc/*/stack of
some thread running on a different CPU waiting for the kernel to
oops itself, possibly holding a lock, resulting in the system
dying.

These decisions need to be made by architecture code not generic
code, particularly where the method of unwinding is architecture
specific and thus may have criteria defining when its safe to do so.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-07-26 12:26:17

by Li Huafei

[permalink] [raw]
Subject: Re: [PATCH 3/5] ARM: stacktrace: Allow stack trace saving for non-current tasks



On 2022/7/26 17:49, Russell King (Oracle) wrote:
> On Tue, Jul 26, 2022 at 05:12:39PM +0800, Li Huafei wrote:
>>
>>
>> On 2022/7/18 17:07, Linus Walleij wrote:
>>> On Tue, Jul 12, 2022 at 4:18 AM Li Huafei <[email protected]> wrote:
>>>
>>>> The current ARM implementation of save_stack_trace_tsk() does not allow
>>>> saving stack trace for non-current tasks, which may limit the scenarios
>>>> in which stack_trace_save_tsk() can be used. Like other architectures,
>>>> or like ARM's unwind_backtrace(), we can leave it up to the caller to
>>>> ensure that the task that needs to be unwound is not running.
>>>>
>>>> Signed-off-by: Li Huafei <[email protected]>
>>>
>>> That sounds good, but:
>>>
>>>> if (tsk != current) {
>>>> -#ifdef CONFIG_SMP
>>>> - /*
>>>> - * What guarantees do we have here that 'tsk' is not
>>>> - * running on another CPU? For now, ignore it as we
>>>> - * can't guarantee we won't explode.
>>>> - */
>>>> - return;
>>>> -#else
>>>> + /* task blocked in __switch_to */
>>>
>>> The commit text is not consistent with the comment you are removing.
>>>
>>> The commit is talking about "non-current" tasks which is one thing,
>>> but the code is avoiding any tasks under SMP because they may be
>>> running on another CPU. So you need to update the commit
>>> message to say something like "non-current or running on another CPU".
>>>
>>> If this condition will be checked at call sites in following patches,
>>> then mention
>>> that in the commit as well, so we know the end result is that we do
>>> not break it,
>>
>> The generic code stack_trace_save_tsk() does not have this check, and by
>> 'caller' I mean the caller of stack_trace_save_tsk(), expecting the
>> 'caller' to ensure that the task is not running. So in effect this check
>> has been dropped and there is no more guarantee. Sorry for not
>> clarifying the change here.
>
> Can you prove in every case that the thread we're being asked to unwind
> is not running? I don't think you can.
>
> There are things like proc_pid_stack() in procfs and the stack traces
> in sysrq-t which have attempted to unwind everything whether it's
> running or not.
>
> So no, there is no guarantee that the thread is blocked in
> __switch_to().
>

Yes, I agree.

>> But can we assume that the user should know that the stacktrace is
>> unreliable for a task that is running on another CPU? If not, I should
>> remove this patch and keep the check.
>
> It's not about "unreliable" stack traces, it's about the unwinder
> killing the kernel.
>
> The hint is this:
>
> frame.fp = thread_saved_fp(tsk);
> frame.sp = thread_saved_sp(tsk);
> frame.lr = 0; /* recovered from the stack */
> frame.pc = thread_saved_pc(tsk);
>
> These access the context saved by the scheduler when the task is
> sleeping. When the thread is running, these saved values will be
> the state when the thread last slept. However, with the thread
> running, the stack could now contain any data what so ever, and
> could change at any moment.
>

I get it. For example, since the data on the stack is changing,
'*(unsigned long *)fp' could access any illegal address and crash the
kernel.

> Whether the unwind-table unwinder is truely safe in such a
> situation is unknown - we try to ensure that it won't do anything
> stupid, but proving that is a hard task, and we've recently had
> issues with the unwinder even without that.
>
> So, allowing this feels like we're opening the door to DoS attacks
> from userspace, where userspace sits there reading /proc/*/stack of
> some thread running on a different CPU waiting for the kernel to
> oops itself, possibly holding a lock, resulting in the system
> dying.
>
> These decisions need to be made by architecture code not generic
> code, particularly where the method of unwinding is architecture
> specific and thus may have criteria defining when its safe to do so.
>

Thank you for your comments, I'll remove the patch and keep the check.

Thanks,
Huafei