2021-10-12 08:40:03

by Woody Lin

[permalink] [raw]
Subject: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

There was a 'init_idle' that resets scs sp to base, but is removed by
f1a0a376ca0c. Without the resetting, the hot-plugging implemented by
cpu_psci_cpu_boot will use the previous scs sp as new base when starting
up a CPU core, so the usage on scs page is being stacked up until
overflow.

This only happens on idle task since __cpu_up is using idle task as the
main thread to start up a CPU core, so the overflow can be fixed by
resetting scs sp to base in 'idle_task_exit'.

Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Signed-off-by: Woody Lin <[email protected]>
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1bba4128a3e6..f21714ea3db8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8795,6 +8795,7 @@ void idle_task_exit(void)
finish_arch_post_lock_switch();
}

+ scs_task_reset(current);
/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
}

--
2.33.0.882.g93a45727a2-goog


2021-10-12 10:02:55

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On 12/10/21 16:35, Woody Lin wrote:
> There was a 'init_idle' that resets scs sp to base, but is removed by
> f1a0a376ca0c. Without the resetting, the hot-plugging implemented by
> cpu_psci_cpu_boot will use the previous scs sp as new base when starting
> up a CPU core, so the usage on scs page is being stacked up until
> overflow.
>
> This only happens on idle task since __cpu_up is using idle task as the
> main thread to start up a CPU core, so the overflow can be fixed by
> resetting scs sp to base in 'idle_task_exit'.
>

Looking at init_idle() for similar issues, it looks like we might also want
to re-issue kasan_unpoison_task_stack() on the idle task upon hotplug.

> Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
> Signed-off-by: Woody Lin <[email protected]>
> ---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1bba4128a3e6..f21714ea3db8 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8795,6 +8795,7 @@ void idle_task_exit(void)
> finish_arch_post_lock_switch();
> }
>
> + scs_task_reset(current);
> /* finish_cpu(), as ran on the BP, will clean up the active_mm state */

So AIUI for SCS that works just fine - one thing I'm unclear on is how the
following pops are going to work given the SP reset happens in the middle
of a call stack, but AFAICT that was already the case before I messed about
with init_idle(), so that must already be handled.

I'm not familiar enough with KASAN to say whether that
kasan_unpoison_task_stack() should rather happen upon hotplugging the CPU
back (rather than on hot-unplug). If that is the case, then maybe somewhere
around cpu_startup_entry() might work (and then you could bunch these two
"needs to be re-run at init for the idle task" functions into a common
helper).

> }
>
> --
> 2.33.0.882.g93a45727a2-goog

2021-10-12 10:38:17

by Woody Lin

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On Tue, Oct 12, 2021 at 6:00 PM Valentin Schneider
<[email protected]> wrote:
>
> On 12/10/21 16:35, Woody Lin wrote:
> > There was a 'init_idle' that resets scs sp to base, but is removed by
> > f1a0a376ca0c. Without the resetting, the hot-plugging implemented by
> > cpu_psci_cpu_boot will use the previous scs sp as new base when starting
> > up a CPU core, so the usage on scs page is being stacked up until
> > overflow.
> >
> > This only happens on idle task since __cpu_up is using idle task as the
> > main thread to start up a CPU core, so the overflow can be fixed by
> > resetting scs sp to base in 'idle_task_exit'.
> >
>
> Looking at init_idle() for similar issues, it looks like we might also want
> to re-issue kasan_unpoison_task_stack() on the idle task upon hotplug.
>
> > Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
> > Signed-off-by: Woody Lin <[email protected]>
> > ---
> > kernel/sched/core.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 1bba4128a3e6..f21714ea3db8 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -8795,6 +8795,7 @@ void idle_task_exit(void)
> > finish_arch_post_lock_switch();
> > }
> >
> > + scs_task_reset(current);
> > /* finish_cpu(), as ran on the BP, will clean up the active_mm state */
>
> So AIUI for SCS that works just fine - one thing I'm unclear on is how the
> following pops are going to work given the SP reset happens in the middle
> of a call stack, but AFAICT that was already the case before I messed about
> with init_idle(), so that must already be handled.

Hi Valentin,

Thanks for the question. The 'scs_task_reset' here resets only the
'.thread_info.scs_sp' of the task, so the register (on arm64 it's x18)
is still pointing to the same location for popping and storing call
frames. The register will be updated to '.thread_info.scs_sp' in
'__secondary_switched', which starts a new core and there is no popping
after the updating, so it won't introduce an underflow.

>
> I'm not familiar enough with KASAN to say whether that
> kasan_unpoison_task_stack() should rather happen upon hotplugging the CPU
> back (rather than on hot-unplug). If that is the case, then maybe somewhere
> around cpu_startup_entry() might work (and then you could bunch these two
> "needs to be re-run at init for the idle task" functions into a common
> helper).

unpoison looks more like an one-time thing to me; the idle tasks will
reuse the same stack pages until system resets, so I think we don't need
to re-unpoison that during hotplugging as long as it's unpoisoned in
'init_idle'.

>
> > }
> >
> > --
> > 2.33.0.882.g93a45727a2-goog

2021-10-12 10:58:47

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On 12/10/21 18:35, Woody Lin wrote:
> On Tue, Oct 12, 2021 at 6:00 PM Valentin Schneider
> <[email protected]> wrote:
>>
>> So AIUI for SCS that works just fine - one thing I'm unclear on is how the
>> following pops are going to work given the SP reset happens in the middle
>> of a call stack, but AFAICT that was already the case before I messed about
>> with init_idle(), so that must already be handled.
>
> Hi Valentin,
>
> Thanks for the question. The 'scs_task_reset' here resets only the
> '.thread_info.scs_sp' of the task, so the register (on arm64 it's x18)
> is still pointing to the same location for popping and storing call
> frames. The register will be updated to '.thread_info.scs_sp' in
> '__secondary_switched', which starts a new core and there is no popping
> after the updating, so it won't introduce an underflow.
>

I think I got it; __secondary_switched() -> init_cpu_task() -> scs_load()

Thanks!

>>
>> I'm not familiar enough with KASAN to say whether that
>> kasan_unpoison_task_stack() should rather happen upon hotplugging the CPU
>> back (rather than on hot-unplug). If that is the case, then maybe somewhere
>> around cpu_startup_entry() might work (and then you could bunch these two
>> "needs to be re-run at init for the idle task" functions into a common
>> helper).
>
> unpoison looks more like an one-time thing to me; the idle tasks will
> reuse the same stack pages until system resets, so I think we don't need
> to re-unpoison that during hotplugging as long as it's unpoisoned in
> 'init_idle'.
>

I would tend to agree, but was bitten by s390 freeing some memory on
hot-unplug and re-allocating it upon hotplug:

6a942f578054 ("s390: preempt: Fix preempt_count initialization")

This makes me doubt whether we can assert the idle task stack pages are
perennial vs hotplug on all architectures.

>>
>> > }
>> >
>> > --
>> > 2.33.0.882.g93a45727a2-goog

2021-10-13 01:27:28

by Woody Lin

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On Tue, Oct 12, 2021 at 6:57 PM Valentin Schneider
<[email protected]> wrote:
>
> On 12/10/21 18:35, Woody Lin wrote:
> > On Tue, Oct 12, 2021 at 6:00 PM Valentin Schneider
> > <[email protected]> wrote:
> >>
> >> So AIUI for SCS that works just fine - one thing I'm unclear on is how the
> >> following pops are going to work given the SP reset happens in the middle
> >> of a call stack, but AFAICT that was already the case before I messed about
> >> with init_idle(), so that must already be handled.
> >
> > Hi Valentin,
> >
> > Thanks for the question. The 'scs_task_reset' here resets only the
> > '.thread_info.scs_sp' of the task, so the register (on arm64 it's x18)
> > is still pointing to the same location for popping and storing call
> > frames. The register will be updated to '.thread_info.scs_sp' in
> > '__secondary_switched', which starts a new core and there is no popping
> > after the updating, so it won't introduce an underflow.
> >
>
> I think I got it; __secondary_switched() -> init_cpu_task() -> scs_load()
>
> Thanks!
>
> >>
> >> I'm not familiar enough with KASAN to say whether that
> >> kasan_unpoison_task_stack() should rather happen upon hotplugging the CPU
> >> back (rather than on hot-unplug). If that is the case, then maybe somewhere
> >> around cpu_startup_entry() might work (and then you could bunch these two
> >> "needs to be re-run at init for the idle task" functions into a common
> >> helper).
> >
> > unpoison looks more like an one-time thing to me; the idle tasks will
> > reuse the same stack pages until system resets, so I think we don't need
> > to re-unpoison that during hotplugging as long as it's unpoisoned in
> > 'init_idle'.
> >
>
> I would tend to agree, but was bitten by s390 freeing some memory on
> hot-unplug and re-allocating it upon hotplug:
>
> 6a942f578054 ("s390: preempt: Fix preempt_count initialization")
>
> This makes me doubt whether we can assert the idle task stack pages are
> perennial vs hotplug on all architectures.

I made a quick study on memory-hotplug and seems that only memory contains
nothing other than migratable pages can be unplugged. So process stack
pages should not be a concern for this, since which is an unmovable
memory.

However I don't have a chance to work on a system that enables
memory-hotplug so far, so couldn't verify this assumption further. Guess
we can create a separate thread to clarify this more.

Regards,
Woody

>
> >>
> >> > }
> >> >
> >> > --
> >> > 2.33.0.882.g93a45727a2-goog

2021-10-13 13:37:40

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On 13/10/21 09:22, Woody Lin wrote:
> On Tue, Oct 12, 2021 at 6:57 PM Valentin Schneider <[email protected]> wrote:
>> On 12/10/21 18:35, Woody Lin wrote:
>> > unpoison looks more like an one-time thing to me; the idle tasks will
>> > reuse the same stack pages until system resets, so I think we don't need
>> > to re-unpoison that during hotplugging as long as it's unpoisoned in
>> > 'init_idle'.
>> >
>>
>> I would tend to agree, but was bitten by s390 freeing some memory on
>> hot-unplug and re-allocating it upon hotplug:
>>
>> 6a942f578054 ("s390: preempt: Fix preempt_count initialization")
>>
>> This makes me doubt whether we can assert the idle task stack pages are
>> perennial vs hotplug on all architectures.
>
> I made a quick study on memory-hotplug and seems that only memory contains
> nothing other than migratable pages can be unplugged. So process stack
> pages should not be a concern for this, since which is an unmovable
> memory.
>
> However I don't have a chance to work on a system that enables
> memory-hotplug so far, so couldn't verify this assumption further. Guess
> we can create a separate thread to clarify this more.
>

That sounds sensible; I'll try to dig some more into this.

As for the SCS change, someone might argue for placing this elsewhere in
the hotplug path, but that looks fine to me:

Reviewed-by: Valentin Schneider <[email protected]>

> Regards,
> Woody
>
>>
>> >>
>> >> > }
>> >> >
>> >> > --
>> >> > 2.33.0.882.g93a45727a2-goog

2021-10-16 03:23:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched/scs: Reset the shadow stack when idle_task_exit

On Wed, Oct 13, 2021 at 02:32:51PM +0100, Valentin Schneider wrote:
> As for the SCS change, someone might argue for placing this elsewhere in
> the hotplug path, but that looks fine to me:
>
> Reviewed-by: Valentin Schneider <[email protected]>

Thanks!

2021-10-19 15:27:45

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: sched/urgent] sched/scs: Reset the shadow stack when idle_task_exit

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 45dfb89b8f96643268449c25d7025b17de46717c
Gitweb: https://git.kernel.org/tip/45dfb89b8f96643268449c25d7025b17de46717c
Author: Woody Lin <[email protected]>
AuthorDate: Tue, 12 Oct 2021 16:35:21 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 18 Oct 2021 16:58:41 +02:00

sched/scs: Reset the shadow stack when idle_task_exit

There was a 'init_idle' that resets scs sp to base, but is removed by
f1a0a376ca0c. Without the resetting, the hot-plugging implemented by
cpu_psci_cpu_boot will use the previous scs sp as new base when starting
up a CPU core, so the usage on scs page is being stacked up until
overflow.

This only happens on idle task since __cpu_up is using idle task as the
main thread to start up a CPU core, so the overflow can be fixed by
resetting scs sp to base in 'idle_task_exit'.

Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Signed-off-by: Woody Lin <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1bba412..f21714e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8795,6 +8795,7 @@ void idle_task_exit(void)
finish_arch_post_lock_switch();
}

+ scs_task_reset(current);
/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
}

2021-10-19 15:57:42

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: sched/urgent] sched/scs: Reset the shadow stack when idle_task_exit

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 63acd42c0d4942f74710b11c38602fb14dea7320
Gitweb: https://git.kernel.org/tip/63acd42c0d4942f74710b11c38602fb14dea7320
Author: Woody Lin <[email protected]>
AuthorDate: Tue, 12 Oct 2021 16:35:21 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 19 Oct 2021 17:46:11 +02:00

sched/scs: Reset the shadow stack when idle_task_exit

Commit f1a0a376ca0c ("sched/core: Initialize the idle task with
preemption disabled") removed the init_idle() call from
idle_thread_get(). This was the sole call-path on hotplug that resets
the Shadow Call Stack (scs) Stack Pointer (sp).

Not resetting the scs-sp leads to scs overflow after enough hotplug
cycles. Therefore add an explicit scs_task_reset() to the hotplug code
to make sure the scs-sp does get reset on hotplug.

Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled")
Signed-off-by: Woody Lin <[email protected]>
[peterz: Changelog]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1bba412..f21714e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8795,6 +8795,7 @@ void idle_task_exit(void)
finish_arch_post_lock_switch();
}

+ scs_task_reset(current);
/* finish_cpu(), as ran on the BP, will clean up the active_mm state */
}