As TOP_OF_KERNEL_STACK_PADDING is defined as 0 on x86_64, no one noticed
it's missing in the calculation of the .sp field in INIT_THREAD until it
is defined to 16 with CONFIG_X86_FRED=y.
Subtract TOP_OF_KERNEL_STACK_PADDING from the .sp field of INIT_THREAD.
Fixes: 65c9cc9e2c14 ("x86/fred: Reserve space for the FRED stack frame")
Fixes: 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Xin Li (Intel) <[email protected]>
---
arch/x86/include/asm/processor.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 26620d7642a9..17fe81998ce4 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -664,8 +664,10 @@ static __always_inline void prefetchw(const void *x)
#else
extern unsigned long __end_init_task[];
-#define INIT_THREAD { \
- .sp = (unsigned long)&__end_init_task - sizeof(struct pt_regs), \
+#define INIT_THREAD { \
+ .sp = (unsigned long)&__end_init_task - \
+ TOP_OF_KERNEL_STACK_PADDING - \
+ sizeof(struct pt_regs), \
}
extern unsigned long KSTK_ESP(struct task_struct *task);
base-commit: e13841907b8fda0ae0ce1ec03684665f578416a8
--
2.44.0
On Fri, Mar 1, 2024 at 3:41 AM Xin Li (Intel) <[email protected]> wrote:
>
> As TOP_OF_KERNEL_STACK_PADDING is defined as 0 on x86_64, no one noticed
> it's missing in the calculation of the .sp field in INIT_THREAD until it
> is defined to 16 with CONFIG_X86_FRED=y.
>
> Subtract TOP_OF_KERNEL_STACK_PADDING from the .sp field of INIT_THREAD.
>
> Fixes: 65c9cc9e2c14 ("x86/fred: Reserve space for the FRED stack frame")
> Fixes: 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit")
> Reported-by: kernel test robot <[email protected]>
> Closes: https://lore.kernel.org/oe-lkp/[email protected]
> Signed-off-by: Xin Li (Intel) <[email protected]>
> ---
> arch/x86/include/asm/processor.h | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 26620d7642a9..17fe81998ce4 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -664,8 +664,10 @@ static __always_inline void prefetchw(const void *x)
> #else
> extern unsigned long __end_init_task[];
>
> -#define INIT_THREAD { \
> - .sp = (unsigned long)&__end_init_task - sizeof(struct pt_regs), \
> +#define INIT_THREAD { \
> + .sp = (unsigned long)&__end_init_task - \
> + TOP_OF_KERNEL_STACK_PADDING - \
> + sizeof(struct pt_regs), \
> }
>
> extern unsigned long KSTK_ESP(struct task_struct *task);
>
There is another spot in head_64.S that also needs this offset:
/* Set up the stack for verify_cpu() */
leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
Brian Gerst
On 3/1/2024 5:15 AM, Brian Gerst wrote:
> On Fri, Mar 1, 2024 at 3:41 AM Xin Li (Intel) <[email protected]> wrote:
> There is another spot in head_64.S that also needs this offset:
I checked all references to __end_init_task before sending out this
patch, and I doubt we need to make more similar changes.
First of all, "movq TASK_threadsp(%rcx), %rsp" you added in
3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit") is exactly
what we need to set up %rsp for the init task.
> /* Set up the stack for verify_cpu() */
> leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
As the comment says, it's a _temporary_ stack for calling verify_cpu()
(but only for BSP, as APs use a different bring up stack), at which
stage the concept of "task" has not formed. I'm thinking maybe it's
better to do:
/* Set up the stack for verify_cpu() */
leaq __end_init_task(%rip), %rsp
Previously it was "leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp",
but the kernel unwinder goes up only to secondary_startup_64_no_verify()
after the new way you introduced to set up %rsp for the init task, and
it seems to me that there is no point to subtract FRAME_SIZE or
PTREGS_SIZE.
On the other hand, TOP_OF_KERNEL_STACK_PADDING is required for x86_32,
but probably not for x86_64 (defined as 0 before FRED). The most
important usage of TOP_OF_KERNEL_STACK_PADDING is to get the pt_regs
pointer for a task, i.e., task_pt_regs(task), which assumes a fixed
offset from the top of a task stack, but also limits the space that
could be used by future hardware above the pt_regs structure. Thus I
prefer to limit the usage of TOP_OF_KERNEL_STACK_PADDING on x86_64.
Thanks!
Xin
On Fri, Mar 1, 2024 at 11:18 PM Xin Li <[email protected]> wrote:
>
> On 3/1/2024 5:15 AM, Brian Gerst wrote:
> > On Fri, Mar 1, 2024 at 3:41 AM Xin Li (Intel) <[email protected]> wrote:
> > There is another spot in head_64.S that also needs this offset:
>
> I checked all references to __end_init_task before sending out this
> patch, and I doubt we need to make more similar changes.
>
> First of all, "movq TASK_threadsp(%rcx), %rsp" you added in
> 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit") is exactly
> what we need to set up %rsp for the init task.
>
> > /* Set up the stack for verify_cpu() */
> > leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
>
> As the comment says, it's a _temporary_ stack for calling verify_cpu()
> (but only for BSP, as APs use a different bring up stack), at which
> stage the concept of "task" has not formed. I'm thinking maybe it's
> better to do:
>
> /* Set up the stack for verify_cpu() */
> leaq __end_init_task(%rip), %rsp
>
> Previously it was "leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp",
> but the kernel unwinder goes up only to secondary_startup_64_no_verify()
> after the new way you introduced to set up %rsp for the init task, and
> it seems to me that there is no point to subtract FRAME_SIZE or
> PTREGS_SIZE.
>
> On the other hand, TOP_OF_KERNEL_STACK_PADDING is required for x86_32,
> but probably not for x86_64 (defined as 0 before FRED). The most
> important usage of TOP_OF_KERNEL_STACK_PADDING is to get the pt_regs
> pointer for a task, i.e., task_pt_regs(task), which assumes a fixed
> offset from the top of a task stack, but also limits the space that
> could be used by future hardware above the pt_regs structure. Thus I
> prefer to limit the usage of TOP_OF_KERNEL_STACK_PADDING on x86_64.
The point is to keep consistency with other kernel threads, which have
the pt_regs area cleared (see copy_thread()). In particular, the CS
field can't have junk in it or else user_mode(regs) could return the
wrong result. So the stack needs to start below pt_regs, or we need
to explicitly zero pt_regs later.
Ideally, the load from thread->sp should just shift RSP by phys_base,
pointing to the same memory location in the virtual mapping.
Brian Gerst
On 3/2/2024 5:20 AM, Brian Gerst wrote:
> On Fri, Mar 1, 2024 at 11:18 PM Xin Li <[email protected]> wrote:
>>
>> On 3/1/2024 5:15 AM, Brian Gerst wrote:
>>> On Fri, Mar 1, 2024 at 3:41 AM Xin Li (Intel) <[email protected]> wrote:
>>> There is another spot in head_64.S that also needs this offset:
>>
>> I checked all references to __end_init_task before sending out this
>> patch, and I doubt we need to make more similar changes.
>>
>> First of all, "movq TASK_threadsp(%rcx), %rsp" you added in
>> 3adee777ad0d ("x86/smpboot: Remove initial_stack on 64-bit") is exactly
>> what we need to set up %rsp for the init task.
>>
>>> /* Set up the stack for verify_cpu() */
>>> leaq (__end_init_task - PTREGS_SIZE)(%rip), %rsp
>>
>> As the comment says, it's a _temporary_ stack for calling verify_cpu()
>> (but only for BSP, as APs use a different bring up stack), at which
>> stage the concept of "task" has not formed. I'm thinking maybe it's
>> better to do:
>>
>> /* Set up the stack for verify_cpu() */
>> leaq __end_init_task(%rip), %rsp
>>
>> Previously it was "leaq (__end_init_task - FRAME_SIZE)(%rip), %rsp",
>> but the kernel unwinder goes up only to secondary_startup_64_no_verify()
>> after the new way you introduced to set up %rsp for the init task, and
>> it seems to me that there is no point to subtract FRAME_SIZE or
>> PTREGS_SIZE.
>>
>> On the other hand, TOP_OF_KERNEL_STACK_PADDING is required for x86_32,
>> but probably not for x86_64 (defined as 0 before FRED). The most
>> important usage of TOP_OF_KERNEL_STACK_PADDING is to get the pt_regs
>> pointer for a task, i.e., task_pt_regs(task), which assumes a fixed
>> offset from the top of a task stack, but also limits the space that
>> could be used by future hardware above the pt_regs structure. Thus I
>> prefer to limit the usage of TOP_OF_KERNEL_STACK_PADDING on x86_64.
>
> The point is to keep consistency with other kernel threads, which have
> the pt_regs area cleared (see copy_thread()). In particular, the CS
> field can't have junk in it or else user_mode(regs) could return the
> wrong result. So the stack needs to start below pt_regs, or we need
> to explicitly zero pt_regs later.
Okay, I will add TOP_OF_KERNEL_STACK_PADDING to the spot in
arch/x86/kernel/head_64.S, plus another spot in arch/x86/xen/xen-head.S.
However, I still think it would be better to not have
TOP_OF_KERNEL_STACK_PADDING in these spots, instead we should explicitly
zero pt_regs later; any *implicit* protocol is NOT welcome. I will find
a timer later to check with the x86 maintainers :)
> Ideally, the load from thread->sp should just shift RSP by phys_base,
> pointing to the same memory location in the virtual mapping.
I prefer to do this explicitly as what's done now; it may be easier to
understand for future kernel developers.
Thanks!
Xin