2021-06-01 16:38:42

by Lai Jiangshan

Subject: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

From: Lai Jiangshan <[email protected]>

The kernel currently has no code to prevent a data breakpoint from being
placed on the thread stack. If a data breakpoint is set on the top area
of the thread stack, problems can follow.

For example, when an NMI hits in userspace in this setting, the code
copies the exception frame from the NMI stack to the thread stack, which
triggers #DB. After #DB is handled, the not-yet-copied portion on the
NMI stack is in danger of corruption because the iret from #DB unmasks
NMIs.

Stashing the exception frame on the entry stack before touching the
thread stack fixes the problem.

Signed-off-by: Lai Jiangshan <[email protected]>
---
arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
arch/x86/kernel/asm-offsets.c | 1 +
2 files changed, 23 insertions(+)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index a5f02d03c585..4190e668f346 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
*
* We also must not push anything to the stack before switching
* stacks lest we corrupt the "NMI executing" variable.
+ *
+ * Before switching to the thread stack, it switches to the entry
+ * stack first lest there is any data breakpoint in the thread
+ * stack and the iret of #DB will cause NMI unmasked before
+ * finishing switching.
*/

+ /* Switch stack to entry stack */
+ movq %rsp, %rdx
+ addq $(+6*8 /* to NMI stack top */ \
+ -EXCEPTION_STKSZ /* to NMI stack bottom */ \
+ -CPU_ENTRY_AREA_nmi_stack /* to entry area */ \
+ +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
+ +SIZEOF_entry_stack /* to entry stack top */ \
+ ), %rsp
+
+ /* Stash exception frame and %rdx to entry stack */
+ pushq 5*8(%rdx) /* pt_regs->ss */
+ pushq 4*8(%rdx) /* pt_regs->rsp */
+ pushq 3*8(%rdx) /* pt_regs->flags */
+ pushq 2*8(%rdx) /* pt_regs->cs */
+ pushq 1*8(%rdx) /* pt_regs->rip */
+ pushq 0*8(%rdx) /* %rdx */
+
swapgs
cld
FENCE_SWAPGS_USER_ENTRY
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ecd3fd6993d1..dfafa0c7e887 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -88,6 +88,7 @@ static void __used common(void)
OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
+ OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);

/* Offset for fields in tss_struct */
OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
--
2.19.1.6.gb485710b


2021-06-01 17:09:19

by Steven Rostedt

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Tue, 1 Jun 2021 14:52:14 +0800
Lai Jiangshan <[email protected]> wrote:

> From: Lai Jiangshan <[email protected]>
>
> Current kernel has no code to enforce data breakpoint not on the thread
> stack. If there is any data breakpoint on the top area of the thread
> stack, there might be problem.
>
> For example, when NMI hits on userspace in this setting, the code copies
> the exception frame from the NMI stack to the thread stack and it will
> cause #DB and after #DB is handled, the not yet copied portion on the
> NMI stack is in danger of corruption because the NMI is unmasked.
>
> Stashing the exception frame on the entry stack before touching the
> entry stack can fix the problem.
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
> arch/x86/kernel/asm-offsets.c | 1 +
> 2 files changed, 23 insertions(+)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index a5f02d03c585..4190e668f346 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
> *
> * We also must not push anything to the stack before switching
> * stacks lest we corrupt the "NMI executing" variable.
> + *
> + * Before switching to the thread stack, it switches to the entry
> + * stack first lest there is any data breakpoint in the thread
> + * stack and the iret of #DB will cause NMI unmasked before
> + * finishing switching.
> */
>
> + /* Switch stack to entry stack */
> + movq %rsp, %rdx
> + addq $(+6*8 /* to NMI stack top */ \
> + -EXCEPTION_STKSZ /* to NMI stack bottom */ \
> + -CPU_ENTRY_AREA_nmi_stack /* to entry area */ \

Just so that I understand this correctly. This "entry area" is not part
of the NMI stack, but just at the bottom of it? That is, this part of
the stack will never be touched by an NMI coming in from kernel space,
correct?

-- Steve


> + +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
> + +SIZEOF_entry_stack /* to entry stack top */ \
> + ), %rsp
> +
> + /* Stash exception frame and %rdx to entry stack */
> + pushq 5*8(%rdx) /* pt_regs->ss */
> + pushq 4*8(%rdx) /* pt_regs->rsp */
> + pushq 3*8(%rdx) /* pt_regs->flags */
> + pushq 2*8(%rdx) /* pt_regs->cs */
> + pushq 1*8(%rdx) /* pt_regs->rip */
> + pushq 0*8(%rdx) /* %rdx */
> +
> swapgs
> cld
> FENCE_SWAPGS_USER_ENTRY
> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
> index ecd3fd6993d1..dfafa0c7e887 100644
> --- a/arch/x86/kernel/asm-offsets.c
> +++ b/arch/x86/kernel/asm-offsets.c
> @@ -88,6 +88,7 @@ static void __used common(void)
> OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
> DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
> DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
> + OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>
> /* Offset for fields in tss_struct */
> OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);

2021-06-02 00:10:21

by Lai Jiangshan

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack



On 2021/6/2 01:05, Steven Rostedt wrote:
> On Tue, 1 Jun 2021 14:52:14 +0800
> Lai Jiangshan <[email protected]> wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> Current kernel has no code to enforce data breakpoint not on the thread
>> stack. If there is any data breakpoint on the top area of the thread
>> stack, there might be problem.
>>
>> For example, when NMI hits on userspace in this setting, the code copies
>> the exception frame from the NMI stack to the thread stack and it will
>> cause #DB and after #DB is handled, the not yet copied portion on the
>> NMI stack is in danger of corruption because the NMI is unmasked.
>>
>> Stashing the exception frame on the entry stack before touching the
>> entry stack can fix the problem.
>>
>> Signed-off-by: Lai Jiangshan <[email protected]>
>> ---
>> arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
>> arch/x86/kernel/asm-offsets.c | 1 +
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a5f02d03c585..4190e668f346 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>> *
>> * We also must not push anything to the stack before switching
>> * stacks lest we corrupt the "NMI executing" variable.
>> + *
>> + * Before switching to the thread stack, it switches to the entry
>> + * stack first lest there is any data breakpoint in the thread
>> + * stack and the iret of #DB will cause NMI unmasked before
>> + * finishing switching.
>> */
>>
>> + /* Switch stack to entry stack */
>> + movq %rsp, %rdx
>> + addq $(+6*8 /* to NMI stack top */ \
>> + -EXCEPTION_STKSZ /* to NMI stack bottom */ \
>> + -CPU_ENTRY_AREA_nmi_stack /* to entry area */ \
>
> Just so that I understand this correctly. This "entry area" is not part
> of the NMI stack, but just at the bottom of it? That is, this part of
> the stack will never be touched by an NMI coming in from kernel space,
> correct?

This "entry area" is the pointer of current CPU's struct cpu_entry_area.

This instruction puts %rsp onto the top of the entry/trampoline stack
which is not touched by an NMI coming in from kernel space.
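
The offset arithmetic in that addq can be modeled in plain C. The
following is a minimal user-space sketch, not kernel code: struct
mock_cpu_entry_area, its field sizes and the MOCK_* constants are
assumptions made up for illustration and do not reflect the real
struct cpu_entry_area layout. It only shows how the five terms in the
patch walk from the interrupted %rsp to the top of the entry stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only to make the example concrete. */
#define MOCK_PAGE_SIZE       4096u
#define MOCK_EXCEPTION_STKSZ (2u * MOCK_PAGE_SIZE)

/* Hypothetical, simplified stand-in for struct cpu_entry_area. */
struct mock_cpu_entry_area {
	char gdt[MOCK_PAGE_SIZE];
	char entry_stack[MOCK_PAGE_SIZE];
	char tss[3 * MOCK_PAGE_SIZE];
	char nmi_stack_guard[MOCK_PAGE_SIZE];
	char nmi_stack[MOCK_EXCEPTION_STKSZ];
};

/*
 * rsp is the stack pointer after the CPU pushed the 5-word exception
 * frame and the handler pushed %rdx, i.e. 6 words below the NMI stack top.
 */
uintptr_t entry_stack_top_from_nmi_rsp(uintptr_t rsp)
{
	uintptr_t p = rsp;

	p += 6 * 8;                                              /* to NMI stack top */
	p -= MOCK_EXCEPTION_STKSZ;                               /* to NMI stack bottom */
	p -= offsetof(struct mock_cpu_entry_area, nmi_stack);    /* to entry area */
	p += offsetof(struct mock_cpu_entry_area, entry_stack);  /* to entry stack bottom */
	p += sizeof(((struct mock_cpu_entry_area *)0)->entry_stack); /* to entry stack top */
	return p;
}

/* Fake the %rsp an NMI from user mode would see and check the result. */
void demo_nmi_rsp_to_entry_stack(void)
{
	static struct mock_cpu_entry_area cea;
	uintptr_t base = (uintptr_t)&cea;
	uintptr_t nmi_top = base + offsetof(struct mock_cpu_entry_area, nmi_stack)
			    + MOCK_EXCEPTION_STKSZ;
	uintptr_t rsp = nmi_top - 6 * 8;
	uintptr_t want = base + offsetof(struct mock_cpu_entry_area, entry_stack)
			 + sizeof(cea.entry_stack);

	assert(entry_stack_top_from_nmi_rsp(rsp) == want);
}
```

Because every term is a compile-time constant, the whole walk folds
into a single immediate, which is why the patch can do it with one addq
and without touching memory.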

>
> -- Steve
>
>
>> + +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
>> + +SIZEOF_entry_stack /* to entry stack top */ \
>> + ), %rsp
>> +
>> + /* Stash exception frame and %rdx to entry stack */
>> + pushq 5*8(%rdx) /* pt_regs->ss */
>> + pushq 4*8(%rdx) /* pt_regs->rsp */
>> + pushq 3*8(%rdx) /* pt_regs->flags */
>> + pushq 2*8(%rdx) /* pt_regs->cs */
>> + pushq 1*8(%rdx) /* pt_regs->rip */
>> + pushq 0*8(%rdx) /* %rdx */
>> +
>> swapgs
>> cld
>> FENCE_SWAPGS_USER_ENTRY
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index ecd3fd6993d1..dfafa0c7e887 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -88,6 +88,7 @@ static void __used common(void)
>> OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>> DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>> DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
>> + OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>>
>> /* Offset for fields in tss_struct */
>> OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);

2021-06-02 03:04:05

by Lai Jiangshan

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack



On 2021/6/2 01:05, Steven Rostedt wrote:
> On Tue, 1 Jun 2021 14:52:14 +0800
> Lai Jiangshan <[email protected]> wrote:
>
>> From: Lai Jiangshan <[email protected]>
>>
>> Current kernel has no code to enforce data breakpoint not on the thread
>> stack. If there is any data breakpoint on the top area of the thread
>> stack, there might be problem.
>>
>> For example, when NMI hits on userspace in this setting, the code copies
>> the exception frame from the NMI stack to the thread stack and it will
>> cause #DB and after #DB is handled, the not yet copied portion on the
>> NMI stack is in danger of corruption because the NMI is unmasked.
>>
>> Stashing the exception frame on the entry stack before touching the
>> entry stack can fix the problem.
>>
>> Signed-off-by: Lai Jiangshan <[email protected]>
>> ---
>> arch/x86/entry/entry_64.S | 22 ++++++++++++++++++++++
>> arch/x86/kernel/asm-offsets.c | 1 +
>> 2 files changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
>> index a5f02d03c585..4190e668f346 100644
>> --- a/arch/x86/entry/entry_64.S
>> +++ b/arch/x86/entry/entry_64.S
>> @@ -1121,8 +1121,30 @@ SYM_CODE_START(asm_exc_nmi)
>> *
>> * We also must not push anything to the stack before switching
>> * stacks lest we corrupt the "NMI executing" variable.
>> + *
>> + * Before switching to the thread stack, it switches to the entry
>> + * stack first lest there is any data breakpoint in the thread
>> + * stack and the iret of #DB will cause NMI unmasked before
>> + * finishing switching.
>> */
>>
>> + /* Switch stack to entry stack */
>> + movq %rsp, %rdx
>> + addq $(+6*8 /* to NMI stack top */ \
>> + -EXCEPTION_STKSZ /* to NMI stack bottom */ \
>> + -CPU_ENTRY_AREA_nmi_stack /* to entry area */ \
>
> Just so that I understand this correctly. This "entry area" is not part
> of the NMI stack, but just at the bottom of it? That is, this part of
> the stack will never be touched by an NMI coming in from kernel space,
> correct?

The NMI stack, exception stacks, entry stack, TSS, GDT are part of this
"entry area" (struct cpu_entry_area).

>
> -- Steve
>
>
>> + +CPU_ENTRY_AREA_entry_stack /* to entry stack bottom */\
>> + +SIZEOF_entry_stack /* to entry stack top */ \
>> + ), %rsp
>> +
>> + /* Stash exception frame and %rdx to entry stack */
>> + pushq 5*8(%rdx) /* pt_regs->ss */
>> + pushq 4*8(%rdx) /* pt_regs->rsp */
>> + pushq 3*8(%rdx) /* pt_regs->flags */
>> + pushq 2*8(%rdx) /* pt_regs->cs */
>> + pushq 1*8(%rdx) /* pt_regs->rip */
>> + pushq 0*8(%rdx) /* %rdx */
>> +
>> swapgs
>> cld
>> FENCE_SWAPGS_USER_ENTRY
>> diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
>> index ecd3fd6993d1..dfafa0c7e887 100644
>> --- a/arch/x86/kernel/asm-offsets.c
>> +++ b/arch/x86/kernel/asm-offsets.c
>> @@ -88,6 +88,7 @@ static void __used common(void)
>> OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
>> DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
>> DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
>> + OFFSET(CPU_ENTRY_AREA_nmi_stack, cpu_entry_area, estacks.NMI_stack);
>>
>> /* Offset for fields in tss_struct */
>> OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);

2021-06-19 22:53:26

by Thomas Gleixner

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> From: Lai Jiangshan <[email protected]>
>
> Current kernel has no code to enforce data breakpoint not on the thread
> stack. If there is any data breakpoint on the top area of the thread
> stack, there might be problem.

And because the kernel does not prevent data breakpoints on the thread
stack we need to do more complicated things in the already horrible
entry code instead of just doing the obvious and preventing data
breakpoints on the thread stack?

Confused.

Thanks,

tglx

2021-06-20 03:18:18

by Andy Lutomirski

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack



On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > From: Lai Jiangshan <[email protected]>
> >
> > Current kernel has no code to enforce data breakpoint not on the thread
> > stack. If there is any data breakpoint on the top area of the thread
> > stack, there might be problem.
>
> And because the kernel does not prevent data breakpoints on the thread
> stack we need to do more complicated things in the already horrible
> entry code instead of just doing the obvious and preventing data
> breakpoints on the thread stack?

Preventing breakpoints on the thread stack is a bit messy: it’s possible for a breakpoint to be set before the address in question is allocated for the thread stack.

None of this is NMI-specific. #DB itself has the same problem. We could plausibly solve it differently by disarming breakpoints in the entry asm before switching stacks. I’m not sure how much I like that approach.
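
To make the timing problem concrete, here is a minimal sketch in C of a
set-time range check. The names (struct range, ranges_overlap(),
breakpoint_allowed(), the addresses) are hypothetical and this is not
the kernel's hw_breakpoint code; it only illustrates that a check done
when the breakpoint is installed says nothing about memory that becomes
a thread stack afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open address range [start, start + len). Hypothetical type. */
struct range {
	uintptr_t start;
	uintptr_t len;
};

bool ranges_overlap(struct range a, struct range b)
{
	return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Set-time check: reject a breakpoint that overlaps any known stack. */
bool breakpoint_allowed(struct range bp, const struct range *stacks, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(bp, stacks[i]))
			return false;
	return true;
}

void demo_late_allocation(void)
{
	struct range bp = { 0x5000, 8 };
	struct range stacks[2] = { { 0x1000, 0x4000 } }; /* [0x1000, 0x5000) */

	/* At install time the watched address is not a stack: accepted. */
	assert(breakpoint_allowed(bp, stacks, 1));

	/* Later a new thread stack is allocated at [0x5000, 0x9000). */
	stacks[1] = (struct range){ 0x5000, 0x4000 };

	/* The already-armed breakpoint now sits on a thread stack. */
	assert(!breakpoint_allowed(bp, stacks, 2));
}
```

Closing that window would mean re-running the check on every stack
allocation as well, not only when a breakpoint is installed.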

>
> Confused.
>
> Thanks,
>
> tglx
>

2021-06-20 11:24:40

by Thomas Gleixner

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Sat, Jun 19 2021 at 20:13, Andy Lutomirski wrote:
> On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
>> On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
>> > From: Lai Jiangshan <[email protected]>
>> >
>> > Current kernel has no code to enforce data breakpoint not on the thread
>> > stack. If there is any data breakpoint on the top area of the thread
>> > stack, there might be problem.
>>
>> And because the kernel does not prevent data breakpoints on the thread
>> stack we need to do more complicated things in the already horrible
>> entry code instead of just doing the obvious and preventing data
>> breakpoints on the thread stack?
>
> Preventing breakpoints on the thread stack is a bit messy: it’s
> possible for a breakpoint to be set before the address in question is
> allocated for the thread stack.

Bah.

> None of this is NMI-specific. #DB itself has the same problem.

Oh well.

> We could plausibly solve it differently by disarming breakpoints in
> the entry asm before switching stacks. I’m not sure how much I like
> that approach.

That's ugly and, TBH, a breakpoint on the thread stack is in some sense
a violation of noinstr. I'd rather see them prevented completely, but
yes, that would have to be expanded to pretty much any variable which
is touched in noinstr sections. What a mess.

Thanks,

tglx


2021-06-25 10:42:45

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
>
>
> On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > > From: Lai Jiangshan <[email protected]>
> > >
> > > Current kernel has no code to enforce data breakpoint not on the thread
> > > stack. If there is any data breakpoint on the top area of the thread
> > > stack, there might be problem.
> >
> > And because the kernel does not prevent data breakpoints on the thread
> > stack we need to do more complicated things in the already horrible
> > entry code instead of just doing the obvious and preventing data
> > breakpoints on the thread stack?
>
> Preventing breakpoints on the thread stack is a bit messy: it’s
> possible for a breakpoint to be set before the address in question is
> allocated for the thread stack.

How about we call into C from the entry stack and have the from-user
stack swizzle there. The from-kernel entries land on the ISTs and those
are already excluded.

> None of this is NMI-specific. #DB itself has the same problem. We
> could plausibly solve it differently by disarming breakpoints in the
> entry asm before switching stacks. I’m not sure how much I like that
> approach.

I'm not sure I see how: from-user #DB already doesn't clear DR7, and if
we recurse, we'll get a from-kernel trap, which will land on the IST,
which is excluded, and then we clear DR7 there.

IST and entry stack are excluded, the only problem we have is thread
stack, and that can be solved by calling into C from the entry stack.

I should put teaching objtool about .data references from .noinstr.text
and .entry.text higher on the todo list I suppose ...
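
A rough user-space sketch in C of what such a from-user stack swizzle
amounts to: copy the saved register frame from the entry stack to the
top of the thread stack and continue on the copy. struct mock_regs,
move_frame_to_thread_stack() and the buffer sizes are hypothetical
names for this illustration only; it ignores the real constraints
(noinstr, the actual %rsp switch, CR3 and GS handling).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame: the six words the patch stashes. */
struct mock_regs {
	uint64_t rdx, rip, cs, flags, rsp, ss;
};

/*
 * thread_stack_top points one past the highest word of the thread stack.
 * The memcpy below is the first write to the thread stack, so this is
 * where a data breakpoint placed there would fire. At that point the
 * source frame lives on the entry stack, not on the NMI stack.
 */
struct mock_regs *move_frame_to_thread_stack(const struct mock_regs *entry_frame,
					      void *thread_stack_top)
{
	struct mock_regs *dst = (struct mock_regs *)thread_stack_top - 1;

	if (dst != entry_frame)
		memcpy(dst, entry_frame, sizeof(*dst));
	return dst;
}

void demo_swizzle(void)
{
	uint64_t entry_stack[64];
	uint64_t thread_stack[512];
	struct mock_regs *on_entry = (struct mock_regs *)(entry_stack + 64) - 1;
	struct mock_regs *on_thread;

	*on_entry = (struct mock_regs){
		.rdx = 1, .rip = 0x401000, .cs = 0x33,
		.flags = 0x202, .rsp = 0x7ffd0000, .ss = 0x2b,
	};

	on_thread = move_frame_to_thread_stack(on_entry, thread_stack + 512);

	/* The copy sits at the very top of the thread stack, contents intact. */
	assert((void *)on_thread == (void *)(thread_stack + 512 - 6));
	assert(memcmp(on_thread, on_entry, sizeof(*on_entry)) == 0);
}
```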

2021-06-25 11:02:06

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> >
> >
> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> > > > From: Lai Jiangshan <[email protected]>
> > > >
> > > > Current kernel has no code to enforce data breakpoint not on the thread
> > > > stack. If there is any data breakpoint on the top area of the thread
> > > > stack, there might be problem.
> > >
> > > And because the kernel does not prevent data breakpoints on the thread
> > > stack we need to do more complicated things in the already horrible
> > > entry code instead of just doing the obvious and preventing data
> > > breakpoints on the thread stack?
> >
> > Preventing breakpoints on the thread stack is a bit messy: it’s
> > possible for a breakpoint to be set before the address in question is
> > allocated for the thread stack.
>
> How about we call into C from the entry stack and have the from-user
> stack swizzle there. The from-kernel entries land on the ISTs and those
> are already excluded.
>
> > None of this is NMI-specific. #DB itself has the same problem. We
> > could plausibly solve it differently by disarming breakpoints in the
> > entry asm before switching stacks. I’m not sure how much I like that
> > approach.
>
> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> we recurse, we'll get a from-kernel trap, which will land on the IST,
> which is excluded, and then we clear DR7 there.
>
> IST and entry stack are excluded, the only problem we have is thread
> stack, and that can be solved by calling into C from the entry stack.
>
> I should put teaching objtool about .data references from .noinstr.text
> and .entry.text higher on the todo list I suppose ...

Also, I think we can run the from-user exceptions on the entry stack,
without ever switching to the kernel stack, except for #PF, which is
magical and schedules.

Same for SYSCALL, leave switching to the thread stack until C, somewhere
late, right before we'd enable IRQs or something.

2021-06-26 07:04:26

by Thomas Gleixner

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Fri, Jun 25 2021 at 13:00, Peter Zijlstra wrote:
> On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
>> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
>> >
>> >
>> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
>> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
>> > > > From: Lai Jiangshan <[email protected]>
>> > > >
>> > > > Current kernel has no code to enforce data breakpoint not on the thread
>> > > > stack. If there is any data breakpoint on the top area of the thread
>> > > > stack, there might be problem.
>> > >
>> > > And because the kernel does not prevent data breakpoints on the thread
>> > > stack we need to do more complicated things in the already horrible
>> > > entry code instead of just doing the obvious and preventing data
>> > > breakpoints on the thread stack?
>> >
>> > Preventing breakpoints on the thread stack is a bit messy: it’s
>> > possible for a breakpoint to be set before the address in question is
>> > allocated for the thread stack.
>>
>> How about we call into C from the entry stack and have the from-user
>> stack swizzle there. The from-kernel entries land on the ISTs and those
>> are already excluded.
>>
>> > None of this is NMI-specific. #DB itself has the same problem. We
>> > could plausibly solve it differently by disarming breakpoints in the
>> > entry asm before switching stacks. I’m not sure how much I like that
>> > approach.
>>
>> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
>> we recurse, we'll get a from-kernel trap, which will land on the IST,
>> which is excluded, and then we clear DR7 there.
>>
>> IST and entry stack are excluded, the only problem we have is thread
>> stack, and that can be solved by calling into C from the entry stack.
>>
>> I should put teaching objtool about .data references from .noinstr.text
>> and .entry.text higher on the todo list I suppose ...
>
> Also, I think we can run the from-user exceptions on the entry stack,
> without ever switching to the kernel stack, except for #PF, which is
> magical and schedules.

No. Pretty much any exception coming from user space can schedule, and
even if it does not do so voluntarily, it can be preempted.

Thanks,

tglx

2021-06-26 08:30:19

by Peter Zijlstra

Subject: Re: [RFC PATCH 1/4] x86/entry/nmi: Switch to the entry stack before switching to the thread stack

On Sat, Jun 26, 2021 at 09:03:23AM +0200, Thomas Gleixner wrote:
> On Fri, Jun 25 2021 at 13:00, Peter Zijlstra wrote:
> > On Fri, Jun 25, 2021 at 12:40:53PM +0200, Peter Zijlstra wrote:
> >> On Sat, Jun 19, 2021 at 08:13:15PM -0700, Andy Lutomirski wrote:
> >> >
> >> >
> >> > On Sat, Jun 19, 2021, at 3:51 PM, Thomas Gleixner wrote:
> >> > > On Tue, Jun 01 2021 at 14:52, Lai Jiangshan wrote:
> >> > > > From: Lai Jiangshan <[email protected]>
> >> > > >
> >> > > > Current kernel has no code to enforce data breakpoint not on the thread
> >> > > > stack. If there is any data breakpoint on the top area of the thread
> >> > > > stack, there might be problem.
> >> > >
> >> > > And because the kernel does not prevent data breakpoints on the thread
> >> > > stack we need to do more complicated things in the already horrible
> >> > > entry code instead of just doing the obvious and preventing data
> >> > > breakpoints on the thread stack?
> >> >
> >> > Preventing breakpoints on the thread stack is a bit messy: it’s
> >> > possible for a breakpoint to be set before the address in question is
> >> > allocated for the thread stack.
> >>
> >> How about we call into C from the entry stack and have the from-user
> >> stack swizzle there. The from-kernel entries land on the ISTs and those
> >> are already excluded.
> >>
> >> > None of this is NMI-specific. #DB itself has the same problem. We
> >> > could plausibly solve it differently by disarming breakpoints in the
> >> > entry asm before switching stacks. I’m not sure how much I like that
> >> > approach.
> >>
> >> I'm not sure I see how, from-user #DB already doesn't clear DR7, and if
> >> we recurse, we'll get a from-kernel trap, which will land on the IST,
> >> which is excluded, and then we clear DR7 there.
> >>
> >> IST and entry stack are excluded, the only problem we have is thread
> >> stack, and that can be solved by calling into C from the entry stack.
> >>
> >> I should put teaching objtool about .data references from .noinstr.text
> >> and .entry.text higher on the todo list I suppose ...
> >
> > Also, I think we can run the from-user exceptions on the entry stack,
> > without ever switching to the kernel stack, except for #PF, which is
> > magical and schedules.
>
> No. Pretty much any exception coming from user space can schedule, and
> even if it does not do so voluntarily, it can be preempted.

Won't most of them have IRQs disabled throughout? In any case, I think
we should only switch to the task stack right around the time we're
ready to enable IRQs just like for syscall/#PF, not earlier.