2020-05-21 20:38:21

by Thomas Gleixner

[permalink] [raw]
Subject: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

Folks!

This is V9 of the rework series. V7 and V8 were never posted but I used the
version numbers for tags while fixing up 0day complaints. The last posted
version was V6 which can be found here:

https://lore.kernel.org/r/[email protected]

The V9 leftover series is based on:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/entry

That branch contains the merged part 1-4 of the original 5 part series.

V9 has the following changes vs. V6:

- Rebase on tip x86/entry

- Simplified the hardware latency detector changes by moving the
invocation to the right place in nmi_enter/exit() and annotate it.

- Reworked the conditional RCU handling so it is now used
unconditionally everywhere. That simplified the idtentry_enter/exit
code significantly and also allowed to simplify the XEN hypercall
voluntary preemption handling.

- Moved the run on irq stack logic into an inline to avoid having the
same conditionals all over the place and fixed up the relevant places.

- Picked up Acked-by and Reviewed-by tags where appropriate.

The full series is available from:

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git entry-v9-the-rest

If we agree on the RCU changes, then these will be applied into the core/rcu branch
first so Paul can pick them up to avoid the next conflict horrors.

Thanks,

tglx


2020-05-22 07:38:51

by Andrew Cooper

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On 21/05/2020 21:05, Thomas Gleixner wrote:
> Folks!
>
> This is V9 of the rework series. V7 and V8 were never posted but I used the
> version numbers for tags while fixing up 0day complaints. The last posted
> version was V6 which can be found here:
>
> https://lore.kernel.org/r/[email protected]
>
> The V9 leftover series is based on:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/entry
>
> That branch contains the merged part 1-4 of the original 5 part series.
>
> V9 has the following changes vs. V6:
>
> - Rebase on tip x86/entry

Apologies for opening a related can of worms.

The new debug_enter() has propagated a pre-existing issue forward,
ultimately caused by bad advice in the SDM.

Because the RTM status bit in DR6 has inverted polarity, writing DR6 to
0 causes RTM to appear asserted to any logic which cares, despite RTM
debugging not being enabled.  The same is true in principle for what is
handed to userspace via u_debugreg[DR_STATUS].

On the subject of DR6, the SDM now reads:

"Certain debug exceptions may clear bits 0-3. The remaining contents of
the DR6 register are never cleared by the processor. To avoid confusion
in identifying debug exceptions, debug handlers should clear the
register (except bit 16, which they should set) before returning to the
interrupted task."

First of all, that should read "are never de-asserted by the processor"
rather than "cleared", but the advice has still failed to learn from its
first mistake.  The forward-compatible way to fix this is to set
DR6_DEFAULT (0xffff0ff0) which also covers future inverted polarity bits.

As for what to do about userspace, that is harder.  One approach is to
express everything in terms of positive polarity (i.e. pass on dr6 ^
DR6_DEFAULT), so DR6_RTM only appears set when RTM debugging is
enabled.  This approach is already taken with the VMCS PENDING_DBG
field, so there is at least previous form.

I realise that "do nothing" might be acceptable at this point, given the
lack of support for RTM debugging.

Thanks,

~Andrew

2020-05-22 14:32:12

by Boris Ostrovsky

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On 5/21/20 4:05 PM, Thomas Gleixner wrote:
>
> The full series is available from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git entry-v9-the-rest


Did you mean noinstr-v9-the-rest? I don't see entry-v9-the-rest tag.


(Also, this series as posted probably won't build. At least based on
definition of get_and_clear_inhcall() in patch 13)


-boris




2020-05-22 17:50:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

Boris Ostrovsky <[email protected]> writes:
> On 5/21/20 4:05 PM, Thomas Gleixner wrote:
>>
>> The full series is available from:
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git entry-v9-the-rest
>
>
> Did you mean noinstr-v9-the-rest? I don't see entry-v9-the-rest tag.

Bah. Yes.

> (Also, this series as posted probably won't build. At least based on
> definition of get_and_clear_inhcall() in patch 13)

Darn. I'm very sure that I built this and then did some final cleanups.

Lemme fix that.



2020-05-22 18:10:31

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

Thomas Gleixner <[email protected]> writes:
> Boris Ostrovsky <[email protected]> writes:
>> On 5/21/20 4:05 PM, Thomas Gleixner wrote:
>>>
>>> The full series is available from:
>>>
>>> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git entry-v9-the-rest
>>
>>
>> Did you mean noinstr-v9-the-rest? I don't see entry-v9-the-rest tag.
>
> Bah. Yes.
>
>> (Also, this series as posted probably won't build. At least based on
>> definition of get_and_clear_inhcall() in patch 13)
>
> Darn. I'm very sure that I built this and then did some final cleanups.

Just figured out why i did not notice: The final test had preemption
enabled ...

Fixed and pushed and the tag is now correct

git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git entry-v9-the-rest

/me goes and rumages in the brown paperbag supply stash.

Thanks,

tglx

2020-05-22 21:22:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Fri, May 22, 2020 at 08:20:15AM +0100, Andrew Cooper wrote:
> Apologies for opening a related can of worms.
>
> The new debug_enter() has propagated a pre-existing issue forward,
> ultimately caused by bad advice in the SDM.
>
> Because the RTM status bit in DR6 has inverted polarity, writing DR6 to
> 0 causes RTM to appear asserted to any logic which cares, despite RTM
> debugging not being enabled.? The same is true in principle for what is
> handed to userspace via u_debugreg[DR_STATUS].
>
> On the subject of DR6, the SDM now reads:
>
> "Certain debug exceptions may clear bits 0-3. The remaining contents of
> the DR6 register are never cleared by the processor. To avoid confusion
> in identifying debug exceptions, debug handlers should clear the
> register (except bit 16, which they should set) before returning to the
> interrupted task."

*URGH*

> First of all, that should read "are never de-asserted by the processor"
> rather than "cleared", but the advice has still failed to learn from its
> first mistake.? The forward-compatible way to fix this is to set
> DR6_DEFAULT (0xffff0ff0) which also covers future inverted polarity bits.
>
> As for what to do about userspace, that is harder.? One approach is to
> express everything in terms of positive polarity (i.e. pass on dr6 ^
> DR6_DEFAULT), so DR6_RTM only appears set when RTM debugging is
> enabled.? This approach is already taken with the VMCS PENDING_DBG
> field, so there is at least previous form.
>
> I realise that "do nothing" might be acceptable at this point, given the
> lack of support for RTM debugging.

This! I'm thinking "do nothing" is, at this moment, the right thing to
do. If/when someone goes and tries to make RTM debugging work, they get
to figure out how to deal with this mess.

2020-05-26 04:37:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Thu, May 21, 2020 at 1:31 PM Thomas Gleixner <[email protected]> wrote:
>
> Folks!
>
> This is V9 of the rework series. V7 and V8 were never posted but I used the
> version numbers for tags while fixing up 0day complaints. The last posted
> version was V6 which can be found here:

The whole pile is Acked-by: Andy Lutomirski <[email protected]>

Go test on Linus' new AMD laptop!

--Andy

2020-06-03 19:21:25

by Andrew Cooper

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On 22/05/2020 22:17, Peter Zijlstra wrote:
> On Fri, May 22, 2020 at 08:20:15AM +0100, Andrew Cooper wrote:
>> Apologies for opening a related can of worms.
>>
>> The new debug_enter() has propagated a pre-existing issue forward,
>> ultimately caused by bad advice in the SDM.
>>
>> Because the RTM status bit in DR6 has inverted polarity, writing DR6 to
>> 0 causes RTM to appear asserted to any logic which cares, despite RTM
>> debugging not being enabled.  The same is true in principle for what is
>> handed to userspace via u_debugreg[DR_STATUS].
>>
>> On the subject of DR6, the SDM now reads:
>>
>> "Certain debug exceptions may clear bits 0-3. The remaining contents of
>> the DR6 register are never cleared by the processor. To avoid confusion
>> in identifying debug exceptions, debug handlers should clear the
>> register (except bit 16, which they should set) before returning to the
>> interrupted task."
> *URGH*
>
>> First of all, that should read "are never de-asserted by the processor"
>> rather than "cleared", but the advice has still failed to learn from its
>> first mistake.  The forward-compatible way to fix this is to set
>> DR6_DEFAULT (0xffff0ff0) which also covers future inverted polarity bits.
>>
>> As for what to do about userspace, that is harder.  One approach is to
>> express everything in terms of positive polarity (i.e. pass on dr6 ^
>> DR6_DEFAULT), so DR6_RTM only appears set when RTM debugging is
>> enabled.  This approach is already taken with the VMCS PENDING_DBG
>> field, so there is at least previous form.
>>
>> I realise that "do nothing" might be acceptable at this point, given the
>> lack of support for RTM debugging.
> This! I'm thinking "do nothing" is, at this moment, the right thing to
> do. If/when someone goes and tries to make RTM debugging work, they get
> to figure out how to deal with this mess.

Well that didn't last long...

The new ISE (rev 39, published today) introduces BUS LOCK DEBUG
EXCEPTION which is now a second inverted polarity sticky bit (bit 11) in
%dr6.

This one is liable to get more traction than RTM debugging, so something
probably does want fixing in the #DB handler.

~Andrew

2020-06-04 13:31:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Wed, Jun 03, 2020 at 08:18:44PM +0100, Andrew Cooper wrote:

> Well that didn't last long...
>
> The new ISE (rev 39, published today) introduces BUS LOCK DEBUG
> EXCEPTION which is now a second inverted polarity sticky bit (bit 11) in
> %dr6.
>
> This one is liable to get more traction than RTM debugging, so something
> probably does want fixing in the #DB handler.

Well that's crap :-(

It being enabled through IA32_DEBUGCTL instead of through DR7 means that
the current code doesn't disable it and this then means we can have
nested #DB again.

Who sodding throught this was a good idea ?! What happened to #AC that
SLD currently uses?

What hardware will this be in and can we get this fixed?

2020-06-04 13:32:39

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On 04/06/20 15:25, Peter Zijlstra wrote:
> It being enabled through IA32_DEBUGCTL instead of through DR7 means that
> the current code doesn't disable it and this then means we can have
> nested #DB again.

/me bangs head on door

> Who sodding throught this was a good idea ?! What happened to #AC that
> SLD currently uses?

It was per-core and (presumably) considered unfixable?

Paolo

2020-06-04 13:41:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Thu, Jun 04, 2020 at 03:29:26PM +0200, Paolo Bonzini wrote:
> On 04/06/20 15:25, Peter Zijlstra wrote:
> > It being enabled through IA32_DEBUGCTL instead of through DR7 means that
> > the current code doesn't disable it and this then means we can have
> > nested #DB again.
>
> /me bangs head on door
>
> > Who sodding throught this was a good idea ?! What happened to #AC that
> > SLD currently uses?
>
> It was per-core and (presumably) considered unfixable?

Yeah, but I don't see how changing the exception vector helps with that.
#DB is an IST, and it must be, because of that lovely MOV SS thing. #AC
has none of that, _please_ use #AC.

2020-06-04 15:47:48

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Thu, Jun 4, 2020 at 6:35 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Jun 04, 2020 at 03:29:26PM +0200, Paolo Bonzini wrote:
> > On 04/06/20 15:25, Peter Zijlstra wrote:
> > > It being enabled through IA32_DEBUGCTL instead of through DR7 means that
> > > the current code doesn't disable it and this then means we can have
> > > nested #DB again.
> >
> > /me bangs head on door
> >
> > > Who sodding throught this was a good idea ?! What happened to #AC that
> > > SLD currently uses?
> >
> > It was per-core and (presumably) considered unfixable?
>
> Yeah, but I don't see how changing the exception vector helps with that.
> #DB is an IST, and it must be, because of that lovely MOV SS thing. #AC
> has none of that, _please_ use #AC.

x86 is not an architecture. x86 is a gauntlet through which operating
system developers must run.

I think we can tolerate this particular mess -- can't we just say that
a BUS LOCK DEBUG EXCEPTION is fatal if it came from kernel mode? So
what if it nests inside #DB -- we can survive an oops from a corrupt
context like that.

2020-06-04 15:58:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch V9 00/39] x86/entry: Rework leftovers (was part V)

On Thu, Jun 04, 2020 at 08:42:52AM -0700, Andy Lutomirski wrote:

> x86 is not an architecture. x86 is a gauntlet through which operating
> system developers must run.

That made my day :-)

> I think we can tolerate this particular mess -- can't we just say that
> a BUS LOCK DEBUG EXCEPTION is fatal if it came from kernel mode? So
> what if it nests inside #DB -- we can survive an oops from a corrupt
> context like that.

Yes, SLD or this new thing is unconditionally fatal when from kernel
space. As long as we can get to the OOPS with our stacks completely
wrecked, we should be good I think.

We'll just need to make this one of the very first things is checks for,
to minimize the amount of code ran before OOPSing, so at to minimize the
risk for recursive exceptions.

All signs of excellent design, I'm sure.