2016-12-15 16:47:25

by Thomas Gleixner

[permalink] [raw]
Subject: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch

Provide and use a seperate helper for toggling the DEBUGCTLMSR_BTF bit
instead of doing it open coded with a branch and eventually evaluating
boot_cpu_data twice.

x86_64:
3694 8505 16 12215 2fb7 Before
3662 8505 16 12183 2f97 After

i386:
5986 9388 1804 17178 431a Before
5906 9388 1804 17098 42ca After

Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/include/asm/processor.h | 12 ++++++++++++
arch/x86/kernel/process.c | 10 ++--------
2 files changed, 14 insertions(+), 8 deletions(-)

--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -676,6 +676,18 @@ static inline void update_debugctlmsr(un
wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr);
}

+static inline void toggle_debugctlmsr(unsigned long mask)
+{
+ unsigned long msrval;
+
+#ifndef CONFIG_X86_DEBUGCTLMSR
+ if (boot_cpu_data.x86 < 6)
+ return;
+#endif
+ rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
+ wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
+}
+
extern void set_task_blockstep(struct task_struct *task, bool on);

/* Boot loader type from the setup header: */
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -209,14 +209,8 @@ void __switch_to_xtra(struct task_struct

propagate_user_return_notify(prev_p, next_p);

- if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
- unsigned long debugctl = get_debugctlmsr();
-
- debugctl &= ~DEBUGCTLMSR_BTF;
- if (tifn & _TIF_BLOCKSTEP)
- debugctl |= DEBUGCTLMSR_BTF;
- update_debugctlmsr(debugctl);
- }
+ if ((tifp ^ tifn) & _TIF_BLOCKSTEP)
+ toggle_debugctlmsr(DEBUGCTLMSR_BTF);

if ((tifp ^ tifn) & _TIF_NOTSC) {
if (tifn & _TIF_NOTSC)



2016-12-15 17:29:16

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
> Provide and use a seperate helper for toggling the DEBUGCTLMSR_BTF bit
> instead of doing it open coded with a branch and eventually evaluating
> boot_cpu_data twice.
>
> x86_64:
> 3694 8505 16 12215 2fb7 Before
> 3662 8505 16 12183 2f97 After
>
> i386:
> 5986 9388 1804 17178 431a Before
> 5906 9388 1804 17098 42ca After
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/include/asm/processor.h | 12 ++++++++++++
> arch/x86/kernel/process.c | 10 ++--------
> 2 files changed, 14 insertions(+), 8 deletions(-)
>
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -676,6 +676,18 @@ static inline void update_debugctlmsr(un
> wrmsrl(MSR_IA32_DEBUGCTLMSR, debugctlmsr);
> }
>
> +static inline void toggle_debugctlmsr(unsigned long mask)
> +{
> + unsigned long msrval;
> +
> +#ifndef CONFIG_X86_DEBUGCTLMSR
> + if (boot_cpu_data.x86 < 6)
> + return;
> +#endif
> + rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
> + wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
> +}
> +

This scares me. If the MSR ever gets out of sync with the TI flag,
this will malfunction. And IIRC the MSR is highly magical and the CPU
clears it all by itself under a variety of not-so-well documented
circumstances.

How about adding a real feature bit and doing:

if (!static_cpu_has(X86_FEATURE_BLOCKSTEP))
return;

rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
msrval &= DEBUGCTLMSR_BTF;
msrval |= (tifn >> TIF_BLOCKSTEP) << DEBUGCTLMSR_BIT;

--Andy

2016-12-16 08:51:14

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch

On Thu, 15 Dec 2016, Andy Lutomirski wrote:
> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
> > +static inline void toggle_debugctlmsr(unsigned long mask)
> > +{
> > + unsigned long msrval;
> > +
> > +#ifndef CONFIG_X86_DEBUGCTLMSR
> > + if (boot_cpu_data.x86 < 6)
> > + return;
> > +#endif
> > + rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
> > + wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
> > +}
> > +
>
> This scares me. If the MSR ever gets out of sync with the TI flag,
> this will malfunction. And IIRC the MSR is highly magical and the CPU
> clears it all by itself under a variety of not-so-well documented
> circumstances.

If that is true, then the code today is broken as well, when the flag has
been cleared and both prev and next have the flag set. Then it won't be
updated for the next task.

The we should not use the TIF flag and store a debugmask in thread info and
do:

if (prev->debugmask || next->debugmask) {
if (static_cpu_has(X86_FEATURE_BLOCKSTEP)) {
rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
msrval &= DEBUGCTLMSR_BTF;
msrval |= next->debugmask;
}
}

Thanks,

tglx


2016-12-16 19:23:18

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [patch 2/3] x86/process: Optimize TIF_BLOCKSTEP switch

On Fri, Dec 16, 2016 at 12:47 AM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 15 Dec 2016, Andy Lutomirski wrote:
>> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
>> > +static inline void toggle_debugctlmsr(unsigned long mask)
>> > +{
>> > + unsigned long msrval;
>> > +
>> > +#ifndef CONFIG_X86_DEBUGCTLMSR
>> > + if (boot_cpu_data.x86 < 6)
>> > + return;
>> > +#endif
>> > + rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
>> > + wrmsrl(MSR_IA32_DEBUGCTLMSR, msrval ^ mask);
>> > +}
>> > +
>>
>> This scares me. If the MSR ever gets out of sync with the TI flag,
>> this will malfunction. And IIRC the MSR is highly magical and the CPU
>> clears it all by itself under a variety of not-so-well documented
>> circumstances.
>
> If that is true, then the code today is broken as well, when the flag has
> been cleared and both prev and next have the flag set. Then it won't be
> updated for the next task.
>
> The we should not use the TIF flag and store a debugmask in thread info and
> do:
>
> if (prev->debugmask || next->debugmask) {
> if (static_cpu_has(X86_FEATURE_BLOCKSTEP)) {
> rdmsrl(MSR_IA32_DEBUGCTLMSR, msrval);
> msrval &= DEBUGCTLMSR_BTF;
> msrval |= next->debugmask;
> }
> }

Seems reasonable to me. Although keeping it in flags might simplify
the logic a bit. FWIW, I doubt we care about performance much when
either prev or next has the bit set.

--Andy