2016-12-15 16:47:27

by Thomas Gleixner

Subject: [patch 3/3] x86/process: Optimize TIF_NOTSC switch

Provide and use a toggle helper instead of doing it with a branch.

x86_64:
   text    data     bss     dec     hex
   3662    8505      16   12183    2f97	Before
   3646    8505      16   12167    2f87	After

i386:
   text    data     bss     dec     hex
   5906    9388    1804   17098    42ca	Before
   5834    9324    1740   16898    4202	After

Signed-off-by: Thomas Gleixner <[email protected]>
---
 arch/x86/include/asm/tlbflush.h |   10 ++++++++++
 arch/x86/kernel/process.c       |   22 ++++------------------
 2 files changed, 14 insertions(+), 18 deletions(-)

--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -110,6 +110,16 @@ static inline void cr4_clear_bits(unsign
 	}
 }

+static inline void cr4_toggle_bits(unsigned long mask)
+{
+	unsigned long cr4;
+
+	cr4 = this_cpu_read(cpu_tlbstate.cr4);
+	cr4 ^= mask;
+	this_cpu_write(cpu_tlbstate.cr4, cr4);
+	__write_cr4(cr4);
+}
+
 /* Read the CR4 shadow. */
 static inline unsigned long cr4_read_shadow(void)
 {
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -116,11 +116,6 @@ void flush_thread(void)
 	fpu__clear(&tsk->thread.fpu);
 }

-static void hard_disable_TSC(void)
-{
-	cr4_set_bits(X86_CR4_TSD);
-}
-
 void disable_TSC(void)
 {
 	preempt_disable();
@@ -129,15 +124,10 @@ void disable_TSC(void)
 	 * Must flip the CPU state synchronously with
 	 * TIF_NOTSC in the current running context.
 	 */
-	hard_disable_TSC();
+	cr4_set_bits(X86_CR4_TSD);
 	preempt_enable();
 }

-static void hard_enable_TSC(void)
-{
-	cr4_clear_bits(X86_CR4_TSD);
-}
-
 static void enable_TSC(void)
 {
 	preempt_disable();
@@ -146,7 +136,7 @@ static void enable_TSC(void)
 	 * Must flip the CPU state synchronously with
 	 * TIF_NOTSC in the current running context.
 	 */
-	hard_enable_TSC();
+	cr4_clear_bits(X86_CR4_TSD);
 	preempt_enable();
 }

@@ -212,12 +202,8 @@ void __switch_to_xtra(struct task_struct
 	if ((tifp ^ tifn) & _TIF_BLOCKSTEP)
 		toggle_debugctlmsr(DEBUGCTLMSR_BTF);
 
-	if ((tifp ^ tifn) & _TIF_NOTSC) {
-		if (tifn & _TIF_NOTSC)
-			hard_disable_TSC();
-		else
-			hard_enable_TSC();
-	}
+	if ((tifp ^ tifn) & _TIF_NOTSC)
+		cr4_toggle_bits(X86_CR4_TSD);
 }

/*



2016-12-15 17:32:25

by Andy Lutomirski

Subject: Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
> Provide and use a toggle helper instead of doing it with a branch.
>
> x86_64:
> 3662 8505 16 12183 2f97 Before
> 3646 8505 16 12167 2f87 After
>
> i386:
> 5906 9388 1804 17098 42ca Before
> 5834 9324 1740 16898 4202 After
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> ---
> arch/x86/include/asm/tlbflush.h | 10 ++++++++++
> arch/x86/kernel/process.c | 22 ++++------------------
> 2 files changed, 14 insertions(+), 18 deletions(-)
>
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -110,6 +110,16 @@ static inline void cr4_clear_bits(unsign
> }
> }
>
> +static inline void cr4_toggle_bits(unsigned long mask)
> +{
> +	unsigned long cr4;
> +
> +	cr4 = this_cpu_read(cpu_tlbstate.cr4);
> +	cr4 ^= mask;
> +	this_cpu_write(cpu_tlbstate.cr4, cr4);
> +	__write_cr4(cr4);
> +}

This scares me for the same reason as BTF, although this should at
least be less fragile. But how about:

static inline void cr4_set_bit_to(unsigned long mask, bool set)
{
	...
	cr4 &= ~mask;
	cr4 ^= (set << ilog2(mask));
	...
}

This should generate code that's nearly as good.

2016-12-16 08:53:32

by Thomas Gleixner

Subject: Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch

On Thu, 15 Dec 2016, Andy Lutomirski wrote:
> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
> > +static inline void cr4_toggle_bits(unsigned long mask)
> > +{
> > +	unsigned long cr4;
> > +
> > +	cr4 = this_cpu_read(cpu_tlbstate.cr4);
> > +	cr4 ^= mask;
> > +	this_cpu_write(cpu_tlbstate.cr4, cr4);
> > +	__write_cr4(cr4);
> > +}
>
> This scares me for the same reason as BTF, although this should at
> least be less fragile. But how about:

If that is fragile then all cr4 manipulation code is fragile because it
relies on cpu_tlbstate.cr4. The TIF flag and that per cpu thing are kept in
sync.

Thanks,

tglx



2016-12-16 18:35:06

by Andy Lutomirski

Subject: Re: [patch 3/3] x86/process: Optimize TIF_NOTSC switch

On Fri, Dec 16, 2016 at 12:50 AM, Thomas Gleixner <[email protected]> wrote:
> On Thu, 15 Dec 2016, Andy Lutomirski wrote:
>> On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
>> > +static inline void cr4_toggle_bits(unsigned long mask)
>> > +{
>> > +	unsigned long cr4;
>> > +
>> > +	cr4 = this_cpu_read(cpu_tlbstate.cr4);
>> > +	cr4 ^= mask;
>> > +	this_cpu_write(cpu_tlbstate.cr4, cr4);
>> > +	__write_cr4(cr4);
>> > +}
>>
>> This scares me for the same reason as BTF, although this should at
>> least be less fragile. But how about:
>
> If that is fragile then all cr4 manipulation code is fragile because it
> relies on cpu_tlbstate.cr4. The TIF flag and that per cpu thing are kept in
> sync.

True.