2016-12-15 16:48:02

by Thomas Gleixner

Subject: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()

Help the compiler to avoid reevaluating the thread flags for each checked
bit by reordering the bit checks and providing an explicit xor for
evaluation.

x86_64: arch/x86/kernel/process.o
   text    data     bss     dec     hex
   3726    8505      16   12247    2fd7   Before
   3694    8505      16   12215    2fb7   After

i386: No change

Originally-from: Kyle Huey <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
arch/x86/kernel/process.c | 54 ++++++++++++++++++++++++++--------------------
1 file changed, 31 insertions(+), 23 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -174,48 +174,56 @@ int set_tsc_mode(unsigned int val)
 	return 0;
 }

+static inline void switch_to_bitmap(struct tss_struct *tss,
+				    struct thread_struct *prev,
+				    struct thread_struct *next,
+				    unsigned long tifp, unsigned long tifn)
+{
+	if (tifn & _TIF_IO_BITMAP) {
+		/*
+		 * Copy the relevant range of the IO bitmap.
+		 * Normally this is 128 bytes or less:
+		 */
+		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
+		       max(prev->io_bitmap_max, next->io_bitmap_max));
+	} else if (tifp & _TIF_IO_BITMAP) {
+		/*
+		 * Clear any possible leftover bits:
+		 */
+		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
+	}
+}
+
 void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
 		      struct tss_struct *tss)
 {
 	struct thread_struct *prev, *next;
+	unsigned long tifp, tifn;

 	prev = &prev_p->thread;
 	next = &next_p->thread;

-	if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
-	    test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
+	tifn = task_thread_info(next_p)->flags;
+	tifp = task_thread_info(prev_p)->flags;
+	switch_to_bitmap(tss, prev, next, tifp, tifn);
+
+	propagate_user_return_notify(prev_p, next_p);
+
+	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
 		unsigned long debugctl = get_debugctlmsr();

 		debugctl &= ~DEBUGCTLMSR_BTF;
-		if (test_tsk_thread_flag(next_p, TIF_BLOCKSTEP))
+		if (tifn & _TIF_BLOCKSTEP)
 			debugctl |= DEBUGCTLMSR_BTF;
-
 		update_debugctlmsr(debugctl);
 	}

-	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
-	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
-		/* prev and next are different */
-		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
+	if ((tifp ^ tifn) & _TIF_NOTSC) {
+		if (tifn & _TIF_NOTSC)
 			hard_disable_TSC();
 		else
 			hard_enable_TSC();
 	}
-
-	if (test_tsk_thread_flag(next_p, TIF_IO_BITMAP)) {
-		/*
-		 * Copy the relevant range of the IO bitmap.
-		 * Normally this is 128 bytes or less:
-		 */
-		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
-		       max(prev->io_bitmap_max, next->io_bitmap_max));
-	} else if (test_tsk_thread_flag(prev_p, TIF_IO_BITMAP)) {
-		/*
-		 * Clear any possible leftover bits:
-		 */
-		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
-	}
-	propagate_user_return_notify(prev_p, next_p);
 }

 /*
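
For readers following the changelog, the flag comparison the patch switches to can be sketched in isolation. This is a user-space illustration, not kernel code: the EX_TIF_* values and all function names are made up for the example, and the real _TIF_* masks differ. It only shows that xor-ing two flag snapshots once and masking the result gives the same answer as testing each task's flag separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's _TIF_* values. */
#define EX_TIF_BLOCKSTEP	(1UL << 0)
#define EX_TIF_NOTSC		(1UL << 1)

/* Old style: test the flag on each task, then xor the two boolean results. */
static bool differs_per_flag(unsigned long tifp, unsigned long tifn,
			     unsigned long mask)
{
	return ((tifp & mask) != 0) ^ ((tifn & mask) != 0);
}

/* New style: xor the two snapshots once, then mask the bit of interest. */
static bool differs_xor(unsigned long tifp, unsigned long tifn,
			unsigned long mask)
{
	return ((tifp ^ tifn) & mask) != 0;
}

/* Exhaustively compare both styles over every combination of the two bits. */
static void check_flag_diff_equivalence(void)
{
	for (unsigned long p = 0; p < 4; p++) {
		for (unsigned long n = 0; n < 4; n++) {
			assert(differs_per_flag(p, n, EX_TIF_BLOCKSTEP) ==
			       differs_xor(p, n, EX_TIF_BLOCKSTEP));
			assert(differs_per_flag(p, n, EX_TIF_NOTSC) ==
			       differs_xor(p, n, EX_TIF_NOTSC));
		}
	}
}
```

Because tifp and tifn are plain local values, the compiler has no reason to reload the flags word for every check, which is the effect the changelog describes.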



2016-12-15 17:20:28

by Peter Zijlstra

Subject: Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()

On Thu, Dec 15, 2016 at 04:44:02PM -0000, Thomas Gleixner wrote:
>  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
>  		      struct tss_struct *tss)
>  {
>  	struct thread_struct *prev, *next;
> +	unsigned long tifp, tifn;
>
>  	prev = &prev_p->thread;
>  	next = &next_p->thread;
>
> +	tifn = task_thread_info(next_p)->flags;
> +	tifp = task_thread_info(prev_p)->flags;
> +	switch_to_bitmap(tss, prev, next, tifp, tifn);
> +
> +	propagate_user_return_notify(prev_p, next_p);
> +
> +	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
>  		unsigned long debugctl = get_debugctlmsr();
>
>  		debugctl &= ~DEBUGCTLMSR_BTF;
> +		if (tifn & _TIF_BLOCKSTEP)
>  			debugctl |= DEBUGCTLMSR_BTF;
>  		update_debugctlmsr(debugctl);
>  	}

Going by the toggle pattern you have elsewhere, wouldn't that then be:

	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
		unsigned long debugctl = get_debugctlmsr();

		debugctl ^= DEBUGCTLMSR_BTF;
		update_debugctlmsr(debugctl);
	}

?
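
A small user-space sketch of the two forms being compared here may help. It is an illustration under assumptions, not kernel code: EX_BTF stands in for DEBUGCTLMSR_BTF, EX_TIF_BLOCKSTEP for _TIF_BLOCKSTEP, the bit positions are arbitrary, and the function names are hypothetical. Both functions model only the body of the branch, that is, the case where prev and next already differ in the flag.

```c
#include <assert.h>

/* Illustrative stand-ins; the bit positions are assumptions for the sketch. */
#define EX_BTF			(1UL << 1)
#define EX_TIF_BLOCKSTEP	(1UL << 0)

/* Form used in this patch: clear the bit, then set it if next wants it. */
static unsigned long btf_clear_then_set(unsigned long debugctl,
					unsigned long tifn)
{
	debugctl &= ~EX_BTF;
	if (tifn & EX_TIF_BLOCKSTEP)
		debugctl |= EX_BTF;
	return debugctl;
}

/* Toggle form: flip the bit, relying on it currently mirroring prev. */
static unsigned long btf_toggle(unsigned long debugctl)
{
	return debugctl ^ EX_BTF;
}

/* Both forms agree when the register bit matches prev's flag. */
static void check_toggle_matches_when_in_sync(void)
{
	const unsigned long other = 1UL << 6;	/* unrelated bit, must survive */

	for (unsigned long tifp = 0; tifp <= EX_TIF_BLOCKSTEP; tifp++) {
		unsigned long tifn = tifp ^ EX_TIF_BLOCKSTEP;
		unsigned long msr = other |
			((tifp & EX_TIF_BLOCKSTEP) ? EX_BTF : 0);

		assert(btf_clear_then_set(msr, tifn) == btf_toggle(msr));
	}
}
```

The toggle saves the conditional, but it produces the same value only as long as the register bit was in sync with prev's flag on entry; the clear-then-set form does not depend on that.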

2016-12-15 17:25:23

by Andy Lutomirski

Subject: Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()

On Thu, Dec 15, 2016 at 8:44 AM, Thomas Gleixner <[email protected]> wrote:
> -	if (test_tsk_thread_flag(prev_p, TIF_BLOCKSTEP) ^
> -	    test_tsk_thread_flag(next_p, TIF_BLOCKSTEP)) {
> +	tifn = task_thread_info(next_p)->flags;
> +	tifp = task_thread_info(prev_p)->flags;

Minor nit, but I think that a sufficiently clever compiler could
interpret this to mean "no one else is modifying these flags, so I can
do clever crazy things". Wrapping these in READ_ONCE might be
helpful.
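
As a rough user-space sketch of what that could look like: EX_READ_ONCE below is a simplified stand-in that forces a single volatile load, not the kernel's actual READ_ONCE() definition, and struct ex_thread_info, EX_TIF_NOTSC and notsc_changed() are hypothetical names for the example. It relies on the GNU __typeof__ extension.

```c
#include <assert.h>

/*
 * Simplified stand-in for READ_ONCE(): read x through a volatile-qualified
 * pointer so the compiler emits exactly one load and cannot re-read it.
 */
#define EX_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))

/* Hypothetical miniature of thread_info, holding only the flags word. */
struct ex_thread_info {
	unsigned long flags;
};

#define EX_TIF_NOTSC	(1UL << 1)	/* illustrative bit position */

/* Snapshot both flag words once, then work only on the local copies. */
static int notsc_changed(struct ex_thread_info *prev,
			 struct ex_thread_info *next)
{
	unsigned long tifp = EX_READ_ONCE(prev->flags);
	unsigned long tifn = EX_READ_ONCE(next->flags);

	return ((tifp ^ tifn) & EX_TIF_NOTSC) != 0;
}

/* Minimal self-check of the snapshot-and-compare helper. */
static void check_snapshot(void)
{
	struct ex_thread_info prev = { .flags = EX_TIF_NOTSC };
	struct ex_thread_info next = { .flags = 0 };

	assert(notsc_changed(&prev, &next));
	next.flags = EX_TIF_NOTSC;
	assert(!notsc_changed(&prev, &next));
}
```

With the loads marked this way, every later check sees the same snapshot even if another context modifies the flags word in between.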

2016-12-15 17:29:17

by Thomas Gleixner

Subject: Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()

On Thu, 15 Dec 2016, Peter Zijlstra wrote:
> On Thu, Dec 15, 2016 at 04:44:02PM -0000, Thomas Gleixner wrote:
> >  void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p,
> >  		      struct tss_struct *tss)
> >  {
> >  	struct thread_struct *prev, *next;
> > +	unsigned long tifp, tifn;
> >
> >  	prev = &prev_p->thread;
> >  	next = &next_p->thread;
> >
> > +	tifn = task_thread_info(next_p)->flags;
> > +	tifp = task_thread_info(prev_p)->flags;
> > +	switch_to_bitmap(tss, prev, next, tifp, tifn);
> > +
> > +	propagate_user_return_notify(prev_p, next_p);
> > +
> > +	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
> >  		unsigned long debugctl = get_debugctlmsr();
> >
> >  		debugctl &= ~DEBUGCTLMSR_BTF;
> > +		if (tifn & _TIF_BLOCKSTEP)
> >  			debugctl |= DEBUGCTLMSR_BTF;
> >  		update_debugctlmsr(debugctl);
> >  	}
>
> Going by the toggle pattern you have elsewhere, wouldn't that then be:
>
> 	if ((tifp ^ tifn) & _TIF_BLOCKSTEP) {
> 		unsigned long debugctl = get_debugctlmsr();
>
> 		debugctl ^= DEBUGCTLMSR_BTF;
> 		update_debugctlmsr(debugctl);
> 	}

See the next patch

2016-12-15 17:33:21

by Peter Zijlstra

Subject: Re: [patch 1/3] x86/process: Optimize TIF checks in switch_to_extra()

On Thu, Dec 15, 2016 at 06:26:28PM +0100, Thomas Gleixner wrote:
> See the next patch

Duh, I'm an idiot. For some reason I thought this one got missed.