2015-04-01 08:51:50

Subject: Re: [PATCH 2/9] x86/asm/entry/32: Use PUSH instructions to build pt_regs on stack

* Denys Vlasenko <[email protected]> wrote:

> This mimics the recent similar 64-bit change.
> Saves ~110 bytes of code.
>
> Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
> I also looked at the diff of entry_64.o disassembly, to have
> a different view of the changes.

The other important question would be: what performance difference (if
any) did you observe before/after the change?

Thanks,

Ingo

2015-04-01 13:13:26

by Denys Vlasenko

Subject: Re: [PATCH 2/9] x86/asm/entry/32: Use PUSH instructions to build pt_regs on stack

On 04/01/2015 10:51 AM, Ingo Molnar wrote:
>
> * Denys Vlasenko <[email protected]> wrote:
>
>> This mimics the recent similar 64-bit change.
>> Saves ~110 bytes of code.
>>
>> Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
>> I also looked at the diff of entry_64.o disassembly, to have
>> a different view of the changes.
>
> The other important question would be: what performance difference (if
> any) did you observe before/after the change?

I did not measure it then.

At the moment I don't have AMD CPUs here, so I can't benchmark
the 32-bit syscall-based codepath.

On a Sandy Bridge CPU (IOW: sysenter codepath) -

Before: 78.57 ns per getpid
After: 76.90 ns per getpid

It's better than I thought it would be.
Probably because this load:

movl ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d

has been moved up by the patch (happens sooner).
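
For reference, the figures above work out to a difference of 1.67 ns per call, or roughly 2%. The tool that produced them is not shown in this thread; what follows is only a minimal sketch of how a "ns per getpid" number could be obtained. The helper name ns_per_getpid and the iteration count are hypothetical. To exercise the 32-bit entry path on a 64-bit kernel, the program would presumably have to be built as a 32-bit binary (for example with gcc -m32); whether that ends up on the sysenter or the syscall path depends on the CPU and the vDSO.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper, not the tool used in the thread: returns the
 * average wall-clock time, in nanoseconds, of one raw getpid syscall.
 */
static double ns_per_getpid(long iterations)
{
    struct timespec start, end;
    long i;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; i++)
        syscall(SYS_getpid);    /* raw syscall, so libc cannot cache the pid */
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Elapsed time in nanoseconds, divided by the number of calls. */
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling ns_per_getpid(10000000) from main() and printing the result with "%.2f ns per getpid" gives output in the same shape as the numbers quoted above. A tight loop like this keeps the entry code hot in the caches, so it says nothing about cache-cold behaviour.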

2015-04-01 13:21:11

by Ingo Molnar

Subject: Re: [PATCH 2/9] x86/asm/entry/32: Use PUSH instructions to build pt_regs on stack

* Denys Vlasenko <[email protected]> wrote:

> On 04/01/2015 10:51 AM, Ingo Molnar wrote:
> >
> > * Denys Vlasenko <[email protected]> wrote:
> >
> >> This mimics the recent similar 64-bit change.
> >> Saves ~110 bytes of code.
> >>
> >> Patch was run-tested on 32 and 64 bits, Intel and AMD CPU.
> >> I also looked at the diff of entry_64.o disassembly, to have
> >> a different view of the changes.
> >
> > The other important question would be: what performance difference (if
> > any) did you observe before/after the change?
>
> I did not measure it then.
>
> At the moment I don't have AMD CPUs here, so I can't benchmark
> the 32-bit syscall-based codepath.
>
> On a Sandy Bridge CPU (IOW: sysenter codepath) -
>
> Before: 78.57 ns per getpid
> After: 76.90 ns per getpid
>
> It's better than I thought it would be.
> Probably because this load:
>
> movl ASM_THREAD_INFO(TI_sysenter_return, %rsp, 0), %r10d
>
> has been moved up by the patch (happens sooner).

There's also less I$ used, and in straight, continuous spots, which
should result in fewer cache misses in the very common "the kernel's
code is cache cold" situation that syscall entry operates under - and
that's not captured by your benchmark.

So it's a good change.

Thanks,

Ingo

2015-04-01 13:55:22

by Borislav Petkov

Subject: Re: [PATCH 2/9] x86/asm/entry/32: Use PUSH instructions to build pt_regs on stack

On Wed, Apr 01, 2015 at 03:12:50PM +0200, Denys Vlasenko wrote:
> At the moment I don't have AMD CPUs here, so I can't benchmark
> the 32-bit syscall-based codepath.

You could send me your measuring tool - I'll run it on AMD.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--