2018-01-07 10:19:01

by Willy Tarreau

Subject: Feedback on 4.9 performance after PTI fixes

Hi,

I managed to take a bit of time to run some more tests on PTI both
native and hosted in KVM, on stable versions built with
CONFIG_PAGE_TABLE_ISOLATION=y. Here it's 4.9.75, used both on the
host and the VM. I could compare pti=on/off both in the host and the
VM. A single CPU was exposed in the VM.

It was running on my laptop (Core i7-3320M at 2.6 GHz, 3.3 GHz
single-core turbo).

The test measured haproxy's ability to forward connections. The
results are below:

Host     | Guest   | conn/s  | ratio_to_host | ratio_to_VM  | Notes
---------+---------+---------+---------------+--------------+----------------
pti=off  | -       | 27400   | 100.0%        | -            | host reference
pti=off  | pti=off | 24200   | 88.3%         | 100.0%       | VM reference
pti=off  | pti=on  | 13300   | 48.5%         | 55.0%        |
pti=on   | -       | 23800   | 86.9%         | -            | protected host
pti=on   | pti=off | 23100   | 84.3%         | 95.5%        |
pti=on   | pti=on  | 13300   | 48.5%         | 55.0%        |

The ratio_to_host column shows the performance relative to the host
with pti=off. The ratio_to_VM column shows the performance relative to
the VM running with pti=off on a host also running with pti=off (i.e.
the performance before upgrading the systems).
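
For reference, the load generator essentially opens and closes
connections through the proxy in a loop. A minimal sketch of such an
injector is below; this is not the actual tool, and the address, port
and iteration count are illustrative:

/* connrate.c: minimal connection-rate probe. Repeatedly connect() and
 * close() against the proxy, then report connections per second. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const int iterations = 100000;
    struct sockaddr_in dst;
    struct timespec t0, t1;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(8000);                     /* illustrative port */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr); /* illustrative addr */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0 || connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("socket/connect");
            return 1;
        }
        close(fd);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f conn/s\n", iterations / secs);
    return 0;
}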

On this test we see a few things:
- the performance impact on the native host is around 13%

- the highest performance impact on VMs comes from having PTI on the
guest kernel (-45%). At this point it makes no difference whether
the host kernel has it or not.

- the host kernel's protection has a very limited impact on the guest
system's performance (-4.5%), which is probably nice for some cloud
users who might want to take the risk of turning the protection off
on their VMs.

The impact inside VMs is quite big, but VMs are not usually where we
install processes sensitive to syscall performance. I measured an even
higher impact on a packet generation program, which dropped from
2.5 Mpps to 600 kpps in the VM after the fix, but it doesn't make much
sense to run this kind of workload in VMs, so I don't really care.
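
To illustrate the syscall-cost side of this, a trivial loop of cheap
syscalls makes the per-call overhead directly visible. A minimal sketch
(the iteration count is arbitrary; compare runs with pti=on and pti=off):

/* syscost.c: time a loop of getpid() system calls to estimate the
 * per-syscall entry/exit cost that PTI adds. */
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const long iterations = 10000000;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++)
        syscall(SYS_getpid);   /* forces a real system call every time */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("%.1f ns per syscall\n", ns / iterations);
    return 0;
}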

I have not yet tried the retpoline patches.

Regards,
Willy


2018-01-08 17:08:03

by Yves-Alexis Perez

Subject: Re: Feedback on 4.9 performance after PTI fixes

On Sun, 2018-01-07 at 11:18 +0100, Willy Tarreau wrote:
> - the highest performance impact on VMs comes from having PTI on the
> guest kernel (-45%). At this point it makes no difference whether
> the host kernel has it or not.

Hi Willy,

out of curiosity, are the pcid/invpcid flags exposed to and used by
your guest CPU? It might very well be that the PCID optimisations are
not used by the guests here, and it might be worth checking either on
bare metal or with the PCID optimisations enabled.
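
For reference, a quick way to check what the guest actually sees is to
look at the flags in /proc/cpuinfo; a small sketch doing just that (a
grep does the same job):

/* pcidchk.c: report whether the pcid/invpcid CPU flags are visible.
 * Run inside the guest to see what KVM exposes there. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (!f) {
        perror("/proc/cpuinfo");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) == 0) {
            printf("pcid:    %s\n", strstr(line, " pcid")    ? "yes" : "no");
            printf("invpcid: %s\n", strstr(line, " invpcid") ? "yes" : "no");
            break;
        }
    }
    fclose(f);
    return 0;
}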

Regards,
--
Yves-Alexis



2018-01-08 17:21:46

by Yves-Alexis Perez

Subject: Re: Feedback on 4.9 performance after PTI fixes

On Mon, 2018-01-08 at 18:07 +0100, Yves-Alexis Perez wrote:
> On Sun, 2018-01-07 at 11:18 +0100, Willy Tarreau wrote:
> > - the highest performance impact on VMs comes from having PTI on the
> > guest kernel (-45%). At this point it makes no difference whether
> > the host kernel has it or not.
>
> Hi Willy,
>
> out of curiosity, are the pcid/invpcid flags exposed to and used by
> your guest CPU? It might very well be that the PCID optimisations are
> not used by the guests here, and it might be worth checking either on
> bare metal or with the PCID optimisations enabled.

More details on this: https://groups.google.com/forum/m/#!topic/mechanical-sympathy/L9mHTbeQLNU

Regards,
--
Yves-Alexis



2018-01-08 17:25:26

by David Laight

Subject: RE: Feedback on 4.9 performance after PTI fixes

From: Willy Tarreau
> Sent: 07 January 2018 10:19
...
> The impact inside VMs is quite big, but VMs are not usually where we
> install processes sensitive to syscall performance. I measured an even
> higher impact on a packet generation program, which dropped from
> 2.5 Mpps to 600 kpps in the VM after the fix, but it doesn't make much
> sense to run this kind of workload in VMs, so I don't really care.

Why not?
It will be related to the cost of sending (and probably receiving)
network traffic in a VM.
This is something that is done a lot.

Maybe not packet generation, but a UDP/IP benchmark inside a VM would
be sensible; something like the sketch below would do.
It may well be that moderate Ethernet packet rates cause a massive
performance drop when the host kernel has PTI enabled.
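
A minimal sketch of what I have in mind (destination address, port,
payload size and packet count are all illustrative):

/* udprate.c: blast small UDP datagrams and report packets per second.
 * Run inside the VM with the host's PTI on and off to compare. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const long count = 1000000;
    char payload[64] = {0};
    struct sockaddr_in dst;
    struct timespec t0, t1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9000);                        /* illustrative */
    inet_pton(AF_INET, "192.168.0.2", &dst.sin_addr);  /* illustrative */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < count; i++) {
        if (sendto(fd, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            perror("sendto");
            return 1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%.0f pkts/s\n", count / secs);
    close(fd);
    return 0;
}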

David

2018-01-08 18:26:34

by Willy Tarreau

Subject: Re: Feedback on 4.9 performance after PTI fixes

Hi Yves-Alexis,

On Mon, Jan 08, 2018 at 06:07:54PM +0100, Yves-Alexis Perez wrote:
> On Sun, 2018-01-07 at 11:18 +0100, Willy Tarreau wrote:
> > - the highest performance impact on VMs comes from having PTI on the
> > guest kernel (-45%). At this point it makes no difference whether
> > the host kernel has it or not.
>
> Hi Willy,
>
> out of curiosity, are the pcid/invpcid flags exposed to and used by
> your guest CPU? It might very well be that the PCID optimisations are
> not used by the guests here, and it might be worth checking either on
> bare metal or with the PCID optimisations enabled.

You're totally right: I discovered during my later developments that
PCID is indeed not exposed there. So we take the hit of a full TLB
flush twice per syscall, since CR3 is switched on both kernel entry
and exit and, without PCID, each CR3 write flushes the whole TLB.

Willy

2018-01-08 20:26:22

by Yves-Alexis Perez

Subject: Re: Feedback on 4.9 performance after PTI fixes

On Mon, 2018-01-08 at 19:26 +0100, Willy Tarreau wrote:
> You're totally right: I discovered during my later developments that
> PCID is indeed not exposed there. So we take the hit of a full TLB
> flush twice per syscall, since CR3 is switched on both kernel entry
> and exit and, without PCID, each CR3 write flushes the whole TLB.

So I really think it might make sense to redo the tests with PCID, because the
assumptions you're basing your patch series on might actually not hold.

Regards,
--
Yves-Alexis



2018-01-08 20:39:29

by Willy Tarreau

Subject: Re: Feedback on 4.9 performance after PTI fixes

On Mon, Jan 08, 2018 at 09:26:10PM +0100, Yves-Alexis Perez wrote:
> On Mon, 2018-01-08 at 19:26 +0100, Willy Tarreau wrote:
> > You're totally right: I discovered during my later developments that
> > PCID is indeed not exposed there. So we take the hit of a full TLB
> > flush twice per syscall, since CR3 is switched on both kernel entry
> > and exit and, without PCID, each CR3 write flushes the whole TLB.
>
> So I really think it might make sense to redo the tests with PCID, because the
> assumptions you're basing your patch series on might actually not hold.

I'll have to do it on the bare-metal server soon anyway.

Cheers,
Willy

2018-01-09 07:09:52

by Willy Tarreau

Subject: Re: Feedback on 4.9 performance after PTI fixes

Hi again,

updating the table after Yves-Alexis' comment on PCID. Rerunning the test
with -cpu=Haswell to enable PCID gave me much better numbers:

On Sun, Jan 07, 2018 at 11:18:56AM +0100, Willy Tarreau wrote:
> Hi,
>
> I managed to take a bit of time to run some more tests on PTI both
> native and hosted in KVM, on stable versions built with
> CONFIG_PAGE_TABLE_ISOLATION=y. Here it's 4.9.75, used both on the
> host and the VM. I could compare pti=on/off both in the host and the
> VM. A single CPU was exposed in the VM.
>
> It was running on my laptop (Core i7-3320M at 2.6 GHz, 3.3 GHz
> single-core turbo).
>
> The test measured haproxy's ability to forward connections. The
> results are below:
>
> Host     | Guest   | conn/s  | ratio_to_host | ratio_to_VM  | Notes
> ---------+---------+---------+---------------+--------------+----------------
> pti=off  | -       | 27400   | 100.0%        | -            | host reference
> pti=off  | pti=off | 24200   | 88.3%         | 100.0%       | VM reference
> pti=off  | pti=on  | 13300   | 48.5%         | 55.0%        |
> pti=on   | -       | 23800   | 86.9%         | -            | protected host
> pti=on   | pti=off | 23100   | 84.3%         | 95.5%        |
> pti=on   | pti=on  | 13300   | 48.5%         | 55.0%        |

New run:

Host     | Guest   | conn/s  | ratio_to_VM | Notes
---------+---------+---------+-------------+--------------------------
pti=off  | pti=off | 23100   | 100.0%      | VM reference without PTI
pti=off  | pti=on  | 19700   | 85.2%       | VM with PTI and PCID
pti=off  | pti=on  | 12700   | 55.0%       | VM with PTI without PCID

So the performance being cut in half was indeed caused by the lack of
PCID here. With PCID the impact is much smaller (about -15%), though
still significant.

Willy