2023-11-02 15:49:57

by Parshuram Sangle

Subject: [PATCH 0/2] KVM: enable halt poll shrink parameter

KVM halt polling interval growth and shrink behavior has evolved since its
inception. The current mechanism uses the grow and shrink parameter values to
adjust the polling interval based on whether a vcpu wakeup was received during
the polling interval. Though the grow parameter is logically set to 2 by
default, the shrink parameter is kept disabled (set to 0).

Disabled shrink has two issues:
1) It resets the polling interval to 0 on every unsuccessful poll, assuming
a vcpu wakeup is less likely to be received in further shrunk intervals.
2) Even on a successful poll, if the total block time is greater than or equal
to the current poll_ns value, the polling interval is reset to 0 instead of
shrinking gradually.

These aspects reduce the chances of receiving a valid wakeup during polling and
forfeit potential performance benefits for VM workloads.
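
The shrink behavior described above can be sketched as a small user-space C
model. This is an illustrative simplification, not the code in
virt/kvm/kvm_main.c: the struct and function names are hypothetical, the
grow_start field models the halt_poll_ns_grow_start parameter (which this
cover letter does not discuss), and the numeric values in the note below are
assumed for illustration only.

```c
/* Hypothetical stand-ins for the KVM halt polling module parameters. */
struct poll_params {
	unsigned int grow;       /* models halt_poll_ns_grow */
	unsigned int grow_start; /* models halt_poll_ns_grow_start */
	unsigned int shrink;     /* models halt_poll_ns_shrink; 0 = disabled */
	unsigned int max_ns;     /* models halt_poll_ns, the upper bound */
};

/* New polling interval after a poll that should shrink the interval. */
unsigned int shrink_poll_ns(unsigned int cur, const struct poll_params *p)
{
	/* With shrink disabled the interval is reset instead of divided. */
	unsigned int val = p->shrink ? cur / p->shrink : 0;

	/* An interval below the starting value is dropped to 0. */
	if (val < p->grow_start)
		val = 0;
	return val;
}

/* New polling interval after a poll that should grow the interval. */
unsigned int grow_poll_ns(unsigned int cur, const struct poll_params *p)
{
	unsigned int val = cur;

	if (!p->grow)
		return cur;
	val *= p->grow;
	if (val < p->grow_start)
		val = p->grow_start;
	/* Simplification: the cap is applied here rather than by a caller. */
	if (val > p->max_ns)
		val = p->max_ns;
	return val;
}
```

Assuming grow_start = 10000 and max_ns = 200000, an 80000 ns interval falls
straight to 0 with shrink = 0, but only to 40000 ns with shrink = 2, so the
next poll still has a chance of catching a wakeup.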

Below is the summary of experiments conducted to assess performance and power
impact of enabling the halt_poll_ns_shrink parameter (value set to 2).

Performance Test Summary: (Higher is better)
--------------------------------------------
Platform Details: Chrome Brya platform
CPU - Alder Lake (12th Gen Intel CPU i7-1255U)
Host kernel version - 5.15.127-20371-g710a1611ad33

Android VM workload (Score)     Base      Shrink Enabled (value 2)   Delta
---------------------------------------------------------------------------
GeekBench Multi-core (CPU)      5754      5856                       2%
3D Mark Slingshot (CPU+GPU)     15486     15885                      3%
Stream (handopt) (Memory)       20566     21594                      5%
fio seq-read (Storage)          727       747                        3%
fio seq-write (Storage)         331       343                        3%
fio rand-read (Storage)         690       732                        6%
fio rand-write (Storage)        299       300                        1%

Steam Gaming VM (Avg FPS)       Base      Shrink Enabled (value 2)   Delta
---------------------------------------------------------------------------
Metro Redux (OpenGL)            54.80     59.60                      9%
Dota 2 (OpenGL)                 48.74     51.40                      5%
Dota 2 (Vulkan)                 20.80     21.10                      1%
SpaceShip (Vulkan)              20.40     21.52                      6%

With shrink enabled, the majority of workloads show a higher percentage of
successful polls. The reduced latency in returning control to the VM and the
avoided vm_exit overhead contribute to these performance gains.

Power Impact Assessment Summary: (Lower is better)
--------------------------------------------------
Method : DAQ measurements of CPU and Memory rails

CPU+Memory (Watt)               Base      Shrink Enabled (value 2)   Delta
---------------------------------------------------------------------------
Idle* (Host)                    0.636     0.631                      -0.8%
Video Playback (Host)           2.225     2.210                      -0.7%
Tomb Raider (VM)                17.261    17.175                     -0.5%
SpaceShip Benchmark (VM)        17.079    17.123                     0.3%

*Idle power - Idle system with no application running, Android and Borealis
VMs enabled running no workload. Duration 180 sec.

Power measurements for the Chrome idle scenario and the active Gaming VM
workload show negligible power overhead, since the additional polling occurs
in very short bursts during which the CPU would have been unlikely to reach
a complete idle state.

NOTE: No tests were conducted on non-x86 platforms with this changed config.

The default values of the grow and shrink parameters are commonly used by
various VM deployments unless specifically tuned for performance. Hence,
based on the performance and power measurement results shown above, it is
recommended to enable shrink (with value 2) by default so that there is no
need to set this parameter explicitly through the kernel cmdline or by
other means.
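
For checking which value a running host currently uses, a minimal C sketch
follows. It assumes the parameter is exposed through sysfs at
/sys/module/kvm/parameters/halt_poll_ns_shrink, the usual location for kvm
module parameters; the helper name read_uint_param is hypothetical and only
for illustration.

```c
#include <stdio.h>

/*
 * Read an unsigned integer module parameter from a sysfs-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long read_uint_param(const char *path)
{
	FILE *f = fopen(path, "r");
	unsigned int val;
	long ret = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%u", &val) == 1)
		ret = (long)val;
	fclose(f);
	return ret;
}
```

Calling read_uint_param("/sys/module/kvm/parameters/halt_poll_ns_shrink")
on a host where that path exists would be expected to return 0 with today's
default, and 2 once this series is applied or the parameter is set by hand
(for example with kvm.halt_poll_ns_shrink=2 on the kernel cmdline).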

Parshuram Sangle (2):
  KVM: enable halt polling shrink parameter by default
  KVM: documentation update to halt polling

 Documentation/virt/kvm/halt-polling.rst | 26 +++++++++++++------------
 virt/kvm/kvm_main.c                     |  4 ++--
 2 files changed, 16 insertions(+), 14 deletions(-)


base-commit: 2b3f2325e71f09098723727d665e2e8003d455dc
--
2.17.1


2023-12-11 15:28:23

by Parshuram Sangle

Subject: Re: [PATCH 0/2] KVM: enable halt poll shrink parameter

Gentle reminder for patch review.

2024-05-03 21:49:00

by Sean Christopherson

Subject: Re: [PATCH 0/2] KVM: enable halt poll shrink parameter

On Thu, Nov 02, 2023, Parshuram Sangle wrote:
> KVM halt polling interval growth and shrink behavior has evolved since its
> inception. The current mechanism uses the grow and shrink parameter values to
> adjust the polling interval based on whether a vcpu wakeup was received during
> the polling interval. Though the grow parameter is logically set to 2 by
> default, the shrink parameter is kept disabled (set to 0).
>
> Disabled shrink has two issues:
> 1) It resets the polling interval to 0 on every unsuccessful poll, assuming
> a vcpu wakeup is less likely to be received in further shrunk intervals.
> 2) Even on a successful poll, if the total block time is greater than or equal
> to the current poll_ns value, the polling interval is reset to 0 instead of
> shrinking gradually.
>
> These aspects reduce the chances of receiving a valid wakeup during polling and
> forfeit potential performance benefits for VM workloads.
>
> Below is the summary of experiments conducted to assess performance and power
> impact of enabling the halt_poll_ns_shrink parameter (value set to 2).
>
> Performance Test Summary: (Higher is better)
> --------------------------------------------
> Platform Details: Chrome Brya platform
> CPU - Alder Lake (12th Gen Intel CPU i7-1255U)
> Host kernel version - 5.15.127-20371-g710a1611ad33
>
> Android VM workload (Score)     Base      Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> GeekBench Multi-core (CPU)      5754      5856                       2%
> 3D Mark Slingshot (CPU+GPU)     15486     15885                      3%
> Stream (handopt) (Memory)       20566     21594                      5%
> fio seq-read (Storage)          727       747                        3%
> fio seq-write (Storage)         331       343                        3%
> fio rand-read (Storage)         690       732                        6%
> fio rand-write (Storage)        299       300                        1%
>
> Steam Gaming VM (Avg FPS)       Base      Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> Metro Redux (OpenGL)            54.80     59.60                      9%
> Dota 2 (OpenGL)                 48.74     51.40                      5%
> Dota 2 (Vulkan)                 20.80     21.10                      1%
> SpaceShip (Vulkan)              20.40     21.52                      6%
>
> With shrink enabled, the majority of workloads show a higher percentage of
> successful polls. The reduced latency in returning control to the VM and the
> avoided vm_exit overhead contribute to these performance gains.
>
> Power Impact Assessment Summary: (Lower is better)
> --------------------------------------------------
> Method : DAQ measurements of CPU and Memory rails
>
> CPU+Memory (Watt)               Base      Shrink Enabled (value 2)   Delta
> ---------------------------------------------------------------------------
> Idle* (Host)                    0.636     0.631                      -0.8%
> Video Playback (Host)           2.225     2.210                      -0.7%
> Tomb Raider (VM)                17.261    17.175                     -0.5%
> SpaceShip Benchmark (VM)        17.079    17.123                     0.3%
>
> *Idle power - Idle system with no application running, Android and Borealis
> VMs enabled running no workload. Duration 180 sec.
>
> Power measurements for the Chrome idle scenario and the active Gaming VM
> workload show negligible power overhead, since the additional polling occurs
> in very short bursts during which the CPU would have been unlikely to reach
> a complete idle state.
>
> NOTE: No tests were conducted on non-x86 platforms with this changed config.
>
> The default values of the grow and shrink parameters are commonly used by
> various VM deployments unless specifically tuned for performance. Hence,
> based on the performance and power measurement results shown above, it is
> recommended to enable shrink (with value 2) by default so that there is no
> need to set this parameter explicitly through the kernel cmdline or by
> other means.

I am by no means an expert on halt polling or power management, but all of this
seems like a reasonable tradeoff. And even without the numbers you provided,
starting from scratch after a single failure is rather odd.

So unless someone objects, I'll plan on applying this for 6.11 in a few weeks
(after the 6.10 merge window closes).