From: Yang Zhang
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, wanpeng.li@hotmail.com, mst@redhat.com,
    pbonzini@redhat.com, tglx@linutronix.de, rkrcmar@redhat.com,
    dmatlack@google.com, agraf@suse.de, peterz@infradead.org,
    linux-doc@vger.kernel.org, Yang Zhang
Subject: [RFC PATCH v2 0/7] x86/idle: add halt poll support
Date: Tue, 29 Aug 2017 11:46:34 +0000
Message-Id: <1504007201-12904-1-git-send-email-yang.zhang.wz@gmail.com>
X-Mailer: git-send-email 2.7.4

Some latency-intensive workloads see an obvious performance drop when
running inside a VM, mainly because their overheads are amplified by
virtualization. The largest cost I have seen is in the idle path. This
series introduces a new mechanism that polls for a while before
entering the idle state; if a reschedule becomes pending during the
poll, we avoid going through the heavy overhead path. (A rough sketch
of the idea follows the benchmark results below.)

Here is the data we get when running the contextswitch benchmark to
measure latency (lower is better):

1. w/o patch:
   2493.14 ns/ctxsw -- 200.3 %CPU

2. w/ patch:
   halt_poll_threshold=10000  -- 1485.96 ns/ctxsw -- 201.0 %CPU
   halt_poll_threshold=20000  -- 1391.26 ns/ctxsw -- 200.7 %CPU
   halt_poll_threshold=30000  -- 1488.55 ns/ctxsw -- 200.1 %CPU
   halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU

3. kvm dynamic poll:
   halt_poll_ns=10000  -- 2296.11 ns/ctxsw -- 201.2 %CPU
   halt_poll_ns=20000  -- 2599.7  ns/ctxsw -- 201.7 %CPU
   halt_poll_ns=30000  -- 2588.68 ns/ctxsw -- 211.6 %CPU
   halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU

4. idle=poll:
   2050.1 ns/ctxsw -- 1003 %CPU

5. idle=mwait:
   2188.06 ns/ctxsw -- 206.3 %CPU

Here is the data we get when running the netperf benchmark (higher is
better):

1. w/o patch:
   14556.8 bits/s -- 144.2 %CPU

2. w/ patch:
   halt_poll_threshold=10000 -- 15803.89 bits/s -- 159.5 %CPU
   halt_poll_threshold=20000 -- 15899.04 bits/s -- 161.5 %CPU
   halt_poll_threshold=30000 -- 15642.38 bits/s -- 161.8 %CPU
   halt_poll_threshold=40000 -- 18040.76 bits/s -- 184.0 %CPU
   halt_poll_threshold=50000 -- 18877.61 bits/s -- 197.3 %CPU

3. kvm dynamic poll:
   halt_poll_ns=10000 -- 15876.00 bits/s -- 172.2 %CPU
   halt_poll_ns=20000 -- 15602.58 bits/s -- 185.4 %CPU
   halt_poll_ns=30000 -- 15930.69 bits/s -- 194.4 %CPU
   halt_poll_ns=40000 -- 16413.09 bits/s -- 195.3 %CPU
   halt_poll_ns=50000 -- 16417.42 bits/s -- 196.3 %CPU

4. idle=poll in guest:
   18441.3 bits/s -- 1003 %CPU

5. idle=mwait in guest:
   15760.6 bits/s -- 157.6 %CPU
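For illustration, the core of the idea looks roughly like the following
(a minimal sketch only; poll_threshold_ns, do_idle_poll() and the direct
call to default_idle() are placeholders, not necessarily the identifiers
used by the patches):

	/* Sketch: poll briefly before falling back to the normal halt path. */
	static unsigned long poll_threshold_ns = 10000;	/* sysctl-tunable in the series */

	static void do_idle_poll(void)
	{
		u64 start = ktime_get_ns();

		/* Spin while nothing is runnable and the poll budget is not used up. */
		while (!need_resched() &&
		       ktime_get_ns() - start < poll_threshold_ns)
			cpu_relax();

		/* Still idle after polling: take the usual (expensive) halt path. */
		if (!need_resched())
			default_idle();
	}

In the series this polling is reached through the paravirt layer
(pv_idle_ops) with a KVM-guest implementation, rather than being
hard-coded into the generic idle loop.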
V1 -> V2:
 - integrate the smart halt poll into the paravirt code (a rough sketch
   of the resulting hook appears below the diffstat)
 - use idle_stamp instead of check_poll
 - since it is hard to tell whether a vCPU is the only task on its
   pCPU, we do not consider that case in this series (it may be
   improved in the future)

Yang Zhang (7):
  x86/paravirt: Add pv_idle_ops to paravirt ops
  KVM guest: register kvm_idle_poll for pv_idle_ops
  sched/idle: Add poll before enter real idle path
  x86/paravirt: Add update in x86/paravirt pv_idle_ops
  Documentation: Add three sysctls for smart idle poll
  KVM guest: introduce smart idle poll algorithm
  sched/idle: update poll time when wakeup from idle

 Documentation/sysctl/kernel.txt       | 25 +++++++++++++
 arch/x86/include/asm/paravirt.h       | 10 ++++++
 arch/x86/include/asm/paravirt_types.h |  7 ++++
 arch/x86/kernel/kvm.c                 | 67 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/paravirt.c            | 11 ++++++
 arch/x86/kernel/process.c             |  7 ++++
 include/linux/kernel.h                |  6 ++++
 include/linux/sched/idle.h            |  4 +++
 kernel/sched/core.c                   |  4 +++
 kernel/sched/idle.c                   |  9 +++++
 kernel/sysctl.c                       | 23 ++++++++++++
 11 files changed, 173 insertions(+)

-- 
1.8.3.1
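For reference, the shape of the paravirt hook added by patches 1 and 2
is roughly the following (member and helper names here are illustrative;
see the individual patches for the actual definitions):

	/* Sketch of a paravirt idle-poll hook, with hypothetical names. */
	struct pv_idle_ops {
		void (*poll)(void);	/* called before entering the real idle path */
	};

	extern struct pv_idle_ops pv_idle_ops;

	/* KVM guest side: poll (as sketched earlier) instead of halting at once. */
	static void kvm_idle_poll(void)
	{
		/* spin until need_resched() or the poll threshold expires */
	}

	static void __init kvm_register_idle_poll(void)
	{
		pv_idle_ops.poll = kvm_idle_poll;
	}

Only the KVM guest code registers a poll callback here (patch 2), so
other configurations should keep the existing idle behaviour.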