From: Aubrey Li
To: tglx@linutronix.de, peterz@infradead.org, rjw@rjwysocki.net,
	len.brown@intel.com, ak@linux.intel.com, tim.c.chen@linux.intel.com
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, Aubrey Li
Subject: [RFC PATCH v2 0/8] Introduce CPU idle prediction functionality
Date: Sat, 30 Sep 2017 15:20:26 +0800
Message-Id: <1506756034-6340-1-git-send-email-aubrey.li@intel.com>
X-Mailer: git-send-email 2.7.4

We found that under some latency-intensive workloads, short idle periods
occur very frequently, and the idle entry and exit paths start to dominate,
so it's important to optimize them. To detect the short idle pattern, we
need to figure out how long the coming idle period will be and what
threshold qualifies an idle interval as short.

This proposal introduces a CPU idle prediction functionality to catch the
short idle pattern.

Firstly, we check the IRQ timings subsystem to see whether an interrupt
event is expected soon.
-- https://lwn.net/Articles/691297/

Secondly, we check the scheduler's idle statistics to see whether a short
idle is likely.
-- https://patchwork.kernel.org/patch/2839221/

Thirdly, we predict the next idle interval using the prediction
functionality in the idle governor, if it provides one.

For the threshold of the short idle interval, we record timestamps around
idle entry to measure its overhead, and multiply that by a tunable
parameter exposed here:
-- /proc/sys/kernel/fast_idle_ratio

In this proposal, we use the output of the idle prediction to skip turning
the tick off when a short idle is predicted. Reprogramming the hardware
timer twice (off and then on again) is expensive for a very short idle.
Further optimizations could be driven by the same indicator. (A rough
userspace sketch of this decision is appended below the diffstat.)

I observed that when the system is idle, the idle predictor reports 20
long idles per second and zero fast idles on one CPU. And when the
workload is running, the idle predictor reports 72899 fast idles per
second and zero long idles on the same CPU.

Aubrey Li (8):
  cpuidle: menu: extract prediction functionality
  cpuidle: record the overhead of idle entry
  cpuidle: add a new predict interface
  tick/nohz: keep tick on for a fast idle
  timers: keep sleep length updated as needed
  cpuidle: make fast idle threshold tunable
  cpuidle: introduce irq timing to make idle prediction
  cpuidle: introduce run queue average idle to make idle prediction

 drivers/cpuidle/Kconfig          |   1 +
 drivers/cpuidle/cpuidle.c        | 109 +++++++++++++++++++++++++++++++++++++++
 drivers/cpuidle/governors/menu.c |  69 ++++++++++++++++---------
 include/linux/cpuidle.h          |  21 ++++++++
 kernel/sched/idle.c              |  14 ++++-
 kernel/sysctl.c                  |  12 +++++
 kernel/time/tick-sched.c         |   7 +++
 7 files changed, 209 insertions(+), 24 deletions(-)

-- 
2.7.4
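
For reviewers who want to play with the idea outside the kernel, below is a
minimal userspace sketch of the fast-idle decision described above. It is
not the patch code itself: the struct, field, and function names
(idle_stats, idle_is_fast, the plain-integer fast_idle_ratio) and the
sample numbers are all hypothetical, chosen only to illustrate comparing a
predicted idle length against a threshold derived from the measured idle
entry/exit overhead multiplied by the tunable ratio.

/*
 * Illustrative sketch only -- not kernel code. All names and values here
 * are hypothetical; they model the decision "is the coming idle shorter
 * than (idle entry/exit overhead * fast_idle_ratio)?".
 */
#include <stdio.h>
#include <stdbool.h>

struct idle_stats {
	unsigned long long overhead_ns;		/* measured idle entry+exit cost */
	unsigned long long predicted_ns;	/* predicted length of coming idle */
	unsigned int fast_idle_ratio;		/* tunable multiplier */
};

/* True if the coming idle is too short to justify stopping the tick. */
static bool idle_is_fast(const struct idle_stats *s)
{
	unsigned long long threshold_ns = s->overhead_ns * s->fast_idle_ratio;

	return s->predicted_ns < threshold_ns;
}

int main(void)
{
	struct idle_stats busy  = { 5000, 20000, 10 };	  /* 20us idle predicted */
	struct idle_stats quiet = { 5000, 4000000, 10 };  /* 4ms idle predicted */

	printf("busy CPU:  %s\n", idle_is_fast(&busy) ?
	       "fast idle -> keep tick on" : "long idle -> stop tick");
	printf("quiet CPU: %s\n", idle_is_fast(&quiet) ?
	       "fast idle -> keep tick on" : "long idle -> stop tick");
	return 0;
}

Built with "gcc -o fast_idle fast_idle.c" and run, the busy case prints
"fast idle -> keep tick on" and the quiet case prints "long idle -> stop
tick", mirroring the fast/long idle counters observed above.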