2021-04-19 02:18:58

by Rik van Riel

[permalink] [raw]
Subject: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

The try_to_wake_up function has an optimization where it can queue
a task for wakeup on its previous CPU, if the task is still in the
middle of going to sleep inside schedule().

Once schedule() re-enables IRQs, the task will be woken up with an
IPI, and placed back on the runqueue.

If we have such a wakeup pending, there is no need to search other
CPUs for runnable tasks. Just skip (or bail out early from) newidle
balancing, and run the just woken up task.

For a memcache like workload test, this reduces total CPU use by
about 2%, proportionally split between user and system time,
and p99 and p95 application response time by 2-3% on average.
The schedstats run_delay number shows a similar improvement.

Signed-off-by: Rik van Riel <[email protected]>
---
kernel/sched/fair.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 69680158963f..19a92c48939f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7163,6 +7163,14 @@ done: __maybe_unused;
if (!rf)
return NULL;

+ /*
+ * We have a woken up task pending here. No need to search for ones
+ * elsewhere. This task will be enqueued the moment we unblock irqs
+ * upon exiting the scheduler.
+ */
+ if (rq->ttwu_pending)
+ return NULL;
+
new_tasks = newidle_balance(rq, rf);

/*
@@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
* Stop searching for tasks to pull if there are
* now runnable tasks on this rq.
*/
- if (pulled_task || this_rq->nr_running > 0)
+ if (pulled_task || this_rq->nr_running > 0 ||
+ this_rq->ttwu_pending)
break;
}
rcu_read_unlock();
--
2.25.4



2021-04-19 06:38:19

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

Hi Rik,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on tip/sched/core]
[also build test ERROR on linux/master linus/master v5.12-rc8 next-20210416]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Rik-van-Riel/sched-fair-skip-newidle_balance-if-a-wakeup-is-pending/20210419-101843
base: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 9406415f46f6127fd31bb66f0260f7a61a8d2786
config: riscv-randconfig-r025-20210419 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 2b50f5a4343f8fb06acaa5c36355bcf58092c9cd)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/3f6b55f5258c8d8d217e8b8408de056a20745824
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Rik-van-Riel/sched-fair-skip-newidle_balance-if-a-wakeup-is-pending/20210419-101843
git checkout 3f6b55f5258c8d8d217e8b8408de056a20745824
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=riscv

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:564:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return inw(addr);
^~~~~~~~~
arch/riscv/include/asm/io.h:56:76: note: expanded from macro 'inw'
#define inw(c) ({ u16 __v; __io_pbr(); __v = readw_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:88:76: note: expanded from macro 'readw_cpu'
#define readw_cpu(c) ({ u16 __r = le16_to_cpu((__force __le16)__raw_readw(c)); __r; })
^
include/uapi/linux/byteorder/little_endian.h:36:51: note: expanded from macro '__le16_to_cpu'
#define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
^
In file included from kernel/sched/fair.c:23:
In file included from kernel/sched/sched.h:17:
In file included from include/linux/sched/isolation.h:6:
In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:572:9: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return inl(addr);
^~~~~~~~~
arch/riscv/include/asm/io.h:57:76: note: expanded from macro 'inl'
#define inl(c) ({ u32 __v; __io_pbr(); __v = readl_cpu((void*)(PCI_IOBASE + (c))); __io_par(__v); __v; })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:89:76: note: expanded from macro 'readl_cpu'
#define readl_cpu(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; })
^
include/uapi/linux/byteorder/little_endian.h:34:51: note: expanded from macro '__le32_to_cpu'
#define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
^
In file included from kernel/sched/fair.c:23:
In file included from kernel/sched/sched.h:17:
In file included from include/linux/sched/isolation.h:6:
In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:580:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outb(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:59:68: note: expanded from macro 'outb'
#define outb(v,c) ({ __io_pbw(); writeb_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:91:52: note: expanded from macro 'writeb_cpu'
#define writeb_cpu(v, c) ((void)__raw_writeb((v), (c)))
^
In file included from kernel/sched/fair.c:23:
In file included from kernel/sched/sched.h:17:
In file included from include/linux/sched/isolation.h:6:
In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:588:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outw(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:60:68: note: expanded from macro 'outw'
#define outw(v,c) ({ __io_pbw(); writew_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:92:76: note: expanded from macro 'writew_cpu'
#define writew_cpu(v, c) ((void)__raw_writew((__force u16)cpu_to_le16(v), (c)))
^
In file included from kernel/sched/fair.c:23:
In file included from kernel/sched/sched.h:17:
In file included from include/linux/sched/isolation.h:6:
In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:596:2: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
outl(value, addr);
^~~~~~~~~~~~~~~~~
arch/riscv/include/asm/io.h:61:68: note: expanded from macro 'outl'
#define outl(v,c) ({ __io_pbw(); writel_cpu((v),(void*)(PCI_IOBASE + (c))); __io_paw(); })
~~~~~~~~~~ ^
arch/riscv/include/asm/mmio.h:93:76: note: expanded from macro 'writel_cpu'
#define writel_cpu(v, c) ((void)__raw_writel((__force u32)cpu_to_le32(v), (c)))
^
In file included from kernel/sched/fair.c:23:
In file included from kernel/sched/sched.h:17:
In file included from include/linux/sched/isolation.h:6:
In file included from include/linux/tick.h:8:
In file included from include/linux/clockchips.h:14:
In file included from include/linux/clocksource.h:21:
In file included from arch/riscv/include/asm/io.h:149:
include/asm-generic/io.h:1005:55: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
return (port > MMIO_UPPER_LIMIT) ? NULL : PCI_IOBASE + port;
~~~~~~~~~~ ^
kernel/sched/fair.c:637:5: warning: no previous prototype for function 'sched_update_scaling' [-Wmissing-prototypes]
int sched_update_scaling(void)
^
kernel/sched/fair.c:637:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
int sched_update_scaling(void)
^
static
>> kernel/sched/fair.c:7208:10: error: no member named 'ttwu_pending' in 'struct rq'
if (rq->ttwu_pending)
~~ ^
8 warnings and 1 error generated.


vim +7208 kernel/sched/fair.c

7191
7192 if (hrtick_enabled_fair(rq))
7193 hrtick_start_fair(rq, p);
7194
7195 update_misfit_status(p, rq);
7196
7197 return p;
7198
7199 idle:
7200 if (!rf)
7201 return NULL;
7202
7203 /*
7204 * We have a woken up task pending here. No need to search for ones
7205 * elsewhere. This task will be enqueued the moment we unblock irqs
7206 * upon exiting the scheduler.
7207 */
> 7208 if (rq->ttwu_pending)
7209 return NULL;
7210
7211 new_tasks = newidle_balance(rq, rf);
7212
7213 /*
7214 * Because newidle_balance() releases (and re-acquires) rq->lock, it is
7215 * possible for any higher priority task to appear. In that case we
7216 * must re-start the pick_next_entity() loop.
7217 */
7218 if (new_tasks < 0)
7219 return RETRY_TASK;
7220
7221 if (new_tasks > 0)
7222 goto again;
7223
7224 /*
7225 * rq is about to be idle, check if we need to update the
7226 * lost_idle_time of clock_pelt
7227 */
7228 update_idle_rq_clock_pelt(rq);
7229
7230 return NULL;
7231 }
7232

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (9.81 kB)
.config.gz (26.87 kB)
Download all attachments

2021-04-19 11:26:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

On Sun, Apr 18, 2021 at 10:17:51PM -0400, Rik van Riel wrote:
> The try_to_wake_up function has an optimization where it can queue
> a task for wakeup on its previous CPU, if the task is still in the
> middle of going to sleep inside schedule().
>
> Once schedule() re-enables IRQs, the task will be woken up with an
> IPI, and placed back on the runqueue.
>
> If we have such a wakeup pending, there is no need to search other
> CPUs for runnable tasks. Just skip (or bail out early from) newidle
> balancing, and run the just woken up task.
>
> For a memcache like workload test, this reduces total CPU use by
> about 2%, proportionally split between user and system time,
> and p99 and p95 application response time by 2-3% on average.
> The schedstats run_delay number shows a similar improvement.

Nice.

> Signed-off-by: Rik van Riel <[email protected]>
> ---
> kernel/sched/fair.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69680158963f..19a92c48939f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7163,6 +7163,14 @@ done: __maybe_unused;
> if (!rf)
> return NULL;
>
> + /*
> + * We have a woken up task pending here. No need to search for ones
> + * elsewhere. This task will be enqueued the moment we unblock irqs
> + * upon exiting the scheduler.
> + */
> + if (rq->ttwu_pending)
> + return NULL;

As reported by the robot, that needs an CONFIG_SMP guard of sorts,
#ifdef might work I suppose.

> new_tasks = newidle_balance(rq, rf);
>
> /*
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> * Stop searching for tasks to pull if there are
> * now runnable tasks on this rq.
> */
> - if (pulled_task || this_rq->nr_running > 0)
> + if (pulled_task || this_rq->nr_running > 0 ||
> + this_rq->ttwu_pending)

Either cino=(0:0 or just bust the limit and make it 84 chars.

2021-04-19 12:13:38

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

On Mon, 19 Apr 2021 at 04:18, Rik van Riel <[email protected]> wrote:
>
> The try_to_wake_up function has an optimization where it can queue
> a task for wakeup on its previous CPU, if the task is still in the
> middle of going to sleep inside schedule().
>
> Once schedule() re-enables IRQs, the task will be woken up with an
> IPI, and placed back on the runqueue.
>
> If we have such a wakeup pending, there is no need to search other
> CPUs for runnable tasks. Just skip (or bail out early from) newidle
> balancing, and run the just woken up task.
>
> For a memcache like workload test, this reduces total CPU use by
> about 2%, proportionally split between user and system time,
> and p99 and p95 application response time by 2-3% on average.
> The schedstats run_delay number shows a similar improvement.
>
> Signed-off-by: Rik van Riel <[email protected]>
> ---
> kernel/sched/fair.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 69680158963f..19a92c48939f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7163,6 +7163,14 @@ done: __maybe_unused;
> if (!rf)
> return NULL;
>
> + /*
> + * We have a woken up task pending here. No need to search for ones
> + * elsewhere. This task will be enqueued the moment we unblock irqs
> + * upon exiting the scheduler.
> + */
> + if (rq->ttwu_pending)
> + return NULL;

Would it be better to put this check at the beg of newidle_balance() ?
If prev is not a cfs task, we never reach this point but instead use the path:
class->balance => balance_fair => newidle_balance

and we will not check for rq->ttwu_pending

> +
> new_tasks = newidle_balance(rq, rf);
>
> /*
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> * Stop searching for tasks to pull if there are
> * now runnable tasks on this rq.
> */
> - if (pulled_task || this_rq->nr_running > 0)
> + if (pulled_task || this_rq->nr_running > 0 ||
> + this_rq->ttwu_pending)
> break;
> }
> rcu_read_unlock();
> --
> 2.25.4
>
>

2021-04-19 13:52:35

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

On 18/04/21 22:17, Rik van Riel wrote:
> @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
> * Stop searching for tasks to pull if there are
> * now runnable tasks on this rq.
> */
> - if (pulled_task || this_rq->nr_running > 0)
> + if (pulled_task || this_rq->nr_running > 0 ||
> + this_rq->ttwu_pending)
> break;

I thought newidle_balance() would already handle this by checking
idle_cpu(), but that can't work due to rq->curr never being rq->idle here
(we're trying very hard to prevent this!).

Would there be any point in bunching up these two checks from idle_cpu()
into an inline helper and reusing it here?

_nohz_idle_balance() "accidentally" already checks for that, but at the
head of the domain loop. What's the reason for doing it at the tail here?
It means we try at least one loop, but is there a point in that?

> }
> rcu_read_unlock();
> --
> 2.25.4

2021-04-19 20:14:09

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] sched,fair: skip newidle_balance if a wakeup is pending

On Mon, 2021-04-19 at 12:22 +0100, Valentin Schneider wrote:
> On 18/04/21 22:17, Rik van Riel wrote:
> > @@ -10661,7 +10669,8 @@ static int newidle_balance(struct rq
> > *this_rq, struct rq_flags *rf)
> > * Stop searching for tasks to pull if there are
> > * now runnable tasks on this rq.
> > */
> > - if (pulled_task || this_rq->nr_running > 0)
> > + if (pulled_task || this_rq->nr_running > 0 ||
> > + this_rq->ttwu_pending)
> > break;
>
> I thought newidle_balance() would already handle this by checking
> idle_cpu(), but that can't work due to rq->curr never being rq->idle
> here
> (we're trying very hard to prevent this!).
>
> Would there be any point in bunching up these two checks from
> idle_cpu()
> into an inline helper and reusing it here?

I'm not sure, all the sched classes seem to have their own
magic in the balance functions, and there are enough special
situations out there that we might only be able to consolidate
a few callsites, not the rest.

Also, it turns out newidle_balance needs a different return
value depending on whether we have ->ttwu_pending, a pulled
task, or a queued realtime task...

I'll send v2 without any considation here, since I cannot
figure out a way to make things better here :)

--
All Rights Reversed.


Attachments:
signature.asc (499.00 B)
This is a digitally signed message part