Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20;
MIME-Version: 1.0
References: <20220609025515.2086253-1-joshdon@google.com> <CAKfTPtCc4NZ51qruKX1zROFHqg-_MQBadaJfVbYPxT=7c4So=Q@mail.gmail.com>
 <CABk29NtX7EZsAqrT8vXd6tgWe2HPRNPM=cWxFSSxBtW1MjFqOA@mail.gmail.com> <CAKfTPtCZyig6MrdS7zAL=4mzhD5KBba74C0UCtvX+7N7BrzSig@mail.gmail.com>
In-Reply-To: <CAKfTPtCZyig6MrdS7zAL=4mzhD5KBba74C0UCtvX+7N7BrzSig@mail.gmail.com>
From:   Josh Don <joshdon@google.com>
Date:   Fri, 10 Jun 2022 11:46:10 -0700
Message-ID: <CABk29NsHOdxuCmfXBRv2ndo+u=GAz6JUvY2C1usPNDUbNtu7TA@mail.gmail.com>
Subject: Re: [PATCH] sched: allow newidle balancing to bail out of load_balance
To:     Vincent Guittot <vincent.guittot@linaro.org>
Cc:     Ingo Molnar <mingo@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        Juri Lelli <juri.lelli@redhat.com>,
        Dietmar Eggemann <dietmar.eggemann@arm.com>,
        Steven Rostedt <rostedt@goodmis.org>,
        Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
        Daniel Bristot de Oliveira <bristot@redhat.com>,
        Valentin Schneider <vschneid@redhat.com>,
        linux-kernel <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Fri, Jun 10, 2022 at 1:10 AM Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 9 Jun 2022 at 21:40, Josh Don <joshdon@google.com> wrote:
> >
> > Thanks Vincent,
> >
> > On Thu, Jun 9, 2022 at 6:42 AM Vincent Guittot
> > <vincent.guittot@linaro.org> wrote:
> > >
> > > On Thu, 9 Jun 2022 at 04:55, Josh Don <joshdon@google.com> wrote:
> > > >
> > > > While doing newidle load balancing, it is possible for new tasks to
> > > > arrive, such as with pending wakeups. newidle_balance() already accounts
> > > > for this by exiting the sched_domain load_balance() iteration if it
> > > > detects these cases. This is very important for minimizing wakeup
> > > > latency.
> > > >
> > > > However, if we are already in load_balance(), we may stay there for a
> > > > while before returning back to newidle_balance(). This is most
> > > > exacerbated if we enter a 'goto redo' loop in the LBF_ALL_PINNED case. A
> > > > very straightforward workaround to this is to adjust should_we_balance()
> > > > to bail out if we're doing a CPU_NEWLY_IDLE balance and new tasks are
> > > > detected.
> > >
> > > This one is close to the other tests and I wonder if it should be
> > > better placed before taking the busiest rq lock and detaching some
> > > tasks.
> > >
> > > Beside your use case where all other threads can't move in local cpu
> > > and load_balance() loops and clears other cpus, most of the time is
> > > probably spent in fbg() and fbq() so there are more chance that a task
> > > woke in this meantime and I imagine that it becomes useless to take
> > > lock and move tasks from another cpu if the local cpu is no more newly
> > > idle.
> > >
> > > Have you tried other places in load_balance() and does this one
> > > provide the lowest wakeup latency ?
> > >
> > > That being said, the current patch makes sense.
> >
> > I tested with another check after fbg/fbq and there wasn't any
> > noticeable improvement to observed wakeup latency (not totally
> > unexpected, since it only helps for wakeups that come during fbg/fbq).
>
> ok. so IIUC the wakeup has already happened when we start
> load_balance() in your case so the additional test is useless in your
> case

Not necessarily; the wakeup could also happen while we're in the
ALL_PINNED redo loop (this lasts ~100us), but the added check doesn't
meaningfully affect latency for my specific repro.

> > However, I don't think there's any harm in having that extra check in
> > the CPU_NEWLY_IDLE case; might as well avoid bouncing the rq lock if
> > we can. fbq+fbg are together taking ~3-4us per iteration in my repro.
> >
> > If there are no objections I can send a v2 with the added delta:
>
> Would be good to get figures that show some benefits of this
> additional check for some benchmarks
>
> So I think that we can stay with your current proposal for now

Sounds good, thanks for taking a look!

> >
> > @@ -9906,6 +9906,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >                 goto out_balanced;
> >         }
> >
> > +       /*
> > +        * fbg/fbq can take a while. In the newly idle case, recheck whether
> > +        * we should continue with balancing, since it is possible that a
> > +        * task woke up in the interim.
> > +        */
> > +       if (env.idle == CPU_NEWLY_IDLE && !should_we_balance(&env)) {
> > +               *continue_balancing = 0;
> > +               goto out_balanced;
> > +       }
> > +
> >         BUG_ON(busiest == env.dst_rq);
> >
> >         schedstat_add(sd->lb_imbalance[idle], env.imbalance);