2023-09-13 14:23:05

by Suleiman Souhlal

Subject: NOHZ interaction between IPI-less kick_ilb() and nohz_csd_func().

Hello,

I noticed that on x86 machines that have MWAIT, with NOHZ, when the
kernel decides to kick the idle load balance on another CPU in
kick_ilb(), there's an optimization that makes it avoid using an IPI
and instead exploit the fact that the remote CPU is MWAITing on the
thread_info flags, by just setting TIF_NEED_RESCHED, in
call_function_single_prep_ipi().
However, on the remote CPU, in nohz_csd_func(), we end up not raising
the sched softirq due to NEED_RESCHED being set, so the ILB doesn't
end up getting done.
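
A minimal user-space sketch of the interaction described above, not kernel code. It assumes that the sender sets the need-resched flag instead of sending an IPI when the remote CPU is polling, and that the remote handler raises the sched softirq only when the CPU is idle and the need-resched flag is clear. All names (model_cpu, model_prep_ipi, model_nohz_csd_func, model_kick_ilb, MODEL_KICK) are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-CPU state; not a kernel structure. */
struct model_cpu {
    bool polling;            /* idle loop is MWAITing/polling on the flags word */
    bool need_resched;       /* models TIF_NEED_RESCHED */
    bool idle;               /* models the CPU being idle */
    unsigned int nohz_flags; /* models the pending NOHZ kick bits */
    bool softirq_raised;     /* models the sched softirq having been raised */
    bool ipi_sent;           /* true if a real IPI was needed */
};

#define MODEL_KICK 0x1u

/*
 * Sender side (assumed behaviour): if the target is polling, set the
 * need-resched flag to wake it and report that no IPI is needed.
 */
static bool model_prep_ipi(struct model_cpu *c)
{
    if (c->polling) {
        c->need_resched = true;
        return false;
    }
    return true;
}

/*
 * Remote side (assumed behaviour): consume the kick bits, then raise the
 * softirq only if the CPU is idle and need-resched is NOT set.
 */
static void model_nohz_csd_func(struct model_cpu *c)
{
    unsigned int flags = c->nohz_flags;

    c->nohz_flags = 0;
    assert(flags & MODEL_KICK); /* a kick must have been pending */

    if (c->idle && !c->need_resched)
        c->softirq_raised = true;
}

/* Runs one kick end to end; returns whether the softirq was raised. */
static bool model_kick_ilb(bool remote_polling)
{
    struct model_cpu c = { .polling = remote_polling, .idle = true };

    c.nohz_flags |= MODEL_KICK;
    c.ipi_sent = model_prep_ipi(&c);
    model_nohz_csd_func(&c);
    return c.softirq_raised;
}
```

In this model, model_kick_ilb(false) (the IPI path) raises the softirq, while model_kick_ilb(true) (the IPI-less path) does not, because the flag used to wake the polling CPU is the same one the handler checks before raising the softirq.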

Is this intended?

Thanks,
-- Suleiman


2023-09-13 16:44:37

by Suleiman Souhlal

Subject: Re: NOHZ interaction between IPI-less kick_ilb() and nohz_csd_func().

(I forgot to also add Vincent...)

On Wed, Sep 13, 2023 at 9:49 PM Suleiman Souhlal <[email protected]> wrote:
>
> Hello,
>
> I noticed that on x86 machines that have MWAIT, with NOHZ, when the
> kernel decides to kick the idle load balance on another CPU in
> kick_ilb(), there's an optimization that makes it avoid using an IPI
> and instead exploit the fact that the remote CPU is MWAITing on the
> thread_info flags, by just setting TIF_NEED_RESCHED, in
> call_function_single_prep_ipi().
> However, on the remote CPU, in nohz_csd_func(), we end up not raising
> the sched softirq due to NEED_RESCHED being set, so the ILB doesn't
> end up getting done.
>
> Is this intended?
>
> Thanks,
> -- Suleiman

2023-10-04 16:10:19

by Joel Fernandes

Subject: Re: NOHZ interaction between IPI-less kick_ilb() and nohz_csd_func().

+Frederic Weisbecker

On Wed, Sep 13, 2023 at 10:32 AM Suleiman Souhlal <[email protected]> wrote:
>
> (I forgot to also add Vincent...)
>
> On Wed, Sep 13, 2023 at 9:49 PM Suleiman Souhlal <[email protected]> wrote:
> >
> > Hello,
> >
> > I noticed that on x86 machines that have MWAIT, with NOHZ, when the
> > kernel decides to kick the idle load balance on another CPU in
> > kick_ilb(), there's an optimization that makes it avoid using an IPI
> > and instead exploit the fact that the remote CPU is MWAITing on the
> > thread_info flags, by just setting TIF_NEED_RESCHED, in
> > call_function_single_prep_ipi().
> > However, on the remote CPU, in nohz_csd_func(), we end up not raising
> > the sched softirq due to NEED_RESCHED being set, so the ILB doesn't
> > end up getting done.
> >
> > Is this intended?
> >
> > Thanks,
> > -- Suleiman

2023-10-04 16:17:25

by Joel Fernandes

Subject: Re: NOHZ interaction between IPI-less kick_ilb() and nohz_csd_func().

On Wed, Oct 4, 2023 at 12:09 PM Joel Fernandes <[email protected]> wrote:
>
> +Frederic Weisbecker
>
> On Wed, Sep 13, 2023 at 10:32 AM Suleiman Souhlal <[email protected]> wrote:
> >
> > (I forgot to also add Vincent...)
> >
> > On Wed, Sep 13, 2023 at 9:49 PM Suleiman Souhlal <[email protected]> wrote:
> > >
> > > Hello,
> > >
> > > I noticed that on x86 machines that have MWAIT, with NOHZ, when the
> > > kernel decides to kick the idle load balance on another CPU in
> > > kick_ilb(), there's an optimization that makes it avoid using an IPI
> > > and instead exploit the fact that the remote CPU is MWAITing on the
> > > thread_info flags, by just setting TIF_NEED_RESCHED, in
> > > call_function_single_prep_ipi().
> > > However, on the remote CPU, in nohz_csd_func(), we end up not raising
> > > the sched softirq due to NEED_RESCHED being set, so the ILB doesn't
> > > end up getting done.
> > >
> > > Is this intended?

Just thinking out loud: I wonder how much nohz-ILB really matters if,
based on what Suleiman is saying, it is not even triggering on x86
due to the mwait optimization. And if it does matter, how much
improvement would fixing this bug give? I remember it matters at
least on ARM.

I am meanwhile looking at it more closely...

thanks,

- Joel