2015-05-14 14:42:01

by Doug Smythies

Subject: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]

As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
The issue persists through Kernel 4.1RC3.
This is on my test computer with an i7-2600K.
I do not normally use suspend on this computer, but was doing so while working on a bug report.
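
For anyone trying to reproduce this, a minimal sketch of a check that can be run after resume. It is not from the original report. It assumes the standard sysfs files /sys/devices/system/cpu/online and /sys/devices/system/cpu/present; the function name check_cpus_online is made up for illustration, and the directory is a parameter only so the check can be pointed at a saved copy.

```shell
#!/bin/sh
# Sketch only: compare the kernel's "online" and "present" CPU lists.
# check_cpus_online is a hypothetical helper name, not an existing tool.
# $1 is optional and defaults to the usual sysfs location.
check_cpus_online() {
    dir="${1:-/sys/devices/system/cpu}"
    online=$(cat "$dir/online")
    present=$(cat "$dir/present")
    if [ "$online" = "$present" ]; then
        echo "all CPUs online: $online"
    else
        # Some present CPUs are offline, e.g. only CPU 0 came back.
        echo "online: $online, present: $present"
        return 1
    fi
}
```

Calling check_cpus_online with no argument before suspend and again after resume should show whether the online list shrank. A mismatch can also mean CPUs were taken offline deliberately, so the comparison is only a hint.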

The kernel was bisected, and this is the result:

3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a is the first bad commit
commit 3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a
Author: Juri Lelli <[email protected]>
Date: Tue Mar 31 09:53:37 2015 +0100

sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()

Hotplug operations are destructive w.r.t. cpusets. In case such an
operation is performed on a CPU belonging to an exlusive cpuset, the
DL bandwidth information associated with the corresponding root
domain is gone even if the operation fails (in sched_cpu_inactive()).

For this reason we need to move the check we currently have in
sched_cpu_inactive() to cpuset_cpu_inactive() to prevent useless
cpusets reconfiguration in the CPU_DOWN_FAILED path.

Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Juri Lelli <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>

:040000 040000 10f8d81afdc8e625f8e6720883d3eb42c28d452b c08264528890941bad35d5d4cc134c03f259c534 M kernel

Since I sometimes mess up using git bisect, and end up at some random result,
the above was double checked manually:

3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a has the issue.
4cd57f97135840f637431c92380c8da3edbe44ed (the previous commit) does not have the issue.


2015-05-14 15:01:46

by Juri Lelli

Subject: Re: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]

Hi Doug,

On 14/05/15 15:41, Doug Smythies wrote:
> As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
> The issue persists through Kernel 4.1RC3.
> This is on my test computer with an i7-2600K.
> I do not normally use suspend on this computer, but was doing so while working on a bug report.
>
> The kernel was bisected, and this is the result:
>

Does commit 533445c6e533 "sched/core: Fix regression in
cpuset_cpu_inactive() for suspend" on tip/sched/core
fix the bug?

Thanks,

- Juri

> 3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a is the first bad commit
> commit 3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a
> Author: Juri Lelli <[email protected]>
> Date: Tue Mar 31 09:53:37 2015 +0100
>
> sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()
>
> Hotplug operations are destructive w.r.t. cpusets. In case such an
> operation is performed on a CPU belonging to an exlusive cpuset, the
> DL bandwidth information associated with the corresponding root
> domain is gone even if the operation fails (in sched_cpu_inactive()).
>
> For this reason we need to move the check we currently have in
> sched_cpu_inactive() to cpuset_cpu_inactive() to prevent useless
> cpusets reconfiguration in the CPU_DOWN_FAILED path.
>
> Signed-off-by: Juri Lelli <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
>
> :040000 040000 10f8d81afdc8e625f8e6720883d3eb42c28d452b c08264528890941bad35d5d4cc134c03f259c534 M kernel
>
> Since I sometimes mess up using git bisect, and end up at some random result,
> the above was double checked manually:
>
> 3c18d447b3b36a8d3c90dc37dfbd363cdb685d0a has the issue.
> 4cd57f97135840f637431c92380c8da3edbe44ed (the previous commit) does not have the issue.
>
>

2015-05-14 17:48:12

by Ingo Molnar

Subject: Re: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]


* Juri Lelli <[email protected]> wrote:

> Hi Doug,
>
> On 14/05/15 15:41, Doug Smythies wrote:
> > As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
> > The issue persists through Kernel 4.1RC3.
> > This is on my test computer with an i7-2600K.
> > I do not normally use suspend on this computer, but was doing so while working on a bug report.
> >
> > The kernel was bisected, and this is the result:
> >
>
> Does commit 533445c6e533 "sched/core: Fix regression in
> cpuset_cpu_inactive() for suspend" on tip/sched/core
> fix the bug?

That would be sched/urgent primarily, not sched/core:

git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent

I'll get this fix to Linus ASAP.

Thanks,

Ingo

2015-05-14 22:12:32

by Doug Smythies

Subject: RE: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]

On 2015.05.14 10:48 Ingo Molnar wrote:

> * Juri Lelli <[email protected]> wrote:
>>> On 14/05/15 15:41, Doug Smythies wrote:
>>> As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
>>> The issue persists through Kernel 4.1RC3.
>>> This is on my test computer with an i7-2600K.
>>> I do not normally use suspend on this computer, but was doing so while working on a bug report.
>>>
>>> The kernel was bisected, and this is the result:
>>>
>>
>> Does commit 533445c6e533 "sched/core: Fix regression in
>> cpuset_cpu_inactive() for suspend" on tip/sched/core
>> fix the bug?

> That would be sched/urgent primarily, not sched/core:
> git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
> I'll get this fix to Linus ASAP.

I could not seem to figure out how to get the patch from tip,
and any version I could find on the internet would not apply properly.

Juri kindly sent me the patch.

With the patch applied to an otherwise unmodified Kernel 4.1RC3,
resume from suspend is working fine again.

Thanks.

2015-05-15 06:51:54

by Ingo Molnar

Subject: Re: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]


* Doug Smythies <[email protected]> wrote:

> On 2015.05.14 10:48 Ingo Molnar wrote:
>
> > * Juri Lelli <[email protected]> wrote:
> >>> On 14/05/15 15:41, Doug Smythies wrote:
> >>> As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
> >>> The issue persists through Kernel 4.1RC3.
> >>> This is on my test computer with an i7-2600K.
> >>> I do not normally use suspend on this computer, but was doing so while working on a bug report.
> >>>
> >>> The kernel was bisected, and this is the result:
> >>>
> >>
> >> Does commit 533445c6e533 "sched/core: Fix regression in
> >> cpuset_cpu_inactive() for suspend" on tip/sched/core
> >> fix the bug?
>
> > That would be sched/urgent primarily, not sched/core:
> > git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
> > I'll get this fix to Linus ASAP.
>
> I could not seem to figure out how to get the patch from tip,
> and any version I could find on internet would not apply properly.

So the way you could have picked it up is something like:

# pick up 533445c6e533
git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
git cherry-pick 533445c6e533

or pick up all pending scheduler fixes, amongst them 533445c6e533:

git pull git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent

Or to export just that single one as a patch:

git fetch git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
git log --pretty=email --stat -p -1 533445c6e533
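
As a follow-on to the export variant above, here is a hedged sketch of saving the commit to a file and applying it to another tree. It swaps in git format-patch for the git log invocation, which also produces an email-style patch, and applies it with git am. The helper names export_commit and apply_patch and the file name fix.patch are made up for illustration.

```shell
#!/bin/sh
# Sketch only: export one commit as an mbox-style patch file, then apply it.
# export_commit and apply_patch are hypothetical helper names.

# export_commit COMMIT FILE: write COMMIT as an email-formatted patch to FILE.
export_commit() {
    git format-patch -1 --stdout "$1" > "$2"
}

# apply_patch FILE: apply the patch to the current branch as a new commit,
# keeping the original author and commit message.
apply_patch() {
    git am "$1"
}

# Possible use, after the "git fetch ... sched/urgent" shown above:
#   export_commit 533445c6e533 fix.patch
#   git checkout <the tree under test>
#   apply_patch fix.patch
```

git am needs a committer identity (user.name and user.email) to be configured. If the patch does not apply cleanly, git am --abort returns the branch to its previous state.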

> Juri kindly sent me the patch.
>
> With the patch applied to an otherwise unmodified Kernel 4.1RC3,
> Resume from suspend is working fine again.

Great, thanks for testing!

Ingo

2015-05-15 08:51:01

by Juri Lelli

Subject: Re: On resume from suspend only CPU 0 comes back on-line [REGRESSION][BISECTED]

On 14/05/15 23:12, Doug Smythies wrote:
> On 2015.05.14 10:48 Ingo Molnar wrote:
>
>> * Juri Lelli <[email protected]> wrote:
>>>> On 14/05/15 15:41, Doug Smythies wrote:
>>>> As of, or about, Kernel 4.1RC1 on resume from suspend only CPU 0 comes back on-line.
>>>> The issue persists through Kernel 4.1RC3.
>>>> This is on my test computer with an i7-2600K.
>>>> I do not normally use suspend on this computer, but was doing so while working on a bug report.
>>>>
>>>> The kernel was bisected, and this is the result:
>>>>
>>>
>>> Does commit 533445c6e533 "sched/core: Fix regression in
>>> cpuset_cpu_inactive() for suspend" on tip/sched/core
>>> fix the bug?
>
>> That would be sched/urgent primarily, not sched/core:
>> git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched/urgent
>> I'll get this fix to Linus ASAP.
>
> I could not seem to figure out how to get the patch from tip,
> and any version I could find on internet would not apply properly.
>
> Juri kindly sent me the patch.
>
> With the patch applied to an otherwise unmodified Kernel 4.1RC3,
> Resume from suspend is working fine again.
>

Great! Thanks for testing :).

Best,

- Juri