2015-05-17 15:48:56

by Gabriele Mazzotta

Subject: Regression: turbostat stops working after suspend/resume cycle

Hi,

I've recently noticed that if I suspend and resume my laptop, I can no
longer execute turbostat. This is what I get when I try to start it:
# turbostat
Could not migrate to CPU 1
turbostat: re-initialized with num_cpus 4
Could not migrate to CPU 1

Since everything works as expected with v4.0, I ran a bisection and
found that commit 3c18d447b3b36a8d ("sched/core: Check for available
DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.

I don't know if there's something else affected by that change, but
I can consistently reproduce the bug with turbostat.
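
For anyone who wants to check the symptom without turbostat, the sketch below tries to pin the calling thread to each CPU in turn. It assumes that turbostat's "Could not migrate to CPU N" message corresponds to a failing sched_setaffinity() call; try_migrate() and count_unreachable_cpus() are hypothetical helper names used only for this illustration.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by the kernel).
 */
int try_migrate(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Count the CPUs in [0, ncpus) that the caller cannot be pinned to.
 * The original affinity mask is restored before returning.
 * Returns -1 if the current mask cannot be read.
 */
int count_unreachable_cpus(int ncpus)
{
	cpu_set_t saved;
	int cpu, failed = 0;

	if (sched_getaffinity(0, sizeof(saved), &saved) == -1)
		return -1;

	for (cpu = 0; cpu < ncpus; cpu++)
		if (try_migrate(cpu) == -1)
			failed++;

	/* Best effort: put the original mask back. */
	sched_setaffinity(0, sizeof(saved), &saved);
	return failed;
}
```

Calling count_unreachable_cpus(sysconf(_SC_NPROCESSORS_ONLN)) before and after a suspend/resume cycle should show whether the set of usable CPUs has shrunk. On a system restricted by cpusets or containers, some failures are expected even without the bug.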

I can provide more info if needed.

Regards,
Gabriele


2015-05-18 06:46:08

by Ingo Molnar

Subject: Re: Regression: turbostat stops working after suspend/resume cycle


* Gabriele Mazzotta <[email protected]> wrote:

> Hi,
>
> I've recently noticed that if I suspend and resume my laptop, I can no
> longer execute turbostat. This is what I get when I try to start it:
> # turbostat
> Could not migrate to CPU 1
> turbostat: re-initialized with num_cpus 4
> Could not migrate to CPU 1
>
> Since everything works as expected with v4.0, I ran a bisection and
> found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
>
> I don't know if there's something else affected by that change, but
> I can consistently reproduce the bug with turbostat.
>
> I can provide more info if needed.

Does this commit:

533445c6e533 sched/core: Fix regression in cpuset_cpu_inactive() for suspend

which is already in Linus's tree, and which should be part of -rc4,
fix it? Also attached below.

Thanks,

Ingo

====================>
From 533445c6e53368569e50ab3fb712230c03d523f3 Mon Sep 17 00:00:00 2001
From: Omar Sandoval <[email protected]>
Date: Mon, 4 May 2015 03:09:36 -0700
Subject: [PATCH] sched/core: Fix regression in cpuset_cpu_inactive() for suspend

Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

$ ./a.out
sched_setaffinity: Success
$ systemctl suspend
$ ./a.out
sched_setaffinity: Invalid argument

... where ./a.out is:

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long num_cores;
	cpu_set_t cpu_set;
	int ret;

	num_cores = sysconf(_SC_NPROCESSORS_ONLN);
	CPU_ZERO(&cpu_set);
	CPU_SET(num_cores - 1, &cpu_set);
	errno = 0;
	ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
	perror("sched_setaffinity");
	return ret ? EXIT_FAILURE : EXIT_SUCCESS;
}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.
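
To illustrate why the case became dead, here is a minimal user-space sketch, not kernel code. The constant values are assumed to mirror include/linux/cpu.h of that era and are illustrative only; enum path, dispatch_masked() and dispatch_exact() are hypothetical names introduced for this example.

```c
/* Assumed values, for illustration only. */
#define CPU_DOWN_PREPARE	0x0005
#define CPU_TASKS_FROZEN	0x0010
#define CPU_DOWN_PREPARE_FROZEN	(CPU_DOWN_PREPARE | CPU_TASKS_FROZEN)

enum path { PATH_IGNORED, PATH_HOTPLUG, PATH_SUSPEND };

/*
 * Shape of the regressed code: the FROZEN bit is stripped before the
 * switch, so no value can ever match the CPU_DOWN_PREPARE_FROZEN label.
 */
enum path dispatch_masked(unsigned long action)
{
	switch (action & ~CPU_TASKS_FROZEN) {
	case CPU_DOWN_PREPARE:
		return PATH_HOTPLUG;
	case CPU_DOWN_PREPARE_FROZEN:
		return PATH_SUSPEND;	/* dead: bit 0x0010 is always clear here */
	default:
		return PATH_IGNORED;
	}
}

/* Shape of the fixed code: switch on the unmodified action. */
enum path dispatch_exact(unsigned long action)
{
	switch (action) {
	case CPU_DOWN_PREPARE:
		return PATH_HOTPLUG;
	case CPU_DOWN_PREPARE_FROZEN:
		return PATH_SUSPEND;
	default:
		return PATH_IGNORED;
	}
}
```

With these definitions, dispatch_masked(CPU_DOWN_PREPARE_FROZEN) takes the hotplug path, while dispatch_exact(CPU_DOWN_PREPARE_FROZEN) reaches the suspend path.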

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar <[email protected]>
---
 kernel/sched/core.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34db9bf892a3..57bd333bc4ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 	unsigned long flags;
 	long cpu = (long)hcpu;
 	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus;

-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_DOWN_PREPARE:
-		/* explicitly allow suspend */
-		if (!(action & CPU_TASKS_FROZEN)) {
-			bool overflow;
-			int cpus;
-
-			rcu_read_lock_sched();
-			dl_b = dl_bw_of(cpu);
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);

-			raw_spin_lock_irqsave(&dl_b->lock, flags);
-			cpus = dl_bw_cpus(cpu);
-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, 0);
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);

-			rcu_read_unlock_sched();
+		rcu_read_unlock_sched();

-			if (overflow)
-				return notifier_from_errno(-EBUSY);
-		}
+		if (overflow)
+			return notifier_from_errno(-EBUSY);
 		cpuset_update_active_cpus(false);
 		break;
 	case CPU_DOWN_PREPARE_FROZEN:

2015-05-18 06:48:27

by Peter Zijlstra

Subject: Re: Regression: turbostat stops working after suspend/resume cycle

On Sun, May 17, 2015 at 05:48:44PM +0200, Gabriele Mazzotta wrote:
> Hi,
>
> I've recently noticed that if I suspend and resume my laptop, I can no
> longer execute turbostat. This is what I get when I try to start it:
> # turbostat
> Could not migrate to CPU 1
> turbostat: re-initialized with num_cpus 4
> Could not migrate to CPU 1
>
> Since everything works as expected with v4.0, I ran a bisection and
> found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
>
> I don't know if there's something else affected by that change, but
> I can consistently reproduce the bug with turbostat.


This should be fixed by the below commit which is already in Linus'
tree.

---
commit 533445c6e53368569e50ab3fb712230c03d523f3
Author: Omar Sandoval <[email protected]>
Date: Mon May 4 03:09:36 2015 -0700

sched/core: Fix regression in cpuset_cpu_inactive() for suspend

Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
caused a regression in setting a CPU inactive during suspend. I ran into
this when a program was failing pthread_setaffinity_np() with EINVAL after
a suspend+wake up.

A simple reproducer:

$ ./a.out
sched_setaffinity: Success
$ systemctl suspend
$ ./a.out
sched_setaffinity: Invalid argument

... where ./a.out is:

#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	long num_cores;
	cpu_set_t cpu_set;
	int ret;

	num_cores = sysconf(_SC_NPROCESSORS_ONLN);
	CPU_ZERO(&cpu_set);
	CPU_SET(num_cores - 1, &cpu_set);
	errno = 0;
	ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
	perror("sched_setaffinity");
	return ret ? EXIT_FAILURE : EXIT_SUCCESS;
}

The mistake is that suspend is handled in the action ==
CPU_DOWN_PREPARE_FROZEN case of the switch statement in
cpuset_cpu_inactive().

However, the commit in question masked out CPU_TASKS_FROZEN
from the action, making this case dead.

The fix is straightforward.

Signed-off-by: Omar Sandoval <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
Signed-off-by: Ingo Molnar <[email protected]>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 34db9bf892a3..57bd333bc4ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
 	unsigned long flags;
 	long cpu = (long)hcpu;
 	struct dl_bw *dl_b;
+	bool overflow;
+	int cpus;

-	switch (action & ~CPU_TASKS_FROZEN) {
+	switch (action) {
 	case CPU_DOWN_PREPARE:
-		/* explicitly allow suspend */
-		if (!(action & CPU_TASKS_FROZEN)) {
-			bool overflow;
-			int cpus;
-
-			rcu_read_lock_sched();
-			dl_b = dl_bw_of(cpu);
+		rcu_read_lock_sched();
+		dl_b = dl_bw_of(cpu);

-			raw_spin_lock_irqsave(&dl_b->lock, flags);
-			cpus = dl_bw_cpus(cpu);
-			overflow = __dl_overflow(dl_b, cpus, 0, 0);
-			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
+		raw_spin_lock_irqsave(&dl_b->lock, flags);
+		cpus = dl_bw_cpus(cpu);
+		overflow = __dl_overflow(dl_b, cpus, 0, 0);
+		raw_spin_unlock_irqrestore(&dl_b->lock, flags);

-			rcu_read_unlock_sched();
+		rcu_read_unlock_sched();

-			if (overflow)
-				return notifier_from_errno(-EBUSY);
-		}
+		if (overflow)
+			return notifier_from_errno(-EBUSY);
 		cpuset_update_active_cpus(false);
 		break;
 	case CPU_DOWN_PREPARE_FROZEN:

2015-05-18 07:12:40

by Gabriele Mazzotta

Subject: Re: Regression: turbostat stops working after suspend/resume cycle

On Monday 18 May 2015 08:45:52 Ingo Molnar wrote:
>
> * Gabriele Mazzotta <[email protected]> wrote:
>
> > Hi,
> >
> > I've recently noticed that if I suspend and resume my laptop, I can no
> > longer execute turbostat. This is what I get when I try to start it:
> > # turbostat
> > Could not migrate to CPU 1
> > turbostat: re-initialized with num_cpus 4
> > Could not migrate to CPU 1
> >
> > Since everything works as expected with v4.0, I ran a bisection and
> > found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> > DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> >
> > I don't know if there's something else affected by that change, but
> > I can consistently reproduce the bug with turbostat.
> >
> > I can provide more info if needed.
>
> Does this commit:
>
> 533445c6e533 sched/core: Fix regression in cpuset_cpu_inactive() for suspend
>
> which is already in Linus's tree, and which should be part of -rc4,
> fix it? Also attached below.

Yes, this fixes the problem, thanks. Sorry for not noticing it.

Gabriele

2015-05-18 07:19:12

by Gabriele Mazzotta

Subject: Re: Regression: turbostat stops working after suspend/resume cycle

On Monday 18 May 2015 08:48:04 Peter Zijlstra wrote:
> On Sun, May 17, 2015 at 05:48:44PM +0200, Gabriele Mazzotta wrote:
> > Hi,
> >
> > I've recently noticed that if I suspend and resume my laptop, I can no
> > longer execute turbostat. This is what I get when I try to start it:
> > # turbostat
> > Could not migrate to CPU 1
> > turbostat: re-initialized with num_cpus 4
> > Could not migrate to CPU 1
> >
> > Since everything works as expected with v4.0, I ran a bisection and
> > found that commit 3c18d447b3b36a8d ("sched/core: Check for available
> > DL bandwidth in cpuset_cpu_inactive()") is the cause of the regression.
> >
> > I don't know if there's something else affected by that change, but
> > I can consistently reproduce the bug with turbostat.
>
>
> This should be fixed by the below commit which is already in Linus'
> tree.

Thank you for the quick reply. As I said in my reply to Ingo's mail, which
arrived just before yours: yes, the commit below fixes the problem.

Thanks,
Gabriele

> ---
> commit 533445c6e53368569e50ab3fb712230c03d523f3
> Author: Omar Sandoval <[email protected]>
> Date: Mon May 4 03:09:36 2015 -0700
>
> sched/core: Fix regression in cpuset_cpu_inactive() for suspend
>
> Commit 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in
> cpuset_cpu_inactive()"), a SCHED_DEADLINE bugfix, had a logic error that
> caused a regression in setting a CPU inactive during suspend. I ran into
> this when a program was failing pthread_setaffinity_np() with EINVAL after
> a suspend+wake up.
>
> A simple reproducer:
>
> $ ./a.out
> sched_setaffinity: Success
> $ systemctl suspend
> $ ./a.out
> sched_setaffinity: Invalid argument
>
> ... where ./a.out is:
>
> #define _GNU_SOURCE
> #include <errno.h>
> #include <sched.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	long num_cores;
> 	cpu_set_t cpu_set;
> 	int ret;
>
> 	num_cores = sysconf(_SC_NPROCESSORS_ONLN);
> 	CPU_ZERO(&cpu_set);
> 	CPU_SET(num_cores - 1, &cpu_set);
> 	errno = 0;
> 	ret = sched_setaffinity(getpid(), sizeof(cpu_set), &cpu_set);
> 	perror("sched_setaffinity");
> 	return ret ? EXIT_FAILURE : EXIT_SUCCESS;
> }
>
> The mistake is that suspend is handled in the action ==
> CPU_DOWN_PREPARE_FROZEN case of the switch statement in
> cpuset_cpu_inactive().
>
> However, the commit in question masked out CPU_TASKS_FROZEN
> from the action, making this case dead.
>
> The fix is straightforward.
>
> Signed-off-by: Omar Sandoval <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Juri Lelli <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Fixes: 3c18d447b3b3 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")
> Link: http://lkml.kernel.org/r/1cb5ecb3d6543c38cce5790387f336f54ec8e2bc.1430733960.git.osandov@osandov.com
> Signed-off-by: Ingo Molnar <[email protected]>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 34db9bf892a3..57bd333bc4ab 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6999,27 +6999,23 @@ static int cpuset_cpu_inactive(struct notifier_block *nfb, unsigned long action,
>  	unsigned long flags;
>  	long cpu = (long)hcpu;
>  	struct dl_bw *dl_b;
> +	bool overflow;
> +	int cpus;
>
> -	switch (action & ~CPU_TASKS_FROZEN) {
> +	switch (action) {
>  	case CPU_DOWN_PREPARE:
> -		/* explicitly allow suspend */
> -		if (!(action & CPU_TASKS_FROZEN)) {
> -			bool overflow;
> -			int cpus;
> -
> -			rcu_read_lock_sched();
> -			dl_b = dl_bw_of(cpu);
> +		rcu_read_lock_sched();
> +		dl_b = dl_bw_of(cpu);
>
> -			raw_spin_lock_irqsave(&dl_b->lock, flags);
> -			cpus = dl_bw_cpus(cpu);
> -			overflow = __dl_overflow(dl_b, cpus, 0, 0);
> -			raw_spin_unlock_irqrestore(&dl_b->lock, flags);
> +		raw_spin_lock_irqsave(&dl_b->lock, flags);
> +		cpus = dl_bw_cpus(cpu);
> +		overflow = __dl_overflow(dl_b, cpus, 0, 0);
> +		raw_spin_unlock_irqrestore(&dl_b->lock, flags);
>
> -			rcu_read_unlock_sched();
> +		rcu_read_unlock_sched();
>
> -			if (overflow)
> -				return notifier_from_errno(-EBUSY);
> -		}
> +		if (overflow)
> +			return notifier_from_errno(-EBUSY);
>  		cpuset_update_active_cpus(false);
>  		break;
>  	case CPU_DOWN_PREPARE_FROZEN: