2010-06-13 20:30:28

by Sergey Senozhatsky

[permalink] [raw]
Subject: BUG: using smp_processor_id() in preemptible code: s2disk

Hello,

.35-rc3, x86

Hit the following error today:

kernel: [ 94.817525] CPU1: Thermal monitoring handled by SMI
kernel: [ 94.951454] BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
kernel: [ 94.951462] caller is nr_iowait_cpu+0xe/0x1e
kernel: [ 94.951466] Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
kernel: [ 94.951469] Call Trace:
kernel: [ 94.951478] [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
kernel: [ 94.951484] [<c10282a5>] nr_iowait_cpu+0xe/0x1e
kernel: [ 94.951489] [<c104ab7c>] update_ts_time_stats+0x32/0x6c
kernel: [ 94.951494] [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
kernel: [ 94.951500] [<c124229b>] get_cpu_idle_time+0x12/0x74
kernel: [ 94.951505] [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
kernel: [ 94.951511] [<c1240437>] __cpufreq_governor+0x51/0x85
kernel: [ 94.951515] [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
kernel: [ 94.951520] [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
kernel: [ 94.951526] [<c1241b1e>] ? handle_update+0x0/0xd
kernel: [ 94.951533] [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
kernel: [ 94.951538] [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
kernel: [ 94.951544] [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
kernel: [ 94.951550] [<c1042f39>] notifier_call_chain+0x26/0x48
kernel: [ 94.951555] [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
kernel: [ 94.951560] [<c102efb9>] __cpu_notify+0x15/0x29
kernel: [ 94.951564] [<c102efda>] cpu_notify+0xd/0xf
kernel: [ 94.951568] [<c12bfb30>] _cpu_up+0xaf/0xd2
kernel: [ 94.951574] [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
kernel: [ 94.951580] [<c1055eef>] hibernation_snapshot+0x104/0x1a2
kernel: [ 94.951585] [<c1058b49>] snapshot_ioctl+0x24b/0x53e
kernel: [ 94.951589] [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
kernel: [ 94.951595] [<c10ab91d>] vfs_ioctl+0x2e/0x8c
kernel: [ 94.951599] [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
kernel: [ 94.951604] [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
kernel: [ 94.951609] [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
kernel: [ 94.951615] [<c11e9dc3>] ? tty_write+0x0/0x1d0
kernel: [ 94.951619] [<c10a12d6>] ? vfs_write+0xa2/0xda
kernel: [ 94.951623] [<c10ac333>] sys_ioctl+0x41/0x62
kernel: [ 94.951629] [<c10027d3>] sysenter_do_call+0x12/0x2d
kernel: [ 94.954195] CPU1 is up
kernel: [ 94.954862] ACPI: Waking up from system sleep state S4


Sergey


Attachments:
(No filename) (2.39 kB)
(No filename) (316.00 B)
Download all attachments

2010-06-13 20:33:12

by Maxim Levitsky

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Sun, 2010-06-13 at 23:33 +0300, Sergey Senozhatsky wrote:
> Hello,
>
> .35-rc3, x86
>
> Hit the following error today:

I confirm the same issue. Sorry for not reporting it earlier.


Best regards,
Maxim Levitsky

2010-06-13 23:38:16

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Sunday, June 13, 2010, Maxim Levitsky wrote:
> On Sun, 2010-06-13 at 23:33 +0300, Sergey Senozhatsky wrote:
> > Hello,
> >
> > .35-rc3, x86
> >
> > Hit the following error today:
>
> I confirm the same issue. Sorry for not reporting it earlier.

It looks like a cpufreq issue to me, but it may be related to CPU hotplug as
well. I'm not sure who's been messing up with that recently, though.

Rafael

2010-06-14 14:06:16

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

Hello,
Not sure if this simple solution is the correct one.

---

diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..cfb262b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2866,7 +2866,10 @@ unsigned long nr_iowait(void)

unsigned long nr_iowait_cpu(void)
{
- struct rq *this = this_rq();
+ int cpu = get_cpu();
+ struct rq *this = cpu_rq(cpu);
+ put_cpu();
+
return atomic_read(&this->nr_iowait);
}

2010-06-14 14:25:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Monday, June 14, 2010, Sergey Senozhatsky wrote:
> Hello,
> Not sure if this simple solution is the correct one.

Well, let's ask the scheduler people.

Ingo, Peter, what do you think of the patch below?

Rafael


> ---
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index f8b8996..cfb262b 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2866,7 +2866,10 @@ unsigned long nr_iowait(void)
>
> unsigned long nr_iowait_cpu(void)
> {
> - struct rq *this = this_rq();
> + int cpu = get_cpu();
> + struct rq *this = cpu_rq(cpu);
> + put_cpu();
> +
> return atomic_read(&this->nr_iowait);
> }
>

2010-06-14 14:36:38

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Mon, 14 Jun 2010 17:09:41 +0300
Sergey Senozhatsky <[email protected]> wrote:

> Hello,
> Not sure if this simple solution is the correct one.

it's not; the caller needs to pass in the cpu number I suspect for this
to be really correct....

I just returned from family emergency travel and will take a look today


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-14 14:51:16

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On (06/14/10 07:38), Arjan van de Ven wrote:
> > Hello,
> > Not sure if this simple solution is the correct one.
>
> it's not; the caller needs to pass in the cpu number I suspect for this
> to be really correct....
>
> I just returned from family emergency travel and will take a look today
>

I thought about patching

./drivers/cpuidle/governors/menu.c: if (nr_iowait_cpu())
./drivers/cpuidle/governors/menu.c: mult += 10 * nr_iowait_cpu();
./kernel/time/tick-sched.c: if (nr_iowait_cpu() > 0)


decided to patch nr_iowait_cpu instead.


Sergey


Attachments:
(No filename) (553.00 B)
(No filename) (316.00 B)
Download all attachments

2010-06-14 14:59:25

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Mon, 14 Jun 2010 17:54:40 +0300
Sergey Senozhatsky <[email protected]> wrote:

> On (06/14/10 07:38), Arjan van de Ven wrote:
> > > Hello,
> > > Not sure if this simple solution is the correct one.
> >
> > it's not; the caller needs to pass in the cpu number I suspect for
> > this to be really correct....
> >
> > I just returned from family emergency travel and will take a look
> > today
> >
>
> I thought about patching
>
> ./drivers/cpuidle/governors/menu.c: if (nr_iowait_cpu())
> ./drivers/cpuidle/governors/menu.c: mult += 10 *
> nr_iowait_cpu(); ./kernel/time/tick-sched.c: if
> (nr_iowait_cpu() > 0)
>
>
> decided to patch nr_iowait_cpu instead.

the bug is that it needs to be

nr_iowait_cpu(int cpu)



--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-14 15:14:08

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On (06/14/10 08:01), Arjan van de Ven wrote:
> > decided to patch nr_iowait_cpu instead.
>
> the bug is that it needs to be
>
> nr_iowait_cpu(int cpu)
>

Something similar to this?

---

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..4871ed5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -137,14 +137,17 @@ static inline int which_bucket(unsigned int duration)
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;
+
+ put_cpu();

if (duration < 10)
return bucket;
@@ -169,14 +172,17 @@ static inline int which_bucket(unsigned int duration)
static inline int performance_multiplier(void)
{
int mult = 1;
-
+ int cpu = get_cpu();
+
/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);

+ put_cpu();
+
return mult;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f118809..747fcae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..f61b48e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..101e8aa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
ktime_t delta;

if (ts->idle_active) {
+ int cpu = get_cpu();
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ put_cpu();
ts->idle_entrytime = now;
}

2010-06-15 03:37:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: BUG: using smp_processor_id() in preemptible code: s2disk

On Mon, 14 Jun 2010 18:17:35 +0300
Sergey Senozhatsky <[email protected]> wrote:

> On (06/14/10 08:01), Arjan van de Ven wrote:
> > > decided to patch nr_iowait_cpu instead.
> >
> > the bug is that it needs to be
> >
> > nr_iowait_cpu(int cpu)
> >
>
> Something similar to this?


yup that'll do it

Acked-by: Arjan van de Ven <[email protected]>

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-15 06:15:59

by Sergey Senozhatsky

[permalink] [raw]
Subject: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

Fixing
BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
caller is nr_iowait_cpu+0xe/0x1e
Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
Call Trace:
[<c1184c55>] debug_smp_processor_id+0xa5/0xbc
[<c10282a5>] nr_iowait_cpu+0xe/0x1e
[<c104ab7c>] update_ts_time_stats+0x32/0x6c
[<c104ac73>] get_cpu_idle_time_us+0x36/0x58
[<c124229b>] get_cpu_idle_time+0x12/0x74
[<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
[<c1240437>] __cpufreq_governor+0x51/0x85
[<c1241190>] __cpufreq_set_policy+0x10c/0x13d
[<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
[<c1241b1e>] ? handle_update+0x0/0xd
[<c1241a18>] cpufreq_add_dev+0x34b/0x35a
[<c103c973>] ? schedule_delayed_work_on+0x11/0x13
[<c12c14db>] cpufreq_cpu_callback+0x59/0x63
[<c1042f39>] notifier_call_chain+0x26/0x48
[<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
[<c102efb9>] __cpu_notify+0x15/0x29
[<c102efda>] cpu_notify+0xd/0xf
[<c12bfb30>] _cpu_up+0xaf/0xd2
[<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
[<c1055eef>] hibernation_snapshot+0x104/0x1a2
[<c1058b49>] snapshot_ioctl+0x24b/0x53e
[<c1028ad1>] ? sub_preempt_count+0x7c/0x89
[<c10ab91d>] vfs_ioctl+0x2e/0x8c
[<c10588fe>] ? snapshot_ioctl+0x0/0x53e
[<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
[<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
[<c11e9dc3>] ? tty_write+0x0/0x1d0
[<c10a12d6>] ? vfs_write+0xa2/0xda
[<c10ac333>] sys_ioctl+0x41/0x62
[<c10027d3>] sysenter_do_call+0x12/0x2d

Initially fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
Arjan van de Ven stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".

This patch introduces nr_iowait_cpu(int cpu) and changes to it callers.

Signed-off-by: Sergey Senozhatsky <[email protected]>

---

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..4871ed5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -137,14 +137,17 @@ static inline int which_bucket(unsigned int duration)
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;
+
+ put_cpu();

if (duration < 10)
return bucket;
@@ -169,14 +172,17 @@ static inline int which_bucket(unsigned int duration)
static inline int performance_multiplier(void)
{
int mult = 1;
-
+ int cpu = get_cpu();
+
/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);

+ put_cpu();
+
return mult;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f118809..747fcae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..f61b48e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..101e8aa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
ktime_t delta;

if (ts->idle_active) {
+ int cpu = get_cpu();
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
+ put_cpu();
ts->idle_entrytime = now;
}

2010-06-15 14:22:26

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On Tue, 15 Jun 2010 09:19:27 +0300
Sergey Senozhatsky <[email protected]> wrote:
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 1d7b9bc..101e8aa 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *ts,
> ktime_t now, u64 *last_update_time) ktime_t delta;
>
> if (ts->idle_active) {
> + int cpu = get_cpu();
> delta = ktime_sub(now, ts->idle_entrytime);
> ts->idle_sleeptime = ktime_add(ts->idle_sleeptime,
> delta);
> - if (nr_iowait_cpu() > 0)
> + if (nr_iowait_cpu(cpu) > 0)
> ts->iowait_sleeptime =
> ktime_add(ts->iowait_sleeptime, delta);
> + put_cpu();
> ts->idle_entrytime = now;
> }


hmm this part is wrong
you pick the current cpu, rather than the one denoted by ts.....

they will normally be the same, except for the case where you get your
warning...
(and this is also why you can't move the get_cpu() inside
nr_iowait_cpu() ... )

>


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-15 14:46:59

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On (06/15/10 07:24), Arjan van de Ven wrote:
> On Tue, 15 Jun 2010 09:19:27 +0300
> Sergey Senozhatsky <[email protected]> wrote:
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 1d7b9bc..101e8aa 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -159,10 +159,12 @@ update_ts_time_stats(struct tick_sched *ts,
> > ktime_t now, u64 *last_update_time) ktime_t delta;
> >
> > if (ts->idle_active) {
> > + int cpu = get_cpu();
> > delta = ktime_sub(now, ts->idle_entrytime);
> > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime,
> > delta);
> > - if (nr_iowait_cpu() > 0)
> > + if (nr_iowait_cpu(cpu) > 0)
> > ts->iowait_sleeptime =
> > ktime_add(ts->iowait_sleeptime, delta);
> > + put_cpu();
> > ts->idle_entrytime = now;
> > }
>
>
> hmm this part is wrong
> you pick the current cpu, rather than the one denoted by ts.....
>

Hmm. Thanks, good catch.
Well, there is something I'm missing. How can I match given *ts and cpu
in update_ts_time_stats (except for introducing update_ts_time_stats(..., int cpu)) ?

> they will normally be the same, except for the case where you get your
> warning...
> (and this is also why you can't move the get_cpu() inside
> nr_iowait_cpu() ... )
>


Sergey


Attachments:
(No filename) (1.26 kB)
(No filename) (316.00 B)
Download all attachments

2010-06-15 15:05:40

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On Tue, 15 Jun 2010 17:50:29 +0300
Sergey Senozhatsky <[email protected]> wrote:
> >
> > hmm this part is wrong
> > you pick the current cpu, rather than the one denoted by ts.....
> >
>
> Hmm. Thanks, good catch.
> Well, there is something I'm missing. How can I match given *ts and
> cpu in update_ts_time_stats (except for introducing
> update_ts_time_stats(..., int cpu)) ?


that'd be option one
option two is to add a "cpu" member to struct tick_sched.....

if you go for option one, I'd replace the ts argument with the cpu
argument.....


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-15 15:19:45

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On (06/15/10 08:08), Arjan van de Ven wrote:
> On Tue, 15 Jun 2010 17:50:29 +0300
> Sergey Senozhatsky <[email protected]> wrote:
> > >
> > > hmm this part is wrong
> > > you pick the current cpu, rather than the one denoted by ts.....
> > >
> >
> > Hmm. Thanks, good catch.
> > Well, there is something I'm missing. How can I match given *ts and
> > cpu in update_ts_time_stats (except for introducing
> > update_ts_time_stats(..., int cpu)) ?
>
>
> that'd be option one

We'll have problem in tick_nohz_start_idle(struct tick_sched *ts)
{
ktime_t now;
now = ktime_get();
update_ts_time_stats(ts, now, NULL);

...
So, we also will have to expand it to tick_nohz_start_idle(struct tick_sched *ts, int cpu)

That's why I prefer option #2.

> option two is to add a "cpu" member to struct tick_sched.....
>
Thought about that. Seems ok to me.


> if you go for option one, I'd replace the ts argument with the cpu
> argument.....
>
>

Hmm, I missed this one (replacing *ts to int cpu). And I like it.
Something like:

-update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, ktime_t now, u64 *last_update_time)
{
ktime_t delta;
+ struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

if (ts->idle_active) {
int cpu = get_cpu();
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
put_cpu();
ts->idle_entrytime = now;
+ ts->idle_active = 0;
}

if (last_update_time)
*last_update_time = ktime_to_us(now);

}


static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

update_ts_time_stats(cpu, now, NULL);
- ts->idle_active = 0;

sched_clock_idle_wakeup_event(0);
}

and so on.


Sergey


Attachments:
(No filename) (1.84 kB)
(No filename) (316.00 B)
Download all attachments

2010-06-15 15:27:46

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On (06/15/10 18:23), Sergey Senozhatsky wrote:
>....
> delta = ktime_sub(now, ts->idle_entrytime);
> ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> if (nr_iowait_cpu(cpu) > 0)
> ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> put_cpu();
> ts->idle_entrytime = now;
> + ts->idle_active = 0;
^^^^^^

This part is wrong. Sorry.

> }
>
> if (last_update_time)
> *last_update_time = ktime_to_us(now);
>
> }


Attachments:
(No filename) (455.00 B)
(No filename) (316.00 B)
Download all attachments

2010-06-15 16:09:33

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)


I've changed struct tick_sched to match passed *ts and cpu. Also changed "&per_cpu(tick_cpu_sched, cpu)"
call to "struct tick_sched *tick_get_tick_sched(int cpu)" which we already have.

But I don't really like this part:
struct tick_sched *tick_get_tick_sched(int cpu)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
ts->cpu = cpu;
^^^^^^^^^^^^^
return ts;
}

Please kindly review.

---

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..4871ed5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -137,14 +137,17 @@ static inline int which_bucket(unsigned int duration)
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;
+
+ put_cpu();

if (duration < 10)
return bucket;
@@ -169,14 +172,17 @@ static inline int which_bucket(unsigned int duration)
static inline int performance_multiplier(void)
{
int mult = 1;
-
+ int cpu = get_cpu();
+
/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);

+ put_cpu();
+
return mult;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f118809..747fcae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..db14691 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -51,6 +51,7 @@ struct tick_sched {
unsigned long check_clocks;
enum tick_nohz_mode nohz_mode;
ktime_t idle_tick;
+ int cpu;
int inidle;
int tick_stopped;
unsigned long idle_jiffies;
diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..f61b48e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..5105345 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -38,7 +38,9 @@ static ktime_t last_jiffies_update;

struct tick_sched *tick_get_tick_sched(int cpu)
{
- return &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ ts->cpu = cpu;
+ return ts;
}

/*
@@ -137,7 +139,7 @@ __setup("nohz=", setup_tick_nohz);
static void tick_nohz_update_jiffies(ktime_t now)
{
int cpu = smp_processor_id();
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
unsigned long flags;

cpumask_clear_cpu(cpu, nohz_cpu_mask);
@@ -159,9 +161,10 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
ktime_t delta;

if (ts->idle_active) {
+ int cpu = ts->cpu;
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
ts->idle_entrytime = now;
}
@@ -173,7 +176,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)

static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

update_ts_time_stats(ts, now, NULL);
ts->idle_active = 0;
@@ -211,7 +214,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
*/
u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

if (!tick_nohz_enabled)
return -1;
@@ -237,7 +240,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
*/
u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

if (!tick_nohz_enabled)
return -1;
@@ -267,7 +270,7 @@ void tick_nohz_stop_sched_tick(int inidle)
local_irq_save(flags);

cpu = smp_processor_id();
- ts = &per_cpu(tick_cpu_sched, cpu);
+ ts = tick_get_tick_sched(cpu);

/*
* Call to tick_nohz_start_idle stops the last_update_time from being
@@ -508,7 +511,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
void tick_nohz_restart_sched_tick(void)
{
int cpu = smp_processor_id();
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
unsigned long ticks;
#endif
@@ -671,7 +674,7 @@ static void tick_nohz_kick_tick(int cpu, ktime_t now)
#if 0
/* Switch back to 2.6.27 behaviour */

- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
ktime_t delta;

/*
@@ -688,7 +691,7 @@ static void tick_nohz_kick_tick(int cpu, ktime_t now)

static inline void tick_check_nohz(int cpu)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
ktime_t now;

if (!ts->idle_active && !ts->tick_stopped)
@@ -818,7 +821,7 @@ void tick_setup_sched_timer(void)
#if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
void tick_cancel_sched_timer(int cpu)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

# ifdef CONFIG_HIGH_RES_TIMERS
if (ts->sched_timer.base)

2010-06-16 06:03:14

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On Tue, 15 Jun 2010 19:13:03 +0300
Sergey Senozhatsky <[email protected]> wrote:

>
> I've changed struct tick_sched to match passed *ts and cpu. Also
> changed "&per_cpu(tick_cpu_sched, cpu)" call to "struct tick_sched
> *tick_get_tick_sched(int cpu)" which we already have.
>
> But I don't really like this part:
> struct tick_sched *tick_get_tick_sched(int cpu)
> {
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
> ts->cpu = cpu;
> ^^^^^^^^^^^^^
> return ts;
> }
>
> Please kindly review.


can we do this bit once, when the ts structure gets initialized?
it's not like the cpu value will ever change...


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-16 09:31:10

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] avoid using smp_processor_id() in preemptible code (nr_iowait_cpu)

On (06/15/10 23:05), Arjan van de Ven wrote:
> can we do this bit once, when the ts structure gets initialized?
> it's not like the cpu value will ever change...
>
>
Hello Arjan,

Sure we can. The question is where is the "proper place"?
for_each_possible_cpu in __init?

or something like

static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched);
int cpu = get_cpu();
&per_cpu(tick_cpu_sched, cpu)->cpu = cpu;
put_cpu();


Sergey


Attachments:
(No filename) (436.00 B)
(No filename) (316.00 B)
Download all attachments

2010-06-17 06:26:17

by Sergey Senozhatsky

[permalink] [raw]
Subject: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

Fix

BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
caller is nr_iowait_cpu+0xe/0x1e
Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
Call Trace:
[<c1184c55>] debug_smp_processor_id+0xa5/0xbc
[<c10282a5>] nr_iowait_cpu+0xe/0x1e
[<c104ab7c>] update_ts_time_stats+0x32/0x6c
[<c104ac73>] get_cpu_idle_time_us+0x36/0x58
[<c124229b>] get_cpu_idle_time+0x12/0x74
[<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
[<c1240437>] __cpufreq_governor+0x51/0x85
[<c1241190>] __cpufreq_set_policy+0x10c/0x13d
[<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
[<c1241b1e>] ? handle_update+0x0/0xd
[<c1241a18>] cpufreq_add_dev+0x34b/0x35a
[<c103c973>] ? schedule_delayed_work_on+0x11/0x13
[<c12c14db>] cpufreq_cpu_callback+0x59/0x63
[<c1042f39>] notifier_call_chain+0x26/0x48
[<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
[<c102efb9>] __cpu_notify+0x15/0x29
[<c102efda>] cpu_notify+0xd/0xf
[<c12bfb30>] _cpu_up+0xaf/0xd2
[<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
[<c1055eef>] hibernation_snapshot+0x104/0x1a2
[<c1058b49>] snapshot_ioctl+0x24b/0x53e
[<c1028ad1>] ? sub_preempt_count+0x7c/0x89
[<c10ab91d>] vfs_ioctl+0x2e/0x8c
[<c10588fe>] ? snapshot_ioctl+0x0/0x53e
[<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
[<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
[<c11e9dc3>] ? tty_write+0x0/0x1d0
[<c10a12d6>] ? vfs_write+0xa2/0xda
[<c10ac333>] sys_ioctl+0x41/0x62
[<c10027d3>] sysenter_do_call+0x12/0x2d

The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".

This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.

Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
since we "pick the current cpu, rather than the one denoted by ts" in that case.
To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.

Signed-off-by: Sergey Senozhatsky <[email protected]>

---

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..4871ed5 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -137,14 +137,17 @@ static inline int which_bucket(unsigned int duration)
{
int bucket = 0;

+ int cpu = get_cpu();
/*
* We keep two groups of stats; one with no
* IO pending, one without.
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(cpu))
bucket = BUCKETS/2;
+
+ put_cpu();

if (duration < 10)
return bucket;
@@ -169,14 +172,17 @@ static inline int which_bucket(unsigned int duration)
static inline int performance_multiplier(void)
{
int mult = 1;
-
+ int cpu = get_cpu();
+
/* for higher loadavg, we are more reluctant */

mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(cpu);

+ put_cpu();
+
return mult;
}

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f118809..747fcae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/include/linux/tick.h b/include/linux/tick.h
index b232ccc..db14691 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -51,6 +51,7 @@ struct tick_sched {
unsigned long check_clocks;
enum tick_nohz_mode nohz_mode;
ktime_t idle_tick;
+ int cpu;
int inidle;
int tick_stopped;
unsigned long idle_jiffies;
diff --git a/kernel/sched.c b/kernel/sched.c
index f8b8996..f61b48e 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..1907037 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -38,6 +38,9 @@ static ktime_t last_jiffies_update;

struct tick_sched *tick_get_tick_sched(int cpu)
{
+ /*FIXME: Arjan van de Ven:
+ can we do this bit once, when the ts structure gets initialized?*/
+ per_cpu(tick_cpu_sched, cpu).cpu = cpu;
return &per_cpu(tick_cpu_sched, cpu);
}

@@ -137,7 +140,7 @@ __setup("nohz=", setup_tick_nohz);
static void tick_nohz_update_jiffies(ktime_t now)
{
int cpu = smp_processor_id();
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
unsigned long flags;

cpumask_clear_cpu(cpu, nohz_cpu_mask);
@@ -161,7 +164,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
if (ts->idle_active) {
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(ts->cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
ts->idle_entrytime = now;
}
@@ -173,7 +176,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)

static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

update_ts_time_stats(ts, now, NULL);
ts->idle_active = 0;
@@ -211,7 +214,7 @@ static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
*/
u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

if (!tick_nohz_enabled)
return -1;
@@ -237,7 +240,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time_us);
*/
u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

if (!tick_nohz_enabled)
return -1;
@@ -267,7 +270,7 @@ void tick_nohz_stop_sched_tick(int inidle)
local_irq_save(flags);

cpu = smp_processor_id();
- ts = &per_cpu(tick_cpu_sched, cpu);
+ ts = tick_get_tick_sched(cpu);

/*
* Call to tick_nohz_start_idle stops the last_update_time from being
@@ -508,7 +511,7 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now)
void tick_nohz_restart_sched_tick(void)
{
int cpu = smp_processor_id();
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
#ifndef CONFIG_VIRT_CPU_ACCOUNTING
unsigned long ticks;
#endif
@@ -671,7 +674,7 @@ static void tick_nohz_kick_tick(int cpu, ktime_t now)
#if 0
/* Switch back to 2.6.27 behaviour */

- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
ktime_t delta;

/*
@@ -688,7 +691,7 @@ static void tick_nohz_kick_tick(int cpu, ktime_t now)

static inline void tick_check_nohz(int cpu)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);
ktime_t now;

if (!ts->idle_active && !ts->tick_stopped)
@@ -818,7 +821,7 @@ void tick_setup_sched_timer(void)
#if defined CONFIG_NO_HZ || defined CONFIG_HIGH_RES_TIMERS
void tick_cancel_sched_timer(int cpu)
{
- struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
+ struct tick_sched *ts = tick_get_tick_sched(cpu);

# ifdef CONFIG_HIGH_RES_TIMERS
if (ts->sched_timer.base)

2010-06-17 06:43:29

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On Thu, 17 Jun 2010 09:29:50 +0300
Sergey Senozhatsky <[email protected]> wrote:

> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code:
> s2disk/3392 caller is nr_iowait_cpu+0xe/0x1e
> Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
> Call Trace:
> [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
> [<c10282a5>] nr_iowait_cpu+0xe/0x1e
> [<c104ab7c>] update_ts_time_stats+0x32/0x6c
> [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
> [<c124229b>] get_cpu_idle_time+0x12/0x74
> [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
> [<c1240437>] __cpufreq_governor+0x51/0x85
> [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
> [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
> [<c1241b1e>] ? handle_update+0x0/0xd
> [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
> [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
> [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
> [<c1042f39>] notifier_call_chain+0x26/0x48
> [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
> [<c102efb9>] __cpu_notify+0x15/0x29
> [<c102efda>] cpu_notify+0xd/0xf
> [<c12bfb30>] _cpu_up+0xaf/0xd2
> [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
> [<c1055eef>] hibernation_snapshot+0x104/0x1a2
> [<c1058b49>] snapshot_ioctl+0x24b/0x53e
> [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
> [<c10ab91d>] vfs_ioctl+0x2e/0x8c
> [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
> [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
> [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
> [<c11e9dc3>] ? tty_write+0x0/0x1d0
> [<c10a12d6>] ? vfs_write+0xa2/0xda
> [<c10ac333>] sys_ioctl+0x41/0x62
> [<c10027d3>] sysenter_do_call+0x12/0x2d
>
> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int
> cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its
> callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in
> update_ts_time_stats since we "pick the current cpu, rather than the
> one denoted by ts" in that case. To match given *ts and cpu denoted
> by *ts we use new field in the struct tick_sched: int cpu.
>
> Signed-off-by: Sergey Senozhatsky <[email protected]>

Acked-by: Arjan van de Ven <[email protected]>


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

2010-06-17 06:59:25

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On Thu, 17 Jun 2010 09:29:50 +0300 Sergey Senozhatsky <[email protected]> wrote:

> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
> caller is nr_iowait_cpu+0xe/0x1e
> Pid: 3392, comm: s2disk Not tainted 2.6.35-rc3-dbg-00106-ga75e02b #2
> Call Trace:
> [<c1184c55>] debug_smp_processor_id+0xa5/0xbc
> [<c10282a5>] nr_iowait_cpu+0xe/0x1e
> [<c104ab7c>] update_ts_time_stats+0x32/0x6c
> [<c104ac73>] get_cpu_idle_time_us+0x36/0x58
> [<c124229b>] get_cpu_idle_time+0x12/0x74
> [<c1242963>] cpufreq_governor_dbs+0xc3/0x2dc
> [<c1240437>] __cpufreq_governor+0x51/0x85
> [<c1241190>] __cpufreq_set_policy+0x10c/0x13d
> [<c12413d3>] cpufreq_add_dev_interface+0x212/0x233
> [<c1241b1e>] ? handle_update+0x0/0xd
> [<c1241a18>] cpufreq_add_dev+0x34b/0x35a
> [<c103c973>] ? schedule_delayed_work_on+0x11/0x13
> [<c12c14db>] cpufreq_cpu_callback+0x59/0x63
> [<c1042f39>] notifier_call_chain+0x26/0x48
> [<c1042f7d>] __raw_notifier_call_chain+0xe/0x10
> [<c102efb9>] __cpu_notify+0x15/0x29
> [<c102efda>] cpu_notify+0xd/0xf
> [<c12bfb30>] _cpu_up+0xaf/0xd2
> [<c12b3ad4>] enable_nonboot_cpus+0x3d/0x94
> [<c1055eef>] hibernation_snapshot+0x104/0x1a2
> [<c1058b49>] snapshot_ioctl+0x24b/0x53e
> [<c1028ad1>] ? sub_preempt_count+0x7c/0x89
> [<c10ab91d>] vfs_ioctl+0x2e/0x8c
> [<c10588fe>] ? snapshot_ioctl+0x0/0x53e
> [<c10ac2c7>] do_vfs_ioctl+0x42f/0x45a
> [<c10a0ba5>] ? fsnotify_modify+0x4f/0x5a
> [<c11e9dc3>] ? tty_write+0x0/0x1d0
> [<c10a12d6>] ? vfs_write+0xa2/0xda
> [<c10ac333>] sys_ioctl+0x41/0x62
> [<c10027d3>] sysenter_do_call+0x12/0x2d
>
> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> since we "pick the current cpu, rather than the one denoted by ts" in that case.
> To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.
>
>
> ...
>
> struct tick_sched *tick_get_tick_sched(int cpu)
> {
> + /*FIXME: Arjan van de Ven:
> + can we do this bit once, when the ts structure gets initialized?*/
> + per_cpu(tick_cpu_sched, cpu).cpu = cpu;
> return &per_cpu(tick_cpu_sched, cpu);
> }

That's just weird. And by doing a write it does require that this
cahcheline be probably-read and written back regularly, which is more
bus traffic.

It should be OK to initialise these guys with a for_each_possible_cpu()
loop in a new module_init() function in tick-sched.c - if someone runs
update_ts_time_stats() before the initcalls then conceivably the
`swapper' process's accounting will go a little bit wrong, but I doubt
it.

Still, it'd be better to do it earlier, I guess. tick_init() is called
super-early and that would be a good place. tick_init() is presently a
no-op if !CONFIG_GENERIC_CLOCKEVENTS, but all this code depends on
CONFIG_GENERIC_CLOCKEVENTS anwyay.

So how does this look? If "OK" then would you be able to test it please?


[ Sigh. The field tick_sched.cpu shouldn't even exist on
uniprocessor builds. Ifdeffing it away is trivial and a bit messy,
but it's still only a partial solution. Passing the `cpu' argument
to nr_iowait_cpu() will generate additional code, and it's unneeded
on uniprocessor builds.]


include/linux/tick.h | 1 +
kernel/time/tick-common.c | 1 +
kernel/time/tick-sched.c | 11 ++++++++---
3 files changed, 10 insertions(+), 3 deletions(-)

diff -puN include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix include/linux/tick.h
--- a/include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/include/linux/tick.h
@@ -71,6 +71,7 @@ struct tick_sched {
};

extern void __init tick_init(void);
+extern void __init tick_sched_init(void);
extern int tick_is_oneshot_available(void);
extern struct tick_device *tick_get_device(int cpu);

diff -puN kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix kernel/time/tick-sched.c
--- a/kernel/time/tick-sched.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/kernel/time/tick-sched.c
@@ -38,9 +38,6 @@ static ktime_t last_jiffies_update;

struct tick_sched *tick_get_tick_sched(int cpu)
{
- /*FIXME: Arjan van de Ven:
- can we do this bit once, when the ts structure gets initialized?*/
- per_cpu(tick_cpu_sched, cpu).cpu = cpu;
return &per_cpu(tick_cpu_sched, cpu);
}

@@ -880,3 +877,11 @@ int tick_check_oneshot_change(int allow_
tick_nohz_switch_to_nohz();
return 0;
}
+
+void __init tick_sched_init(void)
+{
+ int cpu;
+
+ for_each_possible_cpu(cpu)
+ per_cpu(tick_cpu_sched, cpu).cpu = cpu;
+}
diff -puN kernel/time/tick-common.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix kernel/time/tick-common.c
--- a/kernel/time/tick-common.c~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix
+++ a/kernel/time/tick-common.c
@@ -413,4 +413,5 @@ static struct notifier_block tick_notifi
void __init tick_init(void)
{
clockevents_register_notifier(&tick_notifier);
+ tick_sched_init();
}
_

2010-06-17 07:04:41

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On Wed, 16 Jun 2010 23:59:07 -0700 Andrew Morton <[email protected]> wrote:

> So how does this look? If "OK" then would you be able to test it please?

I saw it first!


fix !CONFIG_TICK_ONESHOT

--- a/include/linux/tick.h~cpuidle-avoid-using-smp_processor_id-in-preemptible-code-nr_iowait_cpu-v4-fix-fix
+++ a/include/linux/tick.h
@@ -71,7 +71,6 @@ struct tick_sched {
};

extern void __init tick_init(void);
-extern void __init tick_sched_init(void);
extern int tick_is_oneshot_available(void);
extern struct tick_device *tick_get_device(int cpu);

@@ -93,6 +92,9 @@ extern struct cpumask *tick_get_broadcas

# ifdef CONFIG_TICK_ONESHOT
extern struct cpumask *tick_get_broadcast_oneshot_mask(void);
+extern void __init tick_sched_init(void);
+# else
+static inline void tick_sched_init(void) { }
# endif

# endif /* BROADCAST */
_

2010-06-17 07:30:34

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On (06/16/10 23:59), Andrew Morton wrote:
> [..] if someone runs
> update_ts_time_stats() before the initcalls then conceivably the
> `swapper' process's accounting will go a little bit wrong, but I doubt
> it.
>

That was the sing that scared me - update_ts_time_stats call before init.
Having ".cpu = cpu" in tick_get_tick_sched guarantees correct .cpu and...
and it sucks.

> Still, it'd be better to do it earlier, I guess. tick_init() is called
> super-early and that would be a good place. tick_init() is presently a
> no-op if !CONFIG_GENERIC_CLOCKEVENTS, but all this code depends on
> CONFIG_GENERIC_CLOCKEVENTS anwyay.
>
> So how does this look? If "OK" then would you be able to test it please?
>
>
I'll test it in 2 hours. Thanks.


> [ Sigh. The field tick_sched.cpu shouldn't even exist on
> uniprocessor builds. Ifdeffing it away is trivial and a bit messy,
> but it's still only a partial solution. Passing the `cpu' argument
> to nr_iowait_cpu() will generate additional code, and it's unneeded
> on uniprocessor builds.]
>
>
You're right.


Sergey


Attachments:
(No filename) (1.06 kB)
(No filename) (316.00 B)
Download all attachments

2010-06-25 14:40:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On Thu, 2010-06-17 at 09:29 +0300, Sergey Senozhatsky wrote:
> Fix
>
> BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392

> The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
>
> This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
>
> Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> since we "pick the current cpu, rather than the one denoted by ts" in that case.
> To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.


> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index b232ccc..db14691 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -51,6 +51,7 @@ struct tick_sched {
> unsigned long check_clocks;
> enum tick_nohz_mode nohz_mode;
> ktime_t idle_tick;
> + int cpu;
> int inidle;
> int tick_stopped;
> unsigned long idle_jiffies;

> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 1d7b9bc..1907037 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -38,6 +38,9 @@ static ktime_t last_jiffies_update;
>
> struct tick_sched *tick_get_tick_sched(int cpu)
> {
> + /*FIXME: Arjan van de Ven:
> + can we do this bit once, when the ts structure gets initialized?*/
> + per_cpu(tick_cpu_sched, cpu).cpu = cpu;
> return &per_cpu(tick_cpu_sched, cpu);
> }

> @@ -161,7 +164,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> if (ts->idle_active) {
> delta = ktime_sub(now, ts->idle_entrytime);
> ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> - if (nr_iowait_cpu() > 0)
> + if (nr_iowait_cpu(ts->cpu) > 0)
> ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> ts->idle_entrytime = now;
> }


This all seems extremely silly, why not something like:

---
kernel/time/tick-sched.c | 16 ++++++++--------
1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 5f171f0..1363d3a 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -154,14 +154,14 @@ static void tick_nohz_update_jiffies(ktime_t now)
* Updates the per cpu time idle statistics counters
*/
static void
-update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
{
ktime_t delta;

if (ts->idle_active) {
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
ts->idle_entrytime = now;
}
@@ -175,19 +175,19 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);
ts->idle_active = 0;

sched_clock_idle_wakeup_event(0);
}

-static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
+static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
{
ktime_t now;

now = ktime_get();

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);

ts->idle_entrytime = now;
ts->idle_active = 1;
@@ -216,7 +216,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->idle_sleeptime);
}
@@ -242,7 +242,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->iowait_sleeptime);
}
@@ -284,7 +284,7 @@ void tick_nohz_stop_sched_tick(int inidle)
*/
ts->inidle = 1;

- now = tick_nohz_start_idle(ts);
+ now = tick_nohz_start_idle(cpu, ts);

/*
* If this cpu is offline and it is the one which updates

2010-06-30 19:59:21

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On Fri, 25 Jun 2010 16:39:33 +0200
Peter Zijlstra <[email protected]> wrote:

> On Thu, 2010-06-17 at 09:29 +0300, Sergey Senozhatsky wrote:
> > Fix
> >
> > BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
>
> > The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> > Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
> >
> > This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
> >
> > Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> > since we "pick the current cpu, rather than the one denoted by ts" in that case.
> > To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.
>
>
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index b232ccc..db14691 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -51,6 +51,7 @@ struct tick_sched {
> > unsigned long check_clocks;
> > enum tick_nohz_mode nohz_mode;
> > ktime_t idle_tick;
> > + int cpu;
> > int inidle;
> > int tick_stopped;
> > unsigned long idle_jiffies;
>
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index 1d7b9bc..1907037 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -38,6 +38,9 @@ static ktime_t last_jiffies_update;
> >
> > struct tick_sched *tick_get_tick_sched(int cpu)
> > {
> > + /*FIXME: Arjan van de Ven:
> > + can we do this bit once, when the ts structure gets initialized?*/
> > + per_cpu(tick_cpu_sched, cpu).cpu = cpu;
> > return &per_cpu(tick_cpu_sched, cpu);
> > }
>
> > @@ -161,7 +164,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> > if (ts->idle_active) {
> > delta = ktime_sub(now, ts->idle_entrytime);
> > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> > - if (nr_iowait_cpu() > 0)
> > + if (nr_iowait_cpu(ts->cpu) > 0)
> > ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> > ts->idle_entrytime = now;
> > }
>
>
> This all seems extremely silly, why not something like:

Does it work?

c'mon guys, it's taking us weeks and weeks to fix one simple bug. It's
a regression! We should be in panic mode.

2010-07-01 06:13:04

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in preemptible code (nr_iowait_cpu) v4

On (06/30/10 12:58), Andrew Morton wrote:
> Subject: Re: [PATCH] cpuidle: avoid using smp_processor_id() in
> preemptible
> code (nr_iowait_cpu) v4
> X-Mailer: Sylpheed 2.4.8 (GTK+ 2.12.9; x86_64-pc-linux-gnu)
>
> On Fri, 25 Jun 2010 16:39:33 +0200
> Peter Zijlstra <[email protected]> wrote:
>
> > On Thu, 2010-06-17 at 09:29 +0300, Sergey Senozhatsky wrote:
> > > Fix
> > >
> > > BUG: using smp_processor_id() in preemptible [00000000] code: s2disk/3392
> >
> > > The initial fix was to use get_cpu/put_cpu in nr_iowait_cpu. However,
> > > Arjan stated that "the bug is that it needs to be nr_iowait_cpu(int cpu)".
> > >
> > > This patch introduces nr_iowait_cpu(int cpu) and changes to its callers.
> > >
> > > Arjan also pointed out that we can't use get_cpu/put_cpu in update_ts_time_stats
> > > since we "pick the current cpu, rather than the one denoted by ts" in that case.
> > > To match given *ts and cpu denoted by *ts we use new field in the struct tick_sched: int cpu.
> >
> >
> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index b232ccc..db14691 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -51,6 +51,7 @@ struct tick_sched {
> > > unsigned long check_clocks;
> > > enum tick_nohz_mode nohz_mode;
> > > ktime_t idle_tick;
> > > + int cpu;
> > > int inidle;
> > > int tick_stopped;
> > > unsigned long idle_jiffies;
> >
> > > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > > index 1d7b9bc..1907037 100644
> > > --- a/kernel/time/tick-sched.c
> > > +++ b/kernel/time/tick-sched.c
> > > @@ -38,6 +38,9 @@ static ktime_t last_jiffies_update;
> > >
> > > struct tick_sched *tick_get_tick_sched(int cpu)
> > > {
> > > + /*FIXME: Arjan van de Ven:
> > > + can we do this bit once, when the ts structure gets initialized?*/
> > > + per_cpu(tick_cpu_sched, cpu).cpu = cpu;
> > > return &per_cpu(tick_cpu_sched, cpu);
> > > }
> >
> > > @@ -161,7 +164,7 @@ update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> > > if (ts->idle_active) {
> > > delta = ktime_sub(now, ts->idle_entrytime);
> > > ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> > > - if (nr_iowait_cpu() > 0)
> > > + if (nr_iowait_cpu(ts->cpu) > 0)
> > > ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> > > ts->idle_entrytime = now;
> > > }
> >
> >
> > This all seems extremely silly, why not something like:
>
> Does it work?
>
> c'mon guys, it's taking us weeks and weeks to fix one simple bug. It's
> a regression! We should be in panic mode.
>

Hello,

Sergey Senozhatsky <[email protected]> wrote:
>> Well, there is something I'm missing. How can I match given *ts and
>> cpu in update_ts_time_stats (except for introducing
>> update_ts_time_stats(..., int cpu)) ?

Arjan van de Ven <[email protected]> wrote:
>that'd be option one
>option two is to add a "cpu" member to struct tick_sched.....

So, it's been discussed. I chose option #2 however and made a mistake.
Personally I prefer Peter's patch. I need some time to test it.


Sergey


Attachments:
(No filename) (3.20 kB)
(No filename) (316.00 B)
Download all attachments

2010-07-01 07:07:43

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] sched: Cure nr_iowait_cpu() users

With 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) Arjan broke
things by not making sure preemption was indeed disabled by the callers
of nr_iowait_cpu() which took the iowait value of the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass in the
right number.

Signed-off-by: Peter Zijlstra <[email protected]>
---
Confirmed to work..

drivers/cpuidle/governors/menu.c | 4 ++--
include/linux/sched.h | 2 +-
kernel/sched.c | 4 ++--
kernel/time/tick-sched.c | 16 ++++++++--------
4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..1b12870 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -143,7 +143,7 @@ static inline int which_bucket(unsigned int duration)
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(smp_processor_id()))
bucket = BUCKETS/2;

if (duration < 10)
@@ -175,7 +175,7 @@ static inline int performance_multiplier(void)
mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(smp_processor_id());

return mult;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index a61c08c..1f25798 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/kernel/sched.c b/kernel/sched.c
index 71e3dc8..d3c0262 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2946,9 +2946,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index e0707ea..17525ca 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -154,14 +154,14 @@ static void tick_nohz_update_jiffies(ktime_t now)
* Updates the per cpu time idle statistics counters
*/
static void
-update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
{
ktime_t delta;

if (ts->idle_active) {
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
ts->idle_entrytime = now;
}
@@ -175,19 +175,19 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);
ts->idle_active = 0;

sched_clock_idle_wakeup_event(0);
}

-static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
+static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
{
ktime_t now;

now = ktime_get();

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);

ts->idle_entrytime = now;
ts->idle_active = 1;
@@ -216,7 +216,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->idle_sleeptime);
}
@@ -242,7 +242,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->iowait_sleeptime);
}
@@ -284,7 +284,7 @@ void tick_nohz_stop_sched_tick(int inidle)
*/
ts->inidle = 1;

- now = tick_nohz_start_idle(ts);
+ now = tick_nohz_start_idle(cpu, ts);

/*
* If this cpu is offline and it is the one which updates

2010-07-01 08:14:22

by Sergey Senozhatsky

[permalink] [raw]
Subject: Re: [PATCH] sched: Cure nr_iowait_cpu() users

Acked-by: Sergey Senozhatsky <[email protected]> (??)

Sergey


On (07/01/10 09:07), Peter Zijlstra wrote:
> Subject: [PATCH] sched: Cure nr_iowait_cpu() users
> X-Mailer: Evolution 2.28.3
>
> With 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us()) Arjan broke
> things by not making sure preemption was indeed disabled by the callers
> of nr_iowait_cpu() which took the iowait value of the current cpu.
>
> This resulted in a heap of preempt warnings. Cure this by making
> nr_iowait_cpu() take a cpu number and fix up the callers to pass in the
> right number.
>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> Confirmed to work..
>
> drivers/cpuidle/governors/menu.c | 4 ++--
> include/linux/sched.h | 2 +-
> kernel/sched.c | 4 ++--
> kernel/time/tick-sched.c | 16 ++++++++--------
> 4 files changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index 52ff8aa..1b12870 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -143,7 +143,7 @@ static inline int which_bucket(unsigned int duration)
> * This allows us to calculate
> * E(duration)|iowait
> */
> - if (nr_iowait_cpu())
> + if (nr_iowait_cpu(smp_processor_id()))
> bucket = BUCKETS/2;
>
> if (duration < 10)
> @@ -175,7 +175,7 @@ static inline int performance_multiplier(void)
> mult += 2 * get_loadavg();
>
> /* for IO wait tasks (per cpu!) we add 5x each */
> - mult += 10 * nr_iowait_cpu();
> + mult += 10 * nr_iowait_cpu(smp_processor_id());
>
> return mult;
> }
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index a61c08c..1f25798 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -139,7 +139,7 @@ extern int nr_processes(void);
> extern unsigned long nr_running(void);
> extern unsigned long nr_uninterruptible(void);
> extern unsigned long nr_iowait(void);
> -extern unsigned long nr_iowait_cpu(void);
> +extern unsigned long nr_iowait_cpu(int cpu);
> extern unsigned long this_cpu_load(void);
>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 71e3dc8..d3c0262 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -2946,9 +2946,9 @@ unsigned long nr_iowait(void)
> return sum;
> }
>
> -unsigned long nr_iowait_cpu(void)
> +unsigned long nr_iowait_cpu(int cpu)
> {
> - struct rq *this = this_rq();
> + struct rq *this = cpu_rq(cpu);
> return atomic_read(&this->nr_iowait);
> }
>
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index e0707ea..17525ca 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -154,14 +154,14 @@ static void tick_nohz_update_jiffies(ktime_t now)
> * Updates the per cpu time idle statistics counters
> */
> static void
> -update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> +update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
> {
> ktime_t delta;
>
> if (ts->idle_active) {
> delta = ktime_sub(now, ts->idle_entrytime);
> ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
> - if (nr_iowait_cpu() > 0)
> + if (nr_iowait_cpu(cpu) > 0)
> ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
> ts->idle_entrytime = now;
> }
> @@ -175,19 +175,19 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
> {
> struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);
>
> - update_ts_time_stats(ts, now, NULL);
> + update_ts_time_stats(cpu, ts, now, NULL);
> ts->idle_active = 0;
>
> sched_clock_idle_wakeup_event(0);
> }
>
> -static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
> +static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
> {
> ktime_t now;
>
> now = ktime_get();
>
> - update_ts_time_stats(ts, now, NULL);
> + update_ts_time_stats(cpu, ts, now, NULL);
>
> ts->idle_entrytime = now;
> ts->idle_active = 1;
> @@ -216,7 +216,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
> if (!tick_nohz_enabled)
> return -1;
>
> - update_ts_time_stats(ts, ktime_get(), last_update_time);
> + update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
>
> return ktime_to_us(ts->idle_sleeptime);
> }
> @@ -242,7 +242,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
> if (!tick_nohz_enabled)
> return -1;
>
> - update_ts_time_stats(ts, ktime_get(), last_update_time);
> + update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);
>
> return ktime_to_us(ts->iowait_sleeptime);
> }
> @@ -284,7 +284,7 @@ void tick_nohz_stop_sched_tick(int inidle)
> */
> ts->inidle = 1;
>
> - now = tick_nohz_start_idle(ts);
> + now = tick_nohz_start_idle(cpu, ts);
>
> /*
> * If this cpu is offline and it is the one which updates
>


Attachments:
(No filename) (4.78 kB)
(No filename) (316.00 B)
Download all attachments

2010-07-01 08:47:18

by Peter Zijlstra

[permalink] [raw]
Subject: [tip:sched/urgent] sched: Cure nr_iowait_cpu() users

Commit-ID: 8c215bd3890c347dfb6a2db4779755f8b9c298a9
Gitweb: http://git.kernel.org/tip/8c215bd3890c347dfb6a2db4779755f8b9c298a9
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 1 Jul 2010 09:07:17 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 1 Jul 2010 09:39:48 +0200

sched: Cure nr_iowait_cpu() users

Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
broke things by not making sure preemption was indeed disabled
by the callers of nr_iowait_cpu() which took the iowait value of
the current cpu.

This resulted in a heap of preempt warnings. Cure this by making
nr_iowait_cpu() take a cpu number and fix up the callers to pass
in the right number.

Signed-off-by: Peter Zijlstra <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Pavel Machek <[email protected]>
Cc: Jiri Slaby <[email protected]>
Cc: [email protected]
LKML-Reference: <1277968037.1868.120.camel@laptop>
Signed-off-by: Ingo Molnar <[email protected]>
---
drivers/cpuidle/governors/menu.c | 4 ++--
include/linux/sched.h | 2 +-
kernel/sched.c | 4 ++--
kernel/time/tick-sched.c | 16 ++++++++--------
4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 52ff8aa..1b12870 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -143,7 +143,7 @@ static inline int which_bucket(unsigned int duration)
* This allows us to calculate
* E(duration)|iowait
*/
- if (nr_iowait_cpu())
+ if (nr_iowait_cpu(smp_processor_id()))
bucket = BUCKETS/2;

if (duration < 10)
@@ -175,7 +175,7 @@ static inline int performance_multiplier(void)
mult += 2 * get_loadavg();

/* for IO wait tasks (per cpu!) we add 5x each */
- mult += 10 * nr_iowait_cpu();
+ mult += 10 * nr_iowait_cpu(smp_processor_id());

return mult;
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index f118809..747fcae 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -139,7 +139,7 @@ extern int nr_processes(void);
extern unsigned long nr_running(void);
extern unsigned long nr_uninterruptible(void);
extern unsigned long nr_iowait(void);
-extern unsigned long nr_iowait_cpu(void);
+extern unsigned long nr_iowait_cpu(int cpu);
extern unsigned long this_cpu_load(void);


diff --git a/kernel/sched.c b/kernel/sched.c
index a24d6d5..f87abe3 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2864,9 +2864,9 @@ unsigned long nr_iowait(void)
return sum;
}

-unsigned long nr_iowait_cpu(void)
+unsigned long nr_iowait_cpu(int cpu)
{
- struct rq *this = this_rq();
+ struct rq *this = cpu_rq(cpu);
return atomic_read(&this->nr_iowait);
}

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 1d7b9bc..1a6f828 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -154,14 +154,14 @@ static void tick_nohz_update_jiffies(ktime_t now)
* Updates the per cpu time idle statistics counters
*/
static void
-update_ts_time_stats(struct tick_sched *ts, ktime_t now, u64 *last_update_time)
+update_ts_time_stats(int cpu, struct tick_sched *ts, ktime_t now, u64 *last_update_time)
{
ktime_t delta;

if (ts->idle_active) {
delta = ktime_sub(now, ts->idle_entrytime);
ts->idle_sleeptime = ktime_add(ts->idle_sleeptime, delta);
- if (nr_iowait_cpu() > 0)
+ if (nr_iowait_cpu(cpu) > 0)
ts->iowait_sleeptime = ktime_add(ts->iowait_sleeptime, delta);
ts->idle_entrytime = now;
}
@@ -175,19 +175,19 @@ static void tick_nohz_stop_idle(int cpu, ktime_t now)
{
struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);
ts->idle_active = 0;

sched_clock_idle_wakeup_event(0);
}

-static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
+static ktime_t tick_nohz_start_idle(int cpu, struct tick_sched *ts)
{
ktime_t now;

now = ktime_get();

- update_ts_time_stats(ts, now, NULL);
+ update_ts_time_stats(cpu, ts, now, NULL);

ts->idle_entrytime = now;
ts->idle_active = 1;
@@ -216,7 +216,7 @@ u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->idle_sleeptime);
}
@@ -242,7 +242,7 @@ u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
if (!tick_nohz_enabled)
return -1;

- update_ts_time_stats(ts, ktime_get(), last_update_time);
+ update_ts_time_stats(cpu, ts, ktime_get(), last_update_time);

return ktime_to_us(ts->iowait_sleeptime);
}
@@ -284,7 +284,7 @@ void tick_nohz_stop_sched_tick(int inidle)
*/
ts->inidle = 1;

- now = tick_nohz_start_idle(ts);
+ now = tick_nohz_start_idle(cpu, ts);

/*
* If this cpu is offline and it is the one which updates