2009-04-13 15:14:54

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Fri, 10 Apr 2009 02:22:23 PDT, [email protected] said:
> The mm-of-the-moment snapshot 2009-04-10-02-21 has been uploaded to
>
> http://userweb.kernel.org/~akpm/mmotm/

My system is reporting 70-90 forks per second, 'lastcomm' is reporting:

work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37
work_for_cpu F root ?? 0.00 secs Sat Apr 11 08:37

(and about 100K more of same).

Reverting this commit made the forks go away:

commit 6b44003e5ca66a3fffeb5bc90f40ada2c4340896
Author: Andrew Morton <[email protected]>
Date: Thu Apr 9 09:50:37 2009 -0600

work_on_cpu(): rewrite it to create a kernel thread on demand

No, I don't see why this generated a user-visible fork and accounting
record to be cut. Maybe that's a normal side effect of kthread_create()
that I've never noticed because kthreads rarely exit, especially not at
90/sec. I also don't know who's *calling* work_on_cpu() 90 times a
second, but I suspect it's this:

This is not terribly fast, but the only current caller of work_on_cpu() is
pci_call_probe().

Umm. No.

% find . -name '*.c' | xargs grep -l work_on_cpu 2> /dev/null
./arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
./arch/x86/kernel/microcode_core.c
./arch/x86/kernel/acpi/cstate.c
./arch/x86/kernel/apm_32.c
./kernel/workqueue.c
./drivers/acpi/processor_throttling.c
./drivers/pci/pci-driver.c

Probable cause for my problem:

arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c calls work_on_cpu(). We get into a
state where we have enough activity to kick us to a high CPU speed, and then
the activity of writing 90 acct records per sec keeps us there - with continual
callbacks to see if we can drop the CPU speed.


Attachments:
(No filename) (226.00 B)

2009-04-13 16:09:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu



On Sat, 11 Apr 2009, [email protected] wrote:
>
> Probable cause for my problem:
>
> arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c calls work_on_cpu(). We get into a
> state where we have enough activity to kick us to a high CPU speed, and then
> the activity of writing 90 acct records per sec keeps us there - with continual
> callbacks to see if we can drop the CPU speed.

Ok, I think that that work_on_cpu() commit is broken, but I _also_ think
that cpufreq is doing something fairly insane.

This behavior seems to be triggered by the "ondemand" policy case, btw,
and it literally does basically:

dbs_check_cpu:
for_each_cpu(j, policy->cpus)
...
freq_avg = __cpufreq_driver_getavg(policy, j);

where "__cpufreq_driver_getavg()" will do "freq->getavg(policy, cpu)" and
then acpi-cpufreq.c will do that "work_on_cpu()" as part of the call to
"get_measured_perf()".

So pretty much _all_ use is going to always effectively do a broadcast
"work on each cpu" thing. That's always going to be pretty damn expensive.

And there's no _reason_. As far as I can tell, that ACPI cpufreq thing
doesn't _need_ any "process context". That "get_measured_perf()" will
just do a single read_measured_perf_ctrs() call, and all that does is two
'rdmsr()' calls.

So afaik, acpi-cpufreq.c should not use "work_on_cpu()" for that at all.
It should just do a smp_call_function_single().

So I do think Andrew's commit is broken and we should think about it a bit
more, but I also think that Valdis' problem comes from acpi-cpufreq just
being damn stupid. Doing a smp_call_function_single() to read two MSR's is
going to be a _lot_ more efficient than doing that crazy work_on_cpu() for
that.

So the _real_ problem came through the commits like

cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs

that were meant to reduce stack usage with big cpu masks. And sure, the
_old_ way of doing it was also stupid (it rescheduled the process to the
other CPU by using cpus_allowed()).

Mike, Ingo?

Linus

2009-04-13 16:35:03

by Mike Galbraith

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, 2009-04-13 at 09:04 -0700, Linus Torvalds wrote:

> So afaik, acpi-cpufreq.c should not use "work_on_cpu()" for that at all.
> It should just do a smp_call_function_single().

Andrew already fixed it to use smp_call_function_single() because of
another regression. (mysql+oltp took 10% peak throughput hit here)

-Mike

2009-04-13 17:19:46

by Ingo Molnar

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu


* Linus Torvalds <[email protected]> wrote:

> So I do think Andrew's commit is broken and we should think about
> it a bit more, but I also think that Valdis' problem comes from
> acpi-cpufreq just being damn stupid. Doing a
> smp_call_function_single() to read two MSR's is going to be a
> _lot_ more efficient than doing that crazy work_on_cpu() for that.
>
> So the _real_ problem came through the commits like
>
> cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
>
> that were meant to reduce stack usage with big cpu masks. And
> sure, the _old_ way of doing it was also stupid (it rescheduled
> the process to the other CPU by using cpus_allowed()).
>
> Mike, Ingo?

I think Andrew has a stack of fixes queued up, one of which should
solve this problem too - which Mike tested - as the commit from
Andrew has caused another regression as well.

There's no sha1 - the patch is in this thread on lkml:

sysbench(oltp)+mysql 10% regression with 2.6.30-rc1

| From: Andrew Morton <[email protected]>
|
| Atttempting to rid us of the problematic work_on_cpu(). Just use
| smp_call_fuction_single() here.
|
| This repairs a 10% sysbench(oltp)+mysql regression which Mike
| reported,

Ingo

2009-04-13 17:34:28

by Andrew Morton

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, 13 Apr 2009 19:18:53 +0200 Ingo Molnar <[email protected]> wrote:

>
> * Linus Torvalds <[email protected]> wrote:
>
> > So I do think Andrew's commit is broken and we should think about
> > it a bit more, but I also think that Valdis' problem comes from
> > acpi-cpufreq just being damn stupid. Doing a
> > smp_call_function_single() to read two MSR's is going to be a
> > _lot_ more efficient than doing that crazy work_on_cpu() for that.
> >
> > So the _real_ problem came through the commits like
> >
> > cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> > cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
> >
> > that were meant to reduce stack usage with big cpu masks. And
> > sure, the _old_ way of doing it was also stupid (it rescheduled
> > the process to the other CPU by using cpus_allowed()).
> >
> > Mike, Ingo?
>
> I think Andrew has a stack of fixes queued up, one of which should
> solve this problem too - which Mike tested - as the commit from
> Andrew has caused another regression as well.
>
> There's no sha1 - the patch is in this thread on lkml:
>
> sysbench(oltp)+mysql 10% regression with 2.6.30-rc1
>
> | From: Andrew Morton <[email protected]>
> |
> | Atttempting to rid us of the problematic work_on_cpu(). Just use
> | smp_call_fuction_single() here.
> |
> | This repairs a 10% sysbench(oltp)+mysql regression which Mike
> | reported,
>

Yup. It's presently in Len's hands. And Rusty's.

Vladis, perhaps you can verify?


From: Andrew Morton <[email protected]>

Atttempting to rid us of the problematic work_on_cpu(). Just use
smp_call_fuction_single() here.

This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
due to

commit 6b44003e5ca66a3fffeb5bc90f40ada2c4340896
Author: Andrew Morton <[email protected]>
Date: Thu Apr 9 09:50:37 2009 -0600

work_on_cpu(): rewrite it to create a kernel thread on demand

It seems that the kernel calls these acpi-cpufreq functions at a quite
high frequency.

Cc: Rusty Russell <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Zhao Yakui <[email protected]>
Cc: Dave Jones <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Cc: "Zhang, Yanmin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c | 25 ++++++++-----------
1 file changed, 11 insertions(+), 14 deletions(-)

diff -puN arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c~arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-using-work_on_cpu arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c~arch-x86-kernel-cpu-cpufreq-acpi-cpufreqc-avoid-using-work_on_cpu
+++ a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -153,7 +153,8 @@ struct drv_cmd {
u32 val;
};

-static long do_drv_read(void *_cmd)
+/* Called via smp_call_function_single(), on the target CPU */
+static void do_drv_read(void *_cmd)
{
struct drv_cmd *cmd = _cmd;
u32 h;
@@ -170,10 +171,10 @@ static long do_drv_read(void *_cmd)
default:
break;
}
- return 0;
}

-static long do_drv_write(void *_cmd)
+/* Called via smp_call_function_single(), on the target CPU */
+static void do_drv_write(void *_cmd)
{
struct drv_cmd *cmd = _cmd;
u32 lo, hi;
@@ -192,23 +193,21 @@ static long do_drv_write(void *_cmd)
default:
break;
}
- return 0;
}

static void drv_read(struct drv_cmd *cmd)
{
cmd->val = 0;

- work_on_cpu(cpumask_any(cmd->mask), do_drv_read, cmd);
+ smp_call_function_single(cpumask_any(cmd->mask), do_drv_read, cmd, 1);
}

static void drv_write(struct drv_cmd *cmd)
{
- unsigned int i;
+ unsigned int cpu;

- for_each_cpu(i, cmd->mask) {
- work_on_cpu(i, do_drv_write, cmd);
- }
+ for_each_cpu(cpu, cmd->mask)
+ smp_call_function_single(cpu, do_drv_write, cmd, 1);
}

static u32 get_cur_val(const struct cpumask *mask)
@@ -252,15 +251,13 @@ struct perf_pair {
} aperf, mperf;
};

-
-static long read_measured_perf_ctrs(void *_cur)
+/* Called via smp_call_function_single(), on the target CPU */
+static void read_measured_perf_ctrs(void *_cur)
{
struct perf_pair *cur = _cur;

rdmsr(MSR_IA32_APERF, cur->aperf.split.lo, cur->aperf.split.hi);
rdmsr(MSR_IA32_MPERF, cur->mperf.split.lo, cur->mperf.split.hi);
-
- return 0;
}

/*
@@ -283,7 +280,7 @@ static unsigned int get_measured_perf(st
unsigned int perf_percent;
unsigned int retval;

- if (!work_on_cpu(cpu, read_measured_perf_ctrs, &readin))
+ if (smp_call_function_single(cpu, read_measured_perf_ctrs, &cur, 1))
return 0;

cur.aperf.whole = readin.aperf.whole -
_

2009-04-13 17:42:44

by Ingo Molnar

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu


* Andrew Morton <[email protected]> wrote:

> > I think Andrew has a stack of fixes queued up, one of which
> > should solve this problem too - which Mike tested - as the
> > commit from Andrew has caused another regression as well.
> >
> > There's no sha1 - the patch is in this thread on lkml:
> >
> > sysbench(oltp)+mysql 10% regression with 2.6.30-rc1
> >
> > | From: Andrew Morton <[email protected]>
> > |
> > | Atttempting to rid us of the problematic work_on_cpu(). Just use
> > | smp_call_fuction_single() here.
> > |
> > | This repairs a 10% sysbench(oltp)+mysql regression which Mike
> > | reported,
> >
>
> Yup. It's presently in Len's hands. And Rusty's.
>
> Vladis, perhaps you can verify?

I have not tested it but it looks good and the use of a much more
atomic primitive can only improve the situation - so unless there's
something suspicious about it i'd suggest for Linus to pick this up
from email, before -rc2.

Acked-by: Ingo Molnar <[email protected]>

Ingo

2009-04-13 17:52:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu



On Mon, 13 Apr 2009, Andrew Morton wrote:
>
> static void drv_write(struct drv_cmd *cmd)
> {
> - unsigned int i;
> + unsigned int cpu;
>
> - for_each_cpu(i, cmd->mask) {
> - work_on_cpu(i, do_drv_write, cmd);
> - }
> + for_each_cpu(cpu, cmd->mask)
> + smp_call_function_single(cpu, do_drv_write, cmd, 1);

Ok, that's just -wrong-.

Doesn't anybody else see anything odd in doing

for_each_cpu(cpu, cmd->mask)
smp_call_function_single(cpu, ..);

and react to it?

IOW, why not just do

smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);

here?

Linus

2009-04-13 18:13:18

by Ingo Molnar

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu


* Linus Torvalds <[email protected]> wrote:

> On Mon, 13 Apr 2009, Andrew Morton wrote:
> >
> > static void drv_write(struct drv_cmd *cmd)
> > {
> > - unsigned int i;
> > + unsigned int cpu;
> >
> > - for_each_cpu(i, cmd->mask) {
> > - work_on_cpu(i, do_drv_write, cmd);
> > - }
> > + for_each_cpu(cpu, cmd->mask)
> > + smp_call_function_single(cpu, do_drv_write, cmd, 1);
>
> Ok, that's just -wrong-.
>
> Doesn't anybody else see anything odd in doing
>
> for_each_cpu(cpu, cmd->mask)
> smp_call_function_single(cpu, ..);
>
> and react to it?
>
> IOW, why not just do
>
> smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
>
> here?

Uhm, yes. The ancient version did:

for_each_cpu_mask(i, cmd->mask) {
set_cpus_allowed(current, cpumask_of_cpu(i));
do_drv_write(cmd);
}

and we just kept that pattern while somewhat mindlessly converting
it around, and never realized how it collapses to a nice masked API
version when the right IPI-call primitive is used.

Ingo

2009-04-13 18:55:43

by Andrew Morton

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, 13 Apr 2009 10:45:45 -0700 (PDT)
Linus Torvalds <[email protected]> wrote:

>
>
> On Mon, 13 Apr 2009, Andrew Morton wrote:
> >
> > static void drv_write(struct drv_cmd *cmd)
> > {
> > - unsigned int i;
> > + unsigned int cpu;
> >
> > - for_each_cpu(i, cmd->mask) {
> > - work_on_cpu(i, do_drv_write, cmd);
> > - }
> > + for_each_cpu(cpu, cmd->mask)
> > + smp_call_function_single(cpu, do_drv_write, cmd, 1);
>
> Ok, that's just -wrong-.
>
> Doesn't anybody else see anything odd in doing
>
> for_each_cpu(cpu, cmd->mask)
> smp_call_function_single(cpu, ..);
>
> and react to it?
>
> IOW, why not just do
>
> smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
>
> here?
>

Didn't know it existed :(

2009-04-13 19:03:29

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

Andrew Morton wrote:
> On Mon, 13 Apr 2009 10:45:45 -0700 (PDT)
> Linus Torvalds <[email protected]> wrote:
>
>
>> IOW, why not just do
>>
>> smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
>>
>> here?
>>
>>
>
> Didn't know it existed :(
>

It's relatively new compared to the rest of the smp_call_function_* family.

J

2009-04-13 19:06:33

by Dave Jones

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, Apr 13, 2009 at 11:49:32AM -0700, Andrew Morton wrote:
> On Mon, 13 Apr 2009 10:45:45 -0700 (PDT)
> >
> > IOW, why not just do
> >
> > smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
> >
> > here?
> >
>
> Didn't know it existed :(

heh. I'm going to run with that excuse too.
Remember the days when you could fit the kernels API in your head? :)

Dave

2009-04-13 19:31:39

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, 13 Apr 2009 10:27:49 PDT, Andrew Morton said:

> From: Andrew Morton <[email protected]>
>
> Atttempting to rid us of the problematic work_on_cpu(). Just use
> smp_call_fuction_single() here.
>
> This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
> due to
>
> commit 6b44003e5ca66a3fffeb5bc90f40ada2c4340896
> Author: Andrew Morton <[email protected]>
> Date: Thu Apr 9 09:50:37 2009 -0600
>
> work_on_cpu(): rewrite it to create a kernel thread on demand
>
> It seems that the kernel calls these acpi-cpufreq functions at a quite
> high frequency.

I put the one reverted commit back on (making it back into what was stock
mmotm-0410 that had the problem) and then applied this patch. I'm seeing
near-zero forks/second again, the CPU is at 1Ghz when idling (as it should
be), if I kick off 2 'for(;;)' cycle-suckers it goes to 2Ghz, And it even
puts 1 CPU at 1Ghz and the other at 2Ghz if there's only one running - I didn't
know a Core2 Duo could do that. ;)

So life is good again. ;)


Attachments:
(No filename) (226.00 B)

2009-04-13 19:42:23

by Ingo Molnar

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu


* Dave Jones <[email protected]> wrote:

> On Mon, Apr 13, 2009 at 11:49:32AM -0700, Andrew Morton wrote:
> > On Mon, 13 Apr 2009 10:45:45 -0700 (PDT)
> > >
> > > IOW, why not just do
> > >
> > > smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
> > >
> > > here?
> > >
> >
> > Didn't know it existed :(
>
> heh. I'm going to run with that excuse too.
> Remember the days when you could fit the kernels API in your head? :)

smp_call_function_many() is the modern mask-pointery version of
smp_call_function_mask() - which is a pretty ancient API.

Ingo

2009-04-13 23:57:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu



So I applied this (commit 01599fca6758d2cd133e78f87426fc851c9ea725:
"cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c"), but
just realized - because of a compiler warning - that this looks
suspicious:

On Mon, 13 Apr 2009, Andrew Morton wrote:
> @@ -283,7 +280,7 @@ static unsigned int get_measured_perf(st
> unsigned int perf_percent;
> unsigned int retval;
>
> - if (!work_on_cpu(cpu, read_measured_perf_ctrs, &readin))
> + if (smp_call_function_single(cpu, read_measured_perf_ctrs, &cur, 1))
> return 0;
>
> cur.aperf.whole = readin.aperf.whole -

How and why did that "read_measured_perf_ctrs, &readin" become
"read_measured_perf_ctrs, &cur" when the work_on_cpu() was converted to
"smp_call_function_single()"?

Looks like a bug. But such an odd one that I wonder whether there was some
thought behind it? Andrew?

Linus

2009-04-14 00:38:14

by Andrew Morton

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Mon, 13 Apr 2009 16:50:45 -0700 (PDT) Linus Torvalds <[email protected]> wrote:

>
>
> So I applied this (commit 01599fca6758d2cd133e78f87426fc851c9ea725:
> "cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c"), but
> just realized - because of a compiler warning - that this looks
> suspicious:
>
> On Mon, 13 Apr 2009, Andrew Morton wrote:
> > @@ -283,7 +280,7 @@ static unsigned int get_measured_perf(st
> > unsigned int perf_percent;
> > unsigned int retval;
> >
> > - if (!work_on_cpu(cpu, read_measured_perf_ctrs, &readin))
> > + if (smp_call_function_single(cpu, read_measured_perf_ctrs, &cur, 1))
> > return 0;
> >
> > cur.aperf.whole = readin.aperf.whole -
>
> How and why did that "read_measured_perf_ctrs, &readin" become
> "read_measured_perf_ctrs, &cur" when the work_on_cpu() was converted to
> "smp_call_function_single()"?
>
> Looks like a bug. But such an odd one that I wonder whether there was some
> thought behind it? Andrew?
>

<scratches head>

OK, the acpi tree went and had conflicting changes merged into it after
I'd written the patch:

@@ -281,52 +279,57 @@ static long read_measured_perf_ctrs(void
static unsigned int get_measured_perf(struct cpufreq_policy *policy,
unsigned int cpu)
{
- struct perf_cur cur;
+ struct perf_pair readin, cur;
unsigned int perf_percent;
unsigned int retval;

- if (!work_on_cpu(cpu, read_measured_perf_ctrs, &cur))
+ if (!work_on_cpu(cpu, read_measured_perf_ctrs, &readin))
return 0;

+ cur.aperf.whole = readin.aperf.whole -
+ per_cpu(drv_data, cpu)->saved_aperf;
+ cur.mperf.whole = readin.mperf.whole -
+ per_cpu(drv_data, cpu)->saved_mperf;
+ per_cpu(drv_data, cpu)->saved_aperf = readin.aperf.whole;
+ per_cpu(drv_data, cpu)->saved_mperf = readin.mperf.whole;
+

and it appears that I incorrectly reverted part of
18b2646fe3babeb40b34a0c1751e0bf5adfdc64c while fixing the resulting
rejects.

Switching it to `readin' looks correct.

2009-04-14 12:42:57

by Rusty Russell

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Tue, 14 Apr 2009 01:34:23 am Linus Torvalds wrote:
> So the _real_ problem came through the commits like
>
> cpufreq: use work_on_cpu in acpi-cpufreq.c for drv_read and drv_write
> cpumask: use work_on_cpu in acpi-cpufreq.c for read_measured_perf_ctrs
>
> that were meant to reduce stack usage with big cpu masks. And sure, the
> _old_ way of doing it was also stupid (it rescheduled the process to the
> other CPU by using cpus_allowed()).

Reducing stack was main motivation, but old way was actually wrong: not only
can userspace see the affinity change, it can mess it up by setting it at the
same time.

It used to be reasonably quick, but forking a thread (to prevent locking
problems with keventd and yet avoid YA 1-thread-per-cpu) made it worse.

I'm no expert, but would life be more pleasant if the core cpufreq code
called these methods bound to an appropriate CPU? They all seem to do these
tricks...

Cheers,
Rusty.

2009-04-15 08:16:09

by Ali Gholami Rudi

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

Hi,

In today's tip (v2.6.30-rc2), when my cpu is idle (and the ondemand
governor correctly uses the lowest frequency) the temperature of my CPU
rises to above 50^C till the fan turns on (it used to be about 40^C
before). Git bisect points to this patch:

commit 01599fca6758d2cd133e78f87426fc851c9ea725
Author: Andrew Morton <[email protected]>
Date: Mon Apr 13 10:27:49 2009 -0700

cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c

Atttempting to rid us of the problematic work_on_cpu(). Just use
smp_call_fuction_single() here.

This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
due to

commit 6b44003e5ca66a3fffeb5bc90f40ada2c4340896
Author: Andrew Morton <[email protected]>
Date: Thu Apr 9 09:50:37 2009 -0600

work_on_cpu(): rewrite it to create a kernel thread on demand

It seems that the kernel calls these acpi-cpufreq functions at a quite
high frequency.

Valdis Kletnieks also reports that this causes 70-90 forks per second on
his hardware.

Cc: [email protected]
Cc: Rusty Russell <[email protected]>
Cc: Venkatesh Pallipadi <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Zhao Yakui <[email protected]>
Acked-by: Dave Jones <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Tested-by: Mike Galbraith <[email protected]>
Cc: "Zhang, Yanmin" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
[ Made it use smp_call_function_many() instead of looping over cpu's
with smp_call_function_single() - Linus ]
Signed-off-by: Linus Torvalds <[email protected]>

Should I include more info?

Regards,
Ali

2009-04-15 08:42:21

by Andrew Morton

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

On Wed, 15 Apr 2009 12:45:34 +0430 Ali Gholami Rudi <[email protected]> wrote:

> Hi,
>
> In today's tip (v2.6.30-rc2), when my cpu is idle (and the ondemand
> governor correctly uses the lowest frequency) the temperature of my CPU
> rises to above 50^C till the fan turns on (it used to be about 40^C
> before). Git bisect points to this patch:
>
> commit 01599fca6758d2cd133e78f87426fc851c9ea725
> Author: Andrew Morton <[email protected]>
> Date: Mon Apr 13 10:27:49 2009 -0700
>
> cpufreq: use smp_call_function_[single|many]() in acpi-cpufreq.c
>
> Atttempting to rid us of the problematic work_on_cpu(). Just use
> smp_call_fuction_single() here.
>
> This repairs a 10% sysbench(oltp)+mysql regression which Mike reported,
> due to
>
> commit 6b44003e5ca66a3fffeb5bc90f40ada2c4340896
> Author: Andrew Morton <[email protected]>
> Date: Thu Apr 9 09:50:37 2009 -0600
>
> work_on_cpu(): rewrite it to create a kernel thread on demand
>
> It seems that the kernel calls these acpi-cpufreq functions at a quite
> high frequency.
>
> Valdis Kletnieks also reports that this causes 70-90 forks per second on
> his hardware.
>
> Cc: [email protected]
> Cc: Rusty Russell <[email protected]>
> Cc: Venkatesh Pallipadi <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Zhao Yakui <[email protected]>
> Acked-by: Dave Jones <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Tested-by: Mike Galbraith <[email protected]>
> Cc: "Zhang, Yanmin" <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> Acked-by: Ingo Molnar <[email protected]>
> [ Made it use smp_call_function_many() instead of looping over cpu's
> with smp_call_function_single() - Linus ]

<stares suspiciously at smp_call_function_many()>

* smp_call_function_many(): Run a function on a set of other CPUs.

"other". It refuses to call the function on *this* CPU. Tricky.

Does this fix it up?

--- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c~a
+++ a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
@@ -204,7 +204,10 @@ static void drv_read(struct drv_cmd *cmd

static void drv_write(struct drv_cmd *cmd)
{
- smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
+ unsigned cpu;
+
+ for_each_cpu(cpu, cmd->mask)
+ smp_call_function_single(cpu, do_drv_write, cmd, 1);
}

static u32 get_cur_val(const struct cpumask *mask)
_

2009-04-15 09:09:14

by Ali Gholami Rudi

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu

Andrew Morton <[email protected]> wrote:
> > [ Made it use smp_call_function_many() instead of looping over cpu's
> > with smp_call_function_single() - Linus ]
>
> <stares suspiciously at smp_call_function_many()>
>
> * smp_call_function_many(): Run a function on a set of other CPUs.
>
> "other". It refuses to call the function on *this* CPU. Tricky.
>
> Does this fix it up?
>
> --- a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c~a
> +++ a/arch/x86/kernel/cpu/cpufreq/acpi-cpufreq.c
> @@ -204,7 +204,10 @@ static void drv_read(struct drv_cmd *cmd
>
> static void drv_write(struct drv_cmd *cmd)
> {
> - smp_call_function_many(cmd->mask, do_drv_write, cmd, 1);
> + unsigned cpu;
> +
> + for_each_cpu(cpu, cmd->mask)
> + smp_call_function_single(cpu, do_drv_write, cmd, 1);
> }
>
> static u32 get_cur_val(const struct cpumask *mask)
> _

Yes, it does fix it.

Thanks,
Ali

2009-04-15 14:52:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: mmotm 2009-04-10-02-21 uploaded - forkbombed by work_for_cpu



On Wed, 15 Apr 2009, Andrew Morton wrote:
>
> <stares suspiciously at smp_call_function_many()>
>
> * smp_call_function_many(): Run a function on a set of other CPUs.
>
> "other". It refuses to call the function on *this* CPU. Tricky.

.. Argh. And totally different from all the other smp_call_function's. In
smp_call_function_single(), for example, we literally test

if (cpu == this_cpu) {
local_irq_save(flags);
func(info);
local_irq_restore(flags);
} else {
.. do the cross-call ..

so I think this is just smp_call_function_many() breakage.

In fact, right now the PPC flush_tlb_page() does that insane dance just
because of this issue. So yes, there are a few current users, and they
seem to dislike the bad semantics (the kvm code doesn't care).

Duh duh duh.

Linus