2009-10-03 05:56:49

by Len Brown

[permalink] [raw]
Subject: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

Hi Linus,

please pull from:

git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git acpi-pad

This will add the ACPI Processor Aggregator Driver to the kernel.
acpi_pad implements a new ACPI feature where the baseboard management
controller can ask Linux to force busy processors to stay idle.

The BMC does this under dire electrical or thermal conditions
after it has already lowered the P-state to low frequencey mode
on all processors, yet the emergency persists. This offline
technique is used before employing even more invasive measures,
such as taking T-states down to 1/8th of LFM, or powering-off
the server completely.

This driver will have no effect on the installed base,
no no old systems implement this new feature. However, this
simle driver in Linux will be useful as this feature
is deployed by OEMs on new hardware.

As I mentioned previously, this is an ACPI patch, not a scheduler patch.
PeterZ is hoping to implement a more sophisticated method for forcing
idle time in the system with the scheduler, and when that is available,
I'll be delighted to update this driver to invoke it.

thanks!

--
Len Brown
Intel Open Source Technology Center


ps. individual patches are available on [email protected]
and a consolidated plain patch is available here:
http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/2.6.31/acpi-acpi-pad-20090521-2.6.31-rc4.diff.gz

MAINTAINERS | 8 +
drivers/acpi/Kconfig | 12 +
drivers/acpi/Makefile | 2 +
drivers/acpi/acpi_pad.c | 514 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 536 insertions(+), 0 deletions(-)
create mode 100644 drivers/acpi/acpi_pad.c

through these commits:

Len Brown (1):
acpi_pad: build only on X86

Shaohua Li (1):
ACPI: create Processor Aggregator Device driver

with this log:

commit d91f79ebc3191b15dbe385925af4840f4e68df77
Author: Len Brown <[email protected]>
Date: Sun Sep 27 02:35:55 2009 -0400

acpi_pad: build only on X86

X86_FEATURE_MWAIT doesn't exist on ia64...

Signed-off-by: Len Brown <[email protected]>

commit 8e0af5141ab950b78b3ebbfaded5439dcf8b3a8d
Author: Shaohua Li <[email protected]>
Date: Mon Jul 27 18:11:02 2009 -0400

ACPI: create Processor Aggregator Device driver

ACPI 4.0 created the logical "processor aggregator device" as
a mechinism for platforms to ask the OS to force otherwise busy
processors to enter (power saving) idle.

The intent is to lower power consumption to ride-out
transient electrical and thermal emergencies,
rather than powering off the server.

On platforms that can save more power/performance via P-states,
the platform will first exhaust P-states before forcing idle.
However, the relative benefit of P-states vs. idle states
is platform dependent, and thus this driver need not know
or care about it.

This driver does not use the kernel's CPU hot-plug mechanism
because after the transient emergency is over, the system must
be returned to its normal state, and hotplug would permanently
break both cpusets and binding.

So to force idle, the driver creates a power saving thread.
The scheduler will migrate the thread to the preferred CPU.
The thread has max priority and has SCHED_RR policy,
so it can occupy one CPU. To save power, the thread will
invoke the deep C-state entry instructions.

To avoid starvation, the thread will sleep 5% of the time
time for every second (current RT scheduler has threshold
to avoid starvation, but if other CPUs are idle,
the CPU can borrow CPU timer from other,
which makes the mechanism not work here)

Vaidyanathan Srinivasan has proposed scheduler enhancements
to allow injecting idle time into the system. This driver doesn't
depend on those enhancements, but could cut over to them
when they are available.

Peter Z. does not favor upstreaming this driver until
the those scheduler enhancements are in place. However,
we favor upstreaming this driver now because it is useful
now, and can be enhanced over time.

Signed-off-by: Shaohua Li <[email protected]>
NACKed-by: Peter Zijlstra <[email protected]>
Cc: Vaidyanathan Srinivasan <[email protected]>
Signed-off-by: Len Brown <[email protected]>


2009-10-03 20:48:38

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Saturday 03 October 2009, Len Brown wrote:
> Hi Linus,
>
> please pull from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6.git acpi-pad
>
> This will add the ACPI Processor Aggregator Driver to the kernel.
> acpi_pad implements a new ACPI feature where the baseboard management
> controller can ask Linux to force busy processors to stay idle.
>
> The BMC does this under dire electrical or thermal conditions
> after it has already lowered the P-state to low frequencey mode
> on all processors, yet the emergency persists. This offline
> technique is used before employing even more invasive measures,
> such as taking T-states down to 1/8th of LFM, or powering-off
> the server completely.
>
> This driver will have no effect on the installed base,
> no no old systems implement this new feature. However, this
> simle driver in Linux will be useful as this feature
> is deployed by OEMs on new hardware.
>
> As I mentioned previously, this is an ACPI patch, not a scheduler patch.
> PeterZ is hoping to implement a more sophisticated method for forcing
> idle time in the system with the scheduler, and when that is available,
> I'll be delighted to update this driver to invoke it.
>
> thanks!
>
> --
> Len Brown
> Intel Open Source Technology Center
>
>
> ps. individual patches are available on [email protected]
> and a consolidated plain patch is available here:
> http://ftp.kernel.org/pub/linux/kernel/people/lenb/acpi/patches/2.6.31/acpi-acpi-pad-20090521-2.6.31-rc4.diff.gz
>
> MAINTAINERS | 8 +
> drivers/acpi/Kconfig | 12 +
> drivers/acpi/Makefile | 2 +
> drivers/acpi/acpi_pad.c | 514 +++++++++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 536 insertions(+), 0 deletions(-)
> create mode 100644 drivers/acpi/acpi_pad.c
>
> through these commits:
>
> Len Brown (1):
> acpi_pad: build only on X86
>
> Shaohua Li (1):
> ACPI: create Processor Aggregator Device driver
>
> with this log:
>
> commit d91f79ebc3191b15dbe385925af4840f4e68df77
> Author: Len Brown <[email protected]>
> Date: Sun Sep 27 02:35:55 2009 -0400
>
> acpi_pad: build only on X86
>
> X86_FEATURE_MWAIT doesn't exist on ia64...
>
> Signed-off-by: Len Brown <[email protected]>
>
> commit 8e0af5141ab950b78b3ebbfaded5439dcf8b3a8d
> Author: Shaohua Li <[email protected]>
> Date: Mon Jul 27 18:11:02 2009 -0400
>
> ACPI: create Processor Aggregator Device driver

Can you please point me to the mailing list thread there this driver was
posted (and possibly reviewed and discussed)?

Best,
Rafael

2009-10-05 03:33:42

by Balbir Singh

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

* Len Brown <[email protected]> [2009-10-03 01:56:32]:

> This driver does not use the kernel's CPU hot-plug mechanism
> because after the transient emergency is over, the system must
> be returned to its normal state, and hotplug would permanently
> break both cpusets and binding.
>

Why does hotplug break cpusets and binding?


> So to force idle, the driver creates a power saving thread.
> The scheduler will migrate the thread to the preferred CPU.
> The thread has max priority and has SCHED_RR policy,
> so it can occupy one CPU. To save power, the thread will
> invoke the deep C-state entry instructions.
>
> To avoid starvation, the thread will sleep 5% of the time
> time for every second (current RT scheduler has threshold
> to avoid starvation, but if other CPUs are idle,
> the CPU can borrow CPU timer from other,
> which makes the mechanism not work here)
>
> Vaidyanathan Srinivasan has proposed scheduler enhancements
> to allow injecting idle time into the system. This driver doesn't
> depend on those enhancements, but could cut over to them
> when they are available.
>
> Peter Z. does not favor upstreaming this driver until
> the those scheduler enhancements are in place. However,
> we favor upstreaming this driver now because it is useful
> now, and can be enhanced over time.
>
> Signed-off-by: Shaohua Li <[email protected]>
> NACKed-by: Peter Zijlstra <[email protected]>
> Cc: Vaidyanathan Srinivasan <[email protected]>
> Signed-off-by: Len Brown <[email protected]>

This is a first a patch with a NACKed-by, could we please have more
discussion on the proposed design.

--
Balbir

2009-10-05 05:34:00

by Vaidyanathan Srinivasan

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

* Balbir Singh <[email protected]> [2009-10-05 09:02:56]:

> * Len Brown <[email protected]> [2009-10-03 01:56:32]:
>
> > This driver does not use the kernel's CPU hot-plug mechanism
> > because after the transient emergency is over, the system must
> > be returned to its normal state, and hotplug would permanently
> > break both cpusets and binding.
> >
>
> Why does hotplug break cpusets and binding?

CPU hotplug will break any cpuset that is setup with that cpu.
Userspace needs to register for notifications and redo the cpusets to
get desired behaviour. Since these emergencies are short durations,
these may require constant re-creation of cpusets leading to an
administrative overhead and added complexity.

Hence Peter had suggested a transparent method to reduce system
capacity and not really knock out a single cpu. The goal is to run
something like 6 cpus at a time in an 8 cpu system without starving
any single cpu. Also real time jobs should still allowed to be run
since they will generally be very short running.

In this patch series, the main design issue was about running
a forced-idle thread that may affect scheduling fairness.

> > So to force idle, the driver creates a power saving thread.
> > The scheduler will migrate the thread to the preferred CPU.
> > The thread has max priority and has SCHED_RR policy,
> > so it can occupy one CPU. To save power, the thread will
> > invoke the deep C-state entry instructions.
> >
> > To avoid starvation, the thread will sleep 5% of the time
> > time for every second (current RT scheduler has threshold
> > to avoid starvation, but if other CPUs are idle,
> > the CPU can borrow CPU timer from other,
> > which makes the mechanism not work here)
> >
> > Vaidyanathan Srinivasan has proposed scheduler enhancements
> > to allow injecting idle time into the system. This driver doesn't
> > depend on those enhancements, but could cut over to them
> > when they are available.
> >
> > Peter Z. does not favor upstreaming this driver until
> > the those scheduler enhancements are in place. However,
> > we favor upstreaming this driver now because it is useful
> > now, and can be enhanced over time.
> >
> > Signed-off-by: Shaohua Li <[email protected]>
> > NACKed-by: Peter Zijlstra <[email protected]>
> > Cc: Vaidyanathan Srinivasan <[email protected]>
> > Signed-off-by: Len Brown <[email protected]>
>
> This is a first a patch with a NACKed-by, could we please have more
> discussion on the proposed design.

This is the most recent reference I have:
http://marc.info/?l=linux-acpi&m=124650086915649&w=2

--Vaidy

2009-10-05 07:16:23

by Balbir Singh

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

* Vaidy <[email protected]> [2009-10-05 11:03:10]:

> * Balbir Singh <[email protected]> [2009-10-05 09:02:56]:
>
> > * Len Brown <[email protected]> [2009-10-03 01:56:32]:
> >
> > > This driver does not use the kernel's CPU hot-plug mechanism
> > > because after the transient emergency is over, the system must
> > > be returned to its normal state, and hotplug would permanently
> > > break both cpusets and binding.
> > >
> >
> > Why does hotplug break cpusets and binding?
>
> CPU hotplug will break any cpuset that is setup with that cpu.
> Userspace needs to register for notifications and redo the cpusets to
> get desired behaviour. Since these emergencies are short durations,
> these may require constant re-creation of cpusets leading to an
> administrative overhead and added complexity.
>

My concern is that the alternative is even harsher, you can end up
with starvation, unless we explicitly migrate all tasks away from that
CPU (I need to see the patches again to see if they do that).

> Hence Peter had suggested a transparent method to reduce system
> capacity and not really knock out a single cpu. The goal is to run
> something like 6 cpus at a time in an 8 cpu system without starving
> any single cpu. Also real time jobs should still allowed to be run
> since they will generally be very short running.
>
> In this patch series, the main design issue was about running
> a forced-idle thread that may affect scheduling fairness.
>

Not to mention that they put the scheduler and available CPUs or the
notion of them (out of sync).

> > > So to force idle, the driver creates a power saving thread.
> > > The scheduler will migrate the thread to the preferred CPU.
> > > The thread has max priority and has SCHED_RR policy,
> > > so it can occupy one CPU. To save power, the thread will
> > > invoke the deep C-state entry instructions.
> > >
> > > To avoid starvation, the thread will sleep 5% of the time
> > > time for every second (current RT scheduler has threshold
> > > to avoid starvation, but if other CPUs are idle,
> > > the CPU can borrow CPU timer from other,
> > > which makes the mechanism not work here)
> > >
> > > Vaidyanathan Srinivasan has proposed scheduler enhancements
> > > to allow injecting idle time into the system. This driver doesn't
> > > depend on those enhancements, but could cut over to them
> > > when they are available.
> > >
> > > Peter Z. does not favor upstreaming this driver until
> > > the those scheduler enhancements are in place. However,
> > > we favor upstreaming this driver now because it is useful
> > > now, and can be enhanced over time.
> > >
> > > Signed-off-by: Shaohua Li <[email protected]>
> > > NACKed-by: Peter Zijlstra <[email protected]>
> > > Cc: Vaidyanathan Srinivasan <[email protected]>
> > > Signed-off-by: Len Brown <[email protected]>
> >
> > This is a first a patch with a NACKed-by, could we please have more
> > discussion on the proposed design.
>
> This is the most recent reference I have:
> http://marc.info/?l=linux-acpi&m=124650086915649&w=2
>

Thanks for the link.

--
Balbir

2009-10-05 19:58:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Monday 05 October 2009, Balbir Singh wrote:
> * Len Brown <[email protected]> [2009-10-03 01:56:32]:
>
> > This driver does not use the kernel's CPU hot-plug mechanism
> > because after the transient emergency is over, the system must
> > be returned to its normal state, and hotplug would permanently
> > break both cpusets and binding.
> >
>
> Why does hotplug break cpusets and binding?
>
>
> > So to force idle, the driver creates a power saving thread.
> > The scheduler will migrate the thread to the preferred CPU.
> > The thread has max priority and has SCHED_RR policy,
> > so it can occupy one CPU. To save power, the thread will
> > invoke the deep C-state entry instructions.
> >
> > To avoid starvation, the thread will sleep 5% of the time
> > time for every second (current RT scheduler has threshold
> > to avoid starvation, but if other CPUs are idle,
> > the CPU can borrow CPU timer from other,
> > which makes the mechanism not work here)
> >
> > Vaidyanathan Srinivasan has proposed scheduler enhancements
> > to allow injecting idle time into the system. This driver doesn't
> > depend on those enhancements, but could cut over to them
> > when they are available.
> >
> > Peter Z. does not favor upstreaming this driver until
> > the those scheduler enhancements are in place. However,
> > we favor upstreaming this driver now because it is useful
> > now, and can be enhanced over time.
> >
> > Signed-off-by: Shaohua Li <[email protected]>
> > NACKed-by: Peter Zijlstra <[email protected]>
> > Cc: Vaidyanathan Srinivasan <[email protected]>
> > Signed-off-by: Len Brown <[email protected]>
>
> This is a first a patch with a NACKed-by, could we please have more
> discussion on the proposed design.

This thing has already been merged, it appears:

commit 8e0af5141ab950b78b3ebbfaded5439dcf8b3a8d
Author: Shaohua Li <[email protected]>
Date: Mon Jul 27 18:11:02 2009 -0400

ACPI: create Processor Aggregator Device driver

and it looks like a total breakage of rules to me.

Thanks,
Rafael

2009-10-05 20:40:18

by Balbir Singh

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

* Rafael J. Wysocki <[email protected]> [2009-10-05 21:59:24]:

> On Monday 05 October 2009, Balbir Singh wrote:
> > * Len Brown <[email protected]> [2009-10-03 01:56:32]:
> >
> > > This driver does not use the kernel's CPU hot-plug mechanism
> > > because after the transient emergency is over, the system must
> > > be returned to its normal state, and hotplug would permanently
> > > break both cpusets and binding.
> > >
> >
> > Why does hotplug break cpusets and binding?
> >
> >
> > > So to force idle, the driver creates a power saving thread.
> > > The scheduler will migrate the thread to the preferred CPU.
> > > The thread has max priority and has SCHED_RR policy,
> > > so it can occupy one CPU. To save power, the thread will
> > > invoke the deep C-state entry instructions.
> > >
> > > To avoid starvation, the thread will sleep 5% of the time
> > > time for every second (current RT scheduler has threshold
> > > to avoid starvation, but if other CPUs are idle,
> > > the CPU can borrow CPU timer from other,
> > > which makes the mechanism not work here)
> > >
> > > Vaidyanathan Srinivasan has proposed scheduler enhancements
> > > to allow injecting idle time into the system. This driver doesn't
> > > depend on those enhancements, but could cut over to them
> > > when they are available.
> > >
> > > Peter Z. does not favor upstreaming this driver until
> > > the those scheduler enhancements are in place. However,
> > > we favor upstreaming this driver now because it is useful
> > > now, and can be enhanced over time.
> > >
> > > Signed-off-by: Shaohua Li <[email protected]>
> > > NACKed-by: Peter Zijlstra <[email protected]>
> > > Cc: Vaidyanathan Srinivasan <[email protected]>
> > > Signed-off-by: Len Brown <[email protected]>
> >
> > This is a first a patch with a NACKed-by, could we please have more
> > discussion on the proposed design.
>
> This thing has already been merged, it appears:
>

Yeah.. just saw it in my kernel compile today!

> commit 8e0af5141ab950b78b3ebbfaded5439dcf8b3a8d
> Author: Shaohua Li <[email protected]>
> Date: Mon Jul 27 18:11:02 2009 -0400
>
> ACPI: create Processor Aggregator Device driver
>
> and it looks like a total breakage of rules to me.
>
> Thanks,
> Rafael

--
Balbir

2009-10-05 20:58:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1



On Mon, 5 Oct 2009, Rafael J. Wysocki wrote:
>
> This thing has already been merged, it appears: and it looks like a
> total breakage of rules to me.

Well, Len pointed out to me that the NAK is kind of pointless, since it
had no constructive alternatives to the issue. So he left it in as
documentation, but until the scheduler people can actually _do_ something
about the problem, their voice doesn't really matter, does it?

Linus

2009-10-05 21:06:05

by Andrew Morton

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Mon, 5 Oct 2009 21:59:24 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> > > Signed-off-by: Shaohua Li <[email protected]>
> > > NACKed-by: Peter Zijlstra <[email protected]>
> > > Cc: Vaidyanathan Srinivasan <[email protected]>
> > > Signed-off-by: Len Brown <[email protected]>
> >
> > This is a first a patch with a NACKed-by, could we please have more
> > discussion on the proposed design.
>
> This thing has already been merged, it appears:
>
> commit 8e0af5141ab950b78b3ebbfaded5439dcf8b3a8d
> Author: Shaohua Li <[email protected]>
> Date: Mon Jul 27 18:11:02 2009 -0400
>
> ACPI: create Processor Aggregator Device driver
>
> and it looks like a total breakage of rules to me.

To me also.

I'm not a great believer in "rules". They're only guidelines based on
past experience and an expectation that prior experience is a predictor
of future outcomes. Sometimes, in some cases that prediction breaks down,
so we should ignore the "rules". But this does not appear to be such a
case.

The driver was merged over Peter's strenuous and reasonable-sounding
objections. Partly because of:

On Fri, 26 Jun 2009 12:46:53 -0400 (EDT)
Len Brown <[email protected]> wrote:

> ...
> We'd like to ship the forced-idle thread as a self-contained driver,
> if possilbe. Because that would enable us to easily back-port it
> to some enterprise releases that want the feature. So if we can
> implement this such that it is functional with existing scheduler
> facilities, that would be get us by. If the scheduler evolves
> and provides a more optimal mechanism in the future, then that is
> great, as long as we don't have to wait for that to provide
> the basic version of the feature.

Problem is, that plan doesn't work. If in 2.6.33 we add the new
scheduler capabilities and then convert this driver to utilise them
then the enterprise releases don't really have a backported driver any
more. They own some special thing which was cherrypicked in a
mid-development stage from 2.6.32. And future enhancement or fixing of
that driver is not applicable to the enterprise kernel's version. So a
large part of the reason for preferring to use backported mainline
features will be invalidated.


All that being said, I don't see a lot of gain in reverting the driver
now. From the mainline kernel's POV we make the scheduler changes,
alter the driver and then proceed happily. The thus-stranded
enterprise kernel people are then somewhat screwed, but what can we do?
Apart from asking "please don't do this again".



Technical question: the overall feature, which I'd describe as
"shutting down CPUs when an external agent tells us the
thermal/electrical/other load is too high" is not at all specific to
the x86 CPU. Should the code have been designed in such a way as to
permit other architectures to play?

2009-10-05 21:51:07

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Monday 05 October 2009, Linus Torvalds wrote:
>
> On Mon, 5 Oct 2009, Rafael J. Wysocki wrote:
> >
> > This thing has already been merged, it appears: and it looks like a
> > total breakage of rules to me.
>
> Well, Len pointed out to me that the NAK is kind of pointless, since it
> had no constructive alternatives to the issue. So he left it in as
> documentation, but until the scheduler people can actually _do_ something
> about the problem, their voice doesn't really matter, does it?

Well, for a patch that was objected to so strongly, I think it didn't get
enough review from other relevant people before being pushed upstream.
It looks like Balbir didn't see it before for one example.

It's been lots of time since the patch was originally posted to send it to
the LKML for discussion and so on and to receive some comments that
might help to improve it. I have no idea why that wasn't done and I suspect
there was some corporate pressure on Len to push it upstream as quickly as
possible.

Best,
Rafael

2009-10-05 22:18:59

by Len Brown

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Mon, 5 Oct 2009, Rafael J. Wysocki wrote:

> On Monday 05 October 2009, Linus Torvalds wrote:
> >
> > On Mon, 5 Oct 2009, Rafael J. Wysocki wrote:
> > >
> > > This thing has already been merged, it appears: and it looks like a
> > > total breakage of rules to me.
> >
> > Well, Len pointed out to me that the NAK is kind of pointless, since it
> > had no constructive alternatives to the issue. So he left it in as
> > documentation, but until the scheduler people can actually _do_ something
> > about the problem, their voice doesn't really matter, does it?
>
> Well, for a patch that was objected to so strongly, I think it didn't get
> enough review from other relevant people before being pushed upstream.

Originally we proposed various "cute" scheduler tricks to solve this
problem. Peter objected strongly to all of them.

So the patch evolved into something quite simple that
doesn't touch the scheduler at all. However, Peter's
objection to the concept never went away.

> It looks like Balbir didn't see it before for one example.

Surprising, since Balbir is presumably in the same group
as Vaidy at IBM, who is well informed on this topic
and this thread.

> It's been lots of time since the patch was originally posted to send it to
> the LKML for discussion and so on and to receive some comments that
> might help to improve it.

There was no more actionable feedback on the patch.
The latest version was checked in in July, and has
been waiting for the 2.6.32 merge window ever since.

I don't claim that any code in Linux is either
perfect or permanent, but I think this driver is
useful and thus deserved to be upstream where it
can be useful to the widest group of customers.

I'll be happy to have it re-written three times
if Linux evolved to supply more sophisticated
features to handle this situation.

> I have no idea why that wasn't done and I suspect
> there was some corporate pressure on Len to push it upstream as quickly as
> possible.

The driver has simply been waiting for the Linux merge window.

cheers,
-Len

2009-10-05 22:21:42

by Len Brown

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

> Technical question: the overall feature, which I'd describe as
> "shutting down CPUs when an external agent tells us the
> thermal/electrical/other load is too high" is not at all specific to
> the x86 CPU. Should the code have been designed in such a way as to
> permit other architectures to play?

I agree with Peter and Vaidy that a generic capability in the Linux
scheduler would be wonderful. But we don't have that today.

Re: this driver in particular...
acpi_pad accepts an ACPI event from the platform and then
does something with it.

Today, ACPI runs on just x86 and ia64, and I'm not aware
of any plans to implement this particularly feature on ia64 platforms.

cheers,
Len Brown, Intel Open Source Technology Center

2009-10-05 22:42:22

by Andrew Morton

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

On Mon, 05 Oct 2009 18:20:44 -0400 (EDT)
Len Brown <[email protected]> wrote:

> > Technical question: the overall feature, which I'd describe as
> > "shutting down CPUs when an external agent tells us the
> > thermal/electrical/other load is too high" is not at all specific to
> > the x86 CPU. Should the code have been designed in such a way as to
> > permit other architectures to play?
>
> I agree with Peter and Vaidy that a generic capability in the Linux
> scheduler would be wonderful. But we don't have that today.
>
> Re: this driver in particular...
> acpi_pad accepts an ACPI event from the platform and then
> does something with it.
>
> Today, ACPI runs on just x86 and ia64, and I'm not aware
> of any plans to implement this particularly feature on ia64 platforms.

The sysfs handling looks generic and the driver can clearly be split
into upper and lower layers, with all the ACPI specificity in the lower
one.

One would need to poll the arch maintainers to find out whether that's
a desirable thing to attempt at the intial stage.

If it _is_ desirable then there's now a risk that the interfaces and
possibly behaviour will change in non-back-compatible ways. Or they
will be stuck with ugly back-compatibility things

But it's all too late now, isn't it. This is the first time that
non-linux-acpi readers knew of the existence of this driver.

2009-10-05 23:25:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1



On Mon, 5 Oct 2009, Andrew Morton wrote:
>
> But it's all too late now, isn't it. This is the first time that
> non-linux-acpi readers knew of the existence of this driver.

Why do people even care? This driver is not going to be used in any
situation where any regular person will _ever_ care. And if you don't
like it, you don't have to enable it at all.

That driver is ACPI-specific, is not very complex, is marked EXPERIMENTAL,
and everybody including Len has said that _if_ the scheduler people can
ever solve it at that level, it would be good. But as it is, it has
NOTHING to do with the scheduler, and why _should_ any non-ACPI people
know about the existence of that driver?

In fact, the only reason the scheduler people even know about it is that
Len at first tried to do something more invasive, and was shot down. Now
it's just a driver, and the scheduler people can _try_ to do it some other
way if they really care, but that's _their_ problem. Not the driver.

In the meantime, I personally suspect we probably never want to even try
to solve it in the scheduler, because why the hell should we care and add
complex logic for something like that? At least not until we end up having
the same issue on some other architecture too, and it turns from a hacky
ACPI thing into something more.

As it is, the driver is entirely self-contained and doesn't affect any
other subsystem.

Linus

2009-10-06 01:29:20

by Len Brown

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

> ... we probably never want to even try
> to solve it in the scheduler, because why the hell should we care and add
> complex logic for something like that?

Today we take cores down to 100% idle, one at a time.
This is useful, but isn't the best we can do.

For we get zero "uncore" power savings.
As long as at least one core is active anywhere in the system,
all the uncores in all the packages in the system must remain active.

What a scheduler-based solution could do is
instead of taking, say, 1 of 64 cores down for 100%
of the period, it could take all 64 cores down
for 64th of the same period. This could get the hardware
into the deeper "package C-states", for a measurable
net power savings.

At the same time, this system-wide throttling may mitigate
some of the fairness/availability issues raised regarding
taking cores 100% off-line.

But doing this optimally will not be trivial.
The hardware must stay in the deep-sleep-states long enough
to make it worth the energy to enter and exit those states.
The hardware will flush the caches, having a performance
impact on all the cores. Device interrupts would prevent
the cores from sleeping, so they'd need to be somehow delayed
if we are to sleep long enough to make sleeping worth it etc.

cheers,
-Len Brown, Intel Open Source Technology Center

2009-10-06 09:16:50

by Balbir Singh

[permalink] [raw]
Subject: Re: [git pull request] ACPI Processor Aggregator Driver for 2.6.32-rc1

* Len Brown <[email protected]> [2009-10-05 21:28:18]:

> > ... we probably never want to even try
> > to solve it in the scheduler, because why the hell should we care and add
> > complex logic for something like that?
>
> Today we take cores down to 100% idle, one at a time.
> This is useful, but isn't the best we can do.
>
> For we get zero "uncore" power savings.
> As long as at least one core is active anywhere in the system,
> all the uncores in all the packages in the system must remain active.
>
> What a scheduler-based solution could do is
> instead of taking, say, 1 of 64 cores down for 100%
> of the period, it could take all 64 cores down
> for 64th of the same period. This could get the hardware
> into the deeper "package C-states", for a measurable
> net power savings.
>
> At the same time, this system-wide throttling may mitigate
> some of the fairness/availability issues raised regarding
> taking cores 100% off-line.
>
> But doing this optimally will not be trivial.
> The hardware must stay in the deep-sleep-states long enough
> to make it worth the energy to enter and exit those states.
> The hardware will flush the caches, having a performance
> impact on all the cores. Device interrupts would prevent
> the cores from sleeping, so they'd need to be somehow delayed
> if we are to sleep long enough to make sleeping worth it etc.
>

My concern is the side-effects of an approach like this. In the case
that was pointed out earlier about hotplug breaking CPUSets, this
solution can lead to starvation (single CPU in a cpuset). Most distros
will enable this feature right? So most end users will see this thread
running when ACPI 4.0 firmware decides to do thermal throttling.

--
Balbir