LinuxLists.cc - Updated throttling fix patchkit

2012-02-06 16:17:28

Subject: Updated throttling fix patchkit

Here's an updated patchkit to fix ACPI thermal zone throttling and some
related problems. See the individual patches for more details.

Changes to the earlier version:

- Address earlier review comments
- Fix package index computation
- Shut up Intel IPS driver

This is ready for merging now.

-Andi

2012-02-06 16:17:30

by Andi Kleen

[permalink] [raw]

Subject: [PATCH 2/5] ACPI: Make ACPI interrupt threaded

Some ACPI interrupt actions may need to wait, and it's easiest to
have a thread context for this. So turn the ACPI interrupt
into a threaded interrupt.

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/acpi/osl.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 412a1e0..f3f6ff7 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -595,7 +595,8 @@ acpi_os_install_interrupt_handler(u32 gsi, acpi_osd_handler handler,

acpi_irq_handler = handler;
acpi_irq_context = context;
- if (request_irq(irq, acpi_irq, IRQF_SHARED, "acpi", acpi_irq)) {
+ if (request_threaded_irq(irq, NULL, acpi_irq, IRQF_SHARED, "acpi",
+ acpi_irq)) {
printk(KERN_ERR PREFIX "SCI (IRQ%d) allocation failed\n", irq);
acpi_irq_handler = NULL;
return AE_NOT_ACQUIRED;
--
1.7.7

2012-02-06 16:17:27

by Andi Kleen

[permalink] [raw]

Subject: [PATCH 1/5] ACPI: ec: Do request_region outside WARN()

WARN() is not supposed to have side effects, so move the request_regions
outside.

Signed-off-by: Andi Kleen <[email protected]>
---
drivers/acpi/ec.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index b19a18d..3268dcf 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -812,10 +812,10 @@ static int acpi_ec_add(struct acpi_device *device)
first_ec = ec;
device->driver_data = ec;

- WARN(!request_region(ec->data_addr, 1, "EC data"),
- "Could not request EC data io port 0x%lx", ec->data_addr);
- WARN(!request_region(ec->command_addr, 1, "EC cmd"),
- "Could not request EC cmd io port 0x%lx", ec->command_addr);
+ ret = !!request_region(ec->data_addr, 1, "EC data");
+ WARN(!ret, "Could not request EC data io port 0x%lx", ec->data_addr);
+ ret = !!request_region(ec->command_addr, 1, "EC cmd");
+ WARN(!ret, "Could not request EC cmd io port 0x%lx", ec->command_addr);

pr_info(PREFIX "GPE = 0x%lx, I/O: command/status = 0x%lx, data = 0x%lx\n",
ec->gpe, ec->command_addr, ec->data_addr);
--
1.7.7

2012-02-06 16:18:03

by Andi Kleen

[permalink] [raw]

Subject: [PATCH 5/5] Disable MCP limit exceeded messages from Intel IPS driver

From: Andi Kleen <[email protected]>

On a system on the thermal limit these are quite noisy and flood the logs.
Better would be a counter anyways. But given that we don't even have
anything for normal throttling this doesn't seem to be urgent either.

Cc: [email protected]
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/platform/x86/intel_ips.c | 13 ++-----------
1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/platform/x86/intel_ips.c b/drivers/platform/x86/intel_ips.c
index 809a3ae..84bba52 100644
--- a/drivers/platform/x86/intel_ips.c
+++ b/drivers/platform/x86/intel_ips.c
@@ -620,25 +620,16 @@ static bool mcp_exceeded(struct ips_driver *ips)
bool ret = false;
u32 temp_limit;
u32 avg_power;
- const char *msg = "MCP limit exceeded: ";

spin_lock_irqsave(&ips->turbo_status_lock, flags);

temp_limit = ips->mcp_temp_limit * 100;
- if (ips->mcp_avg_temp > temp_limit) {
- dev_info(&ips->dev->dev,
- "%sAvg temp %u, limit %u\n", msg, ips->mcp_avg_temp,
- temp_limit);
+ if (ips->mcp_avg_temp > temp_limit)
ret = true;
- }

avg_power = ips->cpu_avg_power + ips->mch_avg_power;
- if (avg_power > ips->mcp_power_limit) {
- dev_info(&ips->dev->dev,
- "%sAvg power %u, limit %u\n", msg, avg_power,
- ips->mcp_power_limit);
+ if (avg_power > ips->mcp_power_limit)
ret = true;
- }

spin_unlock_irqrestore(&ips->turbo_status_lock, flags);

--
1.7.7

2012-02-06 16:18:26

by Andi Kleen

[permalink] [raw]

Subject: [PATCH 4/5] ACPI: Do cpufreq clamping for throttling per package v2

On Intel CPUs the processor typically uses the highest frequency
set by any logical CPU. When the system overheats
Linux first forces the frequency to the lowest available one
to lower the temperature.

However this was done only per logical CPU, which means all
logical CPUs in a package would need to go through this before
the frequency is actually lowered.

Worse this delay actually prevents real throttling, because
the real throttle code only proceeds when the lowest frequency
is already reached.

So when a throttle event happens force the lowest frequency
for all CPUs in the package where it happened. The per CPU
state is now kept per package, not per logical CPU. An alternative
would be to do it per cpufreq unit, but since we want to bring
down the temperature of the complete chip it's better
to do it for all.

In principle it may even make sense to do it for all CPUs,
but I kept it on the package for now.

With this change the frequency is actually lowered, which
in terms also allows real throttling to proceed.

I also removed an unnecessary per cpu variable initialization.

v2: Fix package mapping
Signed-off-by: Andi Kleen <[email protected]>
---
drivers/acpi/processor_thermal.c | 45 +++++++++++++++++++++++++++++++------
1 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/drivers/acpi/processor_thermal.c b/drivers/acpi/processor_thermal.c
index 3b599ab..b149b6e 100644
--- a/drivers/acpi/processor_thermal.c
+++ b/drivers/acpi/processor_thermal.c
@@ -57,6 +57,27 @@ ACPI_MODULE_NAME("processor_thermal");
static DEFINE_PER_CPU(unsigned int, cpufreq_thermal_reduction_pctg);
static unsigned int acpi_thermal_cpufreq_is_init = 0;

+#define reduction_pctg(cpu) \
+ per_cpu(cpufreq_thermal_reduction_pctg, phys_package_first_cpu(cpu))
+
+/*
+ * Emulate "per package data" using per cpu data (which should be really provided
+ * elsewhere)
+ *
+ * Note we can lose a CPU on cpu hotunplug, in this case we forget the state
+ * temporarily. Fortunately that's not a big issue here (I hope)
+ */
+static int phys_package_first_cpu(int cpu)
+{
+ int i;
+ int id = topology_physical_package_id(cpu);
+
+ for_each_online_cpu (i)
+ if (topology_physical_package_id(i) == id)
+ return i;
+ return 0;
+}
+
static int cpu_has_cpufreq(unsigned int cpu)
{
struct cpufreq_policy policy;
@@ -76,7 +97,7 @@ static int acpi_thermal_cpufreq_notifier(struct notifier_block *nb,

max_freq = (
policy->cpuinfo.max_freq *
- (100 - per_cpu(cpufreq_thermal_reduction_pctg, policy->cpu) * 20)
+ (100 - reduction_pctg(policy->cpu) * 20)
) / 100;

cpufreq_verify_within_limits(policy, 0, max_freq);
@@ -102,16 +123,28 @@ static int cpufreq_get_cur_state(unsigned int cpu)
if (!cpu_has_cpufreq(cpu))
return 0;

- return per_cpu(cpufreq_thermal_reduction_pctg, cpu);
+ return reduction_pctg(cpu);
}

static int cpufreq_set_cur_state(unsigned int cpu, int state)
{
+ int i;
+
if (!cpu_has_cpufreq(cpu))
return 0;

- per_cpu(cpufreq_thermal_reduction_pctg, cpu) = state;
- cpufreq_update_policy(cpu);
+ reduction_pctg(cpu) = state;
+
+ /*
+ * Update all the CPUs in the same package because they all
+ * contribute to the temperature and often share the same
+ * frequency.
+ */
+ for_each_online_cpu (i) {
+ if (topology_physical_package_id(i) ==
+ topology_physical_package_id(cpu))
+ cpufreq_update_policy(i);
+ }
return 0;
}

@@ -119,10 +152,6 @@ void acpi_thermal_cpufreq_init(void)
{
int i;

- for (i = 0; i < nr_cpu_ids; i++)
- if (cpu_present(i))
- per_cpu(cpufreq_thermal_reduction_pctg, i) = 0;
-
i = cpufreq_register_notifier(&acpi_thermal_cpufreq_notifier_block,
CPUFREQ_POLICY_NOTIFIER);
if (!i)
--
1.7.7

2012-02-06 16:31:10

by Matthew Garrett

[permalink] [raw]

Subject: Re: [PATCH 4/5] ACPI: Do cpufreq clamping for throttling per package v2

On Mon, Feb 06, 2012 at 08:17:11AM -0800, Andi Kleen wrote:
> +#define reduction_pctg(cpu) \
> + per_cpu(cpufreq_thermal_reduction_pctg, phys_package_first_cpu(cpu))

I don't like using percentages here - we end up with the potential for
several percentages to end up mapping to the same P state. I've sent a
patch that replaces the percentage code with just stepping through P
states instead. But otherwise, yes, this seems sensible. An open
question is whether we should be doing the same on _PPC notifications.
There's some vague evidence that Windows does.

--
Matthew Garrett | [email protected]

2012-02-06 16:31:58

by Matthew Garrett

[permalink] [raw]

Subject: Re: Updated throttling fix patchkit

On Mon, Feb 06, 2012 at 08:17:07AM -0800, Andi Kleen wrote:
> Here's an updated patchkit to fix ACPI thermal zone throttling and some
> related problems. See the individual patches for more details.

ACK to the set (with the provisos noted on a couple of the individual
patches). It may need some rebasing depending on whether Len merges my
other patch or not.

--
Matthew Garrett | [email protected]

2012-02-06 17:59:24

by Andi Kleen

[permalink] [raw]

Subject: Re: [PATCH 4/5] ACPI: Do cpufreq clamping for throttling per package v2

On Mon, Feb 06, 2012 at 04:31:06PM +0000, Matthew Garrett wrote:
> On Mon, Feb 06, 2012 at 08:17:11AM -0800, Andi Kleen wrote:
> > +#define reduction_pctg(cpu) \
> > + per_cpu(cpufreq_thermal_reduction_pctg, phys_package_first_cpu(cpu))
>
> I don't like using percentages here - we end up with the potential for

The code was using percentages before, i merely moved the accounting
to be per package. I don't disagree with changing them, but it
should be probably a separate patch.

-Andi

2012-02-07 19:45:54

by Jesse Barnes

[permalink] [raw]

Subject: Re: [PATCH 5/5] Disable MCP limit exceeded messages from Intel IPS driver

On Mon, 6 Feb 2012 08:17:12 -0800
Andi Kleen <[email protected]> wrote:

> From: Andi Kleen <[email protected]>
>
> On a system on the thermal limit these are quite noisy and flood the logs.
> Better would be a counter anyways. But given that we don't even have
> anything for normal throttling this doesn't seem to be urgent either.
>
> Cc: [email protected]
> Signed-off-by: Andi Kleen <[email protected]>

Yeah looks fine. I think I queued a message reduction patch awhile
back too... not sure if mjg saw it or not.

Reviewed-by: Jesse Barnes <[email protected]>

--
Jesse Barnes, Intel Open Source Technology Center

Attachments:

signature.asc (836.00 B)

2012-02-13 23:30:38

by Len Brown

[permalink] [raw]

Subject: Re: [PATCH 4/5] ACPI: Do cpufreq clamping for throttling per package v2

On 02/06/2012 11:31 AM, Matthew Garrett wrote:

> On Mon, Feb 06, 2012 at 08:17:11AM -0800, Andi Kleen wrote:
>> +#define reduction_pctg(cpu) \
>> + per_cpu(cpufreq_thermal_reduction_pctg, phys_package_first_cpu(cpu))
>
> I don't like using percentages here - we end up with the potential for
> several percentages to end up mapping to the same P state.

Does it matter?

> I've sent a

> patch that replaces the percentage code with just stepping through P
> states instead. But otherwise, yes, this seems sensible. An open
> question is whether we should be doing the same on _PPC notifications.
> There's some vague evidence that Windows does.

If you stepped by P-states, then you behave entirely differently
on a machine with many P-states vs a machine with few P-states.

There is code floating about that exposes every 100 MHz step on SNB
and later as a P-state -- you can have quite a few...

thanks,
-Len Brown, Intel Open Source Technology Center

2012-02-14 00:17:55

by Matthew Garrett

[permalink] [raw]

Subject: Re: [PATCH 4/5] ACPI: Do cpufreq clamping for throttling per package v2

On Mon, Feb 13, 2012 at 06:30:33PM -0500, Len Brown wrote:
> On 02/06/2012 11:31 AM, Matthew Garrett wrote:
>
> > On Mon, Feb 06, 2012 at 08:17:11AM -0800, Andi Kleen wrote:
> >> +#define reduction_pctg(cpu) \
> >> + per_cpu(cpufreq_thermal_reduction_pctg, phys_package_first_cpu(cpu))
> >
> > I don't like using percentages here - we end up with the potential for
> > several percentages to end up mapping to the same P state.
>
>
> Does it matter?

If you step through multiple percentages that map to the same P state,
yes. On the other hand, re-reading the specification, it seems that this
is the behaviour envisaged in the polling equation. I guess we'll stick
with that.

--
Matthew Garrett | [email protected]