2015-07-04 05:20:23

by Jisheng Zhang

[permalink] [raw]
Subject: [PATCH v2 0/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
goes to a deep idle state, then the timer framework will be notified to
use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
interrupt chip, these patches try to add irq_set_affinity support so
that the going to deep idle state cpu can set the interrupt affinity of the
broadcast interrupt to avoid unnecessary wakeups and IPIs.

Changes since v1:
- Add a simple test and its result into the second patch's commit msg.

Jisheng Zhang (2):
irqchip: dw-apb-ictl: add private data structure
irqchip: dw-apb-ictl: add irq_set_affinity support

drivers/irqchip/irq-dw-apb-ictl.c | 47 ++++++++++++++++++++++++++++++++++-----
1 file changed, 42 insertions(+), 5 deletions(-)

--
2.1.4


2015-07-04 05:20:42

by Jisheng Zhang

[permalink] [raw]
Subject: [PATCH v2 1/2] irqchip: dw-apb-ictl: add private data structure

This patch adds struct dw_apb_ictl_priv definition, now it only has one
member: the irq domain. Then make the generic irq chip gc->private to point
to the struct. This is to prepare for the next patch which will implement
irq_set_affinity.

Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/irqchip/irq-dw-apb-ictl.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/irqchip/irq-dw-apb-ictl.c b/drivers/irqchip/irq-dw-apb-ictl.c
index 53bb732..8bef7f7 100644
--- a/drivers/irqchip/irq-dw-apb-ictl.c
+++ b/drivers/irqchip/irq-dw-apb-ictl.c
@@ -16,6 +16,7 @@
#include <linux/irqchip/chained_irq.h>
#include <linux/of_address.h>
#include <linux/of_irq.h>
+#include <linux/slab.h>

#include "irqchip.h"

@@ -26,11 +27,16 @@
#define APB_INT_FINALSTATUS_L 0x30
#define APB_INT_FINALSTATUS_H 0x34

+struct dw_apb_ictl_priv {
+ struct irq_domain *domain;
+};
+
static void dw_apb_ictl_handler(unsigned int irq, struct irq_desc *desc)
{
struct irq_chip *chip = irq_get_chip(irq);
struct irq_chip_generic *gc = irq_get_handler_data(irq);
- struct irq_domain *d = gc->private;
+ struct dw_apb_ictl_priv *priv = gc->private;
+ struct irq_domain *d = priv->domain;
u32 stat;
int n;

@@ -71,27 +77,34 @@ static int __init dw_apb_ictl_init(struct device_node *np,
unsigned int clr = IRQ_NOREQUEST | IRQ_NOPROBE | IRQ_NOAUTOEN;
struct resource r;
struct irq_domain *domain;
+ struct dw_apb_ictl_priv *priv;
struct irq_chip_generic *gc;
void __iomem *iobase;
int ret, nrirqs, irq;
u32 reg;

+ priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+ if (!priv)
+ return -ENOMEM;
+
/* Map the parent interrupt for the chained handler */
irq = irq_of_parse_and_map(np, 0);
if (irq <= 0) {
pr_err("%s: unable to parse irq\n", np->full_name);
- return -EINVAL;
+ ret = -EINVAL;
+ goto err_free;
}

ret = of_address_to_resource(np, 0, &r);
if (ret) {
pr_err("%s: unable to get resource\n", np->full_name);
- return ret;
+ goto err_free;
}

if (!request_mem_region(r.start, resource_size(&r), np->full_name)) {
pr_err("%s: unable to request mem region\n", np->full_name);
- return -ENOMEM;
+ ret = -ENOMEM;
+ goto err_free;
}

iobase = ioremap(r.start, resource_size(&r));
@@ -138,7 +151,8 @@ static int __init dw_apb_ictl_init(struct device_node *np,
}

gc = irq_get_domain_generic_chip(domain, 0);
- gc->private = domain;
+ priv->domain = domain;
+ gc->private = priv;
gc->reg_base = iobase;

gc->chip_types[0].regs.mask = APB_INT_MASK_L;
@@ -164,6 +178,8 @@ err_unmap:
iounmap(iobase);
err_release:
release_mem_region(r.start, resource_size(&r));
+err_free:
+ kfree(priv);
return ret;
}
IRQCHIP_DECLARE(dw_apb_ictl,
--
2.1.4

2015-07-04 05:20:37

by Jisheng Zhang

[permalink] [raw]
Subject: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
goes to a deep idle state, then the timer framework will be notified to
use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
interrupt chip, this patch adds irq_set_affinity support so that the
going to deep idle state cpu can set the interrupt affinity of the
broadcast interrupt to avoid unnecessary wakeups and IPIs.

A simple test:
~ # rm /tmp/test.sh
~ # cat > /tmp/test.sh
cat /proc/interrupts
for i in `seq 10` ; do sleep $i; done
cat /proc/interrupts
~ # chmod +x /tmp/test.sh
~ # taskset 0x2 /tmp/test.sh

without the patch:

CPU0 CPU1
27: 115 36 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 88 0 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 445 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 0 11 Timer broadcast interrupts
IPI2: 56 104 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 25 27 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0
CPU0 CPU1
27: 115 38 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 160 0 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 514 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 0 83 Timer broadcast interrupts
IPI2: 56 104 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 25 46 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0

cpu0 get 160-88=72 timer interrupts, CPU1 got 83-11=72 broadcast timer
IPIs. So, overall system got 72+72=144 wake ups and 72 broadcast timer IPIs

With the patch:
CPU0 CPU1
27: 107 37 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 66 7 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 311 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 2 4 Timer broadcast interrupts
IPI2: 58 100 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 21 24 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0
CPU0 CPU1
27: 107 39 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 69 75 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 380 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 3 6 Timer broadcast interrupts
IPI2: 60 100 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 21 45 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0

cpu0 got 69-66=3, cpu1 got 75-7=68 timer interrupts. cpu0 got 3-2=1
broadcast timer IPIs, cpu1 got 6-4=2 broadcast timer IPIs. So, overall
system got 3+68+1+2=74 wakeups and 1+2=3 broadcast timer IPIs.

This patch removes 50% wakeups and almost 100% broadcast timer IPIs!
I believe this patch will also benefit other dw-apb-ictl users.

Signed-off-by: Jisheng Zhang <[email protected]>
---
drivers/irqchip/irq-dw-apb-ictl.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/irqchip/irq-dw-apb-ictl.c b/drivers/irqchip/irq-dw-apb-ictl.c
index 8bef7f7..efc0aec 100644
--- a/drivers/irqchip/irq-dw-apb-ictl.c
+++ b/drivers/irqchip/irq-dw-apb-ictl.c
@@ -29,6 +29,7 @@

struct dw_apb_ictl_priv {
struct irq_domain *domain;
+ unsigned int parent_irq;
};

static void dw_apb_ictl_handler(unsigned int irq, struct irq_desc *desc)
@@ -56,6 +57,21 @@ static void dw_apb_ictl_handler(unsigned int irq, struct irq_desc *desc)
chained_irq_exit(chip, desc);
}

+static int dw_apb_ictl_set_affinity(struct irq_data *d,
+ const struct cpumask *mask_val,
+ bool force)
+{
+ struct irq_chip_generic *gc = irq_data_get_irq_chip_data(d);
+ struct dw_apb_ictl_priv *priv = gc->private;
+ struct irq_chip *chip = irq_get_chip(priv->parent_irq);
+ struct irq_data *data = irq_get_irq_data(priv->parent_irq);
+
+ if (chip && chip->irq_set_affinity)
+ return chip->irq_set_affinity(data, mask_val, force);
+ else
+ return -EINVAL;
+}
+
#ifdef CONFIG_PM
static void dw_apb_ictl_resume(struct irq_data *d)
{
@@ -95,6 +111,8 @@ static int __init dw_apb_ictl_init(struct device_node *np,
goto err_free;
}

+ priv->parent_irq = irq;
+
ret = of_address_to_resource(np, 0, &r);
if (ret) {
pr_err("%s: unable to get resource\n", np->full_name);
@@ -160,6 +178,7 @@ static int __init dw_apb_ictl_init(struct device_node *np,
gc->chip_types[0].chip.irq_mask = irq_gc_mask_set_bit;
gc->chip_types[0].chip.irq_unmask = irq_gc_mask_clr_bit;
gc->chip_types[0].chip.irq_resume = dw_apb_ictl_resume;
+ gc->chip_types[0].chip.irq_set_affinity = dw_apb_ictl_set_affinity;

if (nrirqs > 32) {
gc->chip_types[1].regs.mask = APB_INT_MASK_H;
@@ -167,6 +186,8 @@ static int __init dw_apb_ictl_init(struct device_node *np,
gc->chip_types[1].chip.irq_mask = irq_gc_mask_set_bit;
gc->chip_types[1].chip.irq_unmask = irq_gc_mask_clr_bit;
gc->chip_types[1].chip.irq_resume = dw_apb_ictl_resume;
+ gc->chip_types[1].chip.irq_set_affinity =
+ dw_apb_ictl_set_affinity;
}

irq_set_handler_data(irq, gc);
--
2.1.4

2015-07-05 13:11:12

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> goes to a deep idle state, then the timer framework will be notified to
> use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> interrupt chip, this patch adds irq_set_affinity support so that the
> going to deep idle state cpu can set the interrupt affinity of the
> broadcast interrupt to avoid unnecessary wakeups and IPIs.

NAK to this patch.

The real question is - if CPU0 is the CPU going offline, why is it
still receiving _any_ interrupts - all interrupts should be migrated
off it, including the chained interrupts.

Sounds like there's a bug in the migration code which needs further
investigation, rather than hacking around the problem by introducing
lots of driver code.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-05 09:00:47

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

Dear Russell,

On Sat, 4 Jul 2015 09:26:23 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > goes to a deep idle state, then the timer framework will be notified to
> > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > interrupt chip, this patch adds irq_set_affinity support so that the
> > going to deep idle state cpu can set the interrupt affinity of the
> > broadcast interrupt to avoid unnecessary wakeups and IPIs.
>
> NAK to this patch.
>
> The real question is - if CPU0 is the CPU going offline, why is it
> still receiving _any_ interrupts - all interrupts should be migrated
> off it, including the chained interrupts.

I think it's due to broadcast timer interrupt. Let me describe the situation:

1. cpu1 is going offline
2. cpuidle notify timer framework to use a broadcast timer instead due to localtimer
is CLOCK_EVT_FEAT_C3STOP
3. when timer is expired, CPU0 will be waken up by the timer interrupt if it has
gone offline
4. CPU0 sends broadcast timer IPI to CPU1

As can be seen, both cpu0 waken up and the broadcast timer IPI are unnecessary.

This patch tries to improve such situation. Here I copied my simple test result

A simple test:
~ # rm /tmp/test.sh
~ # cat > /tmp/test.sh
cat /proc/interrupts
for i in `seq 10` ; do sleep $i; done
cat /proc/interrupts
~ # chmod +x /tmp/test.sh
~ # taskset 0x2 /tmp/test.sh

without the patch:

CPU0 CPU1
27: 115 36 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 88 0 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 445 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 0 11 Timer broadcast interrupts
IPI2: 56 104 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 25 27 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0
CPU0 CPU1
27: 115 38 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 160 0 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 514 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 0 83 Timer broadcast interrupts
IPI2: 56 104 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 25 46 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0

cpu0 get 160-88=72 timer interrupts, CPU1 got 83-11=72 broadcast timer
IPIs. So, overall system got 72+72=144 wake ups and 72 broadcast timer IPIs

With the patch:
CPU0 CPU1
27: 107 37 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 66 7 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 311 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 2 4 Timer broadcast interrupts
IPI2: 58 100 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 21 24 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0
CPU0 CPU1
27: 107 39 GIC 27 arch_timer
45: 62 0 GIC 45 mmc0
160: 69 75 interrupt-controller 8 timer
227: 0 0 interrupt-controller 4 f7e81400.i2c
228: 0 0 interrupt-controller 5 f7e81800.i2c
229: 0 0 interrupt-controller 7 dw_spi65535
230: 0 0 interrupt-controller 21 f7e84000.i2c
231: 0 0 interrupt-controller 20 f7e84800.i2c
265: 380 0 interrupt-controller 8 serial
IPI0: 0 0 CPU wakeup interrupts
IPI1: 3 6 Timer broadcast interrupts
IPI2: 60 100 Rescheduling interrupts
IPI3: 0 0 Function call interrupts
IPI4: 0 4 Single function call interrupts
IPI5: 0 0 CPU stop interrupts
IPI6: 21 45 IRQ work interrupts
IPI7: 0 0 completion interrupts
IPI8: 0 0 CPU backtrace
Err: 0

cpu0 got 69-66=3, cpu1 got 75-7=68 timer interrupts. cpu0 got 3-2=1
broadcast timer IPIs, cpu1 got 6-4=2 broadcast timer IPIs. So, overall
system got 3+68+1+2=74 wakeups and 1+2=3 broadcast timer IPIs.

This patch removes 50% wakeups and almost 100% broadcast timer IPIs!

What do you think?

Thanks,
Jisheng

>
> Sounds like there's a bug in the migration code which needs further
> investigation, rather than hacking around the problem by introducing
> lots of driver code.
>

2015-07-05 13:11:30

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, Jul 04, 2015 at 04:50:07PM +0800, Jisheng Zhang wrote:
> Dear Russell,
>
> On Sat, 4 Jul 2015 09:26:23 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
> > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > goes to a deep idle state, then the timer framework will be notified to
> > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > going to deep idle state cpu can set the interrupt affinity of the
> > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> >
> > NAK to this patch.
> >
> > The real question is - if CPU0 is the CPU going offline, why is it
> > still receiving _any_ interrupts - all interrupts should be migrated
> > off it, including the chained interrupts.
>
> I think it's due to broadcast timer interrupt. Let me describe the situation:
>
> 1. cpu1 is going offline
> 2. cpuidle notify timer framework to use a broadcast timer instead due to localtimer
> is CLOCK_EVT_FEAT_C3STOP
> 3. when timer is expired, CPU0 will be waken up by the timer interrupt if it has
> gone offline
> 4. CPU0 sends broadcast timer IPI to CPU1

If CPU1 is going offline, then CPU1 should have no interrupts delivered to
it. However, this is not the situation you're testing - in your results
below, and your "simple test" you never take CPU1 offline.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-05 08:39:40

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

Dear Russell,

On Sat, 4 Jul 2015 10:25:46 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Sat, Jul 04, 2015 at 04:50:07PM +0800, Jisheng Zhang wrote:
> > Dear Russell,
> >
> > On Sat, 4 Jul 2015 09:26:23 +0100
> > Russell King - ARM Linux <[email protected]> wrote:
> >
> > > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > > goes to a deep idle state, then the timer framework will be notified to
> > > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > > going to deep idle state cpu can set the interrupt affinity of the
> > > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> > >
> > > NAK to this patch.
> > >
> > > The real question is - if CPU0 is the CPU going offline, why is it
> > > still receiving _any_ interrupts - all interrupts should be migrated
> > > off it, including the chained interrupts.
> >
> > I think it's due to broadcast timer interrupt. Let me describe the situation:
> >
> > 1. cpu1 is going offline
> > 2. cpuidle notify timer framework to use a broadcast timer instead due to localtimer
> > is CLOCK_EVT_FEAT_C3STOP
> > 3. when timer is expired, CPU0 will be waken up by the timer interrupt if it has
> > gone offline
> > 4. CPU0 sends broadcast timer IPI to CPU1
>
> If CPU1 is going offline, then CPU1 should have no interrupts delivered to

"sleep $i" on CPU1 will make the timer framework to wake up it at current_time+$i
in the future then go offline, currently this is achieved by programming one
broadcast timer. In our case, the broadcast timer interrupt will be routed to
CPU0 by default, so CPU0 has to send broadcast timer IPI to CPU1.

> it. However, this is not the situation you're testing - in your results
> below, and your "simple test" you never take CPU1 offline.
>

when cpu1 executes "sleep $i", cpu1 will go to deepest cpuidle level, in our
case it will go offline.

Thanks a lot for your review,
Jisheng

2015-07-05 08:39:57

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

Dear Russell,

On Sat, 4 Jul 2015 17:35:33 +0800
Jisheng Zhang <[email protected]> wrote:

> Dear Russell,
>
> On Sat, 4 Jul 2015 10:25:46 +0100
> Russell King - ARM Linux <[email protected]> wrote:
>
> > On Sat, Jul 04, 2015 at 04:50:07PM +0800, Jisheng Zhang wrote:
> > > Dear Russell,
> > >
> > > On Sat, 4 Jul 2015 09:26:23 +0100
> > > Russell King - ARM Linux <[email protected]> wrote:
> > >
> > > > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > > > goes to a deep idle state, then the timer framework will be notified to
> > > > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > > > going to deep idle state cpu can set the interrupt affinity of the
> > > > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> > > >
> > > > NAK to this patch.
> > > >
> > > > The real question is - if CPU0 is the CPU going offline, why is it
> > > > still receiving _any_ interrupts - all interrupts should be migrated
> > > > off it, including the chained interrupts.
> > >
> > > I think it's due to broadcast timer interrupt. Let me describe the situation:
> > >
> > > 1. cpu1 is going offline
> > > 2. cpuidle notify timer framework to use a broadcast timer instead due to localtimer
> > > is CLOCK_EVT_FEAT_C3STOP
> > > 3. when timer is expired, CPU0 will be waken up by the timer interrupt if it has
> > > gone offline
> > > 4. CPU0 sends broadcast timer IPI to CPU1
> >
> > If CPU1 is going offline, then CPU1 should have no interrupts delivered to
>
> "sleep $i" on CPU1 will make the timer framework to wake up it at current_time+$i
> in the future then go offline, currently this is achieved by programming one
> broadcast timer. In our case, the broadcast timer interrupt will be routed to
> CPU0 by default, so CPU0 has to send broadcast timer IPI to CPU1.
>
> > it. However, this is not the situation you're testing - in your results
> > below, and your "simple test" you never take CPU1 offline.
> >
>
> when cpu1 executes "sleep $i", cpu1 will go to deepest cpuidle level, in our
> case it will go offline.
>

I may misread your emails. I guess the "offline" means "cpu hot unplug"? Sorry
for misunderstanding.

This patch doesn't try to improve anything related with "hog unplug", it tries to
improve the following situation instead:

1. cpu1 is entering deepest cpuidle level, shutdown in our case.
2. cpuidle notify timer framework to use a broadcast timer instead due to localtimer
is CLOCK_EVT_FEAT_C3STOP
3. when timer is expired, CPU0 will be waken up by the timer interrupt if it has
been shutdown
4. CPU0 sends broadcast timer IPI to CPU1

Thanks,
Jisheng

2015-07-05 13:06:49

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:

> On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > goes to a deep idle state, then the timer framework will be notified to
> > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > interrupt chip, this patch adds irq_set_affinity support so that the
> > going to deep idle state cpu can set the interrupt affinity of the
> > broadcast interrupt to avoid unnecessary wakeups and IPIs.
>
> NAK to this patch.
>
> The real question is - if CPU0 is the CPU going offline, why is it
> still receiving _any_ interrupts - all interrupts should be migrated
> off it, including the chained interrupts.
>
> Sounds like there's a bug in the migration code which needs further
> investigation, rather than hacking around the problem by introducing
> lots of driver code.

I think you misunderstood the changelog, which is horrible btw.

So the real reason to do this is to steer the broadcast interrupt to
the CPU which has the earliest expiry time. This avoids that another
cpu is woken from idle just to deliver the broadcast IPI to the other
cpu.

Thanks,

tglx



2015-07-05 13:11:24

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, Jul 04, 2015 at 11:53:57AM +0200, Thomas Gleixner wrote:
> On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
>
> > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > goes to a deep idle state, then the timer framework will be notified to
> > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > going to deep idle state cpu can set the interrupt affinity of the
> > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> >
> > NAK to this patch.
> >
> > The real question is - if CPU0 is the CPU going offline, why is it
> > still receiving _any_ interrupts - all interrupts should be migrated
> > off it, including the chained interrupts.
> >
> > Sounds like there's a bug in the migration code which needs further
> > investigation, rather than hacking around the problem by introducing
> > lots of driver code.
>
> I think you misunderstood the changelog, which is horrible btw.
>
> So the real reason to do this is to steer the broadcast interrupt to
> the CPU which has the earliest expiry time. This avoids that another
> cpu is woken from idle just to deliver the broadcast IPI to the other
> cpu.

Unless I'm mistaken, the code does this by messing around with the parent
interrupt affinity of a chained interrupt, which really isn't a good thing
to do, because it migrates every interrupt on the child interrupt
controller.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-05 08:36:31

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

Dear Russell, Thomas,

On Sat, 4 Jul 2015 11:08:27 +0100
Russell King - ARM Linux <[email protected]> wrote:

> On Sat, Jul 04, 2015 at 11:53:57AM +0200, Thomas Gleixner wrote:
> > On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
> >
> > > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > > goes to a deep idle state, then the timer framework will be notified to
> > > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > > going to deep idle state cpu can set the interrupt affinity of the
> > > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> > >
> > > NAK to this patch.
> > >
> > > The real question is - if CPU0 is the CPU going offline, why is it
> > > still receiving _any_ interrupts - all interrupts should be migrated
> > > off it, including the chained interrupts.
> > >
> > > Sounds like there's a bug in the migration code which needs further
> > > investigation, rather than hacking around the problem by introducing
> > > lots of driver code.
> >
> > I think you misunderstood the changelog, which is horrible btw.
> >
> > So the real reason to do this is to steer the broadcast interrupt to
> > the CPU which has the earliest expiry time. This avoids that another
> > cpu is woken from idle just to deliver the broadcast IPI to the other
> > cpu.
>
> Unless I'm mistaken, the code does this by messing around with the parent
> interrupt affinity of a chained interrupt, which really isn't a good thing
> to do, because it migrates every interrupt on the child interrupt
> controller.
>

Yep, that's the problem although it doesn't matter in our case for other
interrupts don't care the affinity at all.
I'm requesting our HW people to connect timer interrupt to GIC directly in
future chips. But for the existing chips, this patch does really reduce
power consumption, is there any elegant solution?

PS: I noticed that exynos-combiner.c also migrated every interrupt on the
child irq controller.

Any suggestions are appreciated.

Thanks,
Jisheng

2015-07-05 13:06:35

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
> On Sat, Jul 04, 2015 at 11:53:57AM +0200, Thomas Gleixner wrote:
> > On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
> >
> > > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > > goes to a deep idle state, then the timer framework will be notified to
> > > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > > going to deep idle state cpu can set the interrupt affinity of the
> > > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> > >
> > > NAK to this patch.
> > >
> > > The real question is - if CPU0 is the CPU going offline, why is it
> > > still receiving _any_ interrupts - all interrupts should be migrated
> > > off it, including the chained interrupts.
> > >
> > > Sounds like there's a bug in the migration code which needs further
> > > investigation, rather than hacking around the problem by introducing
> > > lots of driver code.
> >
> > I think you misunderstood the changelog, which is horrible btw.
> >
> > So the real reason to do this is to steer the broadcast interrupt to
> > the CPU which has the earliest expiry time. This avoids that another
> > cpu is woken from idle just to deliver the broadcast IPI to the other
> > cpu.
>
> Unless I'm mistaken, the code does this by messing around with the parent
> interrupt affinity of a chained interrupt, which really isn't a good thing
> to do, because it migrates every interrupt on the child interrupt
> controller.

Fair enough, I missed that chained hackery.

For that powersaving scenario it's probably ok to move the all child
irqs around, but we should at least make that an opt-in behaviour and
not enabled by default.

Thanks,

tglx

2015-07-05 08:56:28

by Jisheng Zhang

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

Dear Thomas,

On Sat, 4 Jul 2015 14:49:31 +0200
Thomas Gleixner <[email protected]> wrote:

> On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
> > On Sat, Jul 04, 2015 at 11:53:57AM +0200, Thomas Gleixner wrote:
> > > On Sat, 4 Jul 2015, Russell King - ARM Linux wrote:
> > >
> > > > On Sat, Jul 04, 2015 at 01:19:30PM +0800, Jisheng Zhang wrote:
> > > > > On Marvell Berlin SoCs, the cpu's local timer is shutdown when the cpu
> > > > > goes to a deep idle state, then the timer framework will be notified to
> > > > > use a broadcast timer instead. The broadcast timer uses dw-apb-ictl as
> > > > > interrupt chip, this patch adds irq_set_affinity support so that the
> > > > > going to deep idle state cpu can set the interrupt affinity of the
> > > > > broadcast interrupt to avoid unnecessary wakeups and IPIs.
> > > >
> > > > NAK to this patch.
> > > >
> > > > The real question is - if CPU0 is the CPU going offline, why is it
> > > > still receiving _any_ interrupts - all interrupts should be migrated
> > > > off it, including the chained interrupts.
> > > >
> > > > Sounds like there's a bug in the migration code which needs further
> > > > investigation, rather than hacking around the problem by introducing
> > > > lots of driver code.
> > >
> > > I think you misunderstood the changelog, which is horrible btw.
> > >
> > > So the real reason to do this is to steer the broadcast interrupt to
> > > the CPU which has the earliest expiry time. This avoids that another
> > > cpu is woken from idle just to deliver the broadcast IPI to the other
> > > cpu.
> >
> > Unless I'm mistaken, the code does this by messing around with the parent
> > interrupt affinity of a chained interrupt, which really isn't a good thing
> > to do, because it migrates every interrupt on the child interrupt
> > controller.
>
> Fair enough, I missed that chained hackery.
>
> For that powersaving scenario it's probably ok to move the all child
> irqs around, but we should at least make that an opt-in behaviour and
> not enabled by default.

Thank you for your suggestion. Is is acceptable to make an config option
such as DW_APB_ICTL_SET_AFFINITY, and warn enabled this would migrates every
interrupt, and disable it by default?

Thanks a lot,
Jisheng

2015-07-05 13:06:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] irqchip: dw-apb-ictl: add irq_set_affinity support

On Sat, 4 Jul 2015, Jisheng Zhang wrote:
> Thank you for your suggestion. Is is acceptable to make an config option
> such as DW_APB_ICTL_SET_AFFINITY, and warn enabled this would migrates every
> interrupt, and disable it by default?

No, this wants to be a runtime/boottime configuration (device tree
might be a possible solution).

Thanks,

tglx