2008-10-06 05:04:20

by Yanmin Zhang

[permalink] [raw]
Subject: aim7 47% regression with 2.6.27-rc8

Comparing with 2.6.27-rc8, aim7 result has about 47% regression with 2.6.27-rc8 on
my 16-core tigerton and 8 core+HyperThreading x86_64 machine.

I bisected it down to patch:

302745699c1b675b5d2a1af87271de10e4d96b6a is first bad commit
commit 302745699c1b675b5d2a1af87271de10e4d96b6a
Author: Thomas Gleixner <[email protected]>
Date: Mon Sep 22 19:02:25 2008 +0200

clockevents: check broadcast device not tick device

Impact: Possible hang on CPU online observed on AMD C1E machines.

The broadcast setup code looks at the mode of the tick device to
determine whether it needs to be shut down or setup. This is wrong
when the broadcast mode is set to one shot already. This can happen
when a CPU is brought online as it goes through the periodic setup
first.

The problem went unnoticed as sane systems do not call into that code
before the switch to one shot for the clock event device happens.
The AMD C1E idle routine switches over immediately and thereby shuts
down the just setup device before the first interrupt happens.


After I reverted the patch against 2.6.27-rc8, the regression disappears.
It's interesting that the regression doesn't exist on 8-core stoakley.

-yanmin


2008-10-06 06:00:22

by Yanmin Zhang

[permalink] [raw]
Subject: Re: aim7 47% regression with 2.6.27-rc8


On Mon, 2008-10-06 at 07:38 +0200, Mike Galbraith wrote:
> On Mon, 2008-10-06 at 12:58 +0800, Zhang, Yanmin wrote:
> > Comparing with 2.6.27-rc8, aim7 result has about 47% regression with 2.6.27-rc8 on
> > my 16-core tigerton and 8 core+HyperThreading x86_64 machine.
> >
> > I bisected it down to patch:
> >
> > 302745699c1b675b5d2a1af87271de10e4d96b6a is first bad commit
>
> > After I reverted the patch against 2.6.27-rc8, the regression disappears.
> > It's interesting that the regression doesn't exist on 8-core stoakley.
>
> Looks like it's been fixed meanwhile.
I did a new quick test and the patch does fix it.

Thanks,
Yanmin

>
> commit 07454bfff151d2465ada809bbaddf3548cc1097c
> Author: Thomas Gleixner <[email protected]>
> Date: Sat Oct 4 10:51:07 2008 +0200
>
> clockevents: check broadcast tick device not the clock events device
>
> Impact: jiffies increment too fast.
>
> Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get
> incremented too fast. The reason is a wrong check in the broadcast
> enter/exit code, which keeps the local apic timer in periodic mode
> when the switch happens.
>
> Signed-off-by: Thomas Gleixner <[email protected]>
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index bd70345..cb01cd8 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -235,7 +235,8 @@ static void tick_do_broadcast_on_off(void *why)
> case CLOCK_EVT_NOTIFY_BROADCAST_FORCE:
> if (!cpu_isset(cpu, tick_broadcast_mask)) {
> cpu_set(cpu, tick_broadcast_mask);
> - if (bc->mode == TICKDEV_MODE_PERIODIC)
> + if (tick_broadcast_device.mode ==
> + TICKDEV_MODE_PERIODIC)
> clockevents_shutdown(dev);
> }
> if (*reason == CLOCK_EVT_NOTIFY_BROADCAST_FORCE)
> @@ -245,7 +246,8 @@ static void tick_do_broadcast_on_off(void *why)
> if (!tick_broadcast_force &&
> cpu_isset(cpu, tick_broadcast_mask)) {
> cpu_clear(cpu, tick_broadcast_mask);
> - if (bc->mode == TICKDEV_MODE_PERIODIC)
> + if (tick_broadcast_device.mode ==
> + TICKDEV_MODE_PERIODIC)
> tick_setup_periodic(dev, 0);
> }
> break;
>
>

2008-10-06 06:59:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: aim7 47% regression with 2.6.27-rc8

On Mon, 6 Oct 2008, Zhang, Yanmin wrote:
> Comparing with 2.6.27-rc8, aim7 result has about 47% regression with 2.6.27-rc8 on
> my 16-core tigerton and 8 core+HyperThreading x86_64 machine.
>
> I bisected it down to patch:
>
> 302745699c1b675b5d2a1af87271de10e4d96b6a is first bad commit
> commit 302745699c1b675b5d2a1af87271de10e4d96b6a
> Author: Thomas Gleixner <[email protected]>
> Date: Mon Sep 22 19:02:25 2008 +0200
>
> clockevents: check broadcast device not tick device
>
> Impact: Possible hang on CPU online observed on AMD C1E machines.
>
> The broadcast setup code looks at the mode of the tick device to
> determine whether it needs to be shut down or setup. This is wrong
> when the broadcast mode is set to one shot already. This can happen
> when a CPU is brought online as it goes through the periodic setup
> first.
>
> The problem went unnoticed as sane systems do not call into that code
> before the switch to one shot for the clock event device happens.
> The AMD C1E idle routine switches over immediately and thereby shuts
> down the just setup device before the first interrupt happens.
>
>
> After I reverted the patch against 2.6.27-rc8, the regression disappears.
> It's interesting that the regression doesn't exist on 8-core stoakley.

There is a fixup patch for this one after rc8. 07454bfff151d2465ada809bbaddf3548cc1097c
Can you check with this one applied please ?

Thanks,

tglx