2006-02-13 10:26:43

by Heiko Carstens

Subject: calibrate_migration_costs takes ages on s390

The boot sequence on s390 sometimes takes ages and we spend a very long time
(up to one or two minutes) in calibrate_migration_costs. The time spent there
differs from boot to boot. Also the calculated costs differ a lot. I've seen
differences by up to a factor of 15 (yes, factor not percent).
Also I doubt that making these measurements makes much sense on a completely
virtualized architecture where you cannot tell how much cpu time you will
get anyway.
Is there any workaround or fix available so we can avoid seeing this?

Thanks,
Heiko


2006-02-13 10:35:41

by David Miller

Subject: Re: calibrate_migration_costs takes ages on s390

From: Heiko Carstens <[email protected]>
Date: Mon, 13 Feb 2006 11:26:34 +0100

> The boot sequence on s390 sometimes takes ages and we spend a very long time
> (up to one or two minutes) in calibrate_migration_costs. The time spent there
> differs from boot to boot. Also the calculated costs differ a lot. I've seen
> differences by up to a factor of 15 (yes, factor not percent).
> Also I doubt that making these measurements makes much sense on a completely
> virtualized architecture where you cannot tell how much cpu time you will
> get anyway.
> Is there any workaround or fix available so we can avoid seeing this?

Things are not as slow, but definitely slow on sparc64 too, and it's
also due to the migration cost calculations.

It's also really bad that it's using vmalloc(), for one thing, because
this thrashes the TLB (some of us have 64-entry software replaced
TLBs) and also because you can make no guarantees about how well the
backing physical pages will distribute into the L2 cache.

As a result, wildly different run-to-run results can be expected
particularly for systems with 1-way or 2-way set associative L2
caches, which are common on sparc64. I don't know about s390.

I think the migration cost calculator is way overboard and needs to be
toned down a little bit.

2006-02-13 10:48:35

by Ingo Molnar

Subject: Re: calibrate_migration_costs takes ages on s390


* Heiko Carstens <[email protected]> wrote:

> The boot sequence on s390 sometimes takes ages and we spend a very
> long time (up to one or two minutes) in calibrate_migration_costs. The
> time spent there differs from boot to boot. Also the calculated costs
> differ a lot. I've seen differences by up to a factor of 15 (yes,
> factor not percent). Also I doubt that making these measurements makes
> much sense on a completely virtualized architecture where you cannot
> tell how much cpu time you will get anyway. Is there any workaround or
> fix available so we can avoid seeing this?

which is the precise kernel version used? We toned down calibration a
bit recently.

The immediate workaround would be to use the migration_cost=0 boot
parameter.

Generally, i agree that it makes sense to not calibrate at all on
virtual platforms. Does the patch below help? It gives virtual
platforms a way to provide a default migration cost and thus avoid the
boot-time calibration altogether. (I have tested it on x86, it does the
expected thing.) This needs to hit v2.6.16 too.

Ingo

---------
introduce the CONFIG_DEFAULT_MIGRATION_COST method for an architecture
to set the scheduler migration costs. This turns off automatic detection
of migration costs. Makes sense on virtual platforms, where migration
costs are hard to measure accurately.

Signed-off-by: Ingo Molnar <[email protected]>

----

arch/s390/Kconfig | 4 ++++
kernel/sched.c | 13 ++++++++++++-
2 files changed, 16 insertions(+), 1 deletion(-)

Index: linux-robust-list.q/arch/s390/Kconfig
===================================================================
--- linux-robust-list.q.orig/arch/s390/Kconfig
+++ linux-robust-list.q/arch/s390/Kconfig
@@ -80,6 +80,10 @@ config HOTPLUG_CPU
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.

+config DEFAULT_MIGRATION_COST
+ int
+ default "1000000"
+
config MATHEMU
bool "IEEE FPU emulation"
depends on MARCH_G5
Index: linux-robust-list.q/kernel/sched.c
===================================================================
--- linux-robust-list.q.orig/kernel/sched.c
+++ linux-robust-list.q/kernel/sched.c
@@ -5159,7 +5159,18 @@ static void init_sched_build_groups(stru
#define MAX_DOMAIN_DISTANCE 32

static unsigned long long migration_cost[MAX_DOMAIN_DISTANCE] =
- { [ 0 ... MAX_DOMAIN_DISTANCE-1 ] = -1LL };
+ { [ 0 ... MAX_DOMAIN_DISTANCE-1 ] =
+/*
+ * Architectures may override the migration cost and thus avoid
+ * boot-time calibration. Unit is nanoseconds. Mostly useful for
+ * virtualized hardware:
+ */
+#ifdef CONFIG_DEFAULT_MIGRATION_COST
+ CONFIG_DEFAULT_MIGRATION_COST
+#else
+ -1LL
+#endif
+};

/*
* Allow override of migration cost - in units of microseconds.
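
A minimal user-space sketch of the initializer pattern used in the patch above, for readers who want to try it outside the kernel. It assumes a GNU-compatible compiler (the `[a ... b]` range designator is a GCC/Clang extension). The macro `MIGRATION_COST_INIT` and the helper `needs_calibration()` are hypothetical names introduced here for illustration, and the helper assumes that an entry is only measured at boot while it still holds the -1 sentinel. The `#define` of `CONFIG_DEFAULT_MIGRATION_COST` stands in for the value Kconfig would generate.

```c
#include <assert.h>

#define MAX_DOMAIN_DISTANCE 32

/* Stand-in for the Kconfig-generated value (nanoseconds, i.e. 1 ms).
 * Remove this line to fall back to the "not measured yet" sentinel. */
#define CONFIG_DEFAULT_MIGRATION_COST 1000000

#ifdef CONFIG_DEFAULT_MIGRATION_COST
# define MIGRATION_COST_INIT CONFIG_DEFAULT_MIGRATION_COST
#else
# define MIGRATION_COST_INIT -1LL
#endif

/* Every slot gets the same initial value via the GNU range designator. */
static unsigned long long migration_cost[MAX_DOMAIN_DISTANCE] =
	{ [0 ... MAX_DOMAIN_DISTANCE - 1] = MIGRATION_COST_INIT };

/* Hypothetical helper: returns 1 if the slot still holds the sentinel
 * (-1 converted to unsigned long long, i.e. all bits set). */
static int needs_calibration(int distance)
{
	assert(distance >= 0 && distance < MAX_DOMAIN_DISTANCE);
	return migration_cost[distance] == (unsigned long long)-1LL;
}
```

With the default defined, `needs_calibration()` returns 0 for every distance, which mirrors the intent stated in the patch description: the architecture-provided value replaces the boot-time measurement.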

2006-02-13 10:56:16

by Ingo Molnar

Subject: Re: calibrate_migration_costs takes ages on s390


* David S. Miller <[email protected]> wrote:

> Things are not as slow, but definitely slow on sparc64 too, and it's
> also due to the migration cost calculations.
>
> It's also really bad that it's using vmalloc(), for one thing, because
> this thrashes the TLB (some of us have 64-entry software replaced
> TLBs) and also because you can make no guarantees about how well the
> backing physical pages will distribute into the L2 cache.

the TLB thrashing is intended, to calculate the worst-case migration
cost. If userspace is TLB-intensive, it will thrash TLBs just as much.

> As a result, wildly different run-to-run results can be expected
> particularly for systems with 1-way or 2-way set associative L2
> caches, which are common on sparc64. I don't know about s390.

s390 is clearly a special case, being a virtual platform. But the
calibration should be improved to work better on sparc64.

Do things get better if you fill out include/asm-sparc64/system.h's
sched_cacheflush() function, to flush the L2 cache? That should at least
make the cache state more or less reproducible across runs.

Ingo

2006-02-13 16:13:18

by Heiko Carstens

Subject: Re: calibrate_migration_costs takes ages on s390

> > The boot sequence on s390 sometimes takes ages and we spend a very
> > long time (up to one or two minutes) in calibrate_migration_costs. The
> > time spent there differs from boot to boot. Also the calculated costs
> > differ a lot. I've seen differences by up to a factor of 15 (yes,
> > factor not percent). Also I doubt that making these measurements makes
> > much sense on a completely virtualized architecture where you cannot
> > tell how much cpu time you will get anyway. Is there any workaround or
> > fix available so we can avoid seeing this?
>
> which is the precise kernel version used? We toned down calibration a
> bit recently.

2.6.16-rc3.

> The immediate workaround would be to use the migration_cost=0 boot
> parameter.
>
> Generally, i agree that it makes sense to not calibrate at all on
> virtual platforms. Does the patch below help? It gives virtual
> platforms a way to provide a default migration cost and thus avoid the
> boot-time calibration altogether. (I have tested it on x86, it does the
> expected thing.) This needs to hit v2.6.16 too.

Yes, calibrate_migration_costs is very fast now. But it turned out that
this was just hiding the real problem: if we have CONFIG_PREEMPT disabled
the kernel gets (sometimes) unbelievably slow.
I think this happened somewhere between rc1 and rc3. Maybe Hannes knows
more exactly when this happened the first time, since I always run with
CONFIG_PREEMPT enabled.

Heiko

2006-02-13 20:58:55

by David Miller

Subject: Re: calibrate_migration_costs takes ages on s390

From: Ingo Molnar <[email protected]>
Date: Mon, 13 Feb 2006 11:54:21 +0100

> Do things get better if you fill out include/asm-sparc64/system.h's
> sched_cacheflush() function, to flush the L2 cache? That should at least
> make the cache state more or less reproducible across runs.

Yes, I tried to implement that, and it makes the migration cost
calculation take 4 to 5 times longer, so I'm leaving it unimplemented.

2006-02-13 23:42:57

by Olaf Hering

Subject: Re: calibrate_migration_costs takes ages on s390

On Mon, Feb 13, Ingo Molnar wrote:

>
> * Heiko Carstens <[email protected]> wrote:
>
> > The boot sequence on s390 sometimes takes ages and we spend a very
> > long time (up to one or two minutes) in calibrate_migration_costs. The
> > time spent there differs from boot to boot. Also the calculated costs
> > differ a lot. I've seen differences by up to a factor of 15 (yes,
> > factor not percent). Also I doubt that making these measurements makes
> > much sense on a completely virtualized architecture where you cannot
> > tell how much cpu time you will get anyway. Is there any workaround or
> > fix available so we can avoid seeing this?
>
> which is the precise kernel version used? We toned down calibration a
> bit recently.

We did a bit of testing, -rc2-git3 + the patch below was still ok.

[PATCH] s390: earlier initialization of cpu_possible_map
9733e2407ad2237867cb13c04e7d619397fa3090

2006-02-14 00:08:11

by Olaf Hering

Subject: Re: calibrate_migration_costs takes ages on s390

On Tue, Feb 14, Olaf Hering wrote:

> We did a bit of testing, -rc2-git3 + the patch below was still ok.
>
> [PATCH] s390: earlier initialization of cpu_possible_map
> 9733e2407ad2237867cb13c04e7d619397fa3090

I need to double check, but -git5 + that patch was reported to be slow.

2006-02-14 08:09:56

by Heiko Carstens

Subject: Re: calibrate_migration_costs takes ages on s390

> > We did a bit of testing, -rc2-git3 + the patch below was still ok.
> >
> > [PATCH] s390: earlier initialization of cpu_possible_map
> > 9733e2407ad2237867cb13c04e7d619397fa3090
>
> I need to double check, but -git5 + that patch was reported to be slow.

I did a quick git bisect search. This is the one that hurts:

Author: Ingo Molnar <[email protected]> 2006-02-07 21:58:54
Committer: Linus Torvalds <[email protected]> 2006-02-08 01:12:33
Parent: 8519fb30e438f8088b71a94a7d5a660a814d3872 ([PATCH] mm: compound release fix)
Child: 0d4c3e7a8c65892c7d6a748fdbb4499e988880db ([PATCH] unshare system call -v5: Documentation file)

[PATCH] Fix spinlock debugging delays to not time out too early

The spinlock-debug wait-loop was using loops_per_jiffy to detect too long
spinlock waits - but on fast CPUs this led to a way too fast timeout and false
messages.

The fix is to include a __delay(1) call in the loop, to correctly approximate
the intended delay timeout of 1 second. The code assumes that every
architecture implements __delay(1) to last around 1/(loops_per_jiffy*HZ)
seconds.

Signed-off-by: Ingo Molnar <[email protected]>
Cc: Andi Kleen <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
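
The timing assumption in the commit message above can be checked with a little arithmetic. The sketch below is a user-space illustration only, not the kernel's wait loop: the function names and the `EXAMPLE_HZ` value are hypothetical. It assumes the loop runs `loops_per_jiffy * HZ` iterations with one `__delay(1)` call each, so the total wait is about one second only if each call really lasts `1/(loops_per_jiffy*HZ)` seconds.

```c
#include <stdint.h>

/* Hypothetical example value; a real kernel takes HZ from its configuration. */
#define EXAMPLE_HZ 100

/* Iterations of the wait loop before a too-long wait would be reported. */
static uint64_t wait_loop_iterations(uint64_t loops_per_jiffy)
{
	return loops_per_jiffy * EXAMPLE_HZ;
}

/* Duration that one __delay(1) call is assumed to have, in seconds. */
static double assumed_delay_seconds(uint64_t loops_per_jiffy)
{
	return 1.0 / (double)wait_loop_iterations(loops_per_jiffy);
}

/* Total time spent in the loop if each __delay(1) call actually
 * takes actual_delay_s seconds. */
static double total_wait_seconds(uint64_t loops_per_jiffy, double actual_delay_s)
{
	return (double)wait_loop_iterations(loops_per_jiffy) * actual_delay_s;
}
```

For example, with a made-up `loops_per_jiffy` of 1000000, the loop runs 100000000 times and each call is expected to take 10 ns. Any `__delay(1)` that takes longer than assumed stretches the total wait by the same factor.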

I guess we're once again suffering from being a virtualized platform: the
formerly used call to cpu_relax() informed the underlying hypervisor that
we want to give up the current cpu while __delay() keeps it.
Unless we're scheduled away involuntarily.
The "Detect Soft Lockups" option doesn't make much sense on our
platform either, since we get a lot of false positives.
Quick fix: turn off the options CONFIG_DEBUG_SPINLOCK and
CONFIG_DETECT_SOFTLOCKUP.

Heiko

2006-02-14 10:56:16

by Heiko Carstens

Subject: Re: calibrate_migration_costs takes ages on s390

> I did a quick git bisect search. This is the one that hurts:
>
> Author: Ingo Molnar <[email protected]> 2006-02-07 21:58:54
> Committer: Linus Torvalds <[email protected]> 2006-02-08 01:12:33
> Parent: 8519fb30e438f8088b71a94a7d5a660a814d3872 ([PATCH] mm: compound release fix)
> Child: 0d4c3e7a8c65892c7d6a748fdbb4499e988880db ([PATCH] unshare system call -v5: Documentation file)
>
> The fix is to include a __delay(1) call in the loop, to correctly approximate
> the intended delay timeout of 1 second. The code assumes that every
> architecture implements __delay(1) to last around 1/(loops_per_jiffy*HZ)
> seconds.
>
> I guess we're once again suffering from being a virtualized platform: the
> formerly used call to cpu_relax() informed the underlying hypervisor that
> we want to give up the current cpu while __delay() keeps it.
> Unless we're scheduled away involuntarily.
> The "Detect Soft Lockups" option doesn't make much sense on our
> platform either, since we get a lot of false positives.
> Quick fix: turn off the options CONFIG_DEBUG_SPINLOCK and
> CONFIG_DETECT_SOFTLOCKUP.

Wrong analysis. Our __delay() implementation is broken. This doesn't help for
the CONFIG_DETECT_SOFTLOCKUP case, but at least CONFIG_DEBUG_SPINLOCK works
again with this.

Andrew, could you pick this one up, or should I send it separately?

[PATCH] s390: fix __delay implementation

From: Heiko Carstens <[email protected]>

Fix __delay implementation. Called with an argument "1" or "0" it
would loop nearly forever (since (1/2)-1 = 0xffffffff).

Signed-off-by: Heiko Carstens <[email protected]>
---

arch/s390/lib/delay.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/lib/delay.c b/arch/s390/lib/delay.c
index e96c35b..71f0a2f 100644
--- a/arch/s390/lib/delay.c
+++ b/arch/s390/lib/delay.c
@@ -30,7 +30,7 @@ void __delay(unsigned long loops)
*/
__asm__ __volatile__(
"0: brct %0,0b"
- : /* no outputs */ : "r" (loops/2) );
+ : /* no outputs */ : "r" ((loops/2) + 1));
}

/*
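
To make the wrap-around in the patch description concrete, here is a small user-space model of the `brct` loop. It is a sketch under stated assumptions, not the s390 code: the function names are hypothetical, and the counter is modeled as a 32-bit register, matching the 0xffffffff figure in the description. `brct` decrements first and branches back while the result is non-zero, so a starting count of 0 wraps to 0xffffffff and spins about 2^32 times.

```c
#include <stdint.h>

/* Literal model of "0: brct %0,0b": decrement, then loop while non-zero.
 * Returns how often the instruction executes. Only call this with small
 * non-zero counts; a count of 0 would spin 2^32 times. */
static uint64_t brct_simulate(uint32_t count)
{
	uint64_t executed = 0;

	do {
		executed++;
	} while (--count != 0);	/* 0 - 1 wraps to 0xffffffff */
	return executed;
}

/* Closed form of the same behaviour, safe for every starting count. */
static uint64_t brct_iterations(uint32_t count)
{
	return count ? (uint64_t)count : (UINT64_C(1) << 32);
}

/* Starting count before the fix: 0 for loops == 0 or loops == 1. */
static uint32_t old_count(uint32_t loops)
{
	return loops / 2;
}

/* Starting count after the fix: never 0, since loops / 2 <= 0x7fffffff. */
static uint32_t new_count(uint32_t loops)
{
	return loops / 2 + 1;
}
```

Under this model, `__delay(1)` executes the loop body 4294967296 times before the fix and once after it.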

2006-02-14 12:37:09

by Ingo Molnar

Subject: Re: calibrate_migration_costs takes ages on s390


* Heiko Carstens <[email protected]> wrote:

> --- a/arch/s390/lib/delay.c
> +++ b/arch/s390/lib/delay.c
> @@ -30,7 +30,7 @@ void __delay(unsigned long loops)
> */
> __asm__ __volatile__(
> "0: brct %0,0b"
> - : /* no outputs */ : "r" (loops/2) );
> + : /* no outputs */ : "r" ((loops/2) + 1));
> }

ahh ... that explains the delays indeed!

Ingo

2006-02-14 12:39:13

by Ingo Molnar

Subject: Re: calibrate_migration_costs takes ages on s390


* Ingo Molnar <[email protected]> wrote:

> * Heiko Carstens <[email protected]> wrote:
>
> > --- a/arch/s390/lib/delay.c
> > +++ b/arch/s390/lib/delay.c
> > @@ -30,7 +30,7 @@ void __delay(unsigned long loops)
> > */
> > __asm__ __volatile__(
> > "0: brct %0,0b"
> > - : /* no outputs */ : "r" (loops/2) );
> > + : /* no outputs */ : "r" ((loops/2) + 1));
> > }
>
> ahh ... that explains the delays indeed!

just to make sure, i've checked all the other __delay() implementations
in the kernel, and none seems to have such problems.

Ingo

2006-02-16 06:27:42

by Heiko Carstens

Subject: Re: calibrate_migration_costs takes ages on s390

> introduce the CONFIG_DEFAULT_MIGRATION_COST method for an architecture
> to set the scheduler migration costs. This turns off automatic detection
> of migration costs. Makes sense on virtual platforms, where migration
> costs are hard to measure accurately.
>
> Signed-off-by: Ingo Molnar <[email protected]>
>
> ----
>
> arch/s390/Kconfig | 4 ++++
> kernel/sched.c | 13 ++++++++++++-
> 2 files changed, 16 insertions(+), 1 deletion(-)
>
> Index: linux-robust-list.q/arch/s390/Kconfig
> ===================================================================
> --- linux-robust-list.q.orig/arch/s390/Kconfig
> +++ linux-robust-list.q/arch/s390/Kconfig
> @@ -80,6 +80,10 @@ config HOTPLUG_CPU
> can be controlled through /sys/devices/system/cpu/cpu#.
> Say N if you want to disable CPU hotplug.
>
> +config DEFAULT_MIGRATION_COST
> + int
> + default "1000000"
> +
> config MATHEMU
> bool "IEEE FPU emulation"
> depends on MARCH_G5
> Index: linux-robust-list.q/kernel/sched.c
> ===================================================================
> --- linux-robust-list.q.orig/kernel/sched.c
> +++ linux-robust-list.q/kernel/sched.c
> @@ -5159,7 +5159,18 @@ static void init_sched_build_groups(stru
> #define MAX_DOMAIN_DISTANCE 32
>
> static unsigned long long migration_cost[MAX_DOMAIN_DISTANCE] =
> - { [ 0 ... MAX_DOMAIN_DISTANCE-1 ] = -1LL };
> + { [ 0 ... MAX_DOMAIN_DISTANCE-1 ] =
> +/*
> + * Architectures may override the migration cost and thus avoid
> + * boot-time calibration. Unit is nanoseconds. Mostly useful for
> + * virtualized hardware:
> + */
> +#ifdef CONFIG_DEFAULT_MIGRATION_COST
> + CONFIG_DEFAULT_MIGRATION_COST
> +#else
> + -1LL
> +#endif
> +};
>
> /*
> * Allow override of migration cost - in units of microseconds.
> -

This one should be applied then.

Thanks,
Heiko