2004-09-08 11:17:56

by Ingo Molnar

Subject: [patch] preempt-smp.patch, 2.6.9-rc1-bk14


The attached preempt-smp.patch is another piece of the voluntary-preempt
patchset: it reworks the way kernel preemption is done on SMP, to solve
three generic problem areas that introduce unbounded latencies. Here's a
description of the changes:

SMP locking latencies are one of the last architectural problems that
cause millisec-category scheduling delays. CONFIG_PREEMPT tries to solve
some of the SMP issues but there are still lots of problems remaining:
spinlocks nested at multiple levels, spinning with irqs turned off, and
non-nested spinning with preemption turned off permanently.

The nesting problem goes like this: if a piece of kernel code (e.g. the
MM or ext3's journalling code) does the following:

spin_lock(&spinlock_1);
...
spin_lock(&spinlock_2);
...

then even with CONFIG_PREEMPT enabled, current kernels may spin on
spinlock_2 indefinitely. A number of critical sections break up their
long paths by using cond_resched_lock(), but this does not break the
path on SMP, because need_resched() *of the other CPU* is not set, so
cond_resched_lock() doesn't notice that a reschedule is due.
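
To make the failure mode concrete, here is a hypothetical long-running
critical section of the kind cond_resched_lock() is supposed to break
up (all names below are made up for illustration):

/*
 * Hypothetical lock-breaking loop.  On UP/PREEMPT, cond_resched_lock()
 * drops and re-takes the lock whenever this CPU's need_resched() is
 * set.  On SMP, a task spinning on another CPU never sets *this* CPU's
 * need_resched flag, so the section is never broken and the other CPU
 * keeps spinning for the full duration of the loop.
 */
static void flush_all_buckets(struct bucket_table *table)
{
        int i;

        spin_lock(&table->lock);
        for (i = 0; i < table->nr_buckets; i++) {
                flush_one_bucket(table, i);      /* bounded amount of work */
                cond_resched_lock(&table->lock); /* blind to SMP spinners */
        }
        spin_unlock(&table->lock);
}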

To solve this problem I've introduced a new spinlock field,
lock->break_lock, which signals to the holding CPU that a
spinlock-break is requested by another CPU. This field is only set if a
CPU is spinning in a spinlock function [at any locking depth], so the
default overhead is zero. I've extended cond_resched_lock() to check for
this flag - in this case we can also save a reschedule. I've added the
lock_need_resched(lock) and need_lockbreak(lock) methods to check for
the need to break out of a critical section.
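
Roughly, the new pieces fit together like this (a sketch only - the
exact code is in the attached patch):

/* another CPU is spinning on this lock and requests a lock-break: */
#define need_lockbreak(lock) \
        unlikely((lock)->break_lock)

/* break the critical section if a reschedule or a lock-break is due: */
#define lock_need_resched(lock) \
        unlikely(need_resched() || need_lockbreak(lock))

int cond_resched_lock(spinlock_t *lock)
{
        if (lock_need_resched(lock)) {
                spin_unlock(lock);      /* lets the spinning CPU get the lock */
                cpu_relax();
                cond_resched();         /* reschedule if this CPU needs to */
                spin_lock(lock);
                return 1;
        }
        return 0;
}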

Another latency problem was that the stock kernel, even with
CONFIG_PREEMPT enabled, didn't have any spin-nicely preemption logic for
the following, commonly used SMP locking primitives: read_lock(),
spin_lock_irqsave(), spin_lock_irq(), spin_lock_bh(),
read_lock_irqsave(), read_lock_irq(), read_lock_bh(),
write_lock_irqsave(), write_lock_irq(), write_lock_bh(). Only
spin_lock() and write_lock() [the two simplest cases] were covered.

In addition to the preemption latency problems, the _irq() variants in
the above list didn't do any IRQ-enabling while spinning - possibly
resulting in excessive irqs-off sections of code!
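
The contended slowpath of the spin-nicely variants roughly follows this
pattern (an illustrative sketch of the idea, not the literal patch
code):

/*
 * Sketch of spinning "irq-nicely": on contention, drop back out of the
 * irqs-off/preempt-off region, request a lock-break via break_lock and
 * busy-wait with interrupts and preemption enabled; irqs are only
 * disabled again for the actual trylock attempt.
 */
void __lockfunc _spin_lock_irq(spinlock_t *lock)
{
        for (;;) {
                preempt_disable();
                local_irq_disable();
                if (likely(_raw_spin_trylock(lock)))
                        return;         /* acquired - irqs stay off */

                /* contended: wait with irqs and preemption enabled */
                local_irq_enable();
                preempt_enable();
                lock->break_lock = 1;   /* ask the holder for a lock-break */
                while (spin_is_locked(lock))
                        cpu_relax();
        }
}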

preempt-smp.patch fixes all these latency problems by spinning
irq-nicely (if possible) and by requesting lock-breaks if needed. Two
architecture-level changes were necessary for this: the addition of the
break_lock field to spinlock_t and rwlock_t, and the addition of the
_raw_read_trylock() function.
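
The data-structure side of the change amounts to roughly this (a
sketch - the exact field layout and #ifdef conditions are in the
patch):

typedef struct {
        volatile unsigned int slock;    /* the actual lock word */
#ifdef CONFIG_PREEMPT
        unsigned int break_lock;        /* set by a CPU spinning on this lock */
#endif
} spinlock_t;

typedef struct {
        volatile unsigned int lock;     /* the rwlock word */
#ifdef CONFIG_PREEMPT
        unsigned int break_lock;
#endif
} rwlock_t;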

Testing done by Mark H Johnson and myself indicates SMP latencies
comparable to the UP kernel - while they were essentially unbounded
without this patch.

I successfully test-compiled and test-booted this patch on top of
BK-curr using the following .config combinations: SMP && PREEMPT, !SMP
&& PREEMPT, SMP && !PREEMPT and !SMP && !PREEMPT on x86, and !SMP &&
!PREEMPT and SMP && PREEMPT on x64. I also test-booted x86 with the
generic_raw_read_trylock() function to check that it works fine.
Essentially the same patch has been in testing as part of the
voluntary-preempt patches for some time already.

!PREEMPT or !SMP kernels are not affected by this patch - other than by
the un-inlining of cond_resched_lock(), which makes sense on all
kernels.

NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
version that should be replaced with the proper arch-optimized version
ASAP.
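
To illustrate what 'crude' means here: a generic fallback along the
lines of the sketch below is correct (it only reports success after
actually taking the lock), but it can never fail, so the spin-nicely
loop degenerates to plain spinning until the architecture implements a
real trylock:

/*
 * Crude generic fallback (sketch, not necessarily the exact patch
 * code): take the read lock unconditionally and report success.
 * Correct for the way the spin-nicely loop uses it, but since it never
 * fails there is no latency benefit until a proper arch-level
 * _raw_read_trylock() exists.
 */
static inline int generic_raw_read_trylock(rwlock_t *lock)
{
        _raw_read_lock(lock);
        return 1;
}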

Ingo


Attachments:
preempt-smp.patch (24.71 kB)

2004-09-08 11:40:10

by Christoph Hellwig

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14

> NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
> version that should be replaced with the proper arch-optimized version
> ASAP.

I'd suggest not providing it at all then, because that forces arch
maintainers to implement it.

2004-09-08 11:43:23

by Ingo Molnar

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14


* Christoph Hellwig <[email protected]> wrote:

> > NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
> > version that should be replaced with the proper arch-optimized version
> > ASAP.
>
> I'd suggest not providing it at all then, because that forces arch
> maintainers to implement it.

While generic, it's actually correct for the purpose it's being used
for right now, and all architectures should thus compile & work fine.

Ingo

2004-09-08 12:42:40

by Christoph Hellwig

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14

On Wed, Sep 08, 2004 at 01:44:46PM +0200, Ingo Molnar wrote:
>
> * Christoph Hellwig <[email protected]> wrote:
>
> > > NOTE to architecture maintainers: generic_raw_read_trylock() is a crude
> > > version that should be replaced with the proper arch-optimized version
> > > ASAP.
> >
> > I'd suggest not providing it at all then, because that forces arch
> > maintainers to implement it.
>
> While generic, it's actually correct for the purpose it's being used
> for right now, and all architectures should thus compile & work fine.

Well, if it goes into -mm I'd rather see it not compile first and let
arch maintainers do a proper version instead of forgetting about it. And
unless I misread the code it's only used for SMP && PREEMPT anyway, right?

2004-09-08 12:46:44

by Ingo Molnar

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14


* Christoph Hellwig <[email protected]> wrote:

> > While generic, it's actually correct for the purpose it's being used
> > for right now, and all architectures should thus compile & work fine.
>
> Well, if it goes into -mm I'd rather see it not compile first and let
> arch maintainers do a proper version instead of forgetting about it. And
> unless I misread the code it's only used for SMP && PREEMPT anyway,
> right?

Right now it's only used for SMP && PREEMPT, yes. Both variants are OK
with me. We can add a #warning to the generic define too.
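
Such a #warning could look roughly like this (a sketch - the guard
condition is hypothetical):

#ifndef _raw_read_trylock
# warning "using generic_raw_read_trylock() - please add an arch-optimized _raw_read_trylock()"
# define _raw_read_trylock(lock)        generic_raw_read_trylock(lock)
#endif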

Ingo

2004-09-08 12:57:14

by Ingo Molnar

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14


* Zwane Mwaikambo <[email protected]> wrote:

> > To solve this problem I've introduced a new spinlock field,
> > lock->break_lock, which signals to the holding CPU that a
> > spinlock-break is requested by another CPU. This field is only set if a
> > CPU is spinning in a spinlock function [at any locking depth], so the
> > default overhead is zero. I've extended cond_resched_lock() to check for
> > this flag - in this case we can also save a reschedule. I've added the
> > lock_need_resched(lock) and need_lockbreak(lock) methods to check for
> > the need to break out of a critical section.
>
> Doesn't having break_lock within the same cacheline as lock bounce the
> line around more?

In fact this way it bounces less than if it were on a separate
cacheline. Contention causes bouncing anyway. This way the cacheline is
already dirty and on the local CPU when we set break_lock, and the
lock-holder CPU bounces it back when it breaks and/or releases the
lock.

Ingo

2004-09-08 13:05:35

by Ingo Molnar

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14


* Zwane Mwaikambo <[email protected]> wrote:

> > In addition to the preemption latency problems, the _irq() variants in
> > the above list didn't do any IRQ-enabling while spinning - possibly
> > resulting in excessive irqs-off sections of code!
>
> I had a patch for this
> http://www.ussg.iu.edu/hypermail/linux/kernel/0405.3/0578.html and it
> has been running for about 3 months now on a heavily used 4 processor
> box. It's all a matter of whether Andrew is feeling brave ;)

At a quick glance your patch doesn't seem to cover the following locking
primitives: read_lock_irqsave(), read_lock_irq(), write_lock_irqsave(),
write_lock_irq(). Also, I think your 2.6.6 patch doesn't apply anymore
because it clashes with your very nice out-of-line spinlocks patch that
went into -BK recently ;)

Anyway, the preempt-smp.patch is a complete and systematic solution that
has been tested, measured and traced quite heavily.

Ingo

2004-09-08 12:57:15

by Zwane Mwaikambo

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14

On Wed, 8 Sep 2004, Ingo Molnar wrote:

> To solve this problem I've introduced a new spinlock field,
> lock->break_lock, which signals to the holding CPU that a
> spinlock-break is requested by another CPU. This field is only set if a
> CPU is spinning in a spinlock function [at any locking depth], so the
> default overhead is zero. I've extended cond_resched_lock() to check for
> this flag - in this case we can also save a reschedule. I've added the
> lock_need_resched(lock) and need_lockbreak(lock) methods to check for
> the need to break out of a critical section.

Doesn't having break_lock within the same cacheline as lock bounce the
line around more?

> In addition to the preemption latency problems, the _irq() variants in
> the above list didn't do any IRQ-enabling while spinning - possibly
> resulting in excessive irqs-off sections of code!

I had a patch for this
http://www.ussg.iu.edu/hypermail/linux/kernel/0405.3/0578.html and it has
been running for about 3 months now on a heavily used 4 processor box.
It's all a matter of whether Andrew is feeling brave ;)

Thanks,
Zwane

2004-09-08 13:29:47

by Zwane Mwaikambo

Subject: Re: [patch] preempt-smp.patch, 2.6.9-rc1-bk14

On Wed, 8 Sep 2004, Ingo Molnar wrote:

> At a quick glance your patch doesn't seem to cover the following locking
> primitives: read_lock_irqsave(), read_lock_irq(), write_lock_irqsave(),
> write_lock_irq(). Also, I think your 2.6.6 patch doesn't apply anymore
> because it clashes with your very nice out-of-line spinlocks patch that
> went into -BK recently ;)

Yes, I intentionally avoided rwlocks; in that case I think write_lock_*
would be the one to work on, but this is all covered by CONFIG_PREEMPT
and your patch now.

Thanks,
Zwane

2004-09-13 10:24:21

by Ingo Molnar

Subject: [patch] preempt-smp.patch, 2.6.9-rc1-mm5


The attached patch is a merge of the preempt-smp patch to 2.6.9-rc1-mm5.

Changes since the -mm4 version: fixed it to work with CONFIG_LOCKMETER.

I compiled & booted all possible combinations of SMP/UP, PREEMPT and
LOCKMETER on x86 and it works fine. I also test-built and booted the
patched kernel on x64-SMP-PREEMPT. Details in the patch metadata.

Ingo


Attachments:
preempt-smp.patch (25.35 kB)