2004-09-14 09:17:00

by Ingo Molnar

Subject: [patch] preempt-cleanup.patch, 2.6.9-rc2


the attached patch is on top of preempt-smp.patch. This is another
generic fallout from the voluntary-preempt patchset: a cleanup of the
cond_resched() infrastructure, in preparation for the latency reduction
patches. The changes:

- uninline cond_resched() - this makes the footprint smaller,
especially once the number of cond_resched() points increases.

- add a 'was rescheduled' return value to cond_resched(). This makes
it symmetric to cond_resched_lock(); later latency reduction patches
rely on the ability to tell whether there was any preemption.

- make cond_resched() more robust by using the same mechanism as
preempt_schedule(): PREEMPT_ACTIVE. This preserves the task's
state - e.g. if the task is in TASK_ZOMBIE but gets preempted via
cond_resched() just prior to scheduling off, then this approach
preserves TASK_ZOMBIE.

- the patch also adds need_lockbreak() which critical sections can use
to detect lock-break requests.

i've tested the patch on x86 SMP and UP.

Ingo


Attachments:
preempt-cleanup.patch (2.90 kB)

2004-09-14 09:34:36

by William Lee Irwin III

Subject: [printk] make console_conditional_schedule() __sched and use cond_resched()

On Tue, Sep 14, 2004 at 11:15:29AM +0200, Ingo Molnar wrote:
> the attached patch is ontop of preempt-smp.patch. This is another
> generic fallout from the voluntary-preempt patchset: a cleanup of the
> cond_resched() infrastructure, in preparation of the latency reduction
> patches. The changes:

Relatively minor add-on (not necessarily tied to it, required to be
taken with it, or a fix for any bug). Since cond_resched() is using
PREEMPT_ACTIVE now, it may be useful to update the open-coded instance
of cond_resched() to use the generic call. Also, it should probably be
__sched so the caller shows up in wchan.


-- wli

Index: mm5-2.6.9-rc1/kernel/printk.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/printk.c 2004-08-24 01:13:48.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/printk.c 2004-09-14 02:23:41.222204160 -0700
@@ -658,12 +658,10 @@
*
* Must be called within acquire_console_sem().
*/
-void console_conditional_schedule(void)
+void __sched console_conditional_schedule(void)
{
- if (console_may_schedule && need_resched()) {
- set_current_state(TASK_RUNNING);
- schedule();
- }
+ if (console_may_schedule)
+ cond_resched();
}
EXPORT_SYMBOL(console_conditional_schedule);

2004-09-14 09:38:53

by Ingo Molnar

Subject: [patch] preempt-lock-need-resched.patch, 2.6.9-rc2


the attached preempt-lock-need-resched.patch is on top of the
preempt-smp and preempt-cleanups patches. It adds lock_need_resched(),
which checks for the necessity of a lock-break in a critical section.
Used by later latency-break patches.

tested on x86, should work on all architectures.

Ingo


Attachments:
preempt-lock-need-resched.patch (933.00 B)

2004-09-14 09:49:59

by Ingo Molnar

Subject: [patch] sched: add cond_resched_softirq()


the attached patch comes after preempt-lock-need-resched.patch.

it adds cond_resched_softirq() which can be used by _process context_
softirqs-disabled codepaths to preempt if necessary. The function will
enable softirqs before scheduling. (Later patches will use this
primitive.)

Ingo


Attachments:
sched-add-cond_resched_softirq.patch (1.40 kB)

2004-09-14 09:56:32

by Ingo Molnar

Subject: [patch] sched: fix latency in random driver


the attached patch fixes possibly long scheduling latencies in the
/dev/random driver's rekey_seq_generator() function, by moving the
expensive get_random_bytes() function out from under ip_lock.

has been in the -VP patchset for quite some time.

Ingo


Attachments:
fix-latency-random.patch (955.00 B)

2004-09-14 10:07:39

by Ingo Molnar

Subject: [patch] sched, ext3: fix scheduling latencies in ext3


the attached patch fixes long scheduling latencies in the ext3 code, and
it also cleans up the existing lock-break functionality to use the new
primitives.

This patch has been in the -VP patchset for quite some time.

Ingo


Attachments:
fix-latency-ext3.patch (5.32 kB)

2004-09-14 10:12:16

by Ingo Molnar

Subject: [patch] sched, vfs: fix scheduling latencies in invalidate_inodes()


the attached patch fixes long scheduling latencies in
invalidate_inodes(). The lock-break is a bit tricky so as not to get
into a livelock scenario: we use a dummy inode as a marker from which
the scan can continue after the schedule.

this patch has been tested as part of the -VP patchset for weeks.

Ingo


Attachments:
fix-latency-invalidate-inodes.patch (2.57 kB)

2004-09-14 10:17:55

by Ingo Molnar

Subject: [patch] sched, vfs: fix scheduling latencies in prune_dcache() and select_parent()


the attached patch fixes long scheduling latencies in select_parent()
and prune_dcache(). The prune_dcache() lock-break is easy, but for
select_parent() the only viable solution i found was to break out if a
resched is necessary - the reordering is not necessary and the dcache
scanning/shrinking will later on do it anyway.

This patch has been in the -VP patchset for weeks.

Ingo


Attachments:
fix-latency-dcache.patch (1.22 kB)

2004-09-14 10:24:12

by Ingo Molnar

Subject: [patch] sched, net: fix scheduling latencies in netstat


the attached patch fixes long scheduling latencies caused by access to
the /proc/net/tcp file. The seqfile functions keep softirqs disabled for
a very long time (i've seen reports of 20+ msecs, if there are enough
sockets in the system). With the attached patch it's below 100 usecs.

the cond_resched_softirq() relies on the implicit knowledge that this
code executes in process context and runs with softirqs disabled.

potentially enabling softirqs means that the socket list might change
between buckets - but this is not an issue since seqfiles have a 4K
iteration granularity anyway and /proc/net/tcp is often (much) larger
than that.

This patch has been in the -VP patchset for weeks.

Ingo


Attachments:
fix-latency-netstat.patch (1.48 kB)

2004-09-14 10:57:53

by Ingo Molnar

Subject: [patch] sched, mm: fix scheduling latencies in get_user_pages()


The attached patch fixes long scheduling latencies in get_user_pages().

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-get_user_pages.patch (503.00 B)

2004-09-14 11:00:19

by Ingo Molnar

Subject: [patch] sched, mm: fix scheduling latencies in unmap_vmas()


The attached patch fixes long latencies in unmap_vmas(). We had
lockbreak code in that function already, but it did not take the
delayed effects of TLB-gather into account.

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-unmap_vmas.patch (1.62 kB)

2004-09-14 11:03:59

by Ingo Molnar

Subject: [patch] sched, net: fix scheduling latencies in __release_sock


the attached patch fixes long scheduling latencies caused by backlog
processing in __release_sock(). That code only executes in process
context, and we've already made the backlog queue private at this
point, so it is safe to do a cond_resched_softirq().

This patch has been in the -VP patchset for some time.

Ingo


Attachments:
fix-latency-release_sock.patch (808.00 B)

2004-09-14 11:07:30

by Ingo Molnar

Subject: [patch] sched, mm: fix scheduling latencies in filemap_sync()


The attached patch, written by Andrew Morton, fixes long scheduling
latencies in filemap_sync().

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-filemap_sync.patch (1.29 kB)

2004-09-14 11:12:04

by Ingo Molnar

Subject: [patch] might_sleep() additions to fs-writeback.c


add two more might_sleep() checks: to sync_inode() and
generic_osync_inode().

Signed-off-by: Ingo Molnar <[email protected]>

--- linux/fs/fs-writeback.c.orig
+++ linux/fs/fs-writeback.c
@@ -578,6 +578,7 @@ int sync_inode(struct inode *inode, stru
{
int ret;

+ might_sleep();
spin_lock(&inode_lock);
ret = __writeback_single_inode(inode, wbc);
spin_unlock(&inode_lock);
@@ -622,6 +623,7 @@ int generic_osync_inode(struct inode *in
}
current->flags &= ~PF_SYNCWRITE;

+ might_sleep();
spin_lock(&inode_lock);
if ((inode->i_state & I_DIRTY) &&
((what & OSYNC_INODE) || (inode->i_state & I_DIRTY_DATASYNC)))


Attachments:
might_sleep-fs-writeback.patch (631.00 B)

2004-09-14 11:27:43

by Ingo Molnar

Subject: [patch] sched, tty: fix scheduling latencies in tty_io.c


the attached patch fixes long scheduling latencies in tty_read() and
tty_write() caused by the BKL.

has been tested as part of the -VP patchset and the tty_read() side
has also been tested in earlier -mm trees.

Ingo


Attachments:
fix-latency-tty_io.patch (1.10 kB)

2004-09-14 11:31:10

by Ingo Molnar

Subject: [patch] sched, pty: fix scheduling latencies in pty.c


the attached patch fixes long scheduling latencies in pty_read()
caused by the BKL.

has been tested as part of the -VP patchset and in earlier -mm trees.

Ingo


Attachments:
fix-latency-pty.patch (474.00 B)

2004-09-14 11:32:46

by Ingo Molnar

Subject: [patch] fix keventd execution dependency


We don't want to execute off keventd since it might hold a semaphore
our callers hold too. This can happen when kthread_create() is called
from within keventd. This happened due to the IRQ threading patches,
but it could happen with other code too.

Ingo


Attachments:
fix-keventd.patch (1.39 kB)

2004-09-14 11:32:45

by Ingo Molnar

Subject: [patch] sched: fix scheduling latencies in vgacon.c


this patch fixes scheduling latencies in vgacon_do_font_op(). The code
is protected by vga_lock already so it's safe to drop (and re-acquire)
the BKL.

has been tested in the -VP patchset.

Ingo


Attachments:
fix-latency-vgacon.patch (906.00 B)

2004-09-14 11:45:21

by Ingo Molnar

Subject: [patch] sched: fix scheduling latencies for !PREEMPT kernels


this patch adds a handful of cond_resched() points to a number of
key, scheduling-latency related non-inlined functions.

this reduces preemption latency for !PREEMPT kernels. These are
scheduling points complementary to PREEMPT_VOLUNTARY scheduling
points (might_sleep() places) - i.e. these are all points where
an explicit cond_resched() had to be added.

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-nopreempt.patch (2.78 kB)

2004-09-14 12:02:33

by Ingo Molnar

Subject: Re: [patch] sched, tty: fix scheduling latencies in tty_io.c


* Alan Cox <[email protected]> wrote:

> On Maw, 2004-09-14 at 12:06, Ingo Molnar wrote:
> > the attached patch fixes long scheduling latencies in tty_read() and
> > tty_write() caused by the BKL.
>
> Have you verified that none of the ldisc methods rely on the big
> kernel lock. In doing the tty audit I found several cases that tty
> drivers still rely on this lock so I am curious why you make this
> change.
>
> Would it not be better to fix the tty layer locking rather than
> introduces new random memory corruptors ?

sure ... any volunteers?

Ingo

2004-09-14 12:14:25

by Alan

Subject: Re: [patch] sched, tty: fix scheduling latencies in tty_io.c

On Maw, 2004-09-14 at 12:06, Ingo Molnar wrote:
> the attached patch fixes long scheduling latencies in tty_read() and
> tty_write() caused by the BKL.

Have you verified that none of the ldisc methods rely on the big
kernel lock? In doing the tty audit I found several cases where tty
drivers still rely on this lock, so I am curious why you make this
change.

Would it not be better to fix the tty layer locking rather than
introduce new random memory corruptors?


Alan

2004-09-14 12:28:59

by Ingo Molnar

Subject: Re: [patch] sched, tty: fix scheduling latencies in tty_io.c


* Alan Cox <[email protected]> wrote:

> On Maw, 2004-09-14 at 13:00, Ingo Molnar wrote:
> > > Would it not be better to fix the tty layer locking rather than
> > > introduces new random memory corruptors ?
> >
> > sure ... any volunteers?
>
> Not that I've seen. I'm fixing some locking but not stuff that depends
> on lock_kernel(). [...]

would it be a sufficient fix to grab some sort of tty_sem mutex in the
places where the patch drops the BKL - or are there other entry paths to
this code?

Ingo

2004-09-14 12:23:08

by Alan

Subject: Re: [patch] sched, tty: fix scheduling latencies in tty_io.c

On Maw, 2004-09-14 at 13:00, Ingo Molnar wrote:
> > Would it not be better to fix the tty layer locking rather than
> > introduces new random memory corruptors ?
>
> sure ... any volunteers?

Not that I've seen. I'm fixing some locking but not stuff that depends
on lock_kernel(). In the meantime therefore it seems inappropriate to
add random corruptors and bugs to the kernel in pursuit of low latency
other than as private "add this but beware" patches.

The tty I/O patches should not be applied until the tty layer no
longer relies on the lock_kernel locking. So if people want the low
latency stuff covering the tty code, I suggest that rather than making
other people's machines crash more, they volunteer. The console driver
might be a good starting point.

Alan

2004-09-14 12:58:40

by Nick Piggin

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Ingo Molnar wrote:

> --- linux/mm/vmscan.c.orig
> +++ linux/mm/vmscan.c
> @@ -361,6 +361,8 @@ static int shrink_list(struct list_head
> int may_enter_fs;
> int referenced;
>
> + cond_resched();
> +
> page = lru_to_page(page_list);
> list_del(&page->lru);
>
> @@ -710,6 +712,7 @@ refill_inactive_zone(struct zone *zone,
> reclaim_mapped = 1;
>
> while (!list_empty(&l_hold)) {
> + cond_resched();
> page = lru_to_page(&l_hold);
> list_del(&page->lru);
> if (page_mapped(page)) {


Could these ones go up a level? We break down scanning into 32 page
chunks, so I don't think it needs to be checked every page.

in shrink_zone(), maybe stick them:

	while (nr_active || nr_inactive) {
		if (nr_active) {
---> here
			sc->nr_to_scan = min(nr_active,
					(unsigned long)SWAP_CLUSTER_MAX);
			nr_active -= sc->nr_to_scan;
			refill_inactive_zone(zone, sc);
		}

		if (nr_inactive) {
---> and here
			sc->nr_to_scan = min(nr_inactive,
					(unsigned long)SWAP_CLUSTER_MAX);
			nr_inactive -= sc->nr_to_scan;
			shrink_cache(zone, sc);
			if (sc->nr_to_reclaim <= 0)
				break;
		}
	}

Does that work for you?

2004-09-14 11:37:56

by Ingo Molnar

Subject: [patch] sched: fix scheduling latencies in NTFS mount


this patch fixes scheduling latencies in the NTFS mount path. We are
dropping the BKL because the code itself is using the ntfs_lock
semaphore which provides safe locking.

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-ntfs-mount.patch (1.15 kB)

2004-09-14 11:34:53

by Ingo Molnar

Subject: [patch] sched: fix scheduling latencies in mtrr.c


fix scheduling latencies in the MTRR-setting codepath.
Also, fix a bad bug: MTRRs _must_ be set with interrupts disabled!

Ingo

2004-09-14 11:32:45

by Ingo Molnar

Subject: [patch] sched, mm: fix scheduling latencies in copy_page_range()

The attached patch does a lock-break in copy_page_range() if needed, to
reduce scheduling latencies.

has been tested as part of the -VP patchset.

Ingo


Attachments:
fix-latency-copy_page_range.patch (1.06 kB)

2004-09-14 13:18:31

by Alan

Subject: Re: [patch] sched, tty: fix scheduling latencies in tty_io.c

On Maw, 2004-09-14 at 13:27, Ingo Molnar wrote:
> would it be a sufficient fix to grab some sort of tty_sem mutex in the
> places where the patch drops the BKL - or are there other entry paths to
> this code?

There are a whole load of entry points. I've been trying to document
all the ldisc layer stuff as I fix the basic bugs in it. A semaphore
isn't sufficient because some of the entry points have to be
multi-entry, so you'd need to go modify all the ldisc internals (I
think that will have to happen eventually). It also still relies on
the BKL for read vs. write locking.

So far all I've fixed is the vfs/ldisc locking to ensure that open/close
and other events are properly sequenced. I've not yet finished tackling
the ldisc/driver cases where driver->close() completes and the ldisc
calls the driver still.

At that point it's at least then down to the drivers to get fixed.

2004-09-14 13:27:46

by Ingo Molnar

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Nick Piggin <[email protected]> wrote:

> Could these ones go up a level? We break down scanning into 32 page
> chunks, so I don't think it needs to be checked every page.

not really - we can occasionally get into high latencies even with a
single page - if a single page is mapped by a lot of processes.

Ingo

2004-09-14 13:30:03

by Ingo Molnar

Subject: Re: [patch] sched: fix scheduling latencies in mtrr.c


> fix scheduling latencies in the MTRR-setting codepath.
> Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!

patch attached ...

caveat: one of the wbinvd() removals is correct i believe, but the
other one might be problematic. It doesn't seem logical to do a
wbinvd() at that point though ...

Ingo


Attachments:
fix-latency-mtrr.patch (1.36 kB)

2004-09-14 13:33:44

by Anton Altaparmakov

Subject: Re: [patch] sched: fix scheduling latencies in NTFS mount

On Tue, 2004-09-14 at 12:35, Ingo Molnar wrote:
> this patch fixes scheduling latencies in the NTFS mount path. We are
> dropping the BKL because the code itself is using the ntfs_lock
> semaphore which provides safe locking.
>
> has been tested as part of the -VP patchset.

Looks good. I designed the NTFS driver to not rely on the BKL at all so
you should be able to drop it anywhere you like. (-:

I have applied your patch to my ntfs-2.6-devel bk tree so it doesn't get
lost.

Thanks,

Anton
--
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK
Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net
WWW: http://linux-ntfs.sf.net/, http://www-stu.christs.cam.ac.uk/~aia21/

2004-09-14 13:37:12

by Nick Piggin

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>Could these ones go up a level? We break down scanning into 32 page
>>chunks, so I don't think it needs to be checked every page.
>
>
> not really - we can occasionally get into high latencies even with a
> single page - if a single page is mapped by alot of processes.
>

So doing it in the loop doesn't really give you a deterministic
maximum latency if somebody is out to cause trouble, does it?

OTOH, I guess libc or some shared memory segment may be mapped
into a lot of processes even in RT applications.

Another thing, I don't mean this to sound like a rhetorical question,
but if we have a preemptible kernel, why is it a good idea to sprinkle
cond_rescheds everywhere? Isn't this now the worst of both worlds?
Why would someone who really cares about latency not enable preempt?

2004-09-14 14:09:17

by Andrea Arcangeli

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
> cond_rescheds everywhere? Isn't this now the worst of both worlds?

1) cond_resched should become a noop if CONFIG_PREEMPT=y
(cond_resched_lock of course should still unlock/relock if
need_resched() is set, but not __cond_resched).
2) all Ingo's new and old might_sleep should be converted to
cond_resched (or optionally to cond_resched_costly, see point 5).
3) might_sleep should emit a debug statement.
4) cond_resched should call might_sleep if need_resched is not set
with CONFIG_PREEMPT=n, and it should _only_ call might_sleep
with CONFIG_PREEMPT=y after we implement point 1.
5) no further config option should exist (if we really add an option
it should be called CONFIG_COND_RESCHED_COSTLY or similar, to
differentiate scheduling points in fast paths (like spinlock places
with CONFIG_PREEMPT=n), so you can choose between cond_resched() and
cond_resched_costly())

I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
implicit need_resched() checks anyways).

> Why would someone who really cares about latency not enable preempt?

to avoid lots of worthless cond_resched in all spin_unlock and to avoid
kernel crashes if some driver is not preempt compliant?

I've a better question for you: why would someone ever disable
CONFIG_PREEMPT_VOLUNTARY? That config option is nonsense as far as I
can tell. If anything, it should be renamed to
"CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)

2004-09-14 14:18:42

by Alan

Subject: Re: [patch] sched: fix scheduling latencies in mtrr.c

On Maw, 2004-09-14 at 14:25, Ingo Molnar wrote:
> caveat: one of the wbinvd() removals is correct i believe, but the other
> one might be problematic. It doesnt seem logical to do a wbinvd() at
> that point though ...

See the Intel PPro manual volume 3, page 11-25. It's quite specific
about the sequence, so unless anything changes with AMD or later
processors the change seems to match the description.

IRQs are required to be off far longer than this sequence according to
the docs however, and PGE is supposed to be cleared too.

Step 11: set CD flag
Step 12: wbinvd
Step 13: set pge in CR4 if previously cleared


2004-09-14 14:28:59

by Nick Piggin

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Andrea Arcangeli wrote:
> On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
>
>>cond_rescheds everywhere? Isn't this now the worst of both worlds?
>
>
> 1) cond_resched should become a noop if CONFIG_PREEMPT=y
> (cond_resched_lock of course should still unlock/relock if
> need_resched() is set, but not __cond_resched).

Unfortunately we need to keep the cond_rescheds that are called under
the bkl. Otherwise yes, this would be nice to be able to do.

> 2) all Ingo's new and old might_sleep should be converted to
> cond_resched (or optionally to cond_resched_costly, see point 5).
> 3) might_sleep should return a debug statement.
> 4) cond_resched should call might_sleep if need_resched is not set if
> CONFIG_PREEMPT=n is disabled, and it should _only_ call might_sleep
> if CONFIG_PREEMPT=y after we implement point 1.
> 5) no further config option should exist (if we really add an option
> it should be called CONFIG_COND_RESCHED_COSTLY of similar to
> differentiate scheduling points in fast paths (like spinlock places
> with CONFIG_PREEMPT=n) (so you can choose between cond_resched() and
> cond_resched_costly())
>
> I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
> point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
> implicit need_resched() checks anyways).
>

Which is why we don't need more of them ;)

>
>>Why would someone who really cares about latency not enable preempt?
>
>
> to avoid lots of worthless cond_resched in all spin_unlock and to avoid
> kernel crashes if some driver is not preempt complaint?
>

Well, I don't know how good an argument the crashes one is these days,
but generally (as far as I know) those who really care about latency
shouldn't mind some extra overhead.

Now I don't disagree with some cond_rescheds for places where !PREEMPT
latency would otherwise be massive, but cases like doing cond_resched
for every page in the scanner is something that you could say is imposing
overhead on people who *don't* want it (ie !PREEMPT).

> I've a better question for you, why would someone ever disable
> CONFIG_PREEMPT_VOLUNTARY? That config option is a nosense as far as I
> can tell. If something it should be renamed to
> "CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)
>

:) I don't think Ingo intended this for merging as is. Maybe it is to
test how much progress he has made.

2004-09-14 15:01:57

by Ingo Molnar

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Nick Piggin <[email protected]> wrote:

> Another thing, I don't mean this to sound like a rhetorical question,
> but if we have a preemptible kernel, why is it a good idea to sprinkle
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
> would someone who really cares about latency not enable preempt?

two things:

1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
even SuSE. This is pretty telling.

2) 10 new cond_resched()'s are not precisely 'sprinkle everywhere'.

Ingo

2004-09-14 15:09:34

by Andrea Arcangeli

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Wed, Sep 15, 2004 at 12:28:49AM +1000, Nick Piggin wrote:
> Andrea Arcangeli wrote:
> >On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
> >
> >>cond_rescheds everywhere? Isn't this now the worst of both worlds?
> >
> >
> >1) cond_resched should become a noop if CONFIG_PREEMPT=y
> > (cond_resched_lock of course should still unlock/relock if
> > need_resched() is set, but not __cond_resched).
>
> Unfortunately we need to keep the cond_rescheds that are called under
> the bkl. Otherwise yes, this would be nice to be able to do.

we simply need a cond_resched_bkl() for that, no? Very few places are
still serialized with the BKL, so I don't think it would be a big issue
to convert those few places to use cond_resched_bkl.

> Which is why we don't need more of them ;)

eheh ;)

> Well I don't know how good an argument the crashes one is these days,
> but generally (as far as I know) those who really care about latency
> shouldn't mind about some extra overheads.

sure, that's especially true for the hardirq and softirq total
scheduler offloading. The real question is where a generic desktop
sits. I doubt a latency over 1msec matters much on a generic desktop;
top performance for repetitive work that adds up, like hardirqs for a
NIC, sounds more relevant to me.

And for the other usages RTAI or any other hard realtime sounds safer
anyways.

> Now I don't disagree with some cond_rescheds for places where !PREEMPT
> latency would otherwise be massive, but cases like doing cond_resched
> for every page in the scanner is something that you could say is imposing
> overhead on people who *don't* want it (ie !PREEMPT).

definitely. Note that this could be simply fixed by having a
CONFIG_PREEMPT around it, but the real fix is definitely to make
cond_resched a noop with PREEMPT=y and secondly to add a
cond_resched_bkl defined as the current cond_resched as suggested above.

> :) I don't think Ingo intended this for merging as is. Maybe it is to
> test how much progress he has made.

I hope so, he said the latest patches were cleaned up and he removed
the debugging/performance cruft (the most explicit message was
[email protected]) but it's still not clear to me if he
left the CONFIG_PREEMPT_VOLUNTARY config option because he intends to
merge it or just temporarily. Comments?

Thanks.

2004-09-14 15:06:25

by Ingo Molnar

Subject: Re: [patch] sched: fix scheduling latencies in mtrr.c


* Alan Cox <[email protected]> wrote:

> On Maw, 2004-09-14 at 14:25, Ingo Molnar wrote:
> > caveat: one of the wbinvd() removals is correct i believe, but the other
> > one might be problematic. It doesnt seem logical to do a wbinvd() at
> > that point though ...
>
> See the intel ppro manual volume 3 page 11-25. Its quite specific
> about the sequence, so unless anything changes with AMD or later
> processors the change seems to match the description.
>
> IRQs are required to be off far longer than this sequence according to
> the docs however, and PGE is supposed to be cleared too.

the problem is, we've had IRQs on in all of the 2.6 kernels so any
argument about extra wbinvd brings the obvious question: 'why did it
work so far'?

the only wbinvd that makes sense is the one you quote:

> Step 11: set CD flag
> Step 12: wbinvd
> Step 13: set pge in CR4 if previously cleared

and i agree that the pge thing should be fixed too.

Ingo

2004-09-14 16:43:30

by William Lee Irwin III

Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 04:09:05PM +0200, Andrea Arcangeli wrote:
> 1) cond_resched should become a noop if CONFIG_PREEMPT=y
> (cond_resched_lock of course should still unlock/relock if
> need_resched() is set, but not __cond_resched).
> 2) all Ingo's new and old might_sleep should be converted to
> cond_resched (or optionally to cond_resched_costly, see point 5).
> 3) might_sleep should return a debug statement.
> 4) cond_resched should call might_sleep if need_resched is not set if
> CONFIG_PREEMPT=n is disabled, and it should _only_ call might_sleep
> if CONFIG_PREEMPT=y after we implement point 1.
> 5) no further config option should exist (if we really add an option
> it should be called CONFIG_COND_RESCHED_COSTLY of similar to
> differentiate scheduling points in fast paths (like spinlock places
> with CONFIG_PREEMPT=n) (so you can choose between cond_resched() and
> cond_resched_costly())
> I recommended point 2,3,4,5 already (a few of them twice), point 1 (your
> point) looks lower prio (CONFIG_PREEMPT=y already does an overkill of
> implicit need_resched() checks anyways).

The might_sleep() in cond_resched() sounds particularly useful to pick
up misapplications of cond_resched().


On Tue, Sep 14, 2004 at 11:33:48PM +1000, Nick Piggin wrote:
>> Why would someone who really cares about latency not enable preempt?
>> cond_rescheds everywhere? Isn't this now the worst of both worlds?

On Tue, Sep 14, 2004 at 04:09:05PM +0200, Andrea Arcangeli wrote:
> to avoid lots of worthless cond_resched in all spin_unlock and to avoid
> kernel crashes if some driver is not preempt complaint?
> I've a better question for you, why would someone ever disable
> CONFIG_PREEMPT_VOLUNTARY? That config option is a nosense as far as I
> can tell. If something it should be renamed to
> "CONFIG_I_DON_T_WANT_TO_RUN_THE_OLD_KERNEL_CODE" ;)

Well, thankfully we've taken the whole of the preempt-related code in
spin_unlock() and all the other locking primitives out of line for the
CONFIG_PREEMPT case (and potentially more, though that would be at the
expense of x86(-64)).

preempt_schedule() is actually what's used in preempt_check_resched(),
not cond_resched(), and this inspired me to take a look at that. What
on earth are people smoking with all these loops done up with gotos?
I suppose this isn't even the half of it, but it's what I looked at.


-- wli

Nuke some superfluous gotos in preempt_schedule().

Index: mm5-2.6.9-rc1/kernel/sched.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sched.c 2004-09-13 16:27:46.998672328 -0700
+++ mm5-2.6.9-rc1/kernel/sched.c 2004-09-14 09:01:46.514087864 -0700
@@ -2464,18 +2464,19 @@
* If there is a non-zero preempt_count or interrupts are disabled,
* we do not want to preempt the current task. Just return..
*/
- if (unlikely(ti->preempt_count || irqs_disabled()))
- return;
-
-need_resched:
- ti->preempt_count = PREEMPT_ACTIVE;
- schedule();
- ti->preempt_count = 0;
+ if (likely(!ti->preempt_count && !irqs_disabled())) {
+ do {
+ ti->preempt_count = PREEMPT_ACTIVE;
+ schedule();
+ ti->preempt_count = 0;

- /* we could miss a preemption opportunity between schedule and now */
- barrier();
- if (unlikely(test_thread_flag(TIF_NEED_RESCHED)))
- goto need_resched;
+ /*
+ * Without this barrier, we could miss a
+ * preemption opportunity between schedule and now
+ */
+ barrier();
+ } while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
+ }
}

EXPORT_SYMBOL(preempt_schedule);

2004-09-14 18:18:21

by Robert Love

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 17:03 +0200, Andrea Arcangeli wrote:

> we simply need a cond_resched_bkl() for that, no? Very few places are
> still serialized with the BKL, so I don't think it would be a big issue
> to convert those few places to use cond_resched_bkl.

Yes, this is all we need to do.

cond_resched() goes away under PREEMPT.

cond_resched_bkl() does not.

I did this a looong time ago, but did not get much interest.

Explicitly marking places that use BKL's "I can always call schedule()"
assumption helps make it easier to phase out that assumption, too. Or at
least better mark it.

Robert Love


2004-09-14 18:55:11

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 17:03 +0200, Andrea Arcangeli wrote:
>> we simply need a cond_resched_bkl() for that, no? Very few places are
>> still serialized with the BKL, so I don't think it would be a big issue
>> to convert those few places to use cond_resched_bkl.

On Tue, Sep 14, 2004 at 02:05:03PM -0400, Robert Love wrote:
> Yes, this is all we need to do.
> cond_resched() goes away under PREEMPT.
> cond_resched_bkl() does not.
> I did this a looong time ago, but did not get much interest.
> Explicitly marking places that use BKL's "I can always call schedule()"
> assumption helps make it easier to phase out that assumption, too. Or at
> least better mark it.

I'd vaguely prefer to clean up the BKL (ab)users... of course, this
involves working with some of the dirtiest code in the kernel that's
already discouraged most/all of those who work on reducing/eliminating
BKL use from touching it... maybe the latency trend is the final nail
in the coffin of resistance to cleaning that up, though I agree with
Alan that we have to be very careful about it, particularly since all
prior attempts failed to be sufficiently so.


-- wli

2004-09-14 19:08:33

by Robert Love

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 11:52 -0700, William Lee Irwin III wrote:

> I'd vaguely prefer to clean up the BKL (ab)users... of course, this
> involves working with some of the dirtiest code in the kernel that's
> already discouraged most/all of those who work on reducing/eliminating
> BKL use from touching it... maybe the latency trend is the final nail
> in the coffin of resistance to cleaning that up, though I agree with
> Alan that we have to be very careful about it, particularly since all
> prior attempts failed to be sufficiently so.

I'd love to just throw away all the BKL users, too, William. But
pragmatism and my cautious sense of reality tells me that the BKL is not
going anywhere anytime soon. We might get it down to 1% of its previous
usage, but it is awfully intertwined in some places. It will take some
time.

What I think we can do is start looking at removing the special
properties of the BKL, or at least better containing and documenting
them. The BKL has a few oddities over other spin locks:

- you can safely call schedule() while holding it
- you can grab it recursively
- you cannot use it in interrupt handlers

Getting rid of these, or at least better delineating them, will move the
BKL closer to being just a very granular lock.

cond_resched_bkl() is a step toward that.

I want the BKL gone, too; I think reducing its specialness is a good way
to achieve that. If you can also s/BKL/some other lock/ at the same
time, rock that.

Best,

Robert Love


2004-09-14 19:26:50

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 11:52 -0700, William Lee Irwin III wrote:
>> I'd vaguely prefer to clean up the BKL (ab)users... of course, this
>> involves working with some of the dirtiest code in the kernel that's
>> already discouraged most/all of those who work on reducing/eliminating
>> BKL use from touching it... maybe the latency trend is the final nail
>> in the coffin of resistance to cleaning that up, though I agree with
>> Alan that we have to be very careful about it, particularly since all
>> prior attempts failed to be sufficiently so.

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> I'd love to just throw away all the BKL users, too, William. But
> pragmatism and my cautious sense of reality tells me that the BKL is not
> going anywhere anytime soon. We might get it down to 1% of its previous
> usage, but it is awfully intertwined in some places. It will take some
> time.

Far from "just throw away" -- this is hard work! Very hard work, and a
number of people have already tried and failed.


On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> What I think we can do is start looking at removing the special
> properties of the BKL, or at least better containing and documenting
> them. The BKL has a few oddities over other spin locks:
> - you can safely call schedule() while holding it
> - you can grab it recursively
> - you cannot use it in interrupt handlers

The "safely call schedule() while holding it" needs quite a bit of
qualification; it's implicitly dropped during voluntary context
switches and reacquired when rescheduled, but it's not valid to force
such a task into an involuntary context switch and calling schedule()
implies dropping the lock, so it has to be done at the proper times.
This is a complex semantic that likely trips up numerous callers (as
well as attempts to explain it, though surely you know these things,
and merely wanted a shorter line for the bullet point).


On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> Getting rid of these, or at least better delineating them, will move the
> BKL closer to being just a very granular lock.
> cond_resched_bkl() is a step toward that.
> I want the BKL gone, too; I think reducing its specialness is a good way
> to achieve that. If you can also s/BKL/some other lock/ at the same
> time, rock that.

I'd actually like to go the opposite direction from Ingo: I'd like to
remove uses of the sleeping characteristic first, as in my mind that is
the most pernicious and causes the most subtleties. Afterward, the
recursion. Except it's unclear that I have the time/etc. resources to
address it apart from taking on small pieces of whatever auditing work
or sweeps others care to devolve to me, so I'll largely be using
whatever tactic whoever cares to drive all this (probably Alan) prefers.


-- wli

2004-09-14 19:28:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Hi Robert,

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> - you can safely call schedule() while holding it
> - you can grab it recursively
> - you cannot use it in interrupt handlers

the latter won't make it harder to get rid of at least ;)

>
> Getting rid of these, or at least better delineating them, will move the
> BKL closer to being just a very granular lock.
>
> cond_resched_bkl() is a step toward that.

yes, I don't think it will make things worse with respect to dropping the
bkl; if anything, it should help.

probably the bkl is still there because removing it won't bring much
further value to the kernel at runtime; it'd probably only make the
kernel a bit cleaner and simpler.

2004-09-14 19:33:58

by Robert Love

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 21:25 +0200, Andrea Arcangeli wrote:

Hi, Andrea.

> On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
> > - you can safely call schedule() while holding it
> > - you can grab it recursively
> > - you cannot use it in interrupt handlers
>
> the latter won't make it harder to get rid of at least ;)

Indeed. I should not have lumped the last property in with the "things to
get rid of", although it is one of the implicit rules of the BKL.

We probably don't want to actually start disabling interrupts for no
reason when we grab the BKL. ;-)

Although the locks that replace the BKL can certainly be BKL-safe locks.

> yes, I don't think it will make thing worse in respect of dropping the
> bkl, if something it should help.
>
> probably the bkl is still there because removing it won't bring much
> further value to the kernel at runtime, it'd probably only make the
> kernel a bit cleaner and simpler.

I agree. Barring a few worst-case examples, I think the only reason
going forward to reduce the BKL's use is cleanliness and simplicity. It
is rather hard at times to find just what the BKL is locking.

Robert Love


2004-09-14 19:32:54

by Robert Love

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 12:21 -0700, William Lee Irwin III wrote:

> Far from "just throw away" -- this is hard work! Very hard work, and a
> number of people have already tried and failed.

That is my point.

> The "safely call schedule() while holding it" needs quite a bit of
> qualification; it's implicitly dropped during voluntary context
> switches and reacquired when rescheduled, but it's not valid to force
> such a task into an involuntary context switch and calling schedule()
> implies dropping the lock, so it has to be done at the proper times.
> This is a complex semantic that likely trips up numerous callers (as
> well as attempts to explain it, though surely you know these things,
> and merely wanted a shorter line for the bullet point).

Right. I meant safe against deadlocks. It obviously is not "safe" to
reschedule in the middle of your critical region.

> I'd actually like to go the opposite direction from Ingo: I'd like to
> remove uses of the sleeping characteristic first, as in my mind that is
> the most pernicious and causes the most subtleties. Afterward, the
> recursion. Except it's unclear that I have the time/etc. resources to
> address it apart from taking on small pieces of whatever auditing work
> or sweeps others care to devolve to me, so I'll largely be using
> whatever tactic whoever cares to drive all this (probably Alan) prefers.

This, too, is exactly what I am saying.

I want to remove the sleep characteristic.

One way to do that is start special casing it. Document it with the
code, e.g. make an explicit cond_resched_bkl().

Robert Love


2004-09-14 19:41:35

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
>> Getting rid of these, or at least better delineating them, will move the
>> BKL closer to being just a very granular lock.
>> cond_resched_bkl() is a step toward that.

On Tue, Sep 14, 2004 at 09:25:13PM +0200, Andrea Arcangeli wrote:
> yes, I don't think it will make things worse with respect to dropping the
> bkl; if anything, it should help.
> probably the bkl is still there because removing it won't bring much
> further value to the kernel at runtime, it'd probably only make the
> kernel a bit cleaner and simpler.

I think the real trouble is that the locking being so hard to analyze,
especially when it's intermixed with normal locking, causes real bugs.


-- wli

2004-09-14 20:33:14

by Alan

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Maw, 2004-09-14 at 20:21, William Lee Irwin III wrote:
> The "safely call schedule() while holding it" needs quite a bit of
> qualification; it's implicitly dropped during voluntary context
> switches and reacquired when rescheduled, but it's not valid to force
> such a task into an involuntary context switch and calling schedule()
> implies dropping the lock, so it has to be done at the proper times.
> This is a complex semantic that likely trips up numerous callers (as
> well as attempts to explain it, though surely you know these things,
> and merely wanted a shorter line for the bullet point).

The problem is people think of the BKL as a lock. It isn't a lock, it has
never been a lock, and it's all DaveM's fault 8) for naming it that when
he moved my asm entry point stuff into C.

The BKL turns on old style unix non-preemptive semantics between all
code that is within lock_kernel sections, that is it. That also makes it
hard to clean up because lock_kernel is delimiting code properties (it's
essentially almost a function attribute) and spin_lock/down/up and
friends are real locks and lock data.

I've seen very few cases where there is a magic transform from one to
the other because of this. So if you want to kill the BKL add proper
locking to data structures covered by BKL users until the lock_kernel
simply does nothing.

> or sweeps others care to devolve to me, so I'll largely be using
> whatever tactic whoever cares to drive all this (probably Alan) prefers.

Fix the data structure locking starting at the lowest level is how I've
always tackled these messes. When the low level locking is right the
rest just works (usually 8)).

"Lock data not code"

Alan

2004-09-14 18:38:06

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies in mtrr.c

On Tue, 14 Sep 2004, Ingo Molnar wrote:

>
> > fix scheduling latencies in the MTRR-setting codepath.
> > Also, fix bad bug: MTRR's _must_ be set with interrupts disabled!
>
> patch attached ...
>
> caveat: one of the wbinvd() removals is correct I believe, but the other
> one might be problematic. It doesn't seem logical to do a wbinvd() at
> that point though ...

Just a data point, Volume 3 "MTRR considerations in MP systems" states
that the second wbinvd() isn't required for P6 and Pentium 4 but might be
necessary for future systems.

Zwane

2004-09-14 21:18:19

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 03:02:49PM -0400, Robert Love wrote:
>> I'd love to just throw away all the BKL users, too, William. But
>> pragmatism and my cautious sense of reality tells me that the BKL is not
>> going anywhere anytime soon. We might get it down to 1% of its previous
>> usage, but it is awful intertwined in some places. It will take some
>> time.

On Tue, Sep 14, 2004 at 12:21:04PM -0700, William Lee Irwin III wrote:
> Far from "just throw away" -- this is hard work! Very hard work, and a
> number of people have already tried and failed.

Well, since sleep_on() and relatives require the BKL for safety and
otherwise are unsafe, here's a patch to mark them deprecated, suggested
by Arjan van de Ven and others. vs. 2.6.9-rc1-mm5. Compiletested on x86-64.


-- wli


Index: mm5-2.6.9-rc1/include/linux/wait.h
===================================================================
--- mm5-2.6.9-rc1.orig/include/linux/wait.h 2004-09-13 16:27:50.000000000 -0700
+++ mm5-2.6.9-rc1/include/linux/wait.h 2004-09-14 13:36:00.766337272 -0700
@@ -23,6 +23,7 @@
#include <linux/list.h>
#include <linux/stddef.h>
#include <linux/spinlock.h>
+#include <linux/compiler.h>
#include <asm/system.h>
#include <asm/current.h>

@@ -281,12 +282,33 @@
* They are racy. DO NOT use them, use the wait_event* interfaces above.
* We plan to remove these interfaces during 2.7.
*/
-extern void FASTCALL(sleep_on(wait_queue_head_t *q));
-extern long FASTCALL(sleep_on_timeout(wait_queue_head_t *q,
- signed long timeout));
-extern void FASTCALL(interruptible_sleep_on(wait_queue_head_t *q));
-extern long FASTCALL(interruptible_sleep_on_timeout(wait_queue_head_t *q,
- signed long timeout));
+
+void FASTCALL(__sleep_on(wait_queue_head_t *));
+long FASTCALL(__sleep_on_timeout(wait_queue_head_t *, signed long));
+void FASTCALL(__interruptible_sleep_on(wait_queue_head_t *));
+long FASTCALL(__interruptible_sleep_on_timeout(wait_queue_head_t *, signed long));
+
+static inline void __deprecated sleep_on(wait_queue_head_t *q)
+{
+ __sleep_on(q);
+}
+
+static inline long __deprecated
+sleep_on_timeout(wait_queue_head_t *q, signed long timeout)
+{
+ return __sleep_on_timeout(q, timeout);
+}
+
+static inline void __deprecated interruptible_sleep_on(wait_queue_head_t *q)
+{
+ __interruptible_sleep_on(q);
+}
+
+static inline long __deprecated
+interruptible_sleep_on_timeout(wait_queue_head_t *q, signed long timeout)
+{
+ return __interruptible_sleep_on_timeout(q, timeout);
+}

/*
* Waitqueues which are removed from the waitqueue_head at wakeup time
Index: mm5-2.6.9-rc1/kernel/sched.c
===================================================================
--- mm5-2.6.9-rc1.orig/kernel/sched.c 2004-09-14 09:01:46.000000000 -0700
+++ mm5-2.6.9-rc1/kernel/sched.c 2004-09-14 13:36:22.743996160 -0700
@@ -2633,7 +2633,7 @@
__remove_wait_queue(q, &wait); \
spin_unlock_irqrestore(&q->lock, flags);

-void fastcall __sched interruptible_sleep_on(wait_queue_head_t *q)
+void fastcall __sched __interruptible_sleep_on(wait_queue_head_t *q)
{
SLEEP_ON_VAR

@@ -2644,9 +2644,9 @@
SLEEP_ON_TAIL
}

-EXPORT_SYMBOL(interruptible_sleep_on);
+EXPORT_SYMBOL(__interruptible_sleep_on);

-long fastcall __sched interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
+long fastcall __sched __interruptible_sleep_on_timeout(wait_queue_head_t *q, long timeout)
{
SLEEP_ON_VAR

@@ -2659,9 +2659,9 @@
return timeout;
}

-EXPORT_SYMBOL(interruptible_sleep_on_timeout);
+EXPORT_SYMBOL(__interruptible_sleep_on_timeout);

-void fastcall __sched sleep_on(wait_queue_head_t *q)
+void fastcall __sched __sleep_on(wait_queue_head_t *q)
{
SLEEP_ON_VAR

@@ -2672,9 +2672,9 @@
SLEEP_ON_TAIL
}

-EXPORT_SYMBOL(sleep_on);
+EXPORT_SYMBOL(__sleep_on);

-long fastcall __sched sleep_on_timeout(wait_queue_head_t *q, long timeout)
+long fastcall __sched __sleep_on_timeout(wait_queue_head_t *q, long timeout)
{
SLEEP_ON_VAR

@@ -2687,7 +2687,7 @@
return timeout;
}

-EXPORT_SYMBOL(sleep_on_timeout);
+EXPORT_SYMBOL(__sleep_on_timeout);

void set_user_nice(task_t *p, long nice)
{

2004-09-14 21:58:42

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 09:31:06AM -0700, William Lee Irwin III wrote:
> The might_sleep() in cond_resched() sounds particularly useful to pick
> up misapplications of cond_resched().

agreed ;)

> I suppose this isn't even the half of it, but it's what I looked at.

looks much nicer indeed (the previous code was BASIC-like ;).

2004-09-14 22:58:11

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>Another thing, I don't mean this to sound like a rhetorical question,
>>but if we have a preemptible kernel, why is it a good idea to sprinkle
>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>>cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
>>would someone who really cares about latency not enable preempt?
>
>
> two things:
>
> 1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
> even SuSE. This is pretty telling.
>
> 2) 10 new cond_resched()'s are not precisely 'sprinkle everywhere'.
>

No, but I mean putting them right down into fastpaths like the vmscan
one, for example.

And if I remember correctly, you resorted to putting them into
might_sleep as well (but I haven't read the code for a while, maybe
you're now getting decent results without doing that).

2004-09-15 00:24:06

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 15:19, Alan Cox wrote:
> Fix the data structure locking starting at the lowest level is how I've
> always tackled these messes. When the low level locking is right the
> rest just works (usually 8)).
>
> "Lock data not code"
>

Although, there is at least one case (reiser3) where we know which data
structures the BKL is supposed to be protecting, because the code does
something like reiserfs_write_lock(foo_data_structure) which gets
define'd away to lock_kernel(). And apparently some of the best and
brightest on LKML have tried and failed to fix it, and even Hans says
"it's HARD, the fix is reiser4".

So, maybe some of the current uses should be tagged as WONTFIX.

Lee

2004-09-15 00:35:30

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 10:54, Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
> > Another thing, I don't mean this to sound like a rhetorical question,
> > but if we have a preemptible kernel, why is it a good idea to sprinkle
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > cond_rescheds everywhere? Isn't this now the worst of both worlds? Why
> > would someone who really cares about latency not enable preempt?
>
> two things:
>
> 1) none of the big distros enables CONFIG_PREEMPT in their kernels - not
> even SuSE. This is pretty telling.
>

I am not sure this means preemption is a bad idea, it just means there's
no point in enabling CONFIG_PREEMPT with the current kernel because it's
not enough of an improvement to make a difference for low latency
applications.

Lee

2004-09-15 01:02:11

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 11:03, Andrea Arcangeli wrote:
> On Wed, Sep 15, 2004 at 12:28:49AM +1000, Nick Piggin wrote:
> > Well I don't know how good an argument the crashes one is these days,
> > but generally (as far as I know) those who really care about latency
> > shouldn't mind about some extra overheads.
>
> sure, that's especially true for the hardirq and softirq total scheduler
> offloading. The real question is where a generic desktop positions. I
> doubt on a generic desktop a latency over 1msec matters much,
> top performance of repetitive tasks that sums up like hardirqs for a NIC
> sounds more sensible to me.
>
> And for the other usages RTAI or any other hard realtime sounds safer
> anyways.
>

For a generic desktop I don't think any of this makes much of a
difference; AFAIK none of the VP testers have reported a perceptible
difference in system responsiveness. A good point of comparison here is
what Microsoft OS'es can do. My Windows XP setup works pretty well with
a latency of 2.66ms or 128 frames at 48 kHz, and is rock solid at 256
frames or 5.33ms.

However for low latency audio Mac OS X is our real competition. OS X
can deliver audio latencies of probably 0.5ms. There is not much point
in going much lower than this because the difference becomes
imperceptible and the more frequent cache thrashing becomes an issue;
this is close enough to the limits of what sound hardware is capable of
anyway.

With Ingo's patches the worst case latency on the same machine as my XP
example is about 150 usecs. So, it seems to me that Ingo's patches can
achieve results as good or better than OSX even without the one or two
"dangerous" changes, like the removal of lock_kernel around
do_tty_write.

Lee

2004-09-15 01:18:26

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 08:19:57PM +0100, Alan Cox wrote:
> The problem is people think of the BKL as a lock. It isn't a lock it has
> never been a lock and it's all DaveM's fault 8) for naming it that when
> he moved my asm entry point stuff into C.
> The BKL turns on old style unix non-pre-emptive sematics between all
> code that is within lock_kernel sections, that is it. That also makes it
> hard to clean up because lock_kernel is delimiting code properties (its
> essentially almost a function attribute) and spin_lock/down/up and
> friends are real locks and lock data.
> I've seen very few cases where there is a magic transform from one to
> the other because of this. So if you want to kill the BKL add proper
> locking to data structures covered by BKL users until the lock_kernel
> simply does nothing.

The perfect critic has no method, save one.


On Maw, 2004-09-14 at 20:21, William Lee Irwin III wrote:
>> or sweeps others care to devolve to me, so I'll largely be using
>> whatever tactic whoever cares to drive all this (probably Alan) prefers.

On Tue, Sep 14, 2004 at 08:19:57PM +0100, Alan Cox wrote:
> Fix the data structure locking starting at the lowest level is how I've
> always tackled these messes. When the low level locking is right the
> rest just works (usually 8)).

This is the opposite of what some parties endorse, which is pushing the
BKL downward safely as audits determine, adding locking to enable it to
be done safely, and so on. So ad hoc it is.


-- wli

2004-09-15 01:39:34

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, Sep 14, 2004 at 09:02:07PM -0400, Lee Revell wrote:
> For a generic desktop I don't think any of this makes much of a
> difference; AFAIK none of the VP testers have reported a perceptible
> difference in system responsiveness. A good point of comparison here is
> what Microsoft OS'es can do. My Windows XP setup works pretty well with
> a latency of 2.66ms or 128 frames at 48KHZ, and is rock solid at 256
> frames or 5.33ms.
> However for low latency audio Mac OS X is our real competition. OS X
> can deliver audio latencies of probably 0.5ms. There is not much point
> in going much lower than this because the difference becomes
> imperceptible and the more frequent cache thrashing becomes an issue;
> this is close enough to the limits of what sound hardware is capable of
> anyway.
> With Ingo's patches the worst case latency on the same machine as my XP
> example is about 150 usecs. So, it seems to me that Ingo's patches can
> achieve results as good or better than OSX even without the one or two
> "dangerous" changes, like the removal of lock_kernel around
> do_tty_write.

The code we're most worried about is buggy, not just nonperformant.


-- wli

2004-09-15 01:47:23

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 15:19, Alan Cox wrote:
>> Fix the data structure locking starting at the lowest level is how I've
>> always tackled these messes. When the low level locking is right the
>> rest just works (usually 8)).
>> "Lock data not code"

On Tue, Sep 14, 2004 at 08:22:29PM -0400, Lee Revell wrote:
> Although, there is at least one case (reiser3) where we know which data
> structures the BKL is supposed to be protecting, because the code does
> something like reiserfs_write_lock(foo_data_structure) which gets
> define'd away to lock_kernel(). And apparently some of the best and
> brightest on LKML have tried and failed to fix it, and even Hans says
> "it's HARD, the fix is reiser4".
> So, maybe some of the current uses should be tagged as WONTFIX.

I've not heard a peep about anyone trying to fix this. It should be
killed off along with the rest, of course, but like I said before, it's
the messiest, dirtiest, and ugliest code that's left to go through,
which is why it's been left for last. e.g. driver ->ioctl() methods.


-- wli

2004-09-15 02:00:42

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
> On Tue, Sep 14, 2004 at 08:22:29PM -0400, Lee Revell wrote:
> > Although, there is at least one case (reiser3) where we know which data
> > structures the BKL is supposed to be protecting, because the code does
> > something like reiserfs_write_lock(foo_data_structure) which gets
> > define'd away to lock_kernel(). And apparently some of the best and
> > brightest on LKML have tried and failed to fix it, and even Hans says
> > "it's HARD, the fix is reiser4".
> > So, maybe some of the current uses should be tagged as WONTFIX.
>
> I've not heard a peep about anyone trying to fix this. It should be
> killed off along with the rest, of course, but like I said before, it's
> the messiest, dirtiest, and ugliest code that's left to go through,
> which is why it's been left for last. e.g. driver ->ioctl() methods.
>

Andrew tried to fix this a few times in 2.4 and it broke the FS in
subtle ways. Don't have an archive link but the message is
<[email protected]>. I asked Hans directly about it
and he said "balancing makes it hard, the fix is reiser4", see
<[email protected]>.

Lee

2004-09-15 02:12:12

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 21:39, William Lee Irwin III wrote:
> > With Ingo's patches the worst case latency on the same machine as my XP
> > example is about 150 usecs. So, it seems to me that Ingo's patches can
> > achieve results as good or better than OSX even without the one or two
> > "dangerous" changes, like the removal of lock_kernel around
> > do_tty_write.
>
> The code we're most worried about is buggy, not just nonperformant.
>

Understood. The only dangerous change I know of in the VP patches is
the one Alan took issue with, the aforementioned removal of lock_kernel
in the tty code. IIRC this only produced a latency of about 300 usecs,
and only when switching VT's from a console to X and back. My point was
that it's quite possible that we can get OSX-like results without the
more dangerous changes.

Ingo, if you want to send me a patch set without the more controversial
changes, I can compare the performance. A diff against the latest VP
patch would be OK.

Lee



2004-09-15 02:36:36

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
>> I've not heard a peep about anyone trying to fix this. It should be
>> killed off along with the rest, of course, but like I said before, it's
>> the messiest, dirtiest, and ugliest code that's left to go through,
>> which is why it's been left for last. e.g. driver ->ioctl() methods.

On Tue, Sep 14, 2004 at 10:00:44PM -0400, Lee Revell wrote:
> Andrew tried to fix this a few times in 2.4 and it broke the FS in
> subtle ways. Don't have an archive link but the message is
> <[email protected]>. I asked Hans directly about it
> and he said "balancing makes it hard, the fix is reiser4", see
> <[email protected]>.

I have neither of these locally. I suspect someone needs to care enough
about the code for anything to happen soon. I suppose there are things
that probably weren't tried, e.g. auditing to make sure dependencies on
external synchronization are taken care of, removing implicit sleeping
with the BKL held, then punt a private recursive spinlock in reiser3's
direction. Not sure what went on, or if I want to get involved in this
particular case.


-- wli

2004-09-15 02:59:44

by Lee Revell

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Tue, 2004-09-14 at 22:36, William Lee Irwin III wrote:
> I have neither of these locally. I suspect someone needs to care enough
> about the code for anything to happen soon. I suppose there are things
> that probably weren't tried, e.g. auditing to make sure dependencies on
> external synchronization are taken care of, removing implicit sleeping
> with the BKL held, then punt a private recursive spinlock in reiser3's
> direction. Not sure what went on, or if I want to get involved in this
> particular case.
>

There isn't really any information in the archives about what was
tried. Here's Andrew's message:

http://lkml.org/lkml/2004/7/12/266

And Hans':

http://lkml.org/lkml/2004/8/10/320

I suspect that "Use reiser4 (or ext3) if you care about latency" is a
good enough answer for most people.

Lee

2004-09-15 06:18:01

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Nick Piggin <[email protected]> wrote:

> No, but I mean putting them right down into fastpaths like the vmscan
> one, for example.

it is a very simple no-parameters call to a function that reads a
likely-cached word and returns. The cost is in the 2-3 cycles range - a
_single_ cachemiss can be 10-100 times more expensive, and cachemisses
happen very frequently in every iteration of the VM _scanning_ path
since it (naturally and inevitably) deals with lots of sparsely
scattered data structures that haven't been referenced for quite some
time.

The function (cond_resched()) triggers scheduling only very rarely, you
should not be worried about that aspect either.

> And if I remember correctly, you resorted to putting them into
> might_sleep as well (but I haven't read the code for a while, maybe
> you're now getting decent results without doing that).

i'm not arguing that now at all, that preemption model clearly has to be
an optional thing - at least initially.

Ingo

2004-09-15 08:23:55

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>No, but I mean putting them right down into fastpaths like the vmscan
>>one, for example.
>
>
> it is a very simple no-parameters call to a function that reads a
> likely-cached word and returns. The cost is in the 2-3 cycles range - a
> _single_ cachemiss can be 10-100 times more expensive, and cachemisses
> happen very frequently in every iteration of the VM _scanning_ path
> since it (naturally and inevitably) deals with lots of sparsely
> scattered data structures that haven't been referenced for quite some
> time.
>

OK, this one thing isn't going to be noticeable. But why must you
really have the check for every page, rather than in the logical place
where we batch them up? You're obviously aiming for the lowest
latencies possible *without* CONFIG_PREEMPT.

But I'm thinking, why add overhead for people who don't care about
sub-ms latency (ie. most of us)? And at the same time, why would
anyone in a latency critical environment not enable preempt?


> The function (cond_resched()) triggers scheduling only very rarely, you
> should not be worried about that aspect either.
>

No, I'm not worried about that.

>
>>And if I remember correctly, you resorted to putting them into
>>might_sleep as well (but I haven't read the code for a while, maybe
>>you're now getting decent results without doing that).
>
>
> i'm not arguing that now at all, that preemption model clearly has to be
> an optional thing - at least initially.
>

OK.

Alternatively, I'd say tell everyone who wants really low latency to
enable CONFIG_PREEMPT, which automatically gives the minimum possible
preempt latency, delimited (and defined) by critical sections, instead
of the more ad-hoc "sprinkling" ;)

2004-09-15 08:42:56

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Nick Piggin <[email protected]> wrote:

> OK.
>
> Alternatively, I'd say tell everyone who wants really low latency to
> enable CONFIG_PREEMPT, which automatically gives the minimum possible
> preempt latency, delimited (and defined) by critical sections, instead
> of the more ad-hoc "sprinkling" ;)

it's not ad-hoc. These are the 10 remaining points for which there is no
natural might_sleep() point nearby (according to measurements). That's
why i called them 'complementary'. They cause zero problems for the
normal kernel (we already have another 70 cond_resched() points), but
they _are_ the ones needed in addition if might_sleep() also does
cond_resched().

the 'reliability' of latency break-up depends on the basic preemption
model. Believe me, even with CONFIG_PREEMPT there were a boatload of
critical sections that had insanely long latencies that nobody fixed
until the VP patchset came along. Without CONFIG_PREEMPT the number of
possible latency paths increases, but the situation is the same as with
CONFIG_PREEMPT: you need tools, people that test stuff and lots of
manual work to break them up reliably. You will never be 'done' but you
can do a reasonably good job for workloads that people care about.

the 'final' preemption model [for hard-RT purposes] that i believe will
make it into the Linux kernel one nice day is total preemptability of
everything but the core preemption code (i.e. the scheduler and
interrupt controllers). _That_ might be something that has provable
latencies. Note that such a 'total preemption' model has prerequisites
too, like the deterministic execution of hardirqs/softirqs.

note that the current lock-break-up activities still make a lot of sense
even under the total-preemption model: it decreases the latency of
kernel-using hard-RT applications. (raw total preemption only guarantees
quick scheduling of the hard-RT task - it doesn't guarantee that the task
can complete any useful kernel/syscall work.)

since we already see at least 4 different viable preemption models
placed on different points in the 'latency reliability' spectrum, it
makes little sense to settle for any of them. So i'm aiming to keep the
core code flexible to have them all without much fuss, and usage will
decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
altogether. We cannot know at this point, it all depends on how usage
(and consequently, hardware) evolves.

Ingo

2004-09-15 09:55:06

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* William Lee Irwin III <[email protected]> wrote:

> > With Ingo's patches the worst case latency on the same machine as my XP
> > example is about 150 usecs. So, it seems to me that Ingo's patches can
> > achieve results as good or better than OSX even without the one or two
> > "dangerous" changes, like the removal of lock_kernel around
> > do_tty_write.
>
> The code we're most worried about is buggy, not just nonperformant.

what code do you mean? The one i know about and which Lee is referring
to above is the 6-line tty unlocking change - the one which Alan
objected to rightfully. I've zapped it from my tree.

(nobody objected to the original thread on lkml weeks ago where the tty
unlocking change was proposed, implemented, alpha-tested and beta-tested
in -mm for several releases - that's why it showed up in my 20+
latency-reduction patches.)

No latency changes to the tty layer until someone fixes its locking. End
of story.

Ingo

2004-09-15 10:02:53

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

* William Lee Irwin III <[email protected]> wrote:
>> The code we're most worried about is buggy, not just nonperformant.

On Wed, Sep 15, 2004 at 11:56:14AM +0200, Ingo Molnar wrote:
> what code do you mean? The one i know about and which Lee is referring
> to above is the 6-line tty unlocking change - the one which Alan
> objected to rightfully. I've zapped it from my tree.
> (nobody objected to the original thread on lkml weeks ago where the tty
> unlocking change was proposed, implemented, alpha-tested and beta-tested
> in -mm for several releases - that's why it showed up in my 20+
> latency-reduction patches.)
> No latency changes to the tty layer until someone fixes its locking. End
> of story.

I had in mind the buggy code associated with BKL use, as a motive for
the BKL sweeps etc., and wasn't referring to any pending changes.


-- wli

2004-09-15 10:09:17

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> the 'final' preemption model [for hard-RT purposes] that i believe will
> make it into the Linux kernel one nice day is total preemptability of
> everything but the core preemption code (i.e. the scheduler and
> interrupt controllers). _That_ might be something that has provable
> latencies. Note that such a 'total preemption' model has prerequisites
> too, like the deterministic execution of hardirqs/softirqs.

I don't see deterministic execution time of hardirqs/softirqs happening
on stock hardware and without serious driver work, and I don't see much
hard RT ever happening on SMP due to lock contention. But maybe that
just means it's difficult.


On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> note that the current lock-break-up activities still make alot of sense
> even under the total-preemption model: it decreases the latency of
> kernel-using hard-RT applications. (raw total preemption only guarantees
> quick scheduling of the hard-RT task - it doesnt guarantee that the task
> can complete any useful kernel/syscall work.)

One reason I'm not complaining is that voluntary switching is
required to reschedule in otherwise non-preemptible critical sections.


On Wed, Sep 15, 2004 at 10:43:55AM +0200, Ingo Molnar wrote:
> since we already see at least 4 different viable preemption models
> placed on different points in the 'latency reliability' spectrum, it
> makes little sense to settle for any of them. So i'm aiming to keep the
> core code flexible to have them all without much fuss, and usage will
> decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
> CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
> CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
> altogether. We cannot know at this point, it all depends on how usage
> (and consequently, hardware) evolves.

Well, one thing that completely voluntary context-switch-based critical
section management tells us, which reliance on implicit kernel
preemption doesn't, is which codepaths matter: whatever codepaths would
need to be preempted implicitly instead show up explicitly as latency
blips on the latency instrumentation radar screen. By doing so they at
least get annotated with "this codepath matters for latency", and
whoever may inadvertently try to extend a nonpreemptible critical
section over the scheduling point will have a big "stop" sign in front
of them. So, even if CONFIG_PREEMPT is the way and the annotations are
nops there, annotating latency-critical scheduling points, even where
they would be preemptible with CONFIG_PREEMPT=y, has value.

And, of course, the preemption reenablement required for voluntary
yielding from non-preemptible critical sections also has a positive
operational effect with CONFIG_PREEMPT=y, so there is nothing in VP
that doesn't also benefit CONFIG_PREEMPT.


-- wli

2004-09-15 10:15:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* William Lee Irwin III <[email protected]> wrote:

> I had the buggy code being associated with BKL use in mind as a motive
> for the BKL sweeps etc., and wasn't referring to any pending changes.

ok, fair enough.

Ingo

2004-09-15 10:19:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* William Lee Irwin III <[email protected]> wrote:

> I don't see deterministic execution time of hardirqs/softirqs
> happening on stock hardware and without serious driver work, and I
> don't see much hard RT ever happening on SMP due to lock contention.
> But maybe that just means it's difficult.

actually, check out what i did to SMP latencies via the
preempt-smp.patch. This patch brings SMP worst-case latencies to UP
levels. E.g. on a dual system the worst-case should be roughly twice the
UP worst-case latency: if both CPUs are trying to execute the same
critical section. But there is no nondeterministic spinning anymore. You
obviously cannot get better than that (other than increasing parallelism
and reducing the number of critical sections).

(not all cases are covered - if some code emulates spinlocks with some
static flag instead of using real spinlocks or rwlocks, then it could
spin uncontrollably. Those places show up in the tracer quite easily - i
had to fix only one so far.)

Ingo

2004-09-15 11:52:12

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Lee Revell <[email protected]> wrote:

> Ingo, if you want to send me a patch set without the more
> controversial changes, I can compare the performance. A diff against
> the latest VP patch would be OK.

just undo the tty.c changes. (i.e. manually remove the tty.c chunks from
the -S0 patch and apply the result.)

Ingo

2004-09-15 13:39:17

by Hans Reiser

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

William Lee Irwin III wrote:

>On Tue, 2004-09-14 at 21:46, William Lee Irwin III wrote:
>
>
>>>I've not heard a peep about anyone trying to fix this. It should be
>>>killed off along with the rest, of course, but like I said before, it's
>>>the messiest, dirtiest, and ugliest code that's left to go through,
>>>which is why it's been left for last. e.g. driver ->ioctl() methods.
>>>
>>>
>
>On Tue, Sep 14, 2004 at 10:00:44PM -0400, Lee Revell wrote:
>
>
>>Andrew tried to fix this a few times in 2.4 and it broke the FS in
>>subtle ways. Don't have an archive link but the message is
>><[email protected]>. I asked Hans directly about it
>>and he said "balancing makes it hard, the fix is reiser4", see
>><[email protected]>.
>>
>>
>
>I have neither of these locally. I suspect someone needs to care enough
>about the code for anything to happen soon. I suppose there are things
>that probably weren't tried, e.g. auditing to make sure dependencies on
>external synchronization are taken care of, removing implicit sleeping
>with the BKL held, then punt a private recursive spinlock in reiser3's
>direction. Not sure what went on, or if I want to get involved in this
>particular case.
>
>
>-- wli
>

Why bother? It is V3, it should be left undisturbed except for
bugfixes. Please, spend your efforts on reducing V4 latency and
measuring whether it fails to scale to multiple processors. That would
be very useful to me if someone helped with that. V4 has the
architecture for doing such things well, but there are always accidental
bottlenecks that testing can discover, and I am sure we will have a
handful of things preventing us from scaling well that are not hard to
fix. It would be nice to fix those......

The hard stuff for scalability, the locking of the tree, we did that.
We just haven't tested and evaluated and refined like we need to in V4.



Hans

2004-09-15 22:25:58

by Rusty Russell

[permalink] [raw]
Subject: Re: [patch] fix keventd execution dependency

On Tue, 2004-09-14 at 21:25, Ingo Molnar wrote:
> We dont want to execute off keventd since it might hold a semaphore our
> callers hold too. This can happen when kthread_create() is called from
> within keventd. This happened due to the IRQ threading patches but it
> could happen with other code too.

Ack, thanks Ingo, looks fine.

Rusty.
--
Anyone who quotes me in their signature is an idiot -- Rusty Russell

2004-09-15 20:43:10

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

On Wed, Sep 15, 2004 at 06:36:24AM -0700, Hans Reiser wrote:
> Why bother? It is V3, it should be left undisturbed except for
> bugfixes. Please, spend your efforts on reducing V4 latency and
> measuring whether it fails to scale to multiple processors. That would
> be very useful to me if someone helped with that. V4 has the
> architecture for doing such things well, but there are always accidental
> bottlenecks that testing can discover, and I am sure we will have a
> handful of things preventing us from scaling well that are not hard to
> fix. It would be nice to fix those......
> The hard stuff for scalability, the locking of the tree, we did that.
> We just haven't tested and evaluated and refined like we need to in V4.

It's not for scalability; it's for cleaning up the users, which are
universally buggy. My suggestion above would not, in fact, make reiser3
any more scalable; it would merely isolate the locking semantics it
couldn't live without into its own internals.


-- wli

2004-09-16 01:09:26

by Nick Piggin

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>
>>OK.
>>
>>Alternatively, I'd say tell everyone who wants really low latency to
>>enable CONFIG_PREEMPT, which automatically gives the minimum possible
>>preempt latency, delimited (and defined) by critical sections, instead
>>of the more ad-hoc "sprinkling" ;)
>
>
> it's not ad-hoc. These are the 10 remaining points for which there is no
> natural might_sleep() point nearby (according to measurements). That's
> why i called them 'complementary'. They cause zero problems for the
> normal kernel (we already have another 70 cond_resched() points), but
> they _are_ the ones needed in addition if might_sleep() also does
> cond_resched().
>

No, I mean putting cond_resched in might_sleep is ad hoc. But that
doesn't mean it doesn't work - it obviously does ;)

> the 'reliability' of latency break-up depends on the basic preemption
> model. Believe me, even with CONFIG_PREEMPT there were a boatload of
> critical sections that had insanely long latencies that nobody fixed
> until the VP patchset came along. Without CONFIG_PREEMPT the number of
> possible latency paths increases, but the situation is the same as with
> CONFIG_PREEMPT: you need tools, people that test stuff and lots of
> manual work to break them up reliably. You will never be 'done' but you
> can do a reasonably good job for workloads that people care about.
>

All the other stuff in your patches are obviously very important with
or without "full preempt", and make up the bulk of the *hard* work.
You have no arguments from me about that.

> the 'final' preemption model [for hard-RT purposes] that i believe will
> make it into the Linux kernel one nice day is total preemptability of
> everything but the core preemption code (i.e. the scheduler and
> interrupt controllers). _That_ might be something that has provable
> latencies. Note that such a 'total preemption' model has prerequisites
> too, like the deterministic execution of hardirqs/softirqs.
>
> note that the current lock-break-up activities still make a lot of sense
> even under the total-preemption model: it decreases the latency of
> kernel-using hard-RT applications. (raw total preemption only guarantees
> quick scheduling of the hard-RT task - it doesn't guarantee that the task
> can complete any useful kernel/syscall work.)
>
> since we already see at least 4 different viable preemption models
> placed on different points in the 'latency reliability' spectrum, it
> makes little sense to settle for any of them. So i'm aiming to keep the
> core code flexible to have them all without much fuss, and usage will
> decide which ones are needed. Maybe CONFIG_PREEMPT will merge into
> CONFIG_TOTAL_PREEMPT. Maybe CONFIG_NO_PREEMPT will merge into
> CONFIG_PREEMPT_VOLUNTARY. Maybe CONFIG_PREEMPT_VOLUNTARY will go away
> altogether. We cannot know at this point, it all depends on how usage
> (and consequently, hardware) evolves.
>

OK I'll leave it at that. We'll see what happens.

2004-09-16 06:13:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: [patch] sched: fix scheduling latencies for !PREEMPT kernels


* Nick Piggin <[email protected]> wrote:

> No, I mean putting cond_resched in might_sleep is ad hoc. But that
> doesn't mean it doesn't work - it obviously does ;)

ah, ok, i understand your point now. No, i'm not submitting the
CONFIG_PREEMPT_VOLUNTARY .config switch (and the kernel.h 2-liner) at
this point. All the latency breakers so far will mainly benefit
CONFIG_PREEMPT - which is also the primary preempt model used by
bleeding-edge testers.

Ingo