2003-03-19 09:10:19

by Andrew Morton

Subject: 2.5.65-mm2


http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/

will appear sometime at:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.65/2.5.65-mm2/


. Added all the 32-bit dev_t patches.

. An update to the brlock-removal patches which might fix the reported
netfilter problems (would like confirmation of this please).



Changes since 2.5.65-mm1:


+linus.patch

Latest from Linus

-noirqbalance-fix.patch
-ppc64-64-bit-exec-fix.patch
-remove-unused-congestion-stuff.patch
-smalldevfs.patch
-timer-cleanup.patch
-timer-readdition-fix.patch
-set_current_state-fs.patch
-set_current_state-mm.patch
-copy_thread-leak-fix.patch
-file_list_lock-contention-fix.patch
-tty_files-fixes.patch
-file_list_cleanup.patch
-file_list-remove-free_list.patch
-file-list-less-locking.patch
-vt_ioctl-stack-use.patch
-no-mmu-stubs.patch
-nommu-slab.patch
-nfs-memleak-fix.patch
-ufs-memleak-fix.patch
-posix-timers-update.patch
-oops-counters.patch
-io_apic-DO_ACTION-cleanup.patch
-oprofile-timer-fix.patch
-pgd_index-comments.patch
-proc-sysrq-trigger.patch
-CONFIG_NUMA-fixes.patch
-nfsd-symlink-failpath.patch
-get_disk-error-checking.patch
-nanosleep-accuracy-fix.patch

Merged

+as-predict-data-direction.patch

Anticipatory scheduler work: starting to track per-process behaviour a
little more.

-brlock-removal-1.patch
+brlock-1b.patch

Updated (might fix a netfilter problem)

+nanosleep-accuracy-fix-2.patch

Another go at fixing the nanosleep() inaccuracy.

+linear-oops-fix-1.patch

Fix oops in the MD linear driver

+dev_t-1-kill-cdev.patch
+dev_t-2-remove-MAX_CHRDEV.patch
+dev_t-3-major_h-cleanup.patch
+dev_t-32-bit.patch
+dev_t-drm-warnings.patch
+dev_t-remove-B_FREE.patch

32-bit dev_t work

+cpufreq-xtime-locking.patch

locking fix

+cs46xx-fixes.patch

Minor fixes

+notsclock-option.patch

Add "notsclock" boot option for misbehaving SpeedStep machines.

+tty-put_user-checks.patch

Fix some missing uaccess checks.

+fail-setup_irq-for-unconfigured-IRQs.patch

Stuff from Zwane.

+raw-fix-address_space-rewriting.patch
+raw-cleanups-and-fixlets.patch

Fixes for the raw driver

+oops-dump-preceding-code.patch

Make the ia32 oops code dump instructions which preceded the failing EIP.
Keith has ksymoops support for this and it works nicely. I'm not sure what
his plans are for adding it to a released version.



All 105 patches:

linus.patch
Latest from Linus

mm.patch
add -mmN to EXTRAVERSION

kgdb.patch

kgdb-cleanup.patch
make kgdb less invasive (when disabled)

proc-sys-debug.patch
create /proc/sys/debug/0 ... 7

config_spinline.patch
uninline spinlocks for profiling accuracy.

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
Subject: pci patch

ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio

ppc64-scruffiness.patch
Fix some PPC64 compile warnings

sym-do-160.patch
make the SYM driver do 160 MB/sec

config-PAGE_OFFSET.patch
Configurable kernel/user memory split

ptrace-flush.patch
cache flushing in the ptrace code

buffer-debug.patch
buffer.c debugging

warn-null-wakeup.patch

ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages

reiserfs_file_write-5.patch

tcp-wakeups.patch
Use fast wakeups in TCP/IPV4

rcu-stats.patch
RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
nfs oom fix

sk-allocation.patch
Subject: Re: nfs oom

nfs-more-oom-fix.patch

rpciod-atomic-allocations.patch
Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

kblockd.patch
Create `kblockd' workqueue

as-iosched.patch
anticipatory I/O scheduler

as-debug-BUG-fix.patch

as-eject-BUG-fix.patch
AS: don't go BUG during cdrom eject

as-jumbo-fix.patch
AS: OSDL fixes

as-request_fn-in-timer.patch
Remove the scheduled_work thing

as-remove-request-fix.patch

as-np-1.patch
as: cleanups & comments

as-use-kblockd.patch

as-cleanup-2.patch
AS: cleanup + comments

as-as_remove_request-simplification.patch
as: as_remove_request simplification

as-dont-go-BUG-again.patch

as-handle-non-block-requests.patch
AS: handle non-block requests

as-np-reads-1.patch
AS: read-vs-read fixes

as-np-reads-2.patch
AS: more read-vs-read fixes

as-predict-data-direction.patch
as: predict direction of next IO

cfq-2.patch
CFQ scheduler, #2

unplug-use-kblockd.patch
Use kblockd for running request queues

remap-file-pages-2.5.63-a1.patch
Subject: [patch] remap-file-pages-2.5.63-A1

hugh-remap-fix.patch
hugh's file-offset-in-pte fix

fremap-limit-offsets.patch
fremap: limit remap_file_pages() file offsets

fremap-all-mappings.patch
Make all executable mappings be nonlinear

filemap_populate-speedup.patch
filemap_populate speedup

file-offset-in-pte-x86_64.patch
x86_64: support for file offsets in pte's

file-offset-in-pte-ppc64.patch

objrmap-2.5.62-5.patch
object-based rmap

objrmap-nonlinear-fixes.patch
objrmap fix for nonlinear

sched-2.5.64-D3.patch
sched-2.5.64-D3, more interactivity changes

scheduler-tunables.patch
scheduler tunables

show_task-free-stack-fix.patch
show_task() fix and cleanup

yellowfin-set_bit-fix.patch
yellowfin driver set_bit fix

htree-nfs-fix.patch
Fix ext3 htree / NFS compatibility problems

update_atime-ng.patch
inode a/c/mtime modification speedup

one-sec-times.patch
Implement a/c/mtime speedup in ext2 & ext3

task_prio-fix.patch
simple task_prio() fix

slab_store_user-large-objects.patch
slab debug: perform redzoning against larger objects

pcmcia-2.patch

pcmcia-3b.patch

pcmcia-3.patch

pcmcia-4.patch

pcmcia-5.patch

pcmcia-6.patch

pcmcia-7b.patch

pcmcia-7.patch

pcmcia-8.patch

pcmcia-9.patch

pcmcia-10.patch

htree-nfs-fix-2.patch
htree nfs fix

ext2-no-lock_super.patch
concurrent block allocation for ext2

ext2-ialloc-no-lock_super.patch
concurrent inode allocation for ext2

brlock-1b.patch
Re: 2.5.64-mm8 breaks MASQ

brlock-removal-2.patch
brlock removal 2/5: remove brlock from snap and vlan

brlock-removal-3.patch
brlock removal 3/5: remove brlock from bridge

brlock-removal-4.patch
brlock removal 4/5: removal from ipv4/ipv6

brlock-removal-5.patch
brlock removal 5/5: remove brlock code

lseek-ext2_readdir.patch
remove lock_kernel() from readdir implementations.

inode_setattr-lock_kernel-removal.patch
remove lock_kernel() from inode_setattr's vmtruncate() call

ide_probe-init_irq-fix.patch
ide-probe init_irq cleanup

raid1-fix.patch
MD RAID1 fix

nmi-watchdog-fix.patch
NMI watchdog fix

vm_enough_memory-speedup.patch
speed up vm_enough_memory()

nanosleep-accuracy-fix-2.patch
fix nanosleep() granularity bumps

linear-oops-fix-1.patch
md/linear oops fix

dev_t-1-kill-cdev.patch
dev_t [1/3]: kill cdev

dev_t-2-remove-MAX_CHRDEV.patch
dev_t [2/3] - remove MAX_CHRDEV

dev_t-3-major_h-cleanup.patch
dev_t [3/3]: major.h cleanups

dev_t-32-bit.patch
[for playing only] change type of dev_t

dev_t-drm-warnings.patch
dev_t: fix drm printk warnings

dev_t-remove-B_FREE.patch
dev_t: eliminate B_FREE

smalldevfs.patch
smalldevfs

cpufreq-xtime-locking.patch
add write_seqlock to cpufreq change notifier for TSC

cs46xx-fixes.patch
cs46xx minor fixes

notsclock-option.patch
boot time parameter to turn off TSC usage

tty-put_user-checks.patch
Add missing put_user checks in n_tty

fail-setup_irq-for-unconfigured-IRQs.patch
Fail setup_irq for unconfigured IRQs

raw-fix-address_space-rewriting.patch
raw driver: rewrite i_mapping only on final close

raw-cleanups-and-fixlets.patch
raw driver: cleanups and small fixes

oops-dump-preceding-code.patch
i386 oops output: dump preceding code




2003-03-19 10:05:53

by Alexander Hoogerhuis

Subject: Re: 2.5.65-mm2

Andrew Morton <[email protected]> writes:
>
> [SNIP]
>

Yay! Still working Radeon :)

And 4x AGP:

agpgart: Putting AGP V2 device at 00:00.0 into 4x mode
agpgart: Putting AGP V2 device at 01:00.0 into 4x mode

Come to think of it, I'll give it a spin, this might be due to a
working DSDT table that was compiled in with ACPI, whereas I had the
1x problems before I did this.

mvh,
A
--
Alexander Hoogerhuis | [email protected]
CCNP - CCDP - MCNE - CCSE | +47 908 21 485
"You have zero privacy anyway. Get over it." --Scott McNealy

2003-03-19 10:15:22

by dth

Subject: Re: 2.5.65-mm2

Andrew Morton <[email protected]> wrote:
>http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/
>. An update to the brlock-removal patches which might fix the reported
> netfilter problems (would like confirmation of this please).

Yup, works again.
Where 2.5.64-mm7 & 2.5.65-mm1 didn't forward packets on my firewall.

Zanks !

Danny

--
Miguel | "I can't tell if I have worked all my life or if
de Icaza | I have never worked a single day of my life,"

2003-03-19 19:45:05

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 02:21, Andrew Morton wrote:
>
> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/
>

I am seeing a significant degradation of interactivity under load with
recent -mm kernels. The load is dbench on a reiserfs file system with
increasing numbers of clients. The test machine is single PIII, IDE,
256MB memory, all kernels PREEMPT.

Specifying elevator=deadline improved the response of 2.5.65-mm2
somewhat, but it still eventually became intolerably slow with
sufficient load.

Interactivity tests consisted of switching between desktops with two
instances of Mozilla 1.3 on separate desktops, and Evolution 1.2.2 on
another desktop. Additional tests included shaking the window and
wiggling the scrollbar.

The third and fourth columns list the number of dbench clients at which
interactivity becomes poor, or intolerable, defined here as getting a
response after:

    good          less than 1 second
    poor          seconds
    intolerable   tens of seconds

kernel                 interactivity under load (dbench clients)
                       good    poor    intolerable

2.5.65-bk              56*
2.5.65-mm1             <8      16      24
2.5.65-mm2             <8      16      24
2.5.65-mm2 deadline    <8      20      28

*2.5.65-bk was still performing very well at dbench 56. I'll continue
to test up to 128 clients.

2.5.65-bk was updated with a bk pull this morning.

Steven
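
The response categories in the report above can be turned into a small helper for anyone repeating the test. This is only a sketch: the 10-second boundary between "seconds" and "tens of seconds" is an assumption, and the function names are made up for illustration.

```python
def classify_response(seconds: float) -> str:
    """Map a measured response delay to the labels used in the table above."""
    if seconds < 0:
        raise ValueError("delay cannot be negative")
    if seconds < 1.0:
        return "good"
    if seconds < 10.0:  # assumed cut-off between "seconds" and "tens of seconds"
        return "poor"
    return "intolerable"


def first_client_count(samples, label):
    """Return the smallest dbench client count whose delay earns `label`.

    `samples` is an iterable of (client_count, delay_in_seconds) pairs.
    Returns None when no sample falls into the requested category.
    """
    hits = [clients for clients, delay in samples
            if classify_response(delay) == label]
    return min(hits) if hits else None
```

Feeding it hand-timed (clients, delay) pairs for one kernel yields the "poor" and "intolerable" columns for that row.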

2003-03-19 20:00:09

by Andrew Morton

Subject: Re: 2.5.65-mm2

Steven Cole <[email protected]> wrote:
>
> On Wed, 2003-03-19 at 02:21, Andrew Morton wrote:
> >
> > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/
> >
>
> I am seeing a significant degradation of interactivity under load with
> recent -mm kernels. The load is dbench on a reiserfs file system with
> increasing numbers of clients. The test machine is single PIII, IDE,
> 256MB memory, all kernels PREEMPT.

(This email brought to you while running dbench 128 on ext3)

There's a pretty big reiserfs patch in -mm. Are you able to whip up
an ext2 partition and see if that displays the same problem?

2003-03-19 20:49:56

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 13:10, Andrew Morton wrote:
> Steven Cole <[email protected]> wrote:
> >
> > On Wed, 2003-03-19 at 02:21, Andrew Morton wrote:
> > >
> > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/
> > >
> >
> > I am seeing a significant degradation of interactivity under load with
> > recent -mm kernels. The load is dbench on a reiserfs file system with
> > increasing numbers of clients. The test machine is single PIII, IDE,
> > 256MB memory, all kernels PREEMPT.
>
> (This email brought to you while running dbench 128 on ext3)
>
> There's a pretty big reiserfs patch in -mm. Are you able to whip up
> an ext2 partition and see if that displays the same problem?
>

I repeated the test on an ext3 partition, and the response with 28
dbench clients running is definitely better, although I'm starting to
get some stalls of a couple seconds while typing this in Evolution on
the machine under test. Now it's becoming intolerable, so I aborted the
dbench run so I could finish this email.

This was with 2.5.65-mm2 and elevator=as. I'll repeat soon with
elevator=deadline. I didn't try typing in Evolution with 2.5.65-bk
under high loads, so I'll also give that a try.

Summary: using ext3, the simple window shake and scrollbar wiggle tests
were much improved, but really using Evolution left much to be desired.

Steven

2003-03-19 21:55:08

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 13:57, Steven P. Cole wrote:
> On Wed, 2003-03-19 at 13:10, Andrew Morton wrote:
> > Steven Cole <[email protected]> wrote:
> > >
> > > On Wed, 2003-03-19 at 02:21, Andrew Morton wrote:
> > > >
> > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/
> > > >
> > >
> > > I am seeing a significant degradation of interactivity under load with
> > > recent -mm kernels. The load is dbench on a reiserfs file system with
> > > increasing numbers of clients. The test machine is single PIII, IDE,
> > > 256MB memory, all kernels PREEMPT.
> >
> > (This email brought to you while running dbench 128 on ext3)
> >
> > There's a pretty big reiserfs patch in -mm. Are you able to whip up
> > an ext2 partition and see if that displays the same problem?
> >
>
> I repeated the test on an ext3 partition, and the response with 28
> dbench clients running is definitely better, although I'm starting to
> get some stalls of a couple seconds while typing this in Evolution on
> the machine under test. Now it's becoming intolerable, so I aborted the
> dbench run so I could finish this email.
>
> This was with 2.5.65-mm2 and elevator=as. I'll repeat soon with
> elevator=deadline. I didn't try typing in Evolution with 2.5.65-bk
> under high loads, so I'll also give that a try.
>
> Summary: using ext3, the simple window shake and scrollbar wiggle tests
> were much improved, but really using Evolution left much to be desired.

Replying to myself for a followup,

I repeated the tests with 2.5.65-mm2 elevator=deadline and the situation
was similar to elevator=as. Running dbench on ext3, the response to
desktop switches and window wiggles was improved over running dbench on
reiserfs, but typing in Evolution was subject to long delays with dbench
clients greater than 16.

I rebooted with 2.5.65-bk and ran dbench on ext3 again. Everything was
going smoothly, excellent interactivity, and then with dbench 28, the
system froze. No response to pings, no response to alt-sysrq-b (after
alt-sysrq-s). A hard reset was required. Nothing interesting logged.
Too bad. Before it crashed, 2.5.65-bk was responding to typing in an
Evolution new message window better than -mm2.

I'll see if this is repeatable.

Steven

2003-03-19 22:06:17

by jjs

Subject: Re: 2.5.65-mm2

Steven P. Cole wrote:

>I repeated the tests with 2.5.65-mm2 elevator=deadline and the situation
>was similar to elevator=as. Running dbench on ext3, the response to
>desktop switches and window wiggles was improved over running dbench on
>reiserfs, but typing in Evolution was subject to long delays with dbench
>clients greater than 16.
>
>I rebooted with 2.5.65-bk and ran dbench on ext3 again. Everything was
>going smoothly, excellent interactivity, and then with dbench 28, the
>system froze. No response to pings, no response to alt-sysrq-b (after
>alt-sysrq-s). A hard reset was required. Nothing interesting logged.
>Too bad. Before it crashed, 2.5.65-bk was responding to typing in an
>Evolution new message window better than -mm2.
>

Just out of curiosity, what is the result of:

cat /proc/sys/sched/max_timeslice?

Does setting that to e.g. 50 make -mm2 smooth?

Joe

2003-03-19 22:17:21

by Andrew Morton

Subject: Re: 2.5.65-mm2

"Steven P. Cole" <[email protected]> wrote:
>
> >
> > Summary: using ext3, the simple window shake and scrollbar wiggle tests
> > were much improved, but really using Evolution left much to be desired.
>
> Replying to myself for a followup,
>
> I repeated the tests with 2.5.65-mm2 elevator=deadline and the situation
> was similar to elevator=as. Running dbench on ext3, the response to
> desktop switches and window wiggles was improved over running dbench on
> reiserfs, but typing in Evolution was subject to long delays with dbench
> clients greater than 16.

OK, final question before I get off my butt and find a way to reproduce this:

Does reverting

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch

help?

2003-03-19 22:44:16

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 15:17, jjs wrote:
> Steven P. Cole wrote:
>
> >I repeated the tests with 2.5.65-mm2 elevator=deadline and the situation
> >was similar to elevator=as. Running dbench on ext3, the response to
> >desktop switches and window wiggles was improved over running dbench on
> >reiserfs, but typing in Evolution was subject to long delays with dbench
> >clients greater than 16.
> >
> >I rebooted with 2.5.65-bk and ran dbench on ext3 again. Everything was
> >going smoothly, excellent interactivity, and then with dbench 28, the
> >system froze. No response to pings, no response to alt-sysrq-b (after
> >alt-sysrq-s). A hard reset was required. Nothing interesting logged.
> >Too bad. Before it crashed, 2.5.65-bk was responding to typing in an
> >Evolution new message window better than -mm2.
> >
>
> Just out of curiosity, what is the result of:
>
> cat /proc/sys/sched/max_timeslice?
>
> Does setting that to e.g. 50 make -mm2 smooth?
>
> Joe

[root@spc1 steven]# cat /proc/sys/sched/max_timeslice
200
[root@spc1 steven]# echo 50 >/proc/sys/sched/max_timeslice
[root@spc1 steven]# cat /proc/sys/sched/max_timeslice
50

Ouch. I inserted the above text saved as a file, and had to wait
over a minute after hitting the OK button. I aborted dbench which was
running 24 clients on ext3 just to finish this.

The change in max_timeslice didn't seem to improve things.

Apart from the little matter of crashing, 2.5-bk was more usable at that
and higher loads.

I'll try the different value of max_timeslice with dbench on reiserfs
next. That's where the lack of response was most evident.

Steven
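
For anyone repeating this experiment, the read/echo/read sequence above can be wrapped as below. This is a minimal sketch: the /proc path is the one shown in the thread and presumably exists only on kernels carrying scheduler-tunables.patch, the helper names are hypothetical, and writing to it requires root.

```python
from pathlib import Path

# Path as shown in the thread; assumed to exist only with scheduler-tunables.patch.
MAX_TIMESLICE = Path("/proc/sys/sched/max_timeslice")


def read_tunable(path: Path = MAX_TIMESLICE) -> int:
    """Return the current integer value of the tunable."""
    return int(path.read_text().strip())


def write_tunable(value: int, path: Path = MAX_TIMESLICE) -> int:
    """Set the tunable to `value` and return the previous value.

    Returning the old value makes it easy to restore the setting afterwards.
    """
    previous = read_tunable(path)
    path.write_text(f"{value}\n")
    return previous
```

Calling write_tunable(50) and later passing its return value back to write_tunable() restores the original setting once a test run is over.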

2003-03-19 22:53:34

by Robert Love

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 17:51, Steven P. Cole wrote:

> I'll try the different value of max_timeslice with dbench on
> reiserfs next. That's where the lack of response was most evident.

I am curious as to whether reverting sched-D4 fixes this.

If not, the first step is seeing whether this is a bad decision made by
the interactivity estimator. Something like:

ps -eo pid,nice,priority,command

for dbench, evolution, and X might be useful.

Thanks,

Robert Love
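
The ps invocation suggested above prints every process; a small filter narrows it to the three programs of interest. This is a sketch under assumptions: it matches on the executable's basename, treats any command starting with "X" as the X server, and the function names are illustrative only.

```python
import subprocess

# The exact command suggested in the message above.
PS_CMD = ["ps", "-eo", "pid,nice,priority,command"]


def filter_ps(output: str, names=("dbench", "evolution", "X")):
    """Keep the header row plus rows whose executable name starts with one of `names`."""
    lines = output.splitlines()
    if not lines:
        return []
    kept = [lines[0]]  # column header
    for line in lines[1:]:
        fields = line.split(None, 3)  # pid, nice, priority, command with arguments
        if len(fields) < 4:
            continue
        executable = fields[3].split()[0].rsplit("/", 1)[-1]
        if any(executable.startswith(name) for name in names):
            kept.append(line)
    return kept


def run_ps():
    """Run ps on the local machine and return only the filtered rows."""
    result = subprocess.run(PS_CMD, capture_output=True, text=True, check=True)
    return filter_ps(result.stdout)
```

Printing "\n".join(run_ps()) once a second while dbench runs would show whether the dbench processes end up with better priorities than Evolution and X.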

2003-03-19 22:59:05

by jjs

Subject: Re: 2.5.65-mm2

Steven P. Cole wrote

>[root@spc1 steven]# cat /proc/sys/sched/max_timeslice
>200
>[root@spc1 steven]# echo 50 >/proc/sys/sched/max_timeslice
>[root@spc1 steven]# cat /proc/sys/sched/max_timeslice
>50
>
>Ouch. I inserted the above text saved as a file, and had to wait
>over a minute after hitting the OK button. I aborted dbench which was
>running 24 clients on ext3 just to finish this.
>
hmm, I'd always made this sort of change
under somewhat quiescent conditions -

Interesting results though - it helped on my
system in terms of desktop smoothness, i.e.
visible stuttering of xterm motion when being
dragged around the desktop during dbench -

Joe

2003-03-19 23:06:25

by Charlie Baylis

Subject: Re: 2.5.65-mm2


I'm getting quite a lot of audio skips with this one. 2.5.64-mm8 was the
last one I tested and it was very good.

2.5.64-mm8 works fine with pretty much any thud load I throw at it, but thud
3 is enough to cause some skips with 2.5.65-mm2. Thud 5 causes serious
starvation problems to the whole desktop.

Charlie



2003-03-19 23:11:34

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 16:04, Robert Love wrote:
> On Wed, 2003-03-19 at 17:51, Steven P. Cole wrote:
>
> > I'll try the different value of max_timeslice with dbench on
> > reiserfs next. That's where the lack of response was most evident.
>
> I am curious as to whether reverting sched-D4 fixes this.
>
> If not, the first step is seeing whether this is a bad decision made by
> the interactivity estimator. Something like:
>
> ps -eo pid,nice,priority,command
>
> for dbench, evolution, and X might be useful.
>
> Thanks,
>
> Robert Love
>
In the meantime, Andrew has asked me to revert the sched-D3 patch. I'm
recompiling now, and will only have time for that test before I have to
go do other things.

Steven

2003-03-19 23:22:00

by Andrew Morton

Subject: Re: 2.5.65-mm2

Charles Baylis <[email protected]> wrote:
>
>
> I'm getting quite a lot of audio skips with this one. 2.5.64-mm8 was the
> last one I tested and it was very good.
>
> 2.5.64-mm8 works fine with pretty much any thud load I throw at it, but thud
> 3 is enough to cause some skips with 2.5.65-mm2. Thud 5 causes serious
> starvation problems to the whole desktop.

Please test 2.5.65 base.

2003-03-19 23:37:49

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> "Steven P. Cole" <[email protected]> wrote:
> >
> > >
> > > Summary: using ext3, the simple window shake and scrollbar wiggle tests
> > > were much improved, but really using Evolution left much to be desired.
> >
> > Replying to myself for a followup,
> >
> > I repeated the tests with 2.5.65-mm2 elevator=deadline and the situation
> > was similar to elevator=as. Running dbench on ext3, the response to
> > desktop switches and window wiggles was improved over running dbench on
> > reiserfs, but typing in Evolution was subject to long delays with dbench
> > clients greater than 16.
>
> OK, final question before I get off my butt and find a way to reproduce this:
>
> Does reverting
>
> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
>
> help?

Sorry, didn't have much time for a lot of testing, but no miracles
occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
and that patch reverted (first hunk had to be manually fixed), I don't
see any improvement. Still the same long long delays in trying to use
Evolution.

Steven

2003-03-20 04:16:29

by Ed Tomlinson

Subject: Re: 2.5.65-mm2

On March 19, 2003 06:45 pm, Steven P. Cole wrote:
> On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> > "Steven P. Cole" <[email protected]> wrote:
> > > > Summary: using ext3, the simple window shake and scrollbar wiggle
> > > > tests were much improved, but really using Evolution left much to be
> > > > desired.
> > >
> > > Replying to myself for a followup,
> > >
> > > I repeated the tests with 2.5.65-mm2 elevator=deadline and the
> > > situation was similar to elevator=as. Running dbench on ext3, the
> > > response to desktop switches and window wiggles was improved over
> > > running dbench on reiserfs, but typing in Evolution was subject to long
> > > delays with dbench clients greater than 16.
> >
> > OK, final question before I get off my butt and find a way to reproduce
> > this:
> >
> > Does reverting
> >
> > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
> >
> > help?
>
> Sorry, didn't have much time for a lot of testing, but no miracles
> occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
> and that patch reverted (first hunk had to be manually fixed), I don't
> see any improvement. Still the same long long delays in trying to use
> Evolution.

Steven,

Do things improve with the patch below applied? You have to backout the
schedule-tuneables patch before applying it.

Ed Tomlinson


Attachments:
(No filename) (1.36 kB)
ptg-D3-mm2 (9.01 kB)

2003-03-20 04:35:12

by Mike Galbraith

Subject: Re: 2.5.65-mm2

At 06:04 PM 3/19/2003 -0500, Robert Love wrote:
>On Wed, 2003-03-19 at 17:51, Steven P. Cole wrote:
>
> > I'll try the different value of max_timeslice with dbench on
> > reiserfs next. That's where the lack of response was most evident.
>
>I am curious as to whether reverting sched-D4 fixes this.
>
>If not, the first step is seeing whether this is a bad decision made by
>the interactivity estimator. Something like:
>
> ps -eo pid,nice,priority,command
>
>for dbench, evolution, and X might be useful.


I think I know what he'll see... elevated priority tasks doing round
robin. Watch with top d1 showing only runnable tasks and you can see the
starvation.

The problem as I see it is that when you have a number of tasks which
become elevated to interactive status, they'll round robin and starve
non-interactive tasks basically forever. This is also why my make -j30
bzImage introduces concurrency problems. Despite gcc being a cpu hog, when
enough of them are running, those which have to wait for more time than
they consume via cpu usage eventually achieve elevated status and round
robin until they exit... throttling concurrency. Limiting the amount of
boost that a task can gain via one activation helps this problem
considerably, but does not eliminate it.

(think what happens to EXPIRED_STARVING when you have 30 hogs running, a
few of them doing round robin, and the rest of them just _waiting_ for that
queue switch to happen. :-/ ATM, I'm also gathering sleep time at
schedule time [friendly tasks gain], so sleep_avg will never be consumed if
you have more than one hog running. I made the starvation problem better
for some loads, but utterly deadly for others.)

Something I'm going to try today (yesterday was educational if not
wonderfully fruitful) is to limit the amount of time a piggy task can
remain active in the hope of reducing the time interactive hogs can starve
their expired brethren. I'm currently thinking forced expire after some
number of switches * cpu_usage is reached might cure the starvation without
destroying sleep_avg.

Suggestions very welcome. (fun problem:)

-Mike
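
The starvation pattern described above can be reproduced with a toy model of the active/expired queues. This is a deliberately simplified sketch, not kernel code: tasks are statically flagged as interactive, every slice has the same length, and the optional `starvation_limit` parameter only loosely stands in for the EXPIRED_STARVING check mentioned in the message. All names are hypothetical.

```python
from collections import deque


def simulate(n_interactive, n_hogs, slices, starvation_limit=None):
    """Return a dict mapping each task to the number of slices it received.

    Interactive tasks are requeued on the active queue when their slice ends;
    all other tasks go to the expired queue, which runs only once the active
    queue has drained. If `starvation_limit` is given, interactive tasks are
    also expired once that many slices have passed without a queue switch
    while other tasks are waiting.
    """
    active = deque([("interactive", i) for i in range(n_interactive)] +
                   [("hog", i) for i in range(n_hogs)])
    expired = deque()
    runs = {task: 0 for task in active}
    since_switch = 0
    for _ in range(slices):
        if not active:  # queue switch: expired tasks finally get to run
            active, expired = expired, active
            since_switch = 0
        task = active.popleft()
        runs[task] += 1
        since_switch += 1
        starving = (starvation_limit is not None and len(expired) > 0
                    and since_switch >= starvation_limit)
        if task[0] == "interactive" and not starving:
            active.append(task)   # round robin among the interactive tasks
        else:
            expired.append(task)  # wait for the next queue switch
    return runs
```

With two interactive tasks and three hogs, simulate(2, 3, 100) lets each hog run exactly once while the interactive pair shares the remaining 97 slices. Passing starvation_limit=10 forces periodic queue switches, so the hogs make progress again.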

2003-03-20 04:41:01

by Mike Galbraith

Subject: Re: 2.5.65-mm2

At 11:17 PM 3/19/2003 +0000, Charles Baylis wrote:

>I'm getting quite a lot of audio skips with this one. 2.5.64-mm8 was the
>last one I tested and it was very good.
>
>2.5.64-mm8 works fine with pretty much any thud load I throw at it, but thud
>3 is enough to cause some skips with 2.5.65-mm2. Thud 5 causes serious
>starvation problems to the whole desktop.

My crude hack helped with thud and some others, but is b0rken for others.

-Mike

2003-03-20 04:57:07

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 21:27, Ed Tomlinson wrote:
> On March 19, 2003 06:45 pm, Steven P. Cole wrote:
> > On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> > > "Steven P. Cole" <[email protected]> wrote:
> > > > > Summary: using ext3, the simple window shake and scrollbar wiggle
> > > > > tests were much improved, but really using Evolution left much to be
> > > > > desired.
> > > >
> > > > Replying to myself for a followup,
> > > >
> > > > I repeated the tests with 2.5.65-mm2 elevator=deadline and the
> > > > situation was similar to elevator=as. Running dbench on ext3, the
> > > > response to desktop switches and window wiggles was improved over
> > > > running dbench on reiserfs, but typing in Evolution was subject to long
> > > > delays with dbench clients greater than 16.
> > >
> > > OK, final question before I get off my butt and find a way to reproduce
> > > this:
> > >
> > > Does reverting
> > >
> > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
> > >
> > > help?
> >
> > Sorry, didn't have much time for a lot of testing, but no miracles
> > occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
> > and that patch reverted (first hunk had to be manually fixed), I don't
> > see any improvement. Still the same long long delays in trying to use
> > Evolution.
>
> Steven,
>
> Do things improve with the patch below applied? You have to backout the
> schedule-tuneables patch before appling it.

I take it this is the one to back out?
scheduler-tunables.patch 17-Mar-2003 22:01 11k

>
> Ed Tomlinson

I'll give it a shot when I get the chance. Unfortunately, I'm bogged
down with meetings tomorrow morning, so it will be at least 14-15 hours
from now. Perhaps some other adventurous person can pick up the ball in
the meantime.

My test system is 933MHz PIII, IDE, 256MB. The apps are Mozilla 1.3 and
Evolution 1.2.2 running under KDE 3.1.

Thanks,
Steven

2003-03-20 14:29:49

by Steven Cole

Subject: Re: 2.5.65-mm2

On Wed, 2003-03-19 at 21:27, Ed Tomlinson wrote:
> On March 19, 2003 06:45 pm, Steven P. Cole wrote:
> > On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> > > "Steven P. Cole" <[email protected]> wrote:
> > > > > Summary: using ext3, the simple window shake and scrollbar wiggle
> > > > > tests were much improved, but really using Evolution left much to be
> > > > > desired.
> > > >
> > > > Replying to myself for a followup,
> > > >
> > > > I repeated the tests with 2.5.65-mm2 elevator=deadline and the
> > > > situation was similar to elevator=as. Running dbench on ext3, the
> > > > response to desktop switches and window wiggles was improved over
> > > > running dbench on reiserfs, but typing in Evolution was subject to long
> > > > delays with dbench clients greater than 16.
> > >
> > > OK, final question before I get off my butt and find a way to reproduce
> > > this:
> > >
> > > Does reverting
> > >
> > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
> > >
> > > help?
> >
> > Sorry, didn't have much time for a lot of testing, but no miracles
> > occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
> > and that patch reverted (first hunk had to be manually fixed), I don't
> > see any improvement. Still the same long long delays in trying to use
> > Evolution.
>
> Steven,
>
> Do things improve with the patch below applied? You have to backout the
> schedule-tuneables patch before appling it.
>
> Ed Tomlinson

[patch snipped]

I tried that patch, and the bad behavior with the Evolution "Compose a
Message" window remains. With a load of dbench 12, I had stalls of many
seconds before I could type something. Also, here is an additional
symptom. If I move the Evolution "Compose" window around rapidly, it
leaves a smear of itself on the screen under itself. With all -mm2
variants, this smear stays for an intolerably long time (tens of
seconds) while that window does not record keyboard strokes. 2.5.65-bk
on the other hand exhibits much more benign behavior. Under similar
load, the smear disappears in a few seconds and the window starts
responding to keyboard events. I just now rebooted 2.5-bk to verify,
and it is still responsive at dbench client loads which would make
Evolution unusable with 2.5.65-mm2. Mozilla, on the other hand, still
works OK under load with -mm2. This was all with dbench running on
ext3.

I won't be able to do any more testing for several hours, so have fun!

Steven

2003-03-20 19:33:31

by Mike Galbraith

Subject: Re: 2.5.65-mm2

At 07:36 AM 3/20/2003 -0700, Steven Cole wrote:
>On Wed, 2003-03-19 at 21:27, Ed Tomlinson wrote:
> > On March 19, 2003 06:45 pm, Steven P. Cole wrote:
> > > On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> > > > "Steven P. Cole" <[email protected]> wrote:
> > > > > > Summary: using ext3, the simple window shake and scrollbar wiggle
> > > > > > tests were much improved, but really using Evolution left much to be
> > > > > > desired.
> > > > >
> > > > > Replying to myself for a followup,
> > > > >
> > > > > I repeated the tests with 2.5.65-mm2 elevator=deadline and the
> > > > > situation was similar to elevator=as. Running dbench on ext3, the
> > > > > response to desktop switches and window wiggles was improved over
> > > > > running dbench on reiserfs, but typing in Evolution was subject to long
> > > > > delays with dbench clients greater than 16.
> > > >
> > > > OK, final question before I get off my butt and find a way to reproduce
> > > > this:
> > > >
> > > > Does reverting
> > > >
> > > >
> > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
> > > >
> > > > help?
> > >
> > > Sorry, didn't have much time for a lot of testing, but no miracles
> > > occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
> > > and that patch reverted (first hunk had to be manually fixed), I don't
> > > see any improvement. Still the same long long delays in trying to use
> > > Evolution.
> >
> > Steven,
> >
> > Do things improve with the patch below applied? You have to back out the
> > schedule-tuneables patch before applying it.
> >
> > Ed Tomlinson
>
>[patch snipped]
>
>I tried that patch, and the bad behavior with the Evolution "Compose a
>Message" window remains. With a load of dbench 12, I had stalls of many
>seconds before I could type something. Also, here is an additional
>symptom. If I move the Evolution "Compose" window around rapidly, it
>leaves a smear of itself on the screen under itself. With all -mm2
>variants, this smear stays for an intolerably long time (tens of
>seconds) while that window does not record keyboard strokes. 2.5.65-bk
>on the other hand exhibits much more benign behavior.

This is a side effect of Ingo's (nice!) latency change methinks. When you
have several cpu hogs running (dbench), and they are cleaning your cpu's
clock by using their full bandwidth to attain maximum throughput, and they
then break up their timeslice in order to provide you with more
responsiveness, and then their _cumulative_ sleep time between (round
robin!) cpu hard burns is added to their sleep_avg, (boy is this a long
sentence) you will (likely) find that they run at a highly elevated
priority and starve the devil out of everything else because they can not
possibly get enough cpu to eat the sleep_avg they have been given (only way
to reduce their priority without forking). Virgin .65 is also subject to
the positive feedback loop (irman's process load is worst case methinks,
and rounding down only ~hides it).

I have a really horrid looking sched.c right now that works around some of
this problem in disgusting ways. If you want to try it, give me a holler
before tomorrow morning (slice 'n dice resumes) and I'll rip it out and
send it to you. Fair warning though, if you have good taste, don't look at
it at all before applying. There are a few spots I wouldn't even want
to _try_ to justify. I don't think dbench will be able to dork it up, but
irman's process load (horrible thing) now can again. (it's pure
research... that says a lot;)

Bottom line is that once cpu hogs are falsely determined to be sleepers,
positive feedback kills you.
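
For readers following the argument, here is a minimal sketch of how a task's sleep_avg could translate into a dynamic priority, and why a cpu hog credited with too much sleep ends up ahead of everything else. It is loosely modelled on the O(1) scheduler's priority bonus, but the function name, the SKETCH_ constants and their values are assumptions for illustration, not code taken from any patch in this thread.

```c
#include <assert.h>

/* Illustrative constants (assumptions, not quoted from this thread). */
#define SKETCH_MAX_RT_PRIO      100
#define SKETCH_MAX_PRIO         140
#define SKETCH_MAX_USER_PRIO    (SKETCH_MAX_PRIO - SKETCH_MAX_RT_PRIO)
#define SKETCH_PRIO_BONUS_RATIO 25      /* percent of the user range */
#define SKETCH_MAX_SLEEP_AVG    2000    /* arbitrary cap, in ticks */

/*
 * Map a static priority and a sleep average to a dynamic priority.
 * Lower numbers mean higher priority. A full sleep_avg earns the
 * maximum bonus; an empty one earns the maximum penalty.
 */
int sketch_effective_prio(int static_prio, long sleep_avg)
{
    assert(sleep_avg >= 0 && sleep_avg <= SKETCH_MAX_SLEEP_AVG);

    /* Width of the bonus window in priority levels. */
    long span = SKETCH_MAX_USER_PRIO * SKETCH_PRIO_BONUS_RATIO / 100;
    /* Centre the window so the bonus runs from -span/2 to +span/2. */
    long bonus = span * sleep_avg / SKETCH_MAX_SLEEP_AVG - span / 2;
    long prio = static_prio - bonus;

    /* Keep the result inside the non-realtime range. */
    if (prio < SKETCH_MAX_RT_PRIO)
        prio = SKETCH_MAX_RT_PRIO;
    if (prio > SKETCH_MAX_PRIO - 1)
        prio = SKETCH_MAX_PRIO - 1;
    return (int)prio;
}
```

Under these assumed values a nice-0 task (static priority 120) moves between 115 and 125 depending only on sleep_avg, so a hog that is wrongly credited with sleep time sits at the favoured end until it burns that credit off.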

-Mike

2003-03-20 19:34:06

by John M Flinchbaugh

[permalink] [raw]
Subject: Re: 2.5.65 performance

On Wed, Mar 19, 2003 at 05:38:08PM -0800, Andrew Morton wrote:
> Charles Baylis <[email protected]> wrote:
> > I'm getting quite a lot of audio skips with this one. 2.5.64-mm8 was the
> > last one I tested and it was very good.
> > 2.5.64-mm8 works fine with pretty much any thud load I throw at it, but thud
> > 3 is enough to cause some skips with 2.5.65-mm2. Thud 5 causes serious
> > starvation problems to the whole desktop.
> Please test 2.5.65 base.

doing normal desktop things (gnome, jboss, mozilla, apt-get updates,
etc) i've noticed audio skips on occasion that i had not seen in
kernels previous to 2.5.65.

i'm not going to complain though, because mozilla seems to start
quicker, and my jboss start time has dropped from 1m:10s average to
40s. very cool!
--
____________________}John Flinchbaugh{______________________
| [email protected] http://www.hjsoft.com/~glynis/ |
~~Powered by Linux: Reboots are for hardware upgrades only~~


Attachments:
(No filename) (973.00 B)
(No filename) (189.00 B)

2003-03-20 20:05:03

by Steven Cole

[permalink] [raw]
Subject: Re: 2.5.65-mm2

On Thu, 2003-03-20 at 12:48, Mike Galbraith wrote:
> At 07:36 AM 3/20/2003 -0700, Steven Cole wrote:
> >On Wed, 2003-03-19 at 21:27, Ed Tomlinson wrote:
> > > On March 19, 2003 06:45 pm, Steven P. Cole wrote:
> > > > On Wed, 2003-03-19 at 17:33, Andrew Morton wrote:
> > > > > "Steven P. Cole" <[email protected]> wrote:
> > > > > > > Summary: using ext3, the simple window shake and scrollbar wiggle
> > > > > > > tests were much improved, but really using Evolution left much
> > > > > > > to be desired.
> > > > > >
> > > > > > Replying to myself for a followup,
> > > > > >
> > > > > > I repeated the tests with 2.5.65-mm2 elevator=deadline and the
> > > > > > situation was similar to elevator=as. Running dbench on ext3, the
> > > > > > response to desktop switches and window wiggles was improved over
> > > > > > > running dbench on reiserfs, but typing in Evolution was subject
> > > > > > > to long delays with dbench clients greater than 16.
> > > > >
> > > > > OK, final question before I get off my butt and find a way to reproduce
> > > > > this:
> > > > >
> > > > > Does reverting
> > > > >
> > > > >
> > > > > http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.65/2.5.65-mm2/broken-out/sched-2.5.64-D3.patch
> > > > >
> > > > > help?
> > > >
> > > > Sorry, didn't have much time for a lot of testing, but no miracles
> > > > occurred. With 5 minutes of testing 2.5.65-mm2 and dbench 24 on ext3
> > > > and that patch reverted (first hunk had to be manually fixed), I don't
> > > > see any improvement. Still the same long long delays in trying to use
> > > > Evolution.
> > >
> > > Steven,
> > >
> > > Do things improve with the patch below applied? You have to back out the
> > > schedule-tuneables patch before applying it.
> > >
> > > Ed Tomlinson
> >
> >[patch snipped]
> >
> >I tried that patch, and the bad behavior with the Evolution "Compose a
> >Message" window remains. With a load of dbench 12, I had stalls of many
> >seconds before I could type something. Also, here is an additional
> >symptom. If I move the Evolution "Compose" window around rapidly, it
> >leaves a smear of itself on the screen under itself. With all -mm2
> >variants, this smear stays for an intolerably long time (tens of
> >seconds) while that window does not record keyboard strokes. 2.5.65-bk
> >on the other hand exhibits much more benign behavior.
>
> This is a side effect of Ingo's (nice!) latency change methinks. When you
> have several cpu hogs running (dbench), and they are cleaning your cpu's
> clock by using their full bandwidth to attain maximum throughput, and they
> then break up their timeslice in order to provide you with more
> responsiveness, and then their _cumulative_ sleep time between (round
> robin!) cpu hard burns is added to their sleep_avg, (boy is this a long
> sentence) you will (likely) find that they run at a highly elevated
> priority and starve the devil out of everything else because they can not
> possibly get enough cpu to eat the sleep_avg they have been given (only way
> to reduce their priority without forking). Virgin .65 is also subject to
> the positive feedback loop (irman's process load is worst case methinks,
> and rounding down only ~hides it).
>
> I have a really horrid looking sched.c right now that works around some of
> this problem in disgusting ways. If you want to try it, give me a holler
> before tomorrow morning (slice 'n dice resumes) and I'll rip it out and
> send it to you. Fair warning though, if you have good taste, don't look at
> it at all before applying. There are a few spots I wouldn't even want
> to _try_ to justify. I don't think dbench will be able to dork it up, but
> irman's process load (horrible thing) now can again. (it's pure
> research... that says a lot;)
>
> Bottom line is that once cpu hogs are falsely determined to be sleepers,
> positive feedback kills you.
>
> -Mike
>
>
Sure, either post a patch against a known sync point, .65, .65-bk, or
65-mm2, or send me the sched.c file itself (2600 lines might be a little
too much for the entire list).

If you send it in the next 2 hours, I can test today, otherwise I'll do
it mañana.

Steven

2003-03-20 20:18:18

by John Levon

[permalink] [raw]
Subject: Re: 2.5.65 performance

On Thu, Mar 20, 2003 at 02:44:48PM -0500, John M Flinchbaugh wrote:

> doing normal desktop things (gnome, jboss, mozilla, apt-get updates,
> etc) i've noticed audio skips on occasion that i had not seen in
> kernels previous to 2.5.65.

I've also been seeing this with 2.5.65. 2.5.64 was OK. madplay on .65
skips whilst running wine, mozilla, kde and a couple of gcc's. No
massive drop outs though.

regards,
john

2003-03-20 20:53:04

by Mike Galbraith

[permalink] [raw]
Subject: Re: 2.5.65-mm2

At 01:12 PM 3/20/2003 -0700, Steven P. Cole wrote:
>On Thu, 2003-03-20 at 12:48, Mike Galbraith wrote:
> > At 07:36 AM 3/20/2003 -0700, Steven Cole wrote:
> > Bottom line is that once cpu hogs are falsely determined to be sleepers,
> > positive feedback kills you.
> >
> > -Mike
> >
> >
>Sure, either post a patch against a known sync point, .65, .65-bk, or
>65-mm2, or send me the sched.c file itself (2600 lines might be a little
>too much for the entire list).
>
>If you send it in the next 2 hours, I can test today, otherwise I'll do
>it mañana.

What the heck. It is attached.

-Mike

(and I repeat, don't _look_, just run it, and let me know;)


Attachments:
xx.diff (7.61 kB)

2003-03-20 21:07:50

by Steven Cole

[permalink] [raw]
Subject: Re: 2.5.65-mm2

On Thu, 2003-03-20 at 14:07, Mike Galbraith wrote:
> At 01:12 PM 3/20/2003 -0700, Steven P. Cole wrote:
> >On Thu, 2003-03-20 at 12:48, Mike Galbraith wrote:
> > > At 07:36 AM 3/20/2003 -0700, Steven Cole wrote:
> > > Bottom line is that once cpu hogs are falsely determined to be sleepers,
> > > positive feedback kills you.
> > >
> > > -Mike
> > >
> > >
> >Sure, either post a patch against a known sync point, .65, .65-bk, or
> >65-mm2, or send me the sched.c file itself (2600 lines might be a little
> >too much for the entire list).
> >
> >If you send it in the next 2 hours, I can test today, otherwise I'll do
> >it mañana.
>
> What the heck. It is attached.
>
> -Mike
>
> (and I repeat, don't _look_, just run it, and let me know;)

[steven@spc1 linux-2.5.65-mg]$ patch -p1 <../../xx.diff
patching file include/linux/sched.h
patching file kernel/fork.c
patching file kernel/printk.c
patching file kernel/sched.c
patch: **** unexpected end of file in patch

It looks like the last hunk has no trailing context lines.
Did your patch get clobbered?

Steven

2003-03-21 05:04:42

by Mike Galbraith

[permalink] [raw]
Subject: Re: 2.5.65-mm2

At 02:15 PM 3/20/2003 -0700, Steven P. Cole wrote:
>[steven@spc1 linux-2.5.65-mg]$ patch -p1 <../../xx.diff
>patching file include/linux/sched.h
>patching file kernel/fork.c
>patching file kernel/printk.c
>patching file kernel/sched.c
>patch: **** unexpected end of file in patch
>
>It looks like the last hunk has no trailing context lines.
>Did your patch get clobbered?

Must have. One more time.

-Mike


Attachments:
xx.diff (8.18 kB)

2003-03-21 05:56:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.5.65-mm2


On Thu, 20 Mar 2003, Mike Galbraith wrote:

> This is a side effect of Ingo's (nice!) latency change methinks. When
> you have several cpu hogs running (dbench), and they are cleaning your
> cpu's clock by using their full bandwidth to attain maximum throughput,
> and they then break up their timeslice in order to provide you with more
> responsiveness, and then their _cumulative_ sleep time between (round
> robin!) cpu hard burns is added to their sleep_avg, [...]

actually, the round-robining for finer-grained timeslices should not
impact the sleep average at all, because the roundrobin is done while the
task is still _running_, ie. the sleep average does not get impacted.
Otherwise we'd have elevated priority of simple CPU-intensive
applications, which would be Bad.

The way the sleep-average is maintained is balanced very carefully in the
O(1) scheduler. There are three states a task can be in:

- sleeping: the sleep average increases
- running but not executing: the sleep average stagnates
- executing on a CPU: the sleep average decreases

ie. in the roundrobin case the tasks will neither increase, nor decrease
their sleep average - they are in essence 'frozen'. The moment they get
scheduled on a CPU for execution, their sleep average starts to decrease
again. (and once they go to sleep, their sleep average increases.)
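
The three states listed above can be sketched as a tiny accounting function. This is a toy model under stated assumptions, not the kernel's implementation: the enum, the function name and the MODEL_MAX_SLEEP_AVG cap are all hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Hypothetical state names for the three cases described above. */
enum model_task_state {
    MODEL_SLEEPING,   /* blocked: sleep average increases */
    MODEL_RUNNABLE,   /* on a runqueue but not on a CPU: unchanged */
    MODEL_EXECUTING   /* on a CPU: sleep average decreases */
};

#define MODEL_MAX_SLEEP_AVG 2000  /* arbitrary cap in ticks (assumption) */

/* Return the sleep average after spending `ticks` in `state`. */
long model_update_sleep_avg(long sleep_avg, enum model_task_state state,
                            long ticks)
{
    assert(ticks >= 0);

    switch (state) {
    case MODEL_SLEEPING:
        sleep_avg += ticks;
        break;
    case MODEL_RUNNABLE:
        /* Waiting for a CPU in the round-robin: the value is frozen. */
        break;
    case MODEL_EXECUTING:
        sleep_avg -= ticks;
        break;
    }

    /* Clamp to the valid range. */
    if (sleep_avg < 0)
        sleep_avg = 0;
    if (sleep_avg > MODEL_MAX_SLEEP_AVG)
        sleep_avg = MODEL_MAX_SLEEP_AVG;
    return sleep_avg;
}
```

In this model, time spent queued behind other tasks in the round-robin passes through the MODEL_RUNNABLE case and leaves the value untouched, which is the point being made about finer-grained timeslices.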

so whatever effect you are seeing, it must be something else.

Ingo

2003-03-21 06:06:15

by Ingo Molnar

[permalink] [raw]
Subject: Re: 2.5.65-mm2


On Thu, 20 Mar 2003, Mike Galbraith wrote:

> [...] Virgin .65 is also subject to the positive feedback loop (irman's
> process load is worst case methinks, and rounding down only ~hides it).

there's no positive feedback loop. What might happen is that in 2.5.65 we
now distribute the bonus timeslices more widely (the backboost thing), so
certain workloads might be rated more interactive. But we never give away
timeslices that were not earned the hard way (ie. via actual sleeping).

i've attached a patch that temporarily turns off the back-boost - does
that have any measurable impact? [please apply this to -mm1, i do think
the timeslice-granularity change in -mm1 (-D3) is something we really
want.]

Ingo

--- kernel/sched.c.orig	2003-03-21 07:14:02.000000000 +0100
+++ kernel/sched.c	2003-03-21 07:15:08.000000000 +0100
@@ -365,7 +365,7 @@
 		 * tasks.
 		 */
 		if (sleep_avg > MAX_SLEEP_AVG) {
-			if (!in_interrupt()) {
+			if (0 && !in_interrupt()) {
 				sleep_avg += current->sleep_avg - MAX_SLEEP_AVG;
 				if (sleep_avg > MAX_SLEEP_AVG)
 					sleep_avg = MAX_SLEEP_AVG;
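
As a reading aid, here is a standalone sketch of what the hunk above appears to toggle: when a woken task's sleep average overflows the cap, the excess is credited to the task doing the waking (the back-boost), and the `0 &&` switches that credit off. Only a fragment of the real function is visible in the diff, so the struct, the function name, the `backboost` flag and the cap value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SLEEP_AVG 2000  /* arbitrary cap in ticks (assumption) */

/* Hypothetical stand-in for the one task field used here. */
struct sketch_task {
    long sleep_avg;
};

/*
 * Credit `slept` ticks to the woken task. If that overflows the cap and
 * back-boost is enabled, hand the overflow to the waker, also capped.
 */
void sketch_wakeup_credit(struct sketch_task *woken,
                          struct sketch_task *waker,
                          long slept, int backboost)
{
    assert(woken != NULL && slept >= 0);

    long sleep_avg = woken->sleep_avg + slept;

    if (sleep_avg > SKETCH_MAX_SLEEP_AVG) {
        if (backboost && waker != NULL) {
            /* Overflow beyond the cap goes to the waker. */
            long boosted = waker->sleep_avg
                         + (sleep_avg - SKETCH_MAX_SLEEP_AVG);
            if (boosted > SKETCH_MAX_SLEEP_AVG)
                boosted = SKETCH_MAX_SLEEP_AVG;
            waker->sleep_avg = boosted;
        }
        sleep_avg = SKETCH_MAX_SLEEP_AVG;
    }
    woken->sleep_avg = sleep_avg;
}
```

Calling it with `backboost` set to 0 corresponds to the test patch: the woken task is still capped, but the waker gains nothing.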

2003-03-22 19:35:03

by Mike Galbraith

[permalink] [raw]
Subject: Re: 2.5.65-mm2

At 07:16 AM 3/21/2003 +0100, Ingo Molnar wrote:

>On Thu, 20 Mar 2003, Mike Galbraith wrote:
>
> > [...] Virgin .65 is also subject to the positive feedback loop (irman's
> > process load is worst case methinks, and rounding down only ~hides it).
>
>there's no positive feedback loop. What might happen is that in 2.5.65 we
>now distribute the bonus timeslices more widely (the backboost thing), so
>certain workloads might be rated more interactive. But we never give away
>timeslices that were not earned the hard way (ie. via actual sleeping).

(backboost alone is not it, nor is it timeslice granularity alone... bleh)

>i've attached a patch that temporarily turns off the back-boost - does
>that have any measurable impact? [please apply this to -mm1, i do think
>the timeslice-granularity change in -mm1 (-D3) is something we really
>want.]

I still don't have anything worth discussing.

-Mike

(however, I have been fiddling with the dang thing rather frenetically;)

Yes, this makes a difference. (everything in sched makes a
difference) The basic problem I'm seeing is load detection, and recovery
from erroneous detection. When it goes wrong, recovery isn't happening
here. cc1 should not ever be called anything but a cpu hog, but I've seen
it and others running at prio 16 (deadly). This is nice if you're doing
deadline scheduling, and boost cc1 because it's late, ie intentionally, to
boost its throughput. What I believe happens is that various cpu hogs get
misidentified, and get a boost with no way to shed it other than to fork
(parent_penalty [100%atm]) or use more cpu than exists. (I think) This I
call positive feedback. The irman process loop is really ugly, and the
scheduler totally fails to deal with it. Disabling forward boost actually
does serious harm to this load. The best thing you can do for this load
with the scheduler is to run it at nice 19. You can get a worst case
latency of 50ms without much if any tinkering. (no stockish kernel does
better than 600ms _ever_ on an otherwise totally idle 500MHz box
here. ~200ms worst case is the _best_ I've gotten by playing with this and
that priority wise)

There may be something really simple behind the concurrency problems I see
here. (bottom line for me is the concurrency problem... I want to
understand it. The rest is less than the crux of the biscuit.)

(generally, concurrency is much improved, and believe it or not, that's
exactly what is bugging me so. Too much is too little is too much. I'm
not ready to give up yet.)

-Mike