2002-09-14 03:46:04

by Andrew Morton

Subject: 2.5.34-mm4


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Some additional work has been performed on the new, faster
sleep/wakeup facilities.

I have converted TCP/IPV4 over to use the faster wakeups. It would
be appreciated if the people who are interested in (and set up for
testing) high performance networking could test this out. Note,
however, that select()/poll() do not benefit: converting them
would be quite a large change.

So please bear in mind that this code will only help if applications
are generally sleeping in accept(), connect(), etc. At this stage
I'd like to know whether this work is generally something which should be
pursued further - let's be careful that the measurements are not
swamped by select()/poll() wakeups.
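
As a rough illustration of the distinction drawn above, here is a minimal
sketch of a server that sleeps directly in accept() rather than in
select()/poll(). It is written in Python for brevity; the helper names
(`accept_one`, `demo`) and the loopback address are assumptions made for the
example and are not part of the patch. A blocking Python socket with no
timeout set waits inside the accept() system call itself, which is the kind
of sleep the announcement says can benefit.

```python
import socket
import threading


def accept_one(listener):
    """Block in accept() until one client connects; return the peer address."""
    conn, peer = listener.accept()  # the task sleeps here, inside accept(2)
    conn.close()
    return peer


def demo():
    """Run one accept()/connect() round trip over loopback."""
    # Port 0 asks the kernel to pick a free ephemeral port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # A helper thread plays the client and blocks in connect().
    def client():
        with socket.create_connection(("127.0.0.1", port)):
            pass

    worker = threading.Thread(target=client)
    worker.start()

    peer = accept_one(listener)
    worker.join()
    listener.close()
    return peer
```

A server built around select() or poll() spends its sleeps in those calls
instead, so by the note above it would not be expected to show a difference.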

The individual patches are:

http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/wake-speedup.patch
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/broken-out/tcp-wakeups.patch

These apply against 2.5.26 and possibly earlier, and testing against
earlier kernels would be valid. Thanks.



Changes have been made to /proc/stat which break top(1) and vmstat(1).
New versions are available at
http://www.zip.com.au/~akpm/linux/patches/procps-2.5.34-mm4.tar.gz
and newer versions will appear at
http://surriel.com/procps/

+aio-sync-iocb.patch

Ben's AIO patch conflicted with the readv/writev patch. This is
Ben's patch reworked to fit on top of readv-writev.patch

+pagevec_lru_add.patch

Fix a bogon which broke reiserfs4

+taka-writev.patch

Hirokazu Takahashi's writev() speedup.

+vm-wakeups.patch

Use the auto waitqueues in the VM and block layers. Broken out of
the wake-speedup patch.

+per-node-kswapd.patch

David Hansen's per-NUMA-node kswapd patch.

+topology-api.patch

Matthew Dobson's topology API.

+kswapd-reclaim-stats.patch

Add `kswapd_steal' and `pgrefill' to /proc/vmstat. The former indicates
that, on a quick test, 99% of page reclaim is being performed by kswapd.

+iowait.patch

Instrumentation to show how much time is spent in disk wait. (Doesn't
appear to come out in the new top(1) though?)

+tcp-wakeups.patch

Use auto-waitqueues in TCP/IPV4
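
The kswapd-reclaim-stats.patch entry above quotes a figure of 99% of page
reclaim being done by kswapd. A minimal sketch of how such a share could be
computed from /proc/vmstat-style text follows. `kswapd_steal` is the counter
named in the entry; using `pgsteal` as the total-reclaim counter is an
assumption, and the sample numbers are invented purely to illustrate the
arithmetic, not measured.

```python
def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def kswapd_share(stats, total_key="pgsteal", kswapd_key="kswapd_steal"):
    """Fraction of reclaimed pages attributed to kswapd, or None if no reclaim.

    total_key is an assumed name for the overall reclaim counter.
    """
    total = stats.get(total_key, 0)
    if total == 0:
        return None
    return stats.get(kswapd_key, 0) / total


# Invented sample values, chosen only to reproduce a 99% share.
SAMPLE = """\
pgsteal 10000
kswapd_steal 9900
pgrefill 25000
"""

share = kswapd_share(parse_vmstat(SAMPLE))
```

On a live system the same functions could be fed the contents of
/proc/vmstat, provided the counter names match what that kernel exports.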




linus.patch
cset-1.568.19.4-to-1.661.txt.gz

scsi_hack.patch
Fix block-highmem for scsi

ext3-htree.patch
Indexed directories for ext3

spin-lock-check.patch
spinlock/rwlock checking infrastructure

rd-cleanup.patch
Cleanup and fix the ramdisk driver (doesn't work right yet)

readv-writev.patch
O_DIRECT support for readv/writev

aio-sync-iocb.patch
Use a sync iocb for generic_file_read

llzpr.patch
Reduce scheduling latency across zap_page_range

buffermem.patch
Resurrect buffermem accounting

lpp.patch
ia32 huge tlb pages

lpp-update.patch
hugetlbpage fixes

reversemaps-leak.patch
Fix reverse map accounting leak

sharedmem.patch
Add /proc/meminfo:Mapped - the amount of memory which is mapped into pagetables

ext3-sb.patch
u.ext3_sb -> generic_sbp

pagevec_lru_add.patch
Run readpage before dropping the page refcount

oom-fix.patch
Fix an OOM condition on big highmem machines

tlb-cleanup.patch
Clean up the tlb gather code

dump-stack.patch
arch-neutral dump_stack() function

wli-cleanup.patch
random cleanups

madvise-move.patch
move madvise implementation into mm/madvise.c

split-vma.patch
VMA splitting patch

mmap-fixes.patch
mmap.c cleanup and lock ranking fixes

buffer-ops-move.patch
Move submit_bh() and ll_rw_block() into fs/buffer.c

slab-stats.patch
Display total slab memory in /proc/meminfo

writeback-control.patch
Cleanup and extension of the writeback paths

free_area_init-cleanup.patch
free_area_init() code cleanup

alloc_pages-cleanup.patch
alloc_pages cleanup and optimisation

statm_pgd_range-sucks.patch
Remove the pagetable walk from /proc/<pid>/statm

remove-sync_thresh.patch
Remove /proc/sys/vm/dirty_sync_thresh

taka-writev.patch
Speed up writev

pf_nowarn.patch
Fix up the handling of PF_NOWARN

jeremy.patch
Spel Jermy's naim wright

queue-congestion.patch
Infrastructure for communicating request queue congestion to the VM

nonblocking-ext2-preread.patch
avoid ext2 inode prereads if the queue is congested

nonblocking-pdflush.patch
non-blocking writeback infrastructure, use it for pdflush

nonblocking-vm.patch
Non-blocking page reclaim

wake-speedup.patch
Faster wakeup code

vm-wakeups.patch
Use the faster wakeups in the VM and block layers

sync-helper.patch
Speed up sys_sync() against multiple spindles

slabasap.patch
Early and smarter shrinking of slabs

write-deadlock.patch
Fix the generic_file_write-from-same-mmapped-page deadlock

buddyinfo.patch
Add /proc/buddyinfo - stats on the free pages pool

free_area.patch
Remove struct free_area_struct and free_area_t, use `struct free_area'

per-node-kswapd.patch
Per-node kswapd instance

topology-api.patch
NUMA topology API

radix_tree_gang_lookup.patch
radix tree gang lookup

truncate_inode_pages.patch
truncate/invalidate_inode_pages rewrite

proc_vmstat.patch
Move the vm accounting out of /proc/stat

kswapd-reclaim-stats.patch
Add kswapd_steal to /proc/vmstat

iowait.patch
I/O wait statistics

tcp-wakeups.patch
Use fast wakeups in TCP/IPV4


2002-09-14 03:56:45

by Rik van Riel

Subject: Re: 2.5.34-mm4

On Fri, 13 Sep 2002, Andrew Morton wrote:

> +iowait.patch
>
> Instrumentation to show how much time is spent in disk wait. (Doesn't
> appear to come out in the new top(1) though?)

Will add it now that you're shipping it again. Note that this
will be available as patches on my home page and from my bk
tree only for now. I'll merge the needed patches into the main
procps tree once this stuff gets merged into the kernel.
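
For context, a sketch of how a tool like top(1) might turn cumulative
counters into an I/O-wait share. It assumes the aggregate `cpu` line layout
used by later mainline kernels (user, nice, system, idle, iowait, ...); the
format exported by iowait.patch in this tree may differ, and the sample line
is invented.

```python
def iowait_fraction(stat_text):
    """Return the iowait share of CPU time from /proc/stat-style text.

    Assumes the aggregate "cpu" line lists ticks in the order
    user, nice, system, idle, iowait, ... (an assumption, see above).
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "cpu":
            # Use at most the first eight counters (user..steal); the guest
            # counters on newer kernels are already included in user/nice.
            ticks = [int(v) for v in fields[1:9]]
            total = sum(ticks)
            return ticks[4] / total if total else None
    return None


# Invented sample: 500 of 10000 ticks spent waiting for I/O.
SAMPLE_STAT = "cpu 4000 100 900 4500 500\ncpu0 4000 100 900 4500 500\n"

fraction = iowait_fraction(SAMPLE_STAT)
```

The counters are cumulative since boot, so an interactive tool would normally
take two samples and apply the same calculation to their difference.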

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 10:45:35

by Axel H. Siebenwirth

Subject: Re: 2.5.34-mm4

Hi Andrew!

On Fri, 13 Sep 2002, Andrew Morton wrote:

> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/

Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
quite unresponsive behaviour. For example, I was building X, which at that
particular moment caused pretty heavy disk load, and the system didn't respond
at all. I was using X and was not able to switch consoles, and could move the
mouse only extremely sluggishly.
I have seen that it used more swap than usual.

             total       used       free     shared    buffers     cached
Mem:        191096     159340      31756          0      10568      94100
-/+ buffers/cache:      54672     136424
Swap:       289160          0     289160

This is what it looks like under normal circumstances; when building X I
had 20M of swap in use, which seemed quite a lot to me. Maybe I'm just wrong.
Unfortunately I was not able to start vmstat: first because I can't start
vmstat when the system is not responding, and second because it doesn't work
anyway because of your changes.
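
The "-/+ buffers/cache" row in the output above is derived from the "Mem:"
row. A small sketch of that arithmetic, using the figures quoted above; the
function name is made up for the example.

```python
def buffers_cache_row(used, free, buffers, cached):
    """Recompute free(1)'s "-/+ buffers/cache" row from the "Mem:" row (kB)."""
    used_by_apps = used - buffers - cached  # used, excluding buffers and page cache
    available = free + buffers + cached     # free, plus buffers and page cache
    return used_by_apps, available


# Figures taken from the "Mem:" row quoted above.
row = buffers_cache_row(used=159340, free=31756, buffers=10568, cached=94100)
# row == (54672, 136424), matching the "-/+ buffers/cache" row
```

So in that snapshot about 54672 kB is in use once buffers and page cache are
excluded, and no swap is in use.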


Best regards,
Axel

2002-09-15 14:27:00

by Rik van Riel

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> On Fri, 13 Sep 2002, Andrew Morton wrote:
>
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
>
> Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour.

Don't worry, it's supposed to do that. You can't measure desktop
interactivity, so it doesn't exist ;)


Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 17:20:44

by Andrew Morton

Subject: Re: 2.5.34-mm4

Axel Siebenwirth wrote:
>
> Hi Andrew!
>
> On Fri, 13 Sep 2002, Andrew Morton wrote:
>
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
>
> Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> quite unresponsive behaviour. For example, I was building X, which at that
> particular moment caused pretty heavy disk load, and the system didn't respond
> at all. I was using X and was not able to switch consoles, and could move the
> mouse only extremely sluggishly.

There are large IDE updates in -mm4, and this is consistent with
a disk which isn't doing DMA any more. Could you (and Con) please
double-check with `hdparm -i' and `hdparm -t' that the disk subsystem
is behaving properly?

Yes, it could well be a VM bug, but I wouldn't want to run round in
confused circles all day ;) Thanks.


> I have seen that it used more swap than usual.

2.5 is much more swaphappy than 2.4. I believe that this is actually
correct behaviour for optimum throughput. But it just happens that
people (me included) hate it. We don't notice the improved runtimes
for the pagecache-intensive operations but we do notice the time it
takes to get the xterms working again.

We have not yet sat down and worked out what to do about this.

>              total       used       free     shared    buffers     cached
> Mem:        191096     159340      31756          0      10568      94100
> -/+ buffers/cache:      54672     136424
> Swap:       289160          0     289160
>
> This is what it looks like under normal circumstances; when building X I
> had 20M of swap in use, which seemed quite a lot to me. Maybe I'm just wrong.
> Unfortunately I was not able to start vmstat: first because I can't start
> vmstat when the system is not responding, and second because it doesn't work
> anyway because of your changes.
>

Yeah, sorry. The burden of back-compatibility weighed too heavy and
Rik decided that we just have to fix userspace to follow kernel
changes. There will be breakage for a while; updates are at
http://surriel.com/procps/.

Unfortunately, those updates cause odd-but-not-serious things to
happen to Red Hat initscripts. This happens when you install standard
util-linux as well. It is due to the initscripts passing in arguments
which the standard tools do not understand.

2002-09-15 17:32:18

by Rik van Riel

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Unfortunately, those updates cause odd-but-not-serious things to
> happen to Red Hat initscripts. This happens when you install standard
> util-linux as well. It is due to the initscripts passing in arguments
> which the standard tools do not understand.

I'm about to add all patches from the RH procps rpm to the
procps cvs tree, so this should go away soon.

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 17:34:34

by Rik van Riel

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:
> Axel Siebenwirth wrote:

> > I have seen that it used more swap than usual.
>
> 2.5 is much more swaphappy than 2.4. I believe that this is actually
> correct behaviour for optimum throughput. But it just happens that
> people (me included) hate it.

Time for a corollary to "if you can't measure it, it doesn't exist":

"If you can't measure desktop performance, our method of development
will ensure it won't exist"

cheers,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 17:44:41

by M. Edward Borasky

Subject: RE: 2.5.34-mm4

Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
cheapest solution may still be to buy more memory, more disks or a faster
processor.

Borasky's Corollary 2: When you try to measure the performance of people the
way you measure performance of computers, you need psychological help.

M. Edward (Ed) Borasky
mailto: [email protected]
http://www.pdxneurosemantics.com
http://www.meta-trading-coach.com
http://www.borasky-research.net

Coaching: It's Not Just for Athletes and Executives Any More!


2002-09-15 17:49:55

by Rik van Riel

Subject: RE: 2.5.34-mm4

On Sun, 15 Sep 2002, M. Edward Borasky wrote:

> Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> cheapest solution may still be to buy more memory, more disks or a
> faster processor.

Current 2.5 is sluggish on systems with a fast CPU and 768 MB
of RAM, whereas current -ac runs the same workload smoothly
with 128 MB of RAM.

Now tell me, what's your point ?

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 18:34:42

by Andrew Morton

Subject: Re: 2.5.34-mm4

Rik van Riel wrote:
>
> On Sun, 15 Sep 2002, M. Edward Borasky wrote:
>
> > Borasky's Corollary 1: If you *can* measure it and it *does* exist, the
> > cheapest solution may still be to buy more memory, more disks or a
> > faster processor.
>
> Current 2.5 is sluggish on systems with a fast CPU and 768 MB
> of RAM, whereas current -ac runs the same workload smoothly
> with 128 MB of RAM.
>

I've been running 2.5 on my desktop at work (800MHz/256M UP) since
2.5.26 and on the machine at home (Dual 850MHz/768M) on-and-off
(recent freizures sent that machine back to Marcelo; need to try
again). I also ran 2.4.19-ac-something for a couple of weeks.

Impressions are:

- 2.5 swaps a lot in response to heavy pagecache activity.

SEGQ didn't change that, actually. And this is correct,
as-designed behaviour. We'll need some "don't be irritating"
knob to prevent this. Or speculative pagein when the load
has subsided, which would be a fair-sized project.

- In both -ac and 2.5 the scheduler is prone to starving interactive
applications (netscape 4, gkrellm, command-line gdb, others) when
there is a compilation happening.

This is very, very noticeable; and it affects applications which
do not use sched_yield(). Ingo has put some extra stuff in since
then and I need to retest.

- In -ac, there are noticeable stalls during heavy writeout. This
may be an ext3 thing, but I can't think of any IO scheduling
differences in -ac ext3. I'd be guessing that it is due to
bdflush/kupdate lumpiness.

Overall I find Marcelo kernels to be the most comfortable, followed
by 2.5. Alan's kernels I find to be the least comfortable in a
"developer's desktop" situation.

2002-09-15 18:51:49

by Rik van Riel

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:

> - In -ac, there are noticeable stalls during heavy writeout. This
> may be an ext3 thing, but I can't think of any IO scheduling
> differences in -ac ext3. I'd be guessing that it is due to
> bdflush/kupdate lumpiness.

This is also due to the fact that -ac has an older -rmap
VM. As in current 2.5, rmap can write out all inactive
pages ... and it did in some worst case situations.

This is fixed in rmap14.

(I hope Alan is done playing with IDE soon so I can push
him a VM update)

regards,

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-15 19:05:13

by Andi Kleen

Subject: Re: [Lse-tech] Re: 2.5.34-mm4

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5. Alan's kernels I find to be the least comfortable in a

... and -aa kernels are marcelo kernels, just with the corner
cases fixed too. Works very nicely here.

-Andi

2002-09-16 01:27:13

by Alan

Subject: Re: 2.5.34-mm4

On Sun, 2002-09-15 at 19:56, Rik van Riel wrote:
> On Sun, 15 Sep 2002, Andrew Morton wrote:
>
> > - In -ac, there are noticeable stalls during heavy writeout. This
> > may be an ext3 thing, but I can't think of any IO scheduling
> > differences in -ac ext3. I'd be guessing that it is due to
> > bdflush/kupdate lumpiness.

I think so. I've always been conservative; I still need rmap to pass
cerberus. But the rmap in -ac is a little out of date relative to the 2.5 tuning.

> This is also due to the fact that -ac has an older -rmap
> VM. As in current 2.5, rmap can write out all inactive
> pages ... and it did in some worst case situations.
>
> This is fixed in rmap14.
>
> (I hope Alan is done playing with IDE soon so I can push
> him a VM update)

The big one left to fix is the simplex device bug - which is an "I know
why". The great mystery is the affair of taskfile pio write. Other than
that it's annoying glitches, not big problems, now.

So send me rmap-14a patches by all means

2002-09-16 18:35:28

by Bill Davidsen

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Rik van Riel wrote:

> On Sun, 15 Sep 2002, Axel Siebenwirth wrote:
> > On Fri, 13 Sep 2002, Andrew Morton wrote:
> >
> > > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.34/2.5.34-mm4/
> >
> > Since changing from 2.5.34-mm2 to -mm4 I have experienced some moments of
> > quite unresponsive behaviour.
>
> Don't worry, it's supposed to do that. You can't measure desktop
> interactivity, so it doesn't exist ;)

But now we have `contest' and we can, so it does.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-16 18:53:29

by Bill Davidsen

Subject: Re: [Lse-tech] Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andi Kleen wrote:

> > Overall I find Marcelo kernels to be the most comfortable, followed
> > by 2.5. Alan's kernels I find to be the least comfortable in a
>
> ... and -aa kernels are marcelo kernels, just with the corner
> cases fixed too. Works very nicely here.

Corner cases? The IDE, VM and scheduler are different...

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-16 18:50:59

by Bill Davidsen

Subject: Re: 2.5.34-mm4

On Sun, 15 Sep 2002, Andrew Morton wrote:

> Impressions are:
>
> - 2.5 swaps a lot in response to heavy pagecache activity.
>
> SEGQ didn't change that, actually. And this is correct,
> as-designed behaviour. We'll need some "don't be irritating"
> knob to prevent this. Or speculative pagein when the load
> has subsided, which would be a fair-sized project.

It would be nice to have a knob in /proc/sys which could be tuned for
response or throughput, preferably not a boolean ;-) I suspect that we
would not agree on what it should do, but it sure would be
nice!

> - In both -ac and 2.5 the scheduler is prone to starving interactive
> applications (netscape 4, gkrellm, command-line gdb, others) when
> there is a compilation happening.
>
> This is very, very noticeable; and it affects applications which
> do not use sched_yield(). Ingo has put some extra stuff in since
> then and I need to retest.
>
> - In -ac, there are noticeable stalls during heavy writeout. This
> may be an ext3 thing, but I can't think of any IO scheduling
> differences in -ac ext3. I'd be guessing that it is due to
> bdflush/kupdate lumpiness.

I have the feeling that 2.5 is less good about noting that a file is open
for write only and no seeks have been done. I haven't measured it, but it
would seem that writes to such a file would be better on the disk and not
taking buffers, since they're probably not going to be read.

This is just based on running mkisofs on 2.4.19 and 2.5.34, and watching "no
disk activity" followed by a heavy burst. I haven't made any careful
measurement, so take this as you will, but I agree that heavy write bogs
the system. Clearly with big memory I can/do get the whole ~700MB in
memory if writes don't start quickly.

Yes, that could be tuning, I know that.

> Overall I find Marcelo kernels to be the most comfortable, followed
> by 2.5. Alan's kernels I find to be the least comfortable in a
> "developer's desktop" situation.

On small memory machines I don't see as much to choose, and the -ck series
has been very nice to me. I don't run 2.5 on any but test machines, and
both are big memory (1+GB) machines.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-19 09:00:08

by Jens Axboe

Subject: Re: [Lse-tech] Re: 2.5.34-mm4

On Mon, Sep 16 2002, Bill Davidsen wrote:
> On Sun, 15 Sep 2002, Andi Kleen wrote:
>
> > > Overall I find Marcelo kernels to be the most comfortable, followed
> > > by 2.5. Alan's kernels I find to be the least comfortable in a
> >
> > ... and -aa kernels are marcelo kernels, just with the corner
> > cases fixed too. Works very nicely here.
>
> Corner cases? The IDE, VM and scheduler are different...

The IDE is the same, I'll refrain from commenting on the rest. There's
just an adjustment to the read ahead, which makes sense.

--
Jens Axboe