2001-10-09 18:55:09

by Andrea Arcangeli

Subject: 2.4.11pre6aa1

Allocation failures with highmem seem cured (at least under heavy
emulation; I haven't tested real hardware yet). Robert, could you give it
a spin and see if you can still reproduce the failures now?

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11pre6aa1.bz2

Thanks,

--
Only in 2.4.11pre3aa1: 00_3.5G-address-space-1
Only in 2.4.11pre6aa1: 00_3.5G-address-space-2

Rediffed.

Only in 2.4.11pre3aa1: 00_debug-gfp-1
Only in 2.4.11pre6aa1: 10_debug-gfp-1

Renamed.

Only in 2.4.11pre6aa1: 00_rb-export-1

Patch from Mark J Roberts to export the rb library functions to modules
(he's using rb trees in a module).
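
For illustration, a minimal sketch of what such an export could look
like, assuming it just adds EXPORT_SYMBOL() lines for the lib/rbtree.c
entry points (the actual 00_rb-export-1 patch may export different
symbols or place them elsewhere, e.g. in kernel/ksyms.c):

/* Hypothetical sketch only; not the actual 00_rb-export-1 patch. */
#include <linux/module.h>
#include <linux/rbtree.h>

EXPORT_SYMBOL(rb_insert_color);	/* rebalance after linking a new node */
EXPORT_SYMBOL(rb_erase);	/* unlink a node and rebalance the tree */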

Only in 2.4.11pre3aa1: 00_rwsem-fair-20
Only in 2.4.11pre3aa1: 00_rwsem-fair-20-recursive-4
Only in 2.4.11pre6aa1: 00_rwsem-fair-22
Only in 2.4.11pre6aa1: 00_rwsem-fair-22-recursive-4

Rediffed.

Only in 2.4.11pre3aa1: 00_unmap-dirty-pte-2

Dropped; it generated a false positive on s390, which implements
slightly different semantics for pte_dirty (using a per-page
physical dirty bitflag maintained by the hardware).

Only in 2.4.11pre6aa1: 00_vm-1
Only in 2.4.11pre3aa1: 00_vm-tweaks-3

Allocation failures with highmem should be cured. Swap seems
smooth and Andrew's workload also seems ok. Still untested
on real highmem at the moment, so I'd love feedback on it.
I will be able to test on real highmem very soon too, thanks
to osdlab.org resources.

Only in 2.4.11pre3aa1: 10_highmem-debug-4
Only in 2.4.11pre6aa1: 10_highmem-debug-5

Rediffed.

Only in 2.4.11pre3aa1: 10_numa-sched-10
Only in 2.4.11pre6aa1: 10_numa-sched-11

Rediffed.

Only in 2.4.11pre3aa1: 50_uml-patch-2.4.10-5.bz2
Only in 2.4.11pre6aa1: 50_uml-patch-2.4.10-6.bz2
Only in 2.4.11pre6aa1: 52_uml-export-objs-1

Picked last update from sourceforge.

Andrea


2001-10-10 03:10:51

by Andrea Arcangeli

Subject: 2.4.11aa1 [was Re: 2.4.11pre6aa1]

On Tue, Oct 09, 2001 at 08:55:16PM +0200, Andrea Arcangeli wrote:
> Allocation failures with highmem seem cured (at least under heavy
> emulation; I haven't tested real hardware yet). Robert, could you give it
> a spin and see if you can still reproduce the failures now?
>
> ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11pre6aa1.bz2

Only moved on top of 2.4.11 and picked up the latest subsystem updates from
Jeff and Ingo:

Only in 2.4.11pre6aa1: 50_uml-patch-2.4.10-6.bz2
Only in 2.4.11aa1: 50_uml-patch-2.4.10-7.bz2

Picked last update from user-mode-linux.sourceforge.net .

Only in 2.4.11aa1: 60_tux-2.4.10-ac10-D2.bz2
Only in 2.4.11pre6aa1: 60_tux-2.4.10-ac4-A4.bz2

Picked last update from http://www.redhat.com/~mingo/ .

URL:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11aa1.bz2
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.11aa1/00_vm-1

(ftp.kernel.org will get it faster)

Andrea

2001-10-11 10:32:43

by Andrea Arcangeli

Subject: 2.4.12aa1 [was Re: 2.4.11aa1 [was Re: 2.4.11pre6aa1]]

This update has further VM work (actually fixes, compared to 2.4.11aa1);
I also changed my mind about a few bits, and I suggest testing it since
this one seems to run very well for me.

Lorenzo and Jeffrey, I'd be interested if you could check with your
tests how it behaves compared to 2.4.12 vanilla.

Thanks!!

URL:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.12aa1.bz2
ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.12aa1/00_vm-2

(as usual ftp.kernel.org gets it faster)

Only in 2.4.12aa1: 00_cache-without-buffers-1

Don't account the buffer cache as pagecache. The logic is racy
(theoretically the "cache" value could also become negative) but that's
not a problem. Patch from Chris Mason.
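
For illustration only, the idea boils down to subtracting the
buffer-cache pages from the page-cache pages when reporting "cache".
The counter names below (page_cache_size, buffermem_pages) are the 2.4
globals I'd expect such a change to use; the real patch from Chris
Mason may compute this differently:

/* Hedged sketch: report the page cache minus the buffer cache. */
#include <asm/atomic.h>

extern atomic_t page_cache_size;	/* all pages in the page cache */
extern atomic_t buffermem_pages;	/* pages backing the buffer cache */

static long cache_without_buffers(void)
{
	long cache = atomic_read(&page_cache_size) -
		     atomic_read(&buffermem_pages);

	/*
	 * The two reads are not atomic with respect to each other, so
	 * "cache" can transiently go negative; that is the harmless
	 * race mentioned above, so just clamp it for reporting.
	 */
	return cache > 0 ? cache : 0;
}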

Only in 2.4.11aa1: 00_copy-user-lat-5
Only in 2.4.12aa1: 00_lowlatency-fixes-1

Replaced the low-latency code in copy-user with explicit preemption
points. This is basically Andrew's patch posted to l-k yesterday, except
that I always use conditional_schedule(), which uses unlikely(). It isn't
worthwhile to make the __set_current_state(TASK_RUNNING) conditional,
since it's in the slow path. I also hooked into the code that waits for
buffers to unlock, and I schedule after the pagecache is released, so
it's not a contention point.

A note on #include rules: conditional_schedule() is defined by linux/sched.h
and likely/unlikely are defined by linux/kernel.h, so the latter are
generally available in any kernel code without needing to
#include <linux/compiler.h> (see the 10_compiler.h-1 patch).
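
For readers unfamiliar with the idiom, an explicit preemption point of
this kind is roughly the following (a sketch assuming the 2.4
need_resched flag; the exact conditional_schedule() in this tree may
differ):

#include <linux/kernel.h>	/* likely()/unlikely() in this tree */
#include <linux/sched.h>

/* Hypothetical sketch of an explicit preemption point. */
static inline void conditional_schedule(void)
{
	if (unlikely(current->need_resched)) {
		/*
		 * Slow path: unconditionally mark ourselves runnable
		 * before calling schedule(); making this conditional
		 * isn't worthwhile here.
		 */
		__set_current_state(TASK_RUNNING);
		schedule();
	}
}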

Only in 2.4.11aa1: 00_o_direct-1
Only in 2.4.12aa1: 00_o_direct-2

Never use i_sb->s_blocksize* and friends in the pagecache layer; it is
the wrong thing for the blkdevs. Use inode->i_blkbits instead, which
is set up ad hoc by the blkdev open code.
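
Concretely, this means deriving the block geometry from the inode
itself, roughly as in the hypothetical helpers below (the real
00_o_direct-2 change may structure this differently):

#include <linux/fs.h>
#include <linux/pagemap.h>	/* PAGE_CACHE_SHIFT */

/*
 * Hypothetical helpers: per-inode block geometry from i_blkbits
 * rather than inode->i_sb->s_blocksize*.
 */
static inline unsigned int inode_blocksize(struct inode *inode)
{
	return 1U << inode->i_blkbits;
}

static inline unsigned long blocks_per_page(struct inode *inode)
{
	return 1UL << (PAGE_CACHE_SHIFT - inode->i_blkbits);
}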

Only in 2.4.12aa1: 00_parport-fix-1

Parport compile fix from Tim.

Only in 2.4.11aa1: 00_vm-1
Only in 2.4.12aa1: 00_vm-2
Only in 2.4.11aa1: 10_debug-gfp-1

Further vm changes. First of all, this fixes the reclaiming of mapped
pagecache/swapcache. vm-1 was not allowing the mapped cache to be
released correctly (it would have been very bad if, for example, all of
the normal or dma zone were mapped). Also added some write throttling at
the page layer to avoid unnecessarily hammering the clean cache.
This is still a bit experimental of course, but it is doing very well
here so far. As usual I'd really appreciate it if people could test and
give feedback. Thanks!

Only in 2.4.12aa1: 10_lvm-snapshot-hardsectsize-1

Use the hard blocksize as the snapshot COW's rawio blocksize. The
"soft blocksize" is meaningless for physical volumes, and it
was breaking compatibility with older lvmtools that set
different alignments. Based on a patch from Chris Mason.

Only in 2.4.11aa1: 50_uml-patch-2.4.10-7.bz2
Only in 2.4.12aa1: 50_uml-patch-2.4.11-1.bz2
Only in 2.4.11aa1: 52_uml-export-objs-1

Picked last update at user-mode-linux.sourceforge.net from Jeff.

Only in 2.4.11aa1: 60_tux-2.4.10-ac10-D2.bz2
Only in 2.4.12aa1: 60_tux-2.4.10-ac10-E6.bz2

Picked up Ingo's latest tux release at http://www.redhat.com/~mingo/ .

Andrea

2001-10-11 19:57:40

by Lorenzo Allegrucci

Subject: Re: 2.4.12aa1 [was Re: 2.4.11aa1 [was Re: 2.4.11pre6aa1]]

At 12.32 11/10/01 +0200, Andrea Arcangeli wrote:
>This update has further VM work (actually fixes, compared to 2.4.11aa1);
>I also changed my mind about a few bits, and I suggest testing it since
>this one seems to run very well for me.
>
>Lorenzo and Jeffrey, I'd be interested if you could check with your
>tests how it behaves compared to 2.4.12 vanilla.

Linux-2.4.11:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.020u 1.650s 2:20.74 51.6% 0+0k 0+0io 10652pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.070u 1.650s 2:21.51 51.3% 0+0k 0+0io 10499pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
70.790u 1.670s 2:21.01 51.3% 0+0k 0+0io 10641pf+0w

lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.500u 1.800s 1:34.06 69.4% 0+0k 0+0io 8836pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.020u 1.470s 1:20.22 80.3% 0+0k 0+0io 6394pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.130u 1.520s 1:12.21 89.5% 0+0k 0+0io 5676pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
62.820u 1.560s 1:12.61 88.6% 0+0k 0+0io 5433pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.070u 1.560s 1:14.83 86.3% 0+0k 0+0io 5811pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.160u 1.650s 1:17.84 83.2% 0+0k 0+0io 6036pf+0w

lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
71.290u 2.010s 1:50.11 66.5% 0+0k 0+0io 10462pf+0w
lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
71.490u 2.220s 1:49.62 67.2% 0+0k 0+0io 11413pf+0w
lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
71.280u 2.360s 1:54.79 64.1% 0+0k 0+0io 11110pf+0w



Linux-2.4.12-aa1:

lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
71.420u 2.110s 2:49.01 43.5% 0+0k 0+0io 16110pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
70.960u 1.850s 2:45.05 44.1% 0+0k 0+0io 15463pf+0w
lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
seed = 140175100
70.760u 1.980s 2:45.61 43.9% 0+0k 0+0io 15595pf+0w

lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
64.020u 1.940s 1:38.76 66.7% 0+0k 0+0io 10206pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
64.190u 1.410s 1:16.98 85.2% 0+0k 0+0io 6796pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.530u 1.520s 1:13.51 88.4% 0+0k 0+0io 5274pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.980u 1.370s 1:16.53 85.3% 0+0k 0+0io 6456pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.640u 1.680s 1:16.38 85.5% 0+0k 0+0io 6189pf+0w
lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
63.720u 1.500s 1:15.33 86.5% 0+0k 0+0io 5777pf+0w

lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
72.810u 2.010s 1:58.16 63.3% 0+0k 0+0io 14220pf+0w
lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
71.700u 2.290s 1:58.68 62.3% 0+0k 0+0io 13803pf+0w
lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
72.440u 2.220s 1:56.13 64.2% 0+0k 0+0io 12911pf+0w




--
Lorenzo

2001-10-12 05:07:15

by Andrea Arcangeli

Subject: Re: 2.4.12aa1 [was Re: 2.4.11aa1 [was Re: 2.4.11pre6aa1]]

On Thu, Oct 11, 2001 at 09:59:17PM +0200, Lorenzo Allegrucci wrote:
> Linux-2.4.11:
>
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 71.020u 1.650s 2:20.74 51.6% 0+0k 0+0io 10652pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 71.070u 1.650s 2:21.51 51.3% 0+0k 0+0io 10499pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 70.790u 1.670s 2:21.01 51.3% 0+0k 0+0io 10641pf+0w
>
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.500u 1.800s 1:34.06 69.4% 0+0k 0+0io 8836pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.020u 1.470s 1:20.22 80.3% 0+0k 0+0io 6394pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.130u 1.520s 1:12.21 89.5% 0+0k 0+0io 5676pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 62.820u 1.560s 1:12.61 88.6% 0+0k 0+0io 5433pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.070u 1.560s 1:14.83 86.3% 0+0k 0+0io 5811pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.160u 1.650s 1:17.84 83.2% 0+0k 0+0io 6036pf+0w
>
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 71.290u 2.010s 1:50.11 66.5% 0+0k 0+0io 10462pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 71.490u 2.220s 1:49.62 67.2% 0+0k 0+0io 11413pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 71.280u 2.360s 1:54.79 64.1% 0+0k 0+0io 11110pf+0w
>
>
>
> Linux-2.4.12-aa1:
>
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 71.420u 2.110s 2:49.01 43.5% 0+0k 0+0io 16110pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 70.960u 1.850s 2:45.05 44.1% 0+0k 0+0io 15463pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 90000000 -p 1 -s 140175100
> seed = 140175100
> 70.760u 1.980s 2:45.61 43.9% 0+0k 0+0io 15595pf+0w
>
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 64.020u 1.940s 1:38.76 66.7% 0+0k 0+0io 10206pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 64.190u 1.410s 1:16.98 85.2% 0+0k 0+0io 6796pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.530u 1.520s 1:13.51 88.4% 0+0k 0+0io 5274pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.980u 1.370s 1:16.53 85.3% 0+0k 0+0io 6456pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.640u 1.680s 1:16.38 85.5% 0+0k 0+0io 6189pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 9000000 -p 10 -s 140175100
> 63.720u 1.500s 1:15.33 86.5% 0+0k 0+0io 5777pf+0w
>
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 72.810u 2.010s 1:58.16 63.3% 0+0k 0+0io 14220pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 71.700u 2.290s 1:58.68 62.3% 0+0k 0+0io 13803pf+0w
> lenstra:~/src/qsort> time ./qsbench -n 45000000 -p 2 -s 140175100
> 72.440u 2.220s 1:56.13 64.2% 0+0k 0+0io 12911pf+0w

Ok, it probably swaps out a little more slowly during the heaviest run;
that's pretty much intentional with the page-layer write throttling. The
small slowdown in swapout performance in the heaviest swapout
testcase should provide much smoother responsiveness in the system
during the load, which is more important than pure swap bandwidth (a
system that feels unresponsive during a swap storm is much worse than the
pure speed of the allocations). In my interactive tests it was taking
more than 2 minutes to start up netscape during very heavy swapout
storms. With this new code it took a bit more than 1 minute, showing a
noticeable improvement in the understanding of the working set, and the
swap bandwidth decrease doesn't seem significant given that the cpu load
decreased by 8% (so, again, in-core applications could make more
progress).

Thanks for the feedback, Lorenzo!

Andrea