2002-11-27 09:04:31

by Andrew Morton

Subject: 2.5.49-mm2


url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.49/2.5.49-mm2/

Lots of little things.

. Various micro-speedups from the AIM9 testing.

. VM changes to reduce the amount of (pointless) work which is done
against memory-backed filesystems, leading up to the removal of
fail_writepage(). (Hugh, please take a look...)

. Various fixes to the AIO-for-direct-IO code.

. An updated rbtree IO scheduler from Jens.

. Some code from Ingo Oeser to start using the expanded and cleaned up
user pagetable walker code. This affects the st and sg drivers; I'm
not sure of the testing status of this?


Changes since 2.5.49-mm1:

+linus.patch

Latest from Linus

+oprofile-fix.patch

oprofile compilation fix

-kgdb-ga.patch
-kgdb-nmi-signal.patch
-kgdb-nr-cpus.patch
-kgdb-use-stabs.patch

I was getting deadlocks (of the NMI watchdog variety) on scheduler
locks. Go back to the old stub for now.

-plugbug.patch
-writeback-reduced-context-switches.patch
-scheduling-points.patch
-swap-accounting.patch
-swapoff-cleanup.patch
-page-reclaim-scheduling-points.patch
-sync_blockdev-lock-kernel.patch
-incremental-slab-shrink.patch

Merged

+kgdb.patch

The old stub

+aio-dio-really-submit.patch

AIO/direct-IO fixes

+ipc_barriers.patch

Some IPC memory barrier fixes

-reiserfs-readpages-fix.patch

Merged into reiserfs-readpages.patch

-less-requests.patch

Jens made this change to the updated rbtree-iosched patch

+pf_memdie.patch

Fix the PF_MEMDIE logic

+truncate-speedup.patch

Special-case the truncation of zero-length files. Saves some CPU.

+spill-lru-lists.patch

Untangle interactions between the deferred lru addition queue and the
per-cpu page allocator queue.

+readdir-speedup.patch

Make readdir faster

-genksyms-fix.patch

Dropped. It was modules stuff, and is probably now irrelevant.

+page-walk-api-improvements.patch

More get_user_pages work from Ingo (Oeser)

+page-walk-scsi.patch

Start to use Ingo's new APIs in the scsi code. Basically, remove
driver-private implementations in favour of new core APIs

+bcrl-printk.patch

Ben's patch to create /dev/kmsg. You can write to it from initscripts
to inject text into the printk buffer.

+read_zero-speedup.patch

Speed up read_zero() for !CONFIG_MMU

+nommu-rmap-locking.patch

Fix an rmap deadlock for !CONFIG_SWAP

+semtimedop.patch

semtimedop() implementation

+writeback-handle-memory-backed.patch

Don't try to write out memory-backed filesystems at all

+remove-fail_writepage.patch

fail_writepage() is no longer needed.

+ptrace-flush.patch

Fix some cache coherency things in the ptrace code (this patch
isn't right, but I'm keeping it around so the right fix gets
done one day)

+pentium-II.patch

Optimisations for Pentium-II config
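
For the /dev/kmsg entry above (bcrl-printk.patch), here is a minimal C sketch of how a boot-time helper might inject a line into the printk buffer. It assumes only what the description says, namely that text written to the device is appended to the log. The helper name log_to_kmsg() and its path parameter are hypothetical and not part of the patch. Taking the path as a parameter means the same code can be tried against /dev/null or an ordinary file on a kernel without the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one message to the file at 'path'.
 * With bcrl-printk.patch applied, 'path' would be "/dev/kmsg".
 * Returns 0 on success, -1 on failure.
 */
int log_to_kmsg(const char *path, const char *msg)
{
	size_t len;
	ssize_t written;
	int fd;

	assert(path != NULL && msg != NULL);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	len = strlen(msg);
	/* A single write() per message keeps the text in one piece. */
	written = write(fd, msg, len);
	close(fd);

	return (written == (ssize_t)len) ? 0 : -1;
}
```

An initscript helper would call something like log_to_kmsg("/dev/kmsg", "rc.sysinit: mounting local filesystems\n").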




All 54 patches:

linus.patch
cset-1.842.2.15-to-1.893.txt.gz

oprofile-fix.patch

epoll-bits-0.57.patch
epoll bits 0.57 ( on top of 2.5.49 ) ...

kgdb.patch

simplified-vm-throttling.patch
Remove the final per-page throttling site in the VM

page-reclaim-motion.patch
Move reclaimable pages to the tail of the inactive list on IO completion

handle-fail-writepage.patch
Special-case fail_writepage() in page reclaim

activate-unreleaseable-pages.patch
Move unreleasable pages onto the active list

aio-direct-io-infrastructure.patch
AIO support for raw/O_DIRECT

deferred-bio-dirtying.patch
bio dirtying infrastructure

aio-direct-io.patch
AIO support for raw/O_DIRECT

aio-dio-really-submit.patch
Fix up aio-for-dio

aio-dio-deferred-dirtying.patch
Use the deferred-page-dirtying code in the AIO-DIO code.

aio-dio-debug.patch

dio-counting.patch

dio-reduce-context-switch-rate.patch
Reduced wakeup rate in direct-io code

ipc_barriers.patch
memory barrier work in ipc/util.c

signal-speedup.patch
speed up signals

reiserfs-readpages.patch
reiserfs v3 readpages support

reduce-random-context-switch-rate.patch
Reduce context switch rate due to the random driver

pf_memdie.patch
Subject: Re: [patch] 2.5: kill PF_MEMDIE

truncate-speedup.patch

spill-lru-lists.patch
Fix interaction between batched lru addition and hot/cold pages

readdir-speedup.patch
readdir speedup and fixes

page-walk-api.patch

page-walk-api-improvements.patch

page-walk-scsi.patch

poll-1-wqalloc.patch
poll 1/6: reduced memory requirements

poll-2-selectalloc.patch
poll 2/6: put small bitmaps into a local

poll-3-alloc.patch
poll 3/6: improved pollfd memory allocation

poll-4-fast-select.patch
poll 4/6: select() speedups

poll-5-fast-poll.patch
poll 5/6: poll() speedup

poll-6-merge.patch
poll 6/6: merge poll() and select() common code

bcrl-printk.patch

read_zero-speedup.patch
speed up read_zero() for !CONFIG_MMU

nommu-rmap-locking.patch
Fix rmap locking for CONFIG_SWAP=n

semtimedop.patch
semtimedop - semop() with a timeout

writeback-handle-memory-backed.patch
skip memory-backed filesystems in writeback

remove-fail_writepage.patch
Remove fail_writepage()

page-reservation.patch
Page reservation API

wli-show_free_areas.patch
show_free_areas extensions

inlines-net.patch

rbtree-iosched.patch
rbtree-based IO scheduler

ptrace-flush.patch
Subject: [PATCH] ptrace on 2.5.44

buffer-debug.patch
buffer.c debugging

warn-null-wakeup.patch

pentium-II.patch
Pentium-II support bits

radix-tree-overflow-fix.patch
handle overflows in radix_tree_gang_lookup()

rcu-stats.patch
RCU statistics reporting

auto-unplug.patch
self-unplugging request queues

less-unplugging.patch
Remove most of the blk_run_queues() calls

dcache_rcu-2-2.5.48.patch

dcache_rcu-3-2.5.48.patch

shpte-ng.patch
pagetable sharing for ia32
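
For the semtimedop.patch entry above (semop() with a timeout), here is a minimal userspace sketch using the call as current glibc exposes it on Linux. The wrapper name sem_down_timeout() is hypothetical, and the sketch assumes an expired timeout is reported as EAGAIN, which is how the Linux manual page documents it; the interface in the 2002 patch may have differed in detail.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

/*
 * Hypothetical wrapper: try to decrement semaphore 0 of 'semid' by one,
 * giving up after 'ms' milliseconds.
 * Returns 0 if the semaphore was taken, 1 on timeout, -1 on any other error.
 */
int sem_down_timeout(int semid, long ms)
{
	struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
	struct timespec ts;

	assert(ms >= 0);
	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;

	if (semtimedop(semid, &op, 1, &ts) == 0)
		return 0;

	/* An expired timeout shows up as EAGAIN rather than blocking forever. */
	return (errno == EAGAIN) ? 1 : -1;
}
```

A plain semop() with the same sembuf would block indefinitely while the semaphore value is zero; the timeout is the only difference.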


2002-11-27 11:31:38

by Rik van Riel

Subject: Re: 2.5.49-mm2

On Wed, 27 Nov 2002, Andrew Morton wrote:

> +pf_memdie.patch
>
> Fix the PF_MEMDIE logic

The first part of the patch looks suspicious. If PF_MEMALLOC
is set we shouldn't be allowed to go into try_to_free_pages()
in the first place, should we ?

> +writeback-handle-memory-backed.patch
>
> Don't try to write out memory-backed filesystems at all

Neat. Exactly the thing I was looking for for an O(1) VM
optimisation, good to know it's possible in 2.5 ;)

> simplified-vm-throttling.patch
> Remove the final per-page throttling site in the VM
>
> page-reclaim-motion.patch
> Move reclaimable pages to the tail of the inactive list on IO completion

Very nice, though if you're worried about effective reclaiming
you might be interested in Arjan's O(1) VM code, which I'll
probably forward-port to 2.5 once I've got it properly tuned.

> activate-unreleaseable-pages.patch
> Move unreleasable pages onto the active list

Interesting, does this make much difference ?

cheers,

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://guru.conectiva.com/
Current spamtrap: [email protected]

2002-11-27 14:39:25

by Con Kolivas

Subject: Re: 2.5.49-mm2



Compile problem:

drivers/pci/quirks.c: In function `quirk_ioapic_rmw':
drivers/pci/quirks.c:354: `sis_apic_bug' undeclared (first use in this function)

Con

2002-11-27 16:04:03

by Ingo Oeser

Subject: [PATCH] page walker bugfix (was: 2.5.49-mm2)

Hi Andrew,
hi list readers,

On Wed, Nov 27, 2002 at 01:11:38AM -0800, Andrew Morton wrote:
> . Some code from Ingo Oeser to start using the expanded and cleaned up
> user pagetable walker code. This affects the st and sg drivers; I'm
> not sure of the testing status of this?

The testing status is: None, but it compiles.

The sg-driver maintainer has already said he does some testing
and the author of the previous code in st.c was positive about
using these features. That's why I've chosen these as my "victims".

I also found a locking bug in walk_user_pages() in case of OOM or
SIGBUS. Fixed by the attached patch.

The attached patch also exports some functions to enable modular
builds for testing.

And it also deletes the redundant check of the pfn_valid(), which
is done in follow_page() already.

Regards

Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth


Attachments:
page-walk-api-2.5.49-mm2-bugfix.patch (1.80 kB)

2002-11-27 18:21:05

by Andrew Morton

Subject: Re: 2.5.49-mm2

Rik van Riel wrote:
>
> On Wed, 27 Nov 2002, Andrew Morton wrote:
>
> > +pf_memdie.patch
> >
> > Fix the PF_MEMDIE logic
>
> The first part of the patch looks suspicious. If PF_MEMALLOC
> is set we shouldn't be allowed to go into try_to_free_pages()
> in the first place, should we ?

Long story. Someone sent out a 2.4 patch quite a long time ago to
preserve PF_MEMALLOC in there because they were running userspace
processes as PF_MEMALLOC. These were realtime "userspace device drivers"
which actually provided block driver services to the kernel.

When you think about it, that's not totally dumb, and all the recursion
protection etc works OK. Supporting it is just a two-liner, so...

hm. OK, let's forget that idea ;)

> ...
> > page-reclaim-motion.patch
> > Move reclaimable pages to the tail of the inactive list on IO completion
>
> Very nice, though if you're worried about effective reclaiming
> you might be interested in Arjan's O(1) VM code, which I'll
> probably forward-port to 2.5 once I've got it properly tuned.

2.5 tends to refile pages more than 2.4, in preference to blocking
on them (the latency thing). Of course this blows CPU and perverts
page aging (not that the LRU lists add much value in page aging under
heavy loads anyway...)

Under stupid qsbench loads this patch took the reclaimed/scanned ratio
from ~15% to ~25% and reduced runtime from 7min 45sec to 6min 42sec.
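
To put the qsbench figures above in proportion, the arithmetic can be sketched as follows. The function names are made up for illustration; the inputs are the numbers quoted in this mail.

```c
#include <assert.h>

/* Percentage reduction going from 'before' to 'after' (before must be > 0). */
double percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}

/* Pages scanned per page reclaimed, given a reclaimed/scanned ratio in percent. */
double scanned_per_reclaimed(double ratio_percent)
{
	assert(ratio_percent > 0.0);
	return 100.0 / ratio_percent;
}
```

With 7min 45sec = 465 s and 6min 42sec = 402 s, percent_reduction(465, 402) comes to about 13.5, or 63 seconds saved. A reclaimed/scanned ratio of ~15% means roughly 6.7 pages scanned per page reclaimed; at ~25% that drops to 4.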

Yup, splitting the lists up would make sense. Of course, the interrupt-time
motion is "ideal" in that the right pages are placed in the right place
at the right time - we never have to scan past pages which are still
under IO due to elevator reordering, device speed differences, etc...

> > activate-unreleaseable-pages.patch
> > Move unreleasable pages onto the active list
>
> Interesting, does this make much difference ?

My notes are not clear :( No, I wouldn't expect it to make a lot
of difference. I was seeing quite a lot of normal zone pages which
were pinned by buffers getting churned around on the inactive list.
Things like ext2 group descriptor blocks, etc.

There shouldn't normally be many of these, but there may be some
scenarios in which there are a lot of these, and the inactive list
gets really small due to large amounts of pinned memory.

2002-11-27 19:28:55

by Andrew Morton

Subject: Re: [PATCH] page walker bugfix (was: 2.5.49-mm2)

Ingo Oeser wrote:
>
> Hi Andrew,
> hi list readers,
>
> On Wed, Nov 27, 2002 at 01:11:38AM -0800, Andrew Morton wrote:
> > . Some code from Ingo Oeser to start using the expanded and cleaned up
> > user pagetable walker code. This affects the st and sg drivers; I'm
> > not sure of the testing status of this?
>
> The testing status is: None, but it compiles.
>
> The sg-driver maintainer has already said he does some testing
> and the author of the previous code in st.c was positive about
> using these features. That's why I've chosen these as my "victims".

Yes, Doug Gilbert will help us out here.

> I also found a locking bug in walk_user_pages() in case of OOM or
> SIGBUS. Fixed by the attached patch.
>

Thanks.

We'll need to be concentrating on the shared pagetable code for
a while, and your patch overlaps with that. So I've swapped the
applying order (you come second) and I'll probably break your
stuff out separately for a while so Dave can generate clean patches.

When mm3 emerges could you please check mm/mmap.c around here:

	vma = NULL; /* needed for out-label */

I may have misplaced that one...

2002-11-27 19:54:42

by Rasmus Andersen

Subject: Re: 2.5.49-mm2

On Wed, Nov 27, 2002 at 01:11:38AM -0800, Andrew Morton wrote:
>
> url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.49/2.5.49-mm2/

I'm fairly sure this is not specific to -mm2 since it looks
a lot like my problem from plain 2.5.49
(http://marc.theaimsgroup.com/?l=linux-kernel&m=103805691602076&w=2)
but -mm2 gave me some usable debug output:

Debug: Sleeping function called from illegal context at include/linux/rwsem.h:66
Call Trace: __might_sleep+0x54/0x58
sys_mprotect+0x97/0x22b
syscall_call+0x7/0xb

Unable to handle kernel paging request at virtual address 4001360c

(I did not copy the rest but can reproduce at will.)

AFAICS, the fingered function is down_write at mprotect.c:262
but I fail to see why it is in an illegal context...

Hope this is useful for somebody. I am willing to test patches.

Regards,
Rasmus



2002-11-27 20:04:31

by Andrew Morton

Subject: Re: 2.5.49-mm2

Rasmus Andersen wrote:
>
> On Wed, Nov 27, 2002 at 01:11:38AM -0800, Andrew Morton wrote:
> >
> > url: http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.49/2.5.49-mm2/
>
> I'm fairly sure this is not specific to -mm2 since it looks
> a lot like my problem from plain 2.5.49
> (http://marc.theaimsgroup.com/?l=linux-kernel&m=103805691602076&w=2)
> but -mm2 gave me some usable debug output:
>
> Debug: Sleeping function called from illegal context at include/linux/rwsem.h:66
> Call Trace: __might_sleep+0x54/0x58
> sys_mprotect+0x97/0x22b
> syscall_call+0x7/0xb

Oh that's cute. Looks like we've accidentally disabled preemption
somewhere...

> Unable to handle kernel paging request at virtual address 4001360c

And once you do that, the pagefault handler won't handle pagefaults.

> (I did not copy the rest but can reproduce at will.)

Please do. And tell how you're making it happen.

Is that .config still current?

Does it go away if you turn off preemption?
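
The failure mode described above can be modelled in a few lines of userspace C. This is a toy, not kernel code: the toy_* names are invented, and it assumes the debug check amounts to "is the preemption counter non-zero?". It shows how one missing enable on an error path leaves every later sleeping call flagged as being in an illegal context.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-task preemption counter. */
int toy_preempt_count;

void toy_preempt_disable(void)
{
	toy_preempt_count++;
}

void toy_preempt_enable(void)
{
	assert(toy_preempt_count > 0);	/* an enable without a disable is a bug */
	toy_preempt_count--;
}

/* The question the debug check asks: would sleeping be illegal right now? */
bool toy_in_atomic(void)
{
	return toy_preempt_count != 0;
}

/*
 * Stand-in for a call that may block, such as taking a semaphore.
 * Returns false where the real check would print its warning.
 */
bool toy_down_write(void)
{
	return !toy_in_atomic();
}

/* Buggy path: the early return forgets to re-enable preemption. */
int toy_buggy_path(bool early_exit)
{
	toy_preempt_disable();
	if (early_exit)
		return -1;	/* BUG: counter stays elevated */
	toy_preempt_enable();
	return 0;
}
```

After toy_buggy_path(true) has run once, toy_down_write() fails on every later call even though the caller did nothing wrong, which is why the warning can point at an innocent function. In this model, a configuration that never touches the counter would not show the symptom.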

2002-11-27 20:17:29

by Rasmus Andersen

Subject: Re: 2.5.49-mm2

On Wed, Nov 27, 2002 at 12:11:40PM -0800, Andrew Morton wrote:
> > Debug: Sleeping function called from illegal context at include/linux/rwsem.h:66
> > Call Trace: __might_sleep+0x54/0x58
> > sys_mprotect+0x97/0x22b
> > syscall_call+0x7/0xb
>
> Oh that's cute. Looks like we've accidentally disabled preemption
> somewhere...
>
> > Unable to handle kernel paging request at virtual address 4001360c
>
> And once you do that, the pagefault handler won't handle pagefaults.
>
> > (I did not copy the rest but can reproduce at will.)
>
> Please do. And tell how you're making it happen.

I'm booting my debian testing system, going into kdm.
Various versions as per my last mail.

Does your 'Please do' mean that you would like the rest of
the oops?

> Is that .config still current?

The .config used for -mm2 is at http://www.jaquet.dk/kernel/config-2.5.49-mm2

> Does it go away if you turn off preemption?

I'll test that right away.

Regards,
Rasmus



2002-11-27 21:12:00

by Rasmus Andersen

Subject: Re: 2.5.49-mm2

On Wed, Nov 27, 2002 at 12:11:40PM -0800, Andrew Morton wrote:
> > (I did not copy the rest but can reproduce at will.)
>
> Please do. And tell how you're making it happen.

Hand copied. Sorry, but I am not able to get a ksymoops
running on my system to decode this so raw oops follows.
I have put the System.map at http://www.jaquet.dk/kernel/System.map-2.5.49-mm2.
I hope that helps some.


Printing eip:
4008c90
*pde = 06e73067
*pte = 071c8065
Oops: 0007
CPU: 0
EIP: 0023:[<40008c90>] Not tainted
EFLAGS: 00010202
EIP is at 0x40008c90
eax: 0000003d ebx: 4001274c ecx: 401c2600 edx: 400134f0
ds: 002b es: 002b ss: 002b

Process ntpd (pid: 220, threadinfo=c6e64000 task=c7a9a0c0)
<0> Kernel panic: Aiee, killing interrupt handler!
In interrupt handler - not syncing


> Does it go away if you turn off preemption?

It does.

Regards,
Rasmus

