2003-02-27 10:48:09

by Andrew Morton

Subject: 2.5.63-mm1


ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm1/

. Tons of changes to the anticipatory scheduler. It may not be working
very well at present. Please use "elevator=deadline" if it causes
problems.

. Updated smalldevfs patch.

. A fix for the VMA-based reverse mapping patch.

. Added Ingo's latest CPU scheduler update.

. Lots of random fixes.



linus.patch

Latest from Linus

-initial-jiffies.patch
-user-times-jiffies-wrap-fix.patch
-put_page-speedup.patch
-slab-batchcount-limit-fix.patch
-crc32-speedup-2.patch
-flush-tlb-all-2.patch
-linux-2.5.62-early_ioremap_A0.patch
-linux-2.5.62-x440disco_A0.patch
-use-find_get_page.patch
-irda-interruptible-sleep.patch
-dget-BUG.patch
-disk-accounting-fix.patch
-hugh-inode-pruning-race-fix.patch
-kill-bogus-wakeup-messge.patch
-dont-sync-with-stopped-pdflush.patch
-irq-balance-disable-fix.patch
-oom-killer-dont-spin-on-same-task.patch
-add-missing-global_flush_tlb-calls.patch
-ext3-O_SYNC-speedup.patch
-remove-MAX_BLKDEV-from-genhd.patch

Merged

+separate.patch

My contribution to the spelling bee.

+rpc_rmdir-fix.patch

Fix the NFS oops

+ppc64-scruffiness.patch

Fix some warnings

-reiserfs_file_write-4.patch
+reiserfs_file_write-5.patch

Updated (I don't think it changed)

+limit-write-latency.patch

Fix potential source of write-vs-write latency in VFS

+lockd-lockup-fix-2.patch

Updated patch from Neil for an NFS server deadlock

+loop-hack.patch

Fix an OOM and oops in loop

+flock-fix.patch

File locking fix from Matthew

+sysfs-dget-fix-2.patch

Fix a sysfs dentry race (this isn't right)

+irq-sharing-fix.patch

Fix SA_INTERRUPT for shared interrupts

+anticipation_is_killing_me.patch
+as-fix-hughs-problem.patch
+as-cleanup.patch
+as-start-stop-anticipation-helpers.patch
+as-cleanup-2.patch
+as-cleanup-3.patch
+as-cleanup-3-write-latency-fix.patch
+as-handle-exitted-tasks.patch
+as-handle-exitted-tasks-fix.patch
+as-no-plugging-and-cleanups.patch
+as-remove-debug.patch
+as-track-queued-reads.patch
+as-accounting-fix.patch
+as-nr_reads-fix.patch
+as-tuning.patch
+as-disable-nr_reads.patch

Anticipatory scheduler work

smalldevfs.patch

Updated

-smalldevfs-dcache_rcu-fix.patch

Folded into smalldevfs.patch

+objrmap-X-fix.patch

Fix VMA-based reverse mapping

+per-cpu-disk-stats.patch

Use per-cpu data for disk accounting

+presto_get_sb-fix.patch

Fix an intermezzo oops

+on_each_cpu.patch
+on_each_cpu-ldt-cleanup.patch

preempt-safety for smp_call_function()

+notsc-panic.patch

x86 TSC cleanup

+alloc_pages_cleanup.patch

Code consolidation

+ext2-handle-htree-flag.patch

ext2 htree back-compatibility

+sched-a3.patch

CPU scheduler update

+mpparse-typo-fix.patch

Fix a printk bug

+i386-no-swap-fix.patch

Fix ia32 CONFIG_SWAP=n

+remove-hugetlb_key.patch
+hugetlbpage-doc-update.patch
+hugetlb-valid-page-ranges.patch

Hugetlbpage work




All 88 patches:

linus.patch
Latest from Linus

separate.patch

mm.patch
add -mmN to EXTRAVERSION

rpc_rmdir-fix.patch
Fix nfs oops during mount

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
Subject: pci patch

ppc64-e100-fix.patch
fix e100 for big-endian machines

ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio

ppc64-64-bit-exec-fix.patch
Subject: 64bit exec

ppc64-scruffiness.patch
Fix some PPC64 compile warnings

sym-do-160.patch
make the SYM driver do 160 MB/sec

kgdb.patch

nfsd-disable-softirq.patch
Fix race in svcsock.c in 2.5.61

report-lost-ticks.patch
make lost-tick detection more informative

devfs-fix.patch

ptrace-flush.patch
cache flushing in the ptrace code

buffer-debug.patch
buffer.c debugging

warn-null-wakeup.patch

ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages

deadline-dispatching-fix.patch
deadline IO scheduler dispatching fix

nfs-unstable-pages.patch
"unstable" page accounting for NFS.

limit-write-latency.patch

reiserfs_file_write-5.patch

tcp-wakeups.patch
Use fast wakeups in TCP/IPV4

lockd-lockup-fix-2.patch
Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)

rcu-stats.patch
RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
nfs oom fix

sk-allocation.patch
Subject: Re: nfs oom

nfs-more-oom-fix.patch

nfs-sendfile.patch
Implement sendfile() for NFS

rpciod-atomic-allocations.patch
Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

remove-unused-congestion-stuff.patch
Subject: [PATCH] remove unused congestion stuff

aic-makefile-fix.patch
aicasm Makefile fix

loop-hack.patch
loop: Fix OOM and oops

atm_dev_sem.patch
convert atm_dev_lock from spinlock to semaphore

flock-fix.patch
flock fixes for 2.5.62

sysfs-dget-fix-2.patch

irq-sharing-fix.patch
fix irq sharing and SA_INTERRUPT on x86

as-iosched.patch
anticipatory I/O scheduler

as-comments-and-tweaks.patch
antsched: commentary and

as-hz-1000-fix.patch
Fix anticipatory scheduler for HZ=100

as-tidy-up-rename.patch
tidy up AS rename

anticipation_is_killing_me.patch

as-update-1.patch
AS update

as-break-anticipation-on-write.patch
AS break on write

as-break-if-readahead.patch
detect overlapping reads and writes

as-fix-hughs-problem.patch
Add a pointer to the queue into struct as_data

as-cleanup.patch
anticipatory scheduler cleanups

as-start-stop-anticipation-helpers.patch
AS: add anticipation stop/start helper functions

as-cleanup-2.patch
Subject: [PATCH] some cleanups 2

as-cleanup-3.patch
AS: more cleanups

as-cleanup-3-write-latency-fix.patch
Fix as-cleanup-3

as-handle-exitted-tasks.patch

as-handle-exitted-tasks-fix.patch
fix for as IO contexts

as-no-plugging-and-cleanups.patch
AS no plugging + cleanups

as-remove-debug.patch

as-track-queued-reads.patch
AS: track queued reads

as-accounting-fix.patch
AS: track queued reads (fix)

as-nr_reads-fix.patch
AS: read accounting fix

as-tuning.patch
AS: tuning

as-disable-nr_reads.patch
AS: disable per-process in-flight read logic

readahead-shrink-to-zero.patch
Allow VFS readahead to fall to zero

cfq-2.patch
CFQ scheduler, #2

smalldevfs.patch
smalldevfs

objrmap-2.5.62-5.patch
object-based rmap

objrmap-X-fix.patch
objrmap fix for X

oprofile-up-fix.patch
fix oprofile on UP (lockless sync)

update_atime-speedup.patch
speed up update_atime()

ext2-update_atime_speedup.patch
Use one_sec_update_atime in ext2

ext3-update_atime_speedup.patch
Use one_sec_update_atime in ext3

UPDATE_ATIME-to-update_atime.patch
Rename UPDATE_ATIME to update_atime

per-cpu-disk-stats.patch
Make diskstats per-cpu using kmalloc_percpu

presto_get_sb-fix.patch
fix presto_get_sb() return value and oops.

on_each_cpu.patch
fix preempt-issues with smp_call_function()

on_each_cpu-ldt-cleanup.patch

notsc-panic.patch
Don't panic if TSC is enabled and notsc is used

alloc_pages_cleanup.patch
clean up redundant code for alloc_pages

ext2-handle-htree-flag.patch
ext2: clear ext3 htree flag on directories

sched-a3.patch
"HT scheduler", sched-2.5.63-A3

mpparse-typo-fix.patch
fix typo in arch/i386/kernel/mpparse.c in printk

i386-no-swap-fix.patch
allow CONFIG_SWAP=n for i386

remove-hugetlb_key.patch
remove dead hugetlb_key forward decl

hugetlbpage-doc-update.patch
hugetlbpage documentation update

hugetlb-valid-page-ranges.patch
hugetlb: fix MAP_FIXED handling




2003-02-27 21:12:01

by Con Kolivas

Subject: Rising io_load results Re: 2.5.63-mm1


I mentioned this previously; it's still happening.

This started some time around 2.5.62-mm3 with the io_load results on contest
benchmarking (http://contest.kolivas.org) rising with each run. It still
occurs with 2.5.63-mm1 regardless of which elevator is specified. These are the
io_load compile times (in seconds) for 6 consecutive runs:

111
147
221
284
334
358

/proc/meminfo after 6 runs and mem flushing:

MemTotal: 256156 kB
MemFree: 238708 kB
Buffers: 2320 kB
Cached: 1552 kB
SwapCached: 1780 kB
Active: 5876 kB
Inactive: 2120 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 238708 kB
SwapTotal: 4194272 kB
SwapFree: 4192416 kB
Dirty: 28 kB
Writeback: 0 kB
Mapped: 4294923652 kB
Slab: 4872 kB
Committed_AS: 7032 kB
PageTables: 200 kB
ReverseMaps: 631

I am refraining from publishing any benchmark results while this is happening.
It doesn't seem to occur on 2.5.63.

Con

2003-02-27 21:37:16

by Andrew Morton

Subject: Re: Rising io_load results Re: 2.5.63-mm1

Con Kolivas <[email protected]> wrote:
>
>
> This started some time around 2.5.62-mm3 with the io_load results on contest
> benchmarking (http://contest.kolivas.org) rising with each run.
> ...
> Mapped: 4294923652 kB

Well that's gotta hurt. This metric is used in making writeback decisions.
Probably the objrmap patch.

2003-02-27 21:51:05

by Dave McCracken

Subject: Re: Rising io_load results Re: 2.5.63-mm1


--On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
<[email protected]> wrote:

>> ...
>> Mapped: 4294923652 kB
>
> Well that's gotta hurt. This metric is used in making writeback
> decisions. Probably the objrmap patch.

Oops. You're right. Here's a patch to fix it.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059


Attachments:
(No filename) (517.00 B)
objmapped-2.5.63-1.diff (337.00 B)

2003-02-27 22:23:44

by Andrew Morton

Subject: Re: Rising io_load results Re: 2.5.63-mm1

Dave McCracken <[email protected]> wrote:
>
>
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> <[email protected]> wrote:
>
> >> ...
> >> Mapped: 4294923652 kB
> >
> > Well that's gotta hurt. This metric is used in making writeback
> > decisions. Probably the objrmap patch.
>
> Oops. You're right. Here's a patch to fix it.
>

Thanks.

I'm just looking at page_mapped(). It is now implicitly assuming that the
architecture's representation of a zero-count atomic_t is all-bits-zero.

This is not true on sparc32 if some other CPU is in the middle of an
atomic_foo() against that counter. Maybe the assumption is false on other
architectures too.

So page_mapped() really should be performing an atomic_read() if that is
appropriate to the particular page. I guess this involves testing
page->mapping. Which is stable only when the page is locked or
mapping->page_lock is held.

It appears that all page_mapped() callers are inside lock_page() at present,
so a quick audit and addition of a comment would be appropriate there please.


2003-02-27 23:47:21

by Con Kolivas

Subject: Re: Rising io_load results Re: 2.5.63-mm1

On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
>
> <[email protected]> wrote:
> >> ...
> >> Mapped: 4294923652 kB
> >
> > Well that's gotta hurt. This metric is used in making writeback
> > decisions. Probably the objrmap patch.
>
> Oops. You're right. Here's a patch to fix it.

Thanks.

This looks better after a run:

MemTotal: 256156 kB
MemFree: 189448 kB
Buffers: 46744 kB
Cached: 4176 kB
SwapCached: 0 kB
Active: 51840 kB
Inactive: 1768 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 189448 kB
SwapTotal: 4194272 kB
SwapFree: 4194272 kB
Dirty: 0 kB
Writeback: 0 kB
Mapped: 4546752 kB
Slab: 8468 kB
Committed_AS: 7032 kB
PageTables: 200 kB
ReverseMaps: 662

Con

2003-02-28 00:00:08

by Andrew Morton

Subject: Re: Rising io_load results Re: 2.5.63-mm1

Con Kolivas <[email protected]> wrote:
>
> On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> >
> > <[email protected]> wrote:
> > >> ...
> > >> Mapped: 4294923652 kB
> > >
> > > Well that's gotta hurt. This metric is used in making writeback
> > > decisions. Probably the objrmap patch.
> >
> > Oops. You're right. Here's a patch to fix it.
>
> Thanks.
>
> This looks better after a run:
>
> MemTotal: 256156 kB
> ...
> Mapped: 4546752 kB

No, it is still wrong. Mapped cannot exceed MemTotal.


2003-02-28 00:06:35

by Ed Tomlinson

Subject: Re: 2.5.63-mm1

On February 27, 2003 05:59 am, Andrew Morton wrote:
> . Tons of changes to the anticipatory scheduler. It may not be working
> very well at present. Please use "elevator=deadline" if it causes
> problems.

The anticipatory scheduler hangs here at the same place it did in 62-mm2;
cfq continues to work fine. A sysrq+T of the hang follows:

Hope this helps,
Ed Tomlinson

SysRq : Show State

free sibling
task PC stack pid father child younger older
swapper D DFF8FB20 11876 1 0 2 (L-TLB)
Call Trace:
[<c01143aa>] io_schedule+0xe/0x18
[<c012a105>] __lock_page+0x8d/0xac
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c012a58e>] do_generic_mapping_read+0x13a/0x340
[<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
[<c012a794>] file_read_actor+0x0/0x100
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c015400c>] dput+0x1c/0x1a0
[<c015400c>] dput+0x1c/0x1a0
[<c012ff37>] kmem_cache_alloc+0x23/0x60
[<c0140e57>] vfs_read+0xab/0x150
[<c01498c4>] kernel_read+0x3c/0x48
[<c0161f82>] load_elf_binary+0x2f2/0xbbc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c012f91c>] cache_init_objs+0x34/0x60
[<c012d2af>] buffered_rmqueue+0xfb/0x108
[<c012d33c>] __alloc_pages+0x80/0x264
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c0161c90>] load_elf_binary+0x0/0xbbc
[<c014a786>] do_execve+0x14a/0x1a8
[<c0107750>] sys_execve+0x2c/0x60
[<c0108c47>] syscall_call+0x7/0xb
[<c0105175>] init+0x109/0x174
[<c010506c>] init+0x0/0x174
[<c0107019>] kernel_thread_helper+0x5/0xc

ksoftirqd/0 S DFF8A000 4294963836 2 1 3 (L-TLB)
Call Trace:
[<c011a1fc>] ksoftirqd+0x24/0xa4
[<c011a23e>] ksoftirqd+0x66/0xa4
[<c011a1d8>] ksoftirqd+0x0/0xa4
[<c0107019>] kernel_thread_helper+0x5/0xc

events/0 D DFF89ED4 4294953708 3 1 12 4 2 (L-TLB)
Call Trace:
[<c0113985>] wait_for_completion+0x9d/0xe0
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0116363>] do_fork+0x113/0x14c
[<c010708e>] kernel_thread+0x6e/0x84
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c0107014>] kernel_thread_helper+0x0/0xc
[<c0122b80>] __call_usermodehelper+0x30/0x58
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c012304f>] worker_thread+0x1a3/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc

khubd D DFD61D94 4292690652 4 1 5 3 (L-TLB)
Call Trace:
[<c01136a0>] do_schedule+0x2a0/0x348
[<c0113985>] wait_for_completion+0x9d/0xe0
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0122cb2>] call_usermodehelper+0x10a/0x118
[<c01f44d8>] usb_hotplug+0x0/0x1c4
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c0122b50>] __call_usermodehelper+0x0/0x58
[<c01b5a42>] do_hotplug+0x1c2/0x1ec
[<c01b5a91>] dev_hotplug+0x25/0x30
[<c01f44d8>] usb_hotplug+0x0/0x1c4
[<c01b3d9a>] device_add+0x112/0x148
[<c01f4ef6>] usb_new_device+0x322/0x480
[<c0117086>] printk+0x122/0x148
[<c01f6a9f>] usb_hub_port_connect_change+0x233/0x2c4
[<c01f6c69>] usb_hub_events+0x139/0x2c8
[<c01f6e25>] usb_hub_thread+0x2d/0xd4
[<c01f6df8>] usb_hub_thread+0x0/0xd4
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc

pdflush S DFD2FFD4 4292485228 5 1 6 4 (L-TLB)
Call Trace:
[<c012e7e5>] __pdflush+0x95/0x1b0
[<c012e900>] pdflush+0x0/0x14
[<c012e90f>] pdflush+0xf/0x14
[<c0107019>] kernel_thread_helper+0x5/0xc

pdflush S DFD2DFD4 14388 6 1 7 5 (L-TLB)
Call Trace:
[<c012e7e5>] __pdflush+0x95/0x1b0
[<c012e900>] pdflush+0x0/0x14
[<c012e90f>] pdflush+0xf/0x14
[<c0107019>] kernel_thread_helper+0x5/0xc

kswapd0 S DFD29F44 4294958912 7 1 8 6 (L-TLB)
Call Trace:
[<c01328fb>] kswapd+0xcb/0xf0
[<c0132830>] kswapd+0x0/0xf0
[<c0109d26>] math_state_restore+0x2a/0x3c
[<c0108f05>] device_not_available+0x25/0x2a
[<c010e3f5>] save_init_fpu+0x1d/0x3c
[<c0113770>] preempt_schedule+0x28/0x40
[<c0112eb3>] schedule_tail+0x2f/0x94
[<c0108b06>] ret_from_fork+0x6/0x20
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0107019>] kernel_thread_helper+0x5/0xc

aio/0 S DFFE8EA0 4294952400 8 1 9 7 (L-TLB)
Call Trace:
[<c0122fa8>] worker_thread+0xfc/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc

kpnpbiosd Z DFFEE800 4294880232 9 1 10 8 (L-TLB)
Call Trace:
[<c0118b99>] do_exit+0x41d/0x428
[<c01aca44>] pnp_dock_thread+0x0/0xf4
[<c0118bbb>] complete_and_exit+0x17/0x18
[<c01acadc>] pnp_dock_thread+0x98/0xf4
[<c01aca44>] pnp_dock_thread+0x0/0xf4
[<c0107019>] kernel_thread_helper+0x5/0xc

kseriod S DFC44000 4294030016 10 1 11 9 (L-TLB)
Call Trace:
[<c02073e7>] serio_thread+0x9f/0x12c
[<c0207348>] serio_thread+0x0/0x12c
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc

reiserfs/0 S DFCBD460 8080 11 1 10 (L-TLB)
Call Trace:
[<c0122fa8>] worker_thread+0xfc/0x274
[<c0122eac>] worker_thread+0x0/0x274
[<c0113788>] default_wake_function+0x0/0x18
[<c0113788>] default_wake_function+0x0/0x18
[<c0107019>] kernel_thread_helper+0x5/0xc

events/0 D DFAC7A30 4294892756 12 3 (L-TLB)
Call Trace:
[<c01143aa>] io_schedule+0xe/0x18
[<c012a105>] __lock_page+0x8d/0xac
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c0114ba8>] autoremove_wake_function+0x0/0x38
[<c012a58e>] do_generic_mapping_read+0x13a/0x340
[<c012aa5a>] __generic_file_aio_read+0x1c6/0x1e4
[<c012a794>] file_read_actor+0x0/0x100
[<c017f6b0>] reiserfs_get_block+0x0/0x11cc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c015400c>] dput+0x1c/0x1a0
[<c015400c>] dput+0x1c/0x1a0
[<c012ff37>] kmem_cache_alloc+0x23/0x60
[<c0140e57>] vfs_read+0xab/0x150
[<c01498c4>] kernel_read+0x3c/0x48
[<c0161f82>] load_elf_binary+0x2f2/0xbbc
[<c012ab3f>] generic_file_read+0x7f/0x9c
[<c014bf83>] real_lookup+0x67/0xd0
[<c014c254>] do_lookup+0x48/0x84
[<c015400c>] dput+0x1c/0x1a0
[<c014c95a>] link_path_walk+0x6ca/0x848
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c0161c90>] load_elf_binary+0x0/0xbbc
[<c01614c1>] load_script+0x1d1/0x1e0
[<c012d2af>] buffered_rmqueue+0xfb/0x108
[<c012d33c>] __alloc_pages+0x80/0x264
[<c014a4ad>] search_binary_handler+0xad/0x23c
[<c01612f0>] load_script+0x0/0x1e0
[<c014a786>] do_execve+0x14a/0x1a8
[<c0107750>] sys_execve+0x2c/0x60
[<c0108c47>] syscall_call+0x7/0xb
[<c0122ae8>] ____call_usermodehelper+0x78/0x94
[<c0122a70>] ____call_usermodehelper+0x0/0x94
[<c0107019>] kernel_thread_helper+0x5/0xc



2003-02-28 00:17:54

by Con Kolivas

Subject: Re: Rising io_load results Re: 2.5.63-mm1

On Fri, 28 Feb 2003 11:06 am, Andrew Morton wrote:
> Con Kolivas <[email protected]> wrote:
> > On Fri, 28 Feb 2003 09:01 am, Dave McCracken wrote:
> > > --On Thursday, February 27, 2003 13:44:03 -0800 Andrew Morton
> > >
> > > <[email protected]> wrote:
> > > >> ...
> > > >> Mapped: 4294923652 kB
> > > >
> > > > Well that's gotta hurt. This metric is used in making writeback
> > > > decisions. Probably the objrmap patch.
> > >
> > > Oops. You're right. Here's a patch to fix it.
> >
> > Thanks.
> >
> > This looks better after a run:
> >
> > MemTotal: 256156 kB
> > ...
> > Mapped: 4546752 kB
>
> No, it is still wrong. Mapped cannot exceed MemTotal.

Hmm, after a few more runs io_load starts rising again. This is the meminfo
in the middle of a run:

MemTotal: 256156 kB
MemFree: 26564 kB
Buffers: 11300 kB
Cached: 198048 kB
SwapCached: 0 kB
Active: 7164 kB
Inactive: 204736 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 256156 kB
LowFree: 26564 kB
SwapTotal: 4194272 kB
SwapFree: 4194272 kB
Dirty: 5780 kB
Writeback: 0 kB
Mapped: 6000680 kB
Slab: 13056 kB
Committed_AS: 7040 kB
PageTables: 200 kB
ReverseMaps: 664

Con

2003-02-28 00:39:33

by Andrew Morton

Subject: Re: 2.5.63-mm1

Ed Tomlinson <[email protected]> wrote:
>
> On February 27, 2003 05:59 am, Andrew Morton wrote:
> > . Tons of changes to the anticipatory scheduler. It may not be working
> > very well at present. Please use "elevator=deadline" if it causes
> > problems.
>
> The anticipatory scheduler hangs here at the same place it did in 62-mm2,
> cfq continues to work fine. A sysrq+T of the hang follows:

I must say, Ed: you have an eerie ability to break stuff.

Please send me your .config.

> free sibling
> task PC stack pid father child younger older
> swapper D DFF8FB20 11876 1 0 2 (L-TLB)

Interesting amount of free stack you have there. You broke show_task() too!


2003-02-28 07:36:08

by Duncan Sands

Subject: Re: Rising io_load results Re: 2.5.63-mm1

Hi Con, are you sure this is not the same for 2.5.63?
I left 2.5.63 running overnight (doing nothing but run
KDE), and in the morning it was swapping heavily.
About 200MB was swapped out and this did not reduce
with usage. According to top, 10% of memory was being
used by a Konsole with nothing in it (could be a memory
leak in Konsole). After half an hour I gave up - it was
too unusable. Maybe -mm1 just accentuates a problem
that is already there in 2.5.63.

Ciao,

Duncan.

2003-02-28 07:55:27

by Andrew Morton

Subject: Re: Rising io_load results Re: 2.5.63-mm1

Duncan Sands <[email protected]> wrote:
>
> Hi Con, are you sure this is not the same for 2.5.63?
> I left 2.5.63 running over night (doing nothing but run
> KDE), and in the morning it was swapping heavily.
> About 200MB was swapped out and this did not reduce
> with usage. According to top, 10% of memory was being
> used by a Konsole with nothing in it (could be a memory
> leak in Konsole). After half an hour I gave up - it was
> too unusable. Maybe -mm1 just accentuates a problem
> that is already there in 2.5.63.
>

Please take a snapshot of /proc/meminfo and /proc/slabinfo
if anything like this happens.

2003-02-28 12:06:51

by steven roemen

Subject: Re: 2.5.63-mm1


The kernel oopses when i2c is compiled into the kernel, both with -mm1 and
with -mm1 plus Dave McCracken's patch.

Also, when I remove i2c from the kernel and boot into it with AS as the
elevator, the load (via top) starts at 2.00, yet the processors aren't
loaded very much at all. Is this a known issue? (This is the first -mm
kernel I've run.)

-steve


2003-02-28 12:13:07

by Andrew Morton

Subject: Re: 2.5.63-mm1

steven roemen <[email protected]> wrote:
>
>
> the kernel oopses when i2c is compiled into the kernel with -mm1, and
> -mm1 with dave mccraken's patch.

Please send a full report on this to the mailing list.

> also when i remove i2c from the kernel and boot into it with AS as the
> elevator, the load (via top) starts at 2.00, yet the processors aren't
> loaded very much at all. is this a known issue(this is the first -mm
> kernel i've run)?

Run `ps aux' when the system is idle and see if there are any tasks
in "D" state.

2003-02-28 12:36:01

by Hugh Dickins

Subject: Re: Rising io_load results Re: 2.5.63-mm1

On Thu, 27 Feb 2003, Andrew Morton wrote:
>
> No, it is still wrong. Mapped cannot exceed MemTotal.

It needs this in addition to Dave's patch from yesterday:

--- 2.5.63-objfix-1/mm/rmap.c Thu Feb 27 23:37:28 2003
+++ 2.5.63-objfix-2/mm/rmap.c Fri Feb 28 12:33:58 2003
@@ -349,7 +349,8 @@
 BUG();
 if (atomic_read(&page->pte.mapcount) == 0)
 BUG();
- atomic_dec(&page->pte.mapcount);
+ if (atomic_dec_and_test(&page->pte.mapcount))
+ dec_page_state(nr_mapped);
 return;
 }


2003-02-28 15:46:37

by Dave McCracken

Subject: Re: Rising io_load results Re: 2.5.63-mm1


--On Friday, February 28, 2003 12:48:06 +0000 Hugh Dickins
<[email protected]> wrote:

> On Thu, 27 Feb 2003, Andrew Morton wrote:
>>
>> No, it is still wrong. Mapped cannot exceed MemTotal.
>
> It needs this in addition to Dave's patch from yesterday:
>
> --- 2.5.63-objfix-1/mm/rmap.c Thu Feb 27 23:37:28 2003
> +++ 2.5.63-objfix-2/mm/rmap.c Fri Feb 28 12:33:58 2003
> @@ -349,7 +349,8 @@
>  			BUG();
>  		if (atomic_read(&page->pte.mapcount) == 0)
>  			BUG();
> -		atomic_dec(&page->pte.mapcount);
> +		if (atomic_dec_and_test(&page->pte.mapcount))
> +			dec_page_state(nr_mapped);
>  		return;
>  	}

D'oh. I should have seen that one. Thanks.

Dave McCracken

======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
[email protected] T/L 678-3059

2003-03-03 20:56:08

by Dave McCracken

Subject: [PATCH 2.5.63] Teach page_mapped about the anon flag


--On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
<[email protected]> wrote:

> I'm just looking at page_mapped(). It is now implicitly assuming that the
> architecture's representation of a zero-count atomic_t is all-bits-zero.
>
> This is not true on sparc32 if some other CPU is in the middle of an
> atomic_foo() against that counter. Maybe the assumption is false on other
> architectures too.
>
> So page_mapped() really should be performing an atomic_read() if that is
> appropriate to the particular page. I guess this involves testing
> page->mapping. Which is stable only when the page is locked or
> mapping->page_lock is held.
>
> It appears that all page_mapped() callers are inside lock_page() at
> present, so a quick audit and addition of a comment would be appropriate
> there please.

I'm not at all confident that page_mapped() is adequately protected.
Here's a patch that explicitly handles the atomic_t case.

Dave McCracken



Attachments:
(No filename) (1.15 kB)
objfix-2.5.63-1.diff (718.00 B)
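Andrew's suggestion is that `page_mapped()` should pick how to read the shared field according to the page type instead of testing its raw bits. Dave's actual patch exists only as the attachment and is not reproduced here; what follows is an independent user-space sketch of the idea. It assumes a hypothetical `struct model_page` whose `anon` flag stands in for `PageAnon()`, whose `chain` pointer stands in for the pte_chain used by anonymous pages, and whose `mapcount` stands in for the objrmap counter. In the kernel these appear to share storage in `page->pte`; the sketch keeps them as separate fields so that it stays well-defined C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_pte_chain {
	int unused;	/* contents are irrelevant to this sketch */
};

/* Hypothetical page model; see the lead-in for what each field stands for. */
struct model_page {
	bool anon;			/* PageAnon(page) */
	struct model_pte_chain *chain;	/* meaningful only when anon */
	atomic_int mapcount;		/* meaningful only when !anon */
};

/* Report whether the page is mapped, using the accessor that matches the
 * page type: a pointer test for anonymous pages, an atomic load for
 * file-backed pages, rather than checking the storage for all-bits-zero. */
static bool model_page_mapped(struct model_page *page)
{
	assert(page != NULL);
	if (page->anon)
		return page->chain != NULL;
	return atomic_load(&page->mapcount) != 0;
}
```

As Andrew notes in reply, this makes `page_mapped()` depend more heavily on the anon flag, which is the trade-off of this approach.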

2003-03-03 21:05:42

by Andrew Morton

Subject: Re: [PATCH 2.5.63] Teach page_mapped about the anon flag

Dave McCracken <[email protected]> wrote:
>
>
> --On Thursday, February 27, 2003 14:24:50 -0800 Andrew Morton
> <[email protected]> wrote:
>
> > I'm just looking at page_mapped(). It is now implicitly assuming that the
> > architecture's representation of a zero-count atomic_t is all-bits-zero.
> >
> > This is not true on sparc32 if some other CPU is in the middle of an
> > atomic_foo() against that counter. Maybe the assumption is false on other
> > architectures too.
> >
> > So page_mapped() really should be performing an atomic_read() if that is
> > appropriate to the particular page. I guess this involves testing
> > page->mapping. Which is stable only when the page is locked or
> > mapping->page_lock is held.
> >
> > It appears that all page_mapped() callers are inside lock_page() at
> > present, so a quick audit and addition of a comment would be appropriate
> > there please.
>
> I'm not at all confident that page_mapped() is adequately protected.

It is. All callers which need to be 100% accurate are under
pte_chain_lock().

> Here's a patch that explicitly handles the atomic_t case.

OK.. But it increases dependency on PageAnon. Wasn't the plan to remove
that at some time?

2003-03-03 21:14:32

by Dave McCracken

Subject: Re: [PATCH 2.5.63] Teach page_mapped about the anon flag


--On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <[email protected]>
wrote:

> It is. All callers which need to be 100% accurate are under
> pte_chain_lock().

Hmm, good point. Some places may not need perfect accuracy. Also, if it
gives a false positive it means someone else is doing an atomic op on it,
so it's likely to be in transition to/from true anyway.

Ok, you've convinced me. Please ignore the patch. I'll hang onto it in
case we get proved wrong at some point.

Dave


2003-03-03 21:29:06

by Andrew Morton

Subject: Re: [PATCH 2.5.63] Teach page_mapped about the anon flag

Dave McCracken <[email protected]> wrote:
>
>
> --On Monday, March 03, 2003 13:12:10 -0800 Andrew Morton <[email protected]>
> wrote:
>
> > It is. All callers which need to be 100% accurate are under
> > pte_chain_lock().
>
> Hmm, good point. Some places may not need perfect accuracy. Also, if it
> gives a false positive it means someone else is doing an atomic op on it,
> so it's likely to be in transition to/from true anyway.
>
> Ok, you've convinced me. Please ignore the patch. I'll hang onto it in
> case we get proved wrong at some point.

We do need a patch I think. page_mapped() is still assuming that an
all-bits-zero atomic_t corresponds to a zero-value atomic_t.

This does appear to be true for all supported architectures, but it's a bit
grubby.

2003-03-03 21:42:37

by Dave McCracken

Subject: Re: [PATCH 2.5.63] Teach page_mapped about the anon flag


--On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <[email protected]>
wrote:

> We do need a patch I think. page_mapped() is still assuming that an
> all-bits-zero atomic_t corresponds to a zero-value atomic_t.
>
> This does appear to be true for all supported architectures, but it's a
> bit grubby.

If that's ever not true then we need extra code to initialize/rezero that
field, since we assume it's zero on alloc, and the pte_chain code also
assumes it's zero for a new page.
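Dave's point is that the code which creates or recycles a page relies on the counter already reading as zero. One way to drop the all-bits-zero assumption is to set the counter explicitly through the atomic API whenever a page is prepared for use; in kernel code that would presumably be an `atomic_set()` on the field. The sketch below shows the same idea in user-space C11, with hypothetical `model_page_alloc()` and `model_page_reset()` helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical page model with an atomic mapping counter. */
struct model_page {
	unsigned long flags;
	atomic_int mapcount;
};

/* Allocate a page and give the counter a defined zero value.  calloc()
 * would merely leave it all-bits-zero, which C11 does not guarantee to be
 * a valid zero for an atomic object; atomic_init() does. */
static struct model_page *model_page_alloc(void)
{
	struct model_page *page = malloc(sizeof(*page));

	if (page == NULL)
		return NULL;
	page->flags = 0;
	atomic_init(&page->mapcount, 0);
	return page;
}

/* Re-zero the counter before a page is handed out again, instead of
 * assuming its previous user left it at zero. */
static void model_page_reset(struct model_page *page)
{
	assert(page != NULL);
	page->flags = 0;
	atomic_store(&page->mapcount, 0);
}
```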

Dave


2003-03-03 22:08:49

by Andrew Morton

Subject: Re: [PATCH 2.5.63] Teach page_mapped about the anon flag

Dave McCracken <[email protected]> wrote:
>
>
> --On Monday, March 03, 2003 13:35:39 -0800 Andrew Morton <[email protected]>
> wrote:
>
> > We do need a patch I think. page_mapped() is still assuming that an
> > all-bits-zero atomic_t corresponds to a zero-value atomic_t.
> >
> > This does appear to be true for all supported architectures, but it's a
> > bit grubby.
>
> If that's ever not true then we need extra code to initialize/rezero that
> field, since we assume it's zero on alloc, and the pte_chain code also
> assumes it's zero for a new page.

Well why not make mapcount an "int" and move the places where it is modified
inside pte_chain_lock()?

That does not increase the number of atomic operations, and it makes me stop
wondering if this:

	if (atomic_read(&page->pte.mapcount) == 0)
		inc_page_state(nr_mapped);

is racy ;)
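The concern is a check-then-act sequence: if two CPUs could run it on the same page at once, both might read a zero `mapcount` before either increments it, and both would then bump `nr_mapped`. When every update happens under the per-page lock, the test and the update form one critical section and a plain `int` suffices. Here is a minimal user-space sketch of that arrangement, not the patch Dave attached: `struct model_page` and its helpers are hypothetical, an `atomic_flag` spinlock stands in for `pte_chain_lock()`, and `nr_mapped` stays atomic in the model only because different pages are protected by different locks.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical page model: mapcount is a plain int guarded by chain_lock. */
struct model_page {
	atomic_flag chain_lock;	/* stand-in for pte_chain_lock() */
	int mapcount;		/* protected by chain_lock */
};

static atomic_long nr_mapped;	/* shared across pages, hence atomic */

static void model_page_init(struct model_page *page)
{
	atomic_flag_clear(&page->chain_lock);
	page->mapcount = 0;
}

static void model_chain_lock(struct model_page *page)
{
	while (atomic_flag_test_and_set_explicit(&page->chain_lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_chain_unlock(struct model_page *page)
{
	atomic_flag_clear_explicit(&page->chain_lock, memory_order_release);
}

static void model_add_mapping(struct model_page *page)
{
	model_chain_lock(page);
	/* The zero test and the increment sit in one critical section, so
	 * two mappers cannot both observe zero and both bump nr_mapped. */
	if (page->mapcount++ == 0)
		atomic_fetch_add(&nr_mapped, 1);
	model_chain_unlock(page);
}

static void model_remove_mapping(struct model_page *page)
{
	model_chain_lock(page);
	assert(page->mapcount > 0);	/* like the BUG() in the rmap code */
	if (--page->mapcount == 0)
		atomic_fetch_sub(&nr_mapped, 1);
	model_chain_unlock(page);
}
```

Because `mapcount` is touched only between `model_chain_lock()` and `model_chain_unlock()`, the counter itself needs no atomic operation; this is presumably why the number of atomic operations does not go up, since the lock acquisition replaces the atomic update rather than adding to it.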

2003-03-04 18:21:49

by Dave McCracken

Subject: [PATCH 2.5.63] Make objrmap mapcount non-atomic


--On Monday, March 03, 2003 14:15:18 -0800 Andrew Morton <[email protected]>
wrote:

> Well why not make mapcount an "int" and move the places where it is
> modified inside pte_chain_lock()?
>
> That does not increase the number of atomic operations, and it makes me
> stop wondering if this:
>
> 	if (atomic_read(&page->pte.mapcount) == 0)
> 		inc_page_state(nr_mapped);
>
> is racy ;)

That would be entirely too easy a solution :)

You're entirely right, of course. Here's the patch that makes it an int
instead of atomic, with the appropriate locking.

Dave



Attachments:
(No filename) (815.00 B)
objfix-2.5.63-2.diff (2.22 kB)