2003-03-03 01:59:35

by Andrew Morton

Subject: 2.5.63-mm2


http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.63/2.5.63-mm2/

Mainly bugfixes, and solidification of the anticipatory scheduler.

The anticipatory scheduler has become significantly better - I believe that
all of the little regressions which had previously been identified are fixed
up now, with the exception of the OLTP-style database workload which is still
10% slower. Write-versus-write latency problems have been fixed, which is
important for ext3 behaviour during heavy writeback.

All the infrastructure for per-task IO pattern tracking is now in place so we
should be able to fix the OLTP slowdown without any requirement for manual
tuning.

We still have not located Ed Tomlinson's lost IO request. It's odd.



If you see this come out:

Slab corruption: start=cde0414c, expend=cde0418b, problemat=cde04162
Data: **********************7B ****************************************A5
Next: 71 F0 2C .A5 C2 0F 17 84 10 B3 CE 00 80 04 08 00 A0 05 08 8C 1C 90 CE 25 00 00 00 75 18 00 00
slab error in check_poison_obj(): cache `vm_area_struct': object was modified after freeing
Call Trace:
[<c013aabd>] __slab_error+0x21/0x28
[<c013acac>] check_poison_obj+0x104/0x110
[<c013c080>] kmem_cache_alloc+0x90/0x11c
[<c01449c0>] split_vma+0x28/0xcc
[<c0144b35>] do_munmap+0xd1/0x178
[<c0144c21>] sys_munmap+0x45/0x64
[<c0109143>] syscall_call+0x7/0xb

please do not report it. We know.

If this message comes out for any cache apart from vm_area_struct then please
_do_ report it.
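
The rule above (ignore the report for vm_area_struct, report it for any other cache) can be applied to a saved kernel log mechanically. The following is only a sketch: the regular expression is inferred from the single sample line quoted above, and the function name `slab_errors_to_report` and the second cache name in the demo are made up for illustration.

```python
import re

# Matches the summary line shown in the report above, e.g.
#   slab error in check_poison_obj(): cache `vm_area_struct': object was modified after freeing
# The pattern is inferred from that one sample line (assumption).
SLAB_ERROR = re.compile(r"slab error in (\w+)\(\): cache `([^']+)': (.*)")

# The one cache whose corruption report is already known.
KNOWN_CACHES = {"vm_area_struct"}


def slab_errors_to_report(log_text):
    """Return (function, cache, message) tuples for slab errors worth reporting."""
    hits = []
    for line in log_text.splitlines():
        m = SLAB_ERROR.search(line)
        if m and m.group(2) not in KNOWN_CACHES:
            hits.append((m.group(1), m.group(2), m.group(3).strip()))
    return hits


if __name__ == "__main__":
    sample = "\n".join([
        "slab error in check_poison_obj(): cache `vm_area_struct': object was modified after freeing",
        # Hypothetical line, invented to show a cache that should be reported.
        "slab error in check_poison_obj(): cache `example_cache': object was modified after freeing",
    ])
    for func, cache, msg in slab_errors_to_report(sample):
        print(f"report: {cache} ({func}): {msg}")
```

Feeding it the output of dmesg or a syslog file would list only the slab errors that are not already known.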




Changes since 2.5.63-mm1:


-devfs-fix.patch

Dropped for now - conflicts with changes in Linus's tree

-nfs-unstable-pages.patch

Dropped for a while - it could impact testing of limit-write-latency.patch

-as-comments-and-tweaks.patch
-as-hz-1000-fix.patch
-as-tidy-up-rename.patch
-anticipation_is_killing_me.patch
-as-update-1.patch
-as-break-anticipation-on-write.patch
-as-break-if-readahead.patch
-as-fix-hughs-problem.patch
-as-cleanup.patch
-as-start-stop-anticipation-helpers.patch
-as-cleanup-2.patch
-as-cleanup-3.patch
-as-cleanup-3-write-latency-fix.patch
-as-handle-exitted-tasks.patch
-as-handle-exitted-tasks-fix.patch
-as-no-plugging-and-cleanups.patch
-as-remove-debug.patch
-as-track-queued-reads.patch
-as-accounting-fix.patch
-as-nr_reads-fix.patch
-as-tuning.patch
-as-disable-nr_reads.patch

Folded into anticipatory-scheduling.patch

+as-random-fixes.patch
+as-comment-fix.patch

More anticipatory scheduling work

+objrmap-nr_mapped-fix.patch
+objrmap-mapped-mem-fix-2.patch

Fix up the mapped page accounting

+sched-b3.patch

Latest HT-aware CPU scheduler patch

+cciss-startup-problem-fix.patch
+cciss-retry-bus-reset.patch
+cciss-add-cmd-type.patch
+cciss-getluninfo-ioctl.patch
+cciss-passthrough-ioctl.patch

cciss update

+show_task-free-stack-fix.patch

Fix some nonsense in the sysrq-t output. Probably we should just remove
the non-functional "free stack" accounting.

+use-after-free-check.patch

Full use-after-free checking in slab

+reiserfs-fix-memleaks.patch

Reiserfs fixes

+copy_page_range-invalid-page-fix.patch

Fix a crash when an app forks while holding a mmap of /dev/mem. This is
incomplete.
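
For readers unfamiliar with how a slab use-after-free check of this kind works, here is a simplified userspace model. It is not the code in use-after-free-check.patch. The idea is that a freed object is filled with a known pattern, and the pattern is verified when the object is next allocated. The A5 end marker matches the dump quoted earlier; the 0x6B fill value and the constant names are assumptions about what the `*` characters in that dump stand for. The size (64 bytes) and the offset of the bad byte (22) follow from the addresses in the report: expend - start + 1 and problemat - start.

```python
POISON_FREE = 0x6B  # assumed fill byte written over a freed object
POISON_END = 0xA5   # marker in the object's last byte (A5 in the dump)


def poison(size):
    """Return the byte pattern written into an object of `size` bytes on free."""
    return bytearray([POISON_FREE] * (size - 1) + [POISON_END])


def check_poison(obj):
    """Return the offsets whose bytes no longer match the poison pattern."""
    expected = poison(len(obj))
    return [i for i, (got, want) in enumerate(zip(obj, expected)) if got != want]


if __name__ == "__main__":
    size = 0xCDE0418B - 0xCDE0414C + 1    # 64 bytes, from start/expend
    offset = 0xCDE04162 - 0xCDE0414C      # 22, from problemat

    obj = poison(size)        # the object is freed and poisoned
    obj[offset] = 0x7B        # something writes to it after the free
    bad = check_poison(obj)   # the next allocation verifies the pattern
    print("modified after freeing at offsets:", bad)
```

Any non-empty result means something wrote to the object while it was free, which is the condition the "object was modified after freeing" message describes.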




All 77 patches:

linus.patch
Latest from Linus

separate.patch

mm.patch
add -mmN to EXTRAVERSION

rpc_rmdir-fix.patch
Fix nfs oops during mount

ppc64-reloc_hide.patch

ppc64-pci-patch.patch
Subject: pci patch

ppc64-e100-fix.patch
fix e100 for big-endian machines

ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio

ppc64-64-bit-exec-fix.patch
Subject: 64bit exec

ppc64-scruffiness.patch
Fix some PPC64 compile warnings

sym-do-160.patch
make the SYM driver do 160 MB/sec

kgdb.patch

nfsd-disable-softirq.patch
Fix race in svcsock.c in 2.5.61

report-lost-ticks.patch
make lost-tick detection more informative

ptrace-flush.patch
cache flushing in the ptrace code

buffer-debug.patch
buffer.c debugging

warn-null-wakeup.patch

ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages

deadline-dispatching-fix.patch
deadline IO scheduler dispatching fix

limit-write-latency.patch
fix possible latency in balance_dirty_pages()

reiserfs_file_write-5.patch

tcp-wakeups.patch
Use fast wakeups in TCP/IPV4

lockd-lockup-fix-2.patch
Subject: Re: Fw: Re: 2.4.20 NFS server lock-up (SMP)

rcu-stats.patch
RCU statistics reporting

ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3

nfs-speedup.patch

nfs-oom-fix.patch
nfs oom fix

sk-allocation.patch
Subject: Re: nfs oom

nfs-more-oom-fix.patch

nfs-sendfile.patch
Implement sendfile() for NFS

rpciod-atomic-allocations.patch
Make rpciod use atomic allocations

linux-isp.patch

isp-update-1.patch

remove-unused-congestion-stuff.patch
Subject: [PATCH] remove unused congestion stuff

aic-makefile-fix.patch
aicasm Makefile fix

loop-hack.patch
loop: Fix OOM and oops

atm_dev_sem.patch
convert atm_dev_lock from spinlock to semaphore

flock-fix.patch
flock fixes for 2.5.62

sysfs-dget-fix-2.patch

irq-sharing-fix.patch
fix irq sharing and SA_INTERRUPT on x86

as-iosched.patch
anticipatory I/O scheduler

as-random-fixes.patch
Subject: [PATCH] important fixes

as-comment-fix.patch
AS: comment fix

readahead-shrink-to-zero.patch
Allow VFS readahead to fall to zero

cfq-2.patch
CFQ scheduler, #2

smalldevfs.patch
smalldevfs

objrmap-2.5.62-5.patch
object-based rmap

objrmap-X-fix.patch
objrmap fix for X

objrmap-nr_mapped-fix.patch
objrmap: fix /proc/meminfo:Mapped

objrmap-mapped-mem-fix-2.patch
fix objrmap mapped mem accounting again

oprofile-up-fix.patch
fix oprofile on UP (lockless sync)

update_atime-speedup.patch
speed up update_atime()

ext2-update_atime_speedup.patch
Use one_sec_update_atime in ext2

ext3-update_atime_speedup.patch
Use one_sec_update_atime in ext3

UPDATE_ATIME-to-update_atime.patch
Rename UPDATE_ATIME to update_atime

per-cpu-disk-stats.patch
Make diskstats per-cpu using kmalloc_percpu

presto_get_sb-fix.patch
fix presto_get_sb() return value and oops.

on_each_cpu.patch
fix preempt-issues with smp_call_function()

on_each_cpu-ldt-cleanup.patch

notsc-panic.patch
Don't panic if TSC is enabled and notsc is used

alloc_pages_cleanup.patch
clean up redundant code for alloc_pages

ext2-handle-htree-flag.patch
ext2: clear ext3 htree flag on directories

sched-b3.patch
HT scheduler, sched-2.5.63-B3

mpparse-typo-fix.patch
fix typo in arch/i386/kernel/mpparse.c in printk

i386-no-swap-fix.patch
allow CONFIG_SWAP=n for i386

remove-hugetlb_key.patch
remove dead hugetlb_key forward decl

hugetlbpage-doc-update.patch
hugetlbpage documentation update

hugetlb-valid-page-ranges.patch
hugetlb: fix MAP_FIXED handling

cciss-startup-problem-fix.patch
cciss: fix unlikely startup problem

cciss-retry-bus-reset.patch
cciss: retry bus resets

cciss-add-cmd-type.patch
cciss: add cmd_type to sendcmd parameters

cciss-getluninfo-ioctl.patch
cciss: add CCISS_GETLUNINFO ioctl

cciss-passthrough-ioctl.patch
cciss: add passthrough ioctl

show_task-free-stack-fix.patch
show_task() fix and cleanup

use-after-free-check.patch
slab use-after-free detector

reiserfs-fix-memleaks.patch
ReiserFS: fix memleaks on journal opening failures

copy_page_range-invalid-page-fix.patch
Fix copy_page_range()'s handling of invalid pages




2003-03-04 00:34:08

by Randy Hron

Subject: Re: 2.5.63-mm2

2.5.63-mm2 gave the oops below running dbench 192.

Quad P3 Xeon
3.75 GB Ram
ext3 filesystems for most things.
ext2 filesystem for testing.
SCSI disks.
Using anticipatory scheduler.

kernel BUG at drivers/block/as-iosched.c:188!
invalid operand: 0000
CPU: 1
EIP: 0060:[put_as_io_context+17/64] Not tainted
EIP: 0060:[<c0257191>] Not tainted
EFLAGS: 00010046
EIP is at put_as_io_context+0x11/0x40
eax: 00000000 ebx: f7e5c968 ecx: 00000000 edx: f6c08500
esi: f7e5c968 edi: 00000000 ebp: 00000001 esp: c3671f10
ds: 007b es: 007b ss: 0068
Process events/1 (pid: 11, threadinfo=c3670000 task=c36772e0)
Stack: f6c08760 c0257267 f7e5c968 c372f278 f7e5c8c0 c0257937 f7e5c968 c372f290
f7e5c8c0 c372f278 00000000 c0258091 f7e5c8c0 c372f278 c37c1200 00000287
f7fe20a0 c37c1200 c02581e6 f7e5c8c0 c02510ce c37c1200 c0252905 c37c1200
Call Trace:
[copy_as_io_context+39/48] copy_as_io_context+0x27/0x30
[<c0257267>] copy_as_io_context+0x27/0x30
[as_move_to_dispatch+71/144] as_move_to_dispatch+0x47/0x90
[<c0257937>] as_move_to_dispatch+0x47/0x90
[as_dispatch_request+417/432] as_dispatch_request+0x1a1/0x1b0
[<c0258091>] as_dispatch_request+0x1a1/0x1b0
[as_queue_notready+38/64] as_queue_notready+0x26/0x40
[<c02581e6>] as_queue_notready+0x26/0x40
[elv_queue_empty+14/32] elv_queue_empty+0xe/0x20
[<c02510ce>] elv_queue_empty+0xe/0x20
[generic_unplug_device+69/112] generic_unplug_device+0x45/0x70
[<c0252905>] generic_unplug_device+0x45/0x70
[worker_thread+458/656] worker_thread+0x1ca/0x290
[<c01281ba>] worker_thread+0x1ca/0x290
[blk_unplug_work+0/16] blk_unplug_work+0x0/0x10
[<c0252930>] blk_unplug_work+0x0/0x10
[default_wake_function+0/32] default_wake_function+0x0/0x20
[<c0117680>] default_wake_function+0x0/0x20
[ret_from_fork+6/20] ret_from_fork+0x6/0x14
[<c0108e5e>] ret_from_fork+0x6/0x14
[default_wake_function+0/32] default_wake_function+0x0/0x20
[<c0117680>] default_wake_function+0x0/0x20
[worker_thread+0/656] worker_thread+0x0/0x290
[<c0127ff0>] worker_thread+0x0/0x290
[kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
[<c01070e5>] kernel_thread_helper+0x5/0x10

Code: 0f 0b bc 00 73 69 35 c0 f0 ff 0a 0f 94 c0 84 c0 74 14 f0 ff
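
Each frame in the trace above appears twice: once as [symbol+offset/size] in decimal, and once as a raw address followed by symbol+0xoffset/0xsize in hex. Both give the offset into the function and the function's length. The sketch below, with a hypothetical helper named `parse_frame`, normalises the two notations so that duplicate frames can be compared or collapsed when reading the trace.

```python
import re

# Decimal form, e.g. "[put_as_io_context+17/64]"
DEC = re.compile(r"\[(\w+)\+(\d+)/(\d+)\]")
# Hex form, e.g. "put_as_io_context+0x11/0x40"
HEX = re.compile(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frame(text):
    """Return (symbol, offset, size) as integers from either notation, or None."""
    m = HEX.search(text)
    if m:
        return m.group(1), int(m.group(2), 16), int(m.group(3), 16)
    m = DEC.search(text)
    if m:
        return m.group(1), int(m.group(2)), int(m.group(3))
    return None


if __name__ == "__main__":
    decimal_form = parse_frame("[put_as_io_context+17/64]")
    hex_form = parse_frame("[<c0257191>] put_as_io_context+0x11/0x40")
    # Both notations describe the same location: 17 bytes into a 64-byte function.
    print(decimal_form, hex_form, decimal_form == hex_form)
```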

dbench 64 completed 5 times before this. The first dbench 192
completed okay too (based on logfile). Oops came during 2nd dbench
192 run.

After rebooting, I noticed a load average of about 4 while the system
was supposed to be idle:

$ uptime
4:30pm up 50 min, 2 users, load average: 4.00, 3.97, 3.82

$ ps aux|grep D[W]
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0     0    0 ?        DW   15:40   0:00 [migration/0]
root         4  0.0  0.0     0    0 ?        DW   15:40   0:00 [migration/1]
root         6  0.0  0.0     0    0 ?        DW   15:40   0:00 [migration/2]
root         8  0.0  0.0     0    0 ?        DW   15:40   0:00 [migration/3]
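
The Linux load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so four migration threads stuck in D state would by themselves account for a load of about 4.00 on an otherwise idle machine. The sketch below does the same filtering as the grep above, but by column. The function name is made up, and it assumes the `ps aux` column layout shown in this message.

```python
def uninterruptible_tasks(ps_output):
    """Return (pid, command) for each `ps aux` row whose STAT starts with 'D'."""
    rows = ps_output.strip().splitlines()
    header = rows[0].split()
    stat_col = header.index("STAT")
    pid_col = header.index("PID")
    tasks = []
    for row in rows[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = row.split(None, len(header) - 1)
        if len(fields) == len(header) and fields[stat_col].startswith("D"):
            tasks.append((int(fields[pid_col]), fields[-1]))
    return tasks


if __name__ == "__main__":
    sample = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? DW 15:40 0:00 [migration/0]
root 4 0.0 0.0 0 0 ? DW 15:40 0:00 [migration/1]
root 6 0.0 0.0 0 0 ? DW 15:40 0:00 [migration/2]
root 8 0.0 0.0 0 0 ? DW 15:40 0:00 [migration/3]
"""
    stuck = uninterruptible_tasks(sample)
    print(len(stuck), "tasks in D state:", stuck)
```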


Things in .config that were not in 2.5.62-mm2: (2.5.62-mm2 was fine on this machine)
CONFIG_JFS_FS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS=y
CONFIG_NR_SIBLINGS_0=y
CONFIG_XFS_FS=y

No jfs or xfs filesystems were mounted at the time of oops nor
when migration/N was in DW state.

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html

2003-03-04 21:47:31

by Mark Wong

Subject: Re: 2.5.63-mm2

It appears something is conflicting with the old Adaptec AIC7xxx. My
system halts when it attempts to probe the devices (I think it's that.)
So I started using the new AIC7xxx driver and all is well. I don't see
any messages on the console that point to a cause. Is there
someplace I can look for a clue to the problem?

I actually didn't realize I was using the old driver and have no qualms
about not using it, but if it'll help someone else, I can help gather
information.

--
Mark Wong - - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x 32 (office)
(503)-626-2436 (fax)
http://www.osdl.org/archive/markw/

2003-03-04 22:02:50

by Andrew Morton

Subject: Re: 2.5.63-mm2

Mark Wong <[email protected]> wrote:
>
> It appears something is conflicting with the old Adaptec AIC7xxx. My
> system halts when it attempts to probe the devices (I think it's that.)
> So I started using the new AIC7xxx driver and all is well. I don't see
> any messages on the console that point to a cause. Is there
> someplace I can look for a clue to the problem?
>
> I actually didn't realize I was using the old driver and have no qualms
> about not using it, but if it'll help someone else, I can help gather
> information.

There are "fixes" in that driver in Linus's tree. I suggest you revert to
the 2.5.63 version of aic7xxx_old.c, see if that fixes it.

2003-03-04 22:55:58

by Mark Wong

Subject: Re: 2.5.63-mm2

On Tue, 2003-03-04 at 14:09, Andrew Morton wrote:
> Mark Wong <[email protected]> wrote:
> >
> > It appears something is conflicting with the old Adaptec AIC7xxx. My
> > system halts when it attempts to probe the devices (I think it's that.)
> > So I started using the new AIC7xxx driver and all is well. I don't see
> > any messages on the console that point to a cause. Is there
> > someplace I can look for a clue to the problem?
> >
> > I actually didn't realize I was using the old driver and have no qualms
> > about not using it, but if it'll help someone else, I can help gather
> > information.
>
> There are "fixes" in that driver in Linus's tree. I suggest you revert to
> the 2.5.63 version of aic7xxx_old.c, see if that fixes it.

Reverting to Linus's 2.5.63 tree produces the same problem for me. I
had thought I tried it before, but it turns out I was running 2.5.62.
2.5.62's aic7xxx_old is good for me.

2003-03-04 23:11:37

by Andrew Morton

Subject: Re: 2.5.63-mm2

Mark Wong <[email protected]> wrote:
>
> Reverting to Linus's 2.5.63 tree produces the same problem for me. I
> had thought I tried it before, but it turns out I was running 2.5.62.
> 2.5.62's aic7xxx_old is good for me.

There are no significant differences in that driver between .62 and .63. So
I am assuming that 2.5.62 works, 2.5.63 doesn't, and that you have not
actually tried 2.5.62's aic7xxx_old in a 2.5.63 tree?

If so, don't bother - it won't make any difference. Looks like someone broke
something in scsi core which collaterally damaged aic7xxx_old. I suggest you
feed it into bugme for now.


2003-03-09 17:19:37

by Randy Hron

Subject: Re: 2.5.63-mm2

2.5.64-mm1 on K6/2 uniprocessor completed the 24 hour benchmarks
using anticipatory scheduler with no oops. (Badness in request_irq
at arch/i386/kernel/irq.c:475 is known and non-fatal).

2.5.64-mm1 had ~ 47% improvement in tiobench sequential read
throughput on ext2, compared to 2.5.64 on uniprocessor K6/2.

2.5.64-mm2 ran dbench, Linux Test Project, unixbench, lmbench
with no problems on uniprocessor too.

2.5.64-mm4 is running on Quad Xeon now, and has finished dbench
runs with no oops.

--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html