http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm6/
. Some rework and restructuring of the anticipatory scheduling code.
The reported slowdown in RAID1 rebuild _may_ have been fixed. At least,
it doesn't happen for me with this patchset.
. The request aliasing problem hasn't been fixed yet, so this kernel (and
2.5.59) will still fail under heavy direct-IO load.
. The mysterious "machine hangs late in boot" problem has been narrowed
down thanks to some great work by Andres Salomon. The machine is stuck
waiting on I/O completion when performing the initial lookup for
/sbin/devfs_helper:
Thread 11 (Thread 10):
#0 io_schedule () at include/asm/atomic.h:122
#1 0xc014cd0a in __wait_on_buffer (bh=0xd3fe45b0) at fs/buffer.c:132
#2 0xc014dfa6 in __bread_slow (bh=0xd3fe45b0)
at include/linux/buffer_head.h:260
#3 0xc014e1c8 in __bread (bdev=0x0, block=0, size=0) at fs/buffer.c:1385
#4 0xc0181774 in ext3_get_inode_loc (inode=0xd3d697bc, iloc=0xd3d13ce0)
at include/linux/buffer_head.h:235
#5 0xc0181841 in ext3_read_inode (inode=0xd3d697bc) at fs/ext3/inode.c:2205
#6 0xc0183db4 in ext3_lookup (dir=0x0, dentry=0xd3d4cae0)
at include/linux/fs.h:1199
#7 0xc01585fb in real_lookup (parent=0xd3d4cce0, name=0xd3d13d94, flags=0)
at fs/namei.c:372
#8 0xc0158849 in do_lookup (nd=0xd3d4cae0, name=0xd3d13d94, path=0xd3d13d84,
cached_path=0xd3d13d8c, flags=-1071144428) at fs/namei.c:537
#9 0xc01589ef in link_path_walk (name=0x0, nd=0xd3d13dc8) at fs/namei.c:651
#10 0xc01558c1 in open_exec (name=0x0) at fs/exec.c:454
#11 0xc0156200 in do_execve (filename=0xd3d6d000 "/sbin/devfs_helper",
argv=0xc133bd08, envp=0xd3d13dc8, regs=0x0) at fs/exec.c:1032
#12 0xc0107e0d in sys_execve (regs=
{ebx = -1071125472, ecx = -1053573880, edx = -1071125308, esi =
-740448672, edi = 0, ebp = -741261356, eax = 11, xds = -1072562053,
Which _looks_ like a request queueing problem, but Andres says it goes
away when devfs is disabled in config. So I've dropped the smalldevfs
patch for now - would be appreciated if devfs users could retest this
patch, with CONFIG_DEVFS=y.
. There appears to be a CPU utilisation problem with
reiserfs_file_write.patch - but it doesn't oops or corrupt data so I've
left that in for now while Oleg scratches his head over that one.
Changes since 2.5.59-mm5:
-devfs-fix.patch
This might have caused interactions with Adam's patch (which isn't here
anyway), so leave it out.
+sync-fix.patch
Fix rare data loss problem with ext2 and heavy use of sync()
+direct-io-ENOSPC-fix.patch
Fix inode accounting error which occurs when an O_DIRECT write hits ENOSPC.
+frlock-xtime.patch
+frlock-xtime-i386.patch
+frlock-xtime-ia64.patch
+frlock-xtime-other.patch
An alternative version of the lockless gettimeofday() patch. Needs testing
on other architectures.
+inode-accounting-race-fix.patch
Fix SMP race in i_blocks/i_bytes accounting.
-lockless-current_kernel_time.patch
Replaced by the frlock version.
+agp-warning-fix.patch
Fix a warning
+slab-poisoning-fix.patch
Slab debug fix
+modversions.patch
Resurrect module versioning support
+pcmcia_timer_init.patch
Timer initialisation fixes
+no_space_in_slabnames.patch
/proc/slabinfo sanity
+epoll-update.patch
Latest from Davide (I think. May be latest-but-one)
+hash-warnings.patch
Compile warnings.
+discarded-section-fix.patch
Build fix
-smalldevfs.patch
Might be causing the boot hangs
+atyfb-compile-fix.patch
Build fix
+floppy-locking-fix.patch
Floppy forgot to take queue_lock
+lost-tick.patch
Keep time going forward when someone disables interrupts for ages
-exit_mmap-fix-ppc64.patch
-exit_mmap-ia64-fix.patch
+exit_mmap-fix-47.patch
Yet another take on the TASK_SIZE fix for exit_mmap()
anticipatory_io_scheduling-2_5_59-mm3.patch
+ant-cleanup.patch
+antsched-update-1.patch
Anticipatory scheduler changes
All 82 patches:
kgdb.patch
sync-fix.patch
Fix data loss problem due to sys_sync
direct-io-ENOSPC-fix.patch
direct-IO: fix i_size handling on ENOSPC
frlock-xtime.patch
fast reader locks for gettimeofday() and friends
frlock-xtime-i386.patch
frlock-xtime-ia64.patch
frlock-xtime-other.patch
inode-accounting-race-fix.patch
Fix inode size accounting race
vmlinux-fix.patch
vmlinux fix
maestro-fix.patch
Compile fix in sound/oss/maestro.c
deadline-np-42.patch
(undescribed patch)
deadline-np-43.patch
(undescribed patch)
setuid-exec-no-lock_kernel.patch
remove lock_kernel() from exec of setuid apps
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
reiserfs-readpages.patch
reiserfs v3 readpages support
fadvise.patch
implement posix_fadvise64()
ext3-scheduling-storm.patch
ext3: fix scheduling storm and lockups
auto-unplug.patch
self-unplugging request queues
less-unplugging.patch
Remove most of the blk_run_queues() calls
scheduler-tunables.patch
scheduler tunables
htlb-2.patch
hugetlb: fix MAP_FIXED handling
kirq.patch
kirq-up-fix.patch
Subject: Re: 2.5.59-mm1
agp-warning-fix.patch
fix agp compile warning
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
prune-icache-stats.patch
add stats for page reclaim via inode freeing
vma-file-merge.patch
mmap-whitespace.patch
read_cache_pages-cleanup.patch
cleanup in read_cache_pages()
remove-GFP_HIGHIO.patch
remove __GFP_HIGHIO
quota-lockfix.patch
quota locking fix
quota-offsem.patch
quota semaphore fix
slab-poisoning-fix.patch
slab poison checking fix
oprofile-p4.patch
oprofile_cpu-as-string.patch
oprofile cpu-as-string
preempt-locking.patch
Subject: spinlock efficiency problem [was 2.5.57 IO slowdown with CONFIG_PREEMPT enabled]
wli-11_pgd_ctor.patch
(undescribed patch)
wli-11_pgd_ctor-update.patch
pgd_ctor update
stack-overflow-fix.patch
stack overflow checking fix
ext2-allocation-failure-fix.patch
Subject: [PATCH] ext2 allocation failures
ext2_new_block-fixes.patch
ext2_new_block cleanups and fixes
hangcheck-timer.patch
hangcheck-timer
slab-irq-fix.patch
slab IRQ fix
Richard_Henderson_for_President.patch
Subject: [PATCH] Richard Henderson for President!
parenthesise-pgd_index.patch
Subject: i386 pgd_index() doesn't parenthesize its arg
sendfile-security-hooks.patch
Subject: [RFC][PATCH] Restore LSM hook calls to sendfile
macro-double-eval-fix.patch
Subject: Re: i386 pgd_index() doesn't parenthesize its arg
mmzone-parens.patch
asm-i386/mmzone.h macro paren/eval fixes
blkdev-fixes.patch
blkdev.h fixes
modversions.patch
Subject: [PATCH] new modversions
pcmcia_timer_init.patch
pcmcia timer initialisation fixes
no_space_in_slabnames.patch
remove spaces from slab names
remove-will_become_orphaned_pgrp.patch
remove will_become_orphaned_pgrp()
buffer-io-accounting.patch
correct wait accounting in wait_on_buffer()
aic79xx-linux-2.5.59-20030122.patch
aic7xxx update
MAX_IO_APICS-ifdef.patch
MAX_IO_APICS #ifdef'd wrongly
dac960-error-retry.patch
Subject: [PATCH] linux2.5.56 patch to DAC960 driver for error retry
epoll-update.patch
epoll timeout and syscall return types ...
topology-remove-underbars.patch
Remove __ from topology macros
mandlock-oops-fix.patch
ftruncate/truncate oopses with mandatory locking
put_user-warning-fix.patch
Subject: Re: Linux 2.5.59
hash-warnings.patch
fix #warning's
discarded-section-fix.patch
Subject: [PATCH] discarded section errors (2.5.59)
reiserfs_file_write.patch
Subject: reiserfs file_write patch
atyfb-compile-fix.patch
atyfb compilation fix
floppy-locking-fix.patch
floppy locking fix
lost-tick.patch
Lost tick compensation
sound-firmware-load-fix.patch
soundcore.c referenced non-existent errno variable
generic_file_readonly_mmap-fix.patch
Fix generic_file_readonly_mmap()
seq_file-page-defn.patch
Include <asm/page.h> in fs/seq_file.c, as it uses PAGE_SIZE
exit_mmap-fix-47.patch
show_task-fix.patch
Subject: [PATCH] 2.5.59: show_task() oops
scsi-iothread.patch
scsi_eh_* needs to run even during suspend
numaq-ioapic-fix2.patch
NUMAQ io_apic programming fix
misc.patch
misc fixes
writeback-sync-cleanup.patch
Remove unneeded code in fs/fs-writeback.c
dont-wait-on-inode.patch
Fix latencies during writeback
unlink-latency-fix.patch
fix i_sem contention in sys_unlink()
anticipatory_io_scheduling-2_5_59-mm3.patch
Subject: [PATCH] 2.5.59-mm3 antic io sched
ant-cleanup.patch
antsched-update-1.patch
Subject: [PATCH] 2.5.59-snap2 updates
This one boots for me (with devfs enabled). I got some rather interesting
stack dumps, however, during boot.
Linux version 2.5.59 (dilinger@pea) (gcc version 3.2.2 20030124 (Debian prerelease)) #4 Mon Jan 27 03:02:50 EST 2003
Video mode to be used for restore is f00
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000c0000 - 00000000000cc000 (reserved)
BIOS-e820: 0000000000100000 - 0000000013fec000 (usable)
BIOS-e820: 0000000013fec000 - 0000000013ff0000 (reserved)
BIOS-e820: 00000000ffe00000 - 0000000100000000 (reserved)
319MB LOWMEM available.
On node 0 totalpages: 81900
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 77804 pages, LIFO batch:16
HighMem zone: 0 pages, LIFO batch:1
Dell Inspiron with broken BIOS detected. Refusing to enable the local APIC.
Building zonelist for node : 0
Kernel command line: auto BOOT_IMAGE=Linux-2.5 ro root=302 devfs=mount gdb gdbttyS=1 gdbbaud=115200
Initializing CPU#0
PID hash table entries: 2048 (order 11: 16384 bytes)
Detected 498.395 MHz processor.
Console: colour VGA+ 80x25
Warning! Detected 2173 micro-second gap between interrupts.
Compensating for 1 lost ticks.
Call Trace:
[<c010b8a8>] handle_IRQ_event+0x38/0x60
[<c010bade>] do_IRQ+0xae/0x160
[<c0105000>] _stext+0x0/0x30
[<c010a150>] common_interrupt+0x18/0x20
[<c0105000>] _stext+0x0/0x30
Calibrating delay loop... 985.08 BogoMIPS
Memory: 321540k/327600k available (1328k kernel code, 5320k reserved, 396k data, 120k init, 0k highmem)
Dentry cache hash table entries: 65536 (order: 7, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 6, 262144 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
-> /dev
-> /dev/console
-> /root
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 128K
CPU: After generic, caps: 0383f9ff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU: Intel Celeron (Coppermine) stepping 03
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
tts/1 at I/O 0x2f8 (irq = 3) is a 16550A
Waiting for connection from remote gdb... <4>
Warning! Detected 6839271 micro-second gap between interrupts.
Compensating for 6838 lost ticks.
Call Trace:
[<c010b8a8>] handle_IRQ_event+0x38/0x60
[<c010bade>] do_IRQ+0xae/0x160
[<c010a850>] do_int3+0x0/0x80
[<c010a150>] common_interrupt+0x18/0x20
[<c010a850>] do_int3+0x0/0x80
[<c0115df8>] handle_exception+0x7a8/0x7f0
[<c01c905f>] vt_console_print+0x21f/0x310
[<c0105000>] _stext+0x0/0x30
[<c0115e7d>] breakpoint+0xd/0x10
[<c010a850>] do_int3+0x0/0x80
[<c010a8c9>] do_int3+0x79/0x80
[<c011e2d8>] release_console_sem+0xd8/0xe0
[<c010a1ed>] error_code+0x2d/0x38
[<c0105000>] _stext+0x0/0x30
[<c0115e7d>] breakpoint+0xd/0x10
[<c01cafb2>] gdb_hook+0xa2/0xf0
[<c01cae80>] gdb_interrupt+0x0/0x80
Connected.
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
mtrr: v2.0 (20020519)
PCI: PCI BIOS revision 2.10 entry at 0xfc0be, last bus=1
PCI: Using configuration type 1
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
...and so on
On Sun, 26 Jan 2003 23:10:15 -0800, Andrew Morton wrote:
>
> http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.59/2.5.59-mm6/
>
[...]
>
> antsched-update-1.patch
> Subject: [PATCH] 2.5.59-snap2 updates
On Mon, Jan 27, 2003 at 03:17:54AM -0500, Andres Salomon wrote:
> This one boots for me (with devfs enabled). I got some rather interesting
> stack dumps, however, during boot.
I'm experiencing similar problems without devfs...
> Warning! Detected 2173 micro-second gap between interrupts.
> Compensating for 1 lost ticks.
> Call Trace:
> [<c010b8a8>] handle_IRQ_event+0x38/0x60
> [<c010bade>] do_IRQ+0xae/0x160
> [<c0105000>] _stext+0x0/0x30
> [<c010a150>] common_interrupt+0x18/0x20
> [<c0105000>] _stext+0x0/0x30
Each of these warnings reproduces for each input device on my system
(there are 3 now, so if i disconnect, say, my USB mouse, there will be
only 2.)
In other news (this happened in -mm5, not sure if this happened to
others or not:)
Hangcheck: starting hangcheck timer 0.5.0 (tick is 180 seconds, margin
is 60 seconds).
Uninitialised timer!
This is just a warning. Your computer is OK
function=0xc0216100, data=0x0
Call Trace:
[<c0121ce1>] check_timer_failed+0x61/0x70
[<c0216100>] hangcheck_fire+0x0/0xc0
[<c0121e5f>] mod_timer+0x2f/0x180
[<c0105075>] init+0x35/0x160
[<c0105040>] init+0x0/0x160
[<c010713d>] kernel_thread_helper+0x5/0x18
No visible problems though, at all.
-Josh
Joshua Kwan wrote:
>
> On Mon, Jan 27, 2003 at 03:17:54AM -0500, Andres Salomon wrote:
> > This one boots for me (with devfs enabled). I got some rather interesting
> > stack dumps, however, during boot.
>
> I'm experiencing similar problems without devfs...
>
> > Warning! Detected 2173 micro-second gap between interrupts.
> > Compensating for 1 lost ticks.
> > Call Trace:
> > [<c010b8a8>] handle_IRQ_event+0x38/0x60
> > [<c010bade>] do_IRQ+0xae/0x160
> > [<c0105000>] _stext+0x0/0x30
> > [<c010a150>] common_interrupt+0x18/0x20
> > [<c0105000>] _stext+0x0/0x30
>
> Each of these warnings reproduces for each input device on my system
> (there are 3 now, so if i disconnect, say, my USB mouse, there will be
> only 2.)
This is debug stuff - it tells us which drivers are disabling interrupts for
more than one or two clock ticks. Please send the full trace so we can bug
the maintainers into fixing the drivers up.
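For anyone reading the numbers in these warnings: every gap/lost-tick pair
posted in this thread is consistent with a 1000 microsecond tick (HZ=1000),
with the one tick that was actually delivered left out of the count. That is
an inference from the logs, not something taken from lost-tick.patch. A
minimal sketch of the arithmetic, in which the helper name and the tick
length are both assumptions:

```c
#include <assert.h>

/* Assumption: HZ=1000, so one timer tick every 1000 microseconds.
 * The logs in this thread fit that value, but nobody states it. */
#define ASSUMED_USEC_PER_TICK 1000UL

/* Hypothetical helper, not the function used by lost-tick.patch.
 * Given the measured gap between two timer interrupts, return how many
 * ticks went missing. One tick is expected per gap, so it is excluded. */
unsigned long lost_ticks(unsigned long gap_usec, unsigned long usec_per_tick)
{
	assert(usec_per_tick > 0);

	/* Whole ticks that fit into the observed gap. */
	unsigned long elapsed = gap_usec / usec_per_tick;

	/* The interrupt that finally arrived accounts for one of them. */
	return elapsed > 1 ? elapsed - 1 : 0;
}
```

With these assumptions, lost_ticks(2173, ASSUMED_USEC_PER_TICK) gives 1 and
lost_ticks(6839271, ASSUMED_USEC_PER_TICK) gives 6838, matching the two
warnings in the boot log.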
> In other news (this happened in -mm5, not sure if this happened to
> others or not:)
>
> Hangcheck: starting hangcheck timer 0.5.0 (tick is 180 seconds, margin
> is 60 seconds).
> Uninitialised timer!
> This is just a warning. Your computer is OK
> function=0xc0216100, data=0x0
> Call Trace:
> [<c0121ce1>] check_timer_failed+0x61/0x70
> [<c0216100>] hangcheck_fire+0x0/0xc0
> [<c0121e5f>] mod_timer+0x2f/0x180
> [<c0105075>] init+0x35/0x160
> [<c0105040>] init+0x0/0x160
> [<c010713d>] kernel_thread_helper+0x5/0x18
Ah, bug. Thanks, I shall repair that.
> No visible problems though, at all.
>
No, the uninitialised timer detector fixes the timer up.
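To illustrate what "fixes the timer up" can mean, here is a userspace sketch
of one way such a detector might work: the init routine stamps a magic value
into the timer, and the modify path checks for it, warns, and initialises the
timer itself if the stamp is missing. All names here (sketch_timer,
sketch_check_timer and so on) and the magic value are hypothetical; this is
not the kernel's timer code.

```c
#include <assert.h>
#include <stdio.h>

/* Arbitrary marker value chosen for this sketch. */
#define SKETCH_TIMER_MAGIC 0x4b87ad6eU

/* Hypothetical stand-in for a kernel timer. */
struct sketch_timer {
	unsigned int magic;		/* stamped by sketch_init_timer() */
	unsigned long expires;
	void (*function)(unsigned long);
	unsigned long data;
};

void sketch_init_timer(struct sketch_timer *t)
{
	t->magic = SKETCH_TIMER_MAGIC;
}

/* Returns 1 if the timer had to be repaired, 0 if it was already set up. */
int sketch_check_timer(struct sketch_timer *t)
{
	if (t->magic == SKETCH_TIMER_MAGIC)
		return 0;

	fprintf(stderr, "Uninitialised timer!\n");
	sketch_init_timer(t);	/* fix it up so the caller can carry on */
	return 1;
}

/* Modify path: validate first, then update the expiry as usual. */
int sketch_mod_timer(struct sketch_timer *t, unsigned long expires)
{
	int repaired = sketch_check_timer(t);

	t->expires = expires;
	return repaired;
}
```

In this sketch the first sketch_mod_timer() call on a zeroed timer warns and
repairs it, and later calls pass the check silently, which matches the
"warning only, no visible problems" behaviour reported above.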
On Mon, Jan 27, 2003 at 12:40:59AM -0800, Andrew Morton wrote:
[snip]
> This is debug stuff - it tells us which drivers are disabling interrupts for
> more than one or two clock ticks. Please send the full trace so we can bug
> the maintainers into fixing the drivers up.
>
Sure:
------
Warning! Detected 30879 micro-second gap between interrupts.
Compensating for 29 lost ticks.
Call Trace:
[<c010a948>] handle_IRQ_event+0x38/0x60
[<c010ab77>] do_IRQ+0x97/0x120
[<c010957c>] common_interrupt+0x18/0x20
[<c02601f4>] i8042_command+0x94/0xc0
[<c02602b6>] i8042_aux_write+0x36/0x70
[<c025e1cd>] atkbd_sendbyte+0x7d/0x80
[<c025e2b1>] atkbd_command+0xe1/0xf0
[<c025e64b>] atkbd_probe+0x12b/0x180
[<c025e96a>] atkbd_connect+0x25a/0x2b0
[<c025fb93>] serio_find_dev+0x53/0x60
[<c0105075>] init+0x35/0x160
[<c0105040>] init+0x0/0x160
[<c010713d>] kernel_thread_helper+0x5/0x18
Warning! Detected 113343 micro-second gap between interrupts.
Compensating for 112 lost ticks.
Call Trace:
[<c010a948>] handle_IRQ_event+0x38/0x60
[<c010ab77>] do_IRQ+0x97/0x120
[<c010957c>] common_interrupt+0x18/0x20
[<c02601f4>] i8042_command+0x94/0xc0
[<c0260436>] i8042_close+0x46/0x90
[<c025ff81>] serio_close+0x11/0x20
[<c025e989>] atkbd_connect+0x279/0x2b0
[<c025fb93>] serio_find_dev+0x53/0x60
[<c0105075>] init+0x35/0x160
[<c0105040>] init+0x0/0x160
[<c010713d>] kernel_thread_helper+0x5/0x18
Warning! Detected 30145 micro-second gap between interrupts.
Compensating for 29 lost ticks.
Call Trace:
[<c010a948>] handle_IRQ_event+0x38/0x60
[<c010ab77>] do_IRQ+0x97/0x120
[<c010957c>] common_interrupt+0x18/0x20
[<c02601f4>] i8042_command+0x94/0xc0
[<c0260436>] i8042_close+0x46/0x90
[<c025ff81>] serio_close+0x11/0x20
[<c025fa7e>] psmouse_connect+0x19e/0x1c0
[<c025fb93>] serio_find_dev+0x53/0x60
[<c0105075>] init+0x35/0x160
[<c0105040>] init+0x0/0x160
[<c010713d>] kernel_thread_helper+0x5/0x18
---
>> Each of these warnings reproduces for each input device on my system
>> (there are 3 now, so if i disconnect, say, my USB mouse, there will be
>> only 2.)
A closer look tells me that this isn't quite true. Sorry..
Regards
Josh
Andrew Morton wrote:
>
> Which _looks_ like a request queueing problem, but Andres says it goes
> away when devfs is disabled in config. So I've dropped the smalldevfs
> patch for now - would be appreciated if devfs users could retest this
> patch, with CONFIG_DEVFS=y.
mm6 works where mm5 failed. You are probably right suspecting devfs,
I have devfs enabled although I don't actually use it. No problems
with RAID1 either.
I enabled the hangcheck timer, and get this now and then:
Warning! Detected 2106 micro-second gap between interrupts.
Compensating for 1 lost ticks.
Call Trace:
[<c010a6ad>] handle_IRQ_event+0x29/0x4c
[<c010a881>] do_IRQ+0xbd/0x138
[<c0106cc0>] default_idle+0x0/0x28
[<c0106cc0>] default_idle+0x0/0x28
[<c01093e0>] common_interrupt+0x18/0x20
[<c0106cc0>] default_idle+0x0/0x28
[<c0106cc0>] default_idle+0x0/0x28
[<c0106ce3>] default_idle+0x23/0x28
[<c0106d63>] cpu_idle+0x37/0x48
[<c0105000>] rest_init+0x0/0x50
[<c010504d>] rest_init+0x4d/0x50
Warning! Detected 2043 micro-second gap between interrupts.
Compensating for 1 lost ticks.
Call Trace:
[<c010a6ad>] handle_IRQ_event+0x29/0x4c
[<c010a881>] do_IRQ+0xbd/0x138
[<c0106cc0>] default_idle+0x0/0x28
[<c0106cc0>] default_idle+0x0/0x28
[<c01093e0>] common_interrupt+0x18/0x20
[<c0106cc0>] default_idle+0x0/0x28
[<c0106cc0>] default_idle+0x0/0x28
[<c0106ce3>] default_idle+0x23/0x28
[<c0106d63>] cpu_idle+0x37/0x48
[<c0105000>] rest_init+0x0/0x50
[<c010504d>] rest_init+0x4d/0x50
Helge Hafting
Hello mm-users,
> . The mysterious "machine hangs late in boot" problem has been narrowed
> down thanks to some great work by Andres Salomon. The machine is stuck
> waiting on I/O completion when performing the initial lookup for
> /sbin/devfs_helper:
I don't believe it to be an exclusively small-devfs helper problem.
It is an interaction at best. Sure I had problems using devfs-small, but
mm2 worked and mm3 was the first that halted during boot. Both have
devfs-small, and both need its helper. Or I am missing a subtlety here?
Secondly, Andrew sent me a rollup of patches against 2.5.59 he thought
were suspicious, without smalldevfs and it also halted, but at another
place in boot, at adding swap.
Can someone besides me confirm this behavior or am I the loon who just
won't understand?
Luuk
At 01:27 PM 1/27/2003 +0100, Luuk van der Duim wrote:
>Hello mm-users,
>
>
> . The mysterious "machine hangs late in boot" problem has been narrowed
> down thanks to some great work by Andres Salomon. The machine is stuck
> waiting on I/O completion when performing the initial lookup for
> /sbin/devfs_helper:
>
>
>I don't believe it to be an exclusively small-devfs helper problem.
Well, my test box agrees (I have never ever used devfs, but could lock hard
in minutes) mm6 works fine here, so I _think_ it's probably resolved...
>It is an interaction at best. Sure I had problems using devfs-small, but
>mm2 worked and mm3 was the first that halted during boot. Both have
>devfs-small, and both need its helper. Or I am missing a subtlety here?
I don't think you're missing anything, but I also don't know wtf the
interaction is. I put a couple of man-days into looking for it, and came
up with exactly nada of interest.
>Secondly, Andrew sent me a rollup of patches against 2.5.59 he thought
>were suspicious, without smalldevfs and it also halted, but at another
>place in boot, at adding swap.
Mine locked hard hard hard. Booted fine, but died reliably under heavy load.
(something seems funky with nmi_watchdog... hard lock = no_more_nmi_ticks
. Anybody out there know enough about local APIC to explain why idle=poll
gives nice 1 second nmi, but everything else depends upon cpu load?... and
why when hardlock happens, it _stops_)
>Can someone besides me confirm this behavior or am I the loon who just
>won't understand?
My box agrees that you're not a loon fwTw :)
-Mike
On Mon, 27 Jan 2003, Mike Galbraith wrote:
> (something seems funky with nmi_watchdog... hard lock = no_more_nmi_ticks
> . Anybody out there know enough about local APIC to explain why idle=poll
> gives nice 1 second nmi, but everything else depends upon cpu load?... and
> why when hardlock happens, it _stops_)
Because we base the performance counter on unhalted cycles, whilst the
normal idle function does an hlt. I think the K7 can do halted too.
Zwane
--
function.linuxpower.ca
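A simplified model of the explanation above, for readers following along: if
the watchdog NMI is raised after a fixed number of unhalted cycles, then the
wall-clock time between NMIs stretches as the CPU spends more time in hlt, and
a CPU that sits halted never reaches the threshold at all. The function name
and the percentage model are assumptions made for illustration; the 1 second
period is the figure Mike reported with idle=poll.

```c
#include <assert.h>

/* Hypothetical model, not kernel code.
 * period_ms:        time between NMIs when the CPU never halts
 *                   (the idle=poll case).
 * unhalted_percent: share of wall-clock time the CPU is not in hlt.
 * Returns the wall-clock milliseconds between watchdog NMIs, or -1 if
 * the counter never advances and so no NMI is ever raised. */
long nmi_interval_ms(long period_ms, int unhalted_percent)
{
	assert(period_ms > 0);
	assert(unhalted_percent >= 0 && unhalted_percent <= 100);

	if (unhalted_percent == 0)
		return -1;	/* always halted: the counter is frozen */

	/* Only unhalted time counts towards the next NMI. */
	return period_ms * 100 / unhalted_percent;
}
```

Under this model nmi_interval_ms(1000, 100) is 1000 (the steady 1 second NMI
seen with idle=poll), nmi_interval_ms(1000, 25) is 4000, and a fully halted
CPU yields -1, i.e. the NMIs stop.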
At 02:17 PM 1/27/2003 -0500, Zwane Mwaikambo wrote:
>On Mon, 27 Jan 2003, Mike Galbraith wrote:
>
> > (something seems funky with nmi_watchdog... hard lock = no_more_nmi_ticks
> > . Anybody out there know enough about local APIC to explain why idle=poll
> > gives nice 1 second nmi, but everything else depends upon cpu load?... and
> > why when hardlock happens, it _stops_)
>
>Because we base the performance counter on unhalted cycles, whilst the
>normal idle function does an hlt. I think the K7 can do halted too.
(well bugger, I _know_ I'm gonna regret this;)
When can the darn thing actually trigger an oops?
-Mike
On Mon, 27 Jan 2003, Mike Galbraith wrote:
> (well bugger, I _know_ I'm gonna regret this;)
>
> When can the darn thing actually trigger an oops?
Depends. I have seen hard locks where you don't get an oops. The NMI
watchdog will work if the kernel is still running but is, say, stuck in a busy
loop without the timer interrupt firing. Sometimes upping the interval
by using idle=poll does help me out. Otherwise your CPU or kernel is
really in a bad state.
Zwane
--
function.linuxpower.ca