ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.5/2.5.66/2.5.66-mm1/
. The anticipatory scheduler is in wrapup mode now. It is pretty much in
its final form.
. The ext2 locking changes have been significantly redone.
The per-blockgroup data structures had to go. For a 4TB filesystem we
cannot even kmalloc that many pointers, let alone data structures.
So the per-blockgroup spinlocking has been replaced with hashed
spinlocking and the per-blockgroup accounting has been removed. A "per-cpu
counter" thing has been invented to amortise the locking cost of the
filesystem-wide counters.
. ext3 is now using spinlocking in its block allocator rather than a
filesystem-wide semaphore.
It is stability-tested but I have not yet performance tested this
closely. It does appear to have improved the context switch problem (and
the file fragmentation problem which the context switch problem causes).
But there's a way to go here.
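For readers unfamiliar with the two ext2 techniques mentioned above, here is a minimal user-space sketch of hashed locking and of a batched "per-cpu counter". It is not the code from the patches. All names, the pool size, the slot count and the batch threshold are assumptions made up for illustration, and a tiny C11 spinlock stands in for the kernel's.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_HASHED_LOCKS 16  /* hypothetical pool size, independent of fs size */
#define NR_SLOTS         4  /* hypothetical stand-in for the number of CPUs */
#define BATCH           32  /* hypothetical local drift allowed before folding */

/* Minimal user-space spinlock, standing in for the kernel's spinlock_t. */
typedef struct { atomic_flag held; } sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l) { atomic_flag_clear(&l->held); }

static void sketch_lock(sketch_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hashed locking: every block group maps onto one lock of a fixed pool,
 * so the memory cost no longer grows with the number of block groups. */
static sketch_lock_t group_locks[NR_HASHED_LOCKS];

static void group_locks_init(void)
{
    for (int i = 0; i < NR_HASHED_LOCKS; i++)
        sketch_lock_init(&group_locks[i]);
}

static sketch_lock_t *group_lock(unsigned long group)
{
    return &group_locks[group % NR_HASHED_LOCKS];
}

/* Batched counter: each slot accumulates a private delta and only takes
 * the shared lock when that delta reaches BATCH in either direction. */
struct batched_counter {
    sketch_lock_t lock;      /* protects total */
    long total;              /* approximate global value */
    long local[NR_SLOTS];    /* per-slot deltas not yet folded into total */
};

static void counter_init(struct batched_counter *c, long initial)
{
    sketch_lock_init(&c->lock);
    c->total = initial;
    for (int i = 0; i < NR_SLOTS; i++)
        c->local[i] = 0;
}

/* Caller must own 'slot' (in the kernel: be pinned to that CPU). */
static void counter_mod(struct batched_counter *c, int slot, long delta)
{
    long v;

    assert(slot >= 0 && slot < NR_SLOTS);
    v = c->local[slot] + delta;
    if (v >= BATCH || v <= -BATCH) {
        sketch_lock(&c->lock);
        c->total += v;       /* fold the whole local delta in one go */
        sketch_unlock(&c->lock);
        v = 0;
    }
    c->local[slot] = v;
}

/* Approximate read: ignores deltas still sitting in the slots. */
static long counter_read(struct batched_counter *c)
{
    long v;

    sketch_lock(&c->lock);
    v = c->total;
    sketch_unlock(&c->lock);
    return v;
}
```

In this sketch the shared lock is taken once per BATCH updates from a slot instead of once per update. The cost is that counter_read() can lag the true value by up to NR_SLOTS * (BATCH - 1).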
Changes since 2.5.65-mm4:
linus.patch
Latest -bk
-nfsd-32-bit-dev_t-fixes.patch
-i2c-fix.patch
Merged
+kgdb-ga.patch
George Anzinger's gdb stub
+ppa-null-pointer-fix.patch
Might fix the parport scsi driver
+initcall-debug.patch
Debugging support for misbehaving initcalls
+posix-timers-64-bit-fix.patch
Timer fix for 64-bit machines
+slab-off-by-one-fix.patch
Slab was using too much memory.
+install_page-flush_cache_page.patch
Cache coherency bug in remap_file_pages()
+as-minor-tweaks.patch
+as-remove-stats.patch
Anticipatory scheduler tuning and cleanups.
+posix-timer-double-expiration-fix.patch
Posix timers were sending timer expiry info twice.
+hugh-01-no-SWAP_ERROR.patch
+hugh-02-try_to_unmap-CONFIG_SWAP.patch
+hugh-03-add_to_swap_cache.patch
+hugh-04-page_convert_anon-ENOMEM.patch
+hugh-05-page_convert_anon-unlocking.patch
+hugh-06-wrap-below-vm_start.patch
+hugh-07-objrmap-page_table_lock.patch
+hugh-08-rmap-comments.patch
+hugh-09-tmpfs-truncation.patch
+hugh-10-tmpfs-atomics.patch
+hugh-11-fix-unuse_pmd-fixme.patch
+hugh-12-vm_enough_memory-double-counts.patch
Various vm/mm fixes and cleanups
+ext3-max-file-size-fix.patch
Allow ext3 to create files larger than 32GB (should be nearly 2TB)
-ext2-no-lock_super.patch
-ext2-ialloc-no-lock_super.patch
+ext2-no-lock_super-ng.patch
+ext2-ialloc-no-lock_super-ng.patch
Rework the ext2 block and inode allocator locking changes.
+dev_t-remove-B_FREE.patch
Remove B_FREE.
+tty_io-cleanup.patch
+page_to_pfn-in-blk_queue_bounce.patch
+init_inode_once-bloat-fix.patch
Cleanups and fixlets
+compound-page-warning-fix.patch
Fix a warning
+slab-cache-sizes-cleanup.patch
Unduplicate some tables in slab.
+stat_t-larger-dev_t.patch
Large dev_t fix.
+acpi-build-fix.patch
make acpi compile.
+sync_blockdev-on-final-close.patch
Only write out blockdev mappings on the final close.
+ext3-concurrent-block-inode-allocation.patch
+ext3-concurrent-block-allocation-fix-1.patch
Use spinlocking in the ext3 block allocator, not a fs-wide semaphore.
All 104 patches:
linus.patch
mm.patch
add -mmN to EXTRAVERSION
kgdb-ga.patch
kgdb stub for ia32 (George Anzinger's one)
ppa-null-pointer-fix.patch
initcall-debug.patch
initcall debugging support
posix-timers-64-bit-fix.patch
POSIX timers interface long/int cleanup
slab-off-by-one-fix.patch
slab: fix off-by-one in size calculation
config_spinline.patch
uninline spinlocks for profiling accuracy.
ppc64-reloc_hide.patch
ppc64-pci-patch.patch
Subject: pci patch
ppc64-aio-32bit-emulation.patch
32/64bit emulation for aio
ppc64-scruffiness.patch
Fix some PPC64 compile warnings
sym-do-160.patch
make the SYM driver do 160 MB/sec
install_page-flush_cache_page.patch
add flush_cache_page() to install_page()
config-PAGE_OFFSET.patch
Configurable kernel/user memory split
ptrace-flush.patch
cache flushing in the ptrace code
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
reiserfs_file_write-5.patch
rcu-stats.patch
RCU statistics reporting
ext3-journalled-data-assertion-fix.patch
Remove incorrect assertion from ext3
nfs-speedup.patch
nfs-oom-fix.patch
nfs oom fix
sk-allocation.patch
Subject: Re: nfs oom
nfs-more-oom-fix.patch
rpciod-atomic-allocations.patch
Make rpciod use atomic allocations
linux-isp.patch
isp-update-1.patch
kblockd.patch
Create `kblockd' workqueue
as-iosched.patch
anticipatory I/O scheduler
as-np-reads-1.patch
AS: read-vs-read fixes
as-np-reads-2.patch
AS: more read-vs-read fixes
as-predict-data-direction.patch
as: predict direction of next IO
as-remove-frontmerge.patch
AS: remove frontmerge tunable
as-misc-cleanups.patch
AS: misc cleanups
as-minor-tweaks.patch
AS: tuning and tweaks
as-remove-stats.patch
AS: remove statistics
cfq-2.patch
CFQ scheduler, #2
unplug-use-kblockd.patch
Use kblockd for running request queues
fremap-all-mappings.patch
Make all executable mappings be nonlinear
objrmap-2.5.62-5.patch
object-based rmap
sched-2.5.64-D3.patch
sched-2.5.64-D3, more interactivity changes
scheduler-tunables.patch
scheduler tunables
show_task-free-stack-fix.patch
show_task() fix and cleanup
yellowfin-set_bit-fix.patch
yellowfin driver set_bit fix
htree-nfs-fix.patch
Fix ext3 htree / NFS compatibility problems
task_prio-fix.patch
simple task_prio() fix
slab_store_user-large-objects.patch
slab debug: perform redzoning against larger objects
pcmcia-2.patch
pcmcia-3b.patch
pcmcia-3.patch
pcmcia-4.patch
pcmcia-5.patch
pcmcia-6.patch
pcmcia-7b.patch
pcmcia-7.patch
pcmcia-8.patch
pcmcia-9.patch
pcmcia-10.patch
htree-nfs-fix-2.patch
htree nfs fix
posix-timer-double-expiration-fix.patch
posix timers: fix double-reporting of timer expiration
hugh-01-no-SWAP_ERROR.patch
swap 01/13 no SWAP_ERROR
hugh-02-try_to_unmap-CONFIG_SWAP.patch
swap 02/13 !CONFIG_SWAP try_to_unmap
hugh-03-add_to_swap_cache.patch
swap 03/13 add_to_swap_cache
hugh-04-page_convert_anon-ENOMEM.patch
swap 04/13 page_convert_anon -ENOMEM
hugh-05-page_convert_anon-unlocking.patch
swap 05/13 page_convert_anon unlocking
hugh-06-wrap-below-vm_start.patch
swap 06/13 wrap below vm_start
hugh-07-objrmap-page_table_lock.patch
swap 07/13 objrmap page_table_lock
hugh-08-rmap-comments.patch
swap 08/13 rmap comments
hugh-09-tmpfs-truncation.patch
swap 09/13 tmpfs truncation
hugh-10-tmpfs-atomics.patch
swap 10/13 tmpfs atomics
hugh-11-fix-unuse_pmd-fixme.patch
swap 11/13 fix unuse_pmd fixme
hugh-12-vm_enough_memory-double-counts.patch
swap 12/13 vm_enough_memory double counts
ext3-max-file-size-fix.patch
ext3: fix max file size
ext2-no-lock_super-ng.patch
ext2-ialloc-no-lock_super-ng.patch
linear-oops-fix-1.patch
md/linear oops fix
dev_t-32-bit.patch
[for playing only] change type of dev_t
dev_t-remove-B_FREE.patch
dev_t: eliminate B_FREE
dev_t-drm-warnings.patch
dev_t: fix drm printk warnings
sg-dev_t-fix.patch
32-bit dev_t fix for sg
oops-dump-preceding-code.patch
i386 oops output: dump preceding code
x86-clock-override-option.patch
x86 clock override boot option
tty_io-cleanup.patch
tty_io cleanup
page_to_pfn-in-blk_queue_bounce.patch
Subject: use page_to_pfn() in __blk_queue_bounce()
init_inode_once-bloat-fix.patch
Subject: init_inode_once() wants sizeof(struct hlist_head)
conntrack-use-after-free-fix.patch
fix use-after-free in ip_conntrack
VM_DONTEXPAND-fix.patch
honour VM_DONTEXPAND in vma merging
compound-page-warning-fix.patch
Fix 64bit warnings in mm/page_alloc.c
cdevname-irq-safety-fix.patch
make cdevname() callable from interrupts
register_chrdev_region-leak-fix.patch
register_chrdev_region() leak and race fix
slab-cache-sizes-cleanup.patch
slab: cache sizes cleanup
stat_t-larger-dev_t.patch
struct stat - support larger dev_t
acpi-build-fix.patch
ACPI build fix
sync_blockdev-on-final-close.patch
sync blockdevs on the final close only
ext3_mark_inode_dirty-speedup.patch
ext3_mark_inode_dirty() speedup
ext3_mark_inode_dirty-less-calls.patch
ext3_commit_write speedup
ext3-handle-cache.patch
ext3: create a slab cache for transaction handles
ext3-no-bkl.patch
journal_dirty_metadata-speedup.patch
journal_get_write_access-speedup.patch
ext3-concurrent-block-inode-allocation.patch
Subject: [PATCH] concurrent block/inode allocation for EXT3
ext3-concurrent-block-allocation-fix-1.patch
Hi Andrew,
Got this oops after about 20 hours with mm1 (65-mm3 lasted 5 days
until I rebooted).
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c011516d
*pde = 00000000
Oops: 0002 [#1]
CPU: 0
EIP: 0060:[<c011516d>] Not tainted VLI
EFLAGS: 00010097
EIP is at schedule+0x8d/0x3a0
eax: 00000001 ebx: cf5e99c0 ecx: cf5e99c0 edx: ffffffff
esi: 00000000 edi: c031de00 ebp: cf5ebf08 esp: cf5ebef0
ds: 007b es: 007b ss: 0068
Process newsplex (pid: 1205, threadinfo=cf5ea000 task=cf5e99c0)
Stack: c011fbd7 c02bbc40 00000246 05261e41 cf5ebf14 cf5ebf50 cf5ebf3c c0120754
cf5ebf14 c02bc538 c02bc538 05261e41 4b87ad6e c01206e0 cf5e99c0 c02bbc40
c015abd6 000007d1 00000000 cf5ebf60 c015ac19 cf5ea000 cf5ea000 00000000
Call Trace:
[<c011fbd7>] add_timer+0x57/0xa0
[<c0120754>] schedule_timeout+0x54/0xa0
[<c01206e0>] process_timeout+0x0/0x20
[<c015abd6>] do_poll+0x56/0xc0
[<c015ac19>] do_poll+0x99/0xc0
[<c015ad88>] sys_poll+0x148/0x220
[<c013eb3b>] sys_mprotect+0x21b/0x22f
[<c01079ec>] sys_clone+0x2c/0x60
[<c015a200>] __pollwait+0x0/0xc0
[<c0109277>] syscall_call+0x7/0xb
Code: 40 17 04 75 4d 8b 03 85 c0 74 47 48 0f 84 da 02 00 00 ff 0d 00 de 31 c0 8b 43 68 ff 08 8b 03 83 f8 02 0f 84 b6 02 00 00 8b 73 28 <ff> 4e 00 8b 53 24 8b 43 20 89 50 04 89 02 8b 4b 18 8d 14 ce 8d
<6>note: newsplex[1205] exited with preempt_count 2
Debug: sleeping function called from illegal context at include/linux/rwsem.h:43
Call Trace:
[<c01168d3>] __might_sleep+0x53/0x60
[<c01198d5>] profile_exit_task+0x15/0x60
[<c011aee6>] do_exit+0x86/0x460
[<c0109ab5>] die+0x75/0x80
[<c0113854>] do_page_fault+0x134/0x45e
[<c0114798>] try_to_wake_up+0x138/0x240
[<c011fde4>] mod_timer+0x124/0x180
[<c012a520>] nanosleep_wake_up+0x0/0x20
[<c0131feb>] buffered_rmqueue+0xab/0x140
[<c0132103>] __alloc_pages+0x83/0x280
[<c0113720>] do_page_fault+0x0/0x45e
[<c01094dd>] error_code+0x2d/0x40
[<c011516d>] schedule+0x8d/0x3a0
[<c011fbd7>] add_timer+0x57/0xa0
[<c0120754>] schedule_timeout+0x54/0xa0
[<c01206e0>] process_timeout+0x0/0x20
[<c015abd6>] do_poll+0x56/0xc0
[<c015ac19>] do_poll+0x99/0xc0
[<c015ad88>] sys_poll+0x148/0x220
[<c013eb3b>] sys_mprotect+0x21b/0x22f
[<c01079ec>] sys_clone+0x2c/0x60
[<c015a200>] __pollwait+0x0/0xc0
[<c0109277>] syscall_call+0x7/0xb
Hope this helps
Ed Tomlinson
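A note for readers decoding the "Oops: 0002" line in the report above. On i386 that number is the page-fault error code: bit 0 distinguishes a protection violation from a not-present page, bit 1 is set for a write, and bit 2 is set when the fault came from user mode. The small decoder below is only an illustrative sketch. The macro and function names are chosen for this example and are not taken from the kernel source of the time.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bits of the i386 page-fault error code printed after "Oops:".
 * Names are hypothetical, picked for this sketch. */
#define SKETCH_PF_PROT  0x1  /* 0: page not present, 1: protection violation */
#define SKETCH_PF_WRITE 0x2  /* 0: read access,      1: write access */
#define SKETCH_PF_USER  0x4  /* 0: kernel mode,      1: user mode */

/* Write a human-readable description of 'code' into buf and return buf. */
static char *describe_oops(unsigned int code, char *buf, size_t len)
{
    snprintf(buf, len, "%s, %s, %s",
             (code & SKETCH_PF_WRITE) ? "write" : "read",
             (code & SKETCH_PF_USER) ? "user mode" : "kernel mode",
             (code & SKETCH_PF_PROT) ? "protection fault" : "page not present");
    return buf;
}
```

Applied to 0x0002 this gives "write, kernel mode, page not present", which matches a kernel-mode store through a NULL pointer.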
Ed Tomlinson wrote:
>
> Hi Andrew,
>
> Got this oops after about 20 hours with mm1 (65-mm3 lasted 5 days
> until I rebooted).
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c011516d
> *pde = 00000000
> Oops: 0002 [#1]
> CPU: 0
> EIP: 0060:[<c011516d>] Not tainted VLI
> EFLAGS: 00010097
> EIP is at schedule+0x8d/0x3a0
> eax: 00000001 ebx: cf5e99c0 ecx: cf5e99c0 edx: ffffffff
> esi: 00000000 edi: c031de00 ebp: cf5ebf08 esp: cf5ebef0
> ds: 007b es: 007b ss: 0068
> Process newsplex (pid: 1205, threadinfo=cf5ea000 task=cf5e99c0)
> Stack: c011fbd7 c02bbc40 00000246 05261e41 cf5ebf14 cf5ebf50 cf5ebf3c c0120754
> cf5ebf14 c02bc538 c02bc538 05261e41 4b87ad6e c01206e0 cf5e99c0 c02bbc40
> c015abd6 000007d1 00000000 cf5ebf60 c015ac19 cf5ea000 cf5ea000 00000000
> Call Trace:
> [<c011fbd7>] add_timer+0x57/0xa0
> [<c0120754>] schedule_timeout+0x54/0xa0
> [<c01206e0>] process_timeout+0x0/0x20
> [<c015abd6>] do_poll+0x56/0xc0
> [<c015ac19>] do_poll+0x99/0xc0
> [<c015ad88>] sys_poll+0x148/0x220
> [<c013eb3b>] sys_mprotect+0x21b/0x22f
> [<c01079ec>] sys_clone+0x2c/0x60
> [<c015a200>] __pollwait+0x0/0xc0
> [<c0109277>] syscall_call+0x7/0xb
>
> Code: 40 17 04 75 4d 8b 03 85 c0 74 47 48 0f 84 da 02 00 00 ff 0d 00 de 31 c0 8b 43 68 ff 08 8b 03 83 f8 02 0f 84 b6 02 00 00 8b 73 28 <ff> 4e 00 8b 53 24 8b 43 20 89 50 04 89 02 8b 4b 18 8d 14 ce 8d
That longer Code: line is really handy.
You died in schedule()->deactivate_task()->dequeue_task().
static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
{
array->nr_active--;
`array' is zero.
I'm going to Cc Ingo and run away. Ed uses preempt.
On Thu, 27 Mar 2003, Andrew Morton wrote:
> That longer Code: line is really handy.
>
> You died in schedule()->deactivate_task()->dequeue_task().
>
> static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
> {
> array->nr_active--;
>
> `array' is zero.
>
> I'm going to Cc Ingo and run away. Ed uses preempt.
hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
we deactivate a task, we remove it from the runqueue and set p->array to
NULL. Whenever we activate a task again, we set p->array to non-NULL. A
double-deactivate is not possible. I tried to reproduce it with various
scheduler workloads, but didn't succeed.
Mike, do you have a backtrace of the crash you saw?
Ingo
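The invariant described in the message above can be modelled in a few lines of user-space C. This is a simplified sketch, not the scheduler's code: the struct and function names are made up, the runqueue and locking are left out, and the NULL check in model_deactivate() is added for the model only, so that the "impossible" double-deactivate is reported instead of dereferencing a NULL array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for prio_array_t and task_struct. */
struct model_prio_array {
    int nr_active;                   /* number of tasks queued on this array */
};

struct model_task {
    struct model_prio_array *array;  /* NULL while the task is not queued */
};

/* Activation queues the task and makes p->array non-NULL. */
static void model_activate(struct model_task *p, struct model_prio_array *array)
{
    assert(p->array == NULL);        /* must not already be queued */
    array->nr_active++;
    p->array = array;
}

/* Deactivation dequeues the task and resets p->array to NULL.
 * Returns 0 on success, -1 if the task was already off the queue;
 * that second case is where an unchecked array->nr_active-- would oops. */
static int model_deactivate(struct model_task *p)
{
    if (p->array == NULL)
        return -1;
    p->array->nr_active--;
    p->array = NULL;
    return 0;
}
```

With this model, a second model_deactivate() on the same task returns -1, which corresponds to the NULL `array` seen in the oops.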
At 11:45 AM 3/28/2003 +0100, Ingo Molnar wrote:
>On Thu, 27 Mar 2003, Andrew Morton wrote:
>
> > That longer Code: line is really handy.
> >
> > You died in schedule()->deactivate_task()->dequeue_task().
> >
> > static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
> > {
> > array->nr_active--;
> >
> > `array' is zero.
> >
> > I'm going to Cc Ingo and run away. Ed uses preempt.
>
>hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
>we deactivate a task, we remove it from the runqueue and set p->array to
>NULL. Whenever we activate a task again, we set p->array to non-NULL. A
>double-deactivate is not possible. I tried to reproduce it with various
>scheduler workloads, but didn't succeed.
>
>Mike, do you have a backtrace of the crash you saw?
No, I didn't save it due to "grubby fingerprints".
-Mike
On Fri, 28 Mar 2003, Mike Galbraith wrote:
> >hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
> >we deactivate a task, we remove it from the runqueue and set p->array to
> >NULL. Whenever we activate a task again, we set p->array to non-NULL. A
> >double-deactivate is not possible. I tried to reproduce it with various
> >scheduler workloads, but didn't succeed.
> >
> >Mike, do you have a backtrace of the crash you saw?
>
> No, I didn't save it due to "grubby fingerprints".
Hmm i think i may have hit this one but i never posted due to being unable
to reproduce it on a vanilla kernel or the same kernel afterwards (which
was hacked so i won't vouch for its cleanliness). I think preempt
might have bitten him in a bad place (mine is also CONFIG_PREEMPT), is it
possible that when we did the task_rq_unlock we got preempted and when we
got back we used the local variable requeue_waker which was set before
dropping the lock, and therefore might not be valid anymore due to
scheduler decisions done after dropping the runqueue lock?
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
c011b8d9
*pde = 00000000
Oops: 0000 [#1]
CPU: 0
EIP: 0060:[<c011b8d9>] Not tainted
EFLAGS: 00010046
EIP is at try_to_wake_up+0x1e9/0x4f0
eax: c055a000 ebx: c04e5aa0 ecx: c0552fc0 edx: c04e5aa0
esi: 00000000 edi: 00000000 ebp: c055bee4 esp: c055beb8
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c055a000 task=c04e5aa0)
Stack: 00000001 c055a000 c0552fc0 00000000 cb1a0000 00000001 00000001 00000002
00000000 c04e88e4 00000001 c055bf08 c011d172 c1694700 00000001 00000000
c04e88e4 c04e88dc c055a000 00000001 c055bf3c c011d203 c04e88dc 00000001
Call Trace:
[<c011d172>] __wake_up_common+0x32/0x60
[<c011d203>] __wake_up+0x63/0xb0
[<c0122fb5>] release_console_sem+0x165/0x170
[<c0122d7b>] printk+0x1eb/0x270
[<c015e210>] invalidate_bh_lru+0x0/0x60
[<c015e210>] invalidate_bh_lru+0x0/0x60
[<c015e210>] invalidate_bh_lru+0x0/0x60
[<c01163f2>] smp_call_function_interrupt+0x42/0xb0
[<c015e210>] invalidate_bh_lru+0x0/0x60
[<c0106eb0>] default_idle+0x0/0x40
[<c010a41a>] call_function_interrupt+0x1a/0x20
[<c0106eb0>] default_idle+0x0/0x40
[<c0106ede>] default_idle+0x2e/0x40
[<c0106f6a>] cpu_idle+0x3a/0x50
[<c0105000>] rest_init+0x0/0x80
Code: 8b 06 48 89 06 8b 4a 24 8b 42 20 89 01 89 48 04 8b 4a 18 8d
0xc011b8d9 is in try_to_wake_up (kernel/sched.c:282).
277 /*
278 * Adding/removing a task to/from a priority array:
279 */
280 static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
281 {
282 array->nr_active--;
283 list_del(&p->run_list);
284 if (list_empty(array->queue + p->prio))
285 __clear_bit(p->prio, array->bitmap);
286 }
(gdb) list *__wake_up_common+0x32
0xc011d1b2 is in __wake_up_common (kernel/sched.c:1424).
1419 list_for_each_safe(tmp, next, &q->task_list) {
1420 wait_queue_t *curr;
1421 unsigned flags;
1422 curr = list_entry(tmp, wait_queue_t, task_list);
1423 flags = curr->flags;
1424 if (curr->func(curr, mode, sync) &&
1425 (flags & WQ_FLAG_EXCLUSIVE) &&
1426 !--nr_exclusive)
1427 break;
1428 }
(gdb) list *__wake_up+0x62
0xc011d242 is in __wake_up (kernel/sched.c:1445).
1440
1441 if (unlikely(!q))
1442 return;
1443
1444 spin_lock_irqsave(&q->lock, flags);
1445 __wake_up_common(q, mode, nr_exclusive, 0);
1446 spin_unlock_irqrestore(&q->lock, flags);
1447 }
1448
1449 /*
--
function.linuxpower.ca
On Fri, 28 Mar 2003, Zwane Mwaikambo wrote:
> Hmm i think i may have hit this one but i never posted due to being
> unable to reproduce it on a vanilla kernel or the same kernel afterwards
> (which was hacked so i won't vouch for its cleanliness). I think
> preempt might have bitten him in a bad place (mine is also
> CONFIG_PREEMPT), is it possible that when we did the task_rq_unlock we
> got preempted and when we got back we used the local variable
> requeue_waker which was set before dropping the lock, and therefore
> might not be valid anymore due to scheduler decisions done after
> dropping the runqueue lock?
yes, this one was my only suspect, but it should really never cause any
problems. We might change sleep_avg during the wakeup, and carry the
requeue_waker flag over a preemptible window, but the requeueing itself
re-takes the runqueue lock, and does not take anything for granted. The
flag could very well be random as well, and the code should still be
correct - there's no requirement to recalculate the priority every time we
change sleep_avg. (in fact we at times intentionally keep those values
detached.)
Ingo
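The argument above (a flag computed before the lock is dropped is harmless as long as the later step re-takes the lock and re-checks everything it relies on) can be sketched in user-space C as follows. This is an illustration of the pattern only, not the code in try_to_wake_up(). The types, the on_queue field and the priority formula are assumptions invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Minimal user-space spinlock, standing in for the runqueue lock. */
typedef struct { atomic_flag held; } sketch_lock_t;

static void sketch_lock(sketch_lock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* spin until the holder releases the lock */
}

static void sketch_unlock(sketch_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

struct model_rq {
    sketch_lock_t lock;
};

struct model_task {
    int on_queue;    /* hypothetical: non-zero while the task is queued */
    int prio;
    int sleep_avg;
};

static void model_rq_init(struct model_rq *rq)
{
    atomic_flag_clear(&rq->lock.held);
}

/* 'hint' plays the role of a flag computed before the lock was dropped.
 * It only decides whether to bother; every fact used for the requeue is
 * re-read under the lock, so a stale or even random hint stays safe.
 * Returns 1 if the task was requeued, 0 otherwise. */
static int model_requeue(struct model_rq *rq, struct model_task *p, int hint)
{
    int requeued = 0;

    if (!hint)
        return 0;
    sketch_lock(&rq->lock);
    if (p->on_queue) {                 /* re-validate under the lock */
        p->prio = 120 - p->sleep_avg;  /* hypothetical priority formula */
        requeued = 1;
    }
    sketch_unlock(&rq->lock);
    return requeued;
}
```

If the task left the queue while the lock was dropped, model_requeue() does nothing. A wrong hint therefore costs at most a needless lock round trip or a skipped priority update.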
At 09:56 AM 3/28/2003 -0500, Zwane Mwaikambo wrote:
>On Fri, 28 Mar 2003, Mike Galbraith wrote:
>
> > >hm, this is an 'impossible' scenario from the scheduler code POV. Whenever
> > >we deactivate a task, we remove it from the runqueue and set p->array to
> > >NULL. Whenever we activate a task again, we set p->array to non-NULL. A
> > >double-deactivate is not possible. I tried to reproduce it with various
> > >scheduler workloads, but didn't succeed.
> > >
> > >Mike, do you have a backtrace of the crash you saw?
> >
> > No, I didn't save it due to "grubby fingerprints".
>
>Hmm i think i may have hit this one but i never posted due to being unable
>to reproduce it on a vanilla kernel or the same kernel afterwards (which
>was hacked so i won't vouch for its cleanliness). I think preempt
>might have bitten him in a bad place (mine is also CONFIG_PREEMPT), is it
>possible that when we did the task_rq_unlock we got preempted and when we
>got back we used the local variable requeue_waker which was set before
>dropping the lock, and therefore might not be valid anymore due to
>scheduler decisions done after dropping the runqueue lock?
Dunno. I did have one lying around. The attached one was while printing
out array switch latency after starvation timeout. Others happened while
printing wakeup stats for p->state > 1 tasks in scheduler_tick() [under
lock w/ wakeup disabled in printk.c]. It's nothing I did to the scheduler
;-) I don't think, but this was in 65-mm3-twiddle-twiddle-twiddle.
>Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
>c011b8d9
>*pde = 00000000
>Oops: 0000 [#1]
>CPU: 0
>EIP: 0060:[<c011b8d9>] Not tainted
>EFLAGS: 00010046
>EIP is at try_to_wake_up+0x1e9/0x4f0
>eax: c055a000 ebx: c04e5aa0 ecx: c0552fc0 edx: c04e5aa0
>esi: 00000000 edi: 00000000 ebp: c055bee4 esp: c055beb8
>ds: 007b es: 007b ss: 0068
>Process swapper (pid: 0, threadinfo=c055a000 task=c04e5aa0)
>Stack: 00000001 c055a000 c0552fc0 00000000 cb1a0000 00000001 00000001 00000002
> 00000000 c04e88e4 00000001 c055bf08 c011d172 c1694700 00000001 00000000
> c04e88e4 c04e88dc c055a000 00000001 c055bf3c c011d203 c04e88dc 00000001
>Call Trace:
> [<c011d172>] __wake_up_common+0x32/0x60
> [<c011d203>] __wake_up+0x63/0xb0
> [<c0122fb5>] release_console_sem+0x165/0x170
> [<c0122d7b>] printk+0x1eb/0x270
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c01163f2>] smp_call_function_interrupt+0x42/0xb0
> [<c015e210>] invalidate_bh_lru+0x0/0x60
> [<c0106eb0>] default_idle+0x0/0x40
> [<c010a41a>] call_function_interrupt+0x1a/0x20
> [<c0106eb0>] default_idle+0x0/0x40
> [<c0106ede>] default_idle+0x2e/0x40
> [<c0106f6a>] cpu_idle+0x3a/0x50
> [<c0105000>] rest_init+0x0/0x80
>
>Code: 8b 06 48 89 06 8b 4a 24 8b 42 20 89 01 89 48 04 8b 4a 18 8d
>
>0xc011b8d9 is in try_to_wake_up (kernel/sched.c:282).
>277 /*
>278 * Adding/removing a task to/from a priority array:
>279 */
>280 static inline void dequeue_task(struct task_struct *p, prio_array_t *array)
>281 {
>282 array->nr_active--;
>283 list_del(&p->run_list);
>284 if (list_empty(array->queue + p->prio))
>285 __clear_bit(p->prio, array->bitmap);
>286 }
Same spot.
-Mike
At 04:25 PM 3/28/2003 +0100, Ingo Molnar wrote:
>On Fri, 28 Mar 2003, Zwane Mwaikambo wrote:
>
> > Hmm i think i may have hit this one but i never posted due to being
> > unable to reproduce it on a vanilla kernel or the same kernel afterwards
> > (which was hacked so i won't vouch for its cleanliness). I think
> > preempt might have bitten him in a bad place (mine is also
> > CONFIG_PREEMPT), is it possible that when we did the task_rq_unlock we
> > got preempted and when we got back we used the local variable
> > requeue_waker which was set before dropping the lock, and therefore
> > might not be valid anymore due to scheduler decisions done after
> > dropping the runqueue lock?
>
>yes, this one was my only suspect, but it should really never cause any
>problems. We might change sleep_avg during the wakeup, and carry the
>requeue_waker flag over a preemptible window, but the requeueing itself
>re-takes the runqueue lock, and does not take anything for granted. The
>flag could very well be random as well, and the code should still be
>correct - there's no requirement to recalculate the priority every time we
>change sleep_avg. (in fact we at times intentionally keep those values
>detached.)
In my 66-twiddle tree, I moved that under the lock out of pure paranoia. I
can try to see if printing under hefty (very) load will still trigger the
occasional explosion.
-Mike