2003-08-26 17:36:08

by Roger Luethi

[permalink] [raw]
Subject: [2.6] kernel BUG at arch/i386/mm/highmem.c:14!

This BUG immediately followed my turning off DMA on hda (with a mounted
partition and dirty buffers). Register and stack info are available upon
request (screenshot). 2.6.0-test4.

Roger

kernel BUG at arch/i386/mm/highmem.c:14!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c011bc36>] Not tainted
EFLAGS: 00010206
EIP is at kunmap+0x16/0x30
[...]
Process pdflush [...]

[<c0158f69>] __blk_queue_bounce+0x1e9/0x230
[<c0158fe7>] blk_queue_bounce+0x37/0x60
[<c022be20>] __make_request+0x50/0x670
[<c023ccb0>] task_mulout_intr+0x0/0x290
[<c022c584>] generic_make_request+0x144/0x1e0
[<c0176fec>] bio_alloc+0xcc/0x1a0
[<c022c65d>] submit_bio+0x3d/0x70
[<c01c80e0>] reiserfs_write_full_page+0x210/0x350
[<c01c8265>] reiserfs_writepage+0x25/0x40
[<c019ed3f>] mpage_writepages+0x38f/0x4d0
[<c01c8240>] reiserfs_writepage+0x0/0x40
[<c014e375>] do_writepages+0x35/0x40
[<c019c644>] __sync_single_inode+0x194/0x470
[<c010a10c>] common_interrupt+0x18/0x20
[<c019cce1>] sync_sb_inodes+0x1d1/0x3d0
[<c019d041>] writeback_inodes+0x161/0x420
[<c014d0a9>] get_page_state+0x19/0x20
[<c014e1bc>] wb_kupdate+0xdc/0x160
[<c014eb0c>] __pdflush+0x21c/0x5f0
[<c014eee0>] pdflush+0x0/0x20
[<c014eef1>] pdflush+0x11/0x20
[<c014e0e0>] wb_kupdate+0x0/0x160
[<c0107259>] kernel_thread_helper+0x5/0xc


2003-08-26 18:01:57

by Jens Axboe

[permalink] [raw]
Subject: Re: [2.6] kernel BUG at arch/i386/mm/highmem.c:14!

On Tue, Aug 26 2003, Roger Luethi wrote:
> This BUG immediately followed my turning off DMA on hda (with a mounted
> partition and dirty buffers). Register and stack info are available upon
> request (screenshot). 2.6.0-test4.
>
> Roger
>
> kernel BUG at arch/i386/mm/highmem.c:14!
> invalid operand: 0000 [#1]
> CPU: 0
> EIP: 0060:[<c011bc36>] Not tainted
> EFLAGS: 00010206
> EIP is at kunmap+0x16/0x30
> [...]
> Process pdflush [...]
>
> [<c0158f69>] __blk_queue_bounce+0x1e9/0x230
> [<c0158fe7>] blk_queue_bounce+0x37/0x60

Funky, it is indeed strictly forbidden to enter this path with
interrupts disabled.

> [<c022be20>] __make_request+0x50/0x670
> [<c023ccb0>] task_mulout_intr+0x0/0x290
> [<c022c584>] generic_make_request+0x144/0x1e0

Possibly some artifact of using read/write multiple on IDE, does it work
if you disable that with -m0 on the device? I'm guessing the error is
100% reproducible?

--
Jens Axboe

2003-08-26 19:03:26

by Roger Luethi

[permalink] [raw]
Subject: Re: [2.6] kernel BUG at arch/i386/mm/highmem.c:14!

On Tue, 26 Aug 2003 20:01:46 +0200, Jens Axboe wrote:
> Possibly some artifact of using read/write multiple on IDE, does it work
> if you disable that with -m0 on the device? I'm guessing the error is
> 100% reproducible?

I'm guessing you're right (small sample size, though).

I made one (successful) attempt at reproducing the BUG (copy some 300 MB of
data to a partition, then disable DMA on the target disk).

I made two further attempts but with all kernel output on the serial
console, one with -m16 and one with -m0. Both resulted in a flood of "bad:
scheduling while atomic!", but no BUG. The -m0 version looked like a flood
in slow motion (line by line).

I suppose the first BUG had the scheduling messages, too, but I didn't see
them because they had scrolled past by the time I got to the console. I did
see those "bad: scheduling while atomic!" messages, tough, when I
reproduced the BUG.

Call traces would look like this:
----------------------------------------------------------------------------
hda: DMA disabled
bad: scheduling while atomic!
Call Trace:
[<c011ce00>] schedule+0x510/0x520
[<c022a820>] blk_run_queues+0x120/0x300
[<c0147e50>] find_get_page+0x70/0x140
[<c011ed6e>] io_schedule+0xe/0x20
[<c0171f7f>] __wait_on_buffer+0xcf/0xe0
[<c011fa80>] autoremove_wake_function+0x0/0x50
[<c011fa80>] autoremove_wake_function+0x0/0x50
[<c01de17a>] flush_commit_list+0x31a/0x440
[<c01e28fb>] do_journal_end+0x5fb/0xc00
[<c01e1b79>] flush_old_commits+0x139/0x1d0
[<c01cf940>] reiserfs_write_super+0x30/0x40
[<c01791bf>] sync_supers+0x1ef/0x280
[<c014e143>] wb_kupdate+0x63/0x160
[<c011cae2>] schedule+0x1f2/0x520
[<c014eb0c>] __pdflush+0x21c/0x5f0
[<c014eee0>] pdflush+0x0/0x20
[<c014eef1>] pdflush+0x11/0x20
[<c014e0e0>] wb_kupdate+0x0/0x160
[<c0107259>] kernel_thread_helper+0x5/0xc

bad: scheduling while atomic!
Call Trace:
[<c011ce00>] schedule+0x510/0x520
[...]
----------------------------------------------------------------------------

Or this:

----------------------------------------------------------------------------
hda: DMA disabled
bad: scheduling while atomic!
Call Trace:
[<c011ce00>] schedule+0x510/0x520
[<c014ea26>] __pdflush+0x136/0x5f0
[<c014eee0>] pdflush+0x0/0x20
[<c014eef1>] pdflush+0x11/0x20
[<c0107259>] kernel_thread_helper+0x5/0xc

bad: scheduling while atomic!
Call Trace:
[<c011ce00>] schedule+0x510/0x520
[<c010babb>] handle_IRQ_event+0x3b/0x70
[<c012d7cc>] schedule_timeout+0x8c/0xe0
[<c012d730>] process_timeout+0x0/0x10
[<c011ed91>] io_schedule_timeout+0x11/0x20
[<c022bb1b>] blk_congestion_wait+0x8b/0xa0
[<c011fa80>] autoremove_wake_function+0x0/0x50
[<c011fa80>] autoremove_wake_function+0x0/0x50
[<c019cee1>] writeback_inodes+0x1/0x420
[<c014e1e4>] wb_kupdate+0x104/0x160
[<c014eb0c>] __pdflush+0x21c/0x5f0
[<c014eee0>] pdflush+0x0/0x20
[<c014eef1>] pdflush+0x11/0x20
[<c014e0e0>] wb_kupdate+0x0/0x160
[<c0107259>] kernel_thread_helper+0x5/0xc

bad: scheduling while atomic!
Call Trace:
[...]
----------------------------------------------------------------------------

2003-08-29 08:35:58

by Jens Axboe

[permalink] [raw]
Subject: Re: [2.6] kernel BUG at arch/i386/mm/highmem.c:14!

On Tue, Aug 26 2003, Roger Luethi wrote:
> On Tue, 26 Aug 2003 20:01:46 +0200, Jens Axboe wrote:
> > Possibly some artifact of using read/write multiple on IDE, does it work
> > if you disable that with -m0 on the device? I'm guessing the error is
> > 100% reproducible?
>
> I'm guessing you're right (small sample size, though).
>
> I made one (successful) attempt at reproducing the BUG (copy some 300 MB of
> data to a partition, then disable DMA on the target disk).
>
> I made two further attempts but with all kernel output on the serial
> console, one with -m16 and one with -m0. Both resulted in a flood of "bad:
> scheduling while atomic!", but no BUG. The -m0 version looked like a flood
> in slow motion (line by line).
>
> I suppose the first BUG had the scheduling messages, too, but I didn't see
> them because they had scrolled past by the time I got to the console. I did
> see those "bad: scheduling while atomic!" messages, tough, when I
> reproduced the BUG.

Strange, I cannot reproduce the problem here locally... A look over of
the code doesn't reveal any problems either. I'll keep it in mind,
though.

>
> Call traces would look like this:
> ----------------------------------------------------------------------------
> hda: DMA disabled
> bad: scheduling while atomic!
> Call Trace:
> [<c011ce00>] schedule+0x510/0x520
> [<c022a820>] blk_run_queues+0x120/0x300
> [<c0147e50>] find_get_page+0x70/0x140
> [<c011ed6e>] io_schedule+0xe/0x20
> [<c0171f7f>] __wait_on_buffer+0xcf/0xe0
> [<c011fa80>] autoremove_wake_function+0x0/0x50
> [<c011fa80>] autoremove_wake_function+0x0/0x50
> [<c01de17a>] flush_commit_list+0x31a/0x440
> [<c01e28fb>] do_journal_end+0x5fb/0xc00
> [<c01e1b79>] flush_old_commits+0x139/0x1d0
> [<c01cf940>] reiserfs_write_super+0x30/0x40
> [<c01791bf>] sync_supers+0x1ef/0x280
> [<c014e143>] wb_kupdate+0x63/0x160
> [<c011cae2>] schedule+0x1f2/0x520
> [<c014eb0c>] __pdflush+0x21c/0x5f0
> [<c014eee0>] pdflush+0x0/0x20
> [<c014eef1>] pdflush+0x11/0x20
> [<c014e0e0>] wb_kupdate+0x0/0x160
> [<c0107259>] kernel_thread_helper+0x5/0xc

These traces look suspicious. Are you using a standard kernel? What
options?

--
Jens Axboe

2003-08-29 16:23:45

by Andrew Morton

[permalink] [raw]
Subject: Re: [2.6] kernel BUG at arch/i386/mm/highmem.c:14!

Jens Axboe <[email protected]> wrote:
>
> > ----------------------------------------------------------------------------
> > hda: DMA disabled
> > bad: scheduling while atomic!
> > Call Trace:
> > [<c011ce00>] schedule+0x510/0x520
> > [<c022a820>] blk_run_queues+0x120/0x300
> > [<c0147e50>] find_get_page+0x70/0x140
> > [<c011ed6e>] io_schedule+0xe/0x20
> > [<c0171f7f>] __wait_on_buffer+0xcf/0xe0
> > [<c011fa80>] autoremove_wake_function+0x0/0x50
> > [<c011fa80>] autoremove_wake_function+0x0/0x50
> > [<c01de17a>] flush_commit_list+0x31a/0x440
> > [<c01e28fb>] do_journal_end+0x5fb/0xc00
> > [<c01e1b79>] flush_old_commits+0x139/0x1d0
> > [<c01cf940>] reiserfs_write_super+0x30/0x40
> > [<c01791bf>] sync_supers+0x1ef/0x280
> > [<c014e143>] wb_kupdate+0x63/0x160
> > [<c011cae2>] schedule+0x1f2/0x520
> > [<c014eb0c>] __pdflush+0x21c/0x5f0
> > [<c014eee0>] pdflush+0x0/0x20
> > [<c014eef1>] pdflush+0x11/0x20
> > [<c014e0e0>] wb_kupdate+0x0/0x160
> > [<c0107259>] kernel_thread_helper+0x5/0xc
>
> These traces look suspicious. Are you using a standard kernel? What
> options?

This trace has been reported several times, and it always correlates with
an IDE error message of some form.

I'm suspecting that something in the IDE error handling path either forgets
to undo a spinlock or leaves interrupts disabled.