2001-11-25 22:18:20

by Jim Henderson

Subject: PROBLEM: kernel BUG at filemap.c:791

After compiling a 2.2.14 kernel on one of my systems, I've started
receiving this problem. Kernel is patched with the ext3 filesystem
patches, however running without ext3 being used results in this problem
as well.

This can happen at any time: I've seen it during startup and after the
system has been running for a couple of days. It is reproducible, but
not at will (i.e., it will certainly happen again, but I don't know what
triggers it). Previous kernels I've run (2.4.2 and 2.4.6) do not exhibit
this problem.

Here's the crash info with symbols resolved:

--- snip ---

ksymoops 2.4.0 on i486 2.4.6. Options used
-v /usr/src/linux/vmlinux (specified)
-K (specified)
-L (specified)
-o /lib/modules/2.4.14/ (specified)
-m /boot/System.map-2.4.14 (specified)

No modules in ksyms, skipping objects
kernel BUG at filemap.c:791!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0122938>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: 0000001d ebx: c1133980 ecx: c0270660 edx: 00006181
esi: c0080134 edi: c11e3c00 ebp: 00000001 esp: c4ef5b50
ds: 0018 es: 0018 ss: 0018
Process syslogd (pid: 365, stackpage=c4ef5000)
Stack: c022640b 00000317 00000000 c0080134 c01abbb4 c4cc96c0 00000001 00000002
       c011a834 00000001 c02c65c0 00000000 c11444e0 24000001 0000000e c4ef5bd4
       c01082aa 0000000e c11e3c00 c4ef5bd4 c4ef5bd4 0000000e c02bcac0 c11444e0
Call Trace: [<c01abbb4>] [<c011a834>] [<c01082aa>] [<c010842d>] [<c01ab6bc>]
   [<c01176a0>] [<c0117473>] [<c010845c>] [<c011a740>] [<c019ec3a>] [<c0117473>]
   [<c010845c>] [<c019f2ea>] [<c019e75e>] [<c0117858>] [<c012ed26>] [<c011778b>]
   [<c01176a0>] [<c012f985>] [<c010845c>] [<c01ab8ba>] [<c01d8b6c>] [<c01da75d>]
   [<c020966f>] [<c02096cf>] [<c012fc95>] [<c0130640>] [<c0130b14>] [<c0124db6>]
   [<c0124e16>] [<c012e248>] [<c0124900>] [<c0161d4a>] [<c015fb74>] [<c012f1aa>]
   [<c0106ef3>]
Code: 0f 0b 5a 59 8d 43 28 8d 73 24 39 43 28 74 19 89 f0 5b b9 01

>>EIP; c0122938 <unlock_page+28/60> <=====
Trace; c01abbb4 <do_ida_intr+1f4/270>
Trace; c011a834 <timer_bh+24/250>
Trace; c01082aa <handle_IRQ_event+3a/70>
Trace; c010842d <do_IRQ+6d/b0>
Trace; c01ab6bc <do_ida_request+dc/2f0>
Trace; c01176a0 <tasklet_hi_action+50/80>
Trace; c0117473 <do_softirq+53/a0>
Trace; c010845c <do_IRQ+9c/b0>
Trace; c011a740 <update_process_times+20/b0>
Trace; c019ec3a <__make_request+fa/670>
Trace; c0117473 <do_softirq+53/a0>
Trace; c010845c <do_IRQ+9c/b0>
Trace; c019f2ea <generic_make_request+13a/150>
Trace; c019e75e <generic_unplug_device+1e/30>
Trace; c0117858 <__run_task_queue+48/60>
Trace; c012ed26 <__wait_on_buffer+56/90>
Trace; c011778b <bh_action+1b/50>
Trace; c01176a0 <tasklet_hi_action+50/80>
Trace; c012f985 <fsync_inode_data_buffers+e5/120>
Trace; c010845c <do_IRQ+9c/b0>
Trace; c01ab8ba <do_ida_request+2da/2f0>
Trace; c01d8b6c <__kfree_skb+dc/e0>
Trace; c01da75d <skb_free_datagram+1d/30>
Trace; c020966f <unix_dgram_recvmsg+9f/110>
Trace; c02096cf <unix_dgram_recvmsg+ff/110>
Trace; c012fc95 <__refile_buffer+55/60>
Trace; c0130640 <__block_commit_write+a0/c0>
Trace; c0130b14 <generic_commit_write+54/60>
Trace; c0124db6 <generic_file_write+4b6/590>
Trace; c0124e16 <generic_file_write+516/590>
Trace; c012e248 <do_readv_writev+1d8/260>
Trace; c0124900 <generic_file_write+0/590>
Trace; c0161d4a <ext2_update_inode+38a/3a0>
Trace; c015fb74 <ext2_fsync_inode+14/50>
Trace; c012f1aa <sys_fsync+5a/90>
Trace; c0106ef3 <system_call+33/40>
Code; c0122938 <unlock_page+28/60>
00000000 <_EIP>:
Code; c0122938 <unlock_page+28/60> <=====
0: 0f 0b ud2a <=====
Code; c012293a <unlock_page+2a/60>
2: 5a pop %edx
Code; c012293b <unlock_page+2b/60>
3: 59 pop %ecx
Code; c012293c <unlock_page+2c/60>
4: 8d 43 28 lea 0x28(%ebx),%eax
Code; c012293f <unlock_page+2f/60>
7: 8d 73 24 lea 0x24(%ebx),%esi
Code; c0122942 <unlock_page+32/60>
a: 39 43 28 cmp %eax,0x28(%ebx)
Code; c0122945 <unlock_page+35/60>
d: 74 19 je 28 <_EIP+0x28> c0122960 <unlock_page+50/60>
Code; c0122947 <unlock_page+37/60>
f: 89 f0 mov %esi,%eax
Code; c0122949 <unlock_page+39/60>
11: 5b pop %ebx
Code; c012294a <unlock_page+3a/60>
12: b9 01 00 00 00 mov $0x1,%ecx

<0>Kernel panic: Aiee, killing interrupt handler!

--- snip ---
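
For readers unfamiliar with the "symbol+offset/size" notation in the
trace above, here is a minimal sketch of how an address can be mapped
back to that form. It is not how ksymoops itself is implemented; the
struct ksym type and the lookup_symbol() and format_address() helpers
are hypothetical names made up for this illustration, and the small
table is derived by hand from the trace lines (start = address minus
offset), not read from a real System.map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol record: start address, length in bytes, name. */
struct ksym {
    unsigned long start;
    unsigned long size;
    const char *name;
};

/* Start addresses derived from the trace above as (address - offset);
 * sizes are the values printed after the slash. */
static const struct ksym symtab[] = {
    { 0xc0122910UL, 0x60,  "unlock_page" },
    { 0xc012f150UL, 0x90,  "sys_fsync" },
    { 0xc01ab5e0UL, 0x2f0, "do_ida_request" },
    { 0xc01ab9c0UL, 0x270, "do_ida_intr" },
};

/* Return the symbol whose [start, start + size) range contains addr,
 * or NULL if no entry in the table covers it. */
const struct ksym *lookup_symbol(unsigned long addr)
{
    for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (addr >= symtab[i].start &&
            addr - symtab[i].start < symtab[i].size)
            return &symtab[i];
    }
    return NULL;
}

/* Write "name+offset/size" (hex, no 0x prefix) into buf, or the bare
 * address if it cannot be resolved. Returns snprintf()'s result. */
int format_address(unsigned long addr, char *buf, size_t len)
{
    const struct ksym *s = lookup_symbol(addr);

    assert(buf != NULL && len > 0);
    if (s == NULL)
        return snprintf(buf, len, "%08lx", addr);
    return snprintf(buf, len, "%s+%lx/%lx",
                    s->name, addr - s->start, s->size);
}
```

With this table, format_address(0xc0122938UL, ...) yields
"unlock_page+28/60", i.e. the faulting instruction sits 0x28 bytes into
a 0x60-byte function, matching the EIP line in the report.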

The problem is reported in the unlock_page function in filemap.c with
this bit of code:

    if (!test_and_clear_bit(PG_locked, &(page)->flags))
        BUG();
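
As a rough user-space illustration of what that check guards against:
the sketch below is not the kernel's implementation. The names
page_model, model_test_and_clear_bit() and model_unlock_page() are made
up for this example, the bit operation is not atomic as the real one
is, and the PG_locked value is an assumption. It only shows that
unlocking a page whose PG_locked bit is already clear is the case that
reaches BUG().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define PG_locked 0 /* bit index; value assumed for this sketch */

/* Stand-in for struct page: only the flags word matters here. */
struct page_model {
    unsigned long flags;
};

/* Non-atomic model of test_and_clear_bit(): clear bit nr in *addr and
 * return whether it was set beforehand. */
bool model_test_and_clear_bit(int nr, unsigned long *addr)
{
    assert(nr >= 0 && (size_t)nr < sizeof(unsigned long) * CHAR_BIT);

    unsigned long mask = 1UL << nr;
    bool was_set = (*addr & mask) != 0;

    *addr &= ~mask;
    return was_set;
}

/* Returns true for a normal unlock, false where the kernel code quoted
 * above would call BUG() because the page was not locked on entry. */
bool model_unlock_page(struct page_model *page)
{
    if (!model_test_and_clear_bit(PG_locked, &page->flags))
        return false; /* kernel: BUG() */
    return true;
}
```

Starting from a page with PG_locked set, the first model_unlock_page()
call returns true and a second call on the same page returns false. In
other words, the check fires when something unlocks a page that is not
locked at that moment.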

Module-wise, the only options selected are for LAN drivers (not
including the one actually being used) and a SYM710 SCSI controller (for
an external CD-ROM drive that is powered off).

The following info comes from a stable 2.4.6 kernel procfs output (since
2.2.14 doesn't want to stay running long enough for me to get this info
from there):

CPU Info:

--- snip ---
processor : 0
vendor_id : AuthenticAMD
cpu family : 4
model : 14
model name : Am5x86-WT
stepping : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu
bogomips : 66.35

--- snip ---

No SCSI drivers loaded, but a Compaq SMART controller is in use:

--- snip ---

ida0: Compaq SMART Controller
Board ID: 0x2040110e
Firmware Revision: 2.26
Controller Sig: 0x2ae7909f
Memory Address: 0x00000000
I/O Port: 0x5000
IRQ: 14
Logical drives: 1
Physical drives: 2

Current Q depth: 0
Max Q depth since init: 42

Logical Drive Info:
ida/c0d0: blksz=512 nr_blks=8217120
nr_allocs = 3961
nr_frees = 3961

--- snip ---

The system is an EISA system, Compaq Prosignia VS with 80 MB of RAM and
an AMD K5 processor, integrated AMD network adapter, Compaq SMART
controller.

The only other patch applied to this kernel is the one suggested on this
list for the loopback device (changing loop.c to remove the
deactivate_page(page) calls, which prevent the kernel from compiling).

I have a second system (Celeron 300A processor, 640 MB of RAM, and IDE
drives) that runs 2.2.14 with these same patches with no problems that
I've seen.

Please let me know if there's any other information needed to diagnose
this problem - this is the first time I've ever had a kernel crash, and
I hope I've included all the relevant information here. I don't
subscribe to the linux-kernel list, so all correspondence should be
copied to me directly as well if possible.

Thanks,

Jim


2001-11-25 23:10:57

by J Sloan

Subject: Re: PROBLEM: kernel BUG at filemap.c:791

Jim Henderson wrote:

> After compiling a 2.2.14 kernel on one of my systems, I've started
> receiving this problem. Kernel is patched with the ext3 filesystem
> patches, however running without ext3 being used results in this problem
> as well.

Wow, it must have taken quite some effort
to patch a 2.2 kernel for ext3!

OK, assuming you really mean 2.4.14, there
is a patch floating around the list for that -

I had a Compaq 6500 that would scribble
on the disk and then lock up hard at some
random point in time - but that behaviour
could be triggered immediately by running
dbench. Look for the Compaq patches from
Jens Axboe or, better yet, lose 2.4.14 and go
straight to 2.4.16-pre1, since it has the ida
RAID fixes and ext3 support already.

cu

jjs


2001-11-25 23:18:07

by Jim Henderson

Subject: Re: PROBLEM: kernel BUG at filemap.c:791

J Sloan wrote:
> Wow, it must have taken quite some effort
> to patch a 2.2 kernel for ext3!
>
> OK, assuming you really mean 2.4.14, there
> is a patch floating around the list for that -

You are correct, I fat-fingered the kernel version; 2.4.14 is what I'm
running. I've been running 2.4.x kernels for quite a while now, and I
catch myself frequently making that mistake when I type the version.

> I had a Compaq 6500 that would scribble
> on the disk and then lock up hard at some
> random point in time - but that behaviour
> could be triggered immediately by running
> dbench. Look for the Compaq patches from
> Jens Axboe or, better yet, lose 2.4.14 and go
> straight to 2.4.16-pre1, since it has the ida
> RAID fixes and ext3 support already.

Will give the 2.4.16-pre1 kernel a shot and see how it behaves.

I should have mentioned in my initial post as well that I've seen both
hard locks (as in this case) and soft locks (where I could use the
'magic SysRq key' feature), both of which referenced this particular
code segment.

Thanks for the quick response.

Jim