2002-08-09 17:28:32

by Badari Pulavarty

Subject: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

Hi,

I get the following BUG() while trying to "rmmod" the qlogic driver on 2.5.30.
Is this a known problem? Any ideas on how to fix it?

Thanks,
Badari

kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0160d0f>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000 ebx: f67ff7c0 ecx: f67f44fc edx: c3c3d220
esi: f67f44fc edi: f67ff7c0 ebp: ffffffd9 esp: db46de90
ds: 0018 es: 0018 ss: 0018
Stack: f67f44a0 c0160d95 f67ff7c0 f67ff7e8 f67ff7c0 f67ff7e8 f8860700 c016157e
f69a7e80 f67ff7c0 f88607e0 f88607e0 0000000f c01948f5 f8860878 c03179a0
c031b508 f88607e0 c015fef5 f88607e0 f88607e0 c0309fec f88607e0 c0309ffc
Call Trace: [<c0160d95>] [<c016157e>] [<c01948f5>] [<c015fef5>] [<c020c52f>]
[<f89ac8a0>] [<c01e870a>] [<f899444a>] [<f89ac8a0>] [<c011859e>] [<c0117992>]
[<c0107173>]
Code: 0f 0b 05 01 00 db 2a c0 f0 ff 03 f0 fe 0d 80 0e 38 c0 0f 88

>>EIP; c0160d0f <d_unhash+f/70> <=====
Trace; c0160d95 <driverfs_rmdir+25/90>
Trace; c016157e <driverfs_remove_dir+8e/b2>
Trace; c01948f5 <put_device+65/82>
Trace; c015fef5 <driverfs_remove_partitions+65/a0>
Trace; c020c52f <sd_detach+af/110>
Trace; f89ac8a0 <END_OF_CODE+385cf3c4/????>
Trace; c01e870a <scsi_unregister_host+21a/4e0>
Trace; f899444a <END_OF_CODE+385b6f6e/????>
Trace; f89ac8a0 <END_OF_CODE+385cf3c4/????>
Trace; c011859e <free_module+1e/d0>
Trace; c0117992 <sys_delete_module+122/250>
Trace; c0107173 <syscall_call+7/b>
Code; c0160d0f <d_unhash+f/70>
00000000 <_EIP>:
Code; c0160d0f <d_unhash+f/70> <=====
0: 0f 0b ud2a <=====
Code; c0160d11 <d_unhash+11/70>
2: 05 01 00 db 2a add $0x2adb0001,%eax
Code; c0160d16 <d_unhash+16/70>
7: c0 (bad)
Code; c0160d17 <d_unhash+17/70>
8: f0 ff 03 lock incl (%ebx)
Code; c0160d1a <d_unhash+1a/70>
b: f0 fe 0d 80 0e 38 c0 lock decb 0xc0380e80
Code; c0160d21 <d_unhash+21/70>
12: 0f 88 00 00 00 00 js 18 <_EIP+0x18> c0160d27 <d_unhash+27/70>




2002-08-09 20:51:16

by Dave Hansen

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

Badari Pulavarty wrote:
> Code; c0160d0f <d_unhash+f/70> <=====
> 0: 0f 0b ud2a <=====
> Code; c0160d11 <d_unhash+11/70>
> 2: 05 01 00 db 2a add $0x2adb0001,%eax
> Code; c0160d16 <d_unhash+16/70>
> 7: c0 (bad)

Doesn't that (bad) instruction look suspicious? Martin was seeing
strange oopses on Hummer (16-way NUMA-Q) compiling with egcs 2.91
because it was generating bad instructions. It may be another
problem, but that c0 jumped out at me. The two instructions after it
look pretty bogus too.

--
Dave Hansen
[email protected]

2002-08-09 20:59:11

by Andrew Morton

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

Dave Hansen wrote:
>
> Badari Pulavarty wrote:
> > Code; c0160d0f <d_unhash+f/70> <=====
> > 0: 0f 0b ud2a <=====
> > Code; c0160d11 <d_unhash+11/70>
> > 2: 05 01 00 db 2a add $0x2adb0001,%eax
> > Code; c0160d16 <d_unhash+16/70>
> > 7: c0 (bad)
>
> Doesn't that (bad) instruction look suspicious? Martin was seeing
> strange oopses on Hummer (16-way NUMA-Q) compiling with egcs 2.91
> because it was generating bad instructions. It may be another
> problem, but that c0 jumped out at me. The two instructions after it
> look pretty bogus too.

We're encoding the file-and-line information in the program text
immediately after the undefined opcode, so you'll always see junk
in there. Sorry.

It would be much more useful if the oops code were to dump the
text preceding the exception EIP rather than after it, actually.
I think Keith said that ksymoops supports that.
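
To illustrate the point about file-and-line information sitting right after the undefined opcode, here is a minimal sketch that decodes the first bytes of the "Code:" line from the original report. It assumes the i386 BUG() layout is the two-byte ud2 opcode (0f 0b), then a 16-bit little-endian line number, then a 32-bit pointer to the file name string. That layout is an assumption for illustration, not something stated in this thread, and decode_bug_record is a hypothetical helper name.

```python
import struct

# First 8 bytes of the "Code:" line from the oops in the original report.
code = bytes.fromhex("0f 0b 05 01 00 db 2a c0")


def decode_bug_record(code: bytes):
    """Split an assumed BUG() record: ud2, 16-bit line, 32-bit file pointer."""
    if code[:2] != b"\x0f\x0b":
        raise ValueError("code does not start with ud2 (0f 0b)")
    # "<HI" = little-endian, unsigned 16-bit then unsigned 32-bit, no padding.
    line, file_ptr = struct.unpack_from("<HI", code, 2)
    return line, file_ptr


line, file_ptr = decode_bug_record(code)
print(line, hex(file_ptr))  # 261 0xc02adb00
```

Under that assumption the decoded line number is 261, which agrees with the dcache.h:261 in the BUG message, and the bytes that the disassembler rendered as "add $0x2adb0001,%eax" and "(bad)" are data rather than instructions.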

2002-08-09 21:18:55

by Andries Brouwer

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

On Fri, Aug 09, 2002 at 02:00:37PM -0700, Andrew Morton wrote:

> > > Code; c0160d0f <d_unhash+f/70> <=====

> It would be much more useful if the oops code were to dump the
> text preceding the exception EIP rather than after it, actually.

I think I already mentioned what the stack trace is for this oops:
for me, it is sd_detach -> driverfs_remove_partitions ->
put_device -> driverfs_remove_dir -> driverfs_rmdir -> d_unhash.

I have seen lots of other oopses related to driverfs.
Submitted a stopgap patch to prevent some of them, but withdrew it
when it became clear that even the ugly stopgap did not prevent all.

This driverfs partition stuff is messy. The code paths where partitions
are created are very different from those where partitions are removed,
and it can easily happen that a partition is removed that was never
created, leading to an oops.

Andries

2002-08-09 21:41:14

by Patrick Mochel

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!


On Fri, 9 Aug 2002, Andries Brouwer wrote:

> On Fri, Aug 09, 2002 at 02:00:37PM -0700, Andrew Morton wrote:
>
> > > > Code; c0160d0f <d_unhash+f/70> <=====
>
> > It would be much more useful if the oops code were to dump the
> > text preceding the exception EIP rather than after it, actually.
>
> I think I already mentioned what the stack trace is for this oops:
> for me, it is sd_detach -> driverfs_remove_partitions ->

For some reason, the put_device() is forcing the refcount to 0, which
shouldn't be happening. The refcounting model for devices is pretty wack
right now, and this is one of a few places that's hitting it.

To solve this issue, I really think that driverfs_remove_partitions can go
away. When a device's driverfs directory is removed, all the files in it
will be removed, so explicitly removing them is unnecessary.

-pat


2002-08-10 01:08:17

by Keith Owens

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

On Fri, 09 Aug 2002 14:00:37 -0700,
Andrew Morton <[email protected]> wrote:
>It would be much more useful if the oops code were to dump the
>text preceding the exception EIP rather than after it, actually.
>I think Keith said that ksymoops supports that.

Not only does ksymoops support it but some architectures already do
this. Mind you, they are not consistent :(

Alpha: Code: 44220001 f4200003 46520400 <a77d9c38> 6b9b4a40 a44803a8 42425401 42c10403 40603401
Arm: Code: e7973108 e1a02423 (e5c42001) e5c43000 e1a02823

If any instruction in the code line is enclosed in <> or () then
ksymoops assumes that the first byte of that instruction is EIP.
Otherwise the first byte
of the line is EIP. Anybody want to update i386/ia64/mips/... oops code
to dump context around the failing instruction? Maximum of 64 bytes in
the code line please.
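
The marker convention described above can be sketched as follows. This is only an illustration of the rule as stated in this message (a word wrapped in <> or () marks EIP, otherwise the first word does); it is not ksymoops code, and locate_eip is a hypothetical helper name.

```python
def locate_eip(code_line: str):
    """Return (index of the EIP word, list of words with markers stripped)."""
    # Drop an optional "Code:" prefix and anything before it (e.g. "Alpha: ").
    text = code_line.split("Code:", 1)[-1]
    words = text.split()
    for i, word in enumerate(words):
        marked = (word.startswith("<") and word.endswith(">")) or (
            word.startswith("(") and word.endswith(")")
        )
        if marked:
            words[i] = word[1:-1]
            return i, words
    # No marker: the first word of the line is taken to be EIP.
    return 0, words


alpha = ("Code: 44220001 f4200003 46520400 <a77d9c38> 6b9b4a40 "
         "a44803a8 42425401 42c10403 40603401")
arm = "Code: e7973108 e1a02423 (e5c42001) e5c43000 e1a02823"

print(locate_eip(alpha)[0])  # 3
print(locate_eip(arm)[0])    # 2
```

For the Alpha line the marked word is the fourth one, so three words of preceding context are available; for the unmarked i386 line in the original report the function falls back to index 0.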

2002-08-10 08:04:24

by Russell King

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

On Sat, Aug 10, 2002 at 11:09:42AM +1000, Keith Owens wrote:
> On Fri, 09 Aug 2002 14:00:37 -0700,
> Andrew Morton <[email protected]> wrote:
> >It would be much more useful if the oops code were to dump the
> >text preceding the exception EIP rather than after it, actually.
> >I think Keith said that ksymoops supports that.
>
> Not only does ksymoops support it but some architectures already do
> this. Mind you, they are not consistent :(
>
> Alpha: Code: 44220001 f4200003 46520400 <a77d9c38> 6b9b4a40 a44803a8 42425401 42c10403 40603401
> Arm: Code: e7973108 e1a02423 (e5c42001) e5c43000 e1a02823
>
> If any instruction in the code line is enclosed in <> or () then
> ksymoops assumes that the first byte of that instruction is EIP.

In 2.5, I changed ARM to indicate the last word as the EIP (so we get more
context as Andrew Morton suggests.) However, ksymoops now seems to ignore
the '()' !

At some point I plan to check what happens if it's the second to last. I
suspect ksymoops is looking for the strings ' (' and ') ', the second of
which obviously doesn't exist.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-08-10 08:40:47

by Keith Owens

Subject: Re: kernel BUG at /usr/src/linux-2.5.30/include/linux/dcache.h:261!

On Sat, 10 Aug 2002 09:08:04 +0100,
Russell King <[email protected]> wrote:
>In 2.5, I changed ARM to indicate the last word as the EIP (so we get more
>context as Andrew Morton suggests.) However, ksymoops now seems to ignore
>the '()' !
>
>At some point I plan to check what happens if it's the second to last. I
>suspect ksymoops is looking for the strings ' (' and ') ', the second of
>which obviously doesn't exist.

ksymoops is scanning for (oops.c line 361)

"([<(]?)" /* 2 */
"([0-9a-fA-F]+)" /* 3 */
"[)>]?"
" *"

The trailing [)>] is required but any space after that is optional. It
works for me.

Code: e7973108 e1a02423 e5c42001 e5c43000 (e1a02823)

Code; c00160b8 No symbols available
00000000 <_EIP>:
Code; c00160b8 No symbols available
0: 08 31 or %dh,(%ecx)
Code; c00160ba No symbols available
2: 97 xchg %eax,%edi
Code; c00160bb No symbols available
3: e7 23 out %eax,$0x23
Code; c00160bd No symbols available
5: 24 a0 and $0xa0,%al
Code; c00160bf No symbols available
7: e1 01 loope a <_EIP+0xa> c00160c2 No symbols available
Code; c00160c1 No symbols available
9: 20 c4 and %al,%ah
Code; c00160c3 No symbols available
b: e5 00 in $0x0,%eax
Code; c00160c5 No symbols available
d: 30 c4 xor %al,%ah
Code; c00160c7 No symbols available <=====
f: e5 23 in $0x23,%eax <=====
Code; c00160c9 No symbols available
11: 28 a0 e1 00 00 00 sub %ah,0xe1(%eax)

Disassembling arm as i386 is pointless, but it shows that ksymoops
2.4.5 recognises () as the last code fragment.
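
As a rough cross-check of the result above, the pattern fragment quoted from oops.c can be tried out in Python. This sketch uses only the four pieces shown in this message (the full ksymoops pattern is longer and is not reproduced here), and it assumes each word is one unit whose length in bytes is half its hex-digit count. eip_byte_offset is a hypothetical helper name.

```python
import re

# The fragment quoted above: optional opening marker, hex digits,
# optional closing marker, then any number of spaces.
WORD = re.compile(r"([<(]?)([0-9a-fA-F]+)[)>]? *")


def eip_byte_offset(code_line: str) -> int:
    """Byte offset of the marked word within the code line (0 if unmarked)."""
    body = code_line.split("Code:", 1)[-1].strip()
    offset = 0
    for match in WORD.finditer(body):
        if match.group(1):  # word was opened with '<' or '('
            return offset
        offset += len(match.group(2)) // 2  # two hex digits per byte
    return 0


last = "Code: e7973108 e1a02423 e5c42001 e5c43000 (e1a02823)"
print(hex(eip_byte_offset(last)))  # 0x10
```

The marked word at the very end of the line, with no trailing space, still matches, and the computed offset 0x10 falls inside the two-byte instruction at offset f that the listing above flags with <=====.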