2006-09-26 15:44:56

by Ben Duncan

Subject: EIP Errors kernel 2.6.18 ...

Getting EIP errors with 2.6.18 ...
following from Syslog ..

Any Ideas what is going on?
Thanks ...

Sep 26 04:46:23 desktop kernel: ------------[ cut here ]------------
Sep 26 04:46:23 desktop kernel: kernel BUG at lib/radix-tree.c:404!
Sep 26 04:46:23 desktop kernel: invalid opcode: 0000 [#1]
Sep 26 04:46:23 desktop kernel: DEBUG_PAGEALLOC
Sep 26 04:46:23 desktop kernel: Modules linked in: sr_mod nvidia uhci_hcd nvidia_agp
i2c_nforce2 sata_nv sd_mod ide_scsi agpgart sata_sil libata genrtc
Sep 26 04:46:23 desktop kernel: CPU: 0
Sep 26 04:46:23 desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI
Sep 26 04:46:23 desktop kernel: EFLAGS: 00010046 (2.6.18 #1)
Sep 26 04:46:23 desktop kernel: EIP is at radix_tree_tag_set+0x6a/0xa2
Sep 26 04:46:23 desktop kernel: eax: 00000001 ebx: 00000001 ecx: 00000001 edx: 00000001
Sep 26 04:46:23 desktop kernel: esi: 00000000 edi: 00000000 ebp: f7cf3d70 esp: f7cf3d54
Sep 26 04:46:23 desktop kernel: ds: 007b es: 007b ss: 0068
Sep 26 04:46:23 desktop kernel: Process pdflush (pid: 181, ti=f7cf2000 task=f7cd6a90
task.ti=f7cf2000)
Sep 26 04:46:23 desktop kernel: Stack: 00000001 00000001 00050001 f71a2f4c c1061be0 f71a2f48
00000000 f7cf3d88
Sep 26 04:46:23 desktop kernel: c013579e 00000286 f57d5f68 c1061be0 00050002 f7cf3db8
c014ccc8 00000000
Sep 26 04:46:23 desktop kernel: 00001000 f57d5f68 000773b0 00000000 c0150760 f71a2e70
00000000 c1061be0
Sep 26 04:46:23 desktop kernel: Call Trace:
Sep 26 04:46:23 desktop kernel: [<c0103485>] show_stack_log_lvl+0x8f/0x97
Sep 26 04:46:23 desktop kernel: [<c01035e6>] show_registers+0x116/0x17f
Sep 26 04:46:23 desktop kernel: [<c01037cb>] die+0x108/0x1ba
Sep 26 04:46:23 desktop kernel: [<c01038f9>] do_trap+0x7c/0x96
Sep 26 04:46:23 desktop kernel: [<c0103b64>] do_invalid_op+0x95/0x9c
Sep 26 04:46:23 desktop kernel: [<c0103161>] error_code+0x39/0x40
Sep 26 04:46:23 desktop kernel: [<c013579e>] test_set_page_writeback+0x79/0xd1
Sep 26 04:46:23 desktop kernel: [<c014ccc8>] __block_write_full_page+0x18f/0x292
Sep 26 04:46:23 desktop kernel: [<c014e087>] block_write_full_page+0x9e/0xa6
Sep 26 04:46:23 desktop kernel: [<c0150850>] blkdev_writepage+0xf/0x11
Sep 26 04:46:23 desktop kernel: [<c0168e74>] mpage_writepages+0x1aa/0x304
Sep 26 04:46:23 desktop kernel: [<c01519a3>] generic_writepages+0xa/0xf
Sep 26 04:46:23 desktop kernel: [<c013531b>] do_writepages+0x25/0x38
Sep 26 04:46:23 desktop kernel: [<c016788e>] __sync_single_inode+0x62/0x1bc
Sep 26 04:46:23 desktop kernel: [<c0167b31>] __writeback_single_inode+0x149/0x151
Sep 26 04:46:23 desktop kernel: [<c0167ce3>] sync_sb_inodes+0x1aa/0x275
Sep 26 04:46:23 desktop kernel: [<c0167e39>] writeback_inodes+0x8b/0xd9
Sep 26 04:46:23 desktop kernel: [<c01351aa>] wb_kupdate+0x70/0xd3
Sep 26 04:46:23 desktop kernel: [<c013590a>] __pdflush+0xda/0x171
Sep 26 04:46:23 desktop kernel: [<c01359ca>] pdflush+0x29/0x2b
Sep 26 04:46:23 desktop kernel: [<c01242ee>] kthread+0x79/0xa1
Sep 26 04:46:23 desktop kernel: [<c0100d19>] kernel_thread_helper+0x5/0xb
Sep 26 04:46:23 desktop kernel: Code: 83 e0 3f 89 45 e4 8d 14 ce 0f a3 82 04 01 00 00 19 c0
85 c0 75 0a 8b 45 e4 0f ab 82 04 01 00 00 8b 55 e4 8b 74 96 04 85 f6 75 08 <0f> 0b
94 01 70 35 31 c0 83 ef 06 4b 75 bd 85 f6 74 1c 8b 4d e8
Sep 26 04:46:23 desktop kernel: EIP: [<c01ba714>] radix_tree_tag_set+0x6a/0xa2 SS:ESP
0068:f7cf3d54


--
Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor


2006-09-26 15:52:03

by Michal Piotrowski

Subject: Re: EIP Errors kernel 2.6.18 ...

Hi,

On 26/09/06, Ben Duncan <[email protected]> wrote:
> Getting EIP errors with 2.6.18 ...
> following from Syslog ..
>
> Any Ideas what is going on?
> Thanks ...
>
> Sep 26 04:46:23 desktop kernel: ------------[ cut here ]------------
> Sep 26 04:46:23 desktop kernel: kernel BUG at lib/radix-tree.c:404!
> Sep 26 04:46:23 desktop kernel: invalid opcode: 0000 [#1]
> Sep 26 04:46:23 desktop kernel: DEBUG_PAGEALLOC
> Sep 26 04:46:23 desktop kernel: Modules linked in: sr_mod nvidia uhci_hcd nvidia_agp
> i2c_nforce2 sata_nv sd_mod ide_scsi agpgart sata_sil libata genrtc
> Sep 26 04:46:23 desktop kernel: CPU: 0
> Sep 26 04:46:23 desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI

"When emailing [email protected], please attach an
nvidia-bug-report.log, which is generated by running
"nvidia-bug-report.sh". "
http://www.nvidia.com/object/linux_display_ia32_1.0-8774.html

Please send this report to Nvidia.

Regards,
Michal

--
Michal K. K. Piotrowski
LTG - Linux Testers Group
(http://www.stardust.webpages.pl/ltg/)

2006-09-26 17:50:19

by Ben Duncan

Subject: Re: EIP Errors kernel 2.6.18 .AND hard lockup ...

Ok, first IANAKP (I am not a kernel programmer) ;-> ...
But, I just got another hard lockup and EIP code ...

I am not sure why this should go to nVidia (Please, I
personally know the Head of nVidia's Linux driver development,
so if it is an nVidia problem, I can help there).

Anyway,

EIP shows :

desktop kernel: CPU: 0
desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI
EFLAGS: 00010046 (2.6.18 #1)
desktop kernel: EIP is at radix_tree_tag_set+0x6a/0xa2
desktop kernel: eax: 00000001 ebx: 00000001 ecx: 00000001 edx: 00000001
desktop kernel: esi: 00000000 edi: 00000000 ebp: f7cf3d70 esp: f7cf3d54
desktop kernel: ds: 007b es: 007b ss: 0068
desktop kernel: Process pdflush (pid: 181, ti=f7cf2000 task=f7cd6a90 task.ti=f7cf2000)
desktop kernel: Stack: 00000001 00000001 00050001 f71a2f4c c1061be0 f71a2f48 00000000
f7cf3d88
desktop kernel: c013579e 00000286 f57d5f68 c1061be0 00050002 f7cf3db8 c014ccc8 00000000
desktop kernel: 00001000 f57d5f68 000773b0 00000000 c0150760 f71a2e70 00000000 c1061be0

With current system map showing:

c01ba3cc t radix_tree_node_alloc
c01ba416 T radix_tree_preload
c01ba475 t radix_tree_extend
c01ba4ff T radix_tree_insert
c01ba5fc T radix_tree_lookup_slot
c01ba651 T radix_tree_lookup
c01ba6aa T radix_tree_tag_set
c01ba74c T radix_tree_tag_clear
c01ba81a t __lookup
c01ba906 T radix_tree_gang_lookup

The "radix lookup" seems to be occurring inside of radix_tree_tag_set
(EIP c01ba714 = c01ba6aa + 0x6a), and in particular in the pdflush process.
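
The lookup described above (finding which System.map symbol contains the
EIP) can be sketched in a few lines of Python. This is only an illustration
using the excerpt quoted in this message; the helper names parse_system_map
and resolve are made up for the example, and the "size" it reports is simply
the distance to the next symbol in the excerpt, which is an assumption that
only holds when the map is complete and sorted.

```python
import bisect

# Excerpt of the System.map quoted in this message: address, type, name.
SYSTEM_MAP = """\
c01ba3cc t radix_tree_node_alloc
c01ba416 T radix_tree_preload
c01ba475 t radix_tree_extend
c01ba4ff T radix_tree_insert
c01ba5fc T radix_tree_lookup_slot
c01ba651 T radix_tree_lookup
c01ba6aa T radix_tree_tag_set
c01ba74c T radix_tree_tag_clear
c01ba81a t __lookup
c01ba906 T radix_tree_gang_lookup
"""


def parse_system_map(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _sym_type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, eip):
    """Map an address to (name, offset, size) via the nearest preceding symbol.

    size is the gap to the next symbol, or None for the last entry.
    Returns None if the address lies below the first symbol.
    """
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    size = symbols[i + 1][0] - start if i + 1 < len(symbols) else None
    return name, eip - start, size


symbols = parse_system_map(SYSTEM_MAP)
name, offset, size = resolve(symbols, 0xC01BA714)
print(f"{name}+{offset:#x}/{size:#x}")  # radix_tree_tag_set+0x6a/0xa2
```

The result agrees with the "EIP is at radix_tree_tag_set+0x6a/0xa2" line the
kernel printed, so the running System.map matches the kernel that oopsed.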

fgrep'ing the kernel source shows this is in lib/radix-tree.c ..

To me seems to be a PDFLUSH eip and the nvidia stuff is just
a by product of loaded modules, no?

Thanks ..

Michal Piotrowski wrote:
> Hi,
>
<SNIP>
>
> "When emailing [email protected], please attach an
> nvidia-bug-report.log, which is generated by running
> "nvidia-bug-report.sh". "
> http://www.nvidia.com/object/linux_display_ia32_1.0-8774.html
>
> Please send this report to Nvidia.
>
> Regards,
> Michal
>

--
Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor

2006-09-26 18:00:45

by Valdis Klētnieks

Subject: Re: EIP Errors kernel 2.6.18 .AND hard lockup ...

On Tue, 26 Sep 2006 12:39:49 CDT, Ben Duncan said:
> I am not sure why this should go to nVidia (Please, I
> personally know the Head of nVidia's Linux driver development,
> so if it is an nVidia problem, I can help there).

Maybe is, maybe isn't.

> desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI
proprietary module loaded--^

> To me seems to be a PDFLUSH eip and the nvidia stuff is just
> a by product of loaded modules, no?

The point is that we can't know that the NVidia module hasn't stomped on
some random memory location that happened to corrupt a radix tree. Note
that this is true even if you've loaded and then unloaded the module - it
may have splatted something before it departed....

Is it a replicatable error, and if so, can you replicate it without loading
the NVidia module? If you can come up with a traceback that doesn't have
an NVidia tainting in it, we'll be glad to look at it. Conversely, if you're
able to replicate it with nvidia loaded, but not without, toss it over
the fence to your friend.
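
For anyone sifting through a pile of syslog output, the check described
above (is the "P" taint flag present on the EIP line?) can be sketched as
below. This is a rough illustration only: the regular expression is written
against the i386 oops format seen in this thread, the function names are
made up for the example, and the second sample line merely shows the same
format for an untainted kernel.

```python
import re

# EIP line of an i386 oops as it appears in this thread, e.g.
#   "EIP: 0060:[<c01ba714>] Tainted: P VLI"
# Assumption: other architectures and kernel versions format this differently.
EIP_LINE = re.compile(
    r"EIP:\s+[0-9a-f]{4}:\[<([0-9a-f]{8})>\]\s+(Not tainted|Tainted: (\S+))"
)


def taint_status(line):
    """Return (eip, flags) for an oops EIP line, or None if it does not match.

    flags is the taint flag string, or '' when the kernel reports "Not tainted".
    """
    m = EIP_LINE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(3) or ""


def has_proprietary_module(line):
    """True if the line carries the 'P' (proprietary module loaded) flag."""
    status = taint_status(line)
    return status is not None and "P" in status[1]


tainted = "desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI"
clean = "desktop kernel: EIP: 0060:[<c01ba714>] Not tainted VLI"

print(has_proprietary_module(tainted))  # True
print(has_proprietary_module(clean))    # False
```

A trace for which this prints False is the kind we can look at here.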



2006-09-26 18:19:54

by Ben Duncan

Subject: Re: EIP Errors kernel 2.6.18 .AND hard lockup ...

Ok, I can remove the module so it is no longer loaded ..

It is reproducible, but only randomly. It seems to occur when I hammer on
the SATA drive in the system, which is running on an add-on SIL 3112A
controller card.

Anyway, driver is removed, system rebooted, ksyms logged.
I will hammer again on the system to see if it fails ...

Thanks ...

[email protected] wrote:
>
>
>>desktop kernel: EIP: 0060:[<c01ba714>] Tainted: P VLI
>
> proprietary module loaded--^
>
>
>>To me seems to be a PDFLUSH eip and the nvidia stuff is just
>>a by product of loaded modules, no?
>
>
> The point is that we can't know that the NVidia module hasn't stomped on
> some random memory location that happened to corrupt a radix tree. Note
> that this is true even if you've loaded and then unloaded the module - it
> may have splatted something before it departed....
>
> Is it a replicatable error, and if so, can you replicate it without loading
> the NVidia module? If you can come up with a traceback that doesn't have
> an NVidia tainting in it, we'll be glad to look at it. Conversely, if you're
> able to replicate it with nvidia loaded, but not without, toss it over
> the fence to your friend.

--
Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor

2006-10-25 14:08:13

by Ben Duncan

Subject: EIP Errors kernel 2.6.18 .AND hard lockup ... Revisited

Ok, same stuff, different day. Took the nVidia drivers out.
Ran HOT MEM checks, all passed (eliminating RAM issues).

Same as this time last month.

Got hard lockups every 2 - 3 days. No syslog / debug output is ever given;
when top is left running on the console, pdflush shows 100% CPU usage.

FINALLY got an EIP error this morning:

Oct 25 04:45:02 desktop kernel: BUG: unable to handle kernel NULL pointer dereference at
virtual address 00000020
Oct 25 04:45:02 desktop kernel: printing eip:
Oct 25 04:45:02 desktop kernel: c0160620
Oct 25 04:45:02 desktop kernel: *pde = 00000000
Oct 25 04:45:02 desktop kernel: Oops: 0000 [#1]
Oct 25 04:45:02 desktop kernel: DEBUG_PAGEALLOC
Oct 25 04:45:02 desktop kernel: Modules linked in: nls_iso8859_15 usb_storage uhci_hcd
nvidia_agp i2c_nforce2 sata_nv sd_mod
ide_scsi agpgart sata_sil libata genrtc
Oct 25 04:45:02 desktop kernel: CPU: 0
Oct 25 04:45:02 desktop kernel: EIP: 0060:[<c0160620>] Not tainted VLI
Oct 25 04:45:02 desktop kernel: EFLAGS: 00010283 (2.6.18 #1)
Oct 25 04:45:02 desktop kernel: EIP is at iput+0x17/0x65
Oct 25 04:45:02 desktop kernel: eax: 00000000 ebx: cf2aee70 ecx: cf2aee88 edx: f7cf4000
Oct 25 04:45:02 desktop kernel: esi: cf2aee70 edi: d5e8c128 ebp: f7cf5e84 esp: f7cf5e80
Oct 25 04:45:02 desktop kernel: ds: 007b es: 007b ss: 0068
Oct 25 04:45:02 desktop kernel: Process kswapd0 (pid: 182, ti=f7cf4000 task=f7cdaa90
task.ti=f7cf4000)
Oct 25 04:45:02 desktop kernel: Stack: d5e8c120 f7cf5e98 c015dc4e d5e8c120 d5e8c120 d5e8c128
f7cf5ea8 c015e093
Oct 25 04:45:02 desktop kernel: f5e96e34 d5e8c120 f7cf5ec4 c015e1cc 00000000 0000003c
00017124 00000000
Oct 25 04:45:02 desktop kernel: 000000a6 f7cf5ecc c015e441 f7cf5f08 c0136fef 005c4900
00000000 00017124
Oct 25 04:45:02 desktop kernel: Call Trace:
Oct 25 04:45:02 desktop kernel: [<c0103485>] show_stack_log_lvl+0x8f/0x97
Oct 25 04:45:02 desktop kernel: [<c01035e6>] show_registers+0x116/0x17f
Oct 25 04:45:02 desktop kernel: [<c01037cb>] die+0x108/0x1ba
Oct 25 04:45:02 desktop kernel: [<c010ecb3>] do_page_fault+0x3a4/0x481
Oct 25 04:45:02 desktop kernel: [<c0103161>] error_code+0x39/0x40
Oct 25 04:45:02 desktop kernel: [<c015dc4e>] dentry_iput+0x5b/0x73
Oct 25 04:45:02 desktop kernel: [<c015e093>] prune_one_dentry+0x56/0x79
Oct 25 04:45:02 desktop kernel: [<c015e1cc>] prune_dcache+0x116/0x14a
Oct 25 04:45:02 desktop kernel: [<c015e441>] shrink_dcache_memory+0x19/0x31
Oct 25 04:45:02 desktop kernel: [<c0136fef>] shrink_slab+0x12f/0x18a
Oct 25 04:45:02 desktop kernel: [<c013803b>] balance_pgdat+0x1c4/0x29c
Oct 25 04:45:02 desktop kernel: [<c0138207>] kswapd+0xf4/0xf6
Oct 25 04:45:02 desktop kernel: [<c01242ee>] kthread+0x79/0xa1
Oct 25 04:45:02 desktop kernel: [<c0100d19>] kernel_thread_helper+0x5/0xb
Oct 25 04:45:02 desktop kernel: Code: 00 55 89 e5 75 07 e8 de fd ff ff eb 05 e8 bd fe ff ff
c9 c3 55 85 c0 89 e5 53 89 c3
74 58 83 bb 70 01 00 00 20 8b 80 cc 00 00 00 <8b> 40 20 75 08 0f 0b 73 04 e6 94 30 c0 85
c0 74 0b 8b 50 14 85
Oct 25 04:45:02 desktop kernel: EIP: [<c0160620>] iput+0x17/0x65 SS:ESP 0068:f7cf5e80

--------------------------------------------------------------------------------------

Problems seem to have started when I added, at the start of summer, a SIL 3112A SATA controller
and a WD WD2500SD-01K (Rev: 08.0) 250GB SATA disk.


--
Ben Duncan - Business Network Solutions, Inc. 336 Elton Road Jackson MS, 39212
"Never attribute to malice, that which can be adequately explained by stupidity"
- Hanlon's Razor