2007-11-09 09:55:58

by Mark Hindley

Subject: [OOPS] 2.6.23.1 in NFS

I got this OOPS this morning on a K6. Sorry it is tainted by madwifi.
ath5k doesn't (yet!) support my card. Box is K6 200, headless, used as
firewall/router

It looks like the same codepath as http://lkml.org/lkml/2007/11/6/275
but my kernel was definitely compiled on a local disk.

Let me know if you want any other info

Mark

Nov 9 08:55:36 titan kernel: kernel BUG at net/sunrpc/sched.c:909!
Nov 9 08:55:36 titan kernel: invalid opcode: 0000 [#1]
Nov 9 08:55:36 titan kernel: PREEMPT
Nov 9 08:55:36 titan kernel: Modules linked in: apm nfs softdog nfsd exportfs lockd sunrpc autofs4 ipv6 act_police sch_ingress cls_u32 sch_sfq sch_cbq via_rhine bitrev crc32 af_packet 3c59x mii bridge llc wlan_wep wlan_scan_ap ath_rate_sample ath_pci wlan ath_hal(P) floppy uhci_hcd ohci_hcd ehci_hcd usbcore ipt_owner ipt_REDIRECT xt_limit ipt_recent xt_state ipt_REJECT ipt_LOG xt_tcpudp ipt_MASQUERADE iptable_filter iptable_nat ip_tables nf_nat x_tables nf_conntrack_ftp nf_conntrack_ipv4 nf_conntrack nls_iso8859_1 nls_cp437 vfat fat tcp_diag inet_diag genrtc
Nov 9 08:55:36 titan kernel: CPU: 0
Nov 9 08:55:36 titan kernel: EIP: 0060:[<c4a5770d>] Tainted: P VLI
Nov 9 08:55:36 titan kernel: EFLAGS: 00210297 (2.6.23.1-mk6 #1)
Nov 9 08:55:36 titan kernel: EIP is at __rpc_execute+0x179/0x216 [sunrpc]
Nov 9 08:55:36 titan kernel: eax: 00000001 ebx: c198b9c4 ecx: c4a572e6 edx: 00000000
Nov 9 08:55:36 titan kernel: esi: c198ba2c edi: 00000000 ebp: 00005000 esp: c0caccc4
Nov 9 08:55:36 titan kernel: ds: 007b es: 007b fs: 0000 gs: 0033 ss: 0068
Nov 9 08:55:36 titan kernel: Process crm (pid: 7357, ti=c0cac000 task=c3a074c0 task.ti=c0cac000)
Nov 9 08:55:36 titan kernel: Stack: 00000001 c198b9c4 c198bb40 c198b9c4 c198b9c4 c198bb40 c305d500 c4ada61a
Nov 9 08:55:36 titan kernel: 00000000 00000000 c198b9c0 c198bb40 c0cacde0 c4ada78e 00004000 00000000
Nov 9 08:55:36 titan kernel: 00004000 c0cacde0 c2006840 c4ad8974 00004000 00000000 c20068c0 c0cacde0
Nov 9 08:55:36 titan kernel: Call Trace:
Nov 9 08:55:36 titan kernel: [<c4ada61a>] nfs_execute_read+0x30/0x3f [nfs]
Nov 9 08:55:36 titan kernel: [<c4ada78e>] nfs_pagein_one+0x93/0xc9 [nfs]
Nov 9 08:55:36 titan kernel: [<c4ad8974>] nfs_pageio_doio+0x2c/0x52 [nfs]
Nov 9 08:55:36 titan kernel: [<c4ad8a32>] nfs_pageio_add_request+0x98/0xa9 [nfs]
Nov 9 08:55:36 titan kernel: [<c4adaad6>] readpage_async_filler+0x11e/0x13b [nfs]
Nov 9 08:55:36 titan kernel: [<c4ada9b8>] readpage_async_filler+0x0/0x13b [nfs]
Nov 9 08:55:36 titan kernel: [read_cache_pages+111/183] read_cache_pages+0x6f/0xb7
Nov 9 08:55:36 titan kernel: [<c4adac13>] nfs_readpages+0x120/0x16f [nfs]
Nov 9 08:55:36 titan kernel: [<c4ada6fb>] nfs_pagein_one+0x0/0xc9 [nfs]
Nov 9 08:55:36 titan kernel: [<c4adaaf3>] nfs_readpages+0x0/0x16f [nfs]
Nov 9 08:55:36 titan kernel: [__do_page_cache_readahead+361/556] __do_page_cache_readahead+0x169/0x22c
Nov 9 08:55:36 titan kernel: [io_schedule+14/22] io_schedule+0xe/0x16
Nov 9 08:55:36 titan kernel: [__wait_on_bit_lock+74/81] __wait_on_bit_lock+0x4a/0x51
Nov 9 08:55:36 titan kernel: [do_page_cache_readahead+73/83] do_page_cache_readahead+0x49/0x53
Nov 9 08:55:36 titan kernel: [filemap_fault+406/913] filemap_fault+0x196/0x391
Nov 9 08:55:36 titan kernel: [__do_fault+80/740] __do_fault+0x50/0x2e4
Nov 9 08:55:36 titan kernel: [handle_mm_fault+711/1456] handle_mm_fault+0x2c7/0x5b0
Nov 9 08:55:36 titan kernel: [do_page_fault+530/1419] do_page_fault+0x212/0x58b
Nov 9 08:55:36 titan kernel: [update_stats_wait_end+155/190] update_stats_wait_end+0x9b/0xbe
Nov 9 08:55:36 titan kernel: [irq_exit+64/98] irq_exit+0x40/0x62
Nov 9 08:55:36 titan kernel: [do_page_fault+0/1419] do_page_fault+0x0/0x58b
Nov 9 08:55:36 titan kernel: [error_code+106/112] error_code+0x6a/0x70
Nov 9 08:55:36 titan kernel: =======================
Nov 9 08:55:36 titan kernel: Code: 22 8b 53 18 0f b7 83 94 00 00 00 89 7c 24 08 c7 04 24 f8 42 a6 c4 89 54 24 0c 89 44 24 04 e8 d0 3c 6c fb 81 3b aa 0b f0 00 74 04 <0f> 0b eb fe f6 05 88 89 a7 c4 40 74 17 0f b7 83 94 00 00 00 c7
Nov 9 08:55:36 titan kernel: EIP: [<c4a5770d>] __rpc_execute+0x179/0x216 [sunrpc] SS:ESP 0068:c0caccc4


Attachments:
(No filename) (4.16 kB)
config-2.6.23.1-mk6 (53.76 kB)

2007-11-09 14:13:24

by Trond Myklebust

Subject: Re: [OOPS] 2.6.23.1 in NFS


On Fri, 2007-11-09 at 09:55 +0000, Mark Hindley wrote:
> I got this OOPS this morning on a K6. Sorry it is tainted by madwifi.
> ath5k doesn't (yet!) support my card. Box is K6 200, headless, used as
> firewall/router
>
> It looks like the same codepath as http://lkml.org/lkml/2007/11/6/275
> but my kernel was definitely compiled on a local disk.

You are hitting the BUG_ON() in line 909, whereas Mathieu was hitting
something further up in the same routine.

However I agree that they look suspicious. Both look as if something is
corrupting the rpc_task structure. In your case, a debugging tag that is
only ever touched by the RPC code at the very start and very finish of
the call is being changed, which points at something like a
use-after-free issue.

Hmm... I note that you are both running with the SLUB allocator. Any
chance you can reproduce using SLAB (and with the SLAB debugging
enabled)?

Trond

