Date: Tue, 29 Dec 2009 14:54:03 +0100
From: Bruno Prémont
To: "Benjamin Li"
Cc: "netdev@vger.kernel.org", "Michael Chan", "linux-kernel@vger.kernel.org"
Subject: Re: BNX2: Kernel crashes with 2.6.31 and 2.6.31.9
Message-ID: <20091229145403.39f82773@pluto.restena.lu>
In-Reply-To: <1262077540.12520.4.camel@localhost>
References: <20091229084929.54912c0c@pluto.restena.lu> <1262077540.12520.4.camel@localhost>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 29 Dec 2009 01:05:40 "Benjamin Li" wrote:

> Hi Bruno,
>
> It looks like the NULL dereference is happening at a0fc.
>
>     a0f8:  48 8b 42 70    mov    0x70(%rdx),%rax
>     a0fc:  0f b7 10       movzwl (%rax),%edx
>     a0ff:  31 c0          xor    %eax,%eax
>
> The offset of 0x70 is the bp field in the bnx2_napi structure. (Seen
> in the bnx2_napi structure dump below) These lines are found in the
> routine, bnx2_get_hw_tx_cons() which look like they were inlined by
> the compiler. More specifically it looks like the dereference of the
> hw_tx_cons_ptr failed.
>
>     cons = *bnapi->hw_tx_cons_ptr;
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=drivers/net/bnx2.c;h=06b901152d4487fa04164437cc179661b44657fe;hb=74fca6a42863ffacaf7ba6f1936a9f228950f657#l2761
>
> To be sure this is the case, could you send the .config file you are
> using or if you could send me the bnx2 kernel module built with the
> CFLAG '-g', then we can definitely verify where in the code it is
> crashing.
>
> Did you see anything suspicious in the system kernel logs? If you
> could isolate the logs from when the machine booted to when it crashed
> and send them to us it would be very helpful.

It crashes every now and then (since netconsole was enabled it does not
survive 24 hours :( ), while or just after transmitting log messages
over netconsole. The messages being transmitted are log entries produced
by the netfilter 'LOG' target.

Sample output as seen by the netconsole recipient (1 packet per line, IP
addresses masked):

[ 2115.949606] (reject)output: IN= OUT=eth0 SRC=***.**.*.** DST=**.***.**.*** LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=29589 DF PROTO=TCP SPT=58991 DPT=80 WINDOW=5840 RES=0x00 SYN URGP=0
[ 2115.949704] (reject)output: IN= OUT=eth0 SRC=***.**.*.** DST=**.***.**.***
[ 2115.949729] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 2115.949732] IP: [] bnx2_poll_work+0x2c/0x12d0 [bnx2]
[ 2115.949742] PGD 5b6f0067 PUD 59c04067 PMD 0
[ 2115.949744] Oops: 0000 [#1] SMP
[ 2115.949746] last sysfs file: /sys/kernel/uevent_seqnum
[ 2115.949749] CPU 3
[ 2115.949750] Modules linked in: dm_round_robin scsi_dh_rdac ipmi_devintf netconsole squashfs configfs zlib_inflate ext2 loop dm_multipath scsi_dh dm_mod sg sr_mod cdrom ata_piix hpwdt qla2xxx ipmi_si ahci bnx2 ipmi_msghandler libata uhci_hcd ehci_hcd
[ 2115.949764] Pid: 7926, comm: php-cgi Not tainted 2.6.31.9-x86_64 #1 ProLiant DL360 G5
[ 2115.949766] RIP: 0010:[] [] bnx2_poll_work+0x2c/0x12d0 [bnx2]

Looks like netpoll is triggering suicide on BNX2.
Any way to make the NULL pointer dereference non-fatal would help a lot!
(Is there anything sensible to do when bnapi->hw_tx_cons_ptr is NULL
that would allow the system to keep working instead of killing
everything?)

Regards,
Bruno
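
To illustrate the kind of guard being asked about, here is a minimal
user-space sketch, not a tested driver patch. The names mock_bnapi and
mock_get_hw_tx_cons are hypothetical stand-ins for the bnx2_napi
structure and bnx2_get_hw_tx_cons(); only the hw_tx_cons_ptr field
mentioned above is modelled. Whether skipping the TX completion pass is
actually safe in the real driver is an assumption, and the sketch says
nothing about why the pointer is NULL in the first place.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-vector state.
 * Only the field discussed in this thread is modelled. */
struct mock_bnapi {
	volatile uint16_t *hw_tx_cons_ptr; /* NULL until the status block is set up */
};

/* Guarded read of the hardware TX consumer index.
 * Returns 0 and stores the index in *cons when the pointer is valid.
 * Returns -1 and leaves *cons untouched when the pointer is NULL, so
 * the caller can skip TX processing instead of dereferencing NULL. */
static int mock_get_hw_tx_cons(const struct mock_bnapi *bnapi, uint16_t *cons)
{
	volatile uint16_t *p = bnapi->hw_tx_cons_ptr;

	if (p == NULL)
		return -1;

	*cons = *p;
	return 0;
}
```

A caller (for example a poll routine entered through netpoll) would
check the return value and simply return without touching the TX ring
when it gets -1. That only hides the symptom; the underlying question of
how the poll path runs while hw_tx_cons_ptr is still NULL remains open.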