Message-ID: <48ECB727.6050905@siemens.com>
Date: Wed, 08 Oct 2008 15:35:35 +0200
From: "Hillier, Gernot"
Organization: Siemens AG, CT SE 2
To: Krzysztof Halasa
CC: jesse.brandeburg@intel.com, linux-kernel@vger.kernel.org, netdev@vger.kernel.org, bruce.w.allan@intel.com
Subject: Re: e1000e: sporadic "hardware error"s with Intel 82563EB on Supermicro X7DB3
References: <48EB7161.60004@siemens.com>

Hello!

Krzysztof Halasa wrote:
> Hi,
>
> "Hillier, Gernot" writes:
>
>> On at least two machines using the Supermicro X7DB3 board with Intel
>> 82563EB (a.k.a. PCI device 8086:1096), we see sporadic problems on modprobe
>> (about 1 time in some hundred tries):
>>
>> e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k2
>> e1000e: Copyright (c) 1999-2008 Intel Corporation.
>> e1000e 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
>> e1000e 0000:06:00.0: setting latency timer to 64
>> 0000:06:00.0: 0000:06:00.0: Hardware Error
>
> What does "lspci -vv" say about it when the above happens?
>
> A spurious chip reset (hardware) could probably cause that.

Here's the output of "lspci -vv" in the error case (for the eth devices):

------- SNIP -----------
06:00.0 Class 0200: Device 8086:1096 (rev 01)
	Subsystem: Device 15d9:1096
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
	Capabilities: [140] Device Serial Number 06-c7-66-ff-ff-48-30-00
	Kernel driver in use: e1000e
	Kernel modules: e1000e

06:00.1 Class 0200: Device 8086:1096 (rev 01)
	Subsystem: Device 15d9:1096
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR-
	Capabilities: [140] Device Serial Number 06-c7-66-ff-ff-48-30-00
	Kernel driver in use: e1000e
	Kernel modules: e1000e
------- SNIP -----------

I retried this several times in both the error case and the normal case.
The only things that change are three values for device 06:00.0:

- Control: "DisINTx-" changes to "DisINTx+" if the card is correctly
  initialized
- Interrupt changes from IRQ 18 to IRQ 4345 if the card is correctly
  initialized
- Message Signalled Interrupts change from "Enable-" to "Enable+"

In addition, the "Data" field of "Message Signalled Interrupts" seems to
change without any clear pattern.

For 06:00.1, everything seems to be the same in the error case and in the
normal case.

Does this tell you anything valuable?

-- 
Gernot Hillier, Siemens AG, CT SE 2
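
P.S. In case anybody wants to repeat the error-vs-normal comparison without
reading the dumps by eye, here is a rough sketch of how it could be scripted.
It assumes the two "lspci -vv" outputs were saved to text first; the function
names split_devices and changed_lines are made up for this example, and it
relies only on the usual lspci layout where each device block starts at an
unindented line.

```python
import difflib


def split_devices(lspci_text):
    """Split `lspci -vv` text into {slot: [stripped lines]}.

    Assumption: a device block starts at an unindented line whose first
    token is the slot (e.g. "06:00.0"); indented lines belong to it.
    """
    devices = {}
    slot = None
    for line in lspci_text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            slot = line.split()[0]
            devices[slot] = []
        if slot is not None:
            devices[slot].append(line.strip())
    return devices


def changed_lines(error_text, normal_text):
    """Return {slot: [(tag, line), ...]} for lines that differ.

    tag is "-" for a line only in the error capture and "+" for a line
    only in the normal capture. Devices without differences are omitted.
    """
    err = split_devices(error_text)
    ok = split_devices(normal_text)
    result = {}
    for slot in sorted(set(err) | set(ok)):
        diff = [
            (d[0], d[2:])
            for d in difflib.ndiff(err.get(slot, []), ok.get(slot, []))
            if d[0] in "+-"  # drop unchanged lines and "?" hint lines
        ]
        if diff:
            result[slot] = diff
    return result
```

Feeding it the two captures should list only the lines that flip between the
two cases for each slot, and leave out devices that are identical in both.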