After a kernel was booted through the kexec command, the NIC failed to
initialize. The 2.5.52 kernel was patched with the kexec and kexec-hwfix
patches.
The following is the dmesg output:
Linux version 2.5.52 (root@aminoacin) (gcc version 2.96 20000731 (Red
Hat Linux 7.1 2.96-81)) #1 SMP Fri Jan 24 14:17:58 CST 2003
Video mode to be used for restore is ffff
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
BIOS-e820: 0000000000100000 - 000000000ff87000 (usable)
BIOS-e820: 000000000ff87000 - 000000000ffa6000 (ACPI data)
BIOS-e820: 000000000ffa6000 - 0000000010000000 (reserved)
BIOS-e820: 00000000ffb00000 - 0000000100000000 (reserved)
255MB LOWMEM available.
On node 0 totalpages: 65415
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 61319 pages, LIFO batch:14
HighMem zone: 0 pages, LIFO batch:1
ACPI: RSDP (v000 DELL ) @ 0x000fd730
ACPI: RSDT (v001 DELL GX400 00000.00005) @ 0x000fd744
ACPI: FADT (v001 DELL GX400 00000.00005) @ 0x000fd774
ACPI: SSDT (v001 DELL st_ex 00000.04096) @ 0xfffe7279
ACPI: BOOT (v001 DELL GX400 00000.00005) @ 0x000fd7e8
ACPI: DSDT (v001 DELL dt_ex 00000.04096) @ 0x00000000
ACPI: BIOS passes blacklist
ACPI: MADT not present
Building zonelist for node : 0
Kernel command line: auto ro root=/dev/hda5
No local APIC present or hardware disabled
Initializing CPU#0
Detected 1993.714 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 3932.16 BogoMIPS
Memory: 253164k/261660k available (2804k kernel code, 7792k reserved,
1539k data, 140k init, 0k highmem)
Dentry cache hash table entries: 32768 (order: 6, 262144 bytes)
Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
-> /dev
-> /dev/console
-> /root
CPU: Before vendor init, caps: 3febf9ff 00000000 00000000, vendor = 0
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 256K
CPU: Hyper-Threading is disabled
CPU: After vendor init, caps: 3febf9ff 00000000 00000000 00000000
CPU: After generic, caps: 3febf9ff 00000000 00000000 00000000
CPU: Common caps: 3febf9ff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU#0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU#0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU0: Intel(R) Pentium(R) 4 CPU 2.00GHz stepping 02
per-CPU timeslice cutoff: 731.31 usecs.
task migration cache decay timeout: 1 msecs.
SMP motherboard not detected.
Local APIC not detected. Using dummy APIC emulation.
Starting migration thread for cpu 0
CPUS done 32
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
mtrr: v2.0 (20020519)
device class 'cpu': registering
device class cpu: adding driver system:cpu
PCI: PCI BIOS revision 2.10 entry at 0xfc0be, last bus=2
PCI: Using configuration type 1
device class cpu: adding device CPU 0
interfaces: adding device CPU 0
BIO: pool of 256 setup, 14Kb (56 bytes/bio)
biovec pool[0]: 1 bvecs: 256 entries (12 bytes)
biovec pool[1]: 4 bvecs: 256 entries (48 bytes)
biovec pool[2]: 16 bvecs: 256 entries (192 bytes)
biovec pool[3]: 64 bvecs: 256 entries (768 bytes)
biovec pool[4]: 128 bvecs: 256 entries (1536 bytes)
biovec pool[5]: 256 bvecs: 256 entries (3072 bytes)
ACPI: Subsystem revision 20021212
tbxface-0099 [03] Acpi_load_tables : ACPI Tables successfully
acquired
Parsing all Control
Methods:.............................................................................................................
Table [DSDT] - 297 Objects with 29 Devices 109 Methods 19 Regions
Parsing all Control Methods:
Table [SSDT] - 0 Objects with 0 Devices 0 Methods 0 Regions
ACPI Namespace successfully loaded at root c05b2b7c
evxfevnt-0063 [04] Acpi_enable : System is already in ACPI
mode
evgpe-0259: *** Info: GPE Block0 defined as GPE0 to GPE15
evgpe-0259: *** Info: GPE Block1 defined as GPE16 to GPE31
Executing all Device _STA and_INI methods:.............................
29 Devices found containing: 29 _STA, 3 _INI methods
Completing Region/Field/Buffer/Package
initialization:....................................
Initialized 13/19 Regions 0/0 Fields 10/10 Buffers 13/13 Packages (304
nodes)
ACPI: Interpreter enabled
ACPI: Using PIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
Transparent bridge - Intel Corp. 82801BA/CA/DB PCI Br
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 *10 11 12 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 9 *10 11 12 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 9 10 *11 12 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 10 *11 12 15)
ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 7 *9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 *5 6 7 9 10 11 12 15)
ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 *9 10 11 12 15)
Linux Plug and Play Support v0.93 (c) Adam Belay
block request queues:
128 requests per read queue
128 requests per write queue
8 requests per batch
enter congestion at 31
exit congestion at 33
SCSI subsystem driver Revision: 1.00
device class 'scsi-host': registering
drivers/usb/core/usb.c: registered new driver usbfs
drivers/usb/core/usb.c: registered new driver hub
PCI: Using ACPI for IRQ routing
PCI: if you experience problems, try using option 'pci=noacpi' or even
'acpi=off'
SBF: Simple Boot Flag extension found and enabled.
SBF: Setting boot flags 0x80
aio_setup: sizeof(struct page) = 40
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 [email protected]).
udf: registering filesystem
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU0] (supports C1)
Serial: 8250/16550 driver $Revision: 1.90 $ IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
parport0: PC-style at 0x378 (0x778) [PCSPP,TRISTATE,EPP]
parport0: irq 7 detected
parport0: cpp_daisy: aa5500ff(08)
parport0: assign_addrs: aa5500ff(08)
parport0: cpp_daisy: aa5500ff(08)
parport0: assign_addrs: aa5500ff(08)
pty: 256 Unix98 ptys configured
lp0: using parport0 (polling).
i810_rng: RNG not detected
Linux agpgart interface v1.0 (c) Dave Jones
agpgart: Detected Intel i850 chipset
agpgart: Maximum main memory to use for agp memory: 203M
agpgart: AGP aperture is 64M @ 0xf8000000
[drm] AGP 1.0 on Intel i850 @ 0xf8000000 64MB
[drm] Initialized radeon 1.7.0 20020828 on minor 0
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
Intel(R) PRO/100 Network Driver - version 2.1.24-k2
Copyright (c) 2002 Intel Corporation
PCI: Enabling device 02:09.0 (0000 -> 0003)
PCI: Setting latency timer of device 02:09.0 to 64
e100: selftest timeout
e100: Failed to initialize, instance #0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with
idebus=xx
ICH2: IDE controller at PCI slot 00:1f.1
ICH2: chipset revision 4
ICH2: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: IC35L040AVER07-0, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: SAMSUNG CD-ROM SC-148C, ATAPI CD/DVD-ROM drive
hdc: Disabling (U)DMA for SAMSUNG CD-ROM SC-148C
ide1 at 0x170-0x177,0x376 on irq 15
hda: host protected area => 1
hda: 78165360 sectors (40021 MB) w/1916KiB Cache, CHS=77545/16/63,
UDMA(100)
hda: hda1 hda2 < hda5 hda6 >
hdc: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.12
end_request: I/O error, dev hdc, sector 0
scsi HBA driver <NULL> didn't set max_sectors, please fix the template
scsi HBA driver Qlogic ISP 1280/12160 didn't set max_sectors, please fix
the template
request_module[scsi_hostadapter]: not ready
request_module[scsi_hostadapter]: not ready
request_module[scsi_hostadapter]: not ready
Linux Kernel Card Services 3.1.22
options: [pci] [cardbus] [pm]
Initializing USB Mass Storage driver...
drivers/usb/core/usb.c: registered new driver usb-storage
USB Mass Storage support registered.
device class 'input': registering
register interface 'mouse' with class 'input'
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
input: AT Set 2 keyboard on isa0060/serio0
serio: i8042 KBD port at 0x60,0x64 irq 1
Advanced Linux Sound Architecture Driver Version 0.9.0rc5 (Sun Nov 10
19:48:18 2002 UTC).
request_module[snd-card-0]: not ready
request_module[snd-card-1]: not ready
request_module[snd-card-2]: not ready
request_module[snd-card-3]: not ready
request_module[snd-card-4]: not ready
request_module[snd-card-5]: not ready
request_module[snd-card-6]: not ready
request_module[snd-card-7]: not ready
PCI: Setting latency timer of device 00:1f.5 to 64
intel8x0: clocking to 41138
ALSA device list:
#0: Intel 82801BA-ICH2 at 0xd800, irq 10
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 1024 buckets, 16Kbytes
TCP: Hash tables configured (established 8192 bind 10922)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
ds: no socket drivers loaded!
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 140k freed
Adding 530104k swap on /dev/hda6. Priority:-1 extents:1
warning: process `update' used the obsolete bdflush system call
Fix your initscripts?
warning: process `update' used the obsolete bdflush system call
Fix your initscripts?
I doubt this is a bug in E100 actually.
--
Michael Fu <[email protected]>
Not speaking for Intel, opinions are my own
Michael Fu <[email protected]> writes:
> After kernel was bootup through kexec command, the NIC failed to
> initialize. The 2.5.52 kernel was patched with kexec and kexec-hwfix
> patch.
Interesting... The patch applies cleanly to newer kernels, so feel
free to play with them. You are running a single-CPU system,
so the kexec-hwfix patch should not make a difference at this point.
Your interrupt routing is via ACPI, which is interesting...
>
> the following was is the dmesg output:
[snip]
> Intel(R) PRO/100 Network Driver - version 2.1.24-k2
> Copyright (c) 2002 Intel Corporation
>
> PCI: Enabling device 02:09.0 (0000 -> 0003)
> PCI: Setting latency timer of device 02:09.0 to 64
> e100: selftest timeout
> e100: Failed to initialize, instance #0
[snip]
> I doubt this is a bug in E100 actually.
Given that everything else was working correctly, this is almost
certainly an e100 driver bug or a hardware bug. On x86 everything has
been working well enough that finding a failure case that is not a
hardware/driver bug is currently quite a challenge.
Q1: Is this reproducible?
Q2: Is this reproducible with the eepro100 driver?
You were doing the easy case of 2.5.52 to 2.5.52. I have gotten so many
false positives, with things working when I reboot the exact same kernel,
that I barely consider it a valid test case any more...
If it is a bug in the driver, a shutdown method can be used to clean up
before reboot and place the device in a quiescent state.
Either that, or the driver's initialization code can be enhanced to
handle more strange states.
I know the eepro100 driver issues a reset before playing with the
card. The e100 driver does this in a different order, and it is
dying before it resets the card, so that looks like the issue to me.
Doing a clean user space shutdown may also help, though your kexecwrapper
script looked like it was probably doing that OK.
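The ordering argument above can be sketched with a toy model. This is only an
illustration under stated assumptions: `struct fake_nic` and every function
below are hypothetical names, not code from the e100 or eepro100 drivers, and
the "busy" flag stands in for whatever state the previous kernel left the
card in.

```c
#include <stdbool.h>

/* Hypothetical model of a NIC; none of these names come from the e100 or
 * eepro100 sources. */
struct fake_nic {
    bool busy;   /* true if the previous kernel left the device active */
};

/* Model of a port reset: returns the device to a known idle state. */
static void fake_nic_reset(struct fake_nic *nic)
{
    nic->busy = false;
}

/* Model of a self-test: it only completes if the device is idle. */
static bool fake_nic_selftest(const struct fake_nic *nic)
{
    return !nic->busy;
}

/* Order attributed to eepro100 in this thread: reset first, then touch
 * the card. */
static bool init_reset_first(struct fake_nic *nic)
{
    fake_nic_reset(nic);
    return fake_nic_selftest(nic);
}

/* Order attributed to e100 in this thread: self-test before any reset. */
static bool init_selftest_first(struct fake_nic *nic)
{
    if (!fake_nic_selftest(nic))
        return false;   /* models "selftest timeout": reset never reached */
    fake_nic_reset(nic);
    return true;
}
```

In this model a device that starts out busy initializes with
`init_reset_first()` but fails with `init_selftest_first()`, which is the
shape of the failure reported after kexec. A cold boot corresponds to a device
that starts idle, where both orders succeed.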
Eric
Hello Eric W. Biederman,
Once you wrote about "Re: [BUG] e100 driver fails to initialize the hardware after kernel bootup through kexec":
> Michael Fu <[email protected]> writes:
>
> > After kernel was bootup through kexec command, the NIC failed to
> > initialize. The 2.5.52 kernel was patched with kexec and kexec-hwfix
> > patch.
>
> Interesting... The patch goes cleanly onto newer kernels so feel
> free to play with them. You are running a single cpu system
> so the kexec-hwfix patch should not make a difference at this point.
>
> Your interrupt routing is via ACPI interesting...
>
> >
> > the following was is the dmesg output:
>
> [snip]
> > Intel(R) PRO/100 Network Driver - version 2.1.24-k2
> > Copyright (c) 2002 Intel Corporation
> >
> > PCI: Enabling device 02:09.0 (0000 -> 0003)
> > PCI: Setting latency timer of device 02:09.0 to 64
> > e100: selftest timeout
> > e100: Failed to initialize, instance #0
I use an EEPro100+ NIC on an Intel STL2 motherboard. With the "eepro100"
driver everything works OK. The "e100" driver also works, with this patch:
--- drivers/net/e100/e100.h- Wed Dec 4 15:16:08 2002
+++ drivers/net/e100/e100.h Wed Dec 4 15:16:20 2002
@@ -100,7 +100,7 @@
#define E100_MAX_NIC 16
-#define E100_MAX_SCB_WAIT 100 /* Max udelays in wait_scb */
+#define E100_MAX_SCB_WAIT 5000 /* Max udelays in wait_scb */
#define E100_MAX_CU_IDLE_WAIT 50 /* Max udelays in wait_cus_idle */
/* HWI feature related constant */
With this change everything works OK.
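Why a larger E100_MAX_SCB_WAIT can make the symptom disappear is easiest to
see with a bounded polling loop. The sketch below is a userspace model, not
the driver's wait_scb: `struct slow_dev`, `slow_dev_ready()` and
`wait_ready()` are hypothetical, and the number of polls the device needs is
an invented parameter.

```c
#include <stdbool.h>

/* Hypothetical device model: reports ready only after `polls_needed`
 * polls have been made. */
struct slow_dev {
    unsigned int polls_needed;
    unsigned int polls_seen;
};

/* One poll of the model device. */
static bool slow_dev_ready(struct slow_dev *dev)
{
    return ++dev->polls_seen >= dev->polls_needed;
}

/* Poll at most max_polls times; returns true if the device became ready
 * within that budget. */
static bool wait_ready(struct slow_dev *dev, unsigned int max_polls)
{
    for (unsigned int i = 0; i < max_polls; i++) {
        if (slow_dev_ready(dev))
            return true;
        /* a real driver would delay here between polls */
    }
    return false;
}
```

If the model device needs, say, 3000 polls, `wait_ready(&dev, 100)` gives up
while `wait_ready(&dev, 5000)` succeeds. The larger limit only lengthens the
busy-wait; it does not explain why the device is slow to respond.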
On Fri, Jan 24, 2003 at 05:57:55PM +0300, Andrey Nekrasov wrote:
> or "e100" driver and with patch:
>
>
> --- drivers/net/e100/e100.h- Wed Dec 4 15:16:08 2002
> +++ drivers/net/e100/e100.h Wed Dec 4 15:16:20 2002
> @@ -100,7 +100,7 @@
>
> #define E100_MAX_NIC 16
>
> -#define E100_MAX_SCB_WAIT 100 /* Max udelays in wait_scb */
> +#define E100_MAX_SCB_WAIT 5000 /* Max udelays in wait_scb */
> #define E100_MAX_CU_IDLE_WAIT 50 /* Max udelays in wait_cus_idle */
No, don't use this patch, it's awful. The latest Marcelo tree in
BitKeeper has this fixed... the right way. See the following e100
patch, which is what Intel emailed me, and what I merged into the
Marcelo tree.
Jeff
# --------------------------------------------
# 03/01/16 [email protected] 1.884.23.2
# [netdrvr e100] udelay a better way
# * Bug Fix: TCO workaround after hard reset of controller to wait for TCO
# traffic to settle. Workaround requires issuing a CU load base command
# after hard reset, followed by a wait for scb and finally a wait for
# TCO traffic bit to clear. Affects 82559s and above wired to SMBus.
# --------------------------------------------
#
diff -Nru a/drivers/net/e100/e100_main.c b/drivers/net/e100/e100_main.c
--- a/drivers/net/e100/e100_main.c Fri Jan 24 11:06:35 2003
+++ b/drivers/net/e100/e100_main.c Fri Jan 24 11:06:35 2003
@@ -196,6 +196,7 @@
char *e100_get_brand_msg(struct e100_private *);
static u8 e100_pci_setup(struct pci_dev *, struct e100_private *);
static u8 e100_sw_init(struct e100_private *);
+static void e100_tco_walkaround(struct e100_private *);
static unsigned char e100_alloc_space(struct e100_private *);
static void e100_dealloc_space(struct e100_private *);
static int e100_alloc_tcb_pool(struct e100_private *);
@@ -213,7 +214,7 @@
static unsigned char e100_clr_cntrs(struct e100_private *);
static unsigned char e100_load_microcode(struct e100_private *);
-static unsigned char e100_hw_init(struct e100_private *, u32);
+static unsigned char e100_hw_init(struct e100_private *);
static unsigned char e100_setup_iaaddr(struct e100_private *, u8 *);
static unsigned char e100_update_stats(struct e100_private *bdp);
@@ -1265,7 +1266,7 @@
/* read NIC's part number */
e100_rd_pwa_no(bdp);
- if (!e100_hw_init(bdp, PORT_SOFTWARE_RESET)) {
+ if (!e100_hw_init(bdp)) {
printk(KERN_ERR "e100: hw init failed\n");
return false;
}
@@ -1314,10 +1315,46 @@
return 1;
}
+static void __devinit
+e100_tco_walkaround(struct e100_private *bdp)
+{
+ int i;
+
+ /* Do software reset */
+ e100_sw_reset(bdp, PORT_SOFTWARE_RESET);
+
+ /* Do a dummy LOAD CU BASE command. */
+ /* This gets us out of pre-driver to post-driver. */
+ e100_exec_cmplx(bdp, 0, SCB_CUC_LOAD_BASE);
+
+ /* Wait 20 msec for reset to take effect */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(HZ / 50);
+
+ /* disable interrupts since they are enabled */
+ /* after device reset */
+ e100_disable_clear_intr(bdp);
+
+ /* Wait for command to be cleared up to 1 sec */
+ for (i=0; i<1000; i++) {
+ if (!readb(&bdp->scb->scb_cmd_low))
+ break;
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(HZ / 1000);
+ }
+
+ /* Wait for TCO request bit in PMDR register to be clear */
+ for (i=0; i<500; i++) {
+ if (!(readb(&bdp->scb->scb_ext.d101m_scb.scb_pmdr) & BIT_1))
+ break;
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(HZ / 1000);
+ }
+}
+
/**
* e100_hw_init - initialized tthe hardware
* @bdp: atapter's private data struct
- * @reset_cmd: s/w reset or selective reset
*
* This routine performs a reset on the adapter, and configures the adapter.
* This includes configuring the 82557 LAN controller, validating and setting
@@ -1329,13 +1366,16 @@
* false - If the adapter failed initialization
*/
unsigned char __devinit
-e100_hw_init(struct e100_private *bdp, u32 reset_cmd)
+e100_hw_init(struct e100_private *bdp)
{
if (!e100_phy_init(bdp))
return false;
- /* Issue a software reset to the e100 */
- e100_sw_reset(bdp, reset_cmd);
+ e100_sw_reset(bdp, PORT_SELECTIVE_RESET);
+
+ /* Only 82559 or above needs TCO walkaround */
+ if (bdp->rev_id >= D101MA_REV_ID)
+ e100_tco_walkaround(bdp);
/* Load the CU BASE (set to 0, because we use linear mode) */
if (!e100_wait_exec_cmplx(bdp, 0, SCB_CUC_LOAD_BASE, 0))
Jeff Garzik <[email protected]> writes:
>+ /* Wait 20 msec for reset to take effect */
>+ set_current_state(TASK_UNINTERRUPTIBLE);
>+ schedule_timeout(HZ / 50);
Hm. This assumes HZ=100, doesn't it?
>+ /* Wait for command to be cleared up to 1 sec */
>+ for (i=0; i<1000; i++) {
>+ if (!readb(&bdp->scb->scb_cmd_low))
>+ break;
>+ set_current_state(TASK_UNINTERRUPTIBLE);
>+ schedule_timeout(HZ / 1000);
>+ }
HZ = 100 -> HZ / 1000 == 0 ?
This whole patch scares me. :-)
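The arithmetic behind both questions can be checked in userspace. This is a
sketch under stated assumptions: HZ is passed in as a plain parameter, and
`ms_to_ticks_round_up()` is a hypothetical helper for illustration, not a
kernel API.

```c
/* Ticks computed the way the patch does it: plain integer division. */
static unsigned long ticks_truncated(unsigned long hz, unsigned long divisor)
{
    return hz / divisor;
}

/* Hypothetical helper (not a kernel API): convert milliseconds to ticks,
 * rounding up so that a nonzero delay never becomes zero ticks. */
static unsigned long ms_to_ticks_round_up(unsigned long hz, unsigned long ms)
{
    return (ms * hz + 999) / 1000;
}
```

With hz = 100, `ticks_truncated(100, 50)` is 2 ticks and with hz = 1000 it is
20 ticks, which is 20 msec in both cases, so the HZ / 50 delay does not depend
on HZ being 100. `ticks_truncated(100, 1000)` is 0, though, so at HZ = 100 the
per-iteration timeout in the two polling loops collapses to zero ticks and the
"up to 1 sec" bound presumably no longer holds. Rounding up, as in
`ms_to_ticks_round_up(100, 1)`, yields at least 1 tick.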
Regards
Henning
--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]
Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20