2006-02-06 08:41:45

by Knut Petersen

Subject: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

Hi Carlos,

Could you give some more information?

1. How do you boot the kernel? Bootrom?
If yes: Which protocol? PXE + pxelinux? Etherboot? ...

2. Does ip auto configuration (e.g. ip=dhcp at kernel command line) work?
If you do see the "IP-Config: Complete:" message while booting with
ip=dhcp or ip=bootp, or if you don't see it, that is very valuable
information.

3. Does portmap lookup work? If ip auto configuration does not work and
you try to give the information on the command line, this will most
probably fail too, but please give it a try.

4. Could it be that you try to mount root using nfs version 2?
The current linux nfs 2 server is unable to serve the current nfs 2
client.
Unfortunately version 2 is the default. Add the v3 parameter:

root=/dev/nfs nfsroot=%s,rsize=8192,wsize=8192,v3
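
A minimal sketch of how such a command line could be assembled, in Python. The helper name `nfsroot_cmdline` and its parameters are hypothetical, invented for this illustration; only the `root=`, `nfsroot=`, `ip=` parameters and the `rsize`, `wsize`, `v3` options come from the message above.

```python
def nfsroot_cmdline(server, export, rsize=8192, wsize=8192, nfs_version=3, ip="dhcp"):
    """Build a kernel command line for an NFS root (hypothetical helper)."""
    # NFS mount options follow the export path, separated by commas.
    options = [f"rsize={rsize}", f"wsize={wsize}"]
    if nfs_version == 3:
        # Version 2 is the default according to the message above,
        # so version 3 has to be requested explicitly.
        options.append("v3")
    nfsroot = ",".join([f"{server}:{export}"] + options)
    return f"root=/dev/nfs nfsroot={nfsroot} ip={ip}"
```

Calling `nfsroot_cmdline("192.168.1.1", "/data/nfsroot/client")` (both values are placeholders) returns a single string that could be pasted into a boot loader's append line.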


cu,
Knut


2006-02-06 14:40:08

by Carlos Carvalho

Subject: Re: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

Knut Petersen ([email protected]) wrote on 6 February 2006 09:42:
>1. How do you boot the kernel? Bootrom?
> If yes: Which protocol? PXE + pxelinux? Etherboot? ...

Yes, pxe on the Intel card plus pxelinux. The machines are diskless
with Tyan S2466 motherboards. I don't use the on-board 3Com ether chip
(which works and boots fine, btw).

>2. Does ip auto configuration (e.g. ip=dhcp at kernel command line) work?
> If you do see the "IP-Config: Complete:" message while booting

Yes, this works fine. The kernel gets it from pxelinux directly via
option IPAPPEND.

Here's what I copied manually from the screen after the IP-Config:
line:

looking up port of RPC 100003/2 on 192.168.1.1 (the server)
e1000: e1000_watchdog_task: nic link is up 1000 Mbs full duplex
portmap: server 192.168.1.1 not responding, timed out
root-nfs: unable to get nfsd port number from server, using default
looking up port of RPC 100005/1 on 192.168.1.1
portmap: server 192.168.1.1 not responding, timed out
root-nfs: unable to get mountd port number from server, using default
root-nfs: server returned -5 while mounting /home/nfsroot/servers-root
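
The two "looking up port of RPC" lines are portmapper GETPORT queries for program 100003 (nfsd) version 2 and program 100005 (mountd) version 1. To check from a working machine on the same network whether the server answers such a query at all, something like the following Python sketch could be used. It is not the kernel's code: the function names are made up for this example, and it assumes the portmapper listens on UDP port 111 and that AUTH_NONE credentials are accepted.

```python
import socket
import struct

PMAP_PROG = 100000        # portmapper program number
PMAP_VERS = 2             # portmapper protocol version
PMAPPROC_GETPORT = 3      # GETPORT procedure
IPPROTO_UDP = 17


def build_getport_call(xid, prog, vers, proto=IPPROTO_UDP):
    """Encode an ONC RPC call asking the portmapper for a program's port."""
    # RPC header: xid, message type CALL (0), RPC version 2,
    # then the program, version and procedure being called.
    header = struct.pack(">6I", xid, 0, 2, PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT)
    # AUTH_NONE credential and verifier: flavor 0, length 0, twice.
    auth = struct.pack(">4I", 0, 0, 0, 0)
    # GETPORT arguments: program, version, protocol, port (unused, 0).
    args = struct.pack(">4I", prog, vers, proto, 0)
    return header + auth + args


def parse_getport_reply(data, xid):
    """Return the port from a GETPORT reply; 0 means 'not registered'."""
    if len(data) < 20:
        raise ValueError("reply too short")
    rxid, msg_type, reply_stat, _flavor, verf_len = struct.unpack(">5I", data[:20])
    if rxid != xid or msg_type != 1:
        raise ValueError("not a reply to our call")
    if reply_stat != 0:
        raise ValueError("RPC call was denied")
    # Skip the verifier body, which is padded to a multiple of 4 bytes.
    offset = 20 + ((verf_len + 3) & ~3)
    if len(data) < offset + 8:
        raise ValueError("reply truncated")
    accept_stat, port = struct.unpack(">2I", data[offset:offset + 8])
    if accept_stat != 0:
        raise ValueError("RPC call was not executed successfully")
    return port


def getport(server, prog, vers, timeout=5.0):
    """Send one GETPORT query over UDP; raises socket.timeout if unanswered."""
    xid = 0x4E465352  # arbitrary transaction id for this sketch
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_getport_call(xid, prog, vers), (server, 111))
        data, _addr = sock.recvfrom(1024)
    return parse_getport_reply(data, xid)
```

For the lookups in the log, `getport("192.168.1.1", 100003, 2)` and `getport("192.168.1.1", 100005, 1)` would be the equivalent queries. If they succeed from another host while the diskless client times out, that points away from the server and toward the client's receive path.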

I said before that there is a progressive degradation with each kernel
version. Some versions before 2.6.12.2/driver 2.6.11 had no
portmap problems at all. 2.6.12.2/driver 2.6.11 says that it couldn't
get the port number from the server but manages to boot using the
default.

>4. Could it be that you try to mount root using nfs version 2?
> The current linux nfs 2 server is unable to serve the current nfs 2
>client.

Wonderful :-(

> Unfortunately version 2 is the default. Add the v3 parameter:
>
> root=/dev/nfs nfsroot=%s,rsize=8192,wsize=8192,v3

Why do the previous versions work? Why does 2.6.15.2 work with a
Marvell/Yukon ether chip?

Trond asked about using tcp. I prefer to use udp because it has less
overhead. This is a computing cluster, all machines are in the same
room connected to a HP 4108gl gigabit switch. It's not a cable or
switch port problem, all machines exhibit the same behavior.

2006-02-06 15:49:09

by Knut Petersen

Subject: Re: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

Carlos Carvalho wrote:

>Yes, this works fine. The kernel gets it from pxelinux directly via
>option IPAPPEND.
>
>Here's what I copied manually from the screen after the IP-Config:
>line:
>
>
>
That was not what I asked. Please test without ipappend
and use ip=dhcp as a kernel parameter instead. Don't forget
to enable dhcp autoconfiguration in the kernel config ...
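
A quick way to check a kernel `.config` for the options this test depends on is sketched below in Python. The option names are the usual 2.6-era Kconfig symbols for IP autoconfiguration and NFS root and should be verified against the tree actually being built; the function name is hypothetical.

```python
# Options that are normally built in (=y) for ip=dhcp plus an NFSv3 root.
# This list is an assumption; check it against your kernel's Kconfig.
REQUIRED = (
    "CONFIG_IP_PNP",
    "CONFIG_IP_PNP_DHCP",
    "CONFIG_NFS_FS",
    "CONFIG_NFS_V3",
    "CONFIG_ROOT_NFS",
)


def missing_builtin_options(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y' in a .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: '# CONFIG_FOO is not set'.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

Feeding it the contents of the `.config` file returns an empty list when everything is built in; modules (`=m`) are deliberately reported as missing, since the root filesystem has to be mounted before any module can be loaded from it.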

I bet that you will see that the same dhcp server that proved to
work correctly by providing your server ip etc to the
pxe bootrom and via ipappend to the kernel is unable to
give that same information to the linux ipconfig code
directly. Please try and report.

I assume that you do not have any problem to pxeboot good
old DOS and memtest. Right?

Please give board name, bios version, pxe rom version, and
lspci -vv.

>Trond asked about using tcp. I prefer to use udp because it has less
>overhead. This is a computing cluster, all machines are in the same
>room connected to a HP 4108gl gigabit switch. It's not a cable or
>switch port problem, all machines exhibit the same behavior.
>
>
If I am right with my bet the problem is unrelated to the tcp / udp
choice and unrelated to ipconfig, portmap lookup and nfsroot code.



cu,
Knut

2006-02-06 20:40:35

by Carlos Carvalho

Subject: Re: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

Knut Petersen ([email protected]) wrote on 6 February 2006 16:49:
>>Yes, this works fine. The kernel gets it from pxelinux directly via
>>option IPAPPEND.
>>
>>Here's what I copied manually from the screen after the IP-Config:
>>line:
>>
>>
>>
>That was not what I asked. Please test without ipappend
>and use ip=dhcp as a kernel parameter instead. Don't forget
>to enable dhcp autoconfiguration in the kernel config ...

Oops, sorry.

>I bet that you will see that the same dhcp server that proved to
>work correctly by providing your server ip etc to the
>pxe bootrom and via ipappend to the kernel is unable to
>give that same information to the linux ipconfig code
>directly. Please try and report.

Exactly. In fact it tried several times and ended up getting the
config. The dhcp log shows many attempts. Apparently the server gets the
request but the client doesn't get the answer.

>I assume that you do not have any problem to pxeboot good
>old DOS and memtest. Right?

I have only used Linux on these machines, but never had this problem
before. It even happens with 2.4.

>Please give board name, bios version, pxe rom version, and
>lspci -vv.

Tyan 2466, phoenix bios 4.0 release 6.0, Intel boot agent version 1.0.15.
Here's the output of lspci -vv

00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
Region 0: Memory at <unassigned> (32-bit, prefetchable)
Region 1: Memory at fea00000 (32-bit, prefetchable) [size=4K]
Region 2: I/O ports at 1030 [disabled] [size=4]
Capabilities: [a0] AGP version 2.0
Status: RQ=15 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA+ AGP+ 64bit- FW- Rate=<none>

00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=68
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-

00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 05)
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04) (prog-if 8a [Master SecP PriP])
Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Region 4: I/O ports at f000 [size=16]

00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-

00:08.0 Ethernet controller: Intel Corp. 82544EI Gigabit Ethernet Controller (Copper) (rev 02)
Subsystem: Intel Corp. PRO/1000 XT Server Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), cache line size 10
Interrupt: pin A routed to IRQ 5
Region 0: Memory at fe620000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fe600000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at 1000 [size=32]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] #07 [0002]
Capabilities: [f0] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
Address: 0000000000000000 Data: 0000

00:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI (rev 05) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
Bus: primary=00, secondary=02, subordinate=02, sec-latency=168
I/O behind bridge: 00002000-00002fff
Memory behind bridge: fe700000-fe7fffff
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-

02:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-768 [Opus] USB (rev 07) (prog-if 10 [OHCI])
Subsystem: Advanced Micro Devices [AMD] AMD-768 [Opus] USB
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (20000ns max)
Interrupt: pin D routed to IRQ 10
Region 0: Memory at fe700000 (32-bit, non-prefetchable) [size=4K]

02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
Subsystem: Tyan Computer Tiger MPX S2466 (3C920 Integrated Fast Ethernet Controller)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 80 (2500ns min, 2500ns max), cache line size 10
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at 2000 [size=128]
Region 1: Memory at fe701000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable+ DSel=0 DScale=2 PME-

>If I am right with my bet the problem is unrelated to the tcp / udp
>choice and unrelated to ipconfig, portmap lookup and nfsroot code.

Exactly, that's why I talked about the driver directly in my first
post.

2006-02-13 03:52:33

by H. Peter Anvin

Subject: Re: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

Followup to: <[email protected]>
By author: [email protected] (Carlos Carvalho)
In newsgroup: linux.dev.kernel
>
> We have several machines with Intel Corp. 82544EI Gigabit Ethernet
> Controller (Copper) (rev 02), as reported by lspci. They don't manage
> to mount the rootfs via nfs anymore. I've tried several combinations
> and the last one that works is 2.6.12.2 using the 2.6.11 version of
> the driver (simply replacing the files in the tree). 2.6.12.2 with its
> own driver doesn't work.
>
> There seems to be a pattern: at each version the machine has more
> difficulty mounting the rootfs. Other machines using other ethercards
> but with the same server and filesystem work normally.
>

Care to try out the klibc tree?

git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-klibc.git

-hpa

2006-02-13 08:19:55

by Rogier Wolff

Subject: Re: nfsroot doesn't work with intel card since 2.6.12.2/2.6.11

On Sun, Feb 12, 2006 at 07:52:04PM -0800, H. Peter Anvin wrote:
> Followup to: <[email protected]>
> By author: [email protected] (Carlos Carvalho)
> In newsgroup: linux.dev.kernel
> >
> > We have several machines with Intel Corp. 82544EI Gigabit Ethernet
> > Controller (Copper) (rev 02), as reported by lspci. They don't manage
> > to mount the rootfs via nfs anymore. I've tried several combinations
> > and the last one that works is 2.6.12.2 using the 2.6.11 version of
> > the driver (simply replacing the files in the tree). 2.6.12.2 with its
> > own driver doesn't work.
> >
> > There seems to be a pattern: at each version the machine has more
> > difficulty mounting the rootfs. Other machines using other ethercards
> > but with the same server and filesystem work normally.
> >
>
> Care to try out the klibc tree?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-2.6-klibc.git

What doesn't work? We have a bunch of machines that boot diskless.

zebigbos:~> cat /proc/version
Linux version 2.6.15 (erik@zebigbos) (gcc version 3.4.2) #1 SMP Tue Jan 17 11:02:22 CET 2006
zebigbos:~> df /
Filesystem Size Used Avail Use% Mounted on
numerobis:/data/nfsroot/zebigbos
67G 49G 18G 74% /
zebigbos:~>

As far as I can remember we haven't had any problems in this area for
a long time.

This machine does have a slightly newer ethernet controller:

0000:02:07.0 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)
0000:02:07.1 Ethernet controller: Intel Corporation 82546GB Gigabit Ethernet Controller (rev 03)

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
Q: It doesn't work. A: Look buddy, doesn't work is an ambiguous statement.
Does it sit on the couch all day? Is it unemployed? Please be specific!
Define 'it' and what it isn't doing. --------- Adapted from lxrbot FAQ