2003-03-19 18:24:15

by Kresimir Kukulj

Subject: NFS problems (kernel locks up)

Hi

We are trying to assess whether Linux can serve as an NFS server for Linux
clients. After some initial testing, we moved part of the mailboxes of a
freemail service to NFS storage (a Linux NFS server). It worked fine and used
very few resources, but during the nightly backup the NFS server crashed.
The symptoms were:
1. The client detected that the NFS server was not responding.
2. The NFS server responded to ping, but you could not log in to it. Every
login attempt stalled once the TCP connection was established, because
the daemon did not respond (so I presume the TCP/IP stack was still
working at that moment).
3. After about 10 minutes, it locked up completely (no longer pingable).
4. I have a serial console attached to the server, and the kernel did not
respond to SysRq.
5. After a power cycle, the server booted and resumed its function.
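
An unresponsive SysRq can also mean that the feature was switched off, so it may be worth confirming the setting on the running server before the next lockup. A minimal Python sketch, assuming the kernel exposes the setting at /proc/sys/kernel/sysrq (the helper name sysrq_enabled is made up for illustration):

```python
from pathlib import Path


def sysrq_enabled(path="/proc/sys/kernel/sysrq"):
    """Report whether the magic SysRq key is enabled.

    Returns True for any non-zero value, False for 0, and None when the
    file does not exist (for example, a kernel built without
    CONFIG_MAGIC_SYSRQ, or /proc not mounted).
    """
    try:
        value = int(Path(path).read_text().strip())
    except FileNotFoundError:
        return None
    return value != 0
```

Calling sysrq_enabled() on the server should return True before SysRq can be expected to work from the serial console.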

This happened three times, each time during the backup (Networker),
sometimes only 5 minutes after the backup started, sometimes after 1.5 hours.
All of this was with the 2.4.20 kernel (no extra patches), using NFSv3, udp, async.
The NFS client used: rw,hard,intr,udp,rsize=8192,wsize=8192,nodev,nosuid
The NFS server used: rw,no_root_squash (default is async).

Then I installed 2.4.21-pre5 because it contained some NFS fixes. After
that, the server survived three days (two incrementals and one full backup
completed successfully). Then it crashed during the day for no apparent
reason (we monitor the server with 'cricket', and there was no unusual
activity).

I then switched to NFSv2,sync,udp, and it crashed during the backup that
night and again during the day. This resulted in filesystem corruption
(replaying the ext3 journal caused fsck to be invoked, and a couple of hours
were wasted on checking).

Now I have reverted to NFSv3,udp, but kept 'sync'. I will see tonight
whether it survives.

The filesystem is a 99 GB ext3 partition with a 1024-byte block size and an
internal journal. It is 50% full and contains around 290000 files (13.7%
fragmentation). Files range from a few kilobytes up to 10 MB.

Normal filesystem usage is ~200 KB read and 300 KB write per second, with < 5%
disk utilization. When the backup runs, reads reach ~5 MB/sec with disk
utilization of ~100%.
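
For a rough sense of scale, the figures above can be combined into an average file size and a lower bound on how long a full backup has to read. This is only back-of-the-envelope arithmetic in Python; it assumes the quoted sizes are binary gigabytes and megabytes, and that a full backup reads all used data at the peak rate. The function names are made up for illustration:

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def average_file_size(fs_bytes, fill_fraction, file_count):
    """Average bytes per file on a filesystem that is partly full."""
    return fs_bytes * fill_fraction / file_count


def full_read_seconds(fs_bytes, fill_fraction, bytes_per_second):
    """Seconds needed to read all used data at a constant rate."""
    return fs_bytes * fill_fraction / bytes_per_second


# Figures from the message: 99 GB, 50% full, 290000 files, ~5 MB/sec.
avg = average_file_size(99 * GIB, 0.5, 290_000)
secs = full_read_seconds(99 * GIB, 0.5, 5 * MIB)

print(f"average file size: {avg / 1024:.0f} KiB")        # prints 179 KiB
print(f"full read at 5 MiB/s: {secs / 3600:.1f} hours")  # prints 2.8 hours
```

Under those assumptions a full pass takes close to three hours, so lockups 5 minutes to 1.5 hours into the backup would fall well inside the read phase.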

Client and server are connected to the same switch, with no dropped packets.

We are satisfied with performance (while the server works).

Can anybody offer a suggestion? I have tried everything I can think of.
We would like to use Linux as an NFS server, but if this does not work, we
will be forced to consider alternatives like Solaris x86.
Can anyone here suggest a good alternative NFS server OS (for x86) with
good support for SCSI HW RAID controllers? ICP Vortex is unfortunately
not supported under Solaris x86, so what other controllers (say, for
Solaris x86) do you recommend?

Also, I am concerned about the filesystem. Will ext3 be able to handle,
say, 10 million files? If not, will Solaris x86 UFS be any better?
[ For us, reiserfs has sometimes proved difficult, and we had a couple of
fs-related crashes, so we are trying to find alternatives. A filesystem
check on that number of files is measured in days. ]

Some info about hardware:
Dell PowerApp 200 with 2 x Pentium III (Coppermine), each 1GHz.
1 GB memory, with CONFIG_HIGHMEM4G=y.
eepro100 ethernet
ServerWorks chipset but nothing except CDROM is connected to it.
ICP Vortex Hardware RAID model GDT8523RZ
Driver for this (SCSI) controller is from the 2.4.20 kernel (it's pretty new).
5 FUJITSU MAJ3364MC 34 GB drives in RAID5 (4+hotfix).
Filesystem is ext3 in ordered journaling mode (data=ordered).

Kernel is vanilla 2.4.20, and 2.4.21-pre5.
I can provide 'dmesg' and '.config' for that kernel.

Distribution is Debian stable 3.0.
These packages are installed:
ii nfs-common 1.0-2 NFS support files common to client and server
ii nfs-kernel-server 1.0-2 Kernel NFS server support

NFS server and client use fixed ports as described in the NFS-HOWTO:
Kernel command line: root=/dev/sda2 lockd.udpport=32768 \
lockd.tcpport=32768 console=tty0 console=ttyS0,9600
statd and mountd are fixed as well, and iptables is configured to pass
fragmented packets. The NFS server runs with the default of 8 kernel
threads (knfsd). According to /proc/net/rpc/nfsd, there is no need for more
kernel threads.
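
For anyone wanting to repeat the thread check, the relevant numbers are on the "th" line of /proc/net/rpc/nfsd. A minimal Python sketch for reading that line follows. It assumes the layout used by 2.4-era kernels as I understand it: thread count, then the number of times all threads were busy, then a histogram of busy time per utilization band. The function name and the sample values are made up for illustration:

```python
def parse_nfsd_threads(stats_text):
    """Extract the 'th' line from the text of /proc/net/rpc/nfsd.

    Returns (threads, all_busy_count, histogram), where histogram is the
    list of remaining fields on the line, converted to float.
    """
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "th":
            threads = int(fields[1])
            all_busy_count = int(fields[2])
            histogram = [float(x) for x in fields[3:]]
            return threads, all_busy_count, histogram
    raise ValueError("no 'th' line found")


# Hypothetical sample, not taken from the server in question.
sample = (
    "rc 0 12 345\n"
    "th 8 0 1.500 0.250 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000\n"
)
threads, all_busy_count, histogram = parse_nfsd_threads(sample)
```

On a live server the text would come from open("/proc/net/rpc/nfsd").read(). An all-busy count that stays at zero, with the busy time concentrated in the low bands, would be consistent with 8 threads being enough.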

The services running on the NFS client are POP3 and SMTP daemons and a
web-based frontend that uses them. Both daemons are configured to use their
own version of dot locking (as recommended).

Thanks.

--
Kresimir Kukulj
Iskon Internet d.d.
ISS
Savska 41/X.
10000 Zagreb


-------------------------------------------------------
This SF.net email is sponsored by: Does your code think in ink?
You could win a Tablet PC. Get a free Tablet PC hat just for playing.
What are you waiting for?
http://ads.sourceforge.net/cgi-bin/redirect.pl?micr5043en
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-03-24 17:20:13

by David Dougall

Subject: Re: NFS problems (kernel locks up)

You might want to try XFS on Linux. We have been running 2.4.19rc3 on
machines similar to the ones you describe for almost a year now with
little to no problems (including Networker backups). In my experience,
XFS is more stable and performs better than ext3.
Unfortunately, you need to get a huge kernel patch from SGI. It has been
worth it for us.
--David Dougall




2003-03-21 19:50:09

by Bernd Schubert

Subject: Re: NFS problems (kernel locks up)

> Some info about hardware:
> Dell PowerApp 200 with 2 x Pentium III (Coppermine), each 1GHz.
> 1Gb memory, with CONFIG_HIGHMEM4G=y.
> eepro100 ethernet
> ServerWorks chipset but nothing except CDROM is connected to it.
> ICP Vortex Hardware RAID model GDT8523RZ
> Driver for this (SCSI) controller is from 2.4.20 kernel (its pretty new).
> 5 FUJITSU MAJ3364MC 34Gb drives in RAID5 (4+hotfix).
> Filesystem is ext3 with journal=ordered.
>

We have a rather similar machine (well, without the RAID and not from Dell)
with 2 GB. We actually had some trouble with the ServerWorks chipset and
the memory.
The lockups you describe are probably not NFS-related but caused by your
hardware. Try updating your BIOS and disabling MTRR, AGP, and similar
speed-optimizing options in your kernel configuration.
Run memtest86 (the full test), even if you have ECC.
If none of this helps, try disabling dual-CPU support.
As far as I know, ext3 causes more problems with NFS than reiserfs does, but
this shouldn't cause the lockups.

Hope it helps,
Bernd



2003-03-21 22:54:53

by Kresimir Kukulj

Subject: Re: NFS problems (kernel locks up)

Quoting Bernd Schubert ([email protected]):
> > Some info about hardware:
> > Dell PowerApp 200 with 2 x Pentium III (Coppermine), each 1GHz.
> > 1Gb memory, with CONFIG_HIGHMEM4G=y.
> > eepro100 ethernet
> > ServerWorks chipset but nothing except CDROM is connected to it.
> > ICP Vortex Hardware RAID model GDT8523RZ
> > Driver for this (SCSI) controller is from 2.4.20 kernel (its pretty new).
> > 5 FUJITSU MAJ3364MC 34Gb drives in RAID5 (4+hotfix).
> > Filesystem is ext3 with journal=ordered.
> >
>
> We have a rather similar machine (well, without the raid and not from Dell)
> and with 2GB. Actually we had some trouble with the serverworks chipset and
> the memory.
> The lockups you are describing are probably not nfs related, but due to your
> hardware. Try to update your bios, disable mtrr, agp, similar speed
> optimizing things in your kernel configuration.
> Run memtest86 (the full test), even if you have ECC.
> If all of this still doesn't help, try to disable dual-cpu support.
> As much as I know ext3 make more problems with nfs than reiserfs does, but
> this shouldn't cause the lockups

Thanks for replying. I will try your advice.
The next time it crashes, I will use the same kernel but built without
MTRR, SMP, HIGHMEM, and the IDE/ATA subsystem, as they are not really
essential. Unfortunately, the server is in a semi-production/testing phase,
so I cannot run memtest86 for now.

--
Kresimir Kukulj [email protected]
+--------------------------------------------------+
Old PC's never die. They just become Unix terminals.



2003-03-21 22:58:20

by Kresimir Kukulj

Subject: Re: NFS problems (kernel locks up)

Quoting Bernd Schubert ([email protected]):
> > Some info about hardware:
> > Dell PowerApp 200 with 2 x Pentium III (Coppermine), each 1GHz.
> > 1Gb memory, with CONFIG_HIGHMEM4G=y.
> > eepro100 ethernet
> > ServerWorks chipset but nothing except CDROM is connected to it.
> > ICP Vortex Hardware RAID model GDT8523RZ
> > Driver for this (SCSI) controller is from 2.4.20 kernel (its pretty new).
> > 5 FUJITSU MAJ3364MC 34Gb drives in RAID5 (4+hotfix).
> > Filesystem is ext3 with journal=ordered.
> >
>
> We have a rather similar machine (well, without the raid and not from Dell)
> and with 2GB. Actually we had some trouble with the serverworks chipset and
> the memory.
> The lockups you are describing are probably not nfs related, but due to your
> hardware. Try to update your bios, disable mtrr, agp, similar speed
> optimizing things in your kernel configuration.
> Run memtest86 (the full test), even if you have ECC.
> If all of this still doesn't help, try to disable dual-cpu support.
> As much as I know ext3 make more problems with nfs than reiserfs does, but
> this shouldn't cause the lockups

Uh, sorry to follow up again, but I forgot to ask.
Could you elaborate a bit on what problems ext3 has with NFS?
Have you tried XFS, and what is your experience using it with NFS?
I hope this is not too off-topic for this list.

--
Kresimir Kukulj [email protected]
+--------------------------------------------------+
Old PC's never die. They just become Unix terminals.

