2004-03-18 21:10:05

by Chip Salzenberg

[permalink] [raw]
Subject: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients


Neil (or someone), what should I tell this user?


----- Forwarded message from "Steinar H. Gunderson" <[email protected]> -----

Subject: Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients
From: "Steinar H. Gunderson" <[email protected]>
To: Debian Bug Tracking System <[email protected]>
Date: Wed, 03 Mar 2004 00:41:21 +0100
X-Mailer: reportbug 1.50

Package: nfs-kernel-server
Version: 1:1.0-2woody1
Severity: important

It appears that from time to time, our NFS servers (both 2.4 and 2.6
servers, but both running woody) go into giant loads with almost
no traffic (i.e. 20-30 connections, but almost no file activity, as
confirmed by tcpdump). The load is typically in the 7-9 range, and the
clients in question seem to hang almost indefinitely (like 20 minutes
for a simple ls). However, top shows no processes wanting CPU time, so
it almost looks like some kind of I/O starvation problem.

In addition, we seem to get strange errors like:

00:16:32.330039 129.241.93.186 > 129.241.93.30: icmp: ip reassembly time exceeded [tos 0xc0]

(.30 is the NFS server, .186 is one of the NFS clients)

Something is clearly wrong here; stopping nfs-kernel-server makes the
load drop to zero almost immediately, and substituting nfs-user-server for
nfs-kernel-server also fixes the problem. The servers in question are
also NFS clients, but there are no stale mounts and we aren't using NFS
re-export.

These problems seem to coincide with the rollout of Linux 2.6.x (seen
the problem with both 2.6.1 and 2.6.3) on the clients, so it seems
plausible that something in the Linux 2.6 client is triggering the NFS
kernel server code. I'm a bit unsure if I should file this on
nfs-kernel-server or on a kernel package; feel free to reassign as
needed.

-- System Information
Debian Release: 3.0
Architecture: i386
Kernel: Linux cassarossa 2.4.25 #1 SMP Wed Feb 18 22:46:21 CET 2004 i686
Locale: LANG=en_US, LC_CTYPE=en_US.ISO8859-1

Versions of packages nfs-kernel-server depends on:
ii debconf 1.2.35 Debian configuration management sy
ii libc6 2.2.5-11.5 GNU C Library: Shared libraries an
ii libwrap0 7.6-9 Wietse Venema's TCP wrappers libra
ii nfs-common 1:1.0-2woody1 NFS support files common to client

----- End forwarded message -----

--
Chip Salzenberg - a.k.a. - <[email protected]>
"I wanted to play hopscotch with the impenetrable mystery of existence,
but he stepped in a wormhole and had to go in early." // MST3K


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-03-19 00:08:57

by Trond Myklebust

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

On Thu, 18/03/2004 at 16:09, Chip Salzenberg wrote:
> Neil (or someone), what should I tell this user?
>

I typically see the "ip reassembly time exceeded" in situations where
the machine is dropping fragments due to missed interrupts.
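
One way to check for dropped fragments is to watch the kernel's IP reassembly counters on the machine that sends the ICMP message. What follows is a minimal sketch, assuming the counters are exposed in /proc/net/snmp as a pair of "Ip:" lines (field names first, values second). The function name reasm_counters is made up for illustration.

```shell
#!/bin/sh
# Print the IP reassembly counters from a file in /proc/net/snmp format.
# Assumption: the first "Ip:" line lists field names, the second the values.
reasm_counters() {
    awk '
        $1 == "Ip:" && !seen {              # header line: remember field names
            for (i = 2; i <= NF; i++) name[i] = $i
            seen = 1
            next
        }
        $1 == "Ip:" {                       # value line: print Reasm* fields only
            for (i = 2; i <= NF; i++)
                if (name[i] ~ /^Reasm/) print name[i], $i
        }
    ' "${1:-/proc/net/snmp}"
}
```

Running reasm_counters twice, a minute or so apart, and comparing ReasmFails should show whether reassembly failures are still accumulating while the load is high.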

What's probably happening here is that because the 2.6 clients can cache
huge amounts of writes before everything needs to be written out at
close() time, the server is being overwhelmed...

Cheers
Trond



2004-03-19 08:48:14

by Olaf Kirch

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

On Thu, Mar 18, 2004 at 04:09:43PM -0500, Chip Salzenberg wrote:
> Neil (or someone), what should I tell this user?

They should enable NFS and RPC debugging on the client when this problem
occurs, put the bzipped logs somewhere and send a pointer to this list.

To turn on debugging, do this:

echo 65535 > /proc/sys/sunrpc/nfs_debug
echo 65535 > /proc/sys/sunrpc/rpc_debug

If something in the RPC client is going berserk, these logs will probably
grow like crazy. It may be helpful to kill syslog and
"cat /proc/kmsg > /tmp/nfs.log" directly. Or even use "head -10000"
instead of cat.
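
The steps above could be wrapped up roughly as follows. This is only a sketch: the function names set_debug and capture, and the SUNRPC_DIR and KMSG variables, are hypothetical and exist so the functions can be tried against scratch files first. The defaults are the paths given above, and writing 0 is assumed to switch the debug flags off again.

```shell
#!/bin/sh
# Sketch: toggle the sunrpc debug flags and capture a bounded kernel log.
# SUNRPC_DIR and KMSG are illustrative overrides; the defaults are the real paths.
SUNRPC_DIR=${SUNRPC_DIR:-/proc/sys/sunrpc}
KMSG=${KMSG:-/proc/kmsg}

# set_debug VALUE: 65535 turns on all NFS and RPC debugging, 0 turns it off.
set_debug() {
    echo "$1" > "$SUNRPC_DIR/nfs_debug" &&
    echo "$1" > "$SUNRPC_DIR/rpc_debug"
}

# capture OUTFILE [LINES]: enable debugging, save at most LINES lines
# (default 10000) of kernel messages, then disable debugging again.
capture() {
    set_debug 65535 || return 1
    head -n "${2:-10000}" "$KMSG" > "$1"
    set_debug 0
}
```

As root on the client, with syslog stopped so it does not compete for /proc/kmsg, "capture /tmp/nfs.log" followed by "bzip2 /tmp/nfs.log" would produce a file suitable for posting.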

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-03-19 08:49:39

by Olaf Kirch

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

One more thing:

> Something is clearly wrong here; stopping nfs-kernel-server makes the
> load drop to zero almost immediately, and substituting nfs-user-server for
> nfs-kernel-server also fixes the problem.

This probably means the problem is in the NFSv3 client code. The major
difference between knfsd and unfsd is that the latter is v2 only.
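
To confirm which NFS versions and transports a server actually registers (and so what the clients end up using once the user-space server is swapped in), the portmapper listing can be filtered. This is a sketch, assuming "rpcinfo -p" prints the program number, version and protocol in its first three columns. nfs_versions is a hypothetical helper name and nfsserver a placeholder host.

```shell
#!/bin/sh
# Read "rpcinfo -p" output on stdin and list the NFS versions and transports.
# 100003 is the RPC program number assigned to NFS.
nfs_versions() {
    awk '$1 == 100003 { print "v" $2, $3 }' | sort -u
}

# Example (placeholder host name):
#   rpcinfo -p nfsserver | nfs_versions
```

If only v2 shows up with the user-space server running, the clients are no longer exercising the v3 code path, which would fit the explanation above.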

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-03-19 11:53:46

by Bernd Schubert

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 19 March 2004 09:49, Olaf Kirch wrote:
> One more thing:
> > Something is clearly wrong here; stopping nfs-kernel-server makes the
> > load drop to zero almost immediately, and substituting nfs-user-server
> > for nfs-kernel-server also fixes the problem.
>
> This probably means the problem is in the NFSv3 client code. The major
> difference between knfsd and unfsd is that the latter is v2 only.
>

Now there's also unfs3 (http://unfs3.sourceforge.net/), which supports, as the
name suggests, v3.

Cheers,
Bernd
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFAWt8/C8BUnAF+ydYRAhCVAJ48HQ8l8o/ZKZBhaXFBQxv4VvMoNQCeIsv0
+cn/FO2oRl/XFf9v6Ktf+kE=
=jWh4
-----END PGP SIGNATURE-----



2004-03-19 12:03:25

by Olaf Kirch

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

On Fri, Mar 19, 2004 at 12:53:30PM +0100, Bernd Schubert wrote:
> Now there's also unfs3 (http://unfs3.sourceforge.net/), which supports as the
> name suggests v3.

Which I think has a striking resemblance to unfsd in some parts of the
code but neglects to acknowledge that, both in the copyright statements
and the READMEs...

Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
[email protected] | tempfile names today!
---------------+



2004-03-22 00:26:02

by NeilBrown

[permalink] [raw]
Subject: Re: Debian Bug#235886: nfs-kernel-server inducing load of 8-9 with no good reason for Linux 2.6 clients

On Thursday March 18, [email protected] wrote:
>
> Neil (or someone), what should I tell this user?
>

"use tcp" might be a good answer.... especially if it works :-)
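
Before switching, it may help to see which client mounts are still on UDP. This is a rough sketch, assuming /proc/mounts lists the transport among the mount options as either "tcp" or "proto=tcp". The function name udp_nfs_mounts and the server and path names in the comment are placeholders.

```shell
#!/bin/sh
# List NFS mounts whose option string contains neither "tcp" nor "proto=tcp".
# Reads a file in /proc/mounts format: device, mount point, type, options, ...
udp_nfs_mounts() {
    awk '$3 == "nfs" && $4 !~ /(^|,)(tcp|proto=tcp)(,|$)/ { print $1, $2 }' \
        "${1:-/proc/mounts}"
}

# A listed mount could then be remounted over TCP, for example
# (placeholder server and paths):
#   umount /home && mount -t nfs -o tcp nfsserver:/home /home
```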

NeilBrown


