2005-02-26 13:28:56

by Brad Barnett

Subject: knfsd brought to its knees, by a simple rsync or cp operation



There seems to be some odd behaviour with knfsd. I have a box with a
raid10, and a single cp or rsync operation should not effectively kill
knfsd performance.

It does, however.

First, nfs works very well as long as this box does not have any local
disk i/o. I can, literally, transfer files at the upper limit of my
100 Mbit network connection. Directory reads are fast, file transfers are
great in both directions. "Instant" would be the word I would use for
access.

It works great, fast and beautifully. There does not appear to be any
configuration issue at play here.

The problems start as soon as any local I/O starts. Directory listings
over nfs can take > 5 or 6 seconds, once I start my rsync backup process.
File transfer rates fall through the floor.

However, directory listings, locally on the box, are still instant. File
reads are instant. There is a _very_ minor slowdown, but my raid10 array
is doing a great job at handling a single rsync session + a single
directory request or copy request. Again, with knfsd, performance bombs.

There is obviously something wacky in the way the kernel is scheduling
things here. Any ideas, patches, suggestions?

Kernel 2.6.10, NFSv3 mounted, noatime mounts. More info can be provided
if needed, but again.. this setup works beautifully under load from
multiple NFS clients. It is fast, responsive, you name it. However, one
_single_ cp or rsync session can bring NFS responsiveness to its knees,
without tasking the cpu, ram or swap.

Thanks.


-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2005-02-28 10:06:43

by Olaf Kirch

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Sat, Feb 26, 2005 at 08:28:54AM -0500, Brad Barnett wrote:
> There is obviously something wacky in the way the kernel is scheduling
> things here. Any ideas, patches, suggestions?

That's because knfsd will write things to disk synchronously unless you
tell it not to. That can throttle other NFS activity in two locations:

- by tying up all knfsd threads on the server. Try to bump the
number of nfsd processes

- by tying up all RPC slots on the client. Make sure your wsize
isn't too big (8k is reasonable)
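Olaf's second point can be put into rough numbers. The sketch below is a back-of-the-envelope model, not anything taken from the NFS code. It assumes a client slot table of 16 entries (an illustrative figure), one WRITE of `wsize` bytes per slot, the 100 Mbit/s link from the original post, and no protocol overhead. It only shows how much more data each slot holds as wsize grows. In practice a slot is also held until the server's synchronous write completes.

```python
def slot_table_bytes(slots: int, wsize: int) -> int:
    """Bytes in flight when every RPC slot carries one full WRITE."""
    return slots * wsize


def drain_seconds(slots: int, wsize: int, link_bits_per_sec: float) -> float:
    """Wire time to send all in-flight WRITE payloads (overhead ignored)."""
    return slot_table_bytes(slots, wsize) * 8 / link_bits_per_sec


if __name__ == "__main__":
    SLOTS = 16    # assumed slot-table size, for illustration only
    LINK = 100e6  # 100 Mbit/s, as in the original post
    for wsize in (8 * 1024, 32 * 1024):
        ms = drain_seconds(SLOTS, wsize, LINK) * 1000
        print(f"wsize={wsize // 1024}k: {slot_table_bytes(SLOTS, wsize)} bytes, "
              f"{ms:.1f} ms on the wire")
```

Under these assumptions, a full table of 32k writes holds four times as much data as a table of 8k writes. A READ or LOOKUP that needs a free slot can therefore wait correspondingly longer.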

Olaf
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
[email protected] | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax



2005-02-28 15:23:11

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Mon, 28 Feb 2005 11:06:33 +0100
Olaf Kirch <[email protected]> wrote:

> On Sat, Feb 26, 2005 at 08:28:54AM -0500, Brad Barnett wrote:
> > There is obviously something wacky in the way the kernel is scheduling
> > things here. Any ideas, patches, suggestions?
>
> That's because knfsd will write things to disk synchronously unless you
> tell it not to. That can throttle other NFS activity in two locations:

During my tests involving "ls", no one else was accessing the server. I
have noatime set for both client and server mounts.. just in case.

So, there should be no writes for knfsd to do. There was only one
read operation, and that was a "ls -R /nfsmount".

>
> - by tying up all knfsd threads on the server. Try to bump the
> number of nfsd processes
>
> - by tying up all RPC slots on the client. Make sure your wsize
> isn't too big (8k is reasonable)

There is only one client (during my tests), so #1 can't be the case.
Number 2 applies to write operations; even so, I have spent over 5 hours
trying every possible permutation to see if any significant advantage can
be had.

This is what I don't understand. Why is one single 'ls' on a single
client, the only nfs client, brought to a standstill by a single cp or
rsync? It's very weird, and it does not seem to be because of write
operations the client is performing.





2005-02-28 15:49:11

by Olaf Kirch

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Mon, Feb 28, 2005 at 10:23:07AM -0500, Brad Barnett wrote:
> During my tests involving "ls", no one else was accessing the server. I
> have noatime set for both client and server mounts.. just in case.
>
> So, there should be no writes for knfsd to do. There was only one
> read operation, and that was a "ls -R /nfsmount".

Well, you were talking about rsync and cp, so it's either reads or
writes going over the wire, or both.

> > - by tying up all knfsd threads on the server. Try to bump the
> > number of nfsd processes
> >
> > - by tying up all RPC slots on the client. Make sure your wsize
> > isn't too big (8k is reasonable)
>
> There is only one client (during my tests), so #1 can't be the case.

One NFS client can issue many requests simultaneously, thereby tying
up more than one nfsd thread.
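Here is a toy model of how a single busy client can still starve the server's thread pool. A new request, say a LOOKUP from `ls`, can only start once some nfsd thread is free. The thread count of 8 and the 5-second figure below are illustrative assumptions, not measurements from this thread.

```python
from typing import Sequence


def start_time(busy_until: Sequence[float], arrival: float) -> float:
    """Earliest time a request arriving at `arrival` can start.

    busy_until[i] is when nfsd thread i finishes its current request;
    a value <= arrival means that thread is already idle.
    """
    if not busy_until:
        raise ValueError("need at least one nfsd thread")
    return max(arrival, min(busy_until))


if __name__ == "__main__":
    # All 8 threads blocked in synchronous writes until t = 5.0 s.
    print(start_time([5.0] * 8, arrival=0.1))          # 5.0
    # Same, but one thread is idle: the request starts immediately.
    print(start_time([5.0] * 7 + [0.0], arrival=0.1))  # 0.1
```

This is also why it can help to raise the number of nfsd processes: each extra thread is one more chance that `min(busy_until)` is already in the past.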

> This is what I don't understand. Why is one single 'ls' on a single
> client, the only nfs client, brought to a standstill by a single cp or
> rsync? It's very weird, and it does not seem to be because of write
> operations the client is performing.

Where do these cp and rsync calls occur? From your first message I
assumed they were on the client, operating on the NFS mounted file system.

Olaf

2005-02-28 16:20:19

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Mon, 28 Feb 2005 16:44:55 +0100
Olaf Kirch <[email protected]> wrote:

> On Mon, Feb 28, 2005 at 10:23:07AM -0500, Brad Barnett wrote:
> > During my tests involving "ls", no one else was accessing the server.
> > I have noatime set for both client and server mounts.. just in case.
> >
> > So, there should be no writes for knfsd to do. There was only one
> > read operation, and that was a "ls -R /nfsmount".
>
> Well, you were talking about rsync and cp, so it's either reads or
> writes going over the wire, or both.

The rsync and cp operations are on the server.

>
> > > - by tying up all knfsd threads on the server. Try to bump the
> > > number of nfsd processes
> > >
> > > - by tying up all RPC slots on the client. Make sure your wsize
> > > isn't too big (8k is reasonable)
> >
> > There is only one client (during my tests), so #1 can't be the case.
>
> One NFS client can issue many requests simultaneously, thereby tying
> up more than one nfsd thread.

Yes, but the only activity is a single "ls" on the client.. I don't think
this would use more than one thread.

>
> > This is what I don't understand. Why is one single 'ls' on a single
> > client, the only nfs client, brought to a standstill by a single cp or
> > rsync? It's very weird, and it does not seem to be because of write
> > operations the client is performing.
>
> Where do these cp and rsync calls occur? From your first message I
> assumed they were on the client, operating on the NFS mounted file
> system.
>

The cp or rsync are occurring locally on the server.

E.g.:

One client has an nfs mount. It issues an "ls". The response is instant,
without slowdowns.

I start a long and extensive cp -a process on the nfs server. Local 'ls'
responses are instant. Write and read operations are instant (it's a raid
10) on the local box, as well. However, my single remote client's "ls"
operation changes to a jerky, slow operation.. with upwards of 5 second
pauses in reads.









2005-03-01 23:38:18

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, 1 Mar 2005 10:04:46 -0500
"Bill Rugolsky Jr." <[email protected]> wrote:

> On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> > However, this is what is really irking me. This isn't a heavy I/O
> > job. This is just _one_ cp. Nothing else is happening on the entire
> > server! I just did, in the above test:
> >
> > client: ls -R /home
> >
> > The client is fine, for very long periods of time...
> >
> > Then, while the above command is still happening:
> >
> > server: cp -a /raid/home /raid/hometest
>
> You say that it isn't a heavy I/O job, but a recursive copy is a very
> seek-intensive one, particularly when copying a large tree to the same
> device, which will interleave reads and writes. What filesystem are you
> using? With an internal journal, journal writes will cause additional
> seeking.

ext3. I do stress, however, that I find almost zero slowdown on the local
system, when I do a ls -R locally. This is what set off my spidey sense.

>
> > Within 10 seconds, the output of ls -R /home slows. Within 20
> > seconds, it _stops_. It then sits there for seconds, and spews out a
> > page in small jumps. Again, a ls /raid/home on the _server_ barely
> > slows, and is constant.
>
> Do you mean ls -R /raid/home here? Is it definitely the case that
> ls -R /raid/home on the server is quick, but on the client it is slow?

Yes, most definitely ls -R /raid/home. I've tested this dozens of times,
and it is as you say above. Server fast, client slow.

>
> How about ls -lR /raid/home on the server? It could be that knfsd is
> returning file attribute information, hence reading the whole inode
> for each file, and not just for the directories. getdents64() returns
> d_type=DT_DIR for directory entries, which allows ls -R to optimize the
> traversal so as to only call fstat64() on directories, not regular
> files. So on the server, ls -R would only fstat64() the entries, while
> on the client ls -R can cause knfsd to do the equivalent of ls -lR.
>

ls -lR on the server is fast still. There are no 5 second stalls... no
stalls at all, really.


> Also, which I/O scheduler are you using?

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
elevator: using anticipatory as default io scheduler
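To check this on other machines, the boot messages above can be scanned automatically. The following is a small sketch that assumes the exact wording shown in this log. Other kernel versions may phrase these lines differently or prefix them with timestamps, which is why the patterns are not anchored to the start of a line.

```python
import re
from typing import List, Optional, Tuple

REGISTERED = re.compile(r"io scheduler (\S+) registered")
DEFAULT = re.compile(r"elevator: using (\S+) as default io scheduler")


def parse_elevator_log(log: str) -> Tuple[List[str], Optional[str]]:
    """Return (registered schedulers, default scheduler) from boot messages."""
    registered = REGISTERED.findall(log)
    match = DEFAULT.search(log)
    return registered, (match.group(1) if match else None)


if __name__ == "__main__":
    boot_log = """io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered
elevator: using anticipatory as default io scheduler"""
    print(parse_elevator_log(boot_log))
```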

Should I try a different scheduler? Deadline perhaps?


>
> Regards,
>
> Bill Rugolsky



2005-03-01 23:40:43

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation




Wow.

CFQ instantly resolved the issue. After reading up a bit more on CFQ, I
switched to deadline.

Deadline makes me happy. ;)
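The thread does not say how the switch was made. There are two usual routes on 2.6 kernels. One is the `elevator=deadline` boot parameter. The other, where the kernel supports runtime switching, is to write the scheduler name to `/sys/block/<dev>/queue/scheduler`; that file shows the active choice in brackets. Below is a sketch of the sysfs route. The device name is a hypothetical placeholder, and writing the file needs root.

```python
from pathlib import Path


def active_scheduler(sysfs_text: str) -> str:
    """Return the bracketed (active) entry, e.g. from 'noop [deadline] cfq'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def scheduler_path(device: str) -> Path:
    # `device` is a bare block-device name such as "sda" (hypothetical here).
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device: str, name: str) -> str:
    """Switch `device` to scheduler `name`; return what sysfs now reports."""
    path = scheduler_path(device)
    path.write_text(name + "\n")  # same as: echo deadline > .../scheduler
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    print(active_scheduler("noop anticipatory [deadline] cfq"))  # deadline
```

A change made through sysfs lasts only until the next reboot. The boot parameter makes the choice permanent.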

I am currently doing FIVE cp -al operations, as well as half a dozen ls -R
raid operations on the local box. I see almost zero nfs slowdown. ;)

Thanks very much guys, for all your help. I appreciate the effort
everyone put into this. Hopefully this thread will help some people down
the road...

Thanks!



2005-03-01 23:10:29

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, 1 Mar 2005 15:37:32 +0100
Olaf Kirch <[email protected]> wrote:

> On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> > Within 10 seconds, the output of ls -R /home slows. Within 20
> > seconds, it_stop_. It then sits there for seconds, and spews out a
> > page in small jumps. Again, a ls /raid/home on the _server_ barely
> > slows, and is constant.
> >
> > I'm really scratching my head here.
>
> Well, it sounds like something's eating the network bandwidth,
> or otherwise interfering with nfsd responsiveness. Again, are
> you using UDP or TCP? If UDP, look at nfsstat output to see if
> you have a high retransmit count.
>

In my original post, I did mention that I can copy large files (isos) over
the network at excellent speeds. That is, I get over 6M/sec transfer
speed...


> If it's really a problem with scheduling, it should make a difference
> if you run the rsync job with lower priority, and/or renice the
> nfsd threads to run with higher priority.

You can't really renice the kernel nfsd threads though :((



2005-03-01 14:18:40

by Roger Heflin

Subject: RE: knfsd brought to its knees, by a simple rsync or cp operation

Brad,

My post will bounce, so it won't go to the list, my email and domain
name don't agree.

The problem is simple, but I have never found a decent solution to it.

The basic issue is that a large local cp quickly fills up the buffer cache
on the local machine and can cause the nfsd processes to starve and have
difficulty getting in their io. Watch what happens to the buffer cache
and disk when this is happening. Later versions of Linux should
be worse, as they fill the buffer cache faster. I suspect that the
problem is that there is always a long line of operations to take care
of and when NFS comes along it has to get in line behind whatever is
already queued up.
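Roger's "get in line" explanation can be illustrated with a deliberately crude model. Every disk request takes the same time, and a newly queued request waits for everything ahead of it. The `reads_first` switch stands in for a policy that lets reads overtake queued writes. It is an assumption made for illustration and does not reproduce how any real Linux I/O scheduler works. The numbers are hypothetical.

```python
from typing import Sequence


def wait_seconds(backlog: Sequence[str], service_ms: float,
                 newcomer: str, reads_first: bool = False) -> float:
    """Time a new request spends queued behind `backlog`.

    backlog holds "read"/"write" tags for requests already queued.
    With reads_first, a read only waits behind other reads.
    """
    if reads_first and newcomer == "read":
        ahead = sum(1 for kind in backlog if kind == "read")
    else:
        ahead = len(backlog)
    return ahead * service_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical backlog from a big local cp: 400 writes and 100 reads.
    backlog = ["write"] * 400 + ["read"] * 100
    print(wait_seconds(backlog, 10.0, "read"))                    # 5.0
    print(wait_seconds(backlog, 10.0, "read", reads_first=True))  # 1.0
```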

I have seen it on older versions of Linux (2.2), where it took around 2-3
copies to make things really bad, but a single one would do a good job of
making response bad.

Roger


2005-03-02 09:03:27

by Olaf Kirch

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, Mar 01, 2005 at 06:10:07PM -0500, Brad Barnett wrote:
> > Well, it sounds like something's eating the network bandwidth,
> > or otherwise interfering with nfsd responsiveness. Again, are
> > you using UDP or TCP? If UDP, look at nfsstat output to see if
> > you have a high retransmit count.
> >
>
> In my original post, I did mention that I can copy large files (isos) over
> the network at excellent speeds. That is, I get over 6M/sec transfer
> speed...

Still you won't answer: UDP or TCP? :-)

And the question about retransmits referred to the situation where you
see the slow-downs.

> > If it's really a problem with scheduling, it should make a difference
> > if you run the rsync job with lower priority, and/or renice the
> > nfsd threads to run with higher priority.
>
> You can't really renice the kernel nfsd threads though :((

renice -20 -p <pid of nfsd> works for me.

Olaf

2005-03-02 16:41:58

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Wed, 2 Mar 2005 10:03:13 +0100
Olaf Kirch <[email protected]> wrote:

> On Tue, Mar 01, 2005 at 06:10:07PM -0500, Brad Barnett wrote:
> > > Well, it sounds like something's eating the network bandwidth,
> > > or otherwise interfering with nfsd responsiveness. Again, are
> > > you using UDP or TCP? If UDP, look at nfsstat output to see if
> > > you have a high retransmit count.
> > >
> >
> > In my original post, I did mention that I can copy large files (isos)
> > over the network at excellent speeds. That is, I get over 6M/sec
> > transfer speed...
>
> Still you won't answer: UDP or TCP? :-)

Sorry Olaf, heh. UDP.

>
> And the question about retransmits referred to the situation where you
> see the slow-downs.

I did check this before (it was in an NFS HOWTO someplace), and I did not
notice a large number of retransmits (there were one or two every several
minutes).

>
> > > If it's really a problem with scheduling, it should make a
> > > difference if you run the rsync job with lower priority, and/or
> > > renice the nfsd threads to run with higher priority.
> >
> > You can't really renice the kernel nfsd threads though :((
>
> renice -20 -p <pid of nfsd> works for me.
>

Right, but that doesn't affect anything in the kernel...

Anyhow, the problem is solved, please see my other message about changing
i/o schedulers...

Thanks!





2005-03-01 09:55:59

by Olaf Kirch

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Mon, Feb 28, 2005 at 11:20:18AM -0500, Brad Barnett wrote:
> I start a long and extensive cp -a process on the nfs server. Local 'ls'
> responses are instant. Write and read operations are instant (it's a raid
> 10) on the local box, as well. However, my single remote client's "ls"
> operation changes to a jerky, slow operation.. with upwards of 5 second
> pauses in reads.

Are you using NFS over UDP? If you ping the server from the client, do
the round trip time and packet loss rate change when you start the
heavy IO jobs on the server?

Olaf

2005-03-01 11:57:09

by Brad Barnett

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, 1 Mar 2005 10:55:48 +0100
Olaf Kirch <[email protected]> wrote:

> On Mon, Feb 28, 2005 at 11:20:18AM -0500, Brad Barnett wrote:
> > I start a long and extensive cp -a process on the nfs server. Local
> > 'ls' responses are instant. Write and read operations are instant
> > (it's a raid 10) on the local box, as well. However, my single remote
> > client's "ls" operation changes to a jerky, slow operation.. with
> > upwards of 5 second pauses in reads.
>
> Are you using NFS over UDP? If you ping the server from the client, do
> the round trip time and packet loss rate change when you start the
> heavy IO jobs on the server?

I just tried, and ping times do not visibly change (0.1ms before and
after).

However, this is what is really irking me. This isn't a heavy I/O job.
This is just _one_ cp. Nothing else is happening on the entire server! I
just did, in the above test:

client: ls -R /home

The client is fine, for very long periods of time...

Then, while the above command is still happening:

server: cp -a /raid/home /raid/hometest

Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it
_stops_. It then sits there for seconds, and spews out a page in small
jumps. Again, a ls /raid/home on the _server_ barely slows, and is
constant.
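To measure the stalls instead of judging them from the scrolling output, one could time a recursive walk and record the longest pause between entries. The following is a minimal sketch using Python's `os.walk`. It is not the `ls` run in the test above; `/home` is used in the comment only because it is the path from that test.

```python
import os
import tempfile
import time
from typing import Tuple


def walk_with_max_gap(root: str) -> Tuple[int, float]:
    """Walk `root`; return (entries seen, longest pause between entries, s)."""
    count = 0
    max_gap = 0.0
    last = time.monotonic()
    for _dirpath, dirnames, filenames in os.walk(root):
        # os.walk reads a whole directory before yielding it, so a slow
        # directory read shows up as a gap before that directory's entries.
        for _name in dirnames + filenames:
            now = time.monotonic()
            max_gap = max(max_gap, now - last)
            last = now
            count += 1
    return count, max_gap


if __name__ == "__main__":
    # Self-contained demo on a tiny temporary tree; on the NFS client
    # one would call walk_with_max_gap("/home") instead.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ("a", "b", os.path.join("sub", "c")):
            open(os.path.join(root, rel), "w").close()
        print(walk_with_max_gap(root)[0])  # 4
```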

I'm really scratching my head here.







2005-03-01 14:37:44

by Olaf Kirch

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it
> _stops_. It then sits there for seconds, and spews out a page in small
> jumps. Again, a ls /raid/home on the _server_ barely slows, and is
> constant.
>
> I'm really scratching my head here.

Well, it sounds like something's eating the network bandwidth,
or otherwise interfering with nfsd responsiveness. Again, are
you using UDP or TCP? If UDP, look at nfsstat output to see if
you have a high retransmit count.
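The retransmit check can be scripted. The sketch below assumes the column layout printed by `nfsstat -rc`, that is, a `calls retrans authrefrsh` header followed by a row of counters. The sample text in the usage lines is made up and is not output from the machines in this thread.

```python
def retransmit_ratio(nfsstat_rc: str) -> float:
    """retrans / calls from `nfsstat -rc` style text."""
    rows = [line.split() for line in nfsstat_rc.splitlines() if line.strip()]
    for header, values in zip(rows, rows[1:]):
        if header[:2] == ["calls", "retrans"]:
            calls, retrans = int(values[0]), int(values[1])
            return retrans / calls if calls else 0.0
    raise ValueError("no 'calls retrans' header found")


if __name__ == "__main__":
    sample = """Client rpc stats:
calls      retrans    authrefrsh
200000     40         0
"""
    print(f"{retransmit_ratio(sample):.4%}")  # 0.0200%
```

The counters are cumulative. Comparing a snapshot taken before a slowdown with one taken during it says more than a single reading.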

If it's really a problem with scheduling, it should make a difference
if you run the rsync job with lower priority, and/or renice the
nfsd threads to run with higher priority.
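Running the bulk job at lower priority might look like this from a script. This is a POSIX-only sketch. It raises the child's CPU niceness before exec, which any user may do. Raising the priority of nfsd instead, as with `renice -20`, needs root. CPU niceness is not guaranteed to influence how disk requests are ordered, so it may do little when the disk queue is the bottleneck.

```python
import os
import subprocess
import sys
from typing import Any, Sequence


def run_niced(cmd: Sequence[str], increment: int = 19,
              **run_kwargs: Any) -> subprocess.CompletedProcess:
    """Run `cmd` with its CPU niceness raised by `increment` (POSIX only).

    os.nice() runs in the child between fork and exec, so only the child
    (and whatever it spawns) is deprioritised; Linux caps niceness at 19.
    """
    return subprocess.run(list(cmd), preexec_fn=lambda: os.nice(increment),
                          check=True, **run_kwargs)


if __name__ == "__main__":
    # The real job would be something like:
    #   run_niced(["rsync", "-a", "/raid/home/", "/raid/hometest/"])
    probe = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"],
                      capture_output=True, text=True)
    print("child niceness:", probe.stdout.strip())
```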

Olaf

2005-03-01 15:04:56

by Bill Rugolsky Jr.

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, Mar 01, 2005 at 06:57:03AM -0500, Brad Barnett wrote:
> However, this is what is really irking me. This isn't a heavy I/O job.
> This is just _one_ cp. Nothing else is happening on the entire server! I
> just did, in the above test:
>
> client: ls -R /home
>
> The client is fine, for very long periods of time...
>
> Then, while the above command is still happening:
>
> server: cp -a /raid/home /raid/hometest

You say that it isn't a heavy I/O job, but a recursive copy is a very
seek-intensive one, particularly when copying a large tree to the same
device, which will interleave reads and writes. What filesystem are you
using? With an internal journal, journal writes will cause additional seeking.
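The point about seeks can be made concrete with simple arithmetic. Suppose every chunk the copy moves costs one head seek, because reads and writes to the same device alternate and journal writes add more. Throughput then falls well below the disk's streaming rate, and any NFS request has to wait behind those seeks too. The chunk size, seek time and media rate below are hypothetical round numbers, not figures from the RAID array in this thread.

```python
def effective_mb_per_s(chunk_kb: float, seek_ms: float,
                       media_mb_per_s: float) -> float:
    """Throughput when each chunk_kb transfer is preceded by one seek."""
    chunk_mb = chunk_kb / 1024.0
    transfer_s = chunk_mb / media_mb_per_s
    return chunk_mb / (seek_ms / 1000.0 + transfer_s)


if __name__ == "__main__":
    for seek_ms in (0.0, 4.0, 8.0):
        rate = effective_mb_per_s(64, seek_ms, 50.0)
        print(f"seek {seek_ms:.0f} ms per 64 KB chunk -> {rate:.1f} MB/s")
```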

> Within 10 seconds, the output of ls -R /home slows. Within 20 seconds, it
> _stops_. It then sits there for seconds, and spews out a page in small
> jumps. Again, a ls /raid/home on the _server_ barely slows, and is
> constant.

Do you mean ls -R /raid/home here? Is it definitely the case that
ls -R /raid/home on the server is quick, but on the client it is slow?

How about ls -lR /raid/home on the server? It could be that knfsd is
returning file attribute information, hence reading the whole inode
for each file, and not just for the directories. getdents64() returns
d_type=DT_DIR for directory entries, which allows ls -R to optimize the
traversal so as to only call fstat64() on directories, not regular files.
So on the server, ls -R would only fstat64() the entries, while
on the client ls -R can cause knfsd to do the equivalent of ls -lR.
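The same d_type shortcut is visible from user space in Python's `os.scandir`. `DirEntry.is_dir(follow_symlinks=False)` is answered from the type returned by the directory read when the filesystem provides one. When the filesystem reports DT_UNKNOWN, it falls back to a stat call. The sketch below is an analogy for what `ls -R` does, not the code of `ls` itself.

```python
import os
import tempfile
from typing import Iterator


def list_recursive(root: str) -> Iterator[str]:
    """Yield every path under `root`, descending only into directories.

    Regular files are never stat()ed when d_type is known; on a filesystem
    that returns DT_UNKNOWN, is_dir() has to stat each entry instead.
    """
    with os.scandir(root) as entries:
        for entry in entries:
            yield entry.path
            if entry.is_dir(follow_symlinks=False):
                yield from list_recursive(entry.path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "dir", "nested"))
        open(os.path.join(root, "dir", "file"), "w").close()
        for path in sorted(list_recursive(root)):
            print(os.path.relpath(path, root))
```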

Also, which I/O scheduler are you using?

Regards,

Bill Rugolsky



2005-03-01 16:08:30

by Bill Rugolsky Jr.

Subject: Re: knfsd brought to its knees, by a simple rsync or cp operation

On Tue, Mar 01, 2005 at 10:04:46AM -0500, Bill Rugolsky Jr. wrote:
> How about ls -lR /raid/home on the server? It could be that knfsd is
> returning file attribute information, hence reading the whole inode
> for each file, and not just for the directories. getdents64() returns
> d_type=DT_DIR for directory entries, which allows ls -R to optimize the
> traversal so as to only call fstat64() on directories, not regular files.
> So on the server, ls -R would only fstat64() the entries, while
> on the client ls -R can cause knfsd to do the equivalent of ls -lR.

Sorry for replying to myself; I've had a look at the 2.6.10 code, and I think
I may understand what is going on.

In the *ideal* case, the client NFS and server NFS implementations support
READDIRPLUS. Additionally, the filesystem stores the file type in the
directory entry; such is the case with EXT3 with the "filetype" feature.

If all of the above is true, when a directory is read on the client,
the nfs client issues a READDIRPLUS call to the server. The server, in
turn, issues a readdir to the VFS, which will call down into the underlying
filesystem. If the underlying filesystem stores type information in the
directory, then it will populate the type field, and this can then
be returned to the caller without reading the on-disk inode.

If the filesystem does not store filetype information in the directory,
then filldir() on the server will return with DT_UNKNOWN, and this will
get passed back to the getdents64() caller (ls), which then has to
[f]stat() the file, which will translate into a GETATTR call, which
will require reading the on-disk inode. If the client or server doesn't
implement READDIRPLUS, then the nfs client will be unable to receive the
type information, and will either have to return DT_UNKNOWN to satisfy
the getdents64 call, or issue a GETATTR on each directory entry itself.

Note that READDIRPLUS is missing from many/most deployed 2.4 kernels, though
patches are at client.linux-nfs.org.
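You can check whether an ext3 filesystem stores file types in its directory entries by looking at the `Filesystem features:` line that `tune2fs -l` (or `dumpe2fs -h`) prints for the device. The sketch below parses that line. It then follows the description above to tally how many per-entry attribute lookups a recursive listing triggers. The tally counts calls only and ignores client-side attribute caching, and the sample text is made up.

```python
def has_filetype(tune2fs_output: str) -> bool:
    """True if the 'Filesystem features:' line lists 'filetype'."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            return "filetype" in line.split(":", 1)[1].split()
    return False


def attr_lookups(n_dirs: int, n_files: int,
                 readdirplus: bool, filetype: bool) -> int:
    """Per-entry attribute lookups for a recursive listing.

    Following the analysis above, entry types reach `ls` for free only when
    both READDIRPLUS and the filetype feature are available; then only the
    directories need a stat(). Otherwise every entry is looked up.
    """
    if readdirplus and filetype:
        return n_dirs
    return n_dirs + n_files


if __name__ == "__main__":
    sample = "Filesystem features:      has_journal dir_index filetype sparse_super\n"
    ft = has_filetype(sample)
    for rdp in (True, False):
        print(f"READDIRPLUS={rdp}: {attr_lookups(100, 5000, rdp, ft)} lookups")
```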

Bill Rugolsky

