2005-05-19 02:50:21

by steve

Subject: why nfs server delay 10ms in nfsd_write()?

Hi all,
When I built an NFS server on Linux 2.6.9, I found that NFS write performance was very low. With
tcpdump, I found that each write takes a very long time to process; see the following trace:

14:59:22.844977 IP 192.168.0.1.3376789825 > linux.site.nfs: 660 write [|nfs]
14:59:22.855134 IP linux.site.nfs > 192.168.0.1.3376789825: reply ok 136 write POST: [|nfs]
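
For reference, the gap between the two packets above can be worked out directly. This is only a sketch: the helper names are made up for illustration, and the result is the request-to-reply time as seen at the capture point, not a measurement taken inside the server.

```c
#include <assert.h>

/* Convert an HH:MM:SS.uuuuuu timestamp to microseconds since midnight.
 * (Hypothetical helper, not part of tcpdump or the kernel.) */
long long timestamp_us(int hour, int min, int sec, int usec)
{
    assert(hour >= 0 && min >= 0 && sec >= 0 && usec >= 0);
    return (((long long)hour * 60 + min) * 60 + sec) * 1000000LL + usec;
}

/* Request-to-reply time for the WRITE shown in the trace, in microseconds. */
long long write_latency_us(void)
{
    long long request = timestamp_us(14, 59, 22, 844977);
    long long reply   = timestamp_us(14, 59, 22, 855134);
    return reply - request;   /* 10157 us, i.e. about 10.16 ms */
}
```

A client that waits for each reply before sending the next WRITE could therefore complete at most about 98 writes per second.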

The write operation took about 10 ms, so I looked at the source code and found the following
in nfsd_write():

{
    ..
    if (atomic_read(&inode->i_writecount) > 1
        || (last_ino == inode->i_ino && last_dev == inode->i_sb->s_dev)) {
        dprintk("nfsd: write defer %d\n", current->pid);
        set_current_state(TASK_UNINTERRUPTIBLE);
        schedule_timeout((HZ+99)/100);
        dprintk("nfsd: write resume %d\n", current->pid);
    }
    ..
}

So it sleeps for 10 ms whenever the condition matches.
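
As a side note, the expression (HZ+99)/100 is HZ/100 rounded up to a whole number of jiffies, so the nominal delay is only exactly 10 ms when HZ is a multiple of 100. A small userspace sketch of the arithmetic follows; the function names are invented for illustration, and real sleeps can run somewhat longer than the nominal value.

```c
/* Number of jiffies passed to schedule_timeout(): HZ/100, rounded up. */
long gather_delay_jiffies(long hz)
{
    return (hz + 99) / 100;
}

/* The same nominal delay expressed in microseconds for a given HZ. */
long gather_delay_us(long hz)
{
    return gather_delay_jiffies(hz) * 1000000L / hz;
}
```

With HZ=1000 this gives 10 jiffies (10 ms), with HZ=100 it gives 1 jiffy (10 ms), and with HZ=250 it gives 3 jiffies (12 ms).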

I have two questions:
1. Why do we have to sleep for 10 ms? Why not sync immediately?
2. What will happen if we don't sleep for 10 ms?
When I deleted this code, I got a good result: write performance improved from 300KB/s to 18MB/s.

Regards!
Steve
2005-05-19



2005-05-19 03:09:16

by Trond Myklebust

Subject: Re: why nfs server delay 10ms in nfsd_write()?

On Thu, 19.05.2005 at 10:46 (+0800), steve wrote:

> I have two questions:
> 1. Why do we have to sleep for 10 ms? Why not sync immediately?
> 2. What will happen if we don't sleep for 10 ms?
> When I deleted this code, I got a good result: write performance improved from 300KB/s to 18MB/s.

See
http://www.usenix.org/publications/library/proceedings/sf94/full_papers/juszczak.a

Write gathering is basically a method for improving NFSv2 writes without
any protocol changes. Later, NFSv3 introduced the more efficient concept
of "unstable" writes into the protocol (see
http://www.netapp.com/ftp/NFSv3_Rev_3.pdf).

You can turn NFSv2 write gathering on and off using the wdelay/no_wdelay
export options ('man 5 exports')
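
For illustration, an /etc/exports entry with write gathering turned off might look like the fragment below. The export path is a placeholder and the client address is simply the one from the tcpdump trace earlier in the thread; only the option names come from exports(5).

```text
# /etc/exports (sketch; path and client address are placeholders)
# no_wdelay disables the write-gathering delay for this export.
# exports(5) notes that it has no effect when async is also set.
/mnt/ramdisk  192.168.0.1(rw,sync,no_wdelay)
```

After editing the file, `exportfs -ra` re-exports the entries so the change takes effect.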

Trond

2005-05-19 03:13:55

by Lee Revell

Subject: Re: why nfs server delay 10ms in nfsd_write()?

On Thu, 2005-05-19 at 10:46 +0800, steve wrote:
> I have two questions:
> 1. Why do we have to sleep for 10 ms? Why not sync immediately?
> 2. What will happen if we don't sleep for 10 ms?
> When I deleted this code, I got a good result: write performance improved from 300KB/s to 18MB/s.
>

Did you read the comments in the code?

    /*
     * Gathered writes: If another process is currently
     * writing to the file, there's a high chance
     * this is another nfsd (triggered by a bulk write
     * from a client's biod). Rather than syncing the
     * file with each write request, we sleep for 10 msec.
     *
     * I don't know if this roughly approximates
     * C. Juszak's idea of gathered writes, but it's a
     * nice and simple solution (IMHO), and it seems to
     * work:-)
     */
    if (EX_WGATHER(exp)) {
        if (atomic_read(&inode->i_writecount) > 1
            || (last_ino == inode->i_ino && last_dev == inode->i_sb->s_dev)) {
            dprintk("nfsd: write defer %d\n", current->pid);
            msleep(10);
            dprintk("nfsd: write resume %d\n", current->pid);
        }

        if (inode->i_state & I_DIRTY) {
            dprintk("nfsd: write sync %d\n", current->pid);
            nfsd_sync(file);
        }
#if 0
        wake_up(&inode->i_wait);
#endif
    }
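
To make the deferral test easier to reason about outside the kernel, here is a userspace sketch of the same condition. The struct and function names are hypothetical stand-ins, and it assumes that last_ino and last_dev identify the file touched by the previous write.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the inode fields the kernel test reads. */
struct file_id {
    unsigned long ino;   /* inode number (inode->i_ino) */
    unsigned long dev;   /* device number (inode->i_sb->s_dev) */
    int writecount;      /* current writers (inode->i_writecount) */
};

/* True when the quoted condition would make nfsd sleep before syncing. */
bool should_defer_write(const struct file_id *f,
                        unsigned long last_ino, unsigned long last_dev)
{
    return f->writecount > 1 ||
           (last_ino == f->ino && last_dev == f->dev);
}
```

Under that assumption, a single client writing sequentially to one file matches the second half of the condition on every write after the first, so each of those writes pays the 10 ms sleep even when no other writer is active.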


Lee

2005-05-19 12:55:35

by Peter Staubach

Subject: Re: why nfs server delay 10ms in nfsd_write()?

Lee Revell wrote:

>On Thu, 2005-05-19 at 10:46 +0800, steve wrote:
>
>
>>I have two questions:
>>1. Why do we have to sleep for 10 ms? Why not sync immediately?
>>2. What will happen if we don't sleep for 10 ms?
>>When I deleted this code, I got a good result: write performance improved from 300KB/s to 18MB/s.
>>
>>
>>
>
>Did you read the comments in the code?
>
> /*
> * Gathered writes: If another process is currently
> * writing to the file, there's a high chance
> * this is another nfsd (triggered by a bulk write
> * from a client's biod). Rather than syncing the
> * file with each write request, we sleep for 10 msec.
> *
> * I don't know if this roughly approximates
> * C. Juszak's idea of gathered writes, but it's a
> * nice and simple solution (IMHO), and it seems to
> * work:-)
> */
>
>

There are certainly many other ways to get gathering without adding an
artificial delay. There are already delay slots built into the code
which could be used to trigger the gathering, so with a slightly
different architecture, the performance increases could be achieved.

Some implementations actually do write gathering with NFSv3, even. Is
this interesting enough to play with? I suspect that just doing the
work for NFSv2 is not...

Thanx...

ps

2005-05-19 13:26:52

by Trond Myklebust

Subject: Re: why nfs server delay 10ms in nfsd_write()?

On Thu, 19.05.2005 at 08:53 (-0400), Peter Staubach wrote:
> There are certainly many other ways to get gathering without adding an
> artificial delay. There are already delay slots built into the code
> which could be used to trigger the gathering, so with a slightly
> different architecture, the performance increases could be achieved.
>
> Some implementations actually do write gathering with NFSv3, even. Is
> this interesting enough to play with? I suspect that just doing the
> work for NFSv2 is not...

Write gathering does still apply to stable NFSv3/v4 writes, so an
optimisation may yet benefit applications that use O_DIRECT writes, or
that require the use of the "noac" or "sync" mount options.

I'm not aware of any ongoing projects to work on this, though, so it
would probably be up to those parties that see it as beneficial to step
up to the plate.

Cheers,
Trond

2005-05-19 13:44:07

by Peter Staubach

Subject: Re: why nfs server delay 10ms in nfsd_write()?

Trond Myklebust wrote:

>On Thu, 19.05.2005 at 08:53 (-0400), Peter Staubach wrote:
>
>
>>There are certainly many other ways to get gathering without adding an
>>artificial delay. There are already delay slots built into the code
>>which could be used to trigger the gathering, so with a slightly
>>different architecture, the performance increases could be achieved.
>>
>>Some implementations actually do write gathering with NFSv3, even. Is
>>this interesting enough to play with? I suspect that just doing the
>>work for NFSv2 is not...
>>
>>
>
>Write gathering does still apply to stable NFSv3/v4 writes, so an
>optimisation may yet benefit applications that use O_DIRECT writes, or
>that require the use of the "noac" or "sync" mount options.
>
>I'm not aware of any ongoing projects to work on this, though, so it
>would probably be up to those parties that see it as beneficial to step
>up to the plate.
>

Cool. If anyone is interested, I would be interested in participating
in a design discussion and perhaps even prototyping.

Thanx...

ps

2005-05-19 13:59:11

by Lee Revell

Subject: Re: why nfs server delay 10ms in nfsd_write()?

On Thu, 2005-05-19 at 08:53 -0400, Peter Staubach wrote:
> There are certainly many other ways to get gathering without adding an
> artificial delay. There are already delay slots built into the code
> which could be used to trigger the gathering, so with a slightly
> different architecture, the performance increases could be achieved.
>
> Some implementations actually do write gathering with NFSv3, even. Is
> this interesting enough to play with? I suspect that just doing the
> work for NFSv2 is not...

Also, how do you explain the big performance hit that steve observed?
Write gathering is supposed to help performance, but it's a big loss on
his test...

Lee

2005-05-19 14:13:14

by Peter Staubach

Subject: Re: why nfs server delay 10ms in nfsd_write()?

Lee Revell wrote:

>On Thu, 2005-05-19 at 08:53 -0400, Peter Staubach wrote:
>
>
>>There are certainly many other ways to get gathering without adding an
>>artificial delay. There are already delay slots built into the code
>>which could be used to trigger the gathering, so with a slightly
>>different architecture, the performance increases could be achieved.
>>
>>Some implementations actually do write gathering with NFSv3, even. Is
>>this interesting enough to play with? I suspect that just doing the
>>work for NFSv2 is not...
>>
>>
>
>Also, how do you explain the big performance hit that steve observed?
>Write gathering is supposed to help performance, but it's a big loss on
>his test...
>

Well, a little bit more information about what he is doing would be
helpful. I'd like to better understand the environment and what the
traffic from the client looks like.

Write gathering is not a panacea for all of the ills caused by the NFSv2
WRITE stable storage requirements. In fact, if done wrong, it can cause
performance regressions, such as those being noticed by Steve.

--

I implemented the write gathering used in Solaris and experimented with
several (many?) different approaches. Adding a delay in order to allow
subsequent WRITE requests to arrive seems like a good thing, but can
end up just adding latency to the entire process if not done right.

A suggestion might be to use the delay caused by the nfsd_sync() call
and synchronize other WRITE requests around that. The delay caused by
doing real i/o to the storage subsystem should allow write gathering to
take place, assuming that the client is generating concurrent WRITE
requests.
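
The suggestion above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Solaris implementation or proposed kernel code: the function name is hypothetical, arrival times are non-negative and sorted, and every sync takes the same fixed time. A WRITE that arrives while a sync is in progress waits, and all waiting WRITEs are covered by the next sync.

```c
#include <stddef.h>

/* Count the syncs needed when WRITEs are gathered around the sync itself.
 * arrival_us: non-negative, ascending arrival times in microseconds.
 * sync_us:    assumed fixed duration of one sync to stable storage. */
size_t count_syncs(const long *arrival_us, size_t n, long sync_us)
{
    size_t syncs = 0;
    size_t i = 0;
    long disk_free_at = 0;   /* time at which the previous sync finishes */

    while (i < n) {
        /* The next sync starts once a WRITE is pending and the disk is idle. */
        long start = arrival_us[i] > disk_free_at ? arrival_us[i] : disk_free_at;

        /* Every WRITE that has arrived by then is covered by this sync. */
        while (i < n && arrival_us[i] <= start)
            i++;

        disk_free_at = start + sync_us;
        syncs++;
    }
    return syncs;
}
```

With arrivals at 0, 100, 200 and 300 us and a 1000 us sync, the model needs 2 syncs instead of 4. With arrivals at 0, 1000 and 2000 us (a client that waits for each reply) it needs 3 syncs, and no WRITE is held back by an artificial delay.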

ps

2005-05-19 14:13:40

by Avi Kivity

Subject: Re: why nfs server delay 10ms in nfsd_write()?

On Thu, 2005-05-19 at 09:59 -0400, Lee Revell wrote:
> On Thu, 2005-05-19 at 08:53 -0400, Peter Staubach wrote:
> > There are certainly many other ways to get gathering without adding an
> > artificial delay. There are already delay slots built into the code
> > which could be used to trigger the gathering, so with a slightly
> > different architecture, the performance increases could be achieved.
> >
> > Some implementations actually do write gathering with NFSv3, even. Is
> > this interesting enough to play with? I suspect that just doing the
> > work for NFSv2 is not...
>
> Also, how do you explain the big performance hit that steve observed?
> Write gathering is supposed to help performance, but it's a big loss on
> his test...
>

That would depend on the NFS client and the application. For example, if
the NFS client has no (or low) concurrency, then write gathering would
reduce performance.
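
The effect of concurrency can be estimated with Little's law (throughput equals requests in flight divided by latency per request). The sketch below is a rough model only: the function name is made up, and it assumes every request sees the same latency.

```c
/* Little's law: requests per second = requests in flight / latency.
 * outstanding: average number of WRITEs the client keeps in flight.
 * latency_us:  assumed per-request latency in microseconds. */
double estimated_iops(double outstanding, double latency_us)
{
    return outstanding * 1e6 / latency_us;
}
```

With a 10 ms (10000 us) delay on every write, a client with one request in flight tops out at about 100 writes per second, while a client that really keeps 32 in flight could reach about 3200.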



2005-05-20 10:46:19

by steve

Subject: Re: Re: why nfs server delay 10ms in nfsd_write()?

Hi Peter,
My environment looks like this:

NFS Server: Suse9 Enterprise
NFS Client: Redhat AS3.0 (kernel 2.4.21)

I made a ramdisk and exported it over NFS.
Server and client are connected at 1000Mbps.

I mounted the ramdisk on the client with the parameters: -t nfs -o rw,noac

I then tested with iometer, using these parameters:
Outstanding IO is 32, transfer request size is 512, sequential write.
The result is about 300KB/s; IOPS is about 600.
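
As a quick consistency check on these figures, the sketch below multiplies them out. It assumes the request size is in bytes, and it borrows the roughly 10.16 ms request-to-reply time from the tcpdump trace in the first message, which may not match this exact test; the function names are illustrative only.

```c
/* Throughput implied by an operation rate and a request size in bytes. */
long throughput_bytes_per_sec(long iops, long request_bytes)
{
    return iops * request_bytes;
}

/* Average number of requests in flight implied by a rate and a latency
 * (Little's law rearranged). */
double requests_in_flight(double iops, double latency_us)
{
    return iops * latency_us / 1e6;
}
```

600 writes per second at 512 bytes each is 307200 bytes per second, which matches the reported 300KB/s. At about 10157 us per write, 600 IOPS corresponds to roughly 6 requests in flight on average, well below the configured 32.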

With a dd test I found that most of the delay is on the server.

I agree with Avi that "if the NFS client has no (or low) concurrency, then write gathering would
reduce performance".


Regards!
Steve
2005-05-20






2005-05-20 13:08:19

by Peter Staubach

Subject: Re: why nfs server delay 10ms in nfsd_write()?

steve wrote:

>Hi Peter,
>My environment looks like this:
>
>NFS Server: Suse9 Enterprise
>NFS Client: Redhat AS3.0 (kernel 2.4.21)
>
>I made a ramdisk and exported it over NFS.
>Server and client are connected at 1000Mbps.
>
>I mounted the ramdisk on the client with the parameters: -t nfs -o rw,noac
>
>I then tested with iometer, using these parameters:
>Outstanding IO is 32, transfer request size is 512, sequential write.
>The result is about 300KB/s; IOPS is about 600.
>
>With a dd test I found that most of the delay is on the server.
>
>I agree with Avi that "if the NFS client has no (or low) concurrency, then write gathering would reduce performance".
>

I would agree with Avi too, especially in this configuration...

I think that we could still construct a write gathering implementation
which did not show a measurable performance impact in this sort of
situation, though.

Thanx...

ps