2003-01-22 10:24:21

by Oliver Tennert

Subject: NFS client problem and IO blocksize


Hi,

I seem to have a problem with a 2.4.20 kernel + Trond's NFS client patches
+ some of Neil Brown's server patches.

The problem seems to be restricted to NFS client functionality, though, as
it also occurs when the NFS server is a totally different platform.

The problem is that the rsize/wsize options seem to be ignored:

hal9000:/home/tennert # mount -o nfsvers=3,udp,rsize=1024,wsize=1024 ilka2000:/scr /mnt
hal9000:/home/tennert # stat /mnt/Snatch.avi
File: `/mnt/Snatch.avi'
Size: 724893696 Blocks: 1415816 IO Block: 4096 Regular File

hal9000:/home/tennert # umount /mnt
hal9000:/home/tennert # mount -o nfsvers=3,udp,rsize=8192,wsize=8192 ilka2000:/scr /mnt
hal9000:/home/tennert # stat /mnt/Snatch.avi
File: `/mnt/Snatch.avi'
Size: 724893696 Blocks: 1415816 IO Block: 4096 Regular File

hal9000:/home/tennert # umount /mnt
hal9000:/home/tennert # mount -o nfsvers=3,udp,rsize=32768,wsize=32768 ilka2000:/scr /mnt
hal9000:/home/tennert # stat /mnt/Snatch.avi
File: `/mnt/Snatch.avi'
Size: 724893696 Blocks: 1415816 IO Block: 4096 Regular File

If TCP is used instead of UDP as the transport protocol, the behaviour is
still the same, i.e. the IO blocksize of the file does not change at all!

In this case the underlying physical file system is XFS.

Note that I don't think the NFS server is to blame, because (with different
client machines and the same kernel) I get the same behaviour when the NFS
server platform is, for example, AIX.

It also occurred (on an older installation) that if the rsize/wsize is
set very small, e.g. 512 or 1024 bytes, a listing of the directory fails
to show any files at all, although the mount itself works fine.

Could you help me further?

Best regards

Oliver

Dr. Oliver Tennert

+49 -7071 -9457-598

e-mail: [email protected]
science + computing AG
Hagellocher Weg 71
D-72070 Tuebingen




2003-01-22 12:57:49

by Trond Myklebust

Subject: NFS client problem and IO blocksize

>>>>> " " == Oliver Tennert <[email protected]> writes:

> The problem is that the rsize/wsize options seem to be ignored:

> hal9000:/home/tennert # mount -o
> nfsvers=3,udp,rsize=1024,wsize=1024 ilka2000:/scr /mnt
> hal9000:/home/tennert # stat /mnt/Snatch.avi
> File: `/mnt/Snatch.avi' Size: 724893696 Blocks: 1415816 IO
> Block: 4096 Regular File

rsize/wsize have in principle nothing to do with the blocksize that
'stat' returns. The f_bsize value specifies the 'optimal transfer
block size'. Previously this has been set to the rsize/wsize, but when
you add in O_DIRECT, then 32k becomes too large a value to align
to. For this reason, the f_bsize value was changed to reflect the
actual block size used by the *server*.
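
For what it's worth, you can read that hint directly with statfs(2).
A minimal sketch, with the mount point simply taken from your example:

#include <stdio.h>
#include <sys/vfs.h>

int main(void)
{
	struct statfs sf;

	if (statfs("/mnt", &sf) != 0) {
		perror("statfs");
		return 1;
	}
	/* f_bsize is the "optimal transfer block size" hint; on NFS it now
	 * reflects the server's block size, not the mount's rsize/wsize. */
	printf("f_bsize: %ld\n", (long) sf.f_bsize);
	return 0;
}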

Cheers,
Trond

2003-01-22 13:26:01

by Oliver Tennert

Subject: Re: NFS client problem and IO blocksize

Hi Trond,

> rsize/wsize have in principle nothing to do with the blocksize that
> 'stat' returns. The f_bsize value specifies the 'optimal transfer
> block size'. Previously this has been set to the rsize/wsize, but when
> you add in O_DIRECT, then 32k becomes too large a value to align
> to. For this reason, the f_bsize value was changed to reflect the
> actual block size used by the *server*.
>

OK, that explains the data from my own computer. But there is another case,
when a Linux NFS client mounts an AIX NFS file system:

(preferred_blksize.perl prints the st_blksize field of stat() and is used
because there is no stat command on this machine.)
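
In essence the script is just the equivalent of this C sketch, i.e. it
prints st_blksize for the path given on the command line (the exact
output format may differ):

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char *argv[])
{
	struct stat st;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	if (stat(argv[1], &st) != 0) {
		perror("stat");
		return 1;
	}
	/* st_blksize is the preferred I/O block size reported for the file */
	printf("file %s has the preferred blksize %ld.\n",
	       argv[1], (long) st.st_blksize);
	return 0;
}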

root@w0008077 / # mount -o hard,intr,nfsvers=3,udp,rsize=8192,wsize=8192 ibm03:/net/ibm03/fs7/share/netinst_aix /mnt
root@w0008077 / # ~xrsc062/preferred_blksize.perl /mnt/nmon7.tar
file /mnt/nmon7.tar has the preferred blksize 512.
root@w0008077 / # umount /mnt

root@ibm03 / # ~xrsc062/preferred_blksize.perl /net/ibm03/fs7/share/netinst_aix/nmon7.tar
file /net/ibm03/fs7/share/netinst_aix/nmon7.tar has the preferred blksize 4096.

As you can see, the actual server-side st_blksize is 4k, whereas the Linux
client takes it to be 512 bytes. An strace output confirms that a "cat" of
a file actually uses 512-byte IO chunks.

In each case the kernel used is a 2.4.20 with your NFS client patches (and
some of Neil's server patches).

If a RedHat or SuSE kernel is used (which probably does not include
O_DIRECT, but I am not sure), then the behaviour seems correct, and
st_blksize is what the rsize/wsize options demand, e.g. 8k or 32k.

Many thanks in advance for your help and best regards.

Oliver


Dr. Oliver Tennert

+49 -7071 -9457-598

e-mail: [email protected]
science + computing AG
Hagellocher Weg 71
D-72070 Tuebingen



2003-01-22 13:32:44

by Trond Myklebust

Subject: Re: NFS client problem and IO blocksize

>>>>> " " == Oliver Tennert <[email protected]> writes:

> As you can see, the actual server-side s_blksize is 4k, whereas
> the Linux client takes it to be 512 bytes. An strace output
> confirms that a "cat" of a file actually uses 512 byte IO
> chunks.

I'm taking the value from the NFSv3 'wtmult' attribute, which is
described thus in RFC1813:

wtmult
The suggested multiple for the size of a WRITE
request.

If the AIX NFS server is returning 512 bytes, then that's what
'statfs' returns.

Cheers,
Trond

2003-01-22 14:09:46

by Oliver Tennert

Subject: Re: NFS client problem and IO blocksize


OK, thanks. But I am sorry to push you once more, Trond: can you now give
me just a brief explanation of the difference between the "wsize" option and
the "wtmult" attribute? Is it better now to disable O_DIRECT and use a
larger wsize/rsize, or to enable it and be content with the parameters it
uses?

(Sorry, I have only a vague idea of the answer myself, but I am not much of
an expert in all this I/O stuff and its different layers.)

Many thanks and best regards.

Oliver


Dr. Oliver Tennert

+49 -7071 -9457-598

e-mail: [email protected]
science + computing AG
Hagellocher Weg 71
D-72070 Tuebingen




2003-01-22 15:23:11

by Trond Myklebust

Subject: Re: NFS client problem and IO blocksize

>>>>> " " == Oliver Tennert <[email protected]> writes:

> OK thanks. But I am sorry to push you once more, Trond: can you
> now give me just a brief explanation of difference between the
> "wsize" option and the "wtmult" attribute? Is it better now to
> disable O_DIRECT and use a larger wsize/rsize, or to enable it
> and be content with the parameters it uses?

wsize gives you the maximum number of bytes NFS is allowed to send in
a single NFSPROC_WRITE RPC call. (rsize gives the same number for
NFSPROC_READ calls). The NFS client will usually wait until it has
'wsize' bytes or more in the page cache before it tries to send
anything over to the server.

OTOH wtmult has nothing to do with RPC, and has more to do with the
disk organization on the server.
As I understand it, in many cases the significance of this value lies
in the fact that hardware operations to the disk have a lower limit on
the number of bytes that can be read/written. IOW if your write is not
aligned to a 'wtmult' boundary, then the server may be forced to read
in the remaining data from the disk before it writes the entire block
back.
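
Purely as an illustration (the numbers below are made up), keeping writes
block-aligned means rounding the length up to a multiple of the value the
server advertises:

#include <stdio.h>

/* smallest multiple of blk that is >= len */
static long round_up(long len, long blk)
{
	return ((len + blk - 1) / blk) * blk;
}

int main(void)
{
	long wtmult = 4096;	/* example value a server might advertise */
	long want = 10000;	/* bytes an application wants to write */

	/* 10000 rounds up to 12288 (3 * 4096); writing block-aligned chunks
	 * spares the server a read-modify-write of the partial block. */
	printf("%ld -> %ld\n", want, round_up(want, wtmult));
	return 0;
}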

Cheers,
Trond