2003-03-27 21:23:33

by Mark Price

Subject: wsize & PAGE_SIZE issues on IA64 clients.


Hi Folks,

I've been chasing a severe write performance problem from a SuSE SLES-8
(2.4.19) system on an IA64 box to an AIX NFS server. Writes to the same
AIX server from an IA32 box also running SLES-8 performed as expected.

These are v3 mounts with a 4K rsize/wsize, e.g.

mount -t nfs -o nfsvers=3,udp,rsize=4096,wsize=4096,hard rs75:/linux_test /linux_test

I tracked the problem down to the default PAGE_SIZE on ia64, which is 16K,
versus the wsize which was 4K.

In nfs_updatepage() if the wsize is smaller than the page size the write
is performed synchronously. It appears though that the block size is then
reduced further, in this case to 512 bytes. From what I could work out,
it was cp_new_stat() in linux/fs/stat.c that determined the new preferred
block size from the remote filesystem.

On ia32 both the page size and the wsize were 4K, and no problem was seen.

Can someone give me a rough explanation of why that logic is used? i.e. why
is the page written synchronously if the wsize is smaller than the page size,
and why, even when it's written synchronously, was the remote filesystem's
block size used rather than the wsize?

The fix/workaround was to either increase wsize to 16K or reduce
PAGE_SIZE to 4K; obviously, increasing wsize to 16K makes more sense.

However this leads to a problem between Linux clients and servers where
the maximum supported block size on the server is 8K and the page size on
the IA64 client is 16K.

Is increasing the maximum blocksize for the server as simple as changing
NFSSVC_MAXBLKSIZE (linux/include/linux/nfsd/const.h) to 16K or 32K? Or is
more porting work required?

Cheers, Mark.

--
Mark Price
IBM Linux Change Team
(503)-578-7524






-------------------------------------------------------
This SF.net email is sponsored by:
The Definitive IT and Networking Event. Be There!
NetWorld+Interop Las Vegas 2003 -- Register today!
http://ads.sourceforge.net/cgi-bin/redirect.pl?keyn0001en
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-03-27 23:57:56

by Trond Myklebust

Subject: Re: wsize & PAGE_SIZE issues on IA64 clients.

>>>>> " " == Mark Price <[email protected]> writes:

> Can someone give me rough explanation of why that logic is
> used? ie. Why the page is written synchronously if its smaller
> than the page size? and Why even when its written synchronously
> the wsize wasn't used, but the remote filesystems block size
> was used?

1) Because in order to write asynchronously you would have to have
more than 1 outstanding request per page. The locking models all
assume only 1 request / page.
2) 'cos the servers get upset if we deliberately allow you to do
things that violate the protocol.

Cheers,
Trond



2003-03-28 00:01:41

by Lever, Charles

Subject: RE: wsize & PAGE_SIZE issues on IA64 clients.

hi mark-

> In nfs_updatepage() if the wsize is smaller than the page size the write
> is performed synchronously. It appears though that the block size is then
> reduced further, in this case to 512 bytes. From what I could work out,
> it was cp_new_stat() in linux/fs/stat.c that determined the new preferred
> block size from the remote filesystem.
>
> On ia32 both the page size and the wsize were 4K, and no problem was seen.
>
> Can someone give me rough explanation of why that logic is used? ie. Why
> the page is written synchronously if its smaller than the page size? and
> Why even when its written synchronously the wsize wasn't used, but the
> remote filesystems block size was used?

i can answer the first question.

the problem is that databases need synchronous writes to
always go to disk from the lowest to highest byte; otherwise,
a system restart during a write could result in a torn
database page that can not be detected by some databases.

the Linux NFS client is handed write requests a page at a
time by the VFS layer. if the NFS client queued pages for
sync writes the way it does for async writes, there is a window
where asynchronous events on the client (like an interrupt,
or the VM decides it needs to reclaim memory, or the fs
syncer runs) can push out incomplete writes to the server.
there is a good chance that even during a small test dd run
with wsize=32k, some 32k writes will be broken into smaller
writes on the network.

so the low-risk solution is to make the client always
do sync writes in page-sized pieces; that way, the byte
order at the server is always guaranteed.
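
As a rough illustration of the ordering idea above (not the kernel code), the sketch below splits one application write into pieces that never cross a page boundary and yields them strictly from the lowest to the highest offset. The function name `page_sized_pieces` and its parameters are hypothetical, chosen only for this example.

```python
def page_sized_pieces(offset, length, page_size):
    """Yield (offset, length) pieces of a write, lowest offset first.

    No piece crosses a page boundary, so sending the pieces one at a
    time, in the order produced, keeps bytes arriving in ascending order.
    """
    end = offset + length
    while offset < end:
        # First byte of the next page after the current offset.
        page_end = (offset // page_size + 1) * page_size
        piece = min(end, page_end) - offset
        yield offset, piece
        offset += piece


if __name__ == "__main__":
    # A 32K write on a client with 16K pages becomes two page-sized pieces.
    for piece in page_sized_pieces(0, 32 * 1024, 16 * 1024):
        print(piece)
```

An unaligned write simply produces a short first and last piece; the ascending order is unchanged.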

i've never seen a problem where the writes are further
reduced in size; i imagine this is an application issue,
not an NFS client issue.

> The fix/workaround was to either increase wsize to 16K, or reduce
> PAGE_SIZE to 4K, obviously increasing wsize to 16K makes more sense.

correct.

> However this leads to a problem between Linux clients and servers where
> the maximum supported block size on the server is 8K and the page size on
> the IA64 client is 16K.

correct again.

> Is increasing the maximum blocksize for the server as simple as changing
> NFSSVC_MAXBLKSIZE (linux/include/linux/nfsd/const.h) to 16K or 32K ? Or is
> more porting work required?

neil can answer this with authority, but my impression is
there is more to it than simply bumping NFSSVC_MAXBLKSIZE.



2003-03-31 23:26:41

by Lever, Charles

Subject: RE: wsize & PAGE_SIZE issues on IA64 clients.

> Is the "wsize < PAGE_SIZE" problem worth a printf in mount or mountd or
> the kernel?

perhaps an entry in the NFS FAQ might be more timely, considering
how long it would take this kind of change to make it into the
common commercial distributions.



2003-03-31 23:47:37

by Mark Price

Subject: RE: wsize & PAGE_SIZE issues on IA64 clients.


I'm still looking into how I can get reasonable performance from ia64 to
ia32 Linux, given the maximum server block size of 8K. I found some
references to a 64K block size limit with V3 (I assume this means what the
protocol definition will support, not what Linux will support), and also
to a 32K block size for TCP mounts.

http://www.linux.org/docs/ldp/howto/NFS-HOWTO/performance.html

I haven't played much with TCP mounts. How stable are they, and how well
do they perform under Linux?

I'm currently getting my ia64 box rebuilt in order to try an
NFSSVC_MAXBLKSIZE of 64K. Has anyone ever tried it?

Cheers, Mark.

On Mon, 31 Mar 2003, Lever, Charles wrote:

> > Is the "wsize < PAGE_SIZE" problem worth a printf in mount or
> > mountd or
> > the kernel?
>
> perhaps an entry in the NFS FAQ might be more timely, considering
> how long it would take this kind of change to make it into the
> common commercial distributions.




2003-03-31 19:10:28

by Greg Lindahl

Subject: Re: wsize & PAGE_SIZE issues on IA64 clients.

> The fix/workaround was to either increase wsize to 16K, or reduce
> PAGE_SIZE to 4K, obviously increasing wsize to 16K makes more sense.
>
> However this leads to a problem between Linux clients and servers where
> the maximum supported block size on the server is 8K and the page size on
> the IA64 client is 16K.

Is the "wsize < PAGE_SIZE" problem worth a printf in mount or mountd or
the kernel?
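
A user-space sketch of what such a check might look like, assuming the mount options are available as a comma-separated string. The helper names `parse_wsize` and `wsize_warning` are hypothetical, and nothing here reflects what mount, mountd, or the kernel actually does.

```python
import os


def parse_wsize(options):
    """Return the wsize value from a comma-separated option string, or None."""
    for opt in options.split(","):
        key, _, value = opt.partition("=")
        if key.strip() == "wsize" and value.strip().isdigit():
            return int(value)
    return None


def wsize_warning(options, page_size):
    """Return a warning string if wsize is smaller than page_size, else None."""
    wsize = parse_wsize(options)
    if wsize is not None and wsize < page_size:
        return ("warning: wsize=%d is smaller than PAGE_SIZE=%d; "
                "writes may be performed synchronously" % (wsize, page_size))
    return None


if __name__ == "__main__":
    # Page size of the machine running this script (16K on the ia64 box
    # described in this thread, 4K on ia32).
    page_size = os.sysconf("SC_PAGE_SIZE")
    message = wsize_warning("nfsvers=3,udp,rsize=4096,wsize=4096,hard", page_size)
    print(message or "wsize is not smaller than the page size")
```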

-- greg




2003-04-01 05:48:38

by Trond Myklebust

Subject: Re: wsize & PAGE_SIZE issues on IA64 clients.

>>>>> " " == Mark Price <[email protected]> writes:

> I'm currently getting my ia64 box rebuilt in order to an
> NFSSVC_MAXBLKSIZE of 64K. Anyone ever tried it?

That won't work over UDP. The latter has a packet size that is limited
to 64k, so you wouldn't have any space for the RPC header.
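
The arithmetic behind that limit can be sketched as follows, assuming IPv4 with no IP options. The constant and function names are made up for the example; the sizes are the standard 16-bit IPv4 total length, the 20-byte minimum IPv4 header, and the 8-byte UDP header.

```python
IPV4_MAX_PACKET = 65535  # largest value of the 16-bit IPv4 total-length field
IPV4_MIN_HEADER = 20     # IPv4 header without options
UDP_HEADER = 8           # fixed UDP header size


def max_udp_payload():
    """Largest payload that fits in a single UDP datagram over IPv4."""
    return IPV4_MAX_PACKET - IPV4_MIN_HEADER - UDP_HEADER


def room_for_rpc_header(block_size):
    """Bytes left for RPC/NFS headers; a negative result means no fit."""
    return max_udp_payload() - block_size


if __name__ == "__main__":
    for kib in (8, 16, 32, 64):
        print("%2dK block: %6d bytes left" % (kib, room_for_rpc_header(kib * 1024)))
```

With a 64K block the result is negative: the data alone already exceeds the largest possible datagram, before any RPC header is counted.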

For TCP it could work though...

Cheers,
Trond



2003-04-01 06:12:02

by Mark Price

Subject: Re: wsize & PAGE_SIZE issues on IA64 clients.


Good point ;-) I hadn't thought of that. It's been one of those days!

Cheers, Mark.


On 1 Apr 2003, Trond Myklebust wrote:

> >>>>> " " == Mark Price <[email protected]> writes:
>
> > I'm currently getting my ia64 box rebuilt in order to an
> > NFSSVC_MAXBLKSIZE of 64K. Anyone ever tried it?
>
> That won't work over UDP. The latter has a packet size that is limited
> to 64k, so you wouldn't have any space for the RPC header.
>
> For TCP it could work though...
>
> Cheers,
> Trond
>




2003-04-01 06:25:40

by Trond Myklebust

Subject: Re: wsize & PAGE_SIZE issues on IA64 clients.


One approach we might want to look into would be to backport the
'wb_index' from 2.5.x, and then make that map to 4k segments (or
smaller) on ia64. That would divorce the nfs_page from the real page
size.

Needs some work in order to make reads work ('cos you would have
mapped several nfs_page into one real page => the last reader has to
remove the page lock) and a bit of plumbing in pagelist.c, but looks
doable...
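
A toy model of the "last reader removes the page lock" bookkeeping, purely to illustrate the idea. The class `PageReadTracker` and its fields are hypothetical and do not correspond to any kernel structure; a real implementation would presumably need an atomic counter rather than this single-threaded one.

```python
class PageReadTracker:
    """Track outstanding segment reads that share one locked page."""

    def __init__(self, page_size, segment_size):
        if segment_size <= 0 or page_size % segment_size != 0:
            raise ValueError("page_size must be a multiple of segment_size")
        # One outstanding read per segment, e.g. four 4K segments in a 16K page.
        self.pending = page_size // segment_size
        self.locked = True

    def segment_done(self):
        """Record one completed segment read; return True once the page is unlocked."""
        if self.pending == 0:
            raise RuntimeError("no outstanding segment reads")
        self.pending -= 1
        if self.pending == 0:
            # The last reader to finish is the one that drops the page lock.
            self.locked = False
        return not self.locked


if __name__ == "__main__":
    tracker = PageReadTracker(page_size=16 * 1024, segment_size=4 * 1024)
    for _ in range(4):
        print(tracker.segment_done())
```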

Cheers,
Trond

