2003-11-28 12:43:47

by Greg Banks

Subject: [PATCH] SGI 905314 (1/2): make NFSSVC_MAXBLKSIZE depend on PAGE_SIZE

G'day,

SGI bug #905314

This patch makes NFSSVC_MAXBLKSIZE depend on PAGE_SIZE so that machines
with large page sizes can take advantage of that feature to serve NFS
with larger blocksizes, increasing performance and avoiding a fallback
to synchronous traffic between machines with page sizes greater than 8K.
It also documents the actual constraints on NFSSVC_MAXBLKSIZE.

The patch has been running for some hours now, reading and writing NFSv3
over UDP on gigabit ethernet between two Altix boxes (16K page sizes) with
4G RAM each. I have verified with ethereal that reads and writes proceed
in 32K blocks, and performance tests show good throughput for streaming
reads and writes (although 2.6.0-test8 still does better).


--- /usr/tmp/TmpDir.28396-0/linux/linux/include/linux/nfsd/const.h_1.5 Fri Nov 28 23:07:47 2003
+++ linux/include/linux/nfsd/const.h Fri Nov 28 23:07:36 2003
@@ -12,6 +12,7 @@
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs3.h>
+#include <asm/page.h>

 /*
  * Maximum protocol version supported by knfsd
@@ -19,9 +20,24 @@
 #define NFSSVC_MAXVERS 3

 /*
- * Maximum blocksize supported by daemon currently at 8K
+ * Maximum blocksize supported by daemon. The value is
+ * constrained by 1) has to fit in a UDP datagram less some
+ * headers 2) must be a multiple of page size 3) will have to
+ * be allocated plus some headers as a physically contiguous
+ * buffer for each nfsd. For best performance we usually want
+ * the largest value consistent with those constraints. The
+ * fuzziest condition is 3; here we choose to limit the gfp
+ * allocation order <= 2 and cross our fingers. Note we also
+ * choose powers of 2 like the client code even though there's
+ * no good reason to do so.
  */
+#if PAGE_SIZE > (8*1024)
+#define NFSSVC_MAXBLKSIZE (32*1024)
+#elif PAGE_SIZE == (8*1024)
+#define NFSSVC_MAXBLKSIZE (16*1024)
+#else /* 4K pages */
 #define NFSSVC_MAXBLKSIZE (8*1024)
+#endif

 #ifdef __KERNEL__
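
To make the arithmetic in this patch concrete, here is a minimal standalone sketch in C. It mirrors the #if ladder as a runtime function and computes the page allocation order needed for a buffer of a given size. The names maxblksize_v1() and alloc_order() are hypothetical helpers for illustration only, not kernel functions, and the size of the "plus some headers" overhead from the comment is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime mirror of the patch's #if ladder (not kernel code). */
static size_t maxblksize_v1(size_t page_size)
{
    assert(page_size > 0);
    if (page_size > 8 * 1024)
        return 32 * 1024;
    if (page_size == 8 * 1024)
        return 16 * 1024;
    return 8 * 1024;            /* 4K pages */
}

/*
 * Smallest order such that (page_size << order) >= bytes, i.e. the
 * power-of-two count of contiguous pages needed to hold the buffer.
 */
static unsigned int alloc_order(size_t bytes, size_t page_size)
{
    unsigned int order = 0;

    assert(page_size > 0);
    while ((page_size << order) < bytes)
        order++;
    return order;
}
```

For 4K, 8K and 16K pages the block size alone is two pages (order 1). Any extra header bytes push the allocation to the next order: alloc_order(8 * 1024 + 1, 4096) is 2, which is still within the "order <= 2" limit chosen in the comment.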




Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.


-------------------------------------------------------
This SF.net email is sponsored by: SF.net Giveback Program.
Does SourceForge.net help you be more productive? Does it
help you create better code? SHARE THE LOVE, and help us help
YOU! Click Here: http://sourceforge.net/donate/
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-11-30 22:27:26

by NeilBrown

Subject: Re: [PATCH] SGI 905314 (1/2): make NFSSVC_MAXBLKSIZE depend on PAGE_SIZE

On Friday November 28, [email protected] wrote:
> G'day,
>
> SGI bug #905314
>
> This patch makes NFSSVC_MAXBLKSIZE depend on PAGE_SIZE so that machines
> with large page sizes can take advantage of that feature to serve NFS
> with larger blocksizes, increasing performance and avoiding a fallback
> to synchronous traffic between machines with page sizes greater than 8K.
> It also documents the actual constraints on NFSSVC_MAXBLKSIZE.
>
> The patch has been running for some hours now, reading and writing NFSv3
> over UDP on gigabit ethernet between two Altix boxes (16K page sizes) with
> 4G RAM each. I have verified with ethereal that reads and writes proceed
> in 32K blocks, and performance tests show good throughput for streaming
> reads and writes (although 2.6.0-test8 still does better).

> +#if PAGE_SIZE > (8*1024)
> +#define NFSSVC_MAXBLKSIZE (32*1024)
> +#elif PAGE_SIZE == (8*1024)
> +#define NFSSVC_MAXBLKSIZE (16*1024)
> +#else /* 4K pages */
> #define NFSSVC_MAXBLKSIZE (8*1024)
> +#endif
>


What do you think of simply:

#define NFSSVC_MAXBLKSIZE (PAGE_SIZE * 2)

With a comment "We want this to be large, but we need to be sure that
a kmalloc of this size has a very good chance of succeeding.".

NeilBrown



2003-11-30 23:26:22

by Greg Banks

Subject: Re: Re: [PATCH] SGI 905314 (1/2): make NFSSVC_MAXBLKSIZE depend on PAGE_SIZE

Neil Brown wrote:
>
> What do you think of simply:
>
> #define NFSSVC_MAXBLKSIZE (PAGE_SIZE * 2)
>

It's not that simple; the ia64 port can be configured for 64K pages,
which would result in nfsd reporting 128K for wtmax on UDP.
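
The arithmetic behind this objection can be checked with a small sketch. It assumes IPv4 without options, where the 16-bit total length field caps a datagram at 65535 bytes including a 20-byte IP header and an 8-byte UDP header. RPC and NFS headers shrink the usable payload further and are not modelled. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    IPV4_MAX_TOTAL = 65535,  /* limit of the 16-bit total length field */
    IPV4_HDR_MIN   = 20,     /* IPv4 header without options */
    UDP_HDR        = 8       /* UDP header */
};

/* Upper bound on a UDP payload over IPv4; RPC/NFS headers not subtracted. */
static size_t udp_max_payload(void)
{
    return IPV4_MAX_TOTAL - IPV4_HDR_MIN - UDP_HDR;
}

/* Block size that the simple (PAGE_SIZE * 2) definition would produce. */
static size_t two_pages(size_t page_size)
{
    assert(page_size > 0);
    return 2 * page_size;
}

/* Nonzero if a block of this size could fit in one UDP datagram at all. */
static int fits_in_udp(size_t blksize)
{
    return blksize <= udp_max_payload();
}
```

With 16K pages, two_pages() gives 32K, which fits. With 64K pages it gives 128K, roughly twice what a single UDP datagram can carry.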

Here's a patch which is closer to that simple ideal ;-)

--- /usr/tmp/TmpDir.26244-0/linux/linux/include/linux/nfsd/const.h_1.5 Mon Dec 1 10:24:10 2003
+++ linux/include/linux/nfsd/const.h Mon Dec 1 10:24:11 2003
@@ -12,6 +12,7 @@
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs3.h>
+#include <asm/page.h>

 /*
  * Maximum protocol version supported by knfsd
@@ -19,9 +20,16 @@
 #define NFSSVC_MAXVERS 3

 /*
- * Maximum blocksize supported by daemon currently at 8K
+ * Maximum blocksize supported by daemon. We want the largest
+ * value which 1) fits in a UDP datagram less some headers
+ * 2) is a multiple of page size 3) can be successfully kmalloc()ed
+ * by each nfsd.
  */
-#define NFSSVC_MAXBLKSIZE (8*1024)
+#if PAGE_SIZE > (16*1024)
+#define NFSSVC_MAXBLKSIZE (32*1024)
+#else
+#define NFSSVC_MAXBLKSIZE (2*PAGE_SIZE)
+#endif

 #ifdef __KERNEL__
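
As a rough check of what this revised rule yields, the following sketch evaluates it at run time for a few page sizes and tests constraint 2 from the comment. maxblksize_v2() and is_page_multiple() are hypothetical names used only for illustration; they are not part of the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical runtime mirror of the revised #if/#else (not kernel code). */
static size_t maxblksize_v2(size_t page_size)
{
    assert(page_size > 0);
    if (page_size > 16 * 1024)
        return 32 * 1024;       /* cap for large-page configurations */
    return 2 * page_size;
}

/* Constraint 2 from the comment: is the block size a whole number of pages? */
static int is_page_multiple(size_t blksize, size_t page_size)
{
    assert(page_size > 0);
    return blksize % page_size == 0;
}
```

For 4K, 8K and 16K pages this gives 8K, 16K and 32K, each exactly two pages. With 64K pages the result is 32K, which is half a page rather than a multiple of the page size.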



Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



2003-12-01 11:38:41

by Bogdan Costescu

Subject: Re: Re: [PATCH] SGI 905314 (1/2): make NFSSVC_MAXBLKSIZE depend on PAGE_SIZE

On Mon, 1 Dec 2003, Greg Banks wrote:

> It's not that simple; the ia64 port can be configured for 64K pages,
> which would result in nfsd reporting 128K for wtmax on UDP.

But allocating only 32K in this case means half a page... A waste. Then
maybe TCP should get its own setting ?

--
Bogdan Costescu

IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: [email protected]




2003-12-02 11:26:46

by Shivaji Navale

Subject: Mailbox corruption on The NFS server

Hi,

We have a peculiar problem with the users' mailboxes in
/var/spool/mail/username.
The mailboxes get corrupted, in that the first 20-26 lines of a mailbox get
deleted.

We are using the 2.4.20-18.8.um.1 kernel on the NFS server (an LVS Director),
which exports the mail partition to 30 NFS/NIS clients.

I am not sure if this is the right place to ask this question, but this
problem has been bugging us for a long time.

I have googled extensively, but couldn't work out a proper solution.

Could anybody suggest why this is happening and how it could be overcome?
Thanks a lot

The portmapper gives the following results:

100000 2 tcp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 32768 status
100024 1 tcp 32768 status
100004 2 udp 953 ypserv
100004 1 udp 953 ypserv
100004 2 tcp 956 ypserv
100004 1 tcp 956 ypserv
391002 2 tcp 32769 sgi_fam
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100021 1 udp 32770 nlockmgr
100021 3 udp 32770 nlockmgr
100021 4 udp 32770 nlockmgr
100005 1 udp 32771 mountd
100005 1 tcp 32770 mountd
100005 2 udp 32771 mountd
100005 2 tcp 32770 mountd
100005 3 udp 32771 mountd
100005 3 tcp 32770 mountd
100009 1 udp 668 yppasswdd
100011 1 udp 907 rquotad
100011 2 udp 907 rquotad
100011 1 tcp 910 rquotad
100011 2 tcp 910 rquotad



-- EVERYONE should contribute to THE BEST of their capacity
for THE DEVELOPMENT of THE NATION
-- A P J KALAM

On Mon, 1 Dec 2003, Bogdan Costescu wrote:

> On Mon, 1 Dec 2003, Greg Banks wrote:
>
> > It's not that simple; the ia64 port can be configured for 64K pages,
> > which would result in nfsd reporting 128K for wtmax on UDP.
>
> But allocating only 32K in this case means half a page... A waste. Then
> maybe TCP should get its own setting ?
>
> --
> Bogdan Costescu
>
> IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
> Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
> Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
> E-mail: [email protected]
>
>
>
>




2003-12-02 18:51:02

by Juri Haberland

Subject: Re: Mailbox corruption on The NFS server

Shivaji Navale <[email protected]> wrote:
> Hi,

Hi,

please don't start a new topic/thread by just replying to another mail.
Thanks.

> We have this peculiar problem for the Mailboxes of users
> /var/spool/mail/username.
> The mailboxes get corrupted, in that the first 20-26 lines of a mailbox get
> deleted.
>
> We are using 2.4.20-18.8.um.1 kernel on the (LVS Director) which exports
> the mail partition to 30 NFS/NIS clients.

It's considered a bad idea to put mailboxes on an NFS share, as there
might be locking issues if two applications simultaneously access the same
mailbox. Either use the maildir format or don't export your mailboxes
via NFS.

Regards,
Juri

--
Juri Haberland <[email protected]>




2003-12-02 19:35:29

by Trond Myklebust

Subject: Re: Mailbox corruption on The NFS server

>>>>> " " == Juri Haberland <[email protected]> writes:

> It's considered a bad idea to put mailboxes on an NFS share as
> there might be locking issues if two applications simultaneously
> access the same mailbox. Either use the maildir format or don't
> export your mailboxes via NFS.

Sort of. It can be made to work *provided* that you can guarantee that
your mail programs all agree to support the same file locking scheme.
Currently that means they must choose one (or both) of the following
schemes:

- fcntl() (a.k.a. POSIX, a.k.a. lockf()) locking

- dotlocking (a.k.a. creating a lock file using something like 'ln
mailbox .mailbox.locked')

Note that the BSD flock() and use of O_EXCL in a dotlocking scheme are
not considered to be reliable within Linux NFS.
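
For illustration, the two schemes listed above can be sketched in C roughly as follows. This is only a minimal sketch: the function names are hypothetical, the dotlock simply follows the 'ln mailbox .mailbox.locked' example by hard-linking the mailbox to a lock name, and there is no retry, timeout or stale-lock handling of the kind real mail software would need.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Take an exclusive fcntl() (POSIX) lock on the whole file, blocking
 * until it is granted. Returns 0 on success, -1 on error (errno set).
 */
static int lock_mailbox_fcntl(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_WRLCK;        /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLKW, &fl);
}

/* Release the lock taken by lock_mailbox_fcntl(). */
static int unlock_mailbox_fcntl(int fd)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    return fcntl(fd, F_SETLK, &fl);
}

/*
 * Dotlock attempt using link(2). Returns 0 if the lock was taken,
 * 1 if the lock name already exists, -1 on any other error.
 */
static int dotlock_try(const char *mailbox, const char *lockname)
{
    if (link(mailbox, lockname) == 0)
        return 0;
    return errno == EEXIST ? 1 : -1;
}

/* Drop the dotlock by removing the lock name. */
static int dotlock_release(const char *lockname)
{
    return unlink(lockname);
}
```

Whichever scheme is used, it only protects the mailbox if every program that touches it (delivery agent and mail reader alike) uses the same one.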

Cheers,
Trond



2003-12-03 10:56:51

by Shivaji Navale

Subject: Re: Mailbox corruption on The NFS server


On 2 Dec 2003, Trond Myklebust wrote:

> >>>>> " " == Juri Haberland <[email protected]> writes:
>
> > It's considered a bad idea to put mailboxes on an NFS share as
> > there might be locking issues if two applications simultaneously
> > access the same mailbox. Either use the maildir format or don't
> > export your mailboxes via NFS.
>
> Sort of. It can be made to work *provided* that you can guarantee that
> your mail programs all agree to support the same file locking scheme.
> Currently that means they must choose one (or both) of the following
> schemes:
>
> - fcntl() (a.k.a. POSIX, a.k.a. lockf()) locking
>
> - dotlocking (a.k.a. creating a lock file using something like 'ln
> mailbox .mailbox.locked')
>
> Note that the BSD flock() and use of O_EXCL in a dotlocking scheme are
> not considered to be reliable within Linux NFS.

We use sendmail as our MTA and pine for reading mail; pine has the
dotlocking feature. And it appears that sendmail, on invoking procmail, waits
for this lock to be released if it exists.

Is there a way out of this, or should we rather not export the mailboxes to
clients?

thanks,
shivaji...

PS: Sorry for starting a new thread from an old one.




>
> Cheers,
> Trond




2003-12-03 11:38:24

by Shivaji Navale

Subject: Re: Mailbox corruption on The NFS server


Thanks a lot!!
I will try it out and post the results for perusal.

Would this solve the corruption problem?

Is this entire issue caused by pine and sendmail, so that
configuring either properly would eradicate the problem
and NFS doesn't need to be messed with?

thanks
-shivaji


-- EVERYONE should contribute to THE BEST of their capacity
for THE DEVELOPMENT of THE NATION
-- A P J KALAM

On Wed, 3 Dec 2003, Mark Cooke wrote:

> You have to wait for the lock to be released - sendmail doesn't know if
> pine is partway through changing the mailbox file.
>
> Alternatively, switch to using maildir rather than mailbox. Locking
> isn't required at that point - but I believe you have to patch pine for
> maildir support.
>
> Mark
>
> --
> Mark Cooke <[email protected]>
> University Of Birmingham
>




2003-12-03 12:36:00

by Mark Cooke

Subject: Re: Mailbox corruption on The NFS server

Hi,

You shouldn't need to change the NFS server, as maildir was specifically
designed to operate over NFS with lock-free schemes.

Useful links:

http://cr.yp.to/proto/maildir.html (Info on maildir)
http://www.firstpr.com.au/web-mail/mb2md/ (Convert mbox to mdir)
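
As a rough illustration of why maildir can get by without locks, here is a minimal sketch of a delivery in C, loosely following the scheme in the maildir document linked above: the message is written in full under tmp/ and only then moved into new/. The function name maildir_deliver() and the caller-supplied unique name are assumptions for this sketch; real delivery agents generate the unique name themselves and handle more error cases.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Deliver one message into a maildir. 'unique' must be a file name that
 * no other delivery will ever use (supplied by the caller in this sketch).
 * Returns 0 on success, -1 on error.
 */
static int maildir_deliver(const char *maildir, const char *unique,
                           const char *msg, size_t len)
{
    char tmp[1024], dst[1024];
    struct stat st;
    int fd;

    assert(maildir && unique && msg);
    if (snprintf(tmp, sizeof tmp, "%s/tmp/%s", maildir, unique) >= (int)sizeof tmp ||
        snprintf(dst, sizeof dst, "%s/new/%s", maildir, unique) >= (int)sizeof dst)
        return -1;

    /* The name must not already exist in tmp/. */
    if (stat(tmp, &st) == 0 || errno != ENOENT)
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* Readers only look in new/ and cur/, so they never see a partial file. */
    if (rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

Because each message is its own file and appears in new/ only once complete, the delivery agent and the mail reader never modify the same file at the same time.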

Cheers,

Mark

On Wed, 2003-12-03 at 11:38, Shivaji Navale wrote:
> Thanks a lot!!
> I will try it out and post the results for perusal.
>
> Would this solve the corruption problem?
>
> Is this entire issue caused by pine and sendmail, so that
> configuring either properly would eradicate the problem
> and NFS doesn't need to be messed with?
>
> thanks
> -shivaji
>
>
> -- EVERYONE should contribute to THE BEST of their capacity
> for THE DEVELOPMENT of THE NATION
> -- A P J KALAM
>
> On Wed, 3 Dec 2003, Mark Cooke wrote:
>
> > You have to wait for the lock to be released - sendmail doesn't know if
> > pine is partway through changing the mailbox file.
> >
> > Alternatively, switch to using maildir rather than mailbox. Locking
> > isn't required at that point - but I believe you have to patch pine for
> > maildir support.
> >
> > Mark
--
Mark Cooke <[email protected]>
University Of Birmingham


