2006-08-14 07:00:19

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 003 of 4] Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.

On Tuesday August 8, [email protected] wrote:
> On Mon, 2006-08-07 at 23:04, Chuck Lever wrote:
> > I know the name NFSSVC_MAXBLKSIZE is historical, but to me it is
> > confusing. It isn't a block size at all, it's really a maximum
> > tranfser size, in bytes, right?
>
> NFSSVC_MAXBLKSIZE is the maximum NFS payload size, i.e. the largest
> size of file data that can passed in WRITE and returned from READ.
> The transfer size (as I understand the term) would be that plus the
> NFS and RPC headers, i.e. 65536 bytes for UDP. So I think the name
> is mostly OK.
>

It tripped over this issue when I was working through this patch set,
and decided to ignore it :-)

Mostly we try to maintain clear separation between the sunrpc layer
and the NFS layer. But it breaks down here.

The RPC layer allocates a certain amount of memory for each request.
This need to hold the request and the reply, both rounded up to whole
pages (with allowances for odd things like the tail of a readdir
appearing in the first page of the reply, but that isn't really
important).

The NFS server requests a buffer size, which should be enough to hold
the largest req+reply pair.
For v2 and v3 at least, this is one page each plus the largest IO
block.

So far this makes sense. NFSd should ask for 2+blocksize having
convinced itself that apart from the datablocks, the whole req and
reply each fit in a page (and if they don't we have lots of other
breakage).

But the names don't quite follow this.
RPCSVC_MAXPAYLOAD really doesn't say anything important about RPC
RPCSVC_MAXPAGES is a significant value, and maybe it is the one we
should be explicitly setting, though we would set it to
1024*1024/PAGE_SIZE + 2
which is ugly.

Then NFSSVC_MAXBLKSIZE is set to RPCSVC_MAXPAYLOAD which isn't
really right. It should be (RPCSVC_MAXPAGES-2)*PAGE_SIZE

And NFS_BUFSIZE is ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT +
NFSSVC_MAXBLKSIZE)
which sort-of makes sense, except that we really need 2
headers in there, both the request and the reply. And
we need them both to be page aligned, so giving a number of bytes
is odd.

And then there is RPCSVC_MAXPAYLOAD_UDP... which is really
a very NFS specific thing.

So......
Maybe we should pass a number of pages to svc_create rather than a
number of bytes. This needs to be the max size of a request + reply.
And svc_init_buffer shouldn't really add a magic '2'... or should it?

And I think we should replace svc_max_payload with nfsd_max_payload.

So nfsd calls svc_create with (1 + nfsd_max_block_pages + 1)
and svc_init_buffer allocates that many pages.
Then nfsd_max_payload returns
(rqstp->rq_server->sv_bufpages-2)<<PAGE_SHIFT;

But is it really worth the effort?
I guess that must be why I decided to ignore it.

Any suggestions?

NeilBrown


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2006-08-14 07:53:02

by Greg Banks

[permalink] [raw]
Subject: Re: [PATCH 003 of 4] Prepare knfsd for support of rsize/wsize of up to 1MB, over TCP.

On Mon, 2006-08-14 at 17:00, Neil Brown wrote:
> On Tuesday August 8, [email protected] wrote:
> > On Mon, 2006-08-07 at 23:04, Chuck Lever wrote:
> > > I know the name NFSSVC_MAXBLKSIZE is historical, but to me it is
> > > confusing. It isn't a block size at all, it's really a maximum
> > > tranfser size, in bytes, right?
> >
> > NFSSVC_MAXBLKSIZE is the maximum NFS payload size, i.e. the largest
> > size of file data that can passed in WRITE and returned from READ.
> > The transfer size (as I understand the term) would be that plus the
> > NFS and RPC headers, i.e. 65536 bytes for UDP. So I think the name
> > is mostly OK.
> >
>
> It tripped over this issue when I was working through this patch set,
> and decided to ignore it :-)
>
> Mostly we try to maintain clear separation between the sunrpc layer
> and the NFS layer. But it breaks down here.
>
> The RPC layer allocates a certain amount of memory for each request.
> This need to hold the request and the reply, both rounded up to whole
> pages (with allowances for odd things like the tail of a readdir
> appearing in the first page of the reply, but that isn't really
> important).
>
> The NFS server requests a buffer size, which should be enough to hold
> the largest req+reply pair.
> For v2 and v3 at least, this is one page each plus the largest IO
> block.
>
> So far this makes sense. NFSd should ask for 2+blocksize having
> convinced itself that apart from the datablocks, the whole req and
> reply each fit in a page (and if they don't we have lots of other
> breakage).

Agreed.

> But the names don't quite follow this.
> RPCSVC_MAXPAYLOAD really doesn't say anything important about RPC

True, it's an NFS level concept really. In sunrpc/ it serves
only to make the maths prettier...

> RPCSVC_MAXPAGES is a significant value, and maybe it is the one we
> should be explicitly setting, though we would set it to
> 1024*1024/PAGE_SIZE + 2
> which is ugly.

...like that.

> Then NFSSVC_MAXBLKSIZE is set to RPCSVC_MAXPAYLOAD which isn't
> really right. It should be (RPCSVC_MAXPAGES-2)*PAGE_SIZE

Sure.

> And NFS_BUFSIZE is ((RPC_MAX_HEADER_WITH_AUTH+26)*XDR_UNIT +
> NFSSVC_MAXBLKSIZE)
> which sort-of makes sense, except that we really need 2
> headers in there, both the request and the reply.

That's right, but this works because the resulting number is
rounded up to a page size anyway. We could have

#define NFS_BUFSIZE 1

and it would still work.

> And
> we need them both to be page aligned, so giving a number of bytes
> is odd.
>
> And then there is RPCSVC_MAXPAYLOAD_UDP... which is really
> a very NFS specific thing.

I presume you mean _V2 ?

> So......
> Maybe we should pass a number of pages to svc_create rather than a
> number of bytes. This needs to be the max size of a request + reply.
> And svc_init_buffer shouldn't really add a magic '2'... or should it?

Sure, you could do it that way.

> And I think we should replace svc_max_payload with nfsd_max_payload.

Ok, that makes more sense now that you have a /proc entry.
How about nfsd_max_transfer_size() ?

> So nfsd calls svc_create with (1 + nfsd_max_block_pages + 1)
> and svc_init_buffer allocates that many pages.
> Then nfsd_max_payload returns
> (rqstp->rq_server->sv_bufpages-2)<<PAGE_SHIFT;

Sure.

> But is it really worth the effort?

I think the entire way the sunrpc serverside code handles memory
is messy and error prone and if I had my druthers I'd spend a few
weeks rewriting it entirely.

> I guess that must be why I decided to ignore it.
>
> Any suggestions?

I think the best thing in the short term would be to carefully
rename a few functions and preserve the existing logic, but
perhaps this is just innate conservatism.

Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs