2016-01-31 12:50:22

by Kinglong Mee

[permalink] [raw]
Subject: nfsd: supports read buffer from multiples pages

ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This patch lets nfsd supports read buffer from multiples pages.

Signed-off-by: Kinglong Mee <[email protected]>
---
fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index d6ef095..fcf399f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
* Maybe we need a new page, maybe we have just run out
*/
unsigned int avail = (char *)argp->end - (char *)argp->p;
+ unsigned int copied = 0;
__be32 *p;
+
if (avail + argp->pagelen < nbytes)
return NULL;
- if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
- return NULL;
/* ok, we can do it with the current plus the next page */
if (nbytes <= sizeof(argp->tmp))
p = argp->tmp;
@@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
* guarantee p points to at least nbytes bytes.
*/
memcpy(p, argp->p, avail);
+ copied += avail;
+ nbytes -= avail;
+
+ while (nbytes > PAGE_SIZE) {
+ next_decode_page(argp);
+ memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
+ copied += PAGE_SIZE;
+ nbytes -= PAGE_SIZE;
+ }
+
next_decode_page(argp);
- memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
- argp->p += XDR_QUADLEN(nbytes - avail);
+ memcpy(((char*)p) + copied, argp->p, nbytes);
+ argp->p += XDR_QUADLEN(nbytes);
return p;
}

--
2.5.0



2016-02-01 18:38:07

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.

This is with the xfs block layout?

Christoph, do we know anything about average or worst-case sizes for
that layout update field?

> This patch lets nfsd supports read buffer from multiples pages.

Hm. We'll end up kmalloc()ing the passed-in field length:

p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);

We still do still have that (avail + argp->pagelen) limit, so we're not
going to pass arbitrarily large nbytes straight from the network to
kmalloc. But we do try to avoid depending on higher-order allocations.

--b.

>
> Signed-off-by: Kinglong Mee <[email protected]>
> ---
> fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index d6ef095..fcf399f 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
> * Maybe we need a new page, maybe we have just run out
> */
> unsigned int avail = (char *)argp->end - (char *)argp->p;
> + unsigned int copied = 0;
> __be32 *p;
> +
> if (avail + argp->pagelen < nbytes)
> return NULL;
> - if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
> - return NULL;
> /* ok, we can do it with the current plus the next page */
> if (nbytes <= sizeof(argp->tmp))
> p = argp->tmp;
> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
> * guarantee p points to at least nbytes bytes.
> */
> memcpy(p, argp->p, avail);
> + copied += avail;
> + nbytes -= avail;
> +
> + while (nbytes > PAGE_SIZE) {
> + next_decode_page(argp);
> + memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
> + copied += PAGE_SIZE;
> + nbytes -= PAGE_SIZE;
> + }
> +
> next_decode_page(argp);
> - memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
> - argp->p += XDR_QUADLEN(nbytes - avail);
> + memcpy(((char*)p) + copied, argp->p, nbytes);
> + argp->p += XDR_QUADLEN(nbytes);
> return p;
> }
>
> --
> 2.5.0

2016-02-01 18:48:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On Mon, Feb 01, 2016 at 01:38:05PM -0500, bfields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
> > ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
> > larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
>
> This is with the xfs block layout?
>
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
>
> > This patch lets nfsd supports read buffer from multiples pages.
>
> Hm. We'll end up kmalloc()ing the passed-in field length:
>
> p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
>
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc. But we do try to avoid depending on higher-order allocations.

(Which it looks like we were allowing before, possibly by accident. But
I doubt they were actually happening in practice, so that's not evidence
that we don't need to worry about allocations greater than a page.)

--b.

2016-02-02 00:54:59

by Kinglong Mee

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On 2/2/2016 02:38, J. Bruce Fields wrote:
> On Sun, Jan 31, 2016 at 08:50:10PM +0800, Kinglong Mee wrote:
>> ltp fsync02 will cause nfs sending LAYOUTCOMMIT with length
>> larger than two pages. nfsd returns NFSERR_BAD_XDR right now.
>
> This is with the xfs block layout?

Yes, xfs block layout.
Tested by ltp's fsync02.

thanks,
Kinglong Mee
>
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?
>
>> This patch lets nfsd supports read buffer from multiples pages.
>
> Hm. We'll end up kmalloc()ing the passed-in field length:
>
> p = argp->tmpp = kmalloc(nbytes, GFP_KERNEL);
>
> We still do still have that (avail + argp->pagelen) limit, so we're not
> going to pass arbitrarily large nbytes straight from the network to
> kmalloc. But we do try to avoid depending on higher-order allocations.
>
> --b.
>
>>
>> Signed-off-by: Kinglong Mee <[email protected]>
>> ---
>> fs/nfsd/nfs4xdr.c | 18 ++++++++++++++----
>> 1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index d6ef095..fcf399f 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -143,11 +143,11 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>> * Maybe we need a new page, maybe we have just run out
>> */
>> unsigned int avail = (char *)argp->end - (char *)argp->p;
>> + unsigned int copied = 0;
>> __be32 *p;
>> +
>> if (avail + argp->pagelen < nbytes)
>> return NULL;
>> - if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
>> - return NULL;
>> /* ok, we can do it with the current plus the next page */
>> if (nbytes <= sizeof(argp->tmp))
>> p = argp->tmp;
>> @@ -164,9 +164,19 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>> * guarantee p points to at least nbytes bytes.
>> */
>> memcpy(p, argp->p, avail);
>> + copied += avail;
>> + nbytes -= avail;
>> +
>> + while (nbytes > PAGE_SIZE) {
>> + next_decode_page(argp);
>> + memcpy(((char*)p) + copied, argp->p, PAGE_SIZE);
>> + copied += PAGE_SIZE;
>> + nbytes -= PAGE_SIZE;
>> + }
>> +
>> next_decode_page(argp);
>> - memcpy(((char*)p)+avail, argp->p, (nbytes - avail));
>> - argp->p += XDR_QUADLEN(nbytes - avail);
>> + memcpy(((char*)p) + copied, argp->p, nbytes);
>> + argp->p += XDR_QUADLEN(nbytes);
>> return p;
>> }
>>
>> --
>> 2.5.0
>

2016-02-02 09:20:56

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> This is with the xfs block layout?
>
> Christoph, do we know anything about average or worst-case sizes for
> that layout update field?

The average is rather small and fits into a single page, the worst
case is basically unlimited:

(file size / block size) * sizeof(pnfs_block_extent)

by the protocol, and about half that for a non-stupid client as
it would merge consecutive blocks and only trigger something close
to the worst case for a "block allocated, block hole, block allocated, ..."
pattern.

2016-02-02 14:29:49

by J. Bruce Fields

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On Tue, Feb 02, 2016 at 10:20:54AM +0100, Christoph Hellwig wrote:
> On Mon, Feb 01, 2016 at 01:38:05PM -0500, J. Bruce Fields wrote:
> > This is with the xfs block layout?
> >
> > Christoph, do we know anything about average or worst-case sizes for
> > that layout update field?
>
> The average is rather small and fits into a single page, the worst
> case is basically unlimited:
>
> (file size / block size) * sizeof(pnfs_block_extent)
>
> by the protocol, and about half that for a non-stupid client as
> it would merge consecutive blocks and only trigger something close
> to the worst case for a "block allocated, block hole, block allocated, ..."
> pattern.

OK. And what's the failure mode if the layoutcommit fails? I guess the
client returns the layout and redoes IO through the MDS. For a rare
failure maybe that's not so terrible.

So I guess the right thing to do is take Kinglong's patch for now.

After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
to decode from an array of pages. But does the iomaps array end up
being just as big?

--b.

2016-02-29 09:49:58

by Christoph Hellwig

[permalink] [raw]
Subject: Re: nfsd: supports read buffer from multiples pages

On Tue, Feb 02, 2016 at 09:29:48AM -0500, J. Bruce Fields wrote:
> OK. And what's the failure mode if the layoutcommit fails? I guess the
> client returns the layout and redoes IO through the MDS. For a rare
> failure maybe that's not so terrible.
>
> So I guess the right thing to do is take Kinglong's patch for now.

Yes, it would be great to take it.

> After that, it wouldn't be that hard for nfsd4_block_proc_layoutcommit
> to decode from an array of pages. But does the iomaps array end up
> being just as big?

The block layout extents are rather bloated, so it will be significantly
smaller.