2011-11-29 16:13:40

by Peng Tao

Subject: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

Asking for a layout in pg_init always makes the client ask for only a 4KB
layout in every layoutget. This way, the client drops the IO size information
that is meaningful to the MDS when handing out layouts.

Instead, if the layout is not found in the cache, do not send layoutget
at once. Wait until just before issuing IO in pnfs_do_multiple_reads/writes,
because that is where we know the real size of the current IO. By telling the
MDS the real IO size, we give it a better chance to hand out a proper layout.

Signed-off-by: Peng Tao <[email protected]>
---
Resend to fix patch title. Sorry for the noise...

 fs/nfs/blocklayout/blocklayout.c |   54 ++++++++++++++++++++++++++++++++++++-
 1 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 48cfac3..fd585fe 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -39,6 +39,7 @@
 #include <linux/prefetch.h>
 
 #include "blocklayout.h"
+#include "../internal.h"
 
 #define NFSDBG_FACILITY	NFSDBG_PNFS_LD
 
@@ -990,14 +991,63 @@ bl_clear_layoutdriver(struct nfs_server *server)
 	return 0;
 }
 
+/* While RFC doesn't limit maximum size of layout, we better limit it ourself. */
+#define PNFSBLK_MAXRSIZE (0x1<<22)
+#define PNFSBLK_MAXWSIZE (0x1<<21)
+static void
+bl_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
+{
+	struct inode *ino = pgio->pg_inode;
+	struct pnfs_layout_hdr *lo;
+
+	BUG_ON(pgio->pg_lseg != NULL);
+	spin_lock(&ino->i_lock);
+	lo = pnfs_find_alloc_layout(ino, req->wb_context, GFP_KERNEL);
+	if (!lo || test_bit(lo_fail_bit(IOMODE_READ), &lo->plh_flags)) {
+		spin_unlock(&ino->i_lock);
+		nfs_pageio_reset_read_mds(pgio);
+		return;
+	}
+
+	pgio->pg_bsize = PNFSBLK_MAXRSIZE;
+	pgio->pg_lseg = pnfs_find_get_layout_locked(ino,
+						req_offset(req),
+						req->wb_bytes,
+						IOMODE_READ);
+	spin_unlock(&ino->i_lock);
+}
+
+static void
+bl_pg_init_write(struct nfs_pageio_descriptor *pgio, struct nfs_page *req)
+{
+	struct inode *ino = pgio->pg_inode;
+	struct pnfs_layout_hdr *lo;
+
+	BUG_ON(pgio->pg_lseg != NULL);
+	spin_lock(&ino->i_lock);
+	lo = pnfs_find_alloc_layout(ino, req->wb_context, GFP_NOFS);
+	if (!lo || test_bit(lo_fail_bit(IOMODE_RW), &lo->plh_flags)) {
+		spin_unlock(&ino->i_lock);
+		nfs_pageio_reset_write_mds(pgio);
+		return;
+	}
+
+	pgio->pg_bsize = PNFSBLK_MAXWSIZE;
+	pgio->pg_lseg = pnfs_find_get_layout_locked(ino,
+						req_offset(req),
+						req->wb_bytes,
+						IOMODE_RW);
+	spin_unlock(&ino->i_lock);
+}
+
 static const struct nfs_pageio_ops bl_pg_read_ops = {
-	.pg_init = pnfs_generic_pg_init_read,
+	.pg_init = bl_pg_init_read,
 	.pg_test = pnfs_generic_pg_test,
 	.pg_doio = pnfs_generic_pg_readpages,
 };
 
 static const struct nfs_pageio_ops bl_pg_write_ops = {
-	.pg_init = pnfs_generic_pg_init_write,
+	.pg_init = bl_pg_init_write,
 	.pg_test = pnfs_generic_pg_test,
 	.pg_doio = pnfs_generic_pg_writepages,
 };
--
1.7.1.262.g5ef3d



2011-12-01 01:18:29

by Boaz Harrosh

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 11/30/2011 05:17 AM, Peng Tao wrote:
>>>
>>> +/* While RFC doesn't limit maximum size of layout, we better limit it ourself. */
>>
>> Why is that?
>> What do these arbitrary numbers represent?
>> If these limits depend on some other system sizes they should reflect the dependency
>> as part of their calculation.
> What I wanted to add here is a limit to stop pg_test() (like object's
> max_io_size) and 2MB is just an experience value...
>
> Thanks,
> Tao
>>
>> Benny
>>
>>> +#define PNFSBLK_MAXRSIZE (0x1<<22)
>>> +#define PNFSBLK_MAXWSIZE (0x1<<21)

You see, this is the basic flaw of your scheme in principle: it equates IO sizes
with lseg sizes.

Let's back up for a second.

A. The first thing to understand is that any segmenting server, be it blocks, objects
   or files, will want the client to report, to the best of its knowledge,
   the intention of the writing application. Therefore a solution should be
   good for all three. Whatever you are trying to do should not be private to
   blocks and must not conflict with other LO needs.

   Note that since the NFS write-out stack holds back on writing until
   sync time or memory pressure, in most cases at the point of IO it has at
   its disposal the complete application IO in its per-file page collection.
   (The exception is very large writes, which are fine to split, given resource
    conditions on the client.)

   So below, when I say application, we can later take that to mean the complete
   page list available per inode at the time of write-out.

B. The *optimum* for any segmented server is:
   (and addressing Trond's concern of the seg list exploding and never freeing up)

B.1. If an application will write O..N of the file
1. Get one lo_seg of O..N
2. IO at max_io from O to N until done.
3. Return or forget the lo_seg

B.2. In the case of random IO O1..N1, O2..N2,..., On..Nn

For objects and files (segmented) the optimum is still:
1. Get one lo_seg of O1..Nn
2. IO at max_io for each Ox..Nx until done.
   (objects: max_io is a factor of BIO sizes, group boundary and alignments.
    files: max_io is stripe_unit)
3. Return or forget the 1 lo_seg

For blocks the optimum is:
1. Get n lo_segs of O1..N1, O2..N2,..., On..Nn
2. IO at max_io for each Ox..Nx until done.
3. Return or forget any Ox..Nx whose IO is done

You can see that stage 2, for any kind of LO and in either the B.1 or the B.2 case,
is the same.
And this is, as the author intended, the .pg_init -> .pg_test -> .pg_doio sequence.

For blocks, within .write_pagelist there is an internal loop that re-slices the
requested linear pagelist into extents, possibly slicing each extent at bio_size
boundaries. For files and objects this slicing (though I admit it is very different)
actually happens at .pg_test, so at .write_pagelist the request is sent in full.
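
To make the B.2 distinction concrete, here is a minimal sketch (not code from
this thread or from the kernel) of what each layout type would request for the
same set of random IO ranges: objects and files ask for one segment spanning
O1..Nn, while blocks ask for one segment per range. The type `io_range` and
both helper names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: a byte range [offset, offset + length) in a file. */
struct io_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * Objects/files, case B.2: one segment covering everything from the
 * lowest start to the highest end. Ranges need not be sorted.
 */
static struct io_range covering_range(const struct io_range *r, size_t n)
{
	uint64_t start, end;
	struct io_range out;

	assert(r != NULL && n > 0);
	start = r[0].offset;
	end = r[0].offset + r[0].length;

	for (size_t i = 1; i < n; i++) {
		uint64_t e = r[i].offset + r[i].length;

		if (r[i].offset < start)
			start = r[i].offset;
		if (e > end)
			end = e;
	}
	out.offset = start;
	out.length = end - start;
	return out;
}

/*
 * Blocks, case B.2: one segment per range, so the total requested is
 * the sum of the individual lengths (the holes are never requested).
 */
static uint64_t per_range_total(const struct io_range *r, size_t n)
{
	uint64_t total = 0;

	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		total += r[i].length;
	return total;
}
```

For sparse random IO the covering range can be far larger than the sum of the
individual ranges, which is why the two cases are treated differently above.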

C. So back to our problem:

C.1 NACK on your patchset. You are shouting from the rooftops that the client must
    report to the server (as a hint), to the best of its knowledge, what the
    application is going to do. And then you sneakily introduce an IO_MAX limitation.

    This you MUST fix. Either you send a good server hint for the anticipated
    application IO, or none at all.

    (The server can always introduce its own slicing and limits.)

    You did all this because you have circumvented the chance to do so at .pg_test,
    because you want the .pg_init -> .pg_test -> .pg_doio loop to be your
    O1..N1, O2..N2,...,On..Nn parser.

C.2 You must work out a system that will satisfy not only a blocks (MPFS) server
    but any segmenting server out there: blocks, objects or files (segmented),
    by reporting the best information you have and letting the server make its
    decisions.

    Now, by postponing the report to after .pg_test -> .pg_doio you break the way
    objects and files IO slicing works, and leave them in the dark. I'm not sure
    you really mean that each LO needs to do its own private hacks?


C.3 Say we go back to the drawing board and want to do stage 1 above,
    sending the exact information to the server, be it B.1 or B.2.

    a. We want it at .pg_init so we have a layout to inspect at .pg_test.

       Done properly, this will let you, in blocks, slice by extents at .pg_test,
       and .write_pages can send the complete pagelist to md (bio chaining).

    b. Say, theoretically, that we are willing to spend CPU and memory to collect
       that information, for example by also pre-looping the page list and/or
       calling the LO for the final decision.

    So my whole point is that b. above should eventually happen, but efficiently, by
    pre-collecting some counters. (Remember that we already saw all these pages
    in generic NFS at the VFS .write_pages vector.)

    Then, since .pg_init is already called into the LO, just change the API so the
    LO has all the needed information available, be it B.1 or B.2, and in return
    passes on to pnfs.c the actual optimal lo_seg size. In B.1 they all
    send the same thing. In B.2 they differ.

    We can start by doing all the API changes so .pg_init can specify and
    return the suggested lo_size. And perhaps we add to the nfs_pageio_descriptor
    passed to .pg_init a couple of members describing the above:
        O1 - The index of the first page
        N1 - The length up to the first hole
        Nn - Highest written page


    At first version:
        A good approximation, which gives you an exact middle point
        between blocks B.2 and objects/files B.2, is the dirty count.
    At later patch:
        Have generic NFS collect the above O1, N1, and Nn for you and base
        your decision on that.
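
As a rough illustration of the three proposed members, the sketch below derives
O1, N1 and Nn from a sorted list of dirty page indices in a single short loop.
It is a sketch under stated assumptions, not the nfs_pageio_descriptor API:
`dirty_summary` and `summarize_dirty` are hypothetical names, the input is
assumed to be in strictly ascending order, and N1 is counted in pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the dirty pages of one inode. */
struct dirty_summary {
	unsigned long o1; /* index of the first dirty page */
	unsigned long n1; /* pages in the first contiguous run, up to the first hole */
	unsigned long nn; /* index of the highest dirty page */
};

/* idx: dirty page indices in strictly ascending order; n must be > 0. */
static struct dirty_summary summarize_dirty(const unsigned long *idx, size_t n)
{
	struct dirty_summary s;

	assert(idx != NULL && n > 0);
	s.o1 = idx[0];
	s.n1 = 1;
	s.nn = idx[n - 1];

	/* Extend the first run for as long as the indices stay consecutive. */
	for (size_t i = 1; i < n && idx[i] == idx[i - 1] + 1; i++)
		s.n1++;
	return s;
}
```

With these three values, a blocks-style request could cover O1 for N1 pages,
while an objects- or files-style request could cover O1 through Nn.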


And stop the private blocks hacks and the IO_MAX capping on the lo_seg
size.

Boaz

2011-11-30 12:58:07

by Benny Halevy

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 2011-12-03 06:56, Peng Tao wrote:
> Asking for layout in pg_init will always make client ask for only 4KB
> layout in every layoutget. This way, client drops the IO size information
> that is meaningful for MDS in handing out layout.
>
> In stead, if layout is not find in cache, do not send layoutget
> at once. Wait until before issuing IO in pnfs_do_multiple_reads/writes
> because that is where we know the real size of current IO. By telling the
> real IO size to MDS, MDS will have a better chance to give proper layout.
>
> Signed-off-by: Peng Tao <[email protected]>
> ---
> Resend to fix patch title. Sorry for the noise...
>
> fs/nfs/blocklayout/blocklayout.c | 54 ++++++++++++++++++++++++++++++++++++-
> 1 files changed, 52 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
> index 48cfac3..fd585fe 100644
> --- a/fs/nfs/blocklayout/blocklayout.c
> +++ b/fs/nfs/blocklayout/blocklayout.c
> @@ -39,6 +39,7 @@
> #include <linux/prefetch.h>
>
> #include "blocklayout.h"
> +#include "../internal.h"
>
> #define NFSDBG_FACILITY NFSDBG_PNFS_LD
>
> @@ -990,14 +991,63 @@ bl_clear_layoutdriver(struct nfs_server *server)
> return 0;
> }
>
> +/* While RFC doesn't limit maximum size of layout, we better limit it ourself. */

Why is that?
What do these arbitrary numbers represent?
If these limits depend on some other system sizes they should reflect the dependency
as part of their calculation.

Benny

> +#define PNFSBLK_MAXRSIZE (0x1<<22)
> +#define PNFSBLK_MAXWSIZE (0x1<<21)

2011-11-30 13:18:06

by Peng Tao

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On Wed, Nov 30, 2011 at 8:57 PM, Benny Halevy <[email protected]> wrote:
> On 2011-12-03 06:56, Peng Tao wrote:
>> Asking for layout in pg_init will always make client ask for only 4KB
>> layout in every layoutget. This way, client drops the IO size information
>> that is meaningful for MDS in handing out layout.
>>
>> In stead, if layout is not find in cache, do not send layoutget
>> at once. Wait until before issuing IO in pnfs_do_multiple_reads/writes
>> because that is where we know the real size of current IO. By telling the
>> real IO size to MDS, MDS will have a better chance to give proper layout.
>>
>> Signed-off-by: Peng Tao <[email protected]>
>> ---
>> Resend to fix patch title. Sorry for the noise...
>>
>>  fs/nfs/blocklayout/blocklayout.c |   54 ++++++++++++++++++++++++++++++++++++-
>>  1 files changed, 52 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
>> index 48cfac3..fd585fe 100644
>> --- a/fs/nfs/blocklayout/blocklayout.c
>> +++ b/fs/nfs/blocklayout/blocklayout.c
>> @@ -39,6 +39,7 @@
>>  #include <linux/prefetch.h>
>>
>>  #include "blocklayout.h"
>> +#include "../internal.h"
>>
>>  #define NFSDBG_FACILITY      NFSDBG_PNFS_LD
>>
>> @@ -990,14 +991,63 @@ bl_clear_layoutdriver(struct nfs_server *server)
>>       return 0;
>>  }
>>
>> +/* While RFC doesn't limit maximum size of layout, we better limit it ourself. */
>
> Why is that?
> What do these arbitrary numbers represent?
> If these limits depend on some other system sizes they should reflect the dependency
> as part of their calculation.
What I wanted to add here is a limit at which pg_test() stops (like the objects
layout's max_io_size); 2MB is just an empirical value...

Thanks,
Tao
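
For reference, the two limits in the patch work out to 4 MiB for reads and
2 MiB for writes. The sketch below is illustrative only: it restates the macros
from the patch and converts them to page counts, assuming a 4 KiB page size
(the 4KB figure from the commit message). `SKETCH_PAGE_SIZE` and `pages_per_io`
are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Limits copied from the patch under discussion. */
#define PNFSBLK_MAXRSIZE (0x1 << 22) /* 4 MiB read limit */
#define PNFSBLK_MAXWSIZE (0x1 << 21) /* 2 MiB write limit */

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 4096u

/* Number of whole pages that fit under a byte limit. */
static size_t pages_per_io(size_t max_bytes)
{
	/* Both limits above are exact multiples of the page size. */
	assert(max_bytes % SKETCH_PAGE_SIZE == 0);
	return max_bytes / SKETCH_PAGE_SIZE;
}
```

Under that assumption a single read descriptor would be capped at 1024 pages
and a single write descriptor at 512 pages.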
>
> Benny
>
>> +#define PNFSBLK_MAXRSIZE (0x1<<22)
>> +#define PNFSBLK_MAXWSIZE (0x1<<21)

--
Thanks,
Tao

2011-11-29 17:48:42

by Jim Rees

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

Peng Tao wrote:

  Asking for layout in pg_init will always make client ask for only 4KB
  layout in every layoutget. This way, client drops the IO size information
  that is meaningful for MDS in handing out layout.

  In stead, if layout is not find in cache, do not send layoutget
  at once. Wait until before issuing IO in pnfs_do_multiple_reads/writes
  because that is where we know the real size of current IO. By telling the
  real IO size to MDS, MDS will have a better chance to give proper layout.

  Signed-off-by: Peng Tao <[email protected]>
  ---
  Resend to fix patch title. Sorry for the noise...

You may want to fix the patch date too, on this series and the other series
you sent earlier (later?). I know China is in a later timezone but I don't
think it's that much later.

2011-11-30 05:43:58

by Peng, Tao

Subject: RE: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

> -----Original Message-----
> From: Jim Rees [mailto:[email protected]]
> Sent: Wednesday, November 30, 2011 1:49 AM
> To: Peng Tao
> Cc: [email protected]; [email protected]; [email protected]; Peng, Tao
> Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init
>
> Peng Tao wrote:
>
>   Asking for layout in pg_init will always make client ask for only 4KB
>   layout in every layoutget. This way, client drops the IO size information
>   that is meaningful for MDS in handing out layout.
>
>   In stead, if layout is not find in cache, do not send layoutget
>   at once. Wait until before issuing IO in pnfs_do_multiple_reads/writes
>   because that is where we know the real size of current IO. By telling the
>   real IO size to MDS, MDS will have a better chance to give proper layout.
>
>   Signed-off-by: Peng Tao <[email protected]>
>   ---
>   Resend to fix patch title. Sorry for the noise...
>
> You may want to fix the patch date too, on this series and the other series
> you sent earlier (later?).  I know China is in a later timezone but I don't
> think it's that much later.
Sorry... look like the date of the machine I used to send patches is somehow
messed up. I will pay attention to it next time.

Thanks,
Tao

2011-12-01 18:00:51

by Boaz Harrosh

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 12/01/2011 09:33 AM, Boaz Harrosh wrote:
> On 11/30/2011 09:05 PM, [email protected] wrote:
>>> -----Original Message-----
>> Why return or forget the 1 lo_seg? What you really need to avoid seg
>> list exploding is to have LRU based caching and merge them when
>> necessary, instead of asking and dropping lseg again and again...
>>
>
> Ya Ya, I'm talking in abstract now giving the all picture. If in our
> above case the application has closed the file and will not ever open it
> again then I'm right, right? of course once you got it you can keep it
> around in a cache. Though I think that with heavy segmenting keeping segs
> passed ROC is extremely harder to manage. Be careful with that in current
> implementation.
>
>> Removing the IO_MAX limitation can be a second optimization. I was
>> hoping to remove it if current IO_MAX thing turns out hurting
>> performance. And one reason for IO_MAX is to avoid the likelihood
>> that server returns short layout, because current implementation is
>> limited to retry nfs_read/write_data as a whole instead of splitting
>> it up. I think that if we do it this way, the IO_MAX can be removed
>> later when necessary, by introducing a splitting mechanism either on
>> nfs_read/write_data or on desc. Now that you ask for it, I think
>> following approach is possible:
>>
>> 1. remove the limit on IO_MAX at .pg_test.
>> 2. ask for layout at .pg_doio for the size of current IO desc
>> 3. if server returns short layout, split nfs_read/write_data or desc and issue the pagelist covered by lseg.
>> 4. do 2 and 3 in a loop until all pages in current desc is handled.
>>
>
> So in effect you are doing what I suggest two passes
> one: what's the next hole,
> second: Collect pages slicing by returned lo_seg
>
> Don't you think it is more simple to do a 3 line preliminary
> loop in "one:"?
>
> and keep the current code that is now exactly built to do
> "second:"
>
> You are suggesting to effectively repeat current code using
> the first .pg_init...pg_doio for one: and hacking for blocks
> "second:"
>
> I want 3 lines of code for one: and keep second: exactly as is.
>
> <snip>
>
>> Looks like you are suggesting going through the dirty page list twice
>> before issuing IO, one just for getting the IO size information and
>> another one for page collapsing. The whole point of moving layoutget
>> to .pg_doio is to collect real IO size there because we don't know it
>> at .pg_init. And it is done without any changes to generic NFS IO
>> path. I'm not sure if it is appropriate to change generic IO routine
>> to collect the information before .pg_init (I'd be happy too if we
>> can do it there).
>
> That's what you are suggesting as well look in your step 4.:
> do 2 and 3 in a loop until all pages in current desc is handled
> sounds like another loop on all pages no?
>
> BTW: we are already doing two passes in the system we already looked
> through all the pages when building the io desc at .write/read_pages
>
> At first we can do above 3 lines loop in .pg_init. Second we can
> just collect that information in generic nfs at the first loop
> we are already doing
>
>>>
>>> a. We want it at .pg_init so we have a layout at .pg_test to inspect.
>>>
>>> Done properly will let you, in blocks, slice by extents at .pg_test
>>> and .write_pages can send the complete paglist to md (bio chaining)
>>>
>> Unlike objects and files, blocks don't slice by extents, not at .pg_test, nor at .read/write_pagelist.
>>
>
> What? What do you do? Send a scsi scatter list command?
> I don't think so. Somewhere you have to see an extent boundary of the data
> and send the continue of the next extent on disk as a new block request
> in a new bio with a new disk offset. No?
>
> I'm not saying to do this right away, but you could simplify the code a lot
> by slicing by extent inside .pg_test
>
> <>
>>>
>>> At first version:
>>> A good approximation which gives you an exact middle point
>>> between blocks B.2 and objects/files B.2, is dirty count.
>>> At later patch:
>>> Have generic NFS collect the above O1, N1, and Nn for you and base
>>> your decision on that.
>>>
>>
>> Well, unless you put both the two parts in... The first version is
>> ignoring the fact that blocks MDS cannot give out file stripping
>> information as easily as objects and files do. And I will stand
>> against it alone because all it does is to benefit objects while
>> hurting blocks (files don't care because they use whole file layout,
>> at least for now).
>
> Lets make a computerize lets find O1..N1 and put in the Generic

computerize => compromise

> code for everyone objects files-segmented and blocks. Because I
> need it two. And I'd like O1..Nn but for first step I'd love
> O1..N1 a lot.
>
> Please see my point. You created a system that only blocks can
> benefit from unless objects repeats the same exact but duplicated
> hacks as blocks.
>
> Please do the *very* simple thing I suggest which we can all enjoy.
>
>>
>> Otherwise, I would suggest having private hack for blocks because we have a real problem to solve.
>>
>
> It's just as real for objects, and files when they will do segments.
>
> My suggestion is to have two helpers at pnfs.c one for blocks and
> objects, one for files. Which can be called in .pg_init.
> The blocks/objects does a simple loop counting the first contiguous
> chunk and asks for a layout, like today. files one just sends
> all-file request.
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-12-01 09:57:55

by Benny Halevy

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 2011-12-01 07:05, [email protected] wrote:
> 1. remove the limit on IO_MAX at .pg_test.
> 2. ask for layout at .pg_doio for the size of current IO desc
> 3. if server returns short layout, split nfs_read/write_data or desc and issue the pagelist covered by lseg.
> 4. do 2 and 3 in a loop until all pages in current desc is handled.

Sounds reasonable to me.

Benny
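
The four steps quoted above can be sketched as a simple loop. This is only an
illustration under stated assumptions, not the pNFS client code: `layoutget_fn`
stands in for a LAYOUTGET round trip, `capped_server` is a toy server that never
grants more than a fixed number of bytes, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LAYOUTGET stand-in: returns how many bytes were granted. */
typedef uint64_t (*layoutget_fn)(uint64_t offset, uint64_t length, void *ctx);

/* Toy server: grants at most *ctx bytes per request (a "short layout"). */
static uint64_t capped_server(uint64_t offset, uint64_t length, void *ctx)
{
	uint64_t cap = *(const uint64_t *)ctx;

	(void)offset;
	return length < cap ? length : cap;
}

/*
 * Steps 2-4: ask for a layout covering what is left of the descriptor,
 * issue the part the returned segment covers, and repeat until done.
 * Returns the number of layout requests made, or -1 if the server
 * grants nothing (a real client would fall back to the MDS here).
 */
static int io_with_short_layouts(uint64_t offset, uint64_t length,
				 layoutget_fn get, void *ctx)
{
	int requests = 0;

	assert(get != NULL);
	while (length > 0) {
		uint64_t granted = get(offset, length, ctx);

		requests++;
		if (granted == 0)
			return -1;
		if (granted > length)
			granted = length;
		/* Real code would issue the pages covered by the lseg here. */
		offset += granted;
		length -= granted;
	}
	return requests;
}
```

With a server that caps each grant at 1 MiB, a descriptor of 3 MiB plus one byte
takes four layout requests in this sketch.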

2011-12-07 14:09:38

by Boaz Harrosh

Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 12/02/2011 06:59 AM, [email protected] wrote:
>>
> I'm sorry but I don't get it... How do you count the first contiguous
> chunk in .pg_init? All you know at .pg_init is an empty io desc and
> an nfs page. Are you going to scan the radix tree and find the next
> page that write_cache_pages is going to flush out? I don't really
> think anyone will agree on it... Please look into the second part of
> your solution and then you'll see why I am worried that it may never
> get done.
>
> Best Regards,
> Tao
>

Rrrr, one thing I cannot do is make someone understand.

I guess I'll have to do it myself. It's so simple I'm gonna cry.

I have more pressing things on my plate than fixing broken blocks servers,
but you are forcing my hand. The last code you sent is not acceptable
because it is totally unusable by objects and files (segments).

Give me a week. Not that it should take longer than an hour, but
I have burning matters to attend to first.

Boaz



2011-12-01 05:05:47

by Peng, Tao

Subject: RE: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1uZnMtb3duZXJAdmdl
ci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZzLW93bmVyQHZnZXIua2VybmVsLm9yZ10gT24g
QmVoYWxmIE9mIEJvYXoNCj4gSGFycm9zaA0KPiBTZW50OiBUaHVyc2RheSwgRGVjZW1iZXIgMDEs
IDIwMTEgOToxOCBBTQ0KPiBUbzogUGVuZyBUYW8NCj4gQ2M6IEJlbm55IEhhbGV2eTsgVHJvbmQu
TXlrbGVidXN0QG5ldGFwcC5jb207IGxpbnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmc7IFBlbmcsIFRh
bw0KPiBTdWJqZWN0OiBSZTogW1BBVENILVJFU0VORCA0LzRdIHBuZnNibG9jazogZG8gbm90IGFz
ayBmb3IgbGF5b3V0IGluIHBnX2luaXQNCj4gDQo+IE9uIDExLzMwLzIwMTEgMDU6MTcgQU0sIFBl
bmcgVGFvIHdyb3RlOg0KPiA+Pj4NCj4gPj4+ICsvKiBXaGlsZSBSRkMgZG9lc24ndCBsaW1pdCBt
YXhpbXVtIHNpemUgb2YgbGF5b3V0LCB3ZSBiZXR0ZXIgbGltaXQgaXQgb3Vyc2VsZi4gKi8NCj4g
Pj4NCj4gPj4gV2h5IGlzIHRoYXQ/DQo+ID4+IFdoYXQgZG8gdGhlc2UgYXJiaXRyYXJ5IG51bWJl
cnMgcmVwcmVzZW50Pw0KPiA+PiBJZiB0aGVzZSBsaW1pdHMgZGVwZW5kIG9uIHNvbWUgb3RoZXIg
c3lzdGVtIHNpemVzIHRoZXkgc2hvdWxkIHJlZmxlY3QgdGhlIGRlcGVuZGVuY3kNCj4gPj4gYXMg
cGFydCBvZiB0aGVpciBjYWxjdWxhdGlvbi4NCj4gPiBXaGF0IEkgd2FudGVkIHRvIGFkZCBoZXJl
IGlzIGEgbGltaXQgdG8gc3RvcCBwZ190ZXN0KCkgKGxpa2Ugb2JqZWN0J3MNCj4gPiBtYXhfaW9f
c2l6ZSkgYW5kIDJNQiBpcyBqdXN0IGFuIGV4cGVyaWVuY2UgdmFsdWUuLi4NCj4gPg0KPiA+IFRo
YW5rcywNCj4gPiBUYW8NCj4gPj4NCj4gPj4gQmVubnkNCj4gPj4NCj4gPj4+ICsjZGVmaW5lIFBO
RlNCTEtfTUFYUlNJWkUgKDB4MTw8MjIpDQo+ID4+PiArI2RlZmluZSBQTkZTQkxLX01BWFdTSVpF
ICgweDE8PDIxKQ0KPiANCj4gWW91IHNlZSB0aGlzIGlzIHRoZSBiYXNpYyBwcmluY2lwYWwgZmxh
dyBvZiB5b3VyIHNjaGVtZS4gSXQgaXMgZXF1YXRpbmcgSU8gc2l6ZXMNCj4gd2l0aCBsc2VnIHNp
emVzLg0KPiANCj4gTGV0cyBiYWNrIHVwIGZvciBhIHNlY29uZA0KPiANCj4gQS4gRmlyc3QgdGhp
bmcgdG8gdW5kZXJzdGFuZCBpcyB0aGF0IGFueSBzZWdtZW50aW5nIHNlcnZlciBiZSBpdCBibG9j
a3Mgb2JqZWN0cw0KPiAgICBvciBmaWxlcywgd2lsbCB3YW50IHRoZSBjbGllbnQgdG8gcmVwb3J0
IHRvIHRoZSBiZXN0IG9mIGl0J3Mga25vd2xlZGdlDQo+ICAgIHRoZSBpbnRlbnRpb24gb2YgdGhl
IHdyaXRpbmcgYXBwbGljYXRpb24uIFRoZXJlZm9yIGEgc29sdXRpb24gc2hvdWxkIGJlDQo+ICAg
IGdvb2QgZm9yIGFsbCBUaHJlZS4gV2hhdCBldmVyIHlvdSBhcmUgdHJ5aW5nIHRvIGRvIHNob3Vs
ZCBub3QgYmUgcHJpdmF0ZSB0bw0KPiAgICBibG9ja3MgYW5kIG11c3Qgbm90IGNvbmZsaWN0IHdp
dGggb3RoZXIgTE8gbmVlZHMuDQo+IA0KPiAgICBOb3RlOiB0aGF0IHRoZSBORlMtd3JpdGUtb3V0
IHN0YWNrIHNpbmNlIGl0IGhvbGRzIGJhY2sgb24gd3JpdGluZyB1bnRpbA0KPiAgICBzeW5jIHRp
bWUgb3IgbWVtb3J5IHByZXNzdXJlIHRoYXQgaW4gbW9zdCBjYXNlcyBhdCB0aGUgcG9pbnQgb2Yg
SU8gaGFzIGF0DQo+ICAgIGl0J3MgZGlzcG9zYWwgdGhlIGNvbXBsZXRlIGFwcGxpY2F0aW9uIElP
IGluIGl0J3MgcGFnZSBjb2xsZWN0aW9uIHBlciBmaWxlLg0KPiAgICAoRXhjZXB0aW9uIGlzIHZl
cnkgbGFyZ2Ugd3JpdGVzIHdoaWNoIGlzIGZpbmUgdG8gc3BsaXQsIGdpdmVuIHJlc291cmNlcyBj
b25kaXRpb24NCj4gICAgIG9uIHRoZSBjbGllbnQpDQo+IA0KPiAgICBTbyBiZWxvdyB3aGVuIEkg
c2F5IGFwcGxpY2F0aW9uIHdlIGNhbiBsYXRlciBtZWFuIHRoZSBjb21wbGV0ZSBwYWdlIGxpc3QN
Cj4gICAgYXZhaWxhYmxlIHBlciBpbm9kZSBhdCB0aGUgdGltZSBvZiB3cml0ZS1vdXQuDQo+IA0K
PiBCLiBUaGUgKm9wdGltdW0qIGZvciBhbnkgc2VnbWVudGVkIHNlcnZlciBpczoNCj4gICAgKGFu
ZCBhZGRyZXNzaW5nIFRyb25kJ3MgY29uY2VybiBvZiBzZWcgbGlzdCBleHBsb2RpbmcgYW5kIG5l
dmVyIGZyZWVpbmcgdXApDQo+IA0KPiBCLjEuIElmIGFuIGFwcGxpY2F0aW9uIHdpbGwgd3JpdGUg
Ty4uTiBvZiB0aGUgZmlsZQ0KPiAxLiBHZXQgb25lIGxvX3NlZyBvZiAwLi5ODQo+IDIuIElPIGF0
IG1heF9pbyBmcm9tIE8gdG8gTiB1bnRpbCBkb25lLg0KPiAzLiBSZXR1cm4gb3IgZm9yZ2V0IHRo
ZSBsb19zZWcNCj4gDQo+IEIuMi4gSW4gdGhlIGNhc2Ugb2YgcmFuZG9tIElPIE8xLi5OMSwgTzIu
Lk4yLC4uLiwgT24uLk5uDQo+IA0KPiBGb3Igb2JqZWN0cyBhbmQgZmlsZXMgKHNlZ21lbnRlZCkg
dGhlIG9wdGltdW0gaXMgc3RpbGw6DQo+IDEuIEdldCBvbmUgbG9fc2VnIG9mIDAxLi5Obg0KPiAy
. IO at max_io for each Ox..Nx until done.
>    (objects: max_io is a factor of BIO sizes group boundary and alignments.
>     files: max_io is stripe_unit)
> 3. Return or forget the 1 lo_seg
Why return or forget the 1 lo_seg?
What you really need to avoid seg list exploding is to have LRU based caching and merge them when necessary, instead of asking and dropping lseg again and again...

>
> For blocks the optimum is
> 1. Get n lo_segs of O1..N1, O2..N2,..., On..Nn
> 2. IO at max_io for each Ox..Nx until done.
> 3. Return or forget any Ox..Nx who's IO is done
>
> You can see that stage 2. for any kind of LO and in either B.1 or B.2 cases
> is the same.
> And this is, as the author intended, the .bg_init -> pg_test -> pg_IO.
>
> For blocks with in .write_paglist there is an internal loop that re-slices the
> requested linear pagelist to extents, possibly slicing each extent at bio_size
> boundaries. At files and objects this slicing (though I admit very different)
> actually happen at .pg_test, so at .write_paglist the request is sent in full.
>
> C. So back to our problem:
>
> C.1 NACK on your patchset. You are shouting to the roof how the client must
>     report to the Server (as hint) to the best of it's knowledge what the
>     application is going to do. And then you sneakily introduce an IO_MAX limitation.
>
>     This you MUST fix. Ether you send good server hint for the anticipated
>     application IO or not at all.
>
Removing the IO_MAX limitation can be a second optimization. I was hoping to remove it if current IO_MAX thing turns out hurting performance. And one reason for IO_MAX is to avoid the likelihood that server returns short layout, because current implementation is limited to retry nfs_read/write_data as a whole instead of splitting it up. I think that if we do it this way, the IO_MAX can be removed later when necessary, by introducing a splitting mechanism either on nfs_read/write_data or on desc. Now that you ask for it, I think following approach is possible:
1. remove the limit on IO_MAX at .pg_test.
2. ask for layout at .pg_doio for the size of current IO desc
3. if server returns short layout, split nfs_read/write_data or desc and issue the pagelist covered by lseg.
4. do 2 and 3 in a loop until all pages in current desc is handled.

>     (The Server can always introduce it's own slicing and limits)
>
>     You did all this because you have circumvented the chance to do so at .pg_test
>     because you want the .bg_init -> pg_test -> pg_IO. loop to be your
>     O1..N1, O2..N2,...,On..Nn parser.
>
> C.2 You must work out a system which will satisfy not only blocks (MPFS) server
>     But any segmenting server out there. blocks objects or files (segmented)
>     By reporting the best information you have and letting the Server do it's
>     decisions.
>
>     Now by postponing the report to after .pg_test -> .pg_IO you break the way
>     objects and files IO slicing works, and leaves them in the dark. I'm not sure
>     you really mean that each LO needs to do it's own private hacks?
>
I am not aware of that... The only requirement for blocks is that pages must be continuous.

>
> C.3 Say we go back to the drawing board and want to do the stage 1 above of
>     sending the exact information to server, be it B.1 or B.2.
>
>     a. We want it at .pg_init so we have a layout at .pg_test to inspect.
>
>        Done properly will let you, in blocks, slice by extents at .pg_test
>        and .write_pages can send the complete paglist to md (bio chaining)
>
Unlike objects and files, blocks don't slice by extents, not at .pg_test, nor at .read/write_pagelist.

>     b. Say theoretically that we are willing to spend CPU and memory to collect
>        that information, like for example also pre-loop the page-list and/or
>        call the LO for the final decision.
>
>     So my all point is that b. above should eventually happen but efficiently by
>     pre-collecting some counters. (Remember that we already saw all these pages
>     in generic nfs at the vfs .write_pages vector)
>
>     Then since .pg_init is already called into LO, just change the API so the
>     LO have all the needed information available be it B.1 or B.2 and in return
>     will pass on to pnfs.c the actual lo_seg size optimal. In B.1 they all
>     send the same thing. In B.2 they differ.
>
>     We can start by doing all the API changes so .pg_init can specify and
>     return the suggested lo_size. And perhaps we add to the nfs_pageio_descriptor,
>     passed to .pg_init, a couple of members describing above
>     O1 - the index of the first page
>     N1 - The length up to the firs hole
>     Nn - Highest written page
Looks like you are suggesting going through the dirty page list twice before issuing IO, one just for getting the IO size information and another one for page collapsing. The whole point of moving layoutget to .pg_doio is to collect real IO size there because we don't know it at .pg_init. And it is done without any changes to generic NFS IO path. I'm not sure if it is appropriate to change generic IO routine to collect the information before .pg_init (I'd be happy too if we can do it there).

Trond, could you please jump in?

>
>
>     At first version:
>       A good approximation which gives you an exact middle point
>       between blocks B.2 and objects/files B.2, is dirty count.
>     At later patch:
>       Have generic NFS collect the above O1, N1, and Nn for you and base
>       your decision on that.
>
Well, unless you put both the two parts in... The first version is ignoring the fact that blocks MDS cannot give out file stripping information as easily as objects and files do. And I will stand against it alone because all it does is to benefit objects while hurting blocks (files don't care because they use whole file layout, at least for now).

Otherwise, I would suggest having private hack for blocks because we have a real problem to solve.

Regards,
Tao
>
> And stop the private blocks hacks and the IO_MAX capping on the lo_seg
> size.
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

2011-12-08 03:32:41

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Boaz
> Harrosh
> Sent: Wednesday, December 07, 2011 10:09 PM
> To: Peng, Tao
> Cc: [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init
>
> On 12/02/2011 06:59 AM, [email protected] wrote:
> >>
> > I'm sorry but I don't get it... How do you count the first contiguous
> > chunk in .pg_init? All you know at .pg_init is an empty io desc and
> > an nfs page. Are you going to scan the radix tree and find the next
> > page that write_cache_pages is going to flush out? I don't really
> > think anyone will agree on it... Please look into the second part of
> > your solution and then you'll see why I am worried that it may never
> > get done.
> >
> > Best Regards,
> > Tao
> >
>
> Rrrr, one thing I cannot do is make some one understand.
>
> I guess I'll have to do it my self. It's so simple I'm go'ne cry.
>
> I have more pressed things on my plate then fix broken blocks servers
> but you are forcing my hands. The last code you sent is not acceptable
> because it is totally not usable by objects and files (segments).
>
> Give me a week. Not that it should take longer then an hour but
> because I have burning matters to attend to first.
>
OK. Let's wait and see your patches. I'd be really interested in how you count pages in pg_init where the page list doesn't even exist. And please don't disappoint me with some wild guess via npages.

Tao

> Boaz
>
> >> Boaz

2011-12-01 17:33:21

by Boaz Harrosh

[permalink] [raw]
Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

On 11/30/2011 09:05 PM, [email protected] wrote:
>> -----Original Message-----
> Why return or forget the 1 lo_seg? What you really need to avoid seg
> list exploding is to have LRU based caching and merge them when
> necessary, instead of asking and dropping lseg again and again...
>

Ya Ya, I'm talking in the abstract now, giving the whole picture. If in our
above case the application has closed the file and will never open it
again, then I'm right, right? Of course, once you've got it you can keep it
around in a cache. Though I think that with heavy segmenting, keeping segs
past ROC (return-on-close) is much harder to manage. Be careful with that
in the current implementation.

> Removing the IO_MAX limitation can be a second optimization. I was
> hoping to remove it if current IO_MAX thing turns out hurting
> performance. And one reason for IO_MAX is to avoid the likelihood
> that server returns short layout, because current implementation is
> limited to retry nfs_read/write_data as a whole instead of splitting
> it up. I think that if we do it this way, the IO_MAX can be removed
> later when necessary, by introducing a splitting mechanism either on
> nfs_read/write_data or on desc. Now that you ask for it, I think
> following approach is possible:
>
> 1. remove the limit on IO_MAX at .pg_test.
> 2. ask for layout at .pg_doio for the size of current IO desc
> 3. if server returns short layout, split nfs_read/write_data or desc and issue the pagelist covered by lseg.
> 4. do 2 and 3 in a loop until all pages in current desc is handled.
>
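
To make the four quoted steps concrete, here is a minimal user-space sketch in C. It is not code from the patch: `fake_layoutget`, `issue_desc` and the `server_max` parameter are hypothetical names, and the "server" is just a function that grants at most `server_max` bytes per request. The only point it illustrates is that the loop in step 4 runs once when the server grants the full range and runs again only when a grant comes back short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for LAYOUTGET: the server grants at most
 * `server_max` bytes starting at `offset`, possibly fewer than asked. */
static size_t fake_layoutget(size_t offset, size_t want, size_t server_max)
{
    (void)offset; /* a real server would look at the offset; this one does not */
    return want < server_max ? want : server_max;
}

/* Steps 2-4: ask for the whole remaining range of the IO desc, issue the
 * part the returned segment covers, and repeat until nothing is left.
 * Returns the number of layoutget round trips that were needed. */
static size_t issue_desc(size_t offset, size_t len, size_t server_max)
{
    size_t rounds = 0;

    assert(server_max > 0); /* a zero-length grant would never make progress */
    while (len > 0) {
        size_t got = fake_layoutget(offset, len, server_max);

        /* real code would issue the pages in [offset, offset + got) here */
        offset += got;
        len -= got;
        rounds++;
    }
    return rounds;
}
```

With a 10-page desc and a server that grants 4 pages at a time, `issue_desc(0, 10 * 4096, 4 * 4096)` takes three round trips; when the grant covers the whole desc it takes one.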

So in effect you are doing what I suggest: two passes.
one: find the next hole,
second: collect pages, slicing by the returned lo_seg

Don't you think it is simpler to do a 3-line preliminary
loop for "one:"

and keep the current code, which is already built to do exactly
"second:"?

You are suggesting to effectively repeat the current code, using
the first .pg_init...pg_doio for "one:" and hacking blocks for
"second:".

I want 3 lines of code for one: and keep second: exactly as is.
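
A rough idea of what such a short "one:" loop could look like, as a user-space C sketch. It assumes the dirty page indices are already available as a sorted array, which is an assumption of this sketch and not something the thread establishes; `first_contiguous_run` is a hypothetical name, not an existing pnfs.c helper.

```c
#include <assert.h>
#include <stddef.h>

/* Given page indices sorted in ascending order, return how many pages form
 * the first contiguous run starting at idx[0]. In the thread's notation,
 * O1 is idx[0] and the return value is the length N1 up to the first hole. */
static size_t first_contiguous_run(const unsigned long *idx, size_t n)
{
    size_t i;

    assert(idx != NULL || n == 0);
    if (n == 0)
        return 0;
    for (i = 1; i < n; i++)
        if (idx[i] != idx[i - 1] + 1)
            break; /* found the first hole */
    return i;
}
```

For indices 7, 8, 9, 12, 13 the run length is 3, so a layout request would cover pages 7 through 9.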

<snip>

> Looks like you are suggesting going through the dirty page list twice
> before issuing IO, one just for getting the IO size information and
> another one for page collapsing. The whole point of moving layoutget
> to .pg_doio is to collect real IO size there because we don't know it
> at .pg_init. And it is done without any changes to generic NFS IO
> path. I'm not sure if it is appropriate to change generic IO routine
> to collect the information before .pg_init (I'd be happy too if we
> can do it there).

That's what you are suggesting as well. Look at your step 4:
    do 2 and 3 in a loop until all pages in current desc is handled
Sounds like another loop over all pages, no?

BTW: we are already doing two passes in the system. We already looked
through all the pages when building the io desc at .write/read_pages.

At first we can do the above 3-line loop in .pg_init. Later we can
just collect that information in generic nfs, in the first loop
we are already doing.

>>
>> a. We want it at .pg_init so we have a layout at .pg_test to inspect.
>>
>> Done properly will let you, in blocks, slice by extents at .pg_test
>> and .write_pages can send the complete paglist to md (bio chaining)
>>
> Unlike objects and files, blocks don't slice by extents, not at .pg_test, nor at .read/write_pagelist.
>

What? What do you do? Send a SCSI scatter-list command?
I don't think so. Somewhere you have to see an extent boundary in the data
and send the continuation, in the next extent on disk, as a new block request
in a new bio with a new disk offset. No?

I'm not saying to do this right away, but you could simplify the code a lot
by slicing by extent inside .pg_test
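
As an illustration of what "slicing by extent" means here, the following user-space C sketch splits one linear file range into per-extent pieces, each with its own disk offset. The `struct extent` and `struct piece` types and the `slice_by_extent` function are hypothetical and much simpler than the block layout driver's real extent structures; the extents are assumed to be sorted, non-overlapping and to cover the requested range.

```c
#include <assert.h>
#include <stddef.h>

struct extent { unsigned long f_off, len, d_off; }; /* file offset, length, disk offset */
struct piece  { unsigned long d_off, len; };        /* one block request */

/* Split the file range [off, off + len) along extent boundaries.
 * Writes at most `max` pieces to `out` and returns how many were written. */
static size_t slice_by_extent(const struct extent *ext, size_t n_ext,
                              unsigned long off, unsigned long len,
                              struct piece *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < n_ext && len > 0 && n < max; i++) {
        unsigned long end = ext[i].f_off + ext[i].len;
        unsigned long take;

        if (off < ext[i].f_off || off >= end)
            continue; /* the remaining range does not start in this extent */
        take = end - off < len ? end - off : len;
        out[n].d_off = ext[i].d_off + (off - ext[i].f_off);
        out[n].len = take;
        n++;
        off += take; /* continue in the next extent, at a new disk offset */
        len -= take;
    }
    return n;
}
```

An 8 KB request that starts 4 KB into an 8 KB extent and runs into the following extent comes out as two pieces, which corresponds to the "new bio with a new disk offset" described above.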

<>
>>
>> At first version:
>> A good approximation which gives you an exact middle point
>> between blocks B.2 and objects/files B.2, is dirty count.
>> At later patch:
>> Have generic NFS collect the above O1, N1, and Nn for you and base
>> your decision on that.
>>
>
> Well, unless you put both the two parts in... The first version is
> ignoring the fact that blocks MDS cannot give out file stripping
> information as easily as objects and files do. And I will stand
> against it alone because all it does is to benefit objects while
> hurting blocks (files don't care because they use whole file layout,
> at least for now).

Let's make a compromise: let's find O1..N1 and put it in the generic
code for everyone: objects, files-segmented and blocks. Because I
need it too. And I'd like O1..Nn, but as a first step I'd love
O1..N1 a lot.

Please see my point. You created a system that only blocks can
benefit from, unless objects repeats the exact same hacks as blocks,
duplicated.

Please do the *very* simple thing I suggest which we can all enjoy.

>
> Otherwise, I would suggest having private hack for blocks because we have a real problem to solve.
>

It's just as real for objects, and for files once they do segments.

My suggestion is to have two helpers in pnfs.c, one for blocks and
objects and one for files, which can be called in .pg_init.
The blocks/objects one does a simple loop counting the first contiguous
chunk and asks for a layout, like today. The files one just sends
an all-file request.

Boaz

2011-12-02 05:00:38

by Peng, Tao

[permalink] [raw]
Subject: RE: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Boaz
> Harrosh
> Sent: Friday, December 02, 2011 1:33 AM
> To: Peng, Tao
> Cc: [email protected]; [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH-RESEND 4/4] pnfsblock: do not ask for layout in pg_init
>
> On 11/30/2011 09:05 PM, [email protected] wrote:
> >> -----Original Message-----
> > Why return or forget the 1 lo_seg? What you really need to avoid seg
> > list exploding is to have LRU based caching and merge them when
> > necessary, instead of asking and dropping lseg again and again...
> >
>
> Ya Ya, I'm talking in abstract now giving the all picture. If in our
> above case the application has closed the file and will not ever open it
> again then I'm right, right? of course once you got it you can keep it
> around in a cache. Though I think that with heavy segmenting keeping segs
> passed ROC is extremely harder to manage. Be careful with that in current
> implementation.
>
In above, your description is not only about keeping segments passing ROC, if the file is still open, you will be asking for the same segment again and again... no?

> > Removing the IO_MAX limitation can be a second optimization. I was
> > hoping to remove it if current IO_MAX thing turns out hurting
> > performance. And one reason for IO_MAX is to avoid the likelihood
> > that server returns short layout, because current implementation is
> > limited to retry nfs_read/write_data as a whole instead of splitting
> > it up. I think that if we do it this way, the IO_MAX can be removed
> > later when necessary, by introducing a splitting mechanism either on
> > nfs_read/write_data or on desc. Now that you ask for it, I think
> > following approach is possible:
> >
> > 1. remove the limit on IO_MAX at .pg_test.
> > 2. ask for layout at .pg_doio for the size of current IO desc
> > 3. if server returns short layout, split nfs_read/write_data or desc and issue the pagelist covered by lseg.
> > 4. do 2 and 3 in a loop until all pages in current desc is handled.
> >
>
> So in effect you are doing what I suggest two passes
> one: what's the next hole,
> second: Collect pages slicing by returned lo_seg
>
> Don't you think it is more simple to do a 3 line preliminary
> loop in "one:"?
>
> and keep the current code that is now exactly built to do
> "second:"
>
> You are suggesting to effectively repeat current code using
> the first .pg_init...pg_doio for one: and hacking for blocks
> "second:"
>
> I want 3 lines of code for one: and keep second: exactly as is.
>
> <snip>
>
> > Looks like you are suggesting going through the dirty page list twice
> > before issuing IO, one just for getting the IO size information and
> > another one for page collapsing. The whole point of moving layoutget
> > to .pg_doio is to collect real IO size there because we don't know it
> > at .pg_init. And it is done without any changes to generic NFS IO
> > path. I'm not sure if it is appropriate to change generic IO routine
> > to collect the information before .pg_init (I'd be happy too if we
> > can do it there).
>
> That's what you are suggesting as well look in your step 4.:
> 	do 2 and 3 in a loop until all pages in current desc is handled
> sounds like another loop on all pages no?
>
No. The loop only happens when server returns less layout than client asks for. While in your solution, you need to loop twice in all cases.

> BTW: we are already doing two passes in the system we already looked
> through all the pages when building the io desc at .write/read_pages
>
> At first we can do above 3 lines loop in .pg_init. Second we can
> just collect that information in generic nfs at the first loop
> we are already doing
>
Please understand that I totally agree with you as long as the two parts can get in together. It's just that your second half of work worries me. If we really do it in two phases, I am afraid that the second half never gets done because it goes too deep into generic code. And if we only have the first version of your solution, it is going to hurt blocks.

> >>
> >>     a. We want it at .pg_init so we have a layout at .pg_test to inspect.
> >>
> >>        Done properly will let you, in blocks, slice by extents at .pg_test
> >>        and .write_pages can send the complete paglist to md (bio chaining)
> >>
> > Unlike objects and files, blocks don't slice by extents, not at .pg_test, nor at .read/write_pagelist.
> >
>
> What? What do you do? Send a scsi scatter list command?
> I don't think so. Somewhere you have to see an extent boundary of the data
> and send the continue of the next extent on disk as a new block request
> in a new bio with a new disk offset. No?
>
> I'm not saying to do this right away, but you could simplify the code a lot
> by slicing by extent inside .pg_test
>
OK, I see what you mean by "slice by extents". And I admit that blocks can be changed as you suggested. But I do not see your motivation for asking for the change. Even we do layoutget at .pg_init (which is what we have today), current blocks works well under it. And I do not see any restriction at generic layer saying there is something wrong with current implementation. So what is the benefit for blocks to change as you suggested? IMHO current implementation is better because blocks only need to visit extent cache in read/write_pagelist once and for all. But in your suggestion, we have to do it for every .pg_test, and yet again in read/write_pagelist.

> <>
> >>
> >>     At first version:
> >>       A good approximation which gives you an exact middle point
> >>       between blocks B.2 and objects/files B.2, is dirty count.
> >>     At later patch:
> >>       Have generic NFS collect the above O1, N1, and Nn for you and base
> >>       your decision on that.
> >>
> >
> > Well, unless you put both the two parts in... The first version is
> > ignoring the fact that blocks MDS cannot give out file stripping
> > information as easily as objects and files do. And I will stand
> > against it alone because all it does is to benefit objects while
> > hurting blocks (files don't care because they use whole file layout,
> > at least for now).
>
> Lets make a computerize lets find O1..N1 and put in the Generic
> code for everyone objects files-segmented and blocks. Because I
> need it two. And I'd like O1..Nn but for first step I'd love
> O1..N1 a lot.
>
> Please see my point. You created a system that only blocks can
> benefit from unless objects repeats the same exact but duplicated
> hacks as blocks.
>
> Please do the *very* simple thing I suggest which we can all enjoy.
>
I do want to make it generic so all layouts can benefit. But when I can't, I have to do private hacks...

> >
> > Otherwise, I would suggest having private hack for blocks because we have a real problem to solve.
> >
>
> It's just as real for objects, and files when they will do segments.
>
> My suggestion is to have two helpers at pnfs.c one for blocks and
> objects, one for files. Which can be called in .pg_init.
> The blocks/objects does a simple loop counting the first contiguous
> chunk and asks for a layout, like today. files one just sends
> all-file request.
>
I'm sorry but I don't get it... How do you count the first contiguous chunk in .pg_init? All you know at .pg_init is an empty io desc and an nfs page. Are you going to scan the radix tree and find the next page that write_cache_pages is going to flush out? I don't really think anyone will agree on it... Please look into the second part of your solution and then you'll see why I am worried that it may never get done.

Best Regards,
Tao

> Boaz