LinuxLists.cc - [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
> On 2011-09-12 07:56, Peng Tao wrote:
> >> The layout segments are not really in use while in LAYOUTCOMMIT.
> >> We only need to get the stateid right with respect to concurrent layout recalls.
> > LAYOUTCOMMIT takes lseg reference to mark them as in use so that
> > layoutrecall cannot free them.
> >
>
> And if layoutrecall would have freed layout segments during layoutcommit,
> what is your specific concern?

That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
according to section 18.42.3 of RFC5661. I can't find anything in the
errata that changes that requirement.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-09-09 18:20:31

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
> HI, Trond,
>
> > -----Original Message-----
> > From: Myklebust, Trond [mailto:[email protected]]
> > Sent: Friday, September 09, 2011 1:05 AM
> > To: Peng Tao
> > Cc: Peng, Tao; [email protected]; [email protected];
> > [email protected]
> > Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
> >
> > > -----Original Message-----
> > > From: Peng Tao [mailto:[email protected]]
> > > Sent: Thursday, September 08, 2011 11:00 AM
> > > To: Myklebust, Trond
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH] nfs: fix inifinite loop at
> > > nfs4_layoutcommit_release
> > >
> > > On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
> > > <[email protected]> wrote:
> > > >> -----Original Message-----
> > >
> > > >
> > > > Yes, but as far as I can see, even in the blocks case there can be
> > > multiple extents per layout segment. What if I write to one
> > > uninitialised extent, layoutcommit, then write to another uninitialized
> > > extent in the same layout segment and layoutcommit? In my reading of
> > > the code, there is a chance that the second layoutcommit will fail to
> > > pick up the layout segment, and so will fail to notify the MDS that the
> > > second extent now contains data.
> > >
> > > blocklayout does not decide what to layoutcommit according to the lseg
> > > list. Instead, it keeps track of each extent's state at the
> > > granularity of blocksize, and encode whatever needs layoutcommitted in
> > > the layoutcommit call. So in your above case, as long as the second
> > > layoutcommit is issued, blocklayout will encode the newly written
> > > extent in the second layoutcommit call, even if the lseg is not
> > > attached to the second layoutcommit.
> > >
> > > But that leads to another two question:
> > > 1. How do we protect against layoutrecall if lseg is not linked to
> > > layoutcommit? For this one, can we just reject layoutrecall if there
> > > is inflight layoutcommit? It will be less parallel but can guarantee
> > > current implementation correctness.
> > > 2. blocklayout ONLY: bl_committing may be overloaded by several
> > > layoutcommit calls and we don't have information in
> > > cleanup_layoutcommit() on how many entry should be removed from
> > > bl_committing. Maybe we can add a (void*) to struct
> > > nfs4_layoutcommit_data, so that LD can pass some private information
> > > between encode_layoutcommit() and cleanup_layoutcommit()?
> >
> > 3. What is the purpose of pinning the layout segment at all if neither blocks, nor
> > objects nor files cares?
> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected.
>
> > IOW: if both objects and blocks track the information that they need for
> > layoutcommit outside the layout segments, then why do we need the
> > NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all?
> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining().

The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the
wire at a time. Otherwise, you are looking at a many-to-many mapping
between layoutcommits and lsegs.

We should not expect to need multiple layoutcommits on the wire if pNFS
is working smoothly (i.e. no layout recalls), so optimising for that
case is wrong.
I'd also think that we want to order layoutcommit and ordinary commits
(for those NFS files servers that require both) so that we don't end up
writing a file size change to disk before the actual data is committed.

So why not just protect the layoutcommit call using the existing
nfs_commit_set_lock/nfs_commit_clear_lock?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-09-08 15:10:06

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

Hi, Gusev,
On Thu, Sep 8, 2011 at 9:02 PM, Vitaliy Gusev <[email protected]> wrote:
> On 09/08/2011 02:00 PM, [email protected] wrote:
>>
>> Hi, Gusev,
>
> Hello Tao!
>
>> Yes, you are right. How about following patch?
>>
>> From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
>> From: Peng Tao<[email protected]>
>> Date: Thu, 8 Sep 2011 00:57:02 -0400
>> Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free
>>
>> Since there can be more than one layoutcommit proc happen the same time,
>> lseg list create/free should be protected. Otherwise lseg list
>> may get corrupted.
>>
>> Reported-by: Vitaliy Gusev<[email protected]>
>> Signed-off-by: Peng Tao<[email protected]>
>> ---
>> fs/nfs/nfs4proc.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>> index 8c77039..da7c20c 100644
>> --- a/fs/nfs/nfs4proc.c
>> +++ b/fs/nfs/nfs4proc.c
>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void
>> *calldata)
>> struct pnfs_layout_segment *lseg, *tmp;
>>
>> pnfs_cleanup_layoutcommit(data);
>> + spin_lock(&data->args.inode->i_lock);
>
> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because
I put the spinlock outside of the loop because the critical section is
short enough and it should be faster than grabbing/dropping the inode
lock for every entry in the list, agree?

Thanks,
Tao

>
> 1. here lseg is deleted from unique (placed in stack) data and there is no
> any race during deletion.
>
>
> 2. Only one thing must be protected - list_empty status at
> pnfs_list_write_lseg (after my patch).
>
> The problem is that list_del_init is placed before test_and_clear_bit and
> spinlock can be some kind of barrier for protection against adding lseg to
> new data->lseg_list at
> pnfs_list_write_lseg.
>
> Do reordering list_del_init with test_and_clear_bit is not good, because
> lseg can be invalid after put_lseg.
>
>
>> /* Matched by references in pnfs_set_layoutcommit */
>> list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) {
>> list_del_init(&lseg->pls_lc_list);
>> @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void
>> *calldata)
>> &lseg->pls_flags))
>> put_lseg(lseg);
>> }
>> + spin_unlock(&data->args.inode->i_lock);
>> put_rpccred(data->cred);
>> kfree(data);
>> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Thanks,
-Bergwolf

2011-09-14 06:47:03

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-13 10:09, [email protected] wrote:
> Hi, Trond,
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> On Behalf Of Trond Myklebust
>> Sent: Tuesday, September 13, 2011 5:11 AM
>> To: Benny Halevy
>> Cc: Peng Tao; Peng, Tao; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>
>> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
>>> On 2011-09-12 07:56, Peng Tao wrote:
>>>>> The layout segments are not really in use while in LAYOUTCOMMIT.
>>>>> We only need to get the stateid right with respect to concurrent layout recalls.
>>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
>>>> layoutrecall cannot free them.
>>>>
>>>
>>> And if layoutrecall would have freed layout segments during layoutcommit,
>>> what is your specific concern?
>>
>> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
>> according to section 18.42.3 of RFC5661. I can't find anything in the
>> errata that changes that requirement.
> Do you mean that if client sends layoutcommit at the same time as server sends layoutrecall for overlapped range, then
> server must reject layoutcommit with NFS4ERR_BADLAYOUT when receiving layoutcommit?
>
> RFC5661 section 18.42.3 says:
> If the layout identified in the arguments does not exist, the error
> NFS4ERR_BADLAYOUT is returned. The layout being committed may also
> be rejected if it does not correspond to an existing layout with an
> iomode of LAYOUTIOMODE4_RW.
>
> IMHO, in the above case, server cannot decide if the layout exists until client returns response to layoutrecall.

Right, unless the server revokes the layout, but then there are also other consequences
like SEQ4_STATUS_RECALLABLE_STATE_REVOKED.

Benny

> But I can't find
> any requirement in RFC5661 that server must wait for layoutrecall's response before processing layoutcommit.
>
> Thanks,
> Tao

2011-09-08 10:21:49

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

SGksIFRyb25kLA0KDQpTb3JyeSBmb3IgdGhlIGxhdGUgcmVzcG9uc2UuDQoNCj4gLS0tLS1Pcmln
aW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogVHJvbmQgTXlrbGVidXN0IFttYWlsdG86VHJvbmQu
TXlrbGVidXN0QG5ldGFwcC5jb21dDQo+IFNlbnQ6IFdlZG5lc2RheSwgU2VwdGVtYmVyIDA3LCAy
MDExIDY6MzMgQU0NCj4gVG86IFZpdGFsaXkgR3VzZXYNCj4gQ2M6IFZpdGFsaXkgR3VzZXY7IFBl
bmcsIFRhbzsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBSZTogW1BBVENI
XSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBhdCBuZnM0X2xheW91dGNvbW1pdF9yZWxlYXNlDQo+
IA0KPiBPbiBXZWQsIDIwMTEtMDktMDcgYXQgMDI6MTMgKzA0MDAsIFZpdGFsaXkgR3VzZXYgd3Jv
dGU6DQo+ID4gPj4gQEAgLTEzNzYsNyArMTM3Niw4IEBAIHN0YXRpYyB2b2lkIHBuZnNfbGlzdF93
cml0ZV9sc2VnKHN0cnVjdCBpbm9kZSAqaW5vZGUsDQo+IHN0cnVjdCBsaXN0X2hlYWQgKmxpc3Rw
KQ0KPiA+ID4+DQo+ID4gPj4gICAJbGlzdF9mb3JfZWFjaF9lbnRyeShsc2VnLCZORlNfSShpbm9k
ZSktPmxheW91dC0+cGxoX3NlZ3MsIHBsc19saXN0KSB7DQo+ID4gPj4gICAJCWlmIChsc2VnLT5w
bHNfcmFuZ2UuaW9tb2RlID09IElPTU9ERV9SVyYmDQo+ID4gPj4gLQkJICAgIHRlc3RfYml0KE5G
U19MU0VHX0xBWU9VVENPTU1JVCwmbHNlZy0+cGxzX2ZsYWdzKSkNCj4gPiA+PiArCQkgICAgdGVz
dF9iaXQoTkZTX0xTRUdfTEFZT1VUQ09NTUlULCZsc2VnLT5wbHNfZmxhZ3MpJiYNCj4gPiA+PiAr
CQkgICAgbGlzdF9lbXB0eSgmbHNlZy0+cGxzX2xjX2xpc3QpKQ0KPiA+ID4+ICAgCQkJbGlzdF9h
ZGQoJmxzZWctPnBsc19sY19saXN0LCBsaXN0cCk7DQo+ID4gPj4gICAJfQ0KPiA+ID4+ICAgfQ0K
PiA+ID4NCj4gPiA+IElmIHRoZSBsc2VnIGlzIGFscmVhZHkgcGFydCBvZiBvbmUgbGF5b3V0Y29t
bWl0LCBidXQgd2UncmUgc2VuZGluZyBhDQo+ID4gPiBzZWNvbmQgb25lIGZvciB0aGUgc2FtZSBy
YW5nZSAocHJlc3VtYWJseSBiZWNhdXNlIHdlIHdyb3RlIG1vcmUgZGF0YSBpbg0KPiA+ID4gdGhl
IHNhbWUgcmVnaW9uKSwgdGhlbiB0aGUgYWJvdmUgY2F1c2VzIHRoZSBsc2VnIHRvIGJlIGV4Y2x1
ZGVkLg0KPiA+DQo+ID4NCj4gPiBZZXMsIGxzZWcgaXMgZXhjbHVkZWQsIFRoaXMgcGF0Y2ggZG9l
cyBleGFjdGx5IG9ubHkgZXhjbHVzaW9uIG9mIGxzZWcuDQo+ID4gbHNlZyBpcyB1c2VkIGhlcmUg
b25seSB0byBnZXQvcHV0IHJlZmVyZW5jZSBvbiB0aGlzIGxzZWcsIHNvIHNraXBwaW5nIGlzDQo+
ID4gY29ycmVjdC4NCj4gDQo+IEFyZSB5b3Ugc3VyZT8gQXMgZmFyIGFzIEkgY2FuIHNlZSwgcG5m
c19saXN0X3dyaXRlX2xzZWcoKSBhY3R1YWxseQ0KPiBhc3NpZ25zIHRoZSBsc2VnIHRvIGEgcGFy
dGljdWxhciBsYXlvdXRjb21taXQgY2FsbC4NCj4gDQo+IE15IHF1ZXN0aW9ucyBhcmU6IGlmIEkg
d3JpdGUgdG8gYW4gYXJlYSBkZXNjcmliZWQgYnkgdGhhdCBsc2VnIF9hZnRlcl8NCj4gaXQgaGFz
IGJlZW4gYXNzaWduZWQgdG8gdGhhdCBsYXlvdXRjb21taXQgUlBDIGNhbGwgKGFuZCB0aGUgbGF0
dGVyIGhhcw0KPiBhbHJlYWR5IGJlZW4gZGlzcGF0Y2hlZCB0byB0aGUgbWV0YWRhdGEgc2VydmVy
KSwgdGhlbg0KPiAgICAgIEEuIHNob3VsZG4ndCBpdCBiZSBhc3NpZ25lZCB0byBhIHNlY29uZCBs
YXlvdXRjb21taXQgY2FsbCB0b28/DQo+ICAgICAgQi4gSWYgbm90LCB3aGF0IGFyZSB3ZSBkb2lu
ZyB0byBlbnN1cmUgbXV0dWFsIGV4Y2x1c2lvbiBiZXR3ZWVuDQo+ICAgICAgICAgbGF5b3V0Y29t
bWl0IFJQQyBjYWxscyBhbmQgcE5GUyB3cml0ZXMgdG8gdGhlIGRhdGEgc2VydmVyPw0KSSB0aGlu
ayBpdCBkZXBlbmRzIG9uIHRoZSBwdXJwb3NlIG9mIGxheW91dGNvbW1pdC4NCjEuIGZvciB1cGRh
dGluZyBhdGltZS9tdGltZS4gSW4gdGhpcyBjYXNlLCB3ZSBkb24ndCBuZWVkIG11dHVhbCBleGNs
dXNpb24NCmJldHdlZW4gbGF5b3V0Y29tbWl0IFJQQyBhbmQgcE5GUyB3cml0ZXMgdG8gZGF0YSBz
ZXJ2ZXIuDQoyLiBmb3IgdXBkYXRpbmcgTEQgc3BlY2lmaWMgc3RhdGUuIEluIHRoaXMgY2FzZSwg
TEQgc2hvdWxkIGRlY2lkZSB3aGF0IHRvIGNvbW1pdA0KKGFuZCBhY3R1YWxseSBpZ25vcmVzIHRo
ZSByYW5nZSBvZiBsc2VnKS4gVGFrZSBibG9jayBsYXlvdXQgZm9yIGV4YW1wbGUuIEl0IG1haW50
YWlucw0KYmxvY2tzaXplZCBleHRlbnQgc3RhdGUgaW5zaWRlIExEIGFuZCBkZXRlcm1pbmVzIHdo
YXQgdG8gZW5jb2RlIGluc2lkZSBsYXlvdXRjb21taXQNClJQQyB3aGVuZXZlciB0aGVyZSBpcyBh
IGdlbmVyaWMgbGF5ZXIgbGF5b3V0Y29tbWl0IGNhbGwuIFNvIHdoZW4gYW4gZXh0ZW50IGlzIGJl
aW5nDQpsYXlvdXRjb21taXR0ZWQsIGNsaWVudCBjYW4gc3RpbGwgd3JpdGUgdG8gdGhlIHNhbWUg
cmFuZ2UsIGFzIGxvbmcgYXMgdGhlIGxzZWcgaXMgaGVsZCBieSBjbGllbnQuDQoNCkkgYW0gbm90
IGZhbWlsaWFyIGVub3VnaCB3aXRoIGZpbGUvb2JqZWN0IGxheW91dCBjYXNlIHRob3VnaC4gRG9l
cyBjdXJyZW50IGltcGxlbWVudGF0aW9uDQpicmVhayBhbnkgcmVxdWlyZW1lbnRzIGZvciBmaWxl
L29iamVjdCBsYXlvdXQ/DQoNClRoYW5rcywNClRhbw0KPiANCj4gPiBIb3dldmVyLCBjaGVja2lu
ZyBvbiBsaXN0X2VtcHR5IGNhbiBvY2N1ciAob24gb3RoZXIgQ1BVKSBpbiB0aGUgbWlkZGxlOg0K
PiA+DQo+ID4gCWxpc3RfZGVsX2luaXQoJmxzZWctPnBsc19sY19saXN0KTsNCj4gPiBIZXJlID4+
Pj4+Pg0KPiA+IAlpZiAodGVzdF9hbmRfY2xlYXJfYml0KE5GU19MU0VHX0xBWU9VVENPTU1JVCwN
Cj4gPiAJCQkgICAgICAgJmxzZWctPnBsc19mbGFncykpDQo+ID4gCQlwdXRfbHNlZyhsc2VnKTsN
Cj4gPg0KPiA+DQo+ID4gU28gbGlzdF9kZWxfaW5pdCBtdXN0IGJlIGV4ZWN1dGVkIHVuZGVyIHRo
ZSBzYW1lIGxvY2sgYXMNCj4gPiBwbmZzX2xpc3Rfd3JpdGVfbHNlZywgaS5lLiBpbm9kZS0+aV9s
b2NrLg0KPiANCj4gQWdyZWVkIGlmIG15IHF1ZXN0aW9ucyBhYm92ZSBtYWtlIG5vIHNlbnNlLg0K
PiANCj4gQ2hlZXJzDQo+ICAgVHJvbmQNCj4gDQo+IC0tDQo+IFRyb25kIE15a2xlYnVzdA0KPiBM
aW51eCBORlMgY2xpZW50IG1haW50YWluZXINCj4gDQo+IE5ldEFwcA0KPiBUcm9uZC5NeWtsZWJ1
c3RAbmV0YXBwLmNvbQ0KPiB3d3cubmV0YXBwLmNvbQ0KPiANCg0KTu+/ve+/ve+/ve+/ve+/vXLv
v73vv71577+977+977+9Yu+/vVjvv73vv73Hp3bvv71e77+9Kd66ey5u77+9K++/ve+/ve+/ve+/
vXvvv73vv73vv70i77+977+9Xm7vv71y77+977+977+9eu+/vRrvv73vv71o77+977+977+977+9
Ju+/ve+/vR7vv71H77+977+977+9aO+/vQMo77+96ZqO77+93aJqIu+/ve+/vRrvv70bbe+/ve+/
ve+/ve+/ve+/vXrvv73elu+/ve+/ve+/vWbvv73vv73vv71o77+977+977+9fu+/vW3vv70=

2011-09-08 13:13:07

by Vitaliy Gusev

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 09/08/2011 02:00 PM, [email protected] wrote:
> Hi, Gusev,

Hello Tao!

> Yes, you are right. How about following patch?
>
> From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
> From: Peng Tao<[email protected]>
> Date: Thu, 8 Sep 2011 00:57:02 -0400
> Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free
>
> Since there can be more than one layoutcommit proc happen the same time,
> lseg list create/free should be protected. Otherwise lseg list
> may get corrupted.
>
> Reported-by: Vitaliy Gusev<[email protected]>
> Signed-off-by: Peng Tao<[email protected]>
> ---
> fs/nfs/nfs4proc.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 8c77039..da7c20c 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void *calldata)
> struct pnfs_layout_segment *lseg, *tmp;
>
> pnfs_cleanup_layoutcommit(data);
> + spin_lock(&data->args.inode->i_lock);

I think lock over list_del_init(&lseg->pls_lc_list) is enough, because

1. here lseg is deleted from unique (placed in stack) data and there is
no any race during deletion.

2. Only one thing must be protected - list_empty status at
pnfs_list_write_lseg (after my patch).

The problem is that list_del_init is placed before
test_and_clear_bit and spinlock can be some kind of barrier for
protection against adding lseg to new data->lseg_list at
pnfs_list_write_lseg.

Do reordering list_del_init with test_and_clear_bit is not good,
because lseg can be invalid after put_lseg.

> /* Matched by references in pnfs_set_layoutcommit */
> list_for_each_entry_safe(lseg, tmp,&data->lseg_list, pls_lc_list) {
> list_del_init(&lseg->pls_lc_list);
> @@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void *calldata)
> &lseg->pls_flags))
> put_lseg(lseg);
> }
> + spin_unlock(&data->args.inode->i_lock);
> put_rpccred(data->cred);
> kfree(data);
> }

2011-09-08 12:01:38

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiB0YW8ucGVuZ0BlbWMuY29tIFtt
YWlsdG86dGFvLnBlbmdAZW1jLmNvbV0NCj4gU2VudDogVGh1cnNkYXksIFNlcHRlbWJlciAwOCwg
MjAxMSA2OjIxIEFNDQo+IFRvOiBNeWtsZWJ1c3QsIFRyb25kOyBndXNldi52aXRhbGl5QG5leGVu
dGEuY29tDQo+IENjOiBndXNldi52aXRhbGl5QGdtYWlsLmNvbTsgbGludXgtbmZzQHZnZXIua2Vy
bmVsLm9yZw0KPiBTdWJqZWN0OiBSRTogW1BBVENIXSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBh
dA0KPiBuZnM0X2xheW91dGNvbW1pdF9yZWxlYXNlDQo+IA0KPiBIaSwgVHJvbmQsDQo+IA0KPiBT
b3JyeSBmb3IgdGhlIGxhdGUgcmVzcG9uc2UuDQo+IA0KPiA+IC0tLS0tT3JpZ2luYWwgTWVzc2Fn
ZS0tLS0tDQo+ID4gRnJvbTogVHJvbmQgTXlrbGVidXN0IFttYWlsdG86VHJvbmQuTXlrbGVidXN0
QG5ldGFwcC5jb21dDQo+ID4gU2VudDogV2VkbmVzZGF5LCBTZXB0ZW1iZXIgMDcsIDIwMTEgNjoz
MyBBTQ0KPiA+IFRvOiBWaXRhbGl5IEd1c2V2DQo+ID4gQ2M6IFZpdGFsaXkgR3VzZXY7IFBlbmcs
IFRhbzsgbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiA+IFN1YmplY3Q6IFJlOiBbUEFUQ0hd
IG5mczogZml4IGluaWZpbml0ZSBsb29wIGF0DQo+IG5mczRfbGF5b3V0Y29tbWl0X3JlbGVhc2UN
Cj4gPg0KPiA+IE9uIFdlZCwgMjAxMS0wOS0wNyBhdCAwMjoxMyArMDQwMCwgVml0YWxpeSBHdXNl
diB3cm90ZToNCj4gPiA+ID4+IEBAIC0xMzc2LDcgKzEzNzYsOCBAQCBzdGF0aWMgdm9pZCBwbmZz
X2xpc3Rfd3JpdGVfbHNlZyhzdHJ1Y3QNCj4gaW5vZGUgKmlub2RlLA0KPiA+IHN0cnVjdCBsaXN0
X2hlYWQgKmxpc3RwKQ0KPiA+ID4gPj4NCj4gPiA+ID4+ICAgCWxpc3RfZm9yX2VhY2hfZW50cnko
bHNlZywmTkZTX0koaW5vZGUpLT5sYXlvdXQtPnBsaF9zZWdzLA0KPiBwbHNfbGlzdCkgew0KPiA+
ID4gPj4gICAJCWlmIChsc2VnLT5wbHNfcmFuZ2UuaW9tb2RlID09IElPTU9ERV9SVyYmDQo+ID4g
PiA+PiAtCQkgICAgdGVzdF9iaXQoTkZTX0xTRUdfTEFZT1VUQ09NTUlULCZsc2VnLT5wbHNfZmxh
Z3MpKQ0KPiA+ID4gPj4gKwkJICAgIHRlc3RfYml0KE5GU19MU0VHX0xBWU9VVENPTU1JVCwmbHNl
Zy0NCj4gPnBsc19mbGFncykmJg0KPiA+ID4gPj4gKwkJICAgIGxpc3RfZW1wdHkoJmxzZWctPnBs
c19sY19saXN0KSkNCj4gPiA+ID4+ICAgCQkJbGlzdF9hZGQoJmxzZWctPnBsc19sY19saXN0LCBs
aXN0cCk7DQo+ID4gPiA+PiAgIAl9DQo+ID4gPiA+PiAgIH0NCj4gPiA+ID4NCj4gPiA+ID4gSWYg
dGhlIGxzZWcgaXMgYWxyZWFkeSBwYXJ0IG9mIG9uZSBsYXlvdXRjb21taXQsIGJ1dCB3ZSdyZQ0K
PiBzZW5kaW5nIGENCj4gPiA+ID4gc2Vjb25kIG9uZSBmb3IgdGhlIHNhbWUgcmFuZ2UgKHByZXN1
bWFibHkgYmVjYXVzZSB3ZSB3cm90ZSBtb3JlDQo+IGRhdGEgaW4NCj4gPiA+ID4gdGhlIHNhbWUg
cmVnaW9uKSwgdGhlbiB0aGUgYWJvdmUgY2F1c2VzIHRoZSBsc2VnIHRvIGJlIGV4Y2x1ZGVkLg0K
PiA+ID4NCj4gPiA+DQo+ID4gPiBZZXMsIGxzZWcgaXMgZXhjbHVkZWQsIFRoaXMgcGF0Y2ggZG9l
cyBleGFjdGx5IG9ubHkgZXhjbHVzaW9uIG9mDQo+IGxzZWcuDQo+ID4gPiBsc2VnIGlzIHVzZWQg
aGVyZSBvbmx5IHRvIGdldC9wdXQgcmVmZXJlbmNlIG9uIHRoaXMgbHNlZywgc28NCj4gc2tpcHBp
bmcgaXMNCj4gPiA+IGNvcnJlY3QuDQo+ID4NCj4gPiBBcmUgeW91IHN1cmU/IEFzIGZhciBhcyBJ
IGNhbiBzZWUsIHBuZnNfbGlzdF93cml0ZV9sc2VnKCkgYWN0dWFsbHkNCj4gPiBhc3NpZ25zIHRo
ZSBsc2VnIHRvIGEgcGFydGljdWxhciBsYXlvdXRjb21taXQgY2FsbC4NCj4gPg0KPiA+IE15IHF1
ZXN0aW9ucyBhcmU6IGlmIEkgd3JpdGUgdG8gYW4gYXJlYSBkZXNjcmliZWQgYnkgdGhhdCBsc2Vn
DQo+IF9hZnRlcl8NCj4gPiBpdCBoYXMgYmVlbiBhc3NpZ25lZCB0byB0aGF0IGxheW91dGNvbW1p
dCBSUEMgY2FsbCAoYW5kIHRoZSBsYXR0ZXINCj4gaGFzDQo+ID4gYWxyZWFkeSBiZWVuIGRpc3Bh
dGNoZWQgdG8gdGhlIG1ldGFkYXRhIHNlcnZlciksIHRoZW4NCj4gPiAgICAgIEEuIHNob3VsZG4n
dCBpdCBiZSBhc3NpZ25lZCB0byBhIHNlY29uZCBsYXlvdXRjb21taXQgY2FsbCB0b28/DQo+ID4g
ICAgICBCLiBJZiBub3QsIHdoYXQgYXJlIHdlIGRvaW5nIHRvIGVuc3VyZSBtdXR1YWwgZXhjbHVz
aW9uIGJldHdlZW4NCj4gPiAgICAgICAgIGxheW91dGNvbW1pdCBSUEMgY2FsbHMgYW5kIHBORlMg
d3JpdGVzIHRvIHRoZSBkYXRhIHNlcnZlcj8NCj4gSSB0aGluayBpdCBkZXBlbmRzIG9uIHRoZSBw
dXJwb3NlIG9mIGxheW91dGNvbW1pdC4NCj4gMS4gZm9yIHVwZGF0aW5nIGF0aW1lL210aW1lLiBJ
biB0aGlzIGNhc2UsIHdlIGRvbid0IG5lZWQgbXV0dWFsDQo+IGV4Y2x1c2lvbg0KDQpBZ3JlZWQu
DQoNCj4gYmV0d2VlbiBsYXlvdXRjb21taXQgUlBDIGFuZCBwTkZTIHdyaXRlcyB0byBkYXRhIHNl
cnZlci4NCj4gMi4gZm9yIHVwZGF0aW5nIExEIHNwZWNpZmljIHN0YXRlLiBJbiB0aGlzIGNhc2Us
IExEIHNob3VsZCBkZWNpZGUgd2hhdA0KPiB0byBjb21taXQNCj4gKGFuZCBhY3R1YWxseSBpZ25v
cmVzIHRoZSByYW5nZSBvZiBsc2VnKS4gVGFrZSBibG9jayBsYXlvdXQgZm9yDQo+IGV4YW1wbGUu
IEl0IG1haW50YWlucw0KPiBibG9ja3NpemVkIGV4dGVudCBzdGF0ZSBpbnNpZGUgTEQgYW5kIGRl
dGVybWluZXMgd2hhdCB0byBlbmNvZGUgaW5zaWRlDQo+IGxheW91dGNvbW1pdA0KPiBSUEMgd2hl
bmV2ZXIgdGhlcmUgaXMgYSBnZW5lcmljIGxheWVyIGxheW91dGNvbW1pdCBjYWxsLiBTbyB3aGVu
IGFuDQo+IGV4dGVudCBpcyBiZWluZw0KPiBsYXlvdXRjb21taXR0ZWQsIGNsaWVudCBjYW4gc3Rp
bGwgd3JpdGUgdG8gdGhlIHNhbWUgcmFuZ2UsIGFzIGxvbmcgYXMNCj4gdGhlIGxzZWcgaXMgaGVs
ZCBieSBjbGllbnQuDQoNClllcywgYnV0IGFzIGZhciBhcyBJIGNhbiBzZWUsIGV2ZW4gaW4gdGhl
IGJsb2NrcyBjYXNlIHRoZXJlIGNhbiBiZSBtdWx0aXBsZSBleHRlbnRzIHBlciBsYXlvdXQgc2Vn
bWVudC4gV2hhdCBpZiBJIHdyaXRlIHRvIG9uZSB1bmluaXRpYWxpc2VkIGV4dGVudCwgbGF5b3V0
Y29tbWl0LCB0aGVuIHdyaXRlIHRvIGFub3RoZXIgdW5pbml0aWFsaXplZCBleHRlbnQgaW4gdGhl
IHNhbWUgbGF5b3V0IHNlZ21lbnQgYW5kIGxheW91dGNvbW1pdD8gSW4gbXkgcmVhZGluZyBvZiB0
aGUgY29kZSwgdGhlcmUgaXMgYSBjaGFuY2UgdGhhdCB0aGUgc2Vjb25kIGxheW91dGNvbW1pdCB3
aWxsIGZhaWwgdG8gcGljayB1cCB0aGUgbGF5b3V0IHNlZ21lbnQsIGFuZCBzbyB3aWxsIGZhaWwg
dG8gbm90aWZ5IHRoZSBNRFMgdGhhdCB0aGUgc2Vjb25kIGV4dGVudCBub3cgY29udGFpbnMgZGF0
YS4NCg0KVHJvbmQNCgTvv717Lm7vv70r77+977+977+977+977+977+977+9KyXvv73vv71sendt
77+977+9Yu+/veunsu+/ve+/vXLvv73vv716WO+/ve+/vRnfsinvv73vv73vv713Kh9qZ++/ve+/
ve+/vR7vv73vv73vv73vv73vv73domov77+977+977+9eu+/vd6W77+977+9Mu+/vd6Z77+977+9
77+9Ju+/vSnfoe+/vWHvv73vv71/77+977+9Hu+/vUfvv73vv73vv71o77+9D++/vWo6K3bvv73v
v73vv71377+92aU=

2011-09-09 03:12:05

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

SEksIFRyb25kLA0KDQo+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+IEZyb206IE15a2xl
YnVzdCwgVHJvbmQgW21haWx0bzpUcm9uZC5NeWtsZWJ1c3RAbmV0YXBwLmNvbV0NCj4gU2VudDog
RnJpZGF5LCBTZXB0ZW1iZXIgMDksIDIwMTEgMTowNSBBTQ0KPiBUbzogUGVuZyBUYW8NCj4gQ2M6
IFBlbmcsIFRhbzsgZ3VzZXYudml0YWxpeUBuZXhlbnRhLmNvbTsgZ3VzZXYudml0YWxpeUBnbWFp
bC5jb207DQo+IGxpbnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmcNCj4gU3ViamVjdDogUkU6IFtQQVRD
SF0gbmZzOiBmaXggaW5pZmluaXRlIGxvb3AgYXQgbmZzNF9sYXlvdXRjb21taXRfcmVsZWFzZQ0K
PiANCj4gPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiA+IEZyb206IFBlbmcgVGFvIFtt
YWlsdG86YmVyZ3dvbGZAZ21haWwuY29tXQ0KPiA+IFNlbnQ6IFRodXJzZGF5LCBTZXB0ZW1iZXIg
MDgsIDIwMTEgMTE6MDAgQU0NCj4gPiBUbzogTXlrbGVidXN0LCBUcm9uZA0KPiA+IENjOiB0YW8u
cGVuZ0BlbWMuY29tOyBndXNldi52aXRhbGl5QG5leGVudGEuY29tOw0KPiA+IGd1c2V2LnZpdGFs
aXlAZ21haWwuY29tOyBsaW51eC1uZnNAdmdlci5rZXJuZWwub3JnDQo+ID4gU3ViamVjdDogUmU6
IFtQQVRDSF0gbmZzOiBmaXggaW5pZmluaXRlIGxvb3AgYXQNCj4gPiBuZnM0X2xheW91dGNvbW1p
dF9yZWxlYXNlDQo+ID4NCj4gPiBPbiBUaHUsIFNlcCA4LCAyMDExIGF0IDg6MDEgUE0sIE15a2xl
YnVzdCwgVHJvbmQNCj4gPiA8VHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+
ID4+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0tLS0tDQo+ID4NCj4gPiA+DQo+ID4gPiBZZXMsIGJ1
dCBhcyBmYXIgYXMgSSBjYW4gc2VlLCBldmVuIGluIHRoZSBibG9ja3MgY2FzZSB0aGVyZSBjYW4g
YmUNCj4gPiBtdWx0aXBsZSBleHRlbnRzIHBlciBsYXlvdXQgc2VnbWVudC4gV2hhdCBpZiBJIHdy
aXRlIHRvIG9uZQ0KPiA+IHVuaW5pdGlhbGlzZWQgZXh0ZW50LCBsYXlvdXRjb21taXQsIHRoZW4g
d3JpdGUgdG8gYW5vdGhlciB1bmluaXRpYWxpemVkDQo+ID4gZXh0ZW50IGluIHRoZSBzYW1lIGxh
eW91dCBzZWdtZW50IGFuZCBsYXlvdXRjb21taXQ/IEluIG15IHJlYWRpbmcgb2YNCj4gPiB0aGUg
Y29kZSwgdGhlcmUgaXMgYSBjaGFuY2UgdGhhdCB0aGUgc2Vjb25kIGxheW91dGNvbW1pdCB3aWxs
IGZhaWwgdG8NCj4gPiBwaWNrIHVwIHRoZSBsYXlvdXQgc2VnbWVudCwgYW5kIHNvIHdpbGwgZmFp
bCB0byBub3RpZnkgdGhlIE1EUyB0aGF0IHRoZQ0KPiA+IHNlY29uZCBleHRlbnQgbm93IGNvbnRh
aW5zIGRhdGEuDQo+ID4NCj4gPiBibG9ja2xheW91dCBkb2VzIG5vdCBkZWNpZGUgd2hhdCB0byBs
YXlvdXRjb21taXQgYWNjb3JkaW5nIHRvIHRoZSBsc2VnDQo+ID4gbGlzdC4gSW5zdGVhZCwgaXQg
a2VlcHMgdHJhY2sgb2YgZWFjaCBleHRlbnQncyBzdGF0ZSBhdCB0aGUNCj4gPiBncmFudWxhcml0
eSBvZiBibG9ja3NpemUsIGFuZCBlbmNvZGUgd2hhdGV2ZXIgbmVlZHMgbGF5b3V0Y29tbWl0dGVk
IGluDQo+ID4gdGhlIGxheW91dGNvbW1pdCBjYWxsLiBTbyBpbiB5b3VyIGFib3ZlIGNhc2UsIGFz
IGxvbmcgYXMgdGhlIHNlY29uZA0KPiA+IGxheW91dGNvbW1pdCBpcyBpc3N1ZWQsIGJsb2NrbGF5
b3V0IHdpbGwgZW5jb2RlIHRoZSBuZXdseSB3cml0dGVuDQo+ID4gZXh0ZW50IGluIHRoZSBzZWNv
bmQgbGF5b3V0Y29tbWl0IGNhbGwsIGV2ZW4gaWYgdGhlIGxzZWcgaXMgbm90DQo+ID4gYXR0YWNo
ZWQgdG8gdGhlIHNlY29uZCBsYXlvdXRjb21taXQuDQo+ID4NCj4gPiBCdXQgdGhhdCBsZWFkcyB0
byBhbm90aGVyIHR3byBxdWVzdGlvbjoNCj4gPiAxLiBIb3cgZG8gd2UgcHJvdGVjdCBhZ2FpbnN0
IGxheW91dHJlY2FsbCBpZiBsc2VnIGlzIG5vdCBsaW5rZWQgdG8NCj4gPiBsYXlvdXRjb21taXQ/
IEZvciB0aGlzIG9uZSwgY2FuIHdlIGp1c3QgcmVqZWN0IGxheW91dHJlY2FsbCBpZiB0aGVyZQ0K
PiA+IGlzIGluZmxpZ2h0IGxheW91dGNvbW1pdD8gSXQgd2lsbCBiZSBsZXNzIHBhcmFsbGVsIGJ1
dCBjYW4gZ3VhcmFudGVlDQo+ID4gY3VycmVudCBpbXBsZW1lbnRhdGlvbiBjb3JyZWN0bmVzcy4N
Cj4gPiAyLiBibG9ja2xheW91dCBPTkxZOiBibF9jb21taXR0aW5nIG1heSBiZSBvdmVybG9hZGVk
IGJ5IHNldmVyYWwNCj4gPiBsYXlvdXRjb21taXQgY2FsbHMgYW5kIHdlIGRvbid0IGhhdmUgaW5m
b3JtYXRpb24gaW4NCj4gPiBjbGVhbnVwX2xheW91dGNvbW1pdCgpIG9uIGhvdyBtYW55IGVudHJ5
IHNob3VsZCBiZSByZW1vdmVkIGZyb20NCj4gPiBibF9jb21taXR0aW5nLiBNYXliZSB3ZSBjYW4g
YWRkIGEgKHZvaWQqKSB0byBzdHJ1Y3QNCj4gPiBuZnM0X2xheW91dGNvbW1pdF9kYXRhLCBzbyB0
aGF0IExEIGNhbiBwYXNzIHNvbWUgcHJpdmF0ZSBpbmZvcm1hdGlvbg0KPiA+IGJldHdlZW4gZW5j
b2RlX2xheW91dGNvbW1pdCgpIGFuZCBjbGVhbnVwX2xheW91dGNvbW1pdCgpPw0KPiANCj4gMy4g
V2hhdCBpcyB0aGUgcHVycG9zZSBvZiBwaW5uaW5nIHRoZSBsYXlvdXQgc2VnbWVudCBhdCBhbGwg
aWYgbmVpdGhlciBibG9ja3MsIG5vcg0KPiBvYmplY3RzIG5vciBmaWxlcyBjYXJlcz8NCkkgYmVs
aWV2ZSBpdCBpcyBmb3IgcHJvdGVjdGluZyBhZ2FpbnN0IGxheW91dHJlY2FsbC4gQnV0IHNpbmNl
IHdlIGFyZSBzZXBlcmF0aW5nIGxzZWcgYW5kIExEIHNwZWNpZmljIGxheW91dCBpbmZvcm1hdGlv
biBtYW5hZ2VtZW50LCBpdCBpcyBhY3R1YWxseSBub3Qgd29ya2luZyBhcyBleHBlY3RlZC4NCg0K
PiBJT1c6IGlmIGJvdGggb2JqZWN0cyBhbmQgYmxvY2tzIHRyYWNrIHRoZSBpbmZvcm1hdGlvbiB0
aGF0IHRoZXkgbmVlZCBmb3INCj4gbGF5b3V0Y29tbWl0IG91dHNpZGUgdGhlIGxheW91dCBzZWdt
ZW50cywgdGhlbiB3aHkgZG8gd2UgbmVlZCB0aGUNCj4gTkZTX0xTRUdfTEFZT1VUQ09NTUlUIGJp
dCBhbmQgcGxzX2xjX2xpc3QgYXQgYWxsPw0KSWYgd2UgcmVtb3ZlIE5GU19MU0VHX0xBWU9VVENP
TU1JVCBiaXQgYW5kIHBsc19sY19saXN0LCB3ZSBtdXN0IGZpbmQgb3RoZXIgbWV0aG9kIHRvIHBy
b3RlY3QgYWdhaW5zIGZyZWVpbmcgbHNlZyB3aGlsZSBsYXlvdXRjb21taXQgaXMgbmVlZGVkIG9y
IGlzIGdvaW5nIG9uLiBlLmcuLCBjaGVjayBmb3IgTkZTX0lOT19MQVlPVVRDT01NSVQgYml0IGFu
ZCBpbmZsaWdodCBsYXlvdXRjb21taXQgaW4gaW5pdGlhdGVfZmlsZV9kcmFpbmluZygpLg0KDQpU
aGFua3MsDQpUYW8NCg0KDQoT77+977+97Lm7HO+/vSbvv71+77+9Ju+/vRjvv73vv70rLe+/ve+/
vd22F++/ve+/vXfvv73vv73Lm++/ve+/ve+/vW3vv71i77+977+9Z37Ip++/vRfvv73vv73cqH3v
v73vv73vv73GoHrvv70majordu+/ve+/ve+/vQfvv73vv73vv73vv716Wivvv73vv70rembvv73v
v73vv71o77+977+977+9fu+/ve+/ve+/ve+/vWnvv73vv73vv71677+9Hu+/vXfvv73vv73vv70/
77+977+977+977+9Ju+/vSnfohtm

2011-09-12 20:31:59

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-12 07:56, Peng Tao wrote:
> On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <[email protected]> wrote:
>> On 2011-09-09 11:20, Trond Myklebust wrote:
>>> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
>>>> HI, Trond,
>>>>
>>>>> -----Original Message-----
>>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>>> Sent: Friday, September 09, 2011 1:05 AM
>>>>> To: Peng Tao
>>>>> Cc: Peng, Tao; [email protected]; [email protected];
>>>>> [email protected]
>>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Peng Tao [mailto:[email protected]]
>>>>>> Sent: Thursday, September 08, 2011 11:00 AM
>>>>>> To: Myklebust, Trond
>>>>>> Cc: [email protected]; [email protected];
>>>>>> [email protected]; [email protected]
>>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at
>>>>>> nfs4_layoutcommit_release
>>>>>>
>>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
>>>>>> <[email protected]> wrote:
>>>>>>>> -----Original Message-----
>>>>>>
>>>>>>>
>>>>>>> Yes, but as far as I can see, even in the blocks case there can be
>>>>>> multiple extents per layout segment. What if I write to one
>>>>>> uninitialised extent, layoutcommit, then write to another uninitialized
>>>>>> extent in the same layout segment and layoutcommit? In my reading of
>>>>>> the code, there is a chance that the second layoutcommit will fail to
>>>>>> pick up the layout segment, and so will fail to notify the MDS that the
>>>>>> second extent now contains data.
>>>>>>
>>>>>> blocklayout does not decide what to layoutcommit according to the lseg
>>>>>> list. Instead, it keeps track of each extent's state at the
>>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in
>>>>>> the layoutcommit call. So in your above case, as long as the second
>>>>>> layoutcommit is issued, blocklayout will encode the newly written
>>>>>> extent in the second layoutcommit call, even if the lseg is not
>>>>>> attached to the second layoutcommit.
>>>>>>
>>>>>> But that leads to another two question:
>>>>>> 1. How do we protect against layoutrecall if lseg is not linked to
>>>>>> layoutcommit? For this one, can we just reject layoutrecall if there
>>>>>> is inflight layoutcommit? It will be less parallel but can guarantee
>>>>>> current implementation correctness.
>>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several
>>>>>> layoutcommit calls and we don't have information in
>>>>>> cleanup_layoutcommit() on how many entry should be removed from
>>>>>> bl_committing. Maybe we can add a (void*) to struct
>>>>>> nfs4_layoutcommit_data, so that LD can pass some private information
>>>>>> between encode_layoutcommit() and cleanup_layoutcommit()?
>>>>>
>>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor
>>>>> objects nor files cares?
>>>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected.
>>>>
>>
>> The layout segments are not really in use while in LAYOUTCOMMIT.
>> We only need to get the stateid right with respect to concurrent layout recalls.
> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
> layoutrecall cannot free them.
>

And if layoutrecall would have freed layout segments during layoutcommit,
what is your specific concern?

Benny

2011-09-06 19:30:25

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Sun, 2011-08-28 at 10:22 +0400, Vitaliy Gusev wrote:
> pnfs_layout_segment can be already under handling LAYOUTCOMMIT,
> so adding list twice causes hang:
>
> BUG: soft lockup - CPU#0 stuck for 22s! [kworker/0:0:4]
> Call Trace:
>
> nfs4_layoutcommit_release+0x5a/0x9c [nfs]
> rpc_release_calldata+0x17/0x19 [sunrpc]
> rpc_free_task+0x5e/0x66 [sunrpc]
> __rpc_execute+0x29e/0x2ad [sunrpc]
> rpc_async_schedule+0x15/0x17 [sunrpc]
> process_one_work+0x213/0x3ba
> process_one_work+0x142/0x3ba
> __rpc_execute+0x2ad/0x2ad [sunrpc]
> worker_thread+0xfd/0x181
>
> Signed-off-by: Vitaliy Gusev <[email protected]>
> ---
> fs/nfs/pnfs.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index e550e88..1465f44 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp)
>
> list_for_each_entry(lseg, &NFS_I(inode)->layout->plh_segs, pls_list) {
> if (lseg->pls_range.iomode == IOMODE_RW &&
> - test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags))
> + test_bit(NFS_LSEG_LAYOUTCOMMIT, &lseg->pls_flags) &&
> + list_empty(&lseg->pls_lc_list))
> list_add(&lseg->pls_lc_list, listp);
> }
> }

If the lseg is already part of one layoutcommit, but we're sending a
second one for the same range (presumably because we wrote more data in
the same region), then the above causes the lseg to be excluded.

I agree that the current code causes list corruption, but before
applying something like the above patch, I'd like to understand why it
is correct.

Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-09-12 14:49:11

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

Hi, Trond,

On Sat, Sep 10, 2011 at 2:20 AM, Trond Myklebust
<[email protected]> wrote:
> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
>> HI, Trond,
>>
>> > -----Original Message-----
>> > From: Myklebust, Trond [mailto:[email protected]]
>> > Sent: Friday, September 09, 2011 1:05 AM
>> > To: Peng Tao
>> > Cc: Peng, Tao; [email protected]; [email protected];
>> > [email protected]
>> > Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>> >
>> > > -----Original Message-----
>> > > From: Peng Tao [mailto:[email protected]]
>> > > Sent: Thursday, September 08, 2011 11:00 AM
>> > > To: Myklebust, Trond
>> > > Cc: [email protected]; [email protected];
>> > > [email protected]; [email protected]
>> > > Subject: Re: [PATCH] nfs: fix inifinite loop at
>> > > nfs4_layoutcommit_release
>> > >
>> > > On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
>> > > <[email protected]> wrote:
>> > > >> -----Original Message-----
>> > >
>> > > >
>> > > > Yes, but as far as I can see, even in the blocks case there can be
>> > > multiple extents per layout segment. What if I write to one
>> > > uninitialised extent, layoutcommit, then write to another uninitialized
>> > > extent in the same layout segment and layoutcommit? In my reading of
>> > > the code, there is a chance that the second layoutcommit will fail to
>> > > pick up the layout segment, and so will fail to notify the MDS that the
>> > > second extent now contains data.
>> > >
>> > > blocklayout does not decide what to layoutcommit according to the lseg
>> > > list. Instead, it keeps track of each extent's state at the
>> > > granularity of blocksize, and encode whatever needs layoutcommitted in
>> > > the layoutcommit call. So in your above case, as long as the second
>> > > layoutcommit is issued, blocklayout will encode the newly written
>> > > extent in the second layoutcommit call, even if the lseg is not
>> > > attached to the second layoutcommit.
>> > >
>> > > But that leads to another two question:
>> > > 1. How do we protect against layoutrecall if lseg is not linked to
>> > > layoutcommit? For this one, can we just reject layoutrecall if there
>> > > is inflight layoutcommit? It will be less parallel but can guarantee
>> > > current implementation correctness.
>> > > 2. blocklayout ONLY: bl_committing may be overloaded by several
>> > > layoutcommit calls and we don't have information in
>> > > cleanup_layoutcommit() on how many entry should be removed from
>> > > bl_committing. Maybe we can add a (void*) to struct
>> > > nfs4_layoutcommit_data, so that LD can pass some private information
>> > > between encode_layoutcommit() and cleanup_layoutcommit()?
>> >
>> > 3. What is the purpose of pinning the layout segment at all if neither blocks, nor
>> > objects nor files cares?
>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected.
>>
>> > IOW: if both objects and blocks track the information that they need for
>> > layoutcommit outside the layout segments, then why do we need the
>> > NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all?
>> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining().
>
> The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the
> wire at a time. Otherwise, you are looking at a many-to-many mapping
> between layoutcommits and lsegs.
>
> We should not expect to need multiple layoutcommits on the wire if pNFS
> is working smoothly (i.e. no layout recalls), so optimising for that
> case is wrong.
> I'd also think that we want to order layoutcommit and ordinary commits
> (for those NFS files servers that require both) so that we don't end up
> writing a file size change to disk before the actual data is committed.
>
> So why not just protect the layoutcommit call using the existing
> nfs_commit_set_lock/nfs_commit_clear_lock?
Completely agree. We should serialize layoutcommit instead of trying
to handle multiple of them concurrently.

--
Thanks,
-Bergwolf

2011-09-08 22:56:20

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBQZW5nIFRhbyBbbWFpbHRvOmJl
cmd3b2xmQGdtYWlsLmNvbV0NCj4gU2VudDogVGh1cnNkYXksIFNlcHRlbWJlciAwOCwgMjAxMSAx
MTowMCBBTQ0KPiBUbzogTXlrbGVidXN0LCBUcm9uZA0KPiBDYzogdGFvLnBlbmdAZW1jLmNvbTsg
Z3VzZXYudml0YWxpeUBuZXhlbnRhLmNvbTsNCj4gZ3VzZXYudml0YWxpeUBnbWFpbC5jb207IGxp
bnV4LW5mc0B2Z2VyLmtlcm5lbC5vcmcNCj4gU3ViamVjdDogUmU6IFtQQVRDSF0gbmZzOiBmaXgg
aW5pZmluaXRlIGxvb3AgYXQNCj4gbmZzNF9sYXlvdXRjb21taXRfcmVsZWFzZQ0KPiANCj4gT24g
VGh1LCBTZXAgOCwgMjAxMSBhdCA4OjAxIFBNLCBNeWtsZWJ1c3QsIFRyb25kDQo+IDxUcm9uZC5N
eWtsZWJ1c3RAbmV0YXBwLmNvbT4gd3JvdGU6DQo+ID4+IC0tLS0tT3JpZ2luYWwgTWVzc2FnZS0t
LS0tDQo+DQo+ID4NCj4gPiBZZXMsIGJ1dCBhcyBmYXIgYXMgSSBjYW4gc2VlLCBldmVuIGluIHRo
ZSBibG9ja3MgY2FzZSB0aGVyZSBjYW4gYmUNCj4gbXVsdGlwbGUgZXh0ZW50cyBwZXIgbGF5b3V0
IHNlZ21lbnQuIFdoYXQgaWYgSSB3cml0ZSB0byBvbmUNCj4gdW5pbml0aWFsaXNlZCBleHRlbnQs
IGxheW91dGNvbW1pdCwgdGhlbiB3cml0ZSB0byBhbm90aGVyIHVuaW5pdGlhbGl6ZWQNCj4gZXh0
ZW50IGluIHRoZSBzYW1lIGxheW91dCBzZWdtZW50IGFuZCBsYXlvdXRjb21taXQ/IEluIG15IHJl
YWRpbmcgb2YNCj4gdGhlIGNvZGUsIHRoZXJlIGlzIGEgY2hhbmNlIHRoYXQgdGhlIHNlY29uZCBs
YXlvdXRjb21taXQgd2lsbCBmYWlsIHRvDQo+IHBpY2sgdXAgdGhlIGxheW91dCBzZWdtZW50LCBh
bmQgc28gd2lsbCBmYWlsIHRvIG5vdGlmeSB0aGUgTURTIHRoYXQgdGhlDQo+IHNlY29uZCBleHRl
bnQgbm93IGNvbnRhaW5zIGRhdGEuDQo+IA0KPiBibG9ja2xheW91dCBkb2VzIG5vdCBkZWNpZGUg
d2hhdCB0byBsYXlvdXRjb21taXQgYWNjb3JkaW5nIHRvIHRoZSBsc2VnDQo+IGxpc3QuIEluc3Rl
YWQsIGl0IGtlZXBzIHRyYWNrIG9mIGVhY2ggZXh0ZW50J3Mgc3RhdGUgYXQgdGhlDQo+IGdyYW51
bGFyaXR5IG9mIGJsb2Nrc2l6ZSwgYW5kIGVuY29kZSB3aGF0ZXZlciBuZWVkcyBsYXlvdXRjb21t
aXR0ZWQgaW4NCj4gdGhlIGxheW91dGNvbW1pdCBjYWxsLiBTbyBpbiB5b3VyIGFib3ZlIGNhc2Us
IGFzIGxvbmcgYXMgdGhlIHNlY29uZA0KPiBsYXlvdXRjb21taXQgaXMgaXNzdWVkLCBibG9ja2xh
eW91dCB3aWxsIGVuY29kZSB0aGUgbmV3bHkgd3JpdHRlbg0KPiBleHRlbnQgaW4gdGhlIHNlY29u
ZCBsYXlvdXRjb21taXQgY2FsbCwgZXZlbiBpZiB0aGUgbHNlZyBpcyBub3QNCj4gYXR0YWNoZWQg
dG8gdGhlIHNlY29uZCBsYXlvdXRjb21taXQuDQo+IA0KPiBCdXQgdGhhdCBsZWFkcyB0byBhbm90
aGVyIHR3byBxdWVzdGlvbjoNCj4gMS4gSG93IGRvIHdlIHByb3RlY3QgYWdhaW5zdCBsYXlvdXRy
ZWNhbGwgaWYgbHNlZyBpcyBub3QgbGlua2VkIHRvDQo+IGxheW91dGNvbW1pdD8gRm9yIHRoaXMg
b25lLCBjYW4gd2UganVzdCByZWplY3QgbGF5b3V0cmVjYWxsIGlmIHRoZXJlDQo+IGlzIGluZmxp
Z2h0IGxheW91dGNvbW1pdD8gSXQgd2lsbCBiZSBsZXNzIHBhcmFsbGVsIGJ1dCBjYW4gZ3VhcmFu
dGVlDQo+IGN1cnJlbnQgaW1wbGVtZW50YXRpb24gY29ycmVjdG5lc3MuDQo+IDIuIGJsb2NrbGF5
b3V0IE9OTFk6IGJsX2NvbW1pdHRpbmcgbWF5IGJlIG92ZXJsb2FkZWQgYnkgc2V2ZXJhbA0KPiBs
YXlvdXRjb21taXQgY2FsbHMgYW5kIHdlIGRvbid0IGhhdmUgaW5mb3JtYXRpb24gaW4NCj4gY2xl
YW51cF9sYXlvdXRjb21taXQoKSBvbiBob3cgbWFueSBlbnRyeSBzaG91bGQgYmUgcmVtb3ZlZCBm
cm9tDQo+IGJsX2NvbW1pdHRpbmcuIE1heWJlIHdlIGNhbiBhZGQgYSAodm9pZCopIHRvIHN0cnVj
dA0KPiBuZnM0X2xheW91dGNvbW1pdF9kYXRhLCBzbyB0aGF0IExEIGNhbiBwYXNzIHNvbWUgcHJp
dmF0ZSBpbmZvcm1hdGlvbg0KPiBiZXR3ZWVuIGVuY29kZV9sYXlvdXRjb21taXQoKSBhbmQgY2xl
YW51cF9sYXlvdXRjb21taXQoKT8NCg0KMy4gV2hhdCBpcyB0aGUgcHVycG9zZSBvZiBwaW5uaW5n
IHRoZSBsYXlvdXQgc2VnbWVudCBhdCBhbGwgaWYgbmVpdGhlciBibG9ja3MsIG5vciBvYmplY3Rz
IG5vciBmaWxlcyBjYXJlcz8NCklPVzogaWYgYm90aCBvYmplY3RzIGFuZCBibG9ja3MgdHJhY2sg
dGhlIGluZm9ybWF0aW9uIHRoYXQgdGhleSBuZWVkIGZvciBsYXlvdXRjb21taXQgb3V0c2lkZSB0
aGUgbGF5b3V0IHNlZ21lbnRzLCB0aGVuIHdoeSBkbyB3ZSBuZWVkIHRoZSBORlNfTFNFR19MQVlP
VVRDT01NSVQgYml0IGFuZCBwbHNfbGNfbGlzdCBhdCBhbGw/DQoNCkNoZWVycw0KICBUcm9uZA0K
DQoT77+977+97Lm7HO+/vSbvv71+77+9Ju+/vRjvv73vv70rLe+/ve+/vd22F++/ve+/vXfvv73v
v73Lm++/ve+/ve+/vW3vv71i77+977+9Z37Ip++/vRfvv73vv73cqH3vv73vv73vv73GoHrvv70m
ajordu+/ve+/ve+/vQfvv73vv73vv73vv716Wivvv73vv70rembvv73vv73vv71o77+977+977+9
fu+/ve+/ve+/ve+/vWnvv73vv73vv71677+9Hu+/vXfvv73vv73vv70/77+977+977+977+9Ju+/
vSnfohtm

2011-09-14 07:53:52

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]]
> Sent: Wednesday, September 14, 2011 2:43 PM
> To: Peng, Tao
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>
> On 2011-09-13 10:32, [email protected] wrote:
> >> -----Original Message-----
> >> From: Benny Halevy [mailto:[email protected]]
> >> Sent: Tuesday, September 13, 2011 3:51 PM
> >> To: Trond Myklebust
> >> Cc: Peng Tao; Peng, Tao; [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
> >>
> >> On 2011-09-12 14:10, Trond Myklebust wrote:
> >>> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
> >>>> On 2011-09-12 07:56, Peng Tao wrote:
> >>>>>> The layout segments are not really in use while in LAYOUTCOMMIT.
> >>>>>> We only need to get the stateid right with respect to concurrent layout
> recalls.
> >>>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
> >>>>> layoutrecall cannot free them.
> >>>>>
> >>>>
> >>>> And if layoutrecall would have freed layout segments during layoutcommit,
> >>>> what is your specific concern?
> >>>
> >>> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
> >>> according to section 18.42.3 of RFC5661. I can't find anything in the
> >>> errata that changes that requirement.
> >>>
> >>
> >> Right. That tells me there no need to strictly serialize LAYOUTCOMMITs
> >> with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT
> >> atomically represents the state when the operation was prepared.
> >>
> >> That said, since we do want the LAYOUTCOMMIT to succeed, it would be
> beneficial
> >> for the client to reply to a CB_LAYOUTRECALL received while a conflicting
> >> LAYOUTCOMMIT is in progress with NFS4ERR_DELAY.
> > I agree. How about adding a new flag to nfsi->flags for this? We can use the same
> flag on to ensure serialization of multiple layoutcommit.
> nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this.
> >
>
> Sounds good in principle.
> Can you take a stab at a patch that does this?
OK, I will spend some time on it this week.

Thanks,
Tao
>
> Benny
>
> >>
> >> The server, on its side, should prevent a distributed deadlock by avoiding
> >> blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the
> same
> >> client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to
> >> return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed
> >> for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead
> to
> >> a live lock situation where neither the LAYOUTCOMMIT not the
> CB_LAYOUTRECALL
> >> complete.
> >>
> >> Benny
> >

2011-09-10 07:14:21

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-09 11:20, Trond Myklebust wrote:
> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
>> HI, Trond,
>>
>>> -----Original Message-----
>>> From: Myklebust, Trond [mailto:[email protected]]
>>> Sent: Friday, September 09, 2011 1:05 AM
>>> To: Peng Tao
>>> Cc: Peng, Tao; [email protected]; [email protected];
>>> [email protected]
>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>>
>>>> -----Original Message-----
>>>> From: Peng Tao [mailto:[email protected]]
>>>> Sent: Thursday, September 08, 2011 11:00 AM
>>>> To: Myklebust, Trond
>>>> Cc: [email protected]; [email protected];
>>>> [email protected]; [email protected]
>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at
>>>> nfs4_layoutcommit_release
>>>>
>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
>>>> <[email protected]> wrote:
>>>>>> -----Original Message-----
>>>>
>>>>>
>>>>> Yes, but as far as I can see, even in the blocks case there can be
>>>> multiple extents per layout segment. What if I write to one
>>>> uninitialised extent, layoutcommit, then write to another uninitialized
>>>> extent in the same layout segment and layoutcommit? In my reading of
>>>> the code, there is a chance that the second layoutcommit will fail to
>>>> pick up the layout segment, and so will fail to notify the MDS that the
>>>> second extent now contains data.
>>>>
>>>> blocklayout does not decide what to layoutcommit according to the lseg
>>>> list. Instead, it keeps track of each extent's state at the
>>>> granularity of blocksize, and encode whatever needs layoutcommitted in
>>>> the layoutcommit call. So in your above case, as long as the second
>>>> layoutcommit is issued, blocklayout will encode the newly written
>>>> extent in the second layoutcommit call, even if the lseg is not
>>>> attached to the second layoutcommit.
>>>>
>>>> But that leads to another two question:
>>>> 1. How do we protect against layoutrecall if lseg is not linked to
>>>> layoutcommit? For this one, can we just reject layoutrecall if there
>>>> is inflight layoutcommit? It will be less parallel but can guarantee
>>>> current implementation correctness.
>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several
>>>> layoutcommit calls and we don't have information in
>>>> cleanup_layoutcommit() on how many entry should be removed from
>>>> bl_committing. Maybe we can add a (void*) to struct
>>>> nfs4_layoutcommit_data, so that LD can pass some private information
>>>> between encode_layoutcommit() and cleanup_layoutcommit()?
>>>
>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor
>>> objects nor files cares?
>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected.
>>

The layout segments are not really in use while in LAYOUTCOMMIT.
We only need to get the stateid right with respect to concurrent layout recalls.

>>> IOW: if both objects and blocks track the information that they need for
>>> layoutcommit outside the layout segments, then why do we need the
>>> NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list at all?
>> If we remove NFS_LSEG_LAYOUTCOMMIT bit and pls_lc_list, we must find other method to protect agains freeing lseg while layoutcommit is needed or is going on. e.g., check for NFS_INO_LAYOUTCOMMIT bit and inflight layoutcommit in initiate_file_draining().
>
> The easiest solution is to ensure we have only _one_ LAYOUTCOMMIT on the
> wire at a time. Otherwise, you are looking at a many-to-many mapping
> between layoutcommits and lsegs.
>
> We should not expect to need multiple layoutcommits on the wire if pNFS
> is working smoothly (i.e. no layout recalls), so optimising for that
> case is wrong.
> I'd also think that we want to order layoutcommit and ordinary commits
> (for those NFS files servers that require both) so that we don't end up
> writing a file size change to disk before the actual data is committed.
>
> So why not just protect the layoutcommit call using the existing
> nfs_commit_set_lock/nfs_commit_clear_lock?
>

Sounds good to me,

Benny

2011-09-12 15:03:58

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <[email protected]> wrote:
> On 2011-09-09 11:20, Trond Myklebust wrote:
>> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
>>> HI, Trond,
>>>
>>>> -----Original Message-----
>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>> Sent: Friday, September 09, 2011 1:05 AM
>>>> To: Peng Tao
>>>> Cc: Peng, Tao; [email protected]; [email protected];
>>>> [email protected]
>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>>>
>>>>> -----Original Message-----
>>>>> From: Peng Tao [mailto:[email protected]]
>>>>> Sent: Thursday, September 08, 2011 11:00 AM
>>>>> To: Myklebust, Trond
>>>>> Cc: [email protected]; [email protected];
>>>>> [email protected]; [email protected]
>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at
>>>>> nfs4_layoutcommit_release
>>>>>
>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
>>>>> <[email protected]> wrote:
>>>>>>> -----Original Message-----
>>>>>
>>>>>>
>>>>>> Yes, but as far as I can see, even in the blocks case there can be
>>>>> multiple extents per layout segment. What if I write to one
>>>>> uninitialised extent, layoutcommit, then write to another uninitialized
>>>>> extent in the same layout segment and layoutcommit? In my reading of
>>>>> the code, there is a chance that the second layoutcommit will fail to
>>>>> pick up the layout segment, and so will fail to notify the MDS that the
>>>>> second extent now contains data.
>>>>>
>>>>> blocklayout does not decide what to layoutcommit according to the lseg
>>>>> list. Instead, it keeps track of each extent's state at the
>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in
>>>>> the layoutcommit call. So in your above case, as long as the second
>>>>> layoutcommit is issued, blocklayout will encode the newly written
>>>>> extent in the second layoutcommit call, even if the lseg is not
>>>>> attached to the second layoutcommit.
>>>>>
>>>>> But that leads to another two question:
>>>>> 1. How do we protect against layoutrecall if lseg is not linked to
>>>>> layoutcommit? For this one, can we just reject layoutrecall if there
>>>>> is inflight layoutcommit? It will be less parallel but can guarantee
>>>>> current implementation correctness.
>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several
>>>>> layoutcommit calls and we don't have information in
>>>>> cleanup_layoutcommit() on how many entry should be removed from
>>>>> bl_committing. Maybe we can add a (void*) to struct
>>>>> nfs4_layoutcommit_data, so that LD can pass some private information
>>>>> between encode_layoutcommit() and cleanup_layoutcommit()?
>>>>
>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks, nor
>>>> objects nor files cares?
>>> I believe it is for protecting against layoutrecall. But since we are seperating lseg and LD specific layout information management, it is actually not working as expected.
>>>
>
> The layout segments are not really in use while in LAYOUTCOMMIT.
> We only need to get the stateid right with respect to concurrent layout recalls.
LAYOUTCOMMIT takes lseg reference to mark them as in use so that
layoutrecall cannot free them.

--
Thanks,
Tao

2011-09-12 14:49:26

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Sat, Sep 10, 2011 at 1:46 AM, Vitaliy Gusev
<[email protected]> wrote:
>>>> Reported-by: Vitaliy Gusev<[email protected]>
>>>> Signed-off-by: Peng Tao<[email protected]>
>>>> ---
>>>> fs/nfs/nfs4proc.c | 2 ++
>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>>> index 8c77039..da7c20c 100644
>>>> --- a/fs/nfs/nfs4proc.c
>>>> +++ b/fs/nfs/nfs4proc.c
>>>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void
>>>> *calldata)
>>>> struct pnfs_layout_segment *lseg, *tmp;
>>>>
>>>> pnfs_cleanup_layoutcommit(data);
>>>> + spin_lock(&data->args.inode->i_lock);
>>>
>>> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because
>>
>> I put the spinlock outside of the loop because the critical section is
>> short enough and it should be faster than grabbing/dropping the inode
>> lock for every entry in the list, agree?
>
> Yes, you are right.
>
> Looked again I saw that issue is not so bad as seems.
> Really, what is result of this issue? Only that lseg is in data->lseg_list,
> but without set NFS_LSEG_LAYOUTCOMMIT. The put_lseg is called correctly at
> nfs4_layoutcommit_release. So there is no any bug.
Yes, you are right. In that case, we will have one lseg in the lc_list
list w/o NFS_LSEG_LAYOUTCOMMIT set but it doesn't hurt anyone. The
above patch of mine is not really necessary.

Thanks,
Tao

2011-09-13 08:33:00

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]]
> Sent: Tuesday, September 13, 2011 3:51 PM
> To: Trond Myklebust
> Cc: Peng Tao; Peng, Tao; [email protected]; [email protected];
> [email protected]
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>
> On 2011-09-12 14:10, Trond Myklebust wrote:
> > On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
> >> On 2011-09-12 07:56, Peng Tao wrote:
> >>>> The layout segments are not really in use while in LAYOUTCOMMIT.
> >>>> We only need to get the stateid right with respect to concurrent layout recalls.
> >>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
> >>> layoutrecall cannot free them.
> >>>
> >>
> >> And if layoutrecall would have freed layout segments during layoutcommit,
> >> what is your specific concern?
> >
> > That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
> > according to section 18.42.3 of RFC5661. I can't find anything in the
> > errata that changes that requirement.
> >
>
> Right. That tells me there no need to strictly serialize LAYOUTCOMMITs
> with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT
> atomically represents the state when the operation was prepared.
>
> That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial
> for the client to reply to a CB_LAYOUTRECALL received while a conflicting
> LAYOUTCOMMIT is in progress with NFS4ERR_DELAY.
I agree. How about adding a new flag to nfsi->flags for this? We can use the same flag on to ensure serialization of multiple layoutcommit. nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this.

>
> The server, on its side, should prevent a distributed deadlock by avoiding
> blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same
> client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to
> return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed
> for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to
> a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL
> complete.
>
> Benny

2011-09-13 07:03:00

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

> -----Original Message-----
> From: Benny Halevy [mailto:[email protected]]
> Sent: Tuesday, September 13, 2011 4:32 AM
> To: Peng Tao
> Cc: Trond Myklebust; Peng, Tao; [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>
> On 2011-09-12 07:56, Peng Tao wrote:
> > On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <[email protected]> wrote:
> >> On 2011-09-09 11:20, Trond Myklebust wrote:
> >>> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
> >>>> HI, Trond,
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Myklebust, Trond [mailto:[email protected]]
> >>>>> Sent: Friday, September 09, 2011 1:05 AM
> >>>>> To: Peng Tao
> >>>>> Cc: Peng, Tao; [email protected]; [email protected];
> >>>>> [email protected]
> >>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Peng Tao [mailto:[email protected]]
> >>>>>> Sent: Thursday, September 08, 2011 11:00 AM
> >>>>>> To: Myklebust, Trond
> >>>>>> Cc: [email protected]; [email protected];
> >>>>>> [email protected]; [email protected]
> >>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at
> >>>>>> nfs4_layoutcommit_release
> >>>>>>
> >>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
> >>>>>> <[email protected]> wrote:
> >>>>>>>> -----Original Message-----
> >>>>>>
> >>>>>>>
> >>>>>>> Yes, but as far as I can see, even in the blocks case there can be
> >>>>>> multiple extents per layout segment. What if I write to one
> >>>>>> uninitialised extent, layoutcommit, then write to another uninitialized
> >>>>>> extent in the same layout segment and layoutcommit? In my reading of
> >>>>>> the code, there is a chance that the second layoutcommit will fail to
> >>>>>> pick up the layout segment, and so will fail to notify the MDS that the
> >>>>>> second extent now contains data.
> >>>>>>
> >>>>>> blocklayout does not decide what to layoutcommit according to the lseg
> >>>>>> list. Instead, it keeps track of each extent's state at the
> >>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in
> >>>>>> the layoutcommit call. So in your above case, as long as the second
> >>>>>> layoutcommit is issued, blocklayout will encode the newly written
> >>>>>> extent in the second layoutcommit call, even if the lseg is not
> >>>>>> attached to the second layoutcommit.
> >>>>>>
> >>>>>> But that leads to another two question:
> >>>>>> 1. How do we protect against layoutrecall if lseg is not linked to
> >>>>>> layoutcommit? For this one, can we just reject layoutrecall if there
> >>>>>> is inflight layoutcommit? It will be less parallel but can guarantee
> >>>>>> current implementation correctness.
> >>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several
> >>>>>> layoutcommit calls and we don't have information in
> >>>>>> cleanup_layoutcommit() on how many entry should be removed from
> >>>>>> bl_committing. Maybe we can add a (void*) to struct
> >>>>>> nfs4_layoutcommit_data, so that LD can pass some private information
> >>>>>> between encode_layoutcommit() and cleanup_layoutcommit()?
> >>>>>
> >>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks,
> nor
> >>>>> objects nor files cares?
> >>>> I believe it is for protecting against layoutrecall. But since we are seperating
> lseg and LD specific layout information management, it is actually not working as
> expected.
> >>>>
> >>
> >> The layout segments are not really in use while in LAYOUTCOMMIT.
> >> We only need to get the stateid right with respect to concurrent layout recalls.
> > LAYOUTCOMMIT takes lseg reference to mark them as in use so that
> > layoutrecall cannot free them.
> >
>
> And if layoutrecall would have freed layout segments during layoutcommit,
> what is your specific concern?
In layoutcommit_release, blocklayout need to access the corresponding extents to convert their states. If the layout segments are freed by layoutrecall, it can cause problems.

>
> Benny

2011-09-13 07:51:01

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-12 14:10, Trond Myklebust wrote:
> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
>> On 2011-09-12 07:56, Peng Tao wrote:
>>>> The layout segments are not really in use while in LAYOUTCOMMIT.
>>>> We only need to get the stateid right with respect to concurrent layout recalls.
>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
>>> layoutrecall cannot free them.
>>>
>>
>> And if layoutrecall would have freed layout segments during layoutcommit,
>> what is your specific concern?
>
> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
> according to section 18.42.3 of RFC5661. I can't find anything in the
> errata that changes that requirement.
>

Right. That tells me there no need to strictly serialize LAYOUTCOMMITs
with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT
atomically represents the state when the operation was prepared.

That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial
for the client to reply to a CB_LAYOUTRECALL received while a conflicting
LAYOUTCOMMIT is in progress with NFS4ERR_DELAY.

The server, on its side, should prevent a distributed deadlock by avoiding
blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same
client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to
return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed
for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to
a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL
complete.

Benny

2011-09-13 07:59:30

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-13 00:02, [email protected] wrote:
>
>> -----Original Message-----
>> From: Benny Halevy [mailto:[email protected]]
>> Sent: Tuesday, September 13, 2011 4:32 AM
>> To: Peng Tao
>> Cc: Trond Myklebust; Peng, Tao; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>
>> On 2011-09-12 07:56, Peng Tao wrote:
>>> On Sat, Sep 10, 2011 at 3:14 PM, Benny Halevy <[email protected]> wrote:
>>>> On 2011-09-09 11:20, Trond Myklebust wrote:
>>>>> On Thu, 2011-09-08 at 23:11 -0400, [email protected] wrote:
>>>>>> HI, Trond,
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Myklebust, Trond [mailto:[email protected]]
>>>>>>> Sent: Friday, September 09, 2011 1:05 AM
>>>>>>> To: Peng Tao
>>>>>>> Cc: Peng, Tao; [email protected]; [email protected];
>>>>>>> [email protected]
>>>>>>> Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Peng Tao [mailto:[email protected]]
>>>>>>>> Sent: Thursday, September 08, 2011 11:00 AM
>>>>>>>> To: Myklebust, Trond
>>>>>>>> Cc: [email protected]; [email protected];
>>>>>>>> [email protected]; [email protected]
>>>>>>>> Subject: Re: [PATCH] nfs: fix inifinite loop at
>>>>>>>> nfs4_layoutcommit_release
>>>>>>>>
>>>>>>>> On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> -----Original Message-----
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, but as far as I can see, even in the blocks case there can be
>>>>>>>> multiple extents per layout segment. What if I write to one
>>>>>>>> uninitialised extent, layoutcommit, then write to another uninitialized
>>>>>>>> extent in the same layout segment and layoutcommit? In my reading of
>>>>>>>> the code, there is a chance that the second layoutcommit will fail to
>>>>>>>> pick up the layout segment, and so will fail to notify the MDS that the
>>>>>>>> second extent now contains data.
>>>>>>>>
>>>>>>>> blocklayout does not decide what to layoutcommit according to the lseg
>>>>>>>> list. Instead, it keeps track of each extent's state at the
>>>>>>>> granularity of blocksize, and encode whatever needs layoutcommitted in
>>>>>>>> the layoutcommit call. So in your above case, as long as the second
>>>>>>>> layoutcommit is issued, blocklayout will encode the newly written
>>>>>>>> extent in the second layoutcommit call, even if the lseg is not
>>>>>>>> attached to the second layoutcommit.
>>>>>>>>
>>>>>>>> But that leads to another two question:
>>>>>>>> 1. How do we protect against layoutrecall if lseg is not linked to
>>>>>>>> layoutcommit? For this one, can we just reject layoutrecall if there
>>>>>>>> is inflight layoutcommit? It will be less parallel but can guarantee
>>>>>>>> current implementation correctness.
>>>>>>>> 2. blocklayout ONLY: bl_committing may be overloaded by several
>>>>>>>> layoutcommit calls and we don't have information in
>>>>>>>> cleanup_layoutcommit() on how many entry should be removed from
>>>>>>>> bl_committing. Maybe we can add a (void*) to struct
>>>>>>>> nfs4_layoutcommit_data, so that LD can pass some private information
>>>>>>>> between encode_layoutcommit() and cleanup_layoutcommit()?
>>>>>>>
>>>>>>> 3. What is the purpose of pinning the layout segment at all if neither blocks,
>> nor
>>>>>>> objects nor files cares?
>>>>>> I believe it is for protecting against layoutrecall. But since we are seperating
>> lseg and LD specific layout information management, it is actually not working as
>> expected.
>>>>>>
>>>>
>>>> The layout segments are not really in use while in LAYOUTCOMMIT.
>>>> We only need to get the stateid right with respect to concurrent layout recalls.
>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
>>> layoutrecall cannot free them.
>>>
>>
>> And if layoutrecall would have freed layout segments during layoutcommit,
>> what is your specific concern?
> In layoutcommit_release, blocklayout need to access the corresponding extents to convert their states. If the layout segments are freed by layoutrecall, it can cause problems.
>

See my response to Trond on his previous message. I think the best thing to do is
to return NFS4ERR_DELAY if the gets a conflicting CB_LAYOUTRECALL while the
LAYOUTCOMMIT is in progress. The server may need to reject the LAYOUTCOMMIT
in this case to prevent a distributed deadlock so the client should be prepared
to retry.

Benny

>>
>> Benny

2011-09-08 10:00:50

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

Hi, Gusev,

> -----Original Message-----
> From: Vitaliy Gusev [mailto:[email protected]]
> Sent: Wednesday, September 07, 2011 6:14 AM
> To: Trond Myklebust
> Cc: Vitaliy Gusev; Peng, Tao; [email protected]
> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>
> >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode,
> struct list_head *listp)
> >>
> >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) {
> >> if (lseg->pls_range.iomode == IOMODE_RW&&
> >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
> >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&&
> >> + list_empty(&lseg->pls_lc_list))
> >> list_add(&lseg->pls_lc_list, listp);
> >> }
> >> }
> >
> > If the lseg is already part of one layoutcommit, but we're sending a
> > second one for the same range (presumably because we wrote more data in
> > the same region), then the above causes the lseg to be excluded.
>
>
> Yes, lseg is excluded, This patch does exactly only exclusion of lseg.
> lseg is used here only to get/put reference on this lseg, so skipping is
> correct.
>
>
> However, checking on list_empty can occur (on other CPU) in the middle:
>
> list_del_init(&lseg->pls_lc_list);
> Here >>>>>>
> if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
> &lseg->pls_flags))
> put_lseg(lseg);
>
>
> So list_del_init must be executed under the same lock as
> pnfs_list_write_lseg, i.e. inode->i_lock.
Yes, you are right. How about following patch?

>From 14c6da67565fb31c2d2775ccefd93251f348910d Mon Sep 17 00:00:00 2001
From: Peng Tao <[email protected]>
Date: Thu, 8 Sep 2011 00:57:02 -0400
Subject: [PATCH] nfsv4: fix race in layoutcommit lseg list create/free

Since there can be more than one layoutcommit proc happen the same time,
lseg list create/free should be protected. Otherwise lseg list
may get corrupted.

Reported-by: Vitaliy Gusev <[email protected]>
Signed-off-by: Peng Tao <[email protected]>
---
fs/nfs/nfs4proc.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8c77039..da7c20c 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void *calldata)
struct pnfs_layout_segment *lseg, *tmp;

pnfs_cleanup_layoutcommit(data);
+ spin_lock(&data->args.inode->i_lock);
/* Matched by references in pnfs_set_layoutcommit */
list_for_each_entry_safe(lseg, tmp, &data->lseg_list, pls_lc_list) {
list_del_init(&lseg->pls_lc_list);
@@ -5971,6 +5972,7 @@ static void nfs4_layoutcommit_release(void *calldata)
&lseg->pls_flags))
put_lseg(lseg);
}
+ spin_unlock(&data->args.inode->i_lock);
put_rpccred(data->cred);
kfree(data);
}
--
1.7.4.2

>
>
> >
> > I agree that the current code causes list corruption, but before
> > applying something like the above patch, I'd like to understand why it
> > is correct.
> >
> > Trond
> >
>

2011-09-09 18:02:17

by Vitaliy Gusev

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

>>> Reported-by: Vitaliy Gusev<[email protected]>
>>> Signed-off-by: Peng Tao<[email protected]>
>>> ---
>>> fs/nfs/nfs4proc.c | 2 ++
>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
>>> index 8c77039..da7c20c 100644
>>> --- a/fs/nfs/nfs4proc.c
>>> +++ b/fs/nfs/nfs4proc.c
>>> @@ -5964,6 +5964,7 @@ static void nfs4_layoutcommit_release(void
>>> *calldata)
>>> struct pnfs_layout_segment *lseg, *tmp;
>>>
>>> pnfs_cleanup_layoutcommit(data);
>>> + spin_lock(&data->args.inode->i_lock);
>>
>> I think lock over list_del_init(&lseg->pls_lc_list) is enough, because
> I put the spinlock outside of the loop because the critical section is
> short enough and it should be faster than grabbing/dropping the inode
> lock for every entry in the list, agree?

Yes, you are right.

Looked again I saw that issue is not so bad as seems.
Really, what is result of this issue? Only that lseg is in
data->lseg_list, but without set NFS_LSEG_LAYOUTCOMMIT. The put_lseg is
called correctly at nfs4_layoutcommit_release. So there is no any bug.

2011-09-06 22:33:23

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Wed, 2011-09-07 at 02:13 +0400, Vitaliy Gusev wrote:
> >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp)
> >>
> >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) {
> >> if (lseg->pls_range.iomode == IOMODE_RW&&
> >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
> >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&&
> >> + list_empty(&lseg->pls_lc_list))
> >> list_add(&lseg->pls_lc_list, listp);
> >> }
> >> }
> >
> > If the lseg is already part of one layoutcommit, but we're sending a
> > second one for the same range (presumably because we wrote more data in
> > the same region), then the above causes the lseg to be excluded.
>
>
> Yes, lseg is excluded, This patch does exactly only exclusion of lseg.
> lseg is used here only to get/put reference on this lseg, so skipping is
> correct.

Are you sure? As far as I can see, pnfs_list_write_lseg() actually
assigns the lseg to a particular layoutcommit call.

My questions are: if I write to an area described by that lseg _after_
it has been assigned to that layoutcommit RPC call (and the latter has
already been dispatched to the metadata server), then
A. shouldn't it be assigned to a second layoutcommit call too?
B. If not, what are we doing to ensure mutual exclusion between
layoutcommit RPC calls and pNFS writes to the data server?

> However, checking on list_empty can occur (on other CPU) in the middle:
>
> list_del_init(&lseg->pls_lc_list);
> Here >>>>>>
> if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
> &lseg->pls_flags))
> put_lseg(lseg);
>
>
> So list_del_init must be executed under the same lock as
> pnfs_list_write_lseg, i.e. inode->i_lock.

Agreed if my questions above make no sense.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com

2011-09-08 15:00:49

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On Thu, Sep 8, 2011 at 8:01 PM, Myklebust, Trond
<[email protected]> wrote:
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]]
>> Sent: Thursday, September 08, 2011 6:21 AM
>> To: Myklebust, Trond; [email protected]
>> Cc: [email protected]; [email protected]
>> Subject: RE: [PATCH] nfs: fix inifinite loop at
>> nfs4_layoutcommit_release
>>
>> Hi, Trond,
>>
>> Sorry for the late response.
>>
>> > -----Original Message-----
>> > From: Trond Myklebust [mailto:[email protected]]
>> > Sent: Wednesday, September 07, 2011 6:33 AM
>> > To: Vitaliy Gusev
>> > Cc: Vitaliy Gusev; Peng, Tao; [email protected]
>> > Subject: Re: [PATCH] nfs: fix inifinite loop at
>> nfs4_layoutcommit_release
>> >
>> > On Wed, 2011-09-07 at 02:13 +0400, Vitaliy Gusev wrote:
>> > > >> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct
>> inode *inode,
>> > struct list_head *listp)
>> > > >>
>> > > >> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs,
>> pls_list) {
>> > > >> if (lseg->pls_range.iomode == IOMODE_RW&&
>> > > >> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
>> > > >> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg-
>> >pls_flags)&&
>> > > >> + list_empty(&lseg->pls_lc_list))
>> > > >> list_add(&lseg->pls_lc_list, listp);
>> > > >> }
>> > > >> }
>> > > >
>> > > > If the lseg is already part of one layoutcommit, but we're
>> sending a
>> > > > second one for the same range (presumably because we wrote more
>> data in
>> > > > the same region), then the above causes the lseg to be excluded.
>> > >
>> > >
>> > > Yes, lseg is excluded, This patch does exactly only exclusion of
>> lseg.
>> > > lseg is used here only to get/put reference on this lseg, so
>> skipping is
>> > > correct.
>> >
>> > Are you sure? As far as I can see, pnfs_list_write_lseg() actually
>> > assigns the lseg to a particular layoutcommit call.
>> >
>> > My questions are: if I write to an area described by that lseg
>> _after_
>> > it has been assigned to that layoutcommit RPC call (and the latter
>> has
>> > already been dispatched to the metadata server), then
>> > A. shouldn't it be assigned to a second layoutcommit call too?
>> > B. If not, what are we doing to ensure mutual exclusion between
>> > layoutcommit RPC calls and pNFS writes to the data server?
>> I think it depends on the purpose of layoutcommit.
>> 1. for updating atime/mtime. In this case, we don't need mutual
>> exclusion
>
> Agreed.
>
>> between layoutcommit RPC and pNFS writes to data server.
>> 2. for updating LD specific state. In this case, LD should decide what
>> to commit
>> (and actually ignores the range of lseg). Take block layout for
>> example. It maintains
>> blocksized extent state inside LD and determines what to encode inside
>> layoutcommit
>> RPC whenever there is a generic layer layoutcommit call. So when an
>> extent is being
>> layoutcommitted, client can still write to the same range, as long as
>> the lseg is held by client.
>
> Yes, but as far as I can see, even in the blocks case there can be multiple extents per layout segment. What if I write to one uninitialised extent, layoutcommit, then write to another uninitialized extent in the same layout segment and layoutcommit? In my reading of the code, there is a chance that the second layoutcommit will fail to pick up the layout segment, and so will fail to notify the MDS that the second extent now contains data.

blocklayout does not decide what to layoutcommit according to the lseg
list. Instead, it keeps track of each extent's state at the
granularity of blocksize, and encode whatever needs layoutcommitted in
the layoutcommit call. So in your above case, as long as the second
layoutcommit is issued, blocklayout will encode the newly written
extent in the second layoutcommit call, even if the lseg is not
attached to the second layoutcommit.

But that leads to another two question:
1. How do we protect against layoutrecall if lseg is not linked to
layoutcommit? For this one, can we just reject layoutrecall if there
is inflight layoutcommit? It will be less parallel but can guarantee
current implementation correctness.
2. blocklayout ONLY: bl_committing may be overloaded by several
layoutcommit calls and we don't have information in
cleanup_layoutcommit() on how many entry should be removed from
bl_committing. Maybe we can add a (void*) to struct
nfs4_layoutcommit_data, so that LD can pass some private information
between encode_layoutcommit() and cleanup_layoutcommit()?

Thanks,
Tao

2011-09-14 06:43:22

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

On 2011-09-13 10:32, [email protected] wrote:
>> -----Original Message-----
>> From: Benny Halevy [mailto:[email protected]]
>> Sent: Tuesday, September 13, 2011 3:51 PM
>> To: Trond Myklebust
>> Cc: Peng Tao; Peng, Tao; [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release
>>
>> On 2011-09-12 14:10, Trond Myklebust wrote:
>>> On Mon, 2011-09-12 at 13:31 -0700, Benny Halevy wrote:
>>>> On 2011-09-12 07:56, Peng Tao wrote:
>>>>>> The layout segments are not really in use while in LAYOUTCOMMIT.
>>>>>> We only need to get the stateid right with respect to concurrent layout recalls.
>>>>> LAYOUTCOMMIT takes lseg reference to mark them as in use so that
>>>>> layoutrecall cannot free them.
>>>>>
>>>>
>>>> And if layoutrecall would have freed layout segments during layoutcommit,
>>>> what is your specific concern?
>>>
>>> That layoutcommit is supposed to return NFS4ERR_BAD_LAYOUT in that case
>>> according to section 18.42.3 of RFC5661. I can't find anything in the
>>> errata that changes that requirement.
>>>
>>
>> Right. That tells me there no need to strictly serialize LAYOUTCOMMITs
>> with CB_LAYOUTRECALL, as long as the layout stateid sent with LAYOUTCOMMIT
>> atomically represents the state when the operation was prepared.
>>
>> That said, since we do want the LAYOUTCOMMIT to succeed, it would be beneficial
>> for the client to reply to a CB_LAYOUTRECALL received while a conflicting
>> LAYOUTCOMMIT is in progress with NFS4ERR_DELAY.
> I agree. How about adding a new flag to nfsi->flags for this? We can use the same flag on to ensure serialization of multiple layoutcommit. nfs_commit_set_lock/nfs_commit_clear_lock may not fit for this.
>

Sounds good in principle.
Can you take a stab at a patch that does this?

Benny

>>
>> The server, on its side, should prevent a distributed deadlock by avoiding
>> blocking of a LAYOUTCOMMIT on an outstanding CB_LAYOUTRECALL for the same
>> client that sent the LAYOUTCOMMIT. I'm not sure what error would be best to
>> return. Maybe NFS4ERR_RECALL_CONFLICT if it would be allowed (it isn't listed
>> for LAYOUTCOMMIT at the moment). Just returning NFS4ER_DELAY might lead to
>> a live lock situation where neither the LAYOUTCOMMIT not the CB_LAYOUTRECALL
>> complete.
>>
>> Benny
>

2011-09-06 22:32:49

by Vitaliy Gusev

[permalink] [raw]

Subject: Re: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

>> @@ -1376,7 +1376,8 @@ static void pnfs_list_write_lseg(struct inode *inode, struct list_head *listp)
>>
>> list_for_each_entry(lseg,&NFS_I(inode)->layout->plh_segs, pls_list) {
>> if (lseg->pls_range.iomode == IOMODE_RW&&
>> - test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags))
>> + test_bit(NFS_LSEG_LAYOUTCOMMIT,&lseg->pls_flags)&&
>> + list_empty(&lseg->pls_lc_list))
>> list_add(&lseg->pls_lc_list, listp);
>> }
>> }
>
> If the lseg is already part of one layoutcommit, but we're sending a
> second one for the same range (presumably because we wrote more data in
> the same region), then the above causes the lseg to be excluded.

Yes, lseg is excluded, This patch does exactly only exclusion of lseg.
lseg is used here only to get/put reference on this lseg, so skipping is
correct.

However, checking on list_empty can occur (on other CPU) in the middle:

list_del_init(&lseg->pls_lc_list);
Here >>>>>>
if (test_and_clear_bit(NFS_LSEG_LAYOUTCOMMIT,
&lseg->pls_flags))
put_lseg(lseg);

So list_del_init must be executed under the same lock as
pnfs_list_write_lseg, i.e. inode->i_lock.

>
> I agree that the current code causes list corruption, but before
> applying something like the above patch, I'd like to understand why it
> is correct.
>
> Trond
>

2011-09-13 08:09:45

[permalink] [raw]

Subject: RE: [PATCH] nfs: fix inifinite loop at nfs4_layoutcommit_release

SGksIFRyb25kLA0KPiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1u
ZnMtb3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtbmZzLW93bmVyQHZnZXIua2Vy
bmVsLm9yZ10NCj4gT24gQmVoYWxmIE9mIFRyb25kIE15a2xlYnVzdA0KPiBTZW50OiBUdWVzZGF5
LCBTZXB0ZW1iZXIgMTMsIDIwMTEgNToxMSBBTQ0KPiBUbzogQmVubnkgSGFsZXZ5DQo+IENjOiBQ
ZW5nIFRhbzsgUGVuZywgVGFvOyBndXNldi52aXRhbGl5QG5leGVudGEuY29tOyBndXNldi52aXRh
bGl5QGdtYWlsLmNvbTsNCj4gbGludXgtbmZzQHZnZXIua2VybmVsLm9yZw0KPiBTdWJqZWN0OiBS
ZTogW1BBVENIXSBuZnM6IGZpeCBpbmlmaW5pdGUgbG9vcCBhdCBuZnM0X2xheW91dGNvbW1pdF9y
ZWxlYXNlDQo+IA0KPiBPbiBNb24sIDIwMTEtMDktMTIgYXQgMTM6MzEgLTA3MDAsIEJlbm55IEhh
bGV2eSB3cm90ZToNCj4gPiBPbiAyMDExLTA5LTEyIDA3OjU2LCBQZW5nIFRhbyB3cm90ZToNCj4g
PiA+PiBUaGUgbGF5b3V0IHNlZ21lbnRzIGFyZSBub3QgcmVhbGx5IGluIHVzZSB3aGlsZSBpbiBM
QVlPVVRDT01NSVQuDQo+ID4gPj4gV2Ugb25seSBuZWVkIHRvIGdldCB0aGUgc3RhdGVpZCByaWdo
dCB3aXRoIHJlc3BlY3QgdG8gY29uY3VycmVudCBsYXlvdXQgcmVjYWxscy4NCj4gPiA+IExBWU9V
VENPTU1JVCB0YWtlcyBsc2VnIHJlZmVyZW5jZSB0byBtYXJrIHRoZW0gYXMgaW4gdXNlIHNvIHRo
YXQNCj4gPiA+IGxheW91dHJlY2FsbCBjYW5ub3QgZnJlZSB0aGVtLg0KPiA+ID4NCj4gPg0KPiA+
IEFuZCBpZiBsYXlvdXRyZWNhbGwgd291bGQgaGF2ZSBmcmVlZCBsYXlvdXQgc2VnbWVudHMgZHVy
aW5nIGxheW91dGNvbW1pdCwNCj4gPiB3aGF0IGlzIHlvdXIgc3BlY2lmaWMgY29uY2Vybj8NCj4g
DQo+IFRoYXQgbGF5b3V0Y29tbWl0IGlzIHN1cHBvc2VkIHRvIHJldHVybiBORlM0RVJSX0JBRF9M
QVlPVVQgaW4gdGhhdCBjYXNlDQo+IGFjY29yZGluZyB0byBzZWN0aW9uIDE4LjQyLjMgb2YgUkZD
NTY2MS4gSSBjYW4ndCBmaW5kIGFueXRoaW5nIGluIHRoZQ0KPiBlcnJhdGEgdGhhdCBjaGFuZ2Vz
IHRoYXQgcmVxdWlyZW1lbnQuDQpEbyB5b3UgbWVhbiB0aGF0IGlmIGNsaWVudCBzZW5kcyBsYXlv
dXRjb21taXQgYXQgdGhlIHNhbWUgdGltZSBhcyBzZXJ2ZXIgc2VuZHMgbGF5b3V0cmVjYWxsIGZv
ciBvdmVybGFwcGVkIHJhbmdlLCB0aGVuDQpzZXJ2ZXIgbXVzdCByZWplY3QgbGF5b3V0Y29tbWl0
IHdpdGggTkZTNEVSUl9CQURMQVlPVVQgd2hlbiByZWNlaXZpbmcgbGF5b3V0Y29tbWl0Pw0KDQpS
RkM1NjYxIHNlY3Rpb24gMTguNDIuMyBzYXlzOg0KICAgSWYgdGhlIGxheW91dCBpZGVudGlmaWVk
IGluIHRoZSBhcmd1bWVudHMgZG9lcyBub3QgZXhpc3QsIHRoZSBlcnJvcg0KICAgTkZTNEVSUl9C
QURMQVlPVVQgaXMgcmV0dXJuZWQuICBUaGUgbGF5b3V0IGJlaW5nIGNvbW1pdHRlZCBtYXkgYWxz
bw0KICAgYmUgcmVqZWN0ZWQgaWYgaXQgZG9lcyBub3QgY29ycmVzcG9uZCB0byBhbiBleGlzdGlu
ZyBsYXlvdXQgd2l0aCBhbg0KICAgaW9tb2RlIG9mIExBWU9VVElPTU9ERTRfUlcuDQoNCklNSE8s
IGluIHRoZSBhYm92ZSBjYXNlLCBzZXJ2ZXIgY2Fubm90IGRlY2lkZSBpZiB0aGUgbGF5b3V0IGV4
aXN0cyB1bnRpbCBjbGllbnQgcmV0dXJucyByZXNwb25zZSB0byBsYXlvdXRyZWNhbGwuIEJ1dCBJ
IGNhbid0IGZpbmQNCmFueSByZXF1aXJlbWVudCBpbiBSRkM1NjYxIHRoYXQgc2VydmVyIG11c3Qg
d2FpdCBmb3IgbGF5b3V0cmVjYWxsJ3MgcmVzcG9uc2UgYmVmb3JlIHByb2Nlc3NpbmcgbGF5b3V0
Y29tbWl0Lg0KDQpUaGFua3MsDQpUYW8NCg0KTu+/ve+/ve+/ve+/ve+/vXLvv73vv71577+977+9
77+9Yu+/vVjvv73vv73Hp3bvv71e77+9Kd66ey5u77+9K++/ve+/ve+/ve+/vXvvv73vv73vv70i
77+977+9Xm7vv71y77+977+977+9eu+/vRrvv73vv71o77+977+977+977+9Ju+/ve+/vR7vv71H
77+977+977+9aO+/vQMo77+96ZqO77+93aJqIu+/ve+/vRrvv70bbe+/ve+/ve+/ve+/ve+/vXrv
v73elu+/ve+/ve+/vWbvv73vv73vv71o77+977+977+9fu+/vW3vv70=

2011-09-14 13:20:48