2016-07-28 18:41:12

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 0/2] Two fixups for generic/207

Refer to the monster thread here:
http://marc.info/?t=146895837900002&r=1&w=2

Thanks very much for your help with this Trond.

Benjamin Coddington (2):
pnfs/blocklayout: update last_write_offset in prepare_layoutcommit
pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is
outstanding

fs/nfs/blocklayout/extent_tree.c | 6 ++++--
fs/nfs/inode.c | 8 +++++---
2 files changed, 9 insertions(+), 5 deletions(-)

--
2.5.5



2016-07-28 18:41:12

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 2/2] pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding

A LAYOUTCOMMIT then subsequent GETATTR may both return the same attributes,
and in that case NFS_INO_INVALID_ATTR is never set on the second pass
through nfs_update_inode(). The existing check to skip the clearing of
NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this
case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated
if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is
outstanding then attributes will need upating, so always set
NFS_INO_INVALID_ATTR.

Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/inode.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f108d58101f8..bf4ec5ecc97e 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -1665,7 +1665,7 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
unsigned long now = jiffies;
unsigned long save_cache_validity;
bool have_writers = nfs_file_has_buffered_writers(nfsi);
- bool cache_revalidated;
+ bool cache_revalidated = true;

dfprintk(VFS, "NFS: %s(%s/%lu fh_crc=0x%08x ct=%d info=0x%x)\n",
__func__, inode->i_sb->s_id, inode->i_ino,
@@ -1714,8 +1714,10 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
/* Do atomic weak cache consistency updates */
invalid |= nfs_wcc_update_inode(inode, fattr);

-
- cache_revalidated = !pnfs_layoutcommit_outstanding(inode);
+ if (pnfs_layoutcommit_outstanding(inode)) {
+ nfsi->cache_validity |= save_cache_validity & NFS_INO_INVALID_ATTR;
+ cache_revalidated = false;
+ }

/* More cache consistency checks */
if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
--
2.5.5


2016-07-28 18:41:12

by Benjamin Coddington

[permalink] [raw]
Subject: [PATCH 1/2] pnfs/blocklayout: update last_write_offset in prepare_layoutcommit

Block/SCSI layout write completion may add committable extents to the
extent tree before updating the layout's last-written byte under the inode
lock. If a sync happens before this value is updated, then
prepare_layoutcommit may find and encode these extents which would produce
a LAYOUTCOMMIT request whose encoded extents are larger than the request's
loca_length.

Fix this by updating the last_write_offset to match the currently encoded
extents.

Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfs/blocklayout/extent_tree.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/blocklayout/extent_tree.c b/fs/nfs/blocklayout/extent_tree.c
index 992bcb19c11e..18ae1fd2175e 100644
--- a/fs/nfs/blocklayout/extent_tree.c
+++ b/fs/nfs/blocklayout/extent_tree.c
@@ -518,7 +518,7 @@ static __be32 *encode_scsi_range(struct pnfs_block_extent *be, __be32 *p)
}

static int ext_tree_encode_commit(struct pnfs_block_layout *bl, __be32 *p,
- size_t buffer_size, size_t *count)
+ size_t buffer_size, size_t *count, __u64 *lastbyte)
{
struct pnfs_block_extent *be;
int ret = 0;
@@ -541,6 +541,8 @@ static int ext_tree_encode_commit(struct pnfs_block_layout *bl, __be32 *p,
else
p = encode_block_extent(be, p);
be->be_tag = EXTENT_COMMITTING;
+ if ((ext_f_end(be) << SECTOR_SHIFT) - 1 > *lastbyte)
+ *lastbyte = (ext_f_end(be) << SECTOR_SHIFT) - 1;
}
spin_unlock(&bl->bl_ext_lock);

@@ -564,7 +566,7 @@ ext_tree_prepare_commit(struct nfs4_layoutcommit_args *arg)
arg->layoutupdate_pages = &arg->layoutupdate_page;

retry:
- ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, &count);
+ ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size, &count, &arg->lastbytewritten);
if (unlikely(ret)) {
ext_tree_free_commitdata(arg, buffer_size);

--
2.5.5


2016-07-28 18:47:48

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/2] pnfs/blocklayout: update last_write_offset in prepare_layoutcommit

T24gVGh1LCAyMDE2LTA3LTI4IGF0IDE0OjQxIC0wNDAwLCBCZW5qYW1pbiBDb2RkaW5ndG9uIHdy
b3RlOg0KPiBCbG9jay9TQ1NJIGxheW91dCB3cml0ZSBjb21wbGV0aW9uIG1heSBhZGQgY29tbWl0
dGFibGUgZXh0ZW50cyB0byB0aGUNCj4gZXh0ZW50IHRyZWUgYmVmb3JlIHVwZGF0aW5nIHRoZSBs
YXlvdXQncyBsYXN0LXdyaXR0ZW4gYnl0ZSB1bmRlciB0aGUNCj4gaW5vZGUNCj4gbG9jay7CoMKg
SWYgYSBzeW5jIGhhcHBlbnMgYmVmb3JlIHRoaXMgdmFsdWUgaXMgdXBkYXRlZCwgdGhlbg0KPiBw
cmVwYXJlX2xheW91dGNvbW1pdCBtYXkgZmluZCBhbmQgZW5jb2RlIHRoZXNlIGV4dGVudHMgd2hp
Y2ggd291bGQNCj4gcHJvZHVjZQ0KPiBhIExBWU9VVENPTU1JVCByZXF1ZXN0IHdob3NlIGVuY29k
ZWQgZXh0ZW50cyBhcmUgbGFyZ2VyIHRoYW4gdGhlDQo+IHJlcXVlc3Qncw0KPiBsb2NhX2xlbmd0
aC4NCj4gDQo+IEZpeCB0aGlzIGJ5IHVwZGF0aW5nIHRoZSBsYXN0X3dyaXRlX29mZnNldCB0byBt
YXRjaCB0aGUgY3VycmVudGx5DQo+IGVuY29kZWQNCj4gZXh0ZW50cy4NCj4gDQo+IFNpZ25lZC1v
ZmYtYnk6IEJlbmphbWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJlZGhhdC5jb20+DQo+IC0tLQ0K
PiDCoGZzL25mcy9ibG9ja2xheW91dC9leHRlbnRfdHJlZS5jIHwgNiArKysrLS0NCj4gwqAxIGZp
bGUgY2hhbmdlZCwgNCBpbnNlcnRpb25zKCspLCAyIGRlbGV0aW9ucygtKQ0KPiANCj4gZGlmZiAt
LWdpdCBhL2ZzL25mcy9ibG9ja2xheW91dC9leHRlbnRfdHJlZS5jDQo+IGIvZnMvbmZzL2Jsb2Nr
bGF5b3V0L2V4dGVudF90cmVlLmMNCj4gaW5kZXggOTkyYmNiMTljMTFlLi4xOGFlMWZkMjE3NWUg
MTAwNjQ0DQo+IC0tLSBhL2ZzL25mcy9ibG9ja2xheW91dC9leHRlbnRfdHJlZS5jDQo+ICsrKyBi
L2ZzL25mcy9ibG9ja2xheW91dC9leHRlbnRfdHJlZS5jDQo+IEBAIC01MTgsNyArNTE4LDcgQEAg
c3RhdGljIF9fYmUzMiAqZW5jb2RlX3Njc2lfcmFuZ2Uoc3RydWN0DQo+IHBuZnNfYmxvY2tfZXh0
ZW50ICpiZSwgX19iZTMyICpwKQ0KPiDCoH0NCj4gwqANCj4gwqBzdGF0aWMgaW50IGV4dF90cmVl
X2VuY29kZV9jb21taXQoc3RydWN0IHBuZnNfYmxvY2tfbGF5b3V0ICpibCwNCj4gX19iZTMyICpw
LA0KPiAtCQlzaXplX3QgYnVmZmVyX3NpemUsIHNpemVfdCAqY291bnQpDQo+ICsJCXNpemVfdCBi
dWZmZXJfc2l6ZSwgc2l6ZV90ICpjb3VudCwgX191NjQgKmxhc3RieXRlKQ0KPiDCoHsNCj4gwqAJ
c3RydWN0IHBuZnNfYmxvY2tfZXh0ZW50ICpiZTsNCj4gwqAJaW50IHJldCA9IDA7DQo+IEBAIC01
NDEsNiArNTQxLDggQEAgc3RhdGljIGludCBleHRfdHJlZV9lbmNvZGVfY29tbWl0KHN0cnVjdA0K
PiBwbmZzX2Jsb2NrX2xheW91dCAqYmwsIF9fYmUzMiAqcCwNCj4gwqAJCWVsc2UNCj4gwqAJCQlw
ID0gZW5jb2RlX2Jsb2NrX2V4dGVudChiZSwgcCk7DQo+IMKgCQliZS0+YmVfdGFnID0gRVhURU5U
X0NPTU1JVFRJTkc7DQo+ICsJCWlmICgoZXh0X2ZfZW5kKGJlKSA8PCBTRUNUT1JfU0hJRlQpIC0g
MSA+ICpsYXN0Ynl0ZSkNCj4gKwkJCSpsYXN0Ynl0ZSA9IChleHRfZl9lbmQoYmUpIDw8IFNFQ1RP
Ul9TSElGVCkgDQo+IC0gMTsNCg0KV29uJ3QgdGhpcyBjYXVzZSB0aGUgZmlsZSBzaXplIHRvIGJl
IGFsd2F5cyBzZWN0b3IgYWxpZ25lZCBvbiB0aGUNCnNlcnZlcj8gSSB3YXMgYXNzdW1pbmcgdGhh
dCB3ZSB3b3VsZCBoYXZlIHRvIHN0b3JlIHRoZSBsYXN0Ynl0ZQ0KYXRvbWljYWxseSB3aXRoIHNl
dHRpbmcgdXAgdGhlIGNvbW1pdCBpbsKgZXh0X3RyZWVfbWFya193cml0dGVuKCkuDQoNCj4gwqAJ
fQ0KPiDCoAlzcGluX3VubG9jaygmYmwtPmJsX2V4dF9sb2NrKTsNCj4gwqANCj4gQEAgLTU2NCw3
ICs1NjYsNyBAQCBleHRfdHJlZV9wcmVwYXJlX2NvbW1pdChzdHJ1Y3QNCj4gbmZzNF9sYXlvdXRj
b21taXRfYXJncyAqYXJnKQ0KPiDCoAlhcmctPmxheW91dHVwZGF0ZV9wYWdlcyA9ICZhcmctPmxh
eW91dHVwZGF0ZV9wYWdlOw0KPiDCoA0KPiDCoHJldHJ5Og0KPiAtCXJldCA9IGV4dF90cmVlX2Vu
Y29kZV9jb21taXQoYmwsIHN0YXJ0X3AgKyAxLCBidWZmZXJfc2l6ZSwNCj4gJmNvdW50KTsNCj4g
KwlyZXQgPSBleHRfdHJlZV9lbmNvZGVfY29tbWl0KGJsLCBzdGFydF9wICsgMSwgYnVmZmVyX3Np
emUsDQo+ICZjb3VudCwgJmFyZy0+bGFzdGJ5dGV3cml0dGVuKTsNCj4gwqAJaWYgKHVubGlrZWx5
KHJldCkpIHsNCj4gwqAJCWV4dF90cmVlX2ZyZWVfY29tbWl0ZGF0YShhcmcsIGJ1ZmZlcl9zaXpl
KTsNCj4gwqA=


2016-07-28 18:48:18

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 2/2] pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding

T24gVGh1LCAyMDE2LTA3LTI4IGF0IDE0OjQxIC0wNDAwLCBCZW5qYW1pbiBDb2RkaW5ndG9uIHdy
b3RlOg0KPiBBIExBWU9VVENPTU1JVCB0aGVuIHN1YnNlcXVlbnQgR0VUQVRUUiBtYXkgYm90aCBy
ZXR1cm4gdGhlIHNhbWUNCj4gYXR0cmlidXRlcywNCj4gYW5kIGluIHRoYXQgY2FzZSBORlNfSU5P
X0lOVkFMSURfQVRUUiBpcyBuZXZlciBzZXQgb24gdGhlIHNlY29uZCBwYXNzDQo+IHRocm91Z2gg
bmZzX3VwZGF0ZV9pbm9kZSgpLsKgwqBUaGUgZXhpc3RpbmcgY2hlY2sgdG8gc2tpcCB0aGUgY2xl
YXJpbmcNCj4gb2YNCj4gTkZTX0lOT19JTlZBTElEX0FUVFIgaWYgYSBMQVlPVVRDT01NSVQgaXMg
b3V0c3RhbmRpbmcgZG9lcyBub3QgaGVscA0KPiBpbiB0aGlzDQo+IGNhc2UgKHNlZSBjb21taXQg
MTBiN2U5YWQ0NDg4OiAicE5GUzogRG9uJ3QgbWFyayB0aGUgaW5vZGUgYXMNCj4gcmV2YWxpZGF0
ZWQNCj4gaWYgYSBMQVlPVVRDT01NSVQgaXMgb3V0c3RhbmRpbmciKS7CoMKgV2Uga25vdyB0aGF0
IGlmIGEgTEFZT1VUQ09NTUlUDQo+IGlzDQo+IG91dHN0YW5kaW5nIHRoZW4gYXR0cmlidXRlcyB3
aWxsIG5lZWQgdXBhdGluZywgc28gYWx3YXlzIHNldA0KPiBORlNfSU5PX0lOVkFMSURfQVRUUi4N
Cj4gDQo+IFNpZ25lZC1vZmYtYnk6IEJlbmphbWluIENvZGRpbmd0b24gPGJjb2RkaW5nQHJlZGhh
dC5jb20+DQo+IC0tLQ0KPiDCoGZzL25mcy9pbm9kZS5jIHwgOCArKysrKy0tLQ0KPiDCoDEgZmls
ZSBjaGFuZ2VkLCA1IGluc2VydGlvbnMoKyksIDMgZGVsZXRpb25zKC0pDQo+IA0KPiBkaWZmIC0t
Z2l0IGEvZnMvbmZzL2lub2RlLmMgYi9mcy9uZnMvaW5vZGUuYw0KPiBpbmRleCBmMTA4ZDU4MTAx
ZjguLmJmNGVjNWVjYzk3ZSAxMDA2NDQNCj4gLS0tIGEvZnMvbmZzL2lub2RlLmMNCj4gKysrIGIv
ZnMvbmZzL2lub2RlLmMNCj4gQEAgLTE2NjUsNyArMTY2NSw3IEBAIHN0YXRpYyBpbnQgbmZzX3Vw
ZGF0ZV9pbm9kZShzdHJ1Y3QgaW5vZGUNCj4gKmlub2RlLCBzdHJ1Y3QgbmZzX2ZhdHRyICpmYXR0
cikNCj4gwqAJdW5zaWduZWQgbG9uZyBub3cgPSBqaWZmaWVzOw0KPiDCoAl1bnNpZ25lZCBsb25n
IHNhdmVfY2FjaGVfdmFsaWRpdHk7DQo+IMKgCWJvb2wgaGF2ZV93cml0ZXJzID0gbmZzX2ZpbGVf
aGFzX2J1ZmZlcmVkX3dyaXRlcnMobmZzaSk7DQo+IC0JYm9vbCBjYWNoZV9yZXZhbGlkYXRlZDsN
Cj4gKwlib29sIGNhY2hlX3JldmFsaWRhdGVkID0gdHJ1ZTsNCj4gwqANCj4gwqAJZGZwcmludGso
VkZTLCAiTkZTOiAlcyglcy8lbHUgZmhfY3JjPTB4JTA4eCBjdD0lZA0KPiBpbmZvPTB4JXgpXG4i
LA0KPiDCoAkJCV9fZnVuY19fLCBpbm9kZS0+aV9zYi0+c19pZCwgaW5vZGUtPmlfaW5vLA0KPiBA
QCAtMTcxNCw4ICsxNzE0LDEwIEBAIHN0YXRpYyBpbnQgbmZzX3VwZGF0ZV9pbm9kZShzdHJ1Y3Qg
aW5vZGUNCj4gKmlub2RlLCBzdHJ1Y3QgbmZzX2ZhdHRyICpmYXR0cikNCj4gwqAJLyogRG8gYXRv
bWljIHdlYWsgY2FjaGUgY29uc2lzdGVuY3kgdXBkYXRlcyAqLw0KPiDCoAlpbnZhbGlkIHw9IG5m
c193Y2NfdXBkYXRlX2lub2RlKGlub2RlLCBmYXR0cik7DQo+IMKgDQo+IC0NCj4gLQljYWNoZV9y
ZXZhbGlkYXRlZCA9ICFwbmZzX2xheW91dGNvbW1pdF9vdXRzdGFuZGluZyhpbm9kZSk7DQo+ICsJ
aWYgKHBuZnNfbGF5b3V0Y29tbWl0X291dHN0YW5kaW5nKGlub2RlKSkgew0KPiArCQluZnNpLT5j
YWNoZV92YWxpZGl0eSB8PSBzYXZlX2NhY2hlX3ZhbGlkaXR5ICYNCj4gTkZTX0lOT19JTlZBTElE
X0FUVFI7DQo+ICsJCWNhY2hlX3JldmFsaWRhdGVkID0gZmFsc2U7DQo+ICsJfQ0KPiDCoA0KPiDC
oAkvKiBNb3JlIGNhY2hlIGNvbnNpc3RlbmN5IGNoZWNrcyAqLw0KPiDCoAlpZiAoZmF0dHItPnZh
bGlkICYgTkZTX0FUVFJfRkFUVFJfQ0hBTkdFKSB7DQoNClRoYW5rcyEgQXBwbGllZCB0byBsaW51
eC1uZXh0Li4uDQo=


2016-07-28 20:01:18

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH 1/2] pnfs/blocklayout: update last_write_offset in prepare_layoutcommit


On 28 Jul 2016, at 14:47, Trond Myklebust wrote:

> On Thu, 2016-07-28 at 14:41 -0400, Benjamin Coddington wrote:
>> Block/SCSI layout write completion may add committable extents to the
>> extent tree before updating the layout's last-written byte under the
>> inode
>> lock.  If a sync happens before this value is updated, then
>> prepare_layoutcommit may find and encode these extents which would
>> produce
>> a LAYOUTCOMMIT request whose encoded extents are larger than the
>> request's
>> loca_length.
>>
>> Fix this by updating the last_write_offset to match the currently
>> encoded
>> extents.
>>
>> Signed-off-by: Benjamin Coddington <[email protected]>
>> ---
>>  fs/nfs/blocklayout/extent_tree.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/nfs/blocklayout/extent_tree.c
>> b/fs/nfs/blocklayout/extent_tree.c
>> index 992bcb19c11e..18ae1fd2175e 100644
>> --- a/fs/nfs/blocklayout/extent_tree.c
>> +++ b/fs/nfs/blocklayout/extent_tree.c
>> @@ -518,7 +518,7 @@ static __be32 *encode_scsi_range(struct
>> pnfs_block_extent *be, __be32 *p)
>>  }
>>  
>>  static int ext_tree_encode_commit(struct pnfs_block_layout *bl,
>> __be32 *p,
>> - size_t buffer_size, size_t *count)
>> + size_t buffer_size, size_t *count, __u64 *lastbyte)
>>  {
>>   struct pnfs_block_extent *be;
>>   int ret = 0;
>> @@ -541,6 +541,8 @@ static int ext_tree_encode_commit(struct
>> pnfs_block_layout *bl, __be32 *p,
>>   else
>>   p = encode_block_extent(be, p);
>>   be->be_tag = EXTENT_COMMITTING;
>> + if ((ext_f_end(be) << SECTOR_SHIFT) - 1 > *lastbyte)
>> + *lastbyte = (ext_f_end(be) << SECTOR_SHIFT)
>> - 1;
>
> Won't this cause the file size to be always sector aligned on the
> server? I was assuming that we would have to store the lastbyte
> atomically with setting up the commit in ext_tree_mark_written().

You're right. It is incorrect. I'll fix it.

Ben

2016-07-29 14:00:04

by Benjamin Coddington

[permalink] [raw]
Subject: Re: [PATCH 1/2] pnfs/blocklayout: update last_write_offset in prepare_layoutcommit



On 28 Jul 2016, at 16:03, Benjamin Coddington wrote:

> On 28 Jul 2016, at 14:47, Trond Myklebust wrote:
>
>> On Thu, 2016-07-28 at 14:41 -0400, Benjamin Coddington wrote:
>>> Block/SCSI layout write completion may add committable extents to the
>>> extent tree before updating the layout's last-written byte under the
>>> inode
>>> lock.  If a sync happens before this value is updated, then
>>> prepare_layoutcommit may find and encode these extents which would
>>> produce
>>> a LAYOUTCOMMIT request whose encoded extents are larger than the
>>> request's
>>> loca_length.
>>>
>>> Fix this by updating the last_write_offset to match the currently
>>> encoded
>>> extents.
>>>
>>> Signed-off-by: Benjamin Coddington <[email protected]>
>>> ---
>>>  fs/nfs/blocklayout/extent_tree.c | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/nfs/blocklayout/extent_tree.c
>>> b/fs/nfs/blocklayout/extent_tree.c
>>> index 992bcb19c11e..18ae1fd2175e 100644
>>> --- a/fs/nfs/blocklayout/extent_tree.c
>>> +++ b/fs/nfs/blocklayout/extent_tree.c
>>> @@ -518,7 +518,7 @@ static __be32 *encode_scsi_range(struct
>>> pnfs_block_extent *be, __be32 *p)
>>>  }
>>>  
>>>  static int ext_tree_encode_commit(struct pnfs_block_layout *bl,
>>> __be32 *p,
>>> - size_t buffer_size, size_t *count)
>>> + size_t buffer_size, size_t *count, __u64 *lastbyte)
>>>  {
>>>   struct pnfs_block_extent *be;
>>>   int ret = 0;
>>> @@ -541,6 +541,8 @@ static int ext_tree_encode_commit(struct
>>> pnfs_block_layout *bl, __be32 *p,
>>>   else
>>>   p = encode_block_extent(be, p);
>>>   be->be_tag = EXTENT_COMMITTING;
>>> + if ((ext_f_end(be) << SECTOR_SHIFT) - 1 > *lastbyte)
>>> + *lastbyte = (ext_f_end(be) << SECTOR_SHIFT)
>>> - 1;
>>
>> Won't this cause the file size to be always sector aligned on the
>> server? I was assuming that we would have to store the lastbyte
>> atomically with setting up the commit in ext_tree_mark_written().
>
> You're right. It is incorrect. I'll fix it.

This turns out to be a little trickier than I thought, and so is taking me
longer than I have time at the moment. Due to travel, I'll have to come
back to this week after next. Thanks for catching my stupid mistake.

Ben