Hi,
we have a customer who has noticed a significant regression in NFS
performance between SLES10 (2.6.16+) and SLES11 (3.0+).
The use case involves two different processes on the same client mmapping a
file and making changes.
In 2.6.16, all these changes would be cached on the client until a flush or
close or other normal process flushed the changes out - much the same as if
just one process was accessing the file.
In 3.0 the file is flushed out every time a different process performs an
update. So if they interleave accesses, you get a flush-to-the-server on
every access.
This is caused by nfs_flush_incompatible. It deems access through different
file descriptors to be incompatible. However I don't think this is
justified. Accessed using different credentials can reasonably be seen as
incompatible(*), but if two processes with identical credentials open the
same file there there should be not need to differentiate between them.
The patch below changes the test in nfs_flush_incompatible to just test the
credentials (and the dentry, though that might be superfluous - it seems
safer).
(*) In this specific customer's case the process do currently have slightly
different credentials. In one, the primary gid is included in the list of
auxiliary gids. In the other the primary gid is excluded. This is enough
to make the credentials appear different so this patch doesn't help in that
case. They are looking at being more consistent in their setgroups usage
but it would help them if we only actually compared the credentials that
were important. In this case there is no important difference.
Would it make sense to treat a more broad credential as compatible with a
more narrow credential? So a write from a process with a given
uid,gid,grouplist could ultimately be sent to the server with same uid/gid,
but a subset of the group list?
Thoughts?
Thanks,
NeilBrown
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 7816801..b0d4e47 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -860,7 +860,12 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
if (req == NULL)
return 0;
l_ctx = req->wb_lock_context;
- do_flush = req->wb_page != page || req->wb_context != ctx;
+ do_flush = req->wb_page != page;
+ if (req->wb_context != ctx) {
+ do_flush |=
+ req->wb_context->dentry != ctx->dentry ||
+ req->wb_context->cred != ctx->cred;
+ }
if (l_ctx) {
do_flush |= l_ctx->lockowner.l_owner != current->files
|| l_ctx->lockowner.l_pid != current->tgid;
T24gV2VkLCAyMDEzLTA5LTA0IGF0IDExOjM2ICsxMDAwLCBOZWlsQnJvd24gd3JvdGU6DQo+IEhp
LA0KPiAgd2UgaGF2ZSBhIGN1c3RvbWVyIHdobyBoYXMgbm90aWNlZCBhIHNpZ25pZmljYW50IHJl
Z3Jlc3Npb24gaW4gTkZTDQo+ICBwZXJmb3JtYW5jZSBiZXR3ZWVuIFNMRVMxMCAoMi42LjE2Kykg
YW5kIFNMRVMxMSAoMy4wKykuDQo+IA0KPiAgVGhlIHVzZSBjYXNlIGludm9sdmVzIHR3byBkaWZm
ZXJlbnQgcHJvY2Vzc2VzIG9uIHRoZSBzYW1lIGNsaWVudCBtbWFwcGluZyBhDQo+ICBmaWxlIGFu
ZCBtYWtpbmcgY2hhbmdlcy4NCj4gDQo+ICBJbiAyLjYuMTYsIGFsbCB0aGVzZSBjaGFuZ2VzIHdv
dWxkIGJlIGNhY2hlZCBvbiB0aGUgY2xpZW50IHVudGlsIGEgZmx1c2ggb3INCj4gIGNsb3NlIG9y
IG90aGVyIG5vcm1hbCBwcm9jZXNzIGZsdXNoZWQgdGhlIGNoYW5nZXMgb3V0IC0gbXVjaCB0aGUg
c2FtZSBhcyBpZg0KPiAganVzdCBvbmUgcHJvY2VzcyB3YXMgYWNjZXNzaW5nIHRoZSBmaWxlLg0K
PiANCj4gIEluIDMuMCB0aGUgZmlsZSBpcyBmbHVzaGVkIG91dCBldmVyeSB0aW1lIGEgZGlmZmVy
ZW50IHByb2Nlc3MgcGVyZm9ybXMgYW4NCj4gIHVwZGF0ZS4gU28gaWYgdGhleSBpbnRlcmxlYXZl
IGFjY2Vzc2VzLCB5b3UgZ2V0IGEgZmx1c2gtdG8tdGhlLXNlcnZlciBvbg0KPiAgZXZlcnkgYWNj
ZXNzLg0KPiANCj4gIFRoaXMgaXMgY2F1c2VkIGJ5IG5mc19mbHVzaF9pbmNvbXBhdGlibGUuICBJ
dCBkZWVtcyBhY2Nlc3MgdGhyb3VnaCBkaWZmZXJlbnQNCj4gIGZpbGUgZGVzY3JpcHRvcnMgdG8g
YmUgaW5jb21wYXRpYmxlLiAgSG93ZXZlciBJIGRvbid0IHRoaW5rIHRoaXMgaXMNCj4gIGp1c3Rp
ZmllZC4gIEFjY2Vzc2VkIHVzaW5nIGRpZmZlcmVudCBjcmVkZW50aWFscyBjYW4gcmVhc29uYWJs
eSBiZSBzZWVuIGFzDQo+ICBpbmNvbXBhdGlibGUoKiksIGJ1dCBpZiB0d28gcHJvY2Vzc2VzIHdp
dGggaWRlbnRpY2FsIGNyZWRlbnRpYWxzIG9wZW4gdGhlDQo+ICBzYW1lIGZpbGUgdGhlcmUgdGhl
cmUgc2hvdWxkIGJlIG5vdCBuZWVkIHRvIGRpZmZlcmVudGlhdGUgYmV0d2VlbiB0aGVtLg0KPiAN
Cj4gIFRoZSBwYXRjaCBiZWxvdyBjaGFuZ2VzIHRoZSB0ZXN0IGluIG5mc19mbHVzaF9pbmNvbXBh
dGlibGUgdG8ganVzdCB0ZXN0IHRoZQ0KPiAgY3JlZGVudGlhbHMgKGFuZCB0aGUgZGVudHJ5LCB0
aG91Z2ggdGhhdCBtaWdodCBiZSBzdXBlcmZsdW91cyAtIGl0IHNlZW1zDQo+ICBzYWZlcikuDQo+
IA0KPiAoKikgSW4gdGhpcyBzcGVjaWZpYyBjdXN0b21lcidzIGNhc2UgdGhlIHByb2Nlc3MgZG8g
Y3VycmVudGx5IGhhdmUgc2xpZ2h0bHkNCj4gIGRpZmZlcmVudCBjcmVkZW50aWFscy4gIEluIG9u
ZSwgdGhlIHByaW1hcnkgZ2lkIGlzIGluY2x1ZGVkIGluIHRoZSBsaXN0IG9mDQo+ICBhdXhpbGlh
cnkgZ2lkcy4gIEluIHRoZSBvdGhlciB0aGUgcHJpbWFyeSBnaWQgaXMgZXhjbHVkZWQuICBUaGlz
IGlzIGVub3VnaA0KPiAgdG8gbWFrZSB0aGUgY3JlZGVudGlhbHMgYXBwZWFyIGRpZmZlcmVudCBz
byB0aGlzIHBhdGNoIGRvZXNuJ3QgaGVscCBpbiB0aGF0DQo+ICBjYXNlLiAgVGhleSBhcmUgbG9v
a2luZyBhdCBiZWluZyBtb3JlIGNvbnNpc3RlbnQgaW4gdGhlaXIgc2V0Z3JvdXBzIHVzYWdlDQo+
ICBidXQgaXQgd291bGQgaGVscCB0aGVtIGlmIHdlIG9ubHkgYWN0dWFsbHkgY29tcGFyZWQgdGhl
IGNyZWRlbnRpYWxzIHRoYXQNCj4gIHdlcmUgaW1wb3J0YW50LiAgSW4gdGhpcyBjYXNlIHRoZXJl
IGlzIG5vIGltcG9ydGFudCBkaWZmZXJlbmNlLg0KPiAgV291bGQgaXQgbWFrZSBzZW5zZSB0byB0
cmVhdCBhIG1vcmUgYnJvYWQgY3JlZGVudGlhbCBhcyBjb21wYXRpYmxlIHdpdGggYQ0KPiAgbW9y
ZSBuYXJyb3cgY3JlZGVudGlhbD8gIFNvIGEgd3JpdGUgZnJvbSBhIHByb2Nlc3Mgd2l0aCBhIGdp
dmVuDQo+ICB1aWQsZ2lkLGdyb3VwbGlzdCBjb3VsZCB1bHRpbWF0ZWx5IGJlIHNlbnQgdG8gdGhl
IHNlcnZlciB3aXRoIHNhbWUgdWlkL2dpZCwNCj4gIGJ1dCBhIHN1YnNldCBvZiB0aGUgZ3JvdXAg
bGlzdD8NCj4gIA0KPiANCj4gIFRob3VnaHRzPw0KPiANCj4gVGhhbmtzLA0KPiBOZWlsQnJvd24N
Cj4gDQo+IA0KPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL3dyaXRlLmMgYi9mcy9uZnMvd3JpdGUuYw0K
PiBpbmRleCA3ODE2ODAxLi5iMGQ0ZTQ3IDEwMDY0NA0KPiAtLS0gYS9mcy9uZnMvd3JpdGUuYw0K
PiArKysgYi9mcy9uZnMvd3JpdGUuYw0KPiBAQCAtODYwLDcgKzg2MCwxMiBAQCBpbnQgbmZzX2Zs
dXNoX2luY29tcGF0aWJsZShzdHJ1Y3QgZmlsZSAqZmlsZSwgc3RydWN0IHBhZ2UgKnBhZ2UpDQo+
ICAJCWlmIChyZXEgPT0gTlVMTCkNCj4gIAkJCXJldHVybiAwOw0KPiAgCQlsX2N0eCA9IHJlcS0+
d2JfbG9ja19jb250ZXh0Ow0KPiAtCQlkb19mbHVzaCA9IHJlcS0+d2JfcGFnZSAhPSBwYWdlIHx8
IHJlcS0+d2JfY29udGV4dCAhPSBjdHg7DQo+ICsJCWRvX2ZsdXNoID0gcmVxLT53Yl9wYWdlICE9
IHBhZ2U7DQo+ICsJCWlmIChyZXEtPndiX2NvbnRleHQgIT0gY3R4KSB7DQo+ICsJCQlkb19mbHVz
aCB8PQ0KPiArCQkJCXJlcS0+d2JfY29udGV4dC0+ZGVudHJ5ICE9IGN0eC0+ZGVudHJ5IHx8DQo+
ICsJCQkJcmVxLT53Yl9jb250ZXh0LT5jcmVkICE9IGN0eC0+Y3JlZDsNCj4gKwkJfQ0KPiAgCQlp
ZiAobF9jdHgpIHsNCj4gIAkJCWRvX2ZsdXNoIHw9IGxfY3R4LT5sb2Nrb3duZXIubF9vd25lciAh
PSBjdXJyZW50LT5maWxlcw0KPiAgCQkJCXx8IGxfY3R4LT5sb2Nrb3duZXIubF9waWQgIT0gY3Vy
cmVudC0+dGdpZDsNCg0KSGkgTmVpbCwNCg0KV29uJ3QgdGhpcyBjYXVzZSBhbnkgd3JpdGUgZXJy
b3JzIHRvIGJlIHByb3BhZ2F0ZWQgdG8gdGhlIHdyb25nIHByb2Nlc3M/DQoNCkNoZWVycw0KICBU
cm9uZA0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0K
DQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K
T24gVGh1LCAyMDEzLTA5LTA1IGF0IDA5OjI5ICsxMDAwLCBOZWlsQnJvd24gd3JvdGU6DQo+IE9u
IFdlZCwgNCBTZXAgMjAxMyAxNDozMzozNSArMDAwMCAiTXlrbGVidXN0LCBUcm9uZCINCj4gPFRy
b25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gV2VkLCAyMDEzLTA5
LTA0IGF0IDExOjM2ICsxMDAwLCBOZWlsQnJvd24gd3JvdGU6DQo+ID4gPiBIaSwNCj4gPiA+ICB3
ZSBoYXZlIGEgY3VzdG9tZXIgd2hvIGhhcyBub3RpY2VkIGEgc2lnbmlmaWNhbnQgcmVncmVzc2lv
biBpbiBORlMNCj4gPiA+ICBwZXJmb3JtYW5jZSBiZXR3ZWVuIFNMRVMxMCAoMi42LjE2KykgYW5k
IFNMRVMxMSAoMy4wKykuDQo+ID4gPiANCj4gPiA+ICBUaGUgdXNlIGNhc2UgaW52b2x2ZXMgdHdv
IGRpZmZlcmVudCBwcm9jZXNzZXMgb24gdGhlIHNhbWUgY2xpZW50IG1tYXBwaW5nIGENCj4gPiA+
ICBmaWxlIGFuZCBtYWtpbmcgY2hhbmdlcy4NCj4gPiA+IA0KPiA+ID4gIEluIDIuNi4xNiwgYWxs
IHRoZXNlIGNoYW5nZXMgd291bGQgYmUgY2FjaGVkIG9uIHRoZSBjbGllbnQgdW50aWwgYSBmbHVz
aCBvcg0KPiA+ID4gIGNsb3NlIG9yIG90aGVyIG5vcm1hbCBwcm9jZXNzIGZsdXNoZWQgdGhlIGNo
YW5nZXMgb3V0IC0gbXVjaCB0aGUgc2FtZSBhcyBpZg0KPiA+ID4gIGp1c3Qgb25lIHByb2Nlc3Mg
d2FzIGFjY2Vzc2luZyB0aGUgZmlsZS4NCj4gPiA+IA0KPiA+ID4gIEluIDMuMCB0aGUgZmlsZSBp
cyBmbHVzaGVkIG91dCBldmVyeSB0aW1lIGEgZGlmZmVyZW50IHByb2Nlc3MgcGVyZm9ybXMgYW4N
Cj4gPiA+ICB1cGRhdGUuIFNvIGlmIHRoZXkgaW50ZXJsZWF2ZSBhY2Nlc3NlcywgeW91IGdldCBh
IGZsdXNoLXRvLXRoZS1zZXJ2ZXIgb24NCj4gPiA+ICBldmVyeSBhY2Nlc3MuDQo+ID4gPiANCj4g
PiA+ICBUaGlzIGlzIGNhdXNlZCBieSBuZnNfZmx1c2hfaW5jb21wYXRpYmxlLiAgSXQgZGVlbXMg
YWNjZXNzIHRocm91Z2ggZGlmZmVyZW50DQo+ID4gPiAgZmlsZSBkZXNjcmlwdG9ycyB0byBiZSBp
bmNvbXBhdGlibGUuICBIb3dldmVyIEkgZG9uJ3QgdGhpbmsgdGhpcyBpcw0KPiA+ID4gIGp1c3Rp
ZmllZC4gIEFjY2Vzc2VkIHVzaW5nIGRpZmZlcmVudCBjcmVkZW50aWFscyBjYW4gcmVhc29uYWJs
eSBiZSBzZWVuIGFzDQo+ID4gPiAgaW5jb21wYXRpYmxlKCopLCBidXQgaWYgdHdvIHByb2Nlc3Nl
cyB3aXRoIGlkZW50aWNhbCBjcmVkZW50aWFscyBvcGVuIHRoZQ0KPiA+ID4gIHNhbWUgZmlsZSB0
aGVyZSB0aGVyZSBzaG91bGQgYmUgbm90IG5lZWQgdG8gZGlmZmVyZW50aWF0ZSBiZXR3ZWVuIHRo
ZW0uDQo+ID4gPiANCj4gPiA+ICBUaGUgcGF0Y2ggYmVsb3cgY2hhbmdlcyB0aGUgdGVzdCBpbiBu
ZnNfZmx1c2hfaW5jb21wYXRpYmxlIHRvIGp1c3QgdGVzdCB0aGUNCj4gPiA+ICBjcmVkZW50aWFs
cyAoYW5kIHRoZSBkZW50cnksIHRob3VnaCB0aGF0IG1pZ2h0IGJlIHN1cGVyZmx1b3VzIC0gaXQg
c2VlbXMNCj4gPiA+ICBzYWZlcikuDQo+ID4gPiANCj4gPiA+ICgqKSBJbiB0aGlzIHNwZWNpZmlj
IGN1c3RvbWVyJ3MgY2FzZSB0aGUgcHJvY2VzcyBkbyBjdXJyZW50bHkgaGF2ZSBzbGlnaHRseQ0K
PiA+ID4gIGRpZmZlcmVudCBjcmVkZW50aWFscy4gIEluIG9uZSwgdGhlIHByaW1hcnkgZ2lkIGlz
IGluY2x1ZGVkIGluIHRoZSBsaXN0IG9mDQo+ID4gPiAgYXV4aWxpYXJ5IGdpZHMuICBJbiB0aGUg
b3RoZXIgdGhlIHByaW1hcnkgZ2lkIGlzIGV4Y2x1ZGVkLiAgVGhpcyBpcyBlbm91Z2gNCj4gPiA+
ICB0byBtYWtlIHRoZSBjcmVkZW50aWFscyBhcHBlYXIgZGlmZmVyZW50IHNvIHRoaXMgcGF0Y2gg
ZG9lc24ndCBoZWxwIGluIHRoYXQNCj4gPiA+ICBjYXNlLiAgVGhleSBhcmUgbG9va2luZyBhdCBi
ZWluZyBtb3JlIGNvbnNpc3RlbnQgaW4gdGhlaXIgc2V0Z3JvdXBzIHVzYWdlDQo+ID4gPiAgYnV0
IGl0IHdvdWxkIGhlbHAgdGhlbSBpZiB3ZSBvbmx5IGFjdHVhbGx5IGNvbXBhcmVkIHRoZSBjcmVk
ZW50aWFscyB0aGF0DQo+ID4gPiAgd2VyZSBpbXBvcnRhbnQuICBJbiB0aGlzIGNhc2UgdGhlcmUg
aXMgbm8gaW1wb3J0YW50IGRpZmZlcmVuY2UuDQo+ID4gPiAgV291bGQgaXQgbWFrZSBzZW5zZSB0
byB0cmVhdCBhIG1vcmUgYnJvYWQgY3JlZGVudGlhbCBhcyBjb21wYXRpYmxlIHdpdGggYQ0KPiA+
ID4gIG1vcmUgbmFycm93IGNyZWRlbnRpYWw/ICBTbyBhIHdyaXRlIGZyb20gYSBwcm9jZXNzIHdp
dGggYSBnaXZlbg0KPiA+ID4gIHVpZCxnaWQsZ3JvdXBsaXN0IGNvdWxkIHVsdGltYXRlbHkgYmUg
c2VudCB0byB0aGUgc2VydmVyIHdpdGggc2FtZSB1aWQvZ2lkLA0KPiA+ID4gIGJ1dCBhIHN1YnNl
dCBvZiB0aGUgZ3JvdXAgbGlzdD8NCj4gPiA+ICANCj4gPiA+IA0KPiA+ID4gIFRob3VnaHRzPw0K
PiA+ID4gDQo+ID4gPiBUaGFua3MsDQo+ID4gPiBOZWlsQnJvd24NCj4gPiA+IA0KPiA+ID4gDQo+
ID4gPiBkaWZmIC0tZ2l0IGEvZnMvbmZzL3dyaXRlLmMgYi9mcy9uZnMvd3JpdGUuYw0KPiA+ID4g
aW5kZXggNzgxNjgwMS4uYjBkNGU0NyAxMDA2NDQNCj4gPiA+IC0tLSBhL2ZzL25mcy93cml0ZS5j
DQo+ID4gPiArKysgYi9mcy9uZnMvd3JpdGUuYw0KPiA+ID4gQEAgLTg2MCw3ICs4NjAsMTIgQEAg
aW50IG5mc19mbHVzaF9pbmNvbXBhdGlibGUoc3RydWN0IGZpbGUgKmZpbGUsIHN0cnVjdCBwYWdl
ICpwYWdlKQ0KPiA+ID4gIAkJaWYgKHJlcSA9PSBOVUxMKQ0KPiA+ID4gIAkJCXJldHVybiAwOw0K
PiA+ID4gIAkJbF9jdHggPSByZXEtPndiX2xvY2tfY29udGV4dDsNCj4gPiA+IC0JCWRvX2ZsdXNo
ID0gcmVxLT53Yl9wYWdlICE9IHBhZ2UgfHwgcmVxLT53Yl9jb250ZXh0ICE9IGN0eDsNCj4gPiA+
ICsJCWRvX2ZsdXNoID0gcmVxLT53Yl9wYWdlICE9IHBhZ2U7DQo+ID4gPiArCQlpZiAocmVxLT53
Yl9jb250ZXh0ICE9IGN0eCkgew0KPiA+ID4gKwkJCWRvX2ZsdXNoIHw9DQo+ID4gPiArCQkJCXJl
cS0+d2JfY29udGV4dC0+ZGVudHJ5ICE9IGN0eC0+ZGVudHJ5IHx8DQo+ID4gPiArCQkJCXJlcS0+
d2JfY29udGV4dC0+Y3JlZCAhPSBjdHgtPmNyZWQ7DQo+ID4gPiArCQl9DQo+ID4gPiAgCQlpZiAo
bF9jdHgpIHsNCj4gPiA+ICAJCQlkb19mbHVzaCB8PSBsX2N0eC0+bG9ja293bmVyLmxfb3duZXIg
IT0gY3VycmVudC0+ZmlsZXMNCj4gPiA+ICAJCQkJfHwgbF9jdHgtPmxvY2tvd25lci5sX3BpZCAh
PSBjdXJyZW50LT50Z2lkOw0KPiA+IA0KPiA+IEhpIE5laWwsDQo+ID4gDQo+ID4gV29uJ3QgdGhp
cyBjYXVzZSBhbnkgd3JpdGUgZXJyb3JzIHRvIGJlIHByb3BhZ2F0ZWQgdG8gdGhlIHdyb25nIHBy
b2Nlc3M/DQo+IA0KPiBJJ20gbm90IHN1cmUgd2hhdCB0aGlzIG1lYW5zLg0KPiANCj4gRm9yIGEg
bG9jYWwgZmlsZXN5c3RlbSwgdGhlIG9ubHkgZXJyb3IgdGhhdCBpc24ndCBzeW5jaHJvbm91cyBp
cyBFSU8gYW5kIGFuDQo+IGFwcGxpY2F0aW9uIGNhbiBvbmx5IGJlIHN1cmUgb2YgZ2V0dGluZyB0
aGF0IGlmIGl0IHVzZXMgJ2ZzeW5jJy4gIEFuDQo+IGFwcGxpY2F0aW9uIGNhbiBnZXQgRUlPIGZy
b20gZnN5bmMgZXZlbiBpZiBpdCBkaWRuJ3Qgd3JpdGUgdG8gdGhlIHBhZ2Ugd2hpY2gNCj4gaXMg
Y2F1c2luZyB0aGUgcHJvYmxlbS4gIEFsbCBhcHBzIHNob3VsZCBiZSBhYmxlIHRvIHNlZSB0aGUg
RUlPIGlmIHRoZXkNCj4gYm90aGVyIHRvIGZzeW5jLg0KDQpZb3UgYXJlIGRlc2NyaWJpbmcgYSBj
b25zZXF1ZW5jZSBvZiB0aGUgY29tcGxldGUgYW5kIHV0dGVyIGJyb2tlbm5lc3Mgb2YNCnRoZSBi
bG9jayBsYXllci4gVGhlIGJsb2NrIGxheWVyIGRvZXNuJ3QgdHJhY2sgd2hvIHdyb3RlIHdoYXQg
aW50byB0aGUNCnBhZ2UgY2FjaGUsIGFuZCBzbyBpdCBqdXN0IHBhc3NlcyBhbnkgd3JpdGUgZXJy
b3JzIHRvIHdoaWNoZXZlciBwcm9jZXNzDQp0aGF0IGZpcnN0IGNhbGxzIGZpbGVtYXBfY2hlY2tf
ZXJyb3JzKCkgZm9yIHRoYXQgZmlsZS4NCg0KSW4gTkZTLCB3ZSB1c2UgdGhlIG5mc19vcGVuX2Nv
bnRleHQgdG8gdHJhY2sgZXhhY3RseSB3aG8gd3JvdGUgd2hhdCwgYW5kDQp1c2UgdGhhdCB0byBu
b3RpZnkgdGhlIHNwZWNpZmljIHByb2Nlc3MgdGhhdCB3cm90ZSB0aGUgZGF0YSB0aGF0IGZhaWxl
ZC4NCg0KPiBGb3IgTkZTLCBFRFFVT1QgYW5kIEVOT1NQQyBhcmUgYWxzbyBhc3luY2hyb25vdXMs
IGFuZCB0aGV5IGNhbiBhbHNvIGNvbWUgaW4NCj4gdGhyb3VnaCBjbG9zZSgpLiAgSSB0aGluayBl
dmVyeSBhcHAgc2hvdWxkIHN0aWxsIHNlZSBldmVyeSBlcnJvciB3aGV0aGVyICBpdA0KPiBkaXJ0
aWVkIHRoZSBwYWdlIHdoaWNoIHRyaWdnZXJlZCB0aGUgZXJyb3Igb3Igbm90Lg0KDQpXaXRoIHRo
aXMgcGF0Y2gsIHdlIGNhbiBlbmQgdXAgd3JpdGluZyB0aGUgZGF0YSB1c2luZyB0aGUgd3JvbmcN
Cm5mc19vcGVuX2NvbnRleHQsIGluIHdoaWNoIGNhc2UgdGhlIGVycm9ycyB3aWxsIGdldCByZXBv
cnRlZCB0byB0aGUNCnByb2Nlc3MgdGhhdCBhY3R1YWxseSBvd25zIHRoYXQgbmZzX29wZW5fY29u
dGV4dC4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5l
cg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K
On Thu, 5 Sep 2013 03:17:54 +0000 "Myklebust, Trond"
<[email protected]> wrote:
> On Thu, 2013-09-05 at 09:29 +1000, NeilBrown wrote:
> > On Wed, 4 Sep 2013 14:33:35 +0000 "Myklebust, Trond"
> > <[email protected]> wrote:
> >
> > > On Wed, 2013-09-04 at 11:36 +1000, NeilBrown wrote:
> > > > Hi,
> > > > we have a customer who has noticed a significant regression in NFS
> > > > performance between SLES10 (2.6.16+) and SLES11 (3.0+).
> > > >
> > > > The use case involves two different processes on the same client mmapping a
> > > > file and making changes.
> > > >
> > > > In 2.6.16, all these changes would be cached on the client until a flush or
> > > > close or other normal process flushed the changes out - much the same as if
> > > > just one process was accessing the file.
> > > >
> > > > In 3.0 the file is flushed out every time a different process performs an
> > > > update. So if they interleave accesses, you get a flush-to-the-server on
> > > > every access.
> > > >
> > > > This is caused by nfs_flush_incompatible. It deems access through different
> > > > file descriptors to be incompatible. However I don't think this is
> > > > justified. Accessed using different credentials can reasonably be seen as
> > > > incompatible(*), but if two processes with identical credentials open the
> > > > same file there there should be not need to differentiate between them.
> > > >
> > > > The patch below changes the test in nfs_flush_incompatible to just test the
> > > > credentials (and the dentry, though that might be superfluous - it seems
> > > > safer).
> > > >
> > > > (*) In this specific customer's case the process do currently have slightly
> > > > different credentials. In one, the primary gid is included in the list of
> > > > auxiliary gids. In the other the primary gid is excluded. This is enough
> > > > to make the credentials appear different so this patch doesn't help in that
> > > > case. They are looking at being more consistent in their setgroups usage
> > > > but it would help them if we only actually compared the credentials that
> > > > were important. In this case there is no important difference.
> > > > Would it make sense to treat a more broad credential as compatible with a
> > > > more narrow credential? So a write from a process with a given
> > > > uid,gid,grouplist could ultimately be sent to the server with same uid/gid,
> > > > but a subset of the group list?
> > > >
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks,
> > > > NeilBrown
> > > >
> > > >
> > > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > > index 7816801..b0d4e47 100644
> > > > --- a/fs/nfs/write.c
> > > > +++ b/fs/nfs/write.c
> > > > @@ -860,7 +860,12 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
> > > > if (req == NULL)
> > > > return 0;
> > > > l_ctx = req->wb_lock_context;
> > > > - do_flush = req->wb_page != page || req->wb_context != ctx;
> > > > + do_flush = req->wb_page != page;
> > > > + if (req->wb_context != ctx) {
> > > > + do_flush |=
> > > > + req->wb_context->dentry != ctx->dentry ||
> > > > + req->wb_context->cred != ctx->cred;
> > > > + }
> > > > if (l_ctx) {
> > > > do_flush |= l_ctx->lockowner.l_owner != current->files
> > > > || l_ctx->lockowner.l_pid != current->tgid;
> > >
> > > Hi Neil,
> > >
> > > Won't this cause any write errors to be propagated to the wrong process?
> >
> > I'm not sure what this means.
> >
> > For a local filesystem, the only error that isn't synchronous is EIO and an
> > application can only be sure of getting that if it uses 'fsync'. An
> > application can get EIO from fsync even if it didn't write to the page which
> > is causing the problem. All apps should be able to see the EIO if they
> > bother to fsync.
>
> You are describing a consequence of the complete and utter brokenness of
> the block layer. The block layer doesn't track who wrote what into the
> page cache, and so it just passes any write errors to whichever process
> that first calls filemap_check_errors() for that file.
I would have said "pragmatic design" rather than "complete ... brokenness".
If a write has failed then the file is corrupt and every writer deserves to
know.
If multiple processes try to fsync the same file, pages that fail for one
process will likely fail for the next process so each should get told of the
failure.
(actually... I wonder if I'm missing something. If you have a Dirty page
which results in an Error every time it is written, is there any way for
it to be removed from the page cache, or will it remain there causing every
'fsync' on the file to try to write the same page and get an error?)
>
> In NFS, we use the nfs_open_context to track exactly who wrote what, and
> use that to notify the specific process that wrote the data that failed.
While this could be seen as an improvement, the extra guarantees are only
available on NFS, so apps are not likely to depend on them, so the extra value
doesn't actually help anyone - does it?
And it has a clear cost: two processes on the one machine can suffer
significantly degraded performance.
Is the theoretical improvement in semantics worth that cost.
>
> > For NFS, EDQUOT and ENOSPC are also asynchronous, and they can also come in
> > through close(). I think every app should still see every error whether it
> > dirtied the page which triggered the error or not.
>
> With this patch, we can end up writing the data using the wrong
> nfs_open_context, in which case the errors will get reported to the
> process that actually owns that nfs_open_context.
>
I see yes. I would want the error to be returned to whoever instigated the
'flush', not whoever issues the 'write'.
Thanks,
NeilBrown
T24gVHVlLCAyMDEzLTA5LTEwIGF0IDE0OjQyICsxMDAwLCBOZWlsQnJvd24gd3JvdGU6DQo+IE9u
IFRodSwgNSBTZXAgMjAxMyAwMzoxNzo1NCArMDAwMCAiTXlrbGVidXN0LCBUcm9uZCINCj4gPFRy
b25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gVGh1LCAyMDEzLTA5
LTA1IGF0IDA5OjI5ICsxMDAwLCBOZWlsQnJvd24gd3JvdGU6DQo+ID4gPiBPbiBXZWQsIDQgU2Vw
IDIwMTMgMTQ6MzM6MzUgKzAwMDAgIk15a2xlYnVzdCwgVHJvbmQiDQo+ID4gPiA8VHJvbmQuTXlr
bGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ID4gDQo+ID4gPiA+IE9uIFdlZCwgMjAxMy0w
OS0wNCBhdCAxMTozNiArMTAwMCwgTmVpbEJyb3duIHdyb3RlOg0KPiA+ID4gPiA+IEhpLA0KPiA+
ID4gPiA+ICB3ZSBoYXZlIGEgY3VzdG9tZXIgd2hvIGhhcyBub3RpY2VkIGEgc2lnbmlmaWNhbnQg
cmVncmVzc2lvbiBpbiBORlMNCj4gPiA+ID4gPiAgcGVyZm9ybWFuY2UgYmV0d2VlbiBTTEVTMTAg
KDIuNi4xNispIGFuZCBTTEVTMTEgKDMuMCspLg0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+ICBUaGUg
dXNlIGNhc2UgaW52b2x2ZXMgdHdvIGRpZmZlcmVudCBwcm9jZXNzZXMgb24gdGhlIHNhbWUgY2xp
ZW50IG1tYXBwaW5nIGENCj4gPiA+ID4gPiAgZmlsZSBhbmQgbWFraW5nIGNoYW5nZXMuDQo+ID4g
PiA+ID4gDQo+ID4gPiA+ID4gIEluIDIuNi4xNiwgYWxsIHRoZXNlIGNoYW5nZXMgd291bGQgYmUg
Y2FjaGVkIG9uIHRoZSBjbGllbnQgdW50aWwgYSBmbHVzaCBvcg0KPiA+ID4gPiA+ICBjbG9zZSBv
ciBvdGhlciBub3JtYWwgcHJvY2VzcyBmbHVzaGVkIHRoZSBjaGFuZ2VzIG91dCAtIG11Y2ggdGhl
IHNhbWUgYXMgaWYNCj4gPiA+ID4gPiAganVzdCBvbmUgcHJvY2VzcyB3YXMgYWNjZXNzaW5nIHRo
ZSBmaWxlLg0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+ICBJbiAzLjAgdGhlIGZpbGUgaXMgZmx1c2hl
ZCBvdXQgZXZlcnkgdGltZSBhIGRpZmZlcmVudCBwcm9jZXNzIHBlcmZvcm1zIGFuDQo+ID4gPiA+
ID4gIHVwZGF0ZS4gU28gaWYgdGhleSBpbnRlcmxlYXZlIGFjY2Vzc2VzLCB5b3UgZ2V0IGEgZmx1
c2gtdG8tdGhlLXNlcnZlciBvbg0KPiA+ID4gPiA+ICBldmVyeSBhY2Nlc3MuDQo+ID4gPiA+ID4g
DQo+ID4gPiA+ID4gIFRoaXMgaXMgY2F1c2VkIGJ5IG5mc19mbHVzaF9pbmNvbXBhdGlibGUuICBJ
dCBkZWVtcyBhY2Nlc3MgdGhyb3VnaCBkaWZmZXJlbnQNCj4gPiA+ID4gPiAgZmlsZSBkZXNjcmlw
dG9ycyB0byBiZSBpbmNvbXBhdGlibGUuICBIb3dldmVyIEkgZG9uJ3QgdGhpbmsgdGhpcyBpcw0K
PiA+ID4gPiA+ICBqdXN0aWZpZWQuICBBY2Nlc3NlZCB1c2luZyBkaWZmZXJlbnQgY3JlZGVudGlh
bHMgY2FuIHJlYXNvbmFibHkgYmUgc2VlbiBhcw0KPiA+ID4gPiA+ICBpbmNvbXBhdGlibGUoKiks
IGJ1dCBpZiB0d28gcHJvY2Vzc2VzIHdpdGggaWRlbnRpY2FsIGNyZWRlbnRpYWxzIG9wZW4gdGhl
DQo+ID4gPiA+ID4gIHNhbWUgZmlsZSB0aGVyZSB0aGVyZSBzaG91bGQgYmUgbm90IG5lZWQgdG8g
ZGlmZmVyZW50aWF0ZSBiZXR3ZWVuIHRoZW0uDQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gIFRoZSBw
YXRjaCBiZWxvdyBjaGFuZ2VzIHRoZSB0ZXN0IGluIG5mc19mbHVzaF9pbmNvbXBhdGlibGUgdG8g
anVzdCB0ZXN0IHRoZQ0KPiA+ID4gPiA+ICBjcmVkZW50aWFscyAoYW5kIHRoZSBkZW50cnksIHRo
b3VnaCB0aGF0IG1pZ2h0IGJlIHN1cGVyZmx1b3VzIC0gaXQgc2VlbXMNCj4gPiA+ID4gPiAgc2Fm
ZXIpLg0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+ICgqKSBJbiB0aGlzIHNwZWNpZmljIGN1c3RvbWVy
J3MgY2FzZSB0aGUgcHJvY2VzcyBkbyBjdXJyZW50bHkgaGF2ZSBzbGlnaHRseQ0KPiA+ID4gPiA+
ICBkaWZmZXJlbnQgY3JlZGVudGlhbHMuICBJbiBvbmUsIHRoZSBwcmltYXJ5IGdpZCBpcyBpbmNs
dWRlZCBpbiB0aGUgbGlzdCBvZg0KPiA+ID4gPiA+ICBhdXhpbGlhcnkgZ2lkcy4gIEluIHRoZSBv
dGhlciB0aGUgcHJpbWFyeSBnaWQgaXMgZXhjbHVkZWQuICBUaGlzIGlzIGVub3VnaA0KPiA+ID4g
PiA+ICB0byBtYWtlIHRoZSBjcmVkZW50aWFscyBhcHBlYXIgZGlmZmVyZW50IHNvIHRoaXMgcGF0
Y2ggZG9lc24ndCBoZWxwIGluIHRoYXQNCj4gPiA+ID4gPiAgY2FzZS4gIFRoZXkgYXJlIGxvb2tp
bmcgYXQgYmVpbmcgbW9yZSBjb25zaXN0ZW50IGluIHRoZWlyIHNldGdyb3VwcyB1c2FnZQ0KPiA+
ID4gPiA+ICBidXQgaXQgd291bGQgaGVscCB0aGVtIGlmIHdlIG9ubHkgYWN0dWFsbHkgY29tcGFy
ZWQgdGhlIGNyZWRlbnRpYWxzIHRoYXQNCj4gPiA+ID4gPiAgd2VyZSBpbXBvcnRhbnQuICBJbiB0
aGlzIGNhc2UgdGhlcmUgaXMgbm8gaW1wb3J0YW50IGRpZmZlcmVuY2UuDQo+ID4gPiA+ID4gIFdv
dWxkIGl0IG1ha2Ugc2Vuc2UgdG8gdHJlYXQgYSBtb3JlIGJyb2FkIGNyZWRlbnRpYWwgYXMgY29t
cGF0aWJsZSB3aXRoIGENCj4gPiA+ID4gPiAgbW9yZSBuYXJyb3cgY3JlZGVudGlhbD8gIFNvIGEg
d3JpdGUgZnJvbSBhIHByb2Nlc3Mgd2l0aCBhIGdpdmVuDQo+ID4gPiA+ID4gIHVpZCxnaWQsZ3Jv
dXBsaXN0IGNvdWxkIHVsdGltYXRlbHkgYmUgc2VudCB0byB0aGUgc2VydmVyIHdpdGggc2FtZSB1
aWQvZ2lkLA0KPiA+ID4gPiA+ICBidXQgYSBzdWJzZXQgb2YgdGhlIGdyb3VwIGxpc3Q/DQo+ID4g
PiA+ID4gIA0KPiA+ID4gPiA+IA0KPiA+ID4gPiA+ICBUaG91Z2h0cz8NCj4gPiA+ID4gPiANCj4g
PiA+ID4gPiBUaGFua3MsDQo+ID4gPiA+ID4gTmVpbEJyb3duDQo+ID4gPiA+ID4gDQo+ID4gPiA+
ID4gDQo+ID4gPiA+ID4gZGlmZiAtLWdpdCBhL2ZzL25mcy93cml0ZS5jIGIvZnMvbmZzL3dyaXRl
LmMNCj4gPiA+ID4gPiBpbmRleCA3ODE2ODAxLi5iMGQ0ZTQ3IDEwMDY0NA0KPiA+ID4gPiA+IC0t
LSBhL2ZzL25mcy93cml0ZS5jDQo+ID4gPiA+ID4gKysrIGIvZnMvbmZzL3dyaXRlLmMNCj4gPiA+
ID4gPiBAQCAtODYwLDcgKzg2MCwxMiBAQCBpbnQgbmZzX2ZsdXNoX2luY29tcGF0aWJsZShzdHJ1
Y3QgZmlsZSAqZmlsZSwgc3RydWN0IHBhZ2UgKnBhZ2UpDQo+ID4gPiA+ID4gIAkJaWYgKHJlcSA9
PSBOVUxMKQ0KPiA+ID4gPiA+ICAJCQlyZXR1cm4gMDsNCj4gPiA+ID4gPiAgCQlsX2N0eCA9IHJl
cS0+d2JfbG9ja19jb250ZXh0Ow0KPiA+ID4gPiA+IC0JCWRvX2ZsdXNoID0gcmVxLT53Yl9wYWdl
ICE9IHBhZ2UgfHwgcmVxLT53Yl9jb250ZXh0ICE9IGN0eDsNCj4gPiA+ID4gPiArCQlkb19mbHVz
aCA9IHJlcS0+d2JfcGFnZSAhPSBwYWdlOw0KPiA+ID4gPiA+ICsJCWlmIChyZXEtPndiX2NvbnRl
eHQgIT0gY3R4KSB7DQo+ID4gPiA+ID4gKwkJCWRvX2ZsdXNoIHw9DQo+ID4gPiA+ID4gKwkJCQly
ZXEtPndiX2NvbnRleHQtPmRlbnRyeSAhPSBjdHgtPmRlbnRyeSB8fA0KPiA+ID4gPiA+ICsJCQkJ
cmVxLT53Yl9jb250ZXh0LT5jcmVkICE9IGN0eC0+Y3JlZDsNCj4gPiA+ID4gPiArCQl9DQo+ID4g
PiA+ID4gIAkJaWYgKGxfY3R4KSB7DQo+ID4gPiA+ID4gIAkJCWRvX2ZsdXNoIHw9IGxfY3R4LT5s
b2Nrb3duZXIubF9vd25lciAhPSBjdXJyZW50LT5maWxlcw0KPiA+ID4gPiA+ICAJCQkJfHwgbF9j
dHgtPmxvY2tvd25lci5sX3BpZCAhPSBjdXJyZW50LT50Z2lkOw0KPiA+ID4gPiANCj4gPiA+ID4g
SGkgTmVpbCwNCj4gPiA+ID4gDQo+ID4gPiA+IFdvbid0IHRoaXMgY2F1c2UgYW55IHdyaXRlIGVy
cm9ycyB0byBiZSBwcm9wYWdhdGVkIHRvIHRoZSB3cm9uZyBwcm9jZXNzPw0KPiA+ID4gDQo+ID4g
PiBJJ20gbm90IHN1cmUgd2hhdCB0aGlzIG1lYW5zLg0KPiA+ID4gDQo+ID4gPiBGb3IgYSBsb2Nh
bCBmaWxlc3lzdGVtLCB0aGUgb25seSBlcnJvciB0aGF0IGlzbid0IHN5bmNocm9ub3VzIGlzIEVJ
TyBhbmQgYW4NCj4gPiA+IGFwcGxpY2F0aW9uIGNhbiBvbmx5IGJlIHN1cmUgb2YgZ2V0dGluZyB0
aGF0IGlmIGl0IHVzZXMgJ2ZzeW5jJy4gIEFuDQo+ID4gPiBhcHBsaWNhdGlvbiBjYW4gZ2V0IEVJ
TyBmcm9tIGZzeW5jIGV2ZW4gaWYgaXQgZGlkbid0IHdyaXRlIHRvIHRoZSBwYWdlIHdoaWNoDQo+
ID4gPiBpcyBjYXVzaW5nIHRoZSBwcm9ibGVtLiAgQWxsIGFwcHMgc2hvdWxkIGJlIGFibGUgdG8g
c2VlIHRoZSBFSU8gaWYgdGhleQ0KPiA+ID4gYm90aGVyIHRvIGZzeW5jLg0KPiA+IA0KPiA+IFlv
dSBhcmUgZGVzY3JpYmluZyBhIGNvbnNlcXVlbmNlIG9mIHRoZSBjb21wbGV0ZSBhbmQgdXR0ZXIg
YnJva2VubmVzcyBvZg0KPiA+IHRoZSBibG9jayBsYXllci4gVGhlIGJsb2NrIGxheWVyIGRvZXNu
J3QgdHJhY2sgd2hvIHdyb3RlIHdoYXQgaW50byB0aGUNCj4gPiBwYWdlIGNhY2hlLCBhbmQgc28g
aXQganVzdCBwYXNzZXMgYW55IHdyaXRlIGVycm9ycyB0byB3aGljaGV2ZXIgcHJvY2Vzcw0KPiA+
IHRoYXQgZmlyc3QgY2FsbHMgZmlsZW1hcF9jaGVja19lcnJvcnMoKSBmb3IgdGhhdCBmaWxlLg0K
PiANCj4gSSB3b3VsZCBoYXZlIHNhaWQgInByYWdtYXRpYyBkZXNpZ24iIHJhdGhlciB0aGFuICJj
b21wbGV0ZSAuLi4gYnJva2VubmVzcyIuDQo+IA0KPiBJZiBhIHdyaXRlIGhhcyBmYWlsZWQgdGhl
biB0aGUgZmlsZSBpcyBjb3JydXB0IGFuZCBldmVyeSB3cml0ZXIgZGVzZXJ2ZXMgdG8NCj4ga25v
dy4NCj4gSWYgbXVsdGlwbGUgcHJvY2Vzc2VzIHRyeSB0byBmc3luYyB0aGUgc2FtZSBmaWxlLCBw
YWdlcyB0aGF0IGZhaWwgZm9yIG9uZQ0KPiBwcm9jZXNzIHdpbGwgbGlrZWx5IGZhaWwgZm9yIHRo
ZSBuZXh0IHByb2Nlc3Mgc28gZWFjaCBzaG91bGQgZ2V0IHRvbGQgb2YgdGhlDQo+IGZhaWx1cmUu
DQo+IA0KPiAoYWN0dWFsbHkuLi4gSSB3b25kZXIgaWYgSSdtIG1pc3Npbmcgc29tZXRoaW5nLiAg
SWYgeW91IGhhdmUgYSBEaXJ0eSBwYWdlDQo+IHdoaWNoIHJlc3VsdHMgaW4gYW4gRXJyb3IgZXZl
cnkgdGltZSBpdCBpcyB3cml0dGVuLCBpcyB0aGVyZSBhbnkgd2F5IGZvcg0KPiBpdCB0byBiZSBy
ZW1vdmVkIGZyb20gdGhlIHBhZ2UgY2FjaGUsIG9yIHdpbGwgaXQgcmVtYWluIHRoZXJlIGNhdXNp
bmcgZXZlcnkNCj4gJ2ZzeW5jJyBvbiB0aGUgZmlsZSB0byB0cnkgdG8gd3JpdGUgdGhlIHNhbWUg
cGFnZSBhbmQgZ2V0IGFuIGVycm9yPykNCj4gDQoNCkl0IHdpbGwgdXN1YWxseSBiZSByZW1vdmVk
LiBUaGVyZSBpcyBubyBwb2ludCBpbiBrZWVwaW5nIGEgcGFnZSB0aGF0IGlzDQpnZXR0aW5nIEVB
Q0NFUyBkdWUgdG8gc2VydmVyLXNpZGUgcGVybWlzc2lvbiBjaGVja3MsIGZvciBpbnN0YW5jZS4g
V2hhdA0Kd2UnbGwgZG8gaW4gdGhhdCBjYXNlIGlzIHRocm93IG91dCB0aGUgcGFnZSwgYW5kIHRo
ZW4gcmV0dXJuIEVBQ0NFUyB0bw0KdGhlIG5leHQgd3JpdGUoKSBieSB0aGF0IHByb2Nlc3MgYW5k
L29yIHRoZSBuZXh0IGZzeW5jKCkuDQoNCj4gPiANCj4gPiBJbiBORlMsIHdlIHVzZSB0aGUgbmZz
X29wZW5fY29udGV4dCB0byB0cmFjayBleGFjdGx5IHdobyB3cm90ZSB3aGF0LCBhbmQNCj4gPiB1
c2UgdGhhdCB0byBub3RpZnkgdGhlIHNwZWNpZmljIHByb2Nlc3MgdGhhdCB3cm90ZSB0aGUgZGF0
YSB0aGF0IGZhaWxlZC4NCj4gDQo+IFdoaWxlIHRoaXMgY291bGQgYmUgc2VlbiBhcyBhbiBpbXBy
b3ZlbWVudCwgdGhlIGV4dHJhIGd1YXJhbnRlZXMgYXJlIG9ubHkNCj4gYXZhaWxhYmxlIG9uIE5G
Uywgc28gYXBwcyBhcmUgbm90IGxpa2VseSB0byBkZXBlbmQgb24gdGhlbSwgc28gdGhlIGV4dHJh
IHZhbHVlDQo+IGRvZXNuJ3QgYWN0dWFsbHkgaGVscCBhbnlvbmUgLSBkb2VzIGl0Pw0KPiANCj4g
QW5kIGl0IGhhcyBhIGNsZWFyIGNvc3Q6IHR3byBwcm9jZXNzZXMgb24gdGhlIG9uZSBtYWNoaW5l
IGNhbiBzdWZmZXINCj4gc2lnbmlmaWNhbnRseSBkZWdyYWRlZCBwZXJmb3JtYW5jZS4NCj4gSXMg
dGhlIHRoZW9yZXRpY2FsIGltcHJvdmVtZW50IGluIHNlbWFudGljcyB3b3J0aCB0aGF0IGNvc3Qu
DQoNClRoZXNlIGFyZSB0aGUgc2VtYW50aWNzIHRoYXQgd2UndmUgZW5mb3JjZWQgZm9yIHRoZSBw
YXN0IDE1IHllYXJzLiBXZQ0KZXZlbiB1c2VkIHRvIGhhdmUgYSBkZWRpY2F0ZWQgZmlscC0+Zl9l
cnJvciBqdXN0IGZvciBORlMgdW50aWwgd2Ugd2VyZQ0KYWJsZSB0byBoaWRlIGl0IGluIHRoZSBm
aWxlc3lzdGVtLXNwZWNpZmljIG9wZW4gY29udGV4dC4NCg0KQ2hhbmdpbmcgdGhvc2Ugc2VtYW50
aWNzIG5vdyBpcyBub3Qgc29tZXRoaW5nIEknbSBwcmVwYXJlZCB0byBkbyB3aXRob3V0DQphIHZl
cnkgZ29vZCByZWFzb24uIFlvdSd2ZSBhZG1pdHRlZCB0aGF0IHlvdXIgb3duIHdvcmtsb2FkIHdv
dWxkIF9ub3RfDQpiZW5lZml0IGZyb20gdGhlIGtpbmQgb2YgY2hhbmdlIHRoYXQgeW91J3JlIGFk
dm9jYXRpbmcsIHNvIGFzIGZhciBhcyBJJ20NCmNvbmNlcm5lZCB0aGF0IHByZXR0eSBtdWNoIG1h
a2VzIHRoZSBxdWVzdGlvbiBtb290IGZvciBub3cuDQoNCj4gPiA+IEZvciBORlMsIEVEUVVPVCBh
bmQgRU5PU1BDIGFyZSBhbHNvIGFzeW5jaHJvbm91cywgYW5kIHRoZXkgY2FuIGFsc28gY29tZSBp
bg0KPiA+ID4gdGhyb3VnaCBjbG9zZSgpLiAgSSB0aGluayBldmVyeSBhcHAgc2hvdWxkIHN0aWxs
IHNlZSBldmVyeSBlcnJvciB3aGV0aGVyICBpdA0KPiA+ID4gZGlydGllZCB0aGUgcGFnZSB3aGlj
aCB0cmlnZ2VyZWQgdGhlIGVycm9yIG9yIG5vdC4NCj4gPiANCj4gPiBXaXRoIHRoaXMgcGF0Y2gs
IHdlIGNhbiBlbmQgdXAgd3JpdGluZyB0aGUgZGF0YSB1c2luZyB0aGUgd3JvbmcNCj4gPiBuZnNf
b3Blbl9jb250ZXh0LCBpbiB3aGljaCBjYXNlIHRoZSBlcnJvcnMgd2lsbCBnZXQgcmVwb3J0ZWQg
dG8gdGhlDQo+ID4gcHJvY2VzcyB0aGF0IGFjdHVhbGx5IG93bnMgdGhhdCBuZnNfb3Blbl9jb250
ZXh0Lg0KPiA+IA0KPiANCj4gSSBzZWUgeWVzLiAgSSB3b3VsZCB3YW50IHRoZSBlcnJvciB0byBi
ZSByZXR1cm5lZCB0byB3aG9ldmVyIGluc3RpZ2F0ZWQgdGhlDQo+ICdmbHVzaCcsIG5vdCB3aG9l
dmVyIGlzc3VlcyB0aGUgJ3dyaXRlJy4NCg0KVGhhdCBvbmx5IG1ha2VzIHNlbnNlIGlmIGFsbCBw
cm9jZXNzZXMgdGhhdCBjYWxsIGZzeW5jKCkgc2VlIHRoZSBzYW1lDQplcnJvcnMsIHdoaWNoIGlz
IG5vdCB0aGUgY2FzZSAoc2VlIGFib3ZlKS4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4
IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAu
Y29tDQp3d3cubmV0YXBwLmNvbQ0K
On Wed, 4 Sep 2013 14:33:35 +0000 "Myklebust, Trond"
<[email protected]> wrote:
> On Wed, 2013-09-04 at 11:36 +1000, NeilBrown wrote:
> > Hi,
> > we have a customer who has noticed a significant regression in NFS
> > performance between SLES10 (2.6.16+) and SLES11 (3.0+).
> >
> > The use case involves two different processes on the same client mmapping a
> > file and making changes.
> >
> > In 2.6.16, all these changes would be cached on the client until a flush or
> > close or other normal process flushed the changes out - much the same as if
> > just one process was accessing the file.
> >
> > In 3.0 the file is flushed out every time a different process performs an
> > update. So if they interleave accesses, you get a flush-to-the-server on
> > every access.
> >
> > This is caused by nfs_flush_incompatible. It deems access through different
> > file descriptors to be incompatible. However I don't think this is
> > justified. Accessed using different credentials can reasonably be seen as
> > incompatible(*), but if two processes with identical credentials open the
> > same file there there should be not need to differentiate between them.
> >
> > The patch below changes the test in nfs_flush_incompatible to just test the
> > credentials (and the dentry, though that might be superfluous - it seems
> > safer).
> >
> > (*) In this specific customer's case the process do currently have slightly
> > different credentials. In one, the primary gid is included in the list of
> > auxiliary gids. In the other the primary gid is excluded. This is enough
> > to make the credentials appear different so this patch doesn't help in that
> > case. They are looking at being more consistent in their setgroups usage
> > but it would help them if we only actually compared the credentials that
> > were important. In this case there is no important difference.
> > Would it make sense to treat a more broad credential as compatible with a
> > more narrow credential? So a write from a process with a given
> > uid,gid,grouplist could ultimately be sent to the server with same uid/gid,
> > but a subset of the group list?
> >
> >
> > Thoughts?
> >
> > Thanks,
> > NeilBrown
> >
> >
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 7816801..b0d4e47 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -860,7 +860,12 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
> > if (req == NULL)
> > return 0;
> > l_ctx = req->wb_lock_context;
> > - do_flush = req->wb_page != page || req->wb_context != ctx;
> > + do_flush = req->wb_page != page;
> > + if (req->wb_context != ctx) {
> > + do_flush |=
> > + req->wb_context->dentry != ctx->dentry ||
> > + req->wb_context->cred != ctx->cred;
> > + }
> > if (l_ctx) {
> > do_flush |= l_ctx->lockowner.l_owner != current->files
> > || l_ctx->lockowner.l_pid != current->tgid;
>
> Hi Neil,
>
> Won't this cause any write errors to be propagated to the wrong process?
I'm not sure what this means.
For a local filesystem, the only error that isn't synchronous is EIO and an
application can only be sure of getting that if it uses 'fsync'. An
application can get EIO from fsync even if it didn't write to the page which
is causing the problem. All apps should be able to see the EIO if they
bother to fsync.
For NFS, EDQUOT and ENOSPC are also asynchronous, and they can also come in
through close(). I think every app should still see every error whether it
dirtied the page which triggered the error or not.
??
Thanks,
NeilBrown