2013-06-13 18:49:34

by Malahal Naineni

[permalink] [raw]
Subject: corruption due to loss of lock

Hi Trond,

I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
that fix issues after loss of a lock. What is the status on this patch
set? Do they need more work? We have an application that uses range
locks on a file. Two threads from two different clients end up writing
to the same a file due to this bug after a lease expiry from a client.

Regards, Malahal.



2013-07-11 16:29:00

by Myklebust, Trond

[permalink] [raw]
Subject: Re: corruption due to loss of lock

T24gVGh1LCAyMDEzLTA3LTExIGF0IDExOjIwIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gVGh1LCAxMSBKdWwgMjAxMyAxNDozMzowMiArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gVGh1LCAyMDEz
LTA3LTExIGF0IDEwOjI4IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4gPiA+IE9uIFRodSwg
MTEgSnVsIDIwMTMgMTQ6MTk6MTAgKzAwMDANCj4gPiA+ICJNeWtsZWJ1c3QsIFRyb25kIiA8VHJv
bmQuTXlrbGVidXN0QG5ldGFwcC5jb20+IHdyb3RlOg0KPiA+ID4gDQo+ID4gPiA+IE9uIFRodSwg
MjAxMy0wNy0xMSBhdCAwNzoxMyAtMDQwMCwgSmVmZiBMYXl0b24gd3JvdGU6DQo+ID4gPiA+ID4g
T24gVGh1LCAxMyBKdW4gMjAxMyAxMzo0NzozNyAtMDUwMA0KPiA+ID4gPiA+IE1hbGFoYWwgTmFp
bmVuaSA8bWFsYWhhbEB1cy5pYm0uY29tPiB3cm90ZToNCj4gPiA+ID4gPiANCj4gPiA+ID4gPiA+
IEhpIFRyb25kLA0KPiA+ID4gPiA+ID4gDQo+ID4gPiA+ID4gPiBJIHNhdyBCcnlhbidzIHBhdGNo
ZXMgaGVyZSBodHRwczovL3BhdGNod29yay5rZXJuZWwub3JnL3BhdGNoLzk4NzQwMi8NCj4gPiA+
ID4gPiA+IHRoYXQgZml4IGlzc3VlcyBhZnRlciBsb3NzIG9mIGEgbG9jay4gIFdoYXQgaXMgdGhl
IHN0YXR1cyBvbiB0aGlzIHBhdGNoDQo+ID4gPiA+ID4gPiBzZXQ/IERvIHRoZXkgbmVlZCBtb3Jl
IHdvcms/IFdlIGhhdmUgYW4gYXBwbGljYXRpb24gdGhhdCB1c2VzIHJhbmdlDQo+ID4gPiA+ID4g
PiBsb2NrcyBvbiBhIGZpbGUuIFR3byB0aHJlYWRzIGZyb20gdHdvIGRpZmZlcmVudCBjbGllbnRz
IGVuZCB1cCB3cml0aW5nDQo+ID4gPiA+ID4gPiB0byB0aGUgc2FtZSBhIGZpbGUgZHVlIHRvIHRo
aXMgYnVnIGFmdGVyIGEgbGVhc2UgZXhwaXJ5IGZyb20gYSBjbGllbnQuDQo+ID4gPiA+ID4gPiAN
Cj4gPiA+ID4gPiA+IFJlZ2FyZHMsIE1hbGFoYWwuDQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gKGNj
J2luZyBCcnlhbiBzaW5jZSBoZSBkaWQgdGhlIG9yaWdpbmFsIHNldCkNCj4gPiA+ID4gPiANCj4g
PiA+ID4gPiBZZWFoLCB0aGlzIHNldCB3b3VsZCBiZSBhIG5pY2UgdGhpbmcgdG8gaGF2ZS4gQSBj
b3VwbGUgb2YgY29tbWVudHM6DQo+ID4gPiA+ID4gDQo+ID4gPiA+ID4gLSBJIHN0aWxsIHRoaW5r
IGl0IHdvdWxkIGJlIGJlc3QgdG8gbWFrZSBTSUdMT1NUIGl0cyBvd24gc2lnbmFsLCBidXQgYXMN
Cj4gPiA+ID4gPiAgIEJyeWFuIHBvaW50cyBvdXQsIGl0IHdvdWxkIG5lZWQgdG8gYmUgbGFyZ2Vy
IHRoYW4gU0lHUlRNQVguIEknbQ0KPiA+ID4gPiA+ICAgbm90IHN1cmUgdGhhdCdzIHBvc3NpYmxl
IG9uIGFsbCBhcmNoZXMgd2l0aCB0aGUgd2F5IHRoZSBSVCBzaWduYWxzDQo+ID4gPiA+ID4gICB3
ZXJlIGRvbmUuIEl0J3MgcHJvYmFibHkgd29ydGggaW52ZXN0aWdhdGluZyB0aGF0IHRob3VnaCBi
ZWZvcmUNCj4gPiA+ID4gPiAgIHNldHRsaW5nIG9uIFNJR0lPIHNpbmNlIGl0IHdvdWxkIGJlIGhh
cmQgdG8gY2hhbmdlIHRoYXQgcmV0cm9hY3RpdmVseS4NCj4gPiA+ID4gPiANCj4gPiA+ID4gPiAt
IFRoaXMgaXMgbm90IHJlYWxseSBhIHY0LjEgc3BlY2lmaWMgdGhpbmcuIEl0IHNob3VsZCBhbHNv
IGJlIGRvbmUgZm9yDQo+ID4gPiA+ID4gICB2NC4wIGFuZCB2Mi8zLCB0aG91Z2ggdGhlIGxhdHRl
ciB0d28gcmVhbGx5IG5lZWQgdG8gYmUgZG9uZSB3aXRoaW4NCj4gPiA+ID4gPiAgIGxvY2tkLg0K
PiA+ID4gPiANCj4gPiA+ID4gU0lHTE9TVCBpcyBub3QgcGFydCBvZiBhbnkgc3RhbmRhcmQuIEl0
IGlzIGEgaGFjayB0aGF0IGhhcyBiZWVuIGFkb3B0ZWQNCj4gPiA+ID4gYnkgSUJNIGFuZCBTb2xh
cmlzLg0KPiA+ID4gPiANCj4gPiA+ID4gVGhlIFBPU0lYbHkgY29ycmVjdCB3YXkgdG8gZG8gdGhp
cyBpcyB0byB1c2UgRUJBREYgdG8gd2FybiB0aGUNCj4gPiA+ID4gYXBwbGljYXRpb24gdGhhdCB0
aGUgZmlsZSBkZXNjcmlwdG9yIGlzIG5vIGxvbmdlciB2YWxpZCAoaW4gdGhlIHNlbnNlDQo+ID4g
PiA+IHRoYXQgdGhlIHNlcnZlciBpcyBubyBsb25nZXIgaG9ub3VyaW5nIHRoZSBsb2NrKSBhbmQg
RUlPIGluIG9yZGVyIHRvDQo+ID4gPiA+IHdhcm4gaXQgdGhhdCBkYXRhIG1heSBoYXZlIGJlZW4g
bG9zdC4NCj4gPiA+ID4gDQo+ID4gPiANCj4gPiA+IEl0IGlzIGEgaGFjay4uLkkgd29uJ3QgYXJn
dWUgdGhlcmUNCj4gPiA+IA0KPiA+ID4gSSdtIG5vdCBzdXJlIHRoYXQgcmV0dXJuaW5nIGVycm9y
cyBpcyByZWFsbHkgdGhlIGJlc3QgYXBwcm9hY2ggdGhvdWdoLg0KPiA+ID4gSW4gc29tZSBjYXNl
cywgdGhlIGZkIG1heSBiZSBmaW5lLiBJdCBtYXkgb25seSBiZSB0aGUgbG9jayB0aGF0IGhhcw0K
PiA+ID4gYmVlbiBsb3N0Lg0KPiA+ID4gDQo+ID4gPiBXaXRoIGEgc2lnbmFsLCB0aGUgcHJvZ3Jh
bSBoYXMgbW9yZSBvZiBhIGNob2ljZSBhcyB0byB3aGV0aGVyIGl0IGNhcmVzDQo+ID4gPiBhYm91
dCBsb3N0IGxvY2tzIGFuZCBpcyBtb3JlIGltbWVkaWF0ZSB3aGVuIHRoZSBwcm9ibGVtIG9jY3Vy
cy4gQW4NCj4gPiA+IGVycm9yIGNvZGUgc2VlbXMgbGlrZSBpdCBtaWdodCBjYXVzZSBhIGxvdCBv
ZiBncmllZiBmb3IgcHJvZ3JhbXMgdGhhdA0KPiA+ID4gYXJlbid0IGV4cGVjdGluZyB0aGF0IHNv
cnQgb2YgYmVoYXZpb3IuDQo+ID4gDQo+ID4gRUJBREYgaXMgYSBlcnJvciB0aGF0IGhhcyBhbiBv
YnZpb3VzIG1lYW5pbmcgaW4gUE9TSVg6IHlvdSBuZWVkIHRvDQo+ID4gcmVvcGVuIHRoZSBmaWxl
IGFuZCByZS1lc3RhYmxpc2ggYW55IGxvY2tzLg0KPiANCj4gV2VsbCwgRUJBREYgbWVhbnMgIkJh
ZCBmaWxlIGRlc2NyaXB0b3IiLiBDb25zaWRlciB0aGUgdjIvMyBjYXNlIC0tIHRoZQ0KPiBmZCBt
aWdodCBzdGlsbCBiZSB1c2FibGUsIGl0J3Mgb25seSBteSBsb2NrIHRoYXQgaGFzIGJlZW4gbG9z
dC4gT25lDQo+IG1pZ2h0IGNvbnNpZGVyIHRoYXQgdG8gbWVhbiB0aGF0IHdlIHNob3VsZG4ndCB1
c2UgdGhhdCBmZCBhbnltb3JlLCBidXQNCj4gdGhhdCdzIGEgYmVoYXZpb3JhbCBjaGFuZ2UgYW55
IHdheSB5b3Ugc2xpY2UgaXQuLi4NCg0KSSBmYWlsIHRvIHNlZSBob3cgd2UgY2FuIGF2b2lkIGJl
aGF2aW91cmFsIGNoYW5nZXMgaGVyZS4gVGhlIGN1cnJlbnQNCmJlaGF2aW91ciBpcyB0byBpZ25v
cmUgdGhlIGNvbmRpdGlvbiBjb21wbGV0ZWx5LiBTSUdMT1NUIGlzIGENCmJlaGF2aW91cmFsIGNo
YW5nZSAoYSBwcmV0dHkgbWFqb3Igb25lLCB0aGF0IHJlcXVpcmVzIDEwMCUgbm9uLXBvcnRhYmxl
DQphcHBsaWNhdGlvbiBjaGFuZ2VzKS4NCg0KPiA+IEhvdyBpcyB0aGF0IG5vdCBiZXR0ZXIgdGhh
bg0KPiA+IHJlY2VpdmluZyBhIHNpZ25hbCB0aGV5IHdvbid0IGJlIGV4cGVjdGluZz8gQ29uc2lk
ZXIgdGhhdCB3ZSdkIGhhdmUgdG8NCj4gPiBvdmVybG9hZCBTSUdJTywgd2hpY2ggaGFzIGEgY29t
cGxldGVseSBkaWZmZXJlbnQgbWVhbmluZyBpbiBQT1NJWC4uLg0KPiA+IA0KPiANCj4gVGhhdCdz
IHRoZSBtYWluIHJlYXNvbiB0aGF0IEkgdGhpbmsgd2Ugd2FudCBhIG5ldyBzaWduYWwgZm9yIHRo
aXMuIFRoZQ0KPiBkZWZhdWx0IG9uIFNJR0xPU1Qgc2hvdWxkIGJlIHRvIGlnbm9yZSBpdCwgYW5k
IHRoZW4gdGhhdCB3b3VsZCBhbGxvdw0KPiBwcm9jZXNzZXMgdG8gb3B0LWluIHRvIHBheWluZyBh
dHRlbnRpb24gdG8gaXQuDQoNCkFjY29yZGluZyB0byAnbWFuIDcgc2lnbmFsJywgdGhlIG9ubHkg
ZGVmaW5lZCBiZWhhdmlvdXIgZm9yIFNJR0xPU1QgaXMNCidUZXJtJy4gVGhhdCdzIGFsc28gdGhl
IGRlZmF1bHQgb24gb3RoZXIgZXhpc3RpbmcgTkZTIGltcGxlbWVudGF0aW9ucy4NCg0KV2UgY291
bGQsIGhvd2V2ZXIgdXNlIGtpbGxfZmFzeW5jKCkgdG8gbGV0IHRoZSBsb2NrIG93bmVycyBrbm93
IHRoYXQNCnNvbWV0aGluZyBoYXMgaGFwcGVuZWQgYW5kIHRlbGwgdGhlbSB0byBwb2xsIHRoZSBm
aWxlIGRlc2NyaXB0b3IuIFRoYXQncw0KYWxzbyBub24tcG9ydGFibGUsIGJ1dCBhdCBsZWFzdCBp
dCBoYXMgdGhlIGFkdmFudGFnZSB0aGF0IGl0IGlzIGJhc2VkIG9uDQphIHN1YnNjcmlwdGlvbiBt
b2RlbC4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5l
cg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K

2013-07-11 14:12:25

by Malahal Naineni

[permalink] [raw]
Subject: Re: corruption due to loss of lock

Jeff Layton [[email protected]] wrote:
> On Thu, 13 Jun 2013 13:47:37 -0500
> Malahal Naineni <[email protected]> wrote:
>
> > Hi Trond,
> >
> > I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
> > that fix issues after loss of a lock. What is the status on this patch
> > set? Do they need more work? We have an application that uses range
> > locks on a file. Two threads from two different clients end up writing
> > to the same a file due to this bug after a lease expiry from a client.
> >
> > Regards, Malahal.
>
> (cc'ing Bryan since he did the original set)
>
> Yeah, this set would be a nice thing to have. A couple of comments:
>
> - I still think it would be best to make SIGLOST its own signal, but as
> Bryan points out, it would need to be larger than SIGRTMAX. I'm
> not sure that's possible on all arches with the way the RT signals
> were done. It's probably worth investigating that though before
> settling on SIGIO since it would be hard to change that retroactively.

Our application doesn't handle SIGIO, so it was terminating due to
SIGIO. We tested without SIGIO/SIGLOST part of the patch, it stopped
sending writes to NFS server as expected but the application didn't
receive EIO because the original patch set some task fields and didn't
call rpc_exit(). I just had to modify nfs_write_prepare() to call
rpc_exit(task) rather than modifying task status and action fields
directly.

I will post my patch after some more testing.

>
> - This is not really a v4.1 specific thing. It should also be done for
> v4.0 and v2/3, though the latter two really need to be done within
> lockd.

Correct.

Regards, Malahal.


2013-07-11 14:19:13

by Myklebust, Trond

[permalink] [raw]
Subject: Re: corruption due to loss of lock

T24gVGh1LCAyMDEzLTA3LTExIGF0IDA3OjEzIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gVGh1LCAxMyBKdW4gMjAxMyAxMzo0NzozNyAtMDUwMA0KPiBNYWxhaGFsIE5haW5lbmkgPG1h
bGFoYWxAdXMuaWJtLmNvbT4gd3JvdGU6DQo+IA0KPiA+IEhpIFRyb25kLA0KPiA+IA0KPiA+IEkg
c2F3IEJyeWFuJ3MgcGF0Y2hlcyBoZXJlIGh0dHBzOi8vcGF0Y2h3b3JrLmtlcm5lbC5vcmcvcGF0
Y2gvOTg3NDAyLw0KPiA+IHRoYXQgZml4IGlzc3VlcyBhZnRlciBsb3NzIG9mIGEgbG9jay4gIFdo
YXQgaXMgdGhlIHN0YXR1cyBvbiB0aGlzIHBhdGNoDQo+ID4gc2V0PyBEbyB0aGV5IG5lZWQgbW9y
ZSB3b3JrPyBXZSBoYXZlIGFuIGFwcGxpY2F0aW9uIHRoYXQgdXNlcyByYW5nZQ0KPiA+IGxvY2tz
IG9uIGEgZmlsZS4gVHdvIHRocmVhZHMgZnJvbSB0d28gZGlmZmVyZW50IGNsaWVudHMgZW5kIHVw
IHdyaXRpbmcNCj4gPiB0byB0aGUgc2FtZSBhIGZpbGUgZHVlIHRvIHRoaXMgYnVnIGFmdGVyIGEg
bGVhc2UgZXhwaXJ5IGZyb20gYSBjbGllbnQuDQo+ID4gDQo+ID4gUmVnYXJkcywgTWFsYWhhbC4N
Cj4gDQo+IChjYydpbmcgQnJ5YW4gc2luY2UgaGUgZGlkIHRoZSBvcmlnaW5hbCBzZXQpDQo+IA0K
PiBZZWFoLCB0aGlzIHNldCB3b3VsZCBiZSBhIG5pY2UgdGhpbmcgdG8gaGF2ZS4gQSBjb3VwbGUg
b2YgY29tbWVudHM6DQo+IA0KPiAtIEkgc3RpbGwgdGhpbmsgaXQgd291bGQgYmUgYmVzdCB0byBt
YWtlIFNJR0xPU1QgaXRzIG93biBzaWduYWwsIGJ1dCBhcw0KPiAgIEJyeWFuIHBvaW50cyBvdXQs
IGl0IHdvdWxkIG5lZWQgdG8gYmUgbGFyZ2VyIHRoYW4gU0lHUlRNQVguIEknbQ0KPiAgIG5vdCBz
dXJlIHRoYXQncyBwb3NzaWJsZSBvbiBhbGwgYXJjaGVzIHdpdGggdGhlIHdheSB0aGUgUlQgc2ln
bmFscw0KPiAgIHdlcmUgZG9uZS4gSXQncyBwcm9iYWJseSB3b3J0aCBpbnZlc3RpZ2F0aW5nIHRo
YXQgdGhvdWdoIGJlZm9yZQ0KPiAgIHNldHRsaW5nIG9uIFNJR0lPIHNpbmNlIGl0IHdvdWxkIGJl
IGhhcmQgdG8gY2hhbmdlIHRoYXQgcmV0cm9hY3RpdmVseS4NCj4gDQo+IC0gVGhpcyBpcyBub3Qg
cmVhbGx5IGEgdjQuMSBzcGVjaWZpYyB0aGluZy4gSXQgc2hvdWxkIGFsc28gYmUgZG9uZSBmb3IN
Cj4gICB2NC4wIGFuZCB2Mi8zLCB0aG91Z2ggdGhlIGxhdHRlciB0d28gcmVhbGx5IG5lZWQgdG8g
YmUgZG9uZSB3aXRoaW4NCj4gICBsb2NrZC4NCg0KU0lHTE9TVCBpcyBub3QgcGFydCBvZiBhbnkg
c3RhbmRhcmQuIEl0IGlzIGEgaGFjayB0aGF0IGhhcyBiZWVuIGFkb3B0ZWQNCmJ5IElCTSBhbmQg
U29sYXJpcy4NCg0KVGhlIFBPU0lYbHkgY29ycmVjdCB3YXkgdG8gZG8gdGhpcyBpcyB0byB1c2Ug
RUJBREYgdG8gd2FybiB0aGUNCmFwcGxpY2F0aW9uIHRoYXQgdGhlIGZpbGUgZGVzY3JpcHRvciBp
cyBubyBsb25nZXIgdmFsaWQgKGluIHRoZSBzZW5zZQ0KdGhhdCB0aGUgc2VydmVyIGlzIG5vIGxv
bmdlciBob25vdXJpbmcgdGhlIGxvY2spIGFuZCBFSU8gaW4gb3JkZXIgdG8NCndhcm4gaXQgdGhh
dCBkYXRhIG1heSBoYXZlIGJlZW4gbG9zdC4NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4
IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAu
Y29tDQp3d3cubmV0YXBwLmNvbQ0K

2013-07-11 15:40:06

by Malahal Naineni

[permalink] [raw]
Subject: Re: corruption due to loss of lock

Jeff Layton [[email protected]] wrote:
> On Thu, 11 Jul 2013 14:33:02 +0000
> "Myklebust, Trond" <[email protected]> wrote:
>
> > On Thu, 2013-07-11 at 10:28 -0400, Jeff Layton wrote:
> > > On Thu, 11 Jul 2013 14:19:10 +0000
> > > "Myklebust, Trond" <[email protected]> wrote:
> > >
> > > > On Thu, 2013-07-11 at 07:13 -0400, Jeff Layton wrote:
> > > > > On Thu, 13 Jun 2013 13:47:37 -0500
> > > > > Malahal Naineni <[email protected]> wrote:
> > > > >
> > > > > > Hi Trond,
> > > > > >
> > > > > > I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
> > > > > > that fix issues after loss of a lock. What is the status on this patch
> > > > > > set? Do they need more work? We have an application that uses range
> > > > > > locks on a file. Two threads from two different clients end up writing
> > > > > > to the same a file due to this bug after a lease expiry from a client.
> > > > > >
> > > > > > Regards, Malahal.
> > > > >
> > > > > (cc'ing Bryan since he did the original set)
> > > > >
> > > > > Yeah, this set would be a nice thing to have. A couple of comments:
> > > > >
> > > > > - I still think it would be best to make SIGLOST its own signal, but as
> > > > > Bryan points out, it would need to be larger than SIGRTMAX. I'm
> > > > > not sure that's possible on all arches with the way the RT signals
> > > > > were done. It's probably worth investigating that though before
> > > > > settling on SIGIO since it would be hard to change that retroactively.
> > > > >
> > > > > - This is not really a v4.1 specific thing. It should also be done for
> > > > > v4.0 and v2/3, though the latter two really need to be done within
> > > > > lockd.
> > > >
> > > > SIGLOST is not part of any standard. It is a hack that has been adopted
> > > > by IBM and Solaris.
> > > >
> > > > The POSIXly correct way to do this is to use EBADF to warn the
> > > > application that the file descriptor is no longer valid (in the sense
> > > > that the server is no longer honouring the lock) and EIO in order to
> > > > warn it that data may have been lost.
> > > >
> > >
> > > It is a hack...I won't argue there
> > >
> > > I'm not sure that returning errors is really the best approach though.
> > > In some cases, the fd may be fine. It may only be the lock that has
> > > been lost.
> > >
> > > With a signal, the program has more of a choice as to whether it cares
> > > about lost locks and is more immediate when the problem occurs. An
> > > error code seems like it might cause a lot of grief for programs that
> > > aren't expecting that sort of behavior.
> >
> > EBADF is a error that has an obvious meaning in POSIX: you need to
> > reopen the file and re-establish any locks.
>
> Well, EBADF means "Bad file descriptor". Consider the v2/3 case -- the
> fd might still be usable, it's only my lock that has been lost. One
> might consider that to mean that we shouldn't use that fd anymore, but
> that's a behavioral change any way you slice it...
>
> > How is that not better than
> > receiving a signal they won't be expecting? Consider that we'd have to
> > overload SIGIO, which has a completely different meaning in POSIX...
> >
>
> That's the main reason that I think we want a new signal for this. The
> default on SIGLOST should be to ignore it, and then that would allow
> processes to opt-in to paying attention to it.

We should split that patchset into two.

1. we should return EBADF/EIO (debatable which one) for operations that
require lock after loss of lock.
2. sending a signal (SIGIO/SIGLOST).

The first one is critical to avoid corruption, and second one is needed
for graceful recovery.

Regards, Malahal.


2013-07-11 14:28:49

by Jeff Layton

[permalink] [raw]
Subject: Re: corruption due to loss of lock

On Thu, 11 Jul 2013 14:19:10 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Thu, 2013-07-11 at 07:13 -0400, Jeff Layton wrote:
> > On Thu, 13 Jun 2013 13:47:37 -0500
> > Malahal Naineni <[email protected]> wrote:
> >
> > > Hi Trond,
> > >
> > > I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
> > > that fix issues after loss of a lock. What is the status on this patch
> > > set? Do they need more work? We have an application that uses range
> > > locks on a file. Two threads from two different clients end up writing
> > > to the same a file due to this bug after a lease expiry from a client.
> > >
> > > Regards, Malahal.
> >
> > (cc'ing Bryan since he did the original set)
> >
> > Yeah, this set would be a nice thing to have. A couple of comments:
> >
> > - I still think it would be best to make SIGLOST its own signal, but as
> > Bryan points out, it would need to be larger than SIGRTMAX. I'm
> > not sure that's possible on all arches with the way the RT signals
> > were done. It's probably worth investigating that though before
> > settling on SIGIO since it would be hard to change that retroactively.
> >
> > - This is not really a v4.1 specific thing. It should also be done for
> > v4.0 and v2/3, though the latter two really need to be done within
> > lockd.
>
> SIGLOST is not part of any standard. It is a hack that has been adopted
> by IBM and Solaris.
>
> The POSIXly correct way to do this is to use EBADF to warn the
> application that the file descriptor is no longer valid (in the sense
> that the server is no longer honouring the lock) and EIO in order to
> warn it that data may have been lost.
>

It is a hack...I won't argue there

I'm not sure that returning errors is really the best approach though.
In some cases, the fd may be fine. It may only be the lock that has
been lost.

With a signal, the program has more of a choice as to whether it cares
about lost locks and is more immediate when the problem occurs. An
error code seems like it might cause a lot of grief for programs that
aren't expecting that sort of behavior.

--
Jeff Layton <[email protected]>

2013-07-11 15:20:42

by Jeff Layton

[permalink] [raw]
Subject: Re: corruption due to loss of lock

On Thu, 11 Jul 2013 14:33:02 +0000
"Myklebust, Trond" <[email protected]> wrote:

> On Thu, 2013-07-11 at 10:28 -0400, Jeff Layton wrote:
> > On Thu, 11 Jul 2013 14:19:10 +0000
> > "Myklebust, Trond" <[email protected]> wrote:
> >
> > > On Thu, 2013-07-11 at 07:13 -0400, Jeff Layton wrote:
> > > > On Thu, 13 Jun 2013 13:47:37 -0500
> > > > Malahal Naineni <[email protected]> wrote:
> > > >
> > > > > Hi Trond,
> > > > >
> > > > > I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
> > > > > that fix issues after loss of a lock. What is the status on this patch
> > > > > set? Do they need more work? We have an application that uses range
> > > > > locks on a file. Two threads from two different clients end up writing
> > > > > to the same a file due to this bug after a lease expiry from a client.
> > > > >
> > > > > Regards, Malahal.
> > > >
> > > > (cc'ing Bryan since he did the original set)
> > > >
> > > > Yeah, this set would be a nice thing to have. A couple of comments:
> > > >
> > > > - I still think it would be best to make SIGLOST its own signal, but as
> > > > Bryan points out, it would need to be larger than SIGRTMAX. I'm
> > > > not sure that's possible on all arches with the way the RT signals
> > > > were done. It's probably worth investigating that though before
> > > > settling on SIGIO since it would be hard to change that retroactively.
> > > >
> > > > - This is not really a v4.1 specific thing. It should also be done for
> > > > v4.0 and v2/3, though the latter two really need to be done within
> > > > lockd.
> > >
> > > SIGLOST is not part of any standard. It is a hack that has been adopted
> > > by IBM and Solaris.
> > >
> > > The POSIXly correct way to do this is to use EBADF to warn the
> > > application that the file descriptor is no longer valid (in the sense
> > > that the server is no longer honouring the lock) and EIO in order to
> > > warn it that data may have been lost.
> > >
> >
> > It is a hack...I won't argue there
> >
> > I'm not sure that returning errors is really the best approach though.
> > In some cases, the fd may be fine. It may only be the lock that has
> > been lost.
> >
> > With a signal, the program has more of a choice as to whether it cares
> > about lost locks and is more immediate when the problem occurs. An
> > error code seems like it might cause a lot of grief for programs that
> > aren't expecting that sort of behavior.
>
> EBADF is a error that has an obvious meaning in POSIX: you need to
> reopen the file and re-establish any locks.

Well, EBADF means "Bad file descriptor". Consider the v2/3 case -- the
fd might still be usable, it's only my lock that has been lost. One
might consider that to mean that we shouldn't use that fd anymore, but
that's a behavioral change any way you slice it...

> How is that not better than
> receiving a signal they won't be expecting? Consider that we'd have to
> overload SIGIO, which has a completely different meaning in POSIX...
>

That's the main reason that I think we want a new signal for this. The
default on SIGLOST should be to ignore it, and then that would allow
processes to opt-in to paying attention to it.

--
Jeff Layton <[email protected]>

2013-07-11 11:13:28

by Jeff Layton

[permalink] [raw]
Subject: Re: corruption due to loss of lock

On Thu, 13 Jun 2013 13:47:37 -0500
Malahal Naineni <[email protected]> wrote:

> Hi Trond,
>
> I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/
> that fix issues after loss of a lock. What is the status on this patch
> set? Do they need more work? We have an application that uses range
> locks on a file. Two threads from two different clients end up writing
> to the same a file due to this bug after a lease expiry from a client.
>
> Regards, Malahal.

(cc'ing Bryan since he did the original set)

Yeah, this set would be a nice thing to have. A couple of comments:

- I still think it would be best to make SIGLOST its own signal, but as
Bryan points out, it would need to be larger than SIGRTMAX. I'm
not sure that's possible on all arches with the way the RT signals
were done. It's probably worth investigating that though before
settling on SIGIO since it would be hard to change that retroactively.

- This is not really a v4.1 specific thing. It should also be done for
v4.0 and v2/3, though the latter two really need to be done within
lockd.
--
Jeff Layton <[email protected]>

2013-07-11 14:33:04

by Myklebust, Trond

[permalink] [raw]
Subject: Re: corruption due to loss of lock

T24gVGh1LCAyMDEzLTA3LTExIGF0IDEwOjI4IC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4g
T24gVGh1LCAxMSBKdWwgMjAxMyAxNDoxOToxMCArMDAwMA0KPiAiTXlrbGVidXN0LCBUcm9uZCIg
PFRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tPiB3cm90ZToNCj4gDQo+ID4gT24gVGh1LCAyMDEz
LTA3LTExIGF0IDA3OjEzIC0wNDAwLCBKZWZmIExheXRvbiB3cm90ZToNCj4gPiA+IE9uIFRodSwg
MTMgSnVuIDIwMTMgMTM6NDc6MzcgLTA1MDANCj4gPiA+IE1hbGFoYWwgTmFpbmVuaSA8bWFsYWhh
bEB1cy5pYm0uY29tPiB3cm90ZToNCj4gPiA+IA0KPiA+ID4gPiBIaSBUcm9uZCwNCj4gPiA+ID4g
DQo+ID4gPiA+IEkgc2F3IEJyeWFuJ3MgcGF0Y2hlcyBoZXJlIGh0dHBzOi8vcGF0Y2h3b3JrLmtl
cm5lbC5vcmcvcGF0Y2gvOTg3NDAyLw0KPiA+ID4gPiB0aGF0IGZpeCBpc3N1ZXMgYWZ0ZXIgbG9z
cyBvZiBhIGxvY2suICBXaGF0IGlzIHRoZSBzdGF0dXMgb24gdGhpcyBwYXRjaA0KPiA+ID4gPiBz
ZXQ/IERvIHRoZXkgbmVlZCBtb3JlIHdvcms/IFdlIGhhdmUgYW4gYXBwbGljYXRpb24gdGhhdCB1
c2VzIHJhbmdlDQo+ID4gPiA+IGxvY2tzIG9uIGEgZmlsZS4gVHdvIHRocmVhZHMgZnJvbSB0d28g
ZGlmZmVyZW50IGNsaWVudHMgZW5kIHVwIHdyaXRpbmcNCj4gPiA+ID4gdG8gdGhlIHNhbWUgYSBm
aWxlIGR1ZSB0byB0aGlzIGJ1ZyBhZnRlciBhIGxlYXNlIGV4cGlyeSBmcm9tIGEgY2xpZW50Lg0K
PiA+ID4gPiANCj4gPiA+ID4gUmVnYXJkcywgTWFsYWhhbC4NCj4gPiA+IA0KPiA+ID4gKGNjJ2lu
ZyBCcnlhbiBzaW5jZSBoZSBkaWQgdGhlIG9yaWdpbmFsIHNldCkNCj4gPiA+IA0KPiA+ID4gWWVh
aCwgdGhpcyBzZXQgd291bGQgYmUgYSBuaWNlIHRoaW5nIHRvIGhhdmUuIEEgY291cGxlIG9mIGNv
bW1lbnRzOg0KPiA+ID4gDQo+ID4gPiAtIEkgc3RpbGwgdGhpbmsgaXQgd291bGQgYmUgYmVzdCB0
byBtYWtlIFNJR0xPU1QgaXRzIG93biBzaWduYWwsIGJ1dCBhcw0KPiA+ID4gICBCcnlhbiBwb2lu
dHMgb3V0LCBpdCB3b3VsZCBuZWVkIHRvIGJlIGxhcmdlciB0aGFuIFNJR1JUTUFYLiBJJ20NCj4g
PiA+ICAgbm90IHN1cmUgdGhhdCdzIHBvc3NpYmxlIG9uIGFsbCBhcmNoZXMgd2l0aCB0aGUgd2F5
IHRoZSBSVCBzaWduYWxzDQo+ID4gPiAgIHdlcmUgZG9uZS4gSXQncyBwcm9iYWJseSB3b3J0aCBp
bnZlc3RpZ2F0aW5nIHRoYXQgdGhvdWdoIGJlZm9yZQ0KPiA+ID4gICBzZXR0bGluZyBvbiBTSUdJ
TyBzaW5jZSBpdCB3b3VsZCBiZSBoYXJkIHRvIGNoYW5nZSB0aGF0IHJldHJvYWN0aXZlbHkuDQo+
ID4gPiANCj4gPiA+IC0gVGhpcyBpcyBub3QgcmVhbGx5IGEgdjQuMSBzcGVjaWZpYyB0aGluZy4g
SXQgc2hvdWxkIGFsc28gYmUgZG9uZSBmb3INCj4gPiA+ICAgdjQuMCBhbmQgdjIvMywgdGhvdWdo
IHRoZSBsYXR0ZXIgdHdvIHJlYWxseSBuZWVkIHRvIGJlIGRvbmUgd2l0aGluDQo+ID4gPiAgIGxv
Y2tkLg0KPiA+IA0KPiA+IFNJR0xPU1QgaXMgbm90IHBhcnQgb2YgYW55IHN0YW5kYXJkLiBJdCBp
cyBhIGhhY2sgdGhhdCBoYXMgYmVlbiBhZG9wdGVkDQo+ID4gYnkgSUJNIGFuZCBTb2xhcmlzLg0K
PiA+IA0KPiA+IFRoZSBQT1NJWGx5IGNvcnJlY3Qgd2F5IHRvIGRvIHRoaXMgaXMgdG8gdXNlIEVC
QURGIHRvIHdhcm4gdGhlDQo+ID4gYXBwbGljYXRpb24gdGhhdCB0aGUgZmlsZSBkZXNjcmlwdG9y
IGlzIG5vIGxvbmdlciB2YWxpZCAoaW4gdGhlIHNlbnNlDQo+ID4gdGhhdCB0aGUgc2VydmVyIGlz
IG5vIGxvbmdlciBob25vdXJpbmcgdGhlIGxvY2spIGFuZCBFSU8gaW4gb3JkZXIgdG8NCj4gPiB3
YXJuIGl0IHRoYXQgZGF0YSBtYXkgaGF2ZSBiZWVuIGxvc3QuDQo+ID4gDQo+IA0KPiBJdCBpcyBh
IGhhY2suLi5JIHdvbid0IGFyZ3VlIHRoZXJlDQo+IA0KPiBJJ20gbm90IHN1cmUgdGhhdCByZXR1
cm5pbmcgZXJyb3JzIGlzIHJlYWxseSB0aGUgYmVzdCBhcHByb2FjaCB0aG91Z2guDQo+IEluIHNv
bWUgY2FzZXMsIHRoZSBmZCBtYXkgYmUgZmluZS4gSXQgbWF5IG9ubHkgYmUgdGhlIGxvY2sgdGhh
dCBoYXMNCj4gYmVlbiBsb3N0Lg0KPiANCj4gV2l0aCBhIHNpZ25hbCwgdGhlIHByb2dyYW0gaGFz
IG1vcmUgb2YgYSBjaG9pY2UgYXMgdG8gd2hldGhlciBpdCBjYXJlcw0KPiBhYm91dCBsb3N0IGxv
Y2tzIGFuZCBpcyBtb3JlIGltbWVkaWF0ZSB3aGVuIHRoZSBwcm9ibGVtIG9jY3Vycy4gQW4NCj4g
ZXJyb3IgY29kZSBzZWVtcyBsaWtlIGl0IG1pZ2h0IGNhdXNlIGEgbG90IG9mIGdyaWVmIGZvciBw
cm9ncmFtcyB0aGF0DQo+IGFyZW4ndCBleHBlY3RpbmcgdGhhdCBzb3J0IG9mIGJlaGF2aW9yLg0K
DQpFQkFERiBpcyBhIGVycm9yIHRoYXQgaGFzIGFuIG9idmlvdXMgbWVhbmluZyBpbiBQT1NJWDog
eW91IG5lZWQgdG8NCnJlb3BlbiB0aGUgZmlsZSBhbmQgcmUtZXN0YWJsaXNoIGFueSBsb2Nrcy4g
SG93IGlzIHRoYXQgbm90IGJldHRlciB0aGFuDQpyZWNlaXZpbmcgYSBzaWduYWwgdGhleSB3b24n
dCBiZSBleHBlY3Rpbmc/IENvbnNpZGVyIHRoYXQgd2UnZCBoYXZlIHRvDQpvdmVybG9hZCBTSUdJ
Tywgd2hpY2ggaGFzIGEgY29tcGxldGVseSBkaWZmZXJlbnQgbWVhbmluZyBpbiBQT1NJWC4uLg0K
DQotLSANClRyb25kIE15a2xlYnVzdA0KTGludXggTkZTIGNsaWVudCBtYWludGFpbmVyDQoNCk5l
dEFwcA0KVHJvbmQuTXlrbGVidXN0QG5ldGFwcC5jb20NCnd3dy5uZXRhcHAuY29tDQo=