I was playing around with the in-kernel flexfiles server today, and I
seem to be hitting a deadlock when using it on an XFS-exported
filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
[ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE 4.8.0-rc1+ #3
[ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878 ffffffff8f463853
[ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8 ffffffffc045936f
[ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540 ffff9115361b8a58
[ 928.740697] Call Trace:
[ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
[ 928.741570] [<ffffffffc045936f>] nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
[ 928.742380] [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
[ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
[ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120 [xfs]
[ 928.744462] [<ffffffffc029ea04>] xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
[ 928.745251] [<ffffffffc029f36b>] xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
[ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140 [xfs]
[ 928.746803] [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
[ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
[ 928.748146] [<ffffffffc029f620>] ? xfs_file_buffered_aio_write+0x330/0x330 [xfs]
[ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
[ 928.749614] [<ffffffffc029c800>] ? xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
[ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
[ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
[ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
[ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
[ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
[ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690 [sunrpc]
[ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
[ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
[ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
[ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
[ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
[ 928.757563] [<ffffffff8f10dcc5>] ? trace_hardirqs_on_caller+0xf5/0x1b0
[ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
[ 928.758875] [<ffffffff8f0d5790>] ? kthread_create_on_node+0x250/0x250
So the client gets a flexfiles layout, and then tries to issue a v3
WRITE against the file. XFS then recalls the layout, but the client
can't return the layout until the v3 WRITE completes. Eventually this
should resolve itself after 2 lease periods, but that's quite a long
time.
I guess XFS requires recalling block and SCSI layouts when the server
wants to issue a write (or someone writes to it locally), but that
seems like it shouldn't be happening when the layout is a flexfiles
layout.
Any thoughts on what the right fix is here?
On a related note, knfsd will spam the heck out of the client with
CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
the server not to treat an NFS_OK return from the client like
NFS4ERR_DELAY there, but that would mean a different mechanism for
timing out a CB_LAYOUTRECALL.
--
Jeff Layton <[email protected]>
On Thu, 2016-08-11 at 15:55 +0000, Trond Myklebust wrote:
> >
> > On Aug 11, 2016, at 11:23, Jeff Layton <[email protected]> wrote:
> >
> > I was playing around with the in-kernel flexfiles server today, and
> > I
> > seem to be hitting a deadlock when using it on an XFS-exported
> > filesystem. Here's the stack trace of how the CB_LAYOUTRECALL
> > occurs:
> >
> > [ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted:
> > G OE 4.8.0-rc1+ #3
> > [ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX,
> > 1996), BIOS 1.9.1-1.fc24 04/01/2014
> > [ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878
> > ffffffff8f463853
> > [ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8
> > ffffffffc045936f
> > [ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540
> > ffff9115361b8a58
> > [ 928.740697] Call Trace:
> > [ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
> > [ 928.741570] [<ffffffffc045936f>]
> > nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> > [ 928.742380] [<ffffffffc045939d>]
> > nfsd4_layout_lm_break+0x1d/0x30 [nfsd]
> > [ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> > [ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120
> > [xfs]
> > [ 928.744462] [<ffffffffc029ea04>]
> > xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> > [ 928.745251] [<ffffffffc029f36b>]
> > xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> > [ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140
> > [xfs]
> > [ 928.746803] [<ffffffff8f2a0599>]
> > do_iter_readv_writev+0xb9/0x140
> > [ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> > [ 928.748146] [<ffffffffc029f620>] ?
> > xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> > [ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> > [ 928.749614] [<ffffffffc029c800>] ?
> > xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> > [ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> > [ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0
> > [nfsd]
> > [ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> > [ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150
> > [nfsd]
> > [ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0
> > [nfsd]
> > [ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690
> > [sunrpc]
> > [ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330
> > [sunrpc]
> > [ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> > [ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> > [ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190
> > [nfsd]
> > [ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
> > [ 928.757563] [<ffffffff8f10dcc5>] ?
> > trace_hardirqs_on_caller+0xf5/0x1b0
> > [ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> > [ 928.758875] [<ffffffff8f0d5790>] ?
> > kthread_create_on_node+0x250/0x250
> >
> >
> > So the client gets a flexfiles layout, and then tries to issue a v3
> > WRITE against the file. XFS then recalls the layout, but the client
> > can't return the layout until the v3 WRITE completes. Eventually
> > this
> > should resolve itself after 2 lease periods, but that's quite a
> > long
> > time.
>
> What’s the sequence of operations here? If the client has outstanding
> I/O, I should now be returning NFS_OK, and then completing the recall
> with a LAYOUTRETURN as soon as the outstanding I/O (and layoutcommit,
> if one is due) is done.
>
> The server is expected to return NFS4ERR_RECALLCONFLICT to any
> LAYOUTGET attempts that occur before the LAYOUTRETURN.
>
Basically, I'm just doing this on the client:
    $ echo "foo" > /mnt/knfsdsrv/testfile
The client does:
OPEN
LAYOUTGET (for RW)
GETDEVICEINFO
...and then a v3 WRITE under the aegis of the layout it got.
The server then issues a CB_LAYOUTRECALL (because XFS wants to do that
whenever there is a local write, apparently). The client returns
NFS_OK, but it can't return the layout until the v3 WRITE completes.
The v3 write is hung though because it's waiting for the layout to be
returned.
> >
> >
> > I guess XFS requires recalling block and SCSI layouts when the
> > server
> > wants to issue a write (or someone writes to it locally), but that
> > seems like it shouldn't be happening when the layout is a flexfiles
> > layout.
> >
> > Any thoughts on what the right fix is here?
> >
> > On a related note, knfsd will spam the heck out of the client with
> > CB_LAYOUTRECALLs during this time. I think we ought to consider
> > fixing
> > the server not to treat an NFS_OK return from the client like
> > NFS4ERR_DELAY there, but that would mean a different mechanism for
> > timing out a CB_LAYOUTRECALL.
>
> There is a big difference between NFS_OK and NFS4ERR_DELAY as far as
> the server is concerned:
>
> - NFS_OK means that the client has now seen the stateid with the
> updated sequence id that was sent in CB_LAYOUTRECALL, and is
> processing it. No resend of the CB_LAYOUTRECALL is required.
> - OTOH, NFS4ERR_DELAY means the same thing in the back channel as it
> does in the forward channel: I’m busy and cannot process your
> request, please resend it later.
Right. The current code basically just treats them the same as a
mechanism to handle eventually timing out the layoutrecall. The extra
CB_LAYOUTRECALLs are entirely superfluous. It's probably not too hard
to fix, but we'd need to come up with some other mechanism for timing
out the layoutrecall.
--
Jeff Layton <[email protected]>
> On Aug 11, 2016, at 12:06, Jeff Layton <[email protected]> wrote:
>
> The server then issues a CB_LAYOUTRECALL (because XFS wants to do that
> whenever there is a local write, apparently). The client returns
> NFS_OK, but it can't return the layout until the v3 WRITE completes.
> The v3 write is hung though because it's waiting for the layout to be
> returned.

Oh… So this is an artifact of the write being local, and XFS having a
path to recall the layout that it really shouldn’t have in the flexfiles
case?
Yeah, for file-like layouts there should be a flag in
struct nfsd4_layout_ops to disable recalls.
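Something along these lines, maybe? Untested sketch; the field name and
where exactly it gets checked are my guesses:

    /* fs/nfsd/pnfs.h */
    struct nfsd4_layout_ops {
            u32     notify_types;
            bool    disable_recalls; /* this layout type is never recalled */

            /* ... existing methods unchanged ... */
    };

    /* fs/nfsd/flexfilelayout.c */
    const struct nfsd4_layout_ops ff_layout_ops = {
            .notify_types    = NOTIFY_DEVICEID4_DELETE |
                               NOTIFY_DEVICEID4_CHANGE,
            .disable_recalls = true,
            /* ... */
    };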
On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> Yeah, for file-like layouts there should be a flag in
> struct nfsd4_layout_ops to disable recalls.
I don't think disabling recalls would be enough, would it? XFS still
wants to break_layout and won't proceed until the layout list is empty,
AFAICT. We need some way to indicate to the lower filesystem not to
call break_layout in this case.
--
Jeff Layton <[email protected]>
On Thu, Aug 11, 2016 at 12:33:47PM -0400, Jeff Layton wrote:
> On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> > Yeah, for file-like layouts there should be a flag in
> > struct nfsd4_layout_ops to disable recalls.
>
> I don't think disabling recalls would be enough, would it? XFS still
> wants to break_layout and won't proceed until the layout list is empty,
> AFAICT. We need some way to indicate to the lower filesystem not to
> call break_layout in this case.
XFS only cares about block-like layouts where the client has direct
access to the file blocks. I'd need to look at how to propagate the
flag into break_layout, but in principle we don't need to do any
recalls on truncate ever for file and flexfile layouts.
>
> --
> Jeff Layton <[email protected]>
---end quoted text---
On Thu, 2016-08-11 at 18:59 +0200, hch wrote:
> On Thu, Aug 11, 2016 at 12:33:47PM -0400, Jeff Layton wrote:
> >
> > On Thu, 2016-08-11 at 18:25 +0200, hch wrote:
> > >
> > > Yeah, for file-like layouts there should be a flag in
> > > struct nfsd4_layout_ops to disable recalls.
> >
> > I don't think disabling recalls would be enough, would it? XFS
> > still
> > wants to break_layout and won't proceed until the layout list is
> > empty,
> > AFAICT. We need some way to indicate to the lower filesystem not to
> > call break_layout in this case.
>
> XFS only cares about block-like layouts where the client has direct
> access to the file blocks. I'd need to look at how to propagate the
> flag into break_layout, but in principle we don't need to do any
> recalls on truncate ever for file and flexfile layouts.
>
Hmm...if we aren't ever going to recall files and flexfiles layouts,
then do we even need to set a FL_LAYOUT lease for them at all?
I think I'll try hacking something up that takes that approach and see
if that might be a reasonable fix.
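Something like this is what I was thinking of trying (sketch only, not
yet tested; assumes the disable_recalls flag suggested above):

    /* fs/nfsd/nfs4layouts.c */
    static int
    nfsd4_layout_setlease(struct nfs4_layout_stateid *ls)
    {
            struct file_lock *fl;
            int status;

            /*
             * Never-recalled layout types don't need an FL_LAYOUT lease;
             * without one, xfs_break_layouts() finds nothing to break, so
             * local (or nfsd) writes no longer trigger a CB_LAYOUTRECALL.
             */
            if (nfsd4_layout_ops[ls->ls_layout_type]->disable_recalls)
                    return 0;

            fl = locks_alloc_lock();
            if (!fl)
                    return -ENOMEM;
            /* ... rest as before: set up and install the lease ... */
    }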
--
Jeff Layton <[email protected]>
On 11 Aug 2016, at 11:23, Jeff Layton wrote:
> I was playing around with the in-kernel flexfiles server today, and I
> seem to be hitting a deadlock when using it on an XFS-exported
> filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
>
> [ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE
> 4.8.0-rc1+ #3
> [ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.9.1-1.fc24 04/01/2014
> [ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878
> ffffffff8f463853
> [ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8
> ffffffffc045936f
> [ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540
> ffff9115361b8a58
> [ 928.740697] Call Trace:
> [ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
> [ 928.741570] [<ffffffffc045936f>]
> nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> [ 928.742380] [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30
> [nfsd]
> [ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> [ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120
> [xfs]
> [ 928.744462] [<ffffffffc029ea04>]
> xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> [ 928.745251] [<ffffffffc029f36b>]
> xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> [ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140
> [xfs]
> [ 928.746803] [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
> [ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> [ 928.748146] [<ffffffffc029f620>] ?
> xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> [ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> [ 928.749614] [<ffffffffc029c800>] ?
> xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> [ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> [ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
> [ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> [ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150
> [nfsd]
> [ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
> [ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690
> [sunrpc]
> [ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
> [ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> [ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> [ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
> [ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
> [ 928.757563] [<ffffffff8f10dcc5>] ?
> trace_hardirqs_on_caller+0xf5/0x1b0
> [ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> [ 928.758875] [<ffffffff8f0d5790>] ?
> kthread_create_on_node+0x250/0x250
>
>
> So the client gets a flexfiles layout, and then tries to issue a v3
> WRITE against the file. XFS then recalls the layout, but the client
> can't return the layout until the v3 WRITE completes. Eventually this
> should resolve itself after 2 lease periods, but that's quite a long
> time.
>
> I guess XFS requires recalling block and SCSI layouts when the server
> wants to issue a write (or someone writes to it locally), but that
> seems like it shouldn't be happening when the layout is a flexfiles
> layout.
>
> Any thoughts on what the right fix is here?
>
> On a related note, knfsd will spam the heck out of the client with
> CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
> the server not to treat an NFS_OK return from the client like
> NFS4ERR_DELAY there, but that would mean a different mechanism for
> timing out a CB_LAYOUTRECALL.
I'm getting into similar trouble with SCSI layouts when the client ends
up submitting a WRITE because the IO is not page aligned, but it already
holds a layout for that range. It looks like the server sends a
CB_LAYOUTRECALL, but the client has to answer NFS4ERR_DELAY because it
is still holding the layout.

Probably, the client should return any layouts it holds for that range
before doing IO through the MDS.

Alternatively, shouldn't the MDS accept IO from the same client that
holds a layout for that range, rather than recall that layout? RFC 5661
Section 20.3.4 talks about the client submitting WRITEs before
responding to CB_LAYOUTRECALL: "As always, the client may write the data
through the metadata server."
I'm trying to find the discussion that resulted in this commit:

    commit 6b9b21073d3b250e17812cd562fffc9006962b39
    Author: Jeff Layton <[email protected]>
    Date:   Tue Dec 8 07:23:48 2015 -0500

        nfsd: give up on CB_LAYOUTRECALLs after two lease periods

Why should we poll the client if the client answers with NFS4ERR_DELAY?
Can we instead just wait for the layout to be returned?

Also, I think the 2*lease period timeout is currently broken because we
reset tk_start after every call... but that's not really causing any
trouble.
Ben
On Sat, 2018-01-27 at 10:39 -0500, Benjamin Coddington wrote:
> On 11 Aug 2016, at 11:23, Jeff Layton wrote:
>
> > I was playing around with the in-kernel flexfiles server today, and I
> > seem to be hitting a deadlock when using it on an XFS-exported
> > filesystem. Here's the stack trace of how the CB_LAYOUTRECALL occurs:
> >
> > [ 928.736139] CPU: 0 PID: 846 Comm: nfsd Tainted: G OE
> > 4.8.0-rc1+ #3
> > [ 928.737040] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > BIOS 1.9.1-1.fc24 04/01/2014
> > [ 928.738009] 0000000000000286 000000006125f50e ffff91153845b878
> > ffffffff8f463853
> > [ 928.738906] ffff91152ec194d0 ffff91152d31d9c0 ffff91153845b8a8
> > ffffffffc045936f
> > [ 928.739788] ffff91152c051980 ffff91152d31d9c0 ffff91152c051540
> > ffff9115361b8a58
> > [ 928.740697] Call Trace:
> > [ 928.740998] [<ffffffff8f463853>] dump_stack+0x86/0xc3
> > [ 928.741570] [<ffffffffc045936f>]
> > nfsd4_recall_file_layout+0x17f/0x190 [nfsd]
> > [ 928.742380] [<ffffffffc045939d>] nfsd4_layout_lm_break+0x1d/0x30
> > [nfsd]
> > [ 928.743115] [<ffffffff8f3056d8>] __break_lease+0x118/0x6a0
> > [ 928.743759] [<ffffffffc02dea69>] xfs_break_layouts+0x79/0x120
> > [xfs]
> > [ 928.744462] [<ffffffffc029ea04>]
> > xfs_file_aio_write_checks+0x94/0x1f0 [xfs]
> > [ 928.745251] [<ffffffffc029f36b>]
> > xfs_file_buffered_aio_write+0x7b/0x330 [xfs]
> > [ 928.746063] [<ffffffffc029f70c>] xfs_file_write_iter+0xec/0x140
> > [xfs]
> > [ 928.746803] [<ffffffff8f2a0599>] do_iter_readv_writev+0xb9/0x140
> > [ 928.747478] [<ffffffff8f2a126b>] do_readv_writev+0x19b/0x240
> > [ 928.748146] [<ffffffffc029f620>] ?
> > xfs_file_buffered_aio_write+0x330/0x330 [xfs]
> > [ 928.748956] [<ffffffff8f29e02b>] ? do_dentry_open+0x28b/0x310
> > [ 928.749614] [<ffffffffc029c800>] ?
> > xfs_extent_busy_ag_cmp+0x20/0x20 [xfs]
> > [ 928.750367] [<ffffffff8f2a156f>] vfs_writev+0x3f/0x50
> > [ 928.750934] [<ffffffffc04276ca>] nfsd_vfs_write+0xca/0x3a0 [nfsd]
> > [ 928.751608] [<ffffffffc0429ec5>] nfsd_write+0x485/0x780 [nfsd]
> > [ 928.752263] [<ffffffffc043144c>] nfsd3_proc_write+0xbc/0x150
> > [nfsd]
> > [ 928.752973] [<ffffffffc0421388>] nfsd_dispatch+0xb8/0x1f0 [nfsd]
> > [ 928.753642] [<ffffffffc036d78f>] svc_process_common+0x42f/0x690
> > [sunrpc]
> > [ 928.754395] [<ffffffffc036e8e8>] svc_process+0x118/0x330 [sunrpc]
> > [ 928.755080] [<ffffffffc04208ac>] nfsd+0x19c/0x2b0 [nfsd]
> > [ 928.755681] [<ffffffffc0420715>] ? nfsd+0x5/0x2b0 [nfsd]
> > [ 928.756274] [<ffffffffc0420710>] ? nfsd_destroy+0x190/0x190 [nfsd]
> > [ 928.756991] [<ffffffff8f0d5891>] kthread+0x101/0x120
> > [ 928.757563] [<ffffffff8f10dcc5>] ?
> > trace_hardirqs_on_caller+0xf5/0x1b0
> > [ 928.758282] [<ffffffff8f8f2fef>] ret_from_fork+0x1f/0x40
> > [ 928.758875] [<ffffffff8f0d5790>] ?
> > kthread_create_on_node+0x250/0x250
> >
> >
> > So the client gets a flexfiles layout, and then tries to issue a v3
> > WRITE against the file. XFS then recalls the layout, but the client
> > can't return the layout until the v3 WRITE completes. Eventually this
> > should resolve itself after 2 lease periods, but that's quite a long
> > time.
> >
> > I guess XFS requires recalling block and SCSI layouts when the server
> > wants to issue a write (or someone writes to it locally), but that
> > seems like it shouldn't be happening when the layout is a flexfiles
> > layout.
> >
> > Any thoughts on what the right fix is here?
> >
> > On a related note, knfsd will spam the heck out of the client with
> > CB_LAYOUTRECALLs during this time. I think we ought to consider fixing
> > the server not to treat an NFS_OK return from the client like
> > NFS4ERR_DELAY there, but that would mean a different mechanism for
> > timing out a CB_LAYOUTRECALL.
>
> I'm getting into similar trouble with SCSI layouts when the client ends
> up
> submitting a WRITE because the IO is not page aligned, but it already
> holds
> a layout for that range. It looks like the server sends a
> CB_LAYOUTRECALL,
> but the client has to answer NFS4ERR_DELAY because it is still holding
> the
> layout.
>
> Probably, the client should return any layouts it holds for that range
> before
> doing IO through the MDS.
>
Yes, that might be good. Could even prefix the WRITE compound with a
LAYOUTRETURN if you want to get fancy. :)
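i.e. put both in the same compound so the return is ordered ahead of the
write, something like:

    SEQUENCE
    PUTFH
    LAYOUTRETURN    (drop the conflicting range)
    WRITE           (through the MDS)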
> Alternatively, shouldn't the MDS accept IO from the same client that
> holds a
> layout for that range, rather than recall that layout? RFC 5661 Section
> 20.3.4 talks about the client submitting WRITEs before responding to
> CB_LAYOUTRECALL: "As always, the client may write the data through the
> metadata server."
>
Agreed. That seems reasonable too.
> I'm trying to find the discussion that resulted in this commit:
>
> commit 6b9b21073d3b250e17812cd562fffc9006962b39
> Author: Jeff Layton <[email protected]>
> Date: Tue Dec 8 07:23:48 2015 -0500
>
> nfsd: give up on CB_LAYOUTRECALLs after two lease periods
>
> Why should we poll the client if the client answers with NFS4ERR_DELAY?
> Can
> we instead just wait for the layout to be returned?
>
No. NFS4ERR_DELAY just means "I'm too busy to answer right now, please
call again later". You can't infer that the client has made any note of
the CB_LAYOUTRECALL at all since it didn't succeed.
Returning NFS4_OK on a CB_LAYOUTRECALL just means that you acknowledge
that it has been recalled and will eventually send a LAYOUTRETURN. It
doesn't mean that you are immediately returning it.
Probably what the client should do in this situation is mark the layout
as having been recalled and return NFS4_OK instead of NFS4ERR_DELAY. It
seems like that ought to be possible, but I haven't looked at the code
to see why that isn't occurring.
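Maybe something roughly like this in initiate_file_draining() (sketch,
untested; I haven't checked that these helpers behave sanely while I/O
still pins the lsegs):

    /* fs/nfs/callback_proc.c: rather than returning NFS4ERR_DELAY when
     * segments are still in use, note the recall and acknowledge it; the
     * LAYOUTRETURN goes out once the last I/O using the layout completes.
     */
    pnfs_set_plh_return_info(lo, args->cbl_range.iomode,
                             be32_to_cpu(args->cbl_stateid.seqid));
    pnfs_mark_matching_lsegs_return(lo, &free_me_list, &args->cbl_range,
                                    be32_to_cpu(args->cbl_stateid.seqid));
    rv = NFS4_OK;   /* recall acknowledged, return will follow */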
> Also, I think the 2*lease period timeout is currently broken because
> we reset tk_start after every call... but that's not really causing
> any trouble.
>
It'd be good to fix that too, since you're in there...
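Maybe something like this (sketch, untested; ls_recall_time would be a
new field in struct nfs4_layout_stateid, stamped once when the recall is
first queued, so that retransmissions don't restart the clock the way
task->tk_start does today):

    /* fs/nfsd/nfs4layouts.c, in nfsd4_cb_layout_done() */
    switch (task->tk_status) {
    case 0:
    case -NFS4ERR_DELAY:
            /* give the client two lease periods from the *first* recall */
            cutoff = ktime_add_ns(ls->ls_recall_time,
                                  (s64)nn->nfsd4_lease * NSEC_PER_SEC * 2);
            if (ktime_before(ktime_get(), cutoff)) {
                    rpc_delay(task, HZ / 100);  /* poll again in 10ms */
                    return 0;
            }
            /* otherwise fall through and fence the layout as we do now */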
--
Jeff Layton <[email protected]>