2016-06-02 23:03:39

by Oleg Drokin

Subject: NFS/d_splice_alias breakage

Hello!

I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug)
that seems to have been present since at least 2014 and that lets users crash the NFS client locally.

First, here's an interesting quote from the comment above d_splice_alias:

> * Cluster filesystems may call this function with a negative, hashed dentry.
> * In that case, we know that the inode will be a regular file, and also this
> * will only occur during atomic_open. So we need to check for the dentry
> * being already hashed only in the final case.
> */
> struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
> {
> if (IS_ERR(inode))
> return ERR_CAST(inode);
>
> BUG_ON(!d_unhashed(dentry));
^^^^^^^^^^^^^^ - This does not align well with the quote above.

It got imported here by commit b5ae6b15bd73e35b129408755a0804287a87e041

Removing the BUG_ON head-on is not going to work, since the d_rehash of old
is now folded into __d_add and we might not want to move that condition there.

Doing an
if (d_unhashed(dentry))
	__d_add(dentry, inode);
else
	d_instantiate(dentry, inode);

also does not look super pretty, and who knows how all of the previous code
like __d_find_any_alias would react.

Al, I think you might want to chime in here on how to better handle this?

It appears the problem has been there since at least 3.10, when the fs/nfs/dir.c code
was calling d_materialise_unique(), which also required the dentry to be unhashed.

Not sure how this was not hit earlier. The crash looks like this (I added
a printk to ensure this is what is going on indeed and not some other weird race):

[ 64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70
[ 64.489549] ------------[ cut here ]------------
[ 64.489642] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[ 64.489750] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 64.489853] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis tpm nfsd drm_kms_helper ttm drm serio_raw virtio_blk
[ 64.491111] CPU: 6 PID: 7125 Comm: file_concat.sh Not tainted 4.7.0-rc1-vm-nfs+ #99
[ 64.492069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 64.492489] task: ffff880110a801c0 ti: ffff8800c283c000 task.ti: ffff8800c283c000
[ 64.493159] RIP: 0010:[<ffffffff8124292f>] [<ffffffff8124292f>] d_splice_alias+0x31f/0x480
[ 64.493866] RSP: 0018:ffff8800c283fb20 EFLAGS: 00010282
[ 64.494238] RAX: 0000000000000067 RBX: ffff88010f500c70 RCX: 0000000000000000
[ 64.494625] RDX: 0000000000000067 RSI: ffff8800d08d2ed0 RDI: ffff88010f500c70
[ 64.495021] RBP: ffff8800c283fb78 R08: 0000000000000001 R09: 0000000000000000
[ 64.495407] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8800d08d2ed0
[ 64.495804] R13: 0000000000000000 R14: ffff8800d4a56f00 R15: ffff8800d0bb8c70
[ 64.496199] FS: 00007f94ae25c700(0000) GS:ffff88011f580000(0000) knlGS:0000000000000000
[ 64.496859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.497235] CR2: 000055e5d3fb46a4 CR3: 00000000ce364000 CR4: 00000000000006e0
[ 64.497626] Stack:
[ 64.497961] ffffffff8132154e ffff8800c283fb68 ffffffff81325916 0000000000000000
[ 64.498765] 0000000000000000 ffff8801109459c0 0000000000000000 ffff8800d0bb8c70
[ 64.499578] ffff8800c283fba0 ffff8800d08d2ed0 ffff8800cb927080 ffff8800c283fc18
[ 64.500385] Call Trace:
[ 64.500727] [<ffffffff8132154e>] ? nfs_lookup+0x17e/0x320
[ 64.501103] [<ffffffff81325916>] ? __put_nfs_open_context+0xc6/0xf0
[ 64.501477] [<ffffffff8132252b>] nfs_atomic_open+0x8b/0x430
[ 64.501850] [<ffffffff81236def>] lookup_open+0x29f/0x7a0
[ 64.502222] [<ffffffff812377ce>] path_openat+0x4de/0xfc0
[ 64.502591] [<ffffffff8123935e>] do_filp_open+0x7e/0xe0
[ 64.502964] [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
[ 64.503347] [<ffffffff817f4977>] ? _raw_spin_unlock+0x27/0x40
[ 64.503719] [<ffffffff8124877c>] ? __alloc_fd+0xbc/0x170
[ 64.504097] [<ffffffff81226896>] do_sys_open+0x116/0x1f0
[ 64.504465] [<ffffffff8122698e>] SyS_open+0x1e/0x20
[ 64.504831] [<ffffffff817f5176>] entry_SYSCALL_64_fastpath+0x1e/0xad
[ 64.505218] Code: 01 e8 46 20 5b 00 85 db 74 0b 4c 89 ff 4c 63 fb e8 87 d8 ff ff 4c 89 e7 e8 2f 3c 00 00 4c 89 f8 e9 5e fe ff ff 0f 0b 48 89 f8 c3 <0f> 0b 48 8b 43 40 4c 8b 78 58 49 8d 8f 58 03 00 00 eb 02 f3 90
[ 64.508754] RIP [<ffffffff8124292f>] d_splice_alias+0x31f/0x480
[ 64.509176] RSP <ffff8800c283fb20>

Bye,
Oleg


2016-06-03 00:44:55

by Trond Myklebust

Subject: Re: NFS/d_splice_alias breakage

On 6/2/16, 18:46, "[email protected] on behalf of Oleg Drokin" <[email protected] on behalf of [email protected]> wrote:

> [...]

That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:

                d_drop(dentry);
                alias = d_exact_alias(dentry, state->inode);
                if (!alias)
                        alias = d_splice_alias(igrab(state->inode), dentry);

IOW: something would have to be acting between the d_drop() and d_splice_alias() above...

Al, I've been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Cheers
  Trond


2016-06-03 00:54:19

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 2, 2016, at 8:44 PM, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
>
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);

We live in a reality where there are no tight races left.
Even a one-instruction race is now huge thanks to commonplace CPU-overcommitted VMs
(and hyperthreading); nobody knows when your VM's CPU is going to be preempted
by the host (while the other CPUs of the same VM are running ahead at full steam).
Stuff that used to take ages to trigger again for a better understanding is instant now.

> IOW: something would have to be acting between the d_drop() and d_splice_alias() above?

The other CPU ;)
I checked the readdirplus theory and that's not it; there's some other lookup going on, and the dentry we crashed on never came through nfs_prime_dcache.

> Al, I've been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

Note this is not a new, 2016-vintage bug. I hit this in 3.10 as well (a baseline kernel from one well-known enterprise vendor; only there we hit it in d_materialise_unique).

2016-06-03 03:26:33

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:

> > Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?

d_alloc_parallel() use by all parties involved.

2016-06-03 03:28:58

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Fri, Jun 03, 2016 at 12:44:51AM +0000, Trond Myklebust wrote:

> That would have to be a really tight race, since the code in _nfs4_open_and_get_state() currently reads:
>
> d_drop(dentry);
> alias = d_exact_alias(dentry, state->inode);
> if (!alias)
> alias = d_splice_alias(igrab(state->inode), dentry);
>
> IOW: something would have to be acting between the d_drop() and d_splice_alias() above...

How? dentry is
* negative (it had better be, or we are _really_ fucked)
* unhashed

How does whoever's rehashing it stumble across that thing?

2016-06-03 03:37:53

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
> Hello!
>
> I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug)
> that seems to have been present since at least 2014 and that lets users crash the NFS client locally.

> > * Cluster filesystems may call this function with a negative, hashed dentry.
> > * In that case, we know that the inode will be a regular file, and also this
> > * will only occur during atomic_open. So we need to check for the dentry
> > * being already hashed only in the final case.

Comment is long obsolete and should've been removed. "Cluster filesystem"
in question was GFS2 and it had been dealt with there. Mea culpa - should've
removed the comment as soon as that was done.

> Removing the BUG_ON head-on is not going to work, since the d_rehash of old
> is now folded into __d_add and we might not want to move that condition there.

No, it is not. It really should not be called that way.

> It appears the problem has been there since at least 3.10, when the fs/nfs/dir.c code
> was calling d_materialise_unique(), which also required the dentry to be unhashed.
>
> Not sure how this was not hit earlier. The crash looks like this (I added
> a printk to ensure this is what is going on indeed and not some other weird race):

> [ 64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70

Which of the call sites had that been and how does one reproduce that fun?
If you feel that posting a reproducer in the open is a bad idea, just send
it off-list...

2016-06-03 03:38:43

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Fri, Jun 03, 2016 at 04:26:25AM +0100, Al Viro wrote:
> On Thu, Jun 02, 2016 at 08:54:06PM -0400, Oleg Drokin wrote:
>
> > > Al, I’ve been distracted by personal matters in the past 6 months. What is there to guarantee exclusion of the readdirplus dentry instantiation and the NFSv4 atomic open in the brave new world of VFS, June 2016 vintage?
>
> d_alloc_parallel() use by all parties involved.

Grr... That should've been a reply to Trond, obviously.

2016-06-03 03:44:09

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 2, 2016, at 11:37 PM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 06:46:08PM -0400, Oleg Drokin wrote:
>> Hello!
>>
>> I just came across a bug (trying to run some Lustre test scripts against NFS, while hunting for another nfsd bug)
>> that seems to have been present since at least 2014 and that lets users crash the NFS client locally.
>
>>> * Cluster filesystems may call this function with a negative, hashed dentry.
>>> * In that case, we know that the inode will be a regular file, and also this
>>> * will only occur during atomic_open. So we need to check for the dentry
>>> * being already hashed only in the final case.
>
> Comment is long obsolete and should've been removed. "Cluster filesystem"
> in question was GFS2 and it had been dealt with there. Mea culpa - should've
> removed the comment as soon as that was done.

Oh, ok. I assumed it was still valid, esp. considering the issue at hand where
what it describes actually happens and NFS is also a cluster filesystem of sorts ;)

>> It appears the problem has been there since at least 3.10, when the fs/nfs/dir.c code
>> was calling d_materialise_unique(), which also required the dentry to be unhashed.
>>
>> Not sure how this was not hit earlier. The crash looks like this (I added
>> a printk to ensure this is what is going on indeed and not some other weird race):
>
>> [ 64.489326] Calling into d_splice_alias with hashed dentry, dentry->d_inode (null) inode ffff88010f500c70
>
> Which of the call sites had that been and how does one reproduce that fun?
> If you feel that posting a reproducer in the open is a bad idea, just send
> it off-list...

This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.

I'll send you the scripts with instructions separately for now.
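
For readers following the thread, the call site in question is the splice at the tail of nfs_lookup(). The sketch below paraphrases that tail from memory; the label names and the error path are approximate, so treat it as an illustration of the shape rather than a verbatim copy of the 4.7-rc tree:

	/* Paraphrase of the tail of fs/nfs/dir.c::nfs_lookup(); labels and
	 * error handling are approximate, only the shape matters here. */
no_entry:
	res = d_splice_alias(inode, dentry);	/* BUG_ON(!d_unhashed(dentry)) fires in here */
	if (res != NULL) {
		if (IS_ERR(res))
			goto out;		/* approximate label name */
		dentry = res;
	}
	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));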

2016-06-03 04:26:52

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:

> > Which of the call sites had that been and how does one reproduce that fun?
> > If you feel that posting a reproducer in the open is a bad idea, just send
> > it off-list...
>
> This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.

Bloody hell... Was that sucker hashed on the entry into nfs_lookup()?
If we get that with call as a method, we are really fucked.

<greps>

Ho-hum... One of the goto no_open; in nfs_atomic_open()? That would
mean a stale negative dentry in dcache that has escaped revalidation...
Wait a minute, didn't nfs ->d_revalidate() skip everything in some
cases, leaving it to ->atomic_open()?

Looks like the right thing to do would be to do d_drop() at no_open:,
just before we call nfs_lookup(). If not earlier, actually... How
about the following?

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
err = PTR_ERR(inode);
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
+ d_drop(dentry);
switch (err) {
case -ENOENT:
- d_drop(dentry);
d_add(dentry, NULL);
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
break;

2016-06-03 04:42:56

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:

> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup(). If not earlier, actually... How
> about the following?

A bit of rationale: dentry in question is negative and attempt to open
it has just failed; in case it's really negative we did that d_drop()
anyway (and then unconditionally rehashed it). In case when we proceed to
nfs_lookup() and it does not fail, we'll have it rehashed there (with the
right inode). What do we lose from doing d_drop() on *any* error here?
It's negative, with dubious validity. In the worst case, we had open
and lookup fail, but it's just us - the sucker really is negative and
somebody else would be able to revalidate it. If we drop it here (and
not rehash later), that somebody else will have to allocate an in-core
dentry before doing lookup or atomic_open. Which is negligible extra
burden.
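
For comparison with the diff re-quoted below, the other placement mentioned above (a d_drop() right at the no_open: label, just before the nfs_lookup() fallback) would look roughly like the following sketch. It paraphrases fs/nfs/dir.c with the declarations, the parallel-lookup handling and most of the error path elided, so it is a sketch of the idea, not a second patch:

	/* Sketch only: unhash the dentry of dubious validity at the no_open:
	 * fallback, so that neither nfs_lookup() nor a later d_splice_alias()
	 * can see it hashed.  Surrounding code elided. */
no_open:
	d_drop(dentry);
	res = nfs_lookup(dir, dentry, lookup_flags);
	if (IS_ERR(res))
		return PTR_ERR(res);
	return finish_no_open(file, res);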

> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> err = PTR_ERR(inode);
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> + d_drop(dentry);
> switch (err) {
> case -ENOENT:
> - d_drop(dentry);
> d_add(dentry, NULL);
> nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> break;

2016-06-03 04:53:04

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Fri, Jun 03, 2016 at 05:42:53AM +0100, Al Viro wrote:
> On Fri, Jun 03, 2016 at 05:26:48AM +0100, Al Viro wrote:
>
> > Looks like the right thing to do would be to do d_drop() at no_open:,
> > just before we call nfs_lookup(). If not earlier, actually... How
> > about the following?
>
> A bit of rationale: dentry in question is negative and attempt to open
> it has just failed; in case it's really negative we did that d_drop()
> anyway (and then unconditionally rehashed it). In case when we proceed to
> nfs_lookup() and it does not fail, we'll have it rehashed there (with the
> right inode). What do we lose from doing d_drop() on *any* error here?
> It's negative, with dubious validity. In the worst case, we had open
> and lookup fail, but it's just us - the sucker really is negative and
> somebody else would be able to revalidate it. If we drop it here (and
> not rehash later), that somebody else will have to allocate an in-core
> dentry before doing lookup or atomic_open. Which is negligible extra
> burden.

I suspect that it got broken in commit 275bb3078 (NFSv4: Move dentry
instantiation into the NFSv4-specific atomic open code). Prior to that
commit, d_drop() was there (error or no error). Looks like 3.10+, indeed.
I agree that we shouldn't drop it on successful open, but it needs to be
done on errors. All of them, not just ENOENT, as in that commit.
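
Schematically, reconstructing from the description above and from the patch already posted in this thread (not from the literal diff of commit 275bb3078), the regression in the nfs_atomic_open() error path is:

	/* Schematic reconstruction, not a verbatim copy of either tree. */

	/* Before 275bb3078: the dentry was dropped regardless of the outcome. */
	d_drop(dentry);
	/* ... instantiate on success or report the error ... */

	/* After 275bb3078: only the -ENOENT arm compensates, so any other
	 * ->open_context() error falls through towards nfs_lookup() with a
	 * possibly still-hashed negative dentry. */
	switch (err) {
	case -ENOENT:
		d_drop(dentry);
		d_add(dentry, NULL);
		break;
	default:
		/* no d_drop() here */
		break;
	}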

2016-06-03 04:58:20

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 3, 2016, at 12:26 AM, Al Viro wrote:

> On Thu, Jun 02, 2016 at 11:43:59PM -0400, Oleg Drokin wrote:
>
>>> Which of the call sites had that been and how does one reproduce that fun?
>>> If you feel that posting a reproducer in the open is a bad idea, just send
>>> it off-list...
>>
>> This is fs/nfs/dir.c::nfs_lookup() right after no_entry label.
>
> Bloody hell... Was that sucker hashed on the entry into nfs_lookup()?
> If we get that with call as a method, we are really fucked.
>
> <greps>
>
> Ho-hum... One of the goto no_open; in nfs_atomic_open()? That would
> mean a stale negative dentry in dcache that has escaped revalidation...
> Wait a minute, didn't nfs ->d_revalidate() skip everything in some
> cases, leaving it to ->atomic_open()?
>
> Looks like the right thing to do would be to do d_drop() at no_open:,
> just before we call nfs_lookup(). If not earlier, actually... How
> about the following?

This one cures the insta-crash I was having, and I see no other ill-effects so far.

>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> err = PTR_ERR(inode);
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> + d_drop(dentry);
> switch (err) {
> case -ENOENT:
> - d_drop(dentry);
> d_add(dentry, NULL);
> nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> break;


2016-06-03 05:56:59

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:

> This one cures the insta-crash I was having, and I see no other ill-effects so far.

OK... I can take it through vfs.git, but I think it'd be better off in
NFS tree. Is everyone OK with something like the following?

make nfs_atomic_open() call d_drop() on all ->open_context() errors.

In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).

Cc: [email protected] # v3.10+
Tested-by: Oleg Drokin <[email protected]>
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index aaf7bd0..6e3a6f4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
err = PTR_ERR(inode);
trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
put_nfs_open_context(ctx);
+ d_drop(dentry);
switch (err) {
case -ENOENT:
- d_drop(dentry);
d_add(dentry, NULL);
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
break;

2016-06-06 23:36:38

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage

Well, I have some bad news.

This patch does not fix the issue 100% of the time, apparently; I just hit it again.

Only this time it's much harder to trigger, but the stack is the same
(it looks a bit different due to a compiler change). It must be some much narrower race now.

I still don't have a crashdump, though (apparently the makedumpfile used by
kexec-tools is behind the times and does not work with 4.7.0-rc1 kernels), so I still cannot
tell you more about the dentry.

[12470.365211] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[12470.366336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[12470.366927] Modules linked in: loop rpcsec_gss_krb5 joydev pcspkr acpi_cpufreq i2c_piix4 tpm_tis virtio_console tpm nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm serio_raw virtio_blk floppy
[12470.368917] CPU: 7 PID: 15952 Comm: cat Not tainted 4.7.0-rc1-vm-nfs+ #115
[12470.369554] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[12470.370137] task: ffff8800447447c0 ti: ffff880049a48000 task.ti: ffff880049a48000
[12470.371214] RIP: 0010:[<ffffffff81288061>] [<ffffffff81288061>] d_splice_alias+0x1e1/0x470
[12470.372340] RSP: 0018:ffff880049a4bab8 EFLAGS: 00010286
[12470.372906] RAX: ffff8800393372a8 RBX: ffff88003c781000 RCX: 0000000000000001
[12470.373525] RDX: 0000000000001895 RSI: ffff88003c781000 RDI: ffff8800393372a8
[12470.374145] RBP: ffff880049a4baf0 R08: 00001353641935c2 R09: 0000000000000000
[12470.374777] R10: 0000000000000001 R11: 0000000000000000 R12: ffff88003a7b9300
[12470.375407] R13: 0000000000000000 R14: ffff88003bf0d2a8 R15: 0000000000000000
[12470.376016] FS: 00007fbb07feb700(0000) GS:ffff88006b800000(0000) knlGS:0000000000000000
[12470.377106] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[12470.377693] CR2: 000055a498c7bdc8 CR3: 00000000479b3000 CR4: 00000000000006e0
[12470.378311] Stack:
[12470.378823] ffff880040008f00 0000000029e67876 ffff88003c781000 ffff88003a7b9300
[12470.379946] 0000000000000000 ffff88003bf0d2a8 0000000000000000 ffff880049a4bb48
[12470.381075] ffffffff8137d63c ffffffffffffffeb ffff880000000000 0000000000000000
[12470.382228] Call Trace:
[12470.382766] [<ffffffff8137d63c>] nfs_lookup+0x15c/0x420
[12470.383363] [<ffffffff8137f681>] nfs_atomic_open+0xb1/0x700
[12470.383961] [<ffffffff812792ea>] lookup_open+0x2ea/0x770
[12470.384570] [<ffffffff8127c76f>] path_openat+0x7ff/0x1030
[12470.385169] [<ffffffff8127d15f>] ? getname_flags+0x4f/0x1f0
[12470.385770] [<ffffffff8104a485>] ? kvm_sched_clock_read+0x25/0x40
[12470.386361] [<ffffffff8127e1d1>] do_filp_open+0x91/0x100
[12470.386945] [<ffffffff8188aa97>] ? _raw_spin_unlock+0x27/0x40
[12470.387559] [<ffffffff8128f810>] ? __alloc_fd+0x100/0x200
[12470.388158] [<ffffffff8126a230>] do_sys_open+0x130/0x220
[12470.388758] [<ffffffff8126a33e>] SyS_open+0x1e/0x20
[12470.389327] [<ffffffff8188b3fc>] entry_SYSCALL_64_fastpath+0x1f/0xbd
[12470.389929] Code: 83 c4 10 4c 89 f8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 89 df e8 f1 d6 ff ff 4c 89 f7 e8 19 2a 60 00 45 31 ff eb d9 49 89 ff eb d4 <0f> 0b 48 8b 43 50 4c 8b 78 68 49 8d 97 c8 03 00 00 eb 02 f3 90
[12470.392299] RIP [<ffffffff81288061>] d_splice_alias+0x1e1/0x470


On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
>
>> This one cures the insta-crash I was having, and I see no other ill-effects so far.
>
> OK... I can take it through vfs.git, but I think it'd be better off in
> NFS tree. Is everyone OK with something like the following?
>
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
>
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> unconditional d_drop() after the ->open_context() had been removed. It had
> been correct for success cases (there ->open_context() itself had been doing
> dcache manipulations), but not for error ones. Only one of those (ENOENT)
> got a compensatory d_drop() added in that commit, but in fact it should've
> been done for all errors. As it is, the case of O_CREAT non-exclusive open
> on a hashed negative dentry racing with e.g. symlink creation from another
> client ended up with ->open_context() getting an error and proceeding to
> call nfs_lookup(). On a hashed dentry, which would've instantly triggered
> BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
> d_splice_alias()).
>
> Cc: [email protected] # v3.10+
> Tested-by: Oleg Drokin <[email protected]>
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> err = PTR_ERR(inode);
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> + d_drop(dentry);
> switch (err) {
> case -ENOENT:
> - d_drop(dentry);
> d_add(dentry, NULL);
> nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> break;


2016-06-10 01:33:37

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:

> Well, I have some bad news.
>
> This patch does not fix the issue 100% of the time apparently, I just hit it again.

Ok, so now finally a piece of good news.
Whatever was causing this other much harder to hit crash is gone in -rc2, gone are
some other disturbing things I saw.

Hopefully this patch will get propagated everywhere soonish.

2016-06-10 16:49:28

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 9, 2016, at 9:33 PM, Oleg Drokin wrote:

>
> On Jun 6, 2016, at 7:36 PM, Oleg Drokin wrote:
>
>> Well, I have some bad news.
>>
>> This patch does not fix the issue 100% of the time apparently, I just hit it again.
>
> Ok, so now finally a piece of good news.
> Whatever was causing this other much harder to hit crash is gone in -rc2, gone are
> some other disturbing things I saw.

Famous last words, I guess.
It all returned overnight.

But this time it's different from the past couple.

0xffffffff813aa8bb is in nfs4_do_open (/home/green/bk/linux/fs/nfs/nfs4proc.c:2482).
2477 d_drop(dentry);
2478 alias = d_exact_alias(dentry, state->inode);
2479 if (!alias)
2480 alias = d_splice_alias(igrab(state->inode), dentry);
2481 /* d_splice_alias() can't fail here - it's a non-directory */

So it appears the dentry that was negative turned positive in the middle of the
d_exact_alias call… and, despite us doing d_drop(dentry), it's also already
hashed?
If this can happen in the middle of our operations here, the same thing
presumably can happen in the other place we patched up in nfs_atomic_open:
we do d_drop there, but by the time we get into d_splice_alias it's
all hashed again by some racing thread. That would explain
some of the rarer earlier crashes I had in -rc1 after the initial fix.

I wonder if it could be a remote-fs-specific open-vs-open race?

Say we have an atomic open for parent1/file1; this locks parent1 and proceeds to look up
file1.
Now, before the lookup commences, some other thread moves file1 to parent2,
and yet another thread goes to open parent2/file1.
Eventually it all converges with two threads trying to instantiate file1;
if we get it "just wrong", then a clash like the one we see can happen, no?
Hm, I guess then both opens would have their own dentry and it's only the inode that's
shared, so that cannot be it.
How can anything find a dentry we presumably just allocated and hash it while we
are not done with it, I wonder?
Also, I wonder if all of this is somehow related to this other problem I am having,
where drop_nlink reports going into negative territory on a rename() call; I have hit this
twice already, and I guess I just need to convert that to a BUG_ON instead for a
closer examination.
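
To make the suspected window concrete, here is the _nfs4_open_and_get_state() fragment quoted above once more, annotated; the comments are editorial, not from nfs4proc.c:

	d_drop(dentry);		/* dentry is unhashed at this point */
	alias = d_exact_alias(dentry, state->inode);
	/* a racing thread would have to rehash 'dentry' somewhere in here... */
	if (!alias)
		alias = d_splice_alias(igrab(state->inode), dentry);
	/* ...for d_splice_alias() to trip BUG_ON(!d_unhashed(dentry)) */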


The dentry is (I finally have a crashdump):

crash> p *(struct dentry *)0xffff880055dbd2e8
$4 = {
d_flags = 4718796,
d_seq = {
sequence = 4,
dep_map = {
key = 0xffffffff8337b1c0 <__key.41115>,
class_cache = {0x0, 0x0},
name = 0xffffffff81c79c66 "&dentry->d_seq",
cpu = 6,
ip = 6510615555426900570
}
},
d_hash = {
next = 0x0,
pprev = 0xffffc900000fd9c0
},
d_parent = 0xffff8800079d1008,
d_name = {
{
{
hash = 2223188567,
len = 2
},
hash_len = 10813123159
},
name = 0xffff880055dbd358 "10"
},
d_inode = 0xffff880066d8ab38,
d_iname = "10\000ZZZZZZZZZZZZZZZZZZZZZZZZZZZZ",
d_lockref = {
{
{
lock = {
{
rlock = {
raw_lock = {
val = {
counter = 0
}
},
magic = 3735899821,
owner_cpu = 4294967295,
owner = 0xffffffffffffffff,
dep_map = {
key = 0xffffffff8337b1c8 <__key.41114>,
class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0},
name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock",
cpu = 3,
ip = 18446744071583760701
}
},
{
__padding = "\000\000\000\000╜N╜чЪЪЪЪZZZZЪЪЪЪЪЪЪЪ",
dep_map = {
key = 0xffffffff8337b1c8 <__key.41114>,
class_cache = {0xffffffff82c1c3a0 <lock_classes+47616>, 0x0},
name = 0xffffffff81c65bc8 "&(&dentry->d_lockref.lock)->rlock",
cpu = 3,
ip = 18446744071583760701
}
}
}
},
count = 3
}
}
},
d_op = 0xffffffff81a46780 <nfs4_dentry_operations>,
d_sb = 0xffff88004eaf3000,
d_time = 6510615555426956154,
d_fsdata = 0x0,
{
d_lru = {
next = 0xffff880066d7e3e8,
prev = 0xffff8800036736c8
},
d_wait = 0xffff880066d7e3e8
},
d_child = {
next = 0xffff8800268629b8,
prev = 0xffff8800079d1128
},
d_subdirs = {
next = 0xffff880055dbd408,
prev = 0xffff880055dbd408
},
d_u = {
d_alias = {
next = 0x0,
pprev = 0xffff880066d8ad20
},
d_in_lookup_hash = {
next = 0x0,
pprev = 0xffff880066d8ad20
},
d_rcu = {
next = 0x0,
func = 0xffff880066d8ad20
}
}
}

[22845.516232] kernel BUG at /home/green/bk/linux/fs/dcache.c:2989!
[22845.516864] invalid opcode: 0000 [#1] SMP
[22845.517350] Modules linked in: loop rpcsec_gss_krb5 acpi_cpufreq joydev tpm_tis tpm virtio_console i2c_piix4 pcspkr nfsd ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm virtio_blk serio_raw floppy
[22845.519236] CPU: 0 PID: 2894259 Comm: cat Not tainted 4.7.0-rc2-vm-nfs+ #122
[22845.519843] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[22845.520259] task: ffff8800640d8e40 ti: ffff88004665c000 task.ti: ffff88004665c000
[22845.520975] RIP: 0010:[<ffffffff81287811>] [<ffffffff81287811>] d_splice_alias+0x1e1/0x470
[22845.521746] RSP: 0018:ffff88004665fa08 EFLAGS: 00010286
[22845.522122] RAX: ffff880066d8ab38 RBX: 0000000000000000 RCX: 0000000000000001
[22845.522524] RDX: 000000000000191a RSI: ffff880055dbd2e8 RDI: ffff880066d8ab38
[22845.522926] RBP: ffff88004665fa40 R08: 0000235352190a66 R09: ffff880055dbd2e8
[22845.523328] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880072d43c00
[22845.523731] R13: ffff880072d43c00 R14: ffff88004883a580 R15: ffff880052615800
[22845.524131] FS: 00007f3b2c27d700(0000) GS:ffff88006b400000(0000) knlGS:0000000000000000
[22845.524846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[22845.525225] CR2: 000055649285c0d0 CR3: 000000004d961000 CR4: 00000000000006f0
[22845.525628] Stack:
[22845.525964] ffff88004665fa20 ffffffff8188a1f7 0000000000000000 ffff880072d43c00
[22845.526700] ffff880072d43c00 ffff88004883a580 ffff880052615800 ffff88004665fb20
[22845.527433] ffffffff813aa8bb ffffffff00000000 ffff8800024000c0 ffff88004665fd80
[22845.528161] Call Trace:
[22845.528504] [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40
[22845.528943] [<ffffffff813aa8bb>] nfs4_do_open+0xaeb/0xb10
[22845.529331] [<ffffffff813aa9d0>] nfs4_atomic_open+0xf0/0x110
[22845.529718] [<ffffffff8137eefc>] nfs_atomic_open+0x1ac/0x700
[22845.530112] [<ffffffff8127900a>] lookup_open+0x2ea/0x770
[22845.530488] [<ffffffff8127bee5>] path_openat+0x6e5/0xc20
[22845.530881] [<ffffffff8104a425>] ? kvm_sched_clock_read+0x25/0x40
[22845.531268] [<ffffffff8127d981>] do_filp_open+0x91/0x100
[22845.531645] [<ffffffff8188a1f7>] ? _raw_spin_unlock+0x27/0x40
[22845.532027] [<ffffffff8128efc0>] ? __alloc_fd+0x100/0x200
[22845.532405] [<ffffffff81269f80>] do_sys_open+0x130/0x220
[22845.533157] [<ffffffff8126a08e>] SyS_open+0x1e/0x20

vmcore is at http://knox.linuxhacker.ru/tmp/dentry2/vmcore.gz
vmlinux is at http://knox.linuxhacker.ru/tmp/dentry2/vmlinux.gz


2016-06-20 13:25:41

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage

It looks like this patch was totally forgotten?
I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
crash in nfs code. And I think it's unrelated to the other parallel case too.

On Jun 3, 2016, at 1:56 AM, Al Viro wrote:

> On Fri, Jun 03, 2016 at 12:58:10AM -0400, Oleg Drokin wrote:
>
>> This one cures the insta-crash I was having, and I see no other ill-effects so far.
>
> OK... I can take it through vfs.git, but I think it'd be better off in
> NFS tree. Is everyone OK with something like the following?
>
> make nfs_atomic_open() call d_drop() on all ->open_context() errors.
>
> In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
> unconditional d_drop() after the ->open_context() had been removed. It had
> been correct for success cases (there ->open_context() itself had been doing
> dcache manipulations), but not for error ones. Only one of those (ENOENT)
> got a compensatory d_drop() added in that commit, but in fact it should've
> been done for all errors. As it is, the case of O_CREAT non-exclusive open
> on a hashed negative dentry racing with e.g. symlink creation from another
> client ended up with ->open_context() getting an error and proceeding to
> call nfs_lookup(). On a hashed dentry, which would've instantly triggered
> BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
> d_splice_alias()).
>
> Cc: [email protected] # v3.10+
> Tested-by: Oleg Drokin <[email protected]>
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index aaf7bd0..6e3a6f4 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1536,9 +1536,9 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
> err = PTR_ERR(inode);
> trace_nfs_atomic_open_exit(dir, ctx, open_flags, err);
> put_nfs_open_context(ctx);
> + d_drop(dentry);
> switch (err) {
> case -ENOENT:
> - d_drop(dentry);
> d_add(dentry, NULL);
> nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
> break;


2016-06-20 14:09:10

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> It looks like this patch was totally forgotten?
> I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
> crash in nfs code. And I think it's unrelated to the other parallel case too.

I assumed it would go through NFS tree, seeing that it's NFS-specific and
has nothing to do with any of the recent VFS changes (oops is triggerable
starting from 3.11); I can certainly put it through vfs.git, and there
will be changes nearby, but this one should go into -stable as a separate
patch.

2016-06-20 14:54:45

by Trond Myklebust

Subject: Re: NFS/d_splice_alias breakage


> On Jun 20, 2016, at 10:08, Al Viro <[email protected]> wrote:
>
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
>> crash in nfs code. And I think it's unrelated to the other parallel case too.
>
> I assumed it would go through NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (oops is triggerable
> starting from 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.
>

I’ll take it through the NFS tree.

Cheers
Trond


2016-06-20 15:28:44

by Al Viro

Subject: Re: NFS/d_splice_alias breakage

On Mon, Jun 20, 2016 at 02:54:36PM +0000, Trond Myklebust wrote:
>
> > On Jun 20, 2016, at 10:08, Al Viro <[email protected]> wrote:
> >
> > On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
> >> It looks like this patch was totally forgotten?
> >> I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
> >> crash in nfs code. And I think it's unrelated to the other parallel case too.
> >
> > I assumed it would go through NFS tree, seeing that it's NFS-specific and
> > has nothing to do with any of the recent VFS changes (oops is triggerable
> > starting from 3.11); I can certainly put it through vfs.git, and there
> > will be changes nearby, but this one should go into -stable as a separate
> > patch.
> >
>
> I’ll take it through the NFS tree.

OK. It's really -stable fodder, BTW: all you need to trigger that oops is
a hashed negative dentry from an earlier lookup + a symlink created from another
client + an attempt to open from ours. That gets you d_splice_alias() (or
d_materialise_unique() prior to 3.19) with a hashed dentry, and that triggers the
BUG_ON, leaving us with the parent directory locked.

2016-06-20 15:46:16

by Oleg Drokin

Subject: Re: NFS/d_splice_alias breakage


On Jun 20, 2016, at 11:43 AM, Anna Schumaker wrote:

> Hi,
>
> On 06/20/2016 10:08 AM, Al Viro wrote:
>> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>>> It looks like this patch was totally forgotten?
>>> I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
>>> crash in nfs code. And I think it's unrelated to the other parallel case too.
>>
>> I assumed it would go through NFS tree, seeing that it's NFS-specific and
>> has nothing to do with any of the recent VFS changes (oops is triggerable
>> starting from 3.11); I can certainly put it through vfs.git, and there
>> will be changes nearby, but this one should go into -stable as a separate
>> patch.
>
> I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.

Yes, it is very easy to hit.


2016-06-20 15:50:25

by Trond Myklebust

Subject: Re: NFS/d_splice_alias breakage

> On Jun 20, 2016, at 11:43, Anna Schumaker <[email protected]> wrote:
> 
> [...]
> 
> I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.

Hi Anna,

Please do, and please keep the Cc: stable…

Thanks
 Trond


2016-06-20 15:43:48

by Anna Schumaker

Subject: Re: NFS/d_splice_alias breakage

Hi,

On 06/20/2016 10:08 AM, Al Viro wrote:
> On Mon, Jun 20, 2016 at 09:25:12AM -0400, Oleg Drokin wrote:
>> It looks like this patch was totally forgotten?
>> I don't see it in either the vfs or nfs trees, and yet it fixes a very easy-to-cause
>> crash in nfs code. And I think it's unrelated to the other parallel case too.
>
> I assumed it would go through NFS tree, seeing that it's NFS-specific and
> has nothing to do with any of the recent VFS changes (oops is triggerable
> starting from 3.11); I can certainly put it through vfs.git, and there
> will be changes nearby, but this one should go into -stable as a separate
> patch.

I was going to put together an nfs bugfixes pull request for 4.7 this week, so I can include the patch there if this is easy to hit.

Anna
