Benny Halevy reported the following oops when testing RHEL6:
<7>nfs_update_inode: inode 892950 mode changed, 0040755 to 0100644
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>PGD 81448a067 PUD 831632067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/kernel/mm/redhat_transparent_hugepage/enabled
<4>CPU 6
<4>Modules linked in: fuse bonding 8021q garp ebtable_nat ebtables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi softdog bridge stp llc xt_physdev ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_round_robin dm_multipath objlayoutdriver2(U) nfs(U) lockd fscache auth_rpcgss nfs_acl sunrpc vhost_net macvtap macvlan tun kvm_intel kvm be2net igb dca ptp pps_core microcode serio_raw sg iTCO_wdt iTCO_vendor_support i7core_edac edac_core shpchp ext4 mbcache jbd2 sd_mod crc_t10dif ahci dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
<4>
<4>Pid: 6332, comm: dd Not tainted 2.6.32-358.el6.x86_64 #1 HP ProLiant DL170e G6 /ProLiant DL170e G6
<4>RIP: 0010:[<ffffffffa02a52c5>] [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4>RSP: 0018:ffff88081458bb98 EFLAGS: 00010292
<4>RAX: ffffffffa02a52b0 RBX: 0000000000000000 RCX: 0000000000000003
<4>RDX: ffffffffa02e45a0 RSI: ffff88081440b300 RDI: ffff88082d5f5760
<4>RBP: ffff88081458bba8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000772 R11: 0000000000400004 R12: 0000000040000008
<4>R13: ffff88082d5f5760 R14: ffff88082d6e8800 R15: ffff88082f12d780
<4>FS: 00007f728f37e700(0000) GS:ffff8800456c0000(0000) knlGS:0000000000000000
<4>CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>CR2: 0000000000000000 CR3: 0000000831279000 CR4: 00000000000007e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process dd (pid: 6332, threadinfo ffff88081458a000, task ffff88082fa0e040)
<4>Stack:
<4> 0000000040000008 ffff88081440b300 ffff88081458bbf8 ffffffff81182745
<4><d> ffff88082d5f5760 ffff88082d6e8800 ffff88081458bbf8 ffffffffffffffea
<4><d> ffff88082f12d780 ffff88082d6e8800 ffffffffa02a50a0 ffff88082d5f5760
<4>Call Trace:
<4> [<ffffffff81182745>] __fput+0xf5/0x210
<4> [<ffffffffa02a50a0>] ? do_open+0x0/0x20 [nfs]
<4> [<ffffffff81182885>] fput+0x25/0x30
<4> [<ffffffff8117e23e>] __dentry_open+0x27e/0x360
<4> [<ffffffff811c397a>] ? inotify_d_instantiate+0x2a/0x60
<4> [<ffffffff8117e4b9>] lookup_instantiate_filp+0x69/0x90
<4> [<ffffffffa02a6679>] nfs_intent_set_file+0x59/0x90 [nfs]
<4> [<ffffffffa02a686b>] nfs_atomic_lookup+0x1bb/0x310 [nfs]
<4> [<ffffffff8118e0c2>] __lookup_hash+0x102/0x160
<4> [<ffffffff81225052>] ? selinux_inode_permission+0x72/0xb0
<4> [<ffffffff8118e76a>] lookup_hash+0x3a/0x50
<4> [<ffffffff81192a4b>] do_filp_open+0x2eb/0xdd0
<4> [<ffffffff8104757c>] ? __do_page_fault+0x1ec/0x480
<4> [<ffffffff8119f562>] ? alloc_fd+0x92/0x160
<4> [<ffffffff8117de79>] do_sys_open+0x69/0x140
<4> [<ffffffff811811f6>] ? sys_lseek+0x66/0x80
<4> [<ffffffff8117df90>] sys_open+0x20/0x30
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>Code: 65 48 8b 04 25 c8 cb 00 00 83 a8 44 e0 ff ff 01 5b 41 5c c9 c3 90 55 48 89 e5 53 48 83 ec 08 0f 1f 44 00 00 48 8b 9e a0 00 00 00 <48> 8b 3b e8 13 0c f7 ff 48 89 df e8 ab 3d ec e0 48 83 c4 08 31
<1>RIP [<ffffffffa02a52c5>] nfs_closedir+0x15/0x30 [nfs]
<4> RSP <ffff88081458bb98>
<4>CR2: 0000000000000000
I think this is ultimately due to a bug on the server. The client had
previously found a directory dentry. It then later tried to do an atomic
open on a new (regular file) dentry. The attributes it got back had the
same filehandle as the previously found directory inode. It then tried
to put the filp because it failed the aops tests for O_DIRECT opens, and
oopsed here because the ctx was still NULL.
Obviously the root cause here is a server issue, but we can take steps
to mitigate this on the client. When nfs_fhget is called, we always know
what type of inode it is. In the event that there's a broken or
malicious server on the other end of the wire, the client can end up
crashing because the wrong ops are set on it.
Have nfs_find_actor check that the inode type is correct after checking
the fileid. The fileid check should rarely ever match, so it should only
rarely ever get to this check. In the case where we have a broken
server, we may see two different inodes with the same i_ino, but the
client should be able to cope with them without crashing.
This should fix the oops reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=913660
Reported-by: Benny Halevy <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfs/inode.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 6acc73c..f52c99f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -237,6 +237,8 @@ nfs_find_actor(struct inode *inode, void *opaque)
if (NFS_FILEID(inode) != fattr->fileid)
return 0;
+ if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
+ return 0;
if (nfs_compare_fh(NFS_FH(inode), fh))
return 0;
if (is_bad_inode(inode) || NFS_STALE(inode))
--
1.7.11.7
> Quite frankly, all I care about in a situation like this is that the
> client doesn't Oops.
Certainly, and his patch does do that, but it's also pointing out
there's another bug running around. And once you fix that bug, the
original patch is no longer needed.
> If a server is really this utterly broken, then there is no way we can
> fix it on the client, and we're not even going to try.
Of course. But you also don't want to unnecessarily leave the
client with an invalid inode that's not properly flagged and
possibly leave another bug unfixed.
Here is a example patch that I feel better addresses the problem:
commit 2d6b411eea04ae4398707b584b8d9e552606aaf7
Author: Quentin Barnes <[email protected]>
Date: Wed Jul 31 17:50:35 2013 -0500
Have nfs_refresh_inode_locked() ensure that it doesn't return without
flagging invalid inodes (ones that don't match its fattr type).
nfs_refresh_inode() already does this, so we need to check the return
status from nfs_check_inode_attributes() before returning from
nfs_refresh_inode_locked().
Once this hole is plugged, there will be no invalid inode references
returned by nfs_fhget(), so no need to check in nfs_find_actor().
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index af6e806..d2263a5 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -244,8 +244,6 @@ nfs_find_actor(struct inode *inode, void *opaque)
if (NFS_FILEID(inode) != fattr->fileid)
return 0;
- if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
- return 0;
if (nfs_compare_fh(NFS_FH(inode), fh))
return 0;
if (is_bad_inode(inode) || NFS_STALE(inode))
@@ -1269,9 +1267,16 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
{
+ int status;
+
if (nfs_inode_attrs_need_update(inode, fattr))
return nfs_update_inode(inode, fattr);
- return nfs_check_inode_attributes(inode, fattr);
+
+ status = nfs_check_inode_attributes(inode, fattr);
+ if (status)
+ nfs_invalidate_inode(inode);
+
+ return status;
}
/**
T24gV2VkLCAyMDEzLTA3LTMxIGF0IDE1OjI2IC0wNTAwLCBRdWVudGluIEJhcm5lcyB3cm90ZToN
Cj4gT24gV2VkLCBKdWwgMzEsIDIwMTMgYXQgMTo0NiBQTSwgSmVmZiBMYXl0b24gPGpsYXl0b25A
cmVkaGF0LmNvbT4gd3JvdGU6DQo+ID4gT24gV2VkLCAzMSBKdWwgMjAxMyAxMzozNjoyMSAtMDUw
MA0KPiA+IFF1ZW50aW4gQmFybmVzIDxxYmFybmVzQGdtYWlsLmNvbT4gd3JvdGU6DQo+ID4NCj4g
Pj4gSSB3YXMgcmV2aWV3aW5nIHBhdGNoZXMgSSB3YXMgaW50ZWdyYXRpbmcgaW50byBteSBORlMg
c291cmNlIGJhc2UNCj4gPj4gYW5kIHRoaXMgb25lIHBvcHBlZCB1cCAoY29tbWl0IGY2NDg4Yzli
KS4gIEluIHJldmlld2luZyBpdCwgaXQgd291bGQNCj4gPj4gZml4IHRoZSBwcm9ibGVtLCBob3dl
dmVyLCBJIHRoaW5rIGl0J3Mgbm90IHRoZSBiZXN0IGZpeCBmb3IgaXQuDQo+ID4+IENhdGNoaW5n
IGl0IGluIG5mc19maW5kX2FjdG9yKCkgaXMgYWZ0ZXIgdGhlIGhvcnNlIGhhcyBsZWZ0IHRoZQ0K
PiA+PiBiYXJuIHNvIHRvIHNwZWFrLg0KPiA+Pg0KPiA+PiBJIHRoaW5rIHRoZSBwcm9ibGVtIGlz
IGluIG5mc19maGdldCgpIHdoZW4gY29uc3RydWN0aW5nIHRoZSBuZXcNCj4gPj4gaW5vZGUgKGlu
c2lkZSB0aGUgaWYgKGlub2RlLT5pX3N0YXRlICYgSV9ORVcpIHsuLi59KSBhbmQgc2hvdWxkIGJl
DQo+ID4+IGFkZHJlc3NlZCB0aGVyZS4NCj4gPj4NCj4gPj4gVGhlIG5mcyBtb2R1bGUgYWxyZWFk
eSBoYXMgY2hlY2tzIHRvIGVuc3VyZSB0aGF0IHRoZSBleHByZXNzaW9uDQo+ID4+ICIoKFNfSUZN
VCAmIGlub2RlLT5pX21vZGUpID09IChTX0lGTVQgJiBmYXR0ci0+bW9kZSkpIiBpcyB0cnVlDQo+
ID4+IGluIG5mc191cGRhdGVfaW5vZGUoKSBhbmQgbmZzX2NoZWNrX2lub2RlX2F0dHJpYnV0ZXMo
KS4gIEkgdGhpbmsNCj4gPj4gcHJvYmxlbSBCZW5ueSBoaXQgd2FzIHRoYXQgdGhlIGVxdWl2YWxl
bnQgY2hlY2sgaXMgbWlzc2luZyBpbg0KPiA+PiBuZnNfZmhnZXQoKSB3aGVuIGNvbnN0cnVjdGlu
ZyBhIG5ldyBpbm9kZS4gIFRoZSBzYW1lIGNoZWNrIHRoYXQncw0KPiA+PiBpbiB0aG9zZSBvdGhl
ciB0d28gZnVuY3Rpb25zIG5lZWRzIHRvIGJlIGFkZGVkIHRvIG5mc19maGdldCgpJ3MgImlmDQo+
ID4+IChpbm9kZS0+aV9zdGF0ZSAmIElfTkVXKSB7Li4ufSIgY29uZGl0aW9uYWwgYmxvY2sgdG8g
cHJldmVudCB0aGUNCj4gPj4gcHJvYmxlbSBvZiBhbiBpbmNvbnNpc3RlbnQgZmF0dHIvaW5vZGUg
c3RhdGUgZnJvbSBvY2N1cnJpbmcgaW4gdGhlDQo+ID4+IGZpcnN0IHBsYWNlLg0KPiA+Pg0KPiA+
PiBJIGNhbiBjb21lIHVwIHdpdGggYSBwYXRjaCBpZiB5b3UnZCBsaWtlLCBidXQgSSB3YW50ZWQg
dG8gbWFrZQ0KPiA+PiBzdXJlIG15IGFzc2VydGlvbnMgd2VyZSB2YWxpZCBmaXJzdC4gIElzIHJl
bW92aW5nIGEgY2hlY2sgZnJvbQ0KPiA+PiBuZnNfZmluZF9hY3RvcigpIGFuZCBhZGRpbmcgdGhl
IGNoZWNrIHRvIG5mc19maGdldCgpIHRvIHByZXZlbnQgdGhlDQo+ID4+IGluY29uc2lzdGVuY3kg
aW4gdGhlIGZpcnN0IHBsYWNlIGEgYmV0dGVyIGFwcHJvYWNoPw0KPiA+Pg0KPiA+PiBRdWVudGlu
DQo+ID4NCj4gPiBJIGRvbid0IHRoaW5rIEkgYWdyZWUuIFRoZSB3aG9sZSBwcm9ibGVtIHdhcyB0
aGF0IHdlIHdlcmUgYWxsb3dpbmcNCj4gPiBpZ2V0NV9sb2NrZWQgdG8gbWF0Y2ggYW4gZXhpc3Rp
bmcgaW5vZGUgZXZlbiB0aG91Z2ggKFNfSUZNVCAmIG1vZGUpIHdhcw0KPiA+IGRpZmZlcmVudC4N
Cj4gDQo+IEkgYXNzZXJ0IHRoYXQgd2hlbiBhbiBpbmNvbnNpc3RlbnQgc3RhdGUgaGFwcGVucyB5
b3UgZG9uJ3Qgd2FudA0KPiBuZnNfZmluZF9hY3RvcigpIHRvIGJlIHRoZSBvbmUgcmVzcG9uc2li
bGUgZm9yIG5vdGljaW5nIGl0DQo+IChhY3R1YWxseSwgaWdub3JpbmcgdGhlIHByb2JsZW0pLiAg
WW91IHdhbnQgdGhlIGV4aXN0aW5nIGNvbnNpc3RlbmN5DQo+IGFuZCBlcnJvciByZWNvdmVyeSBw
YXRocyB0byBoYW5kbGUgaXQgaW5zdGVhZC4NCj4gDQo+IFNvIG9uZSBvciB0d28gdGhpbmdzIGFy
ZSBnb2luZyBvbi4gIFRoZSBleGlzdGluZyBlcnJvciByZWNvdmVyeQ0KPiBwYXRocyBhcmUgaW5h
ZGVxdWF0ZSBvciB0aGF0IHRoZXJlJ3MgYSBob2xlIGluIHRoZSBmYXR0ci9pbm9kZQ0KPiBjb25z
aXN0ZW5jeSBjaGVja2luZyAob3IgcG9zc2libHkgYm90aCkuICBZb3VyIHBhdGNoIGNvdmVycyBv
dmVyDQo+IHRoaXMgdW5kZXJseWluZyBidWcgcmF0aGVyIHRoYW4gYWRkcmVzc2luZyBpdCBhbmQg
d2hpY2ggYWxzbyBsZXRzDQo+IHRoZSBvbGQgaW5vZGUgbm90IGdldCBmbGFnZ2VkIGFzIGludmFs
aWQuICBJIHdvdWxkIHJhdGhlciBzZWUgdGhlDQo+IHJlYWwgcHJvYmxlbSBmaXhlZCBhbmQgdGhl
IG9sZCBpbm9kZSBnZXQgZmxhZ2dlZCByYXRoZXIgdGhhbiBqdXN0DQo+IGlnbm9yaW5nIGl0IGFu
ZCBjcmVhdGluZyBhbm90aGVyIG5ldyBpbm9kZS4NCj4gDQo+IEJlZm9yZSB5b3VyIHBhdGNoLCBp
ZiBiZWNhdXNlIG9mIGEgc2VydmVyIGJ1ZyByZXVzaW5nIGEgZmlsZQ0KPiBoYW5kbGUsIG5mc19m
aW5kX2FjdG9yKCksIHdvdWxkIG1hdGNoIGFuZCBpZ2V0NV9sb2NrZWQoKSB3b3VsZA0KPiByZXR1
cm4gYW4gaW5vZGUgdGhhdCBtYXRjaGVkIG9uIGl0cyBmaWxlIGhhbmRsZSBidXQgdGhhdCBkaWRu
J3QNCj4gbmVjZXNzYXJpbHkgbWF0Y2ggdGhlIGZhdHRyIGluZm8uDQo+IA0KPiBJZiB0aGF0IGlu
b2RlIHJldHVybmVkIGJ5IGlnZXQ1X2xvY2tlZCgpIHdhcyBub3QgSV9ORVcgKHRoZSBlbHNlDQo+
IGNhc2UpLCBuZnNfZmhnZXQoKSB3b3VsZCBpbnZva2UgbmZzX3JlZnJlc2hfaW5vZGUoKSwgd2hp
Y2ggd291bGQNCj4gY2FsbCBuZnNfcmVmcmVzaF9pbm9kZV9sb2NrZWQoKS4gIEl0IGNhbGxzIGVp
dGhlciBuZnNfdXBkYXRlX2lub2RlKCkNCj4gb3IgbmZzX2NoZWNrX2lub2RlX2F0dHJpYnV0ZXMo
KS4gIEJvdGggb2YgdGhvc2UgZnVuY3Rpb25zIGFscmVhZHkNCj4gY2hlY2sgZm9yIGZhdHRyL2lu
b2RlIFNfSUZNVCBjb25zaXN0ZW5jeSBjYXRjaGluZyBiYWQgc3RhdGVzLg0KPiANCj4gPiBXaHkg
d291bGQgaXQgYmUgcHJlZmVyYWJsZSB0byB1c2UgdGhlIHNhbWUgc3RydWN0IGlub2RlIGV2ZW4g
dGhvdWdoDQo+ID4gaXQncyBjbGVhcmx5IGEgZGlmZmVyZW50IG9uZSBvbiB0aGUgc2VydmVyPyBp
bm9kZXMgZ2VuZXJhbGx5IGRvbid0DQo+ID4gY2hhbmdlIHRoZWlyIHR5cGUgYWZ0ZXIgdGhleSd2
ZSBiZWVuIGNyZWF0ZWQuDQo+IA0KPiBJdCdzIG5vdCBwcmVmZXJhYmxlIHRvIGhhdmUgdGhlIHNh
bWUgaW5vZGUgaW4gYSBkaWZmZXJlbnQgc3RhdGUuDQo+IFdoYXQgeW91IGRvbid0IHdhbnQgaXMg
bmZzX2ZpbmRfYWN0b3IoKSBhcyBhIHNpZGUtZWZmZWN0IHRvIGxldCB0aGUNCj4gcHJvYmxlbSBn
byB1bm5vdGljZWQuICBZb3Ugd2FudCBsZXQgdGhlIG5vcm1hbCwgZXhpc3RpbmcgZXJyb3INCj4g
aGFuZGxpbmcgcGF0aHMgZGVhbCB3aXRoIGl0LCBvciBpZiB0aGV5J3JlIG5vdCBmdWxseSBkZWFs
aW5nIHdpdGgNCj4gaXQsIGZpeCB0aGVtLg0KPiANCj4gU2luY2UgdGhlIElfTkVXIGNvbmRpdGlv
bmFsIGZhbHNlIHBhdGggKHRoZSBlbHNlIGNhc2UpIGluIG5mc19maGdldCgpDQo+IHNob3VsZCBj
YXRjaCB0aGUgcHJvYmxlbSwgSSBoYWQgdGhvdWdodCB0aGF0IHRoZSBwcm9ibGVtIG11c3QgbGF5
DQo+IHdpdGggdGhlIElfTkVXIGNvbmRpdGlvbmFsIHRydWUgcGF0aC4gIE5vdyBJIGRvbid0IHRo
aW5rIHNvLg0KPiANCj4gSSBzZWUgbm93IHRoYXQgbmZzX2NoZWNrX2lub2RlX2F0dHJpYnV0ZXMo
KSBpcyBqdXN0IHJldHVybmluZyBhbg0KPiBlcnJvciB3aGVuIHRoZSBmYXR0ciBhbmQgaW5vZGUg
c3RhdGVzIGFyZSBmb3VuZCB0byBiZSBpbmNvbnNpc3RlbnQsDQo+IGJ1dCBpcyB0YWtpbmcgbm8g
YWN0aW9uIG9uIHRoZSBiYWQgaW5vZGUncyBzdGF0ZSAoaS5lLiBmbGFnZ2luZyBpdA0KPiBhcyBp
bnZhbGlkKS4gIEFuZCB1cG9uIHJldHVybiwgbmZzX2ZoZ2V0KCkgaXMgbm90IGRvaW5nIGFueXRo
aW5nIHdpdGgNCj4gdGhlIGJhZCByZXR1cm4gc3RhdGUgcGFzc2VkIHVwIHRvIG5mc19yZWZyZXNo
X2lub2RlKCkuICBOb3cgSSdtDQo+IHRoaW5raW5nIHRoYXQgdGhpcyBpcyBwcm9ibGVtYXRpYyBo
b2xlIGJlY2F1c2UgaXQgaXMgaWdub3JpbmcgdGhlDQo+IGVycm9yIHN0YXRlIGFuZCB0aGF0IGxl
dCBCZW5ueSdzIHBhbmljIGhhcHBlbi4NCj4gDQo+IEl0IGRvZXNuJ3Qgc2VlbSB0byBtZSB0aGF0
IGl0IHNob3VsZCBiZSBuZnNfY2hlY2tfaW5vZGVfYXR0cmlidXRlcygpDQo+IGpvYiB0byBmbGFn
IHRoZSBpbm9kZSBhcyBpbnZhbGlkLiAgVGhhdCBzZWVtcyBvdXRzaWRlIGl0cyByb2xlLiAgSQ0K
PiB3b3VsZCB0aGluayBpdCBzaG91bGQgYmUgbmZzX3JlZnJlc2hfaW5vZGVfbG9ja2VkKCkncyBq
b2IsIGJ1dCBJJ20NCj4gc3RpbGwgcmV2aWV3aW5nIHRoZSBjb2RlLg0KPiANCj4gVGhvdWdodHM/
DQoNClF1aXRlIGZyYW5rbHksIGFsbCBJIGNhcmUgYWJvdXQgaW4gYSBzaXR1YXRpb24gbGlrZSB0
aGlzIGlzIHRoYXQgdGhlDQpjbGllbnQgZG9lc24ndCBPb3BzLg0KDQpJZiBhIHNlcnZlciBpcyBy
ZWFsbHkgdGhpcyB1dHRlcmx5IGJyb2tlbiwgdGhlbiB0aGVyZSBpcyBubyB3YXkgd2UgY2FuDQpm
aXggaXQgb24gdGhlIGNsaWVudCwgYW5kIHdlJ3JlIG5vdCBldmVuIGdvaW5nIHRvIHRyeS4NCg0K
LS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRB
cHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNvbQ0K
On Wed, 31 Jul 2013 13:36:21 -0500
Quentin Barnes <[email protected]> wrote:
> I was reviewing patches I was integrating into my NFS source base
> and this one popped up (commit f6488c9b). In reviewing it, it would
> fix the problem, however, I think it's not the best fix for it.
> Catching it in nfs_find_actor() is after the horse has left the
> barn so to speak.
>
> I think the problem is in nfs_fhget() when constructing the new
> inode (inside the if (inode->i_state & I_NEW) {...}) and should be
> addressed there.
>
> The nfs module already has checks to ensure that the expression
> "((S_IFMT & inode->i_mode) == (S_IFMT & fattr->mode))" is true
> in nfs_update_inode() and nfs_check_inode_attributes(). I think
> problem Benny hit was that the equivalent check is missing in
> nfs_fhget() when constructing a new inode. The same check that's
> in those other two functions needs to be added to nfs_fhget()'s "if
> (inode->i_state & I_NEW) {...}" conditional block to prevent the
> problem of an inconsistent fattr/inode state from occurring in the
> first place.
>
> I can come up with a patch if you'd like, but I wanted to make
> sure my assertions were valid first. Is removing a check from
> nfs_find_actor() and adding the check to nfs_fhget() to prevent the
> inconsistency in the first place a better approach?
>
> Quentin
>
> On Wed, Feb 27, 2013 at 7:10 PM, Jeff Layton <[email protected]> wrote:
> > Benny Halevy reported the following oops when testing RHEL6:
> [...]
> > This should fix the oops reported here:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=913660
> >
> > Reported-by: Benny Halevy <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfs/inode.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index 6acc73c..f52c99f 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -237,6 +237,8 @@ nfs_find_actor(struct inode *inode, void *opaque)
> >
> > if (NFS_FILEID(inode) != fattr->fileid)
> > return 0;
> > + if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
> > + return 0;
> > if (nfs_compare_fh(NFS_FH(inode), fh))
> > return 0;
> > if (is_bad_inode(inode) || NFS_STALE(inode))
> > --
> > 1.7.11.7
> >
> > --
I don't think I agree. The whole problem was that we were allowing
iget5_locked to match an existing inode even though (S_IFMT & mode) was
different.
Why would it be preferable to use the same struct inode even though
it's clearly a different one on the server? inodes generally don't
change their type after they've been created.
That said, patches speak louder than words so if you have one to
propose I can certainly take a look. Maybe it'll make sense to me
then...
--
Jeff Layton <[email protected]>
On Wed, 31 Jul 2013 20:32:24 +0000
"Myklebust, Trond" <[email protected]> wrote:
> On Wed, 2013-07-31 at 15:26 -0500, Quentin Barnes wrote:
> > On Wed, Jul 31, 2013 at 1:46 PM, Jeff Layton <[email protected]> wrote:
> > > On Wed, 31 Jul 2013 13:36:21 -0500
> > > Quentin Barnes <[email protected]> wrote:
> > >
> > >> I was reviewing patches I was integrating into my NFS source base
> > >> and this one popped up (commit f6488c9b). In reviewing it, it would
> > >> fix the problem, however, I think it's not the best fix for it.
> > >> Catching it in nfs_find_actor() is after the horse has left the
> > >> barn so to speak.
> > >>
> > >> I think the problem is in nfs_fhget() when constructing the new
> > >> inode (inside the if (inode->i_state & I_NEW) {...}) and should be
> > >> addressed there.
> > >>
> > >> The nfs module already has checks to ensure that the expression
> > >> "((S_IFMT & inode->i_mode) == (S_IFMT & fattr->mode))" is true
> > >> in nfs_update_inode() and nfs_check_inode_attributes(). I think
> > >> problem Benny hit was that the equivalent check is missing in
> > >> nfs_fhget() when constructing a new inode. The same check that's
> > >> in those other two functions needs to be added to nfs_fhget()'s "if
> > >> (inode->i_state & I_NEW) {...}" conditional block to prevent the
> > >> problem of an inconsistent fattr/inode state from occurring in the
> > >> first place.
> > >>
> > >> I can come up with a patch if you'd like, but I wanted to make
> > >> sure my assertions were valid first. Is removing a check from
> > >> nfs_find_actor() and adding the check to nfs_fhget() to prevent the
> > >> inconsistency in the first place a better approach?
> > >>
> > >> Quentin
> > >
> > > I don't think I agree. The whole problem was that we were allowing
> > > iget5_locked to match an existing inode even though (S_IFMT & mode) was
> > > different.
> >
> > I assert that when an inconsistent state happens you don't want
> > nfs_find_actor() to be the one responsible for noticing it
> > (actually, ignoring the problem). You want the existing consistency
> > and error recovery paths to handle it instead.
> >
> > So one or two things are going on. The existing error recovery
> > paths are inadequate or that there's a hole in the fattr/inode
> > consistency checking (or possibly both). Your patch covers over
> > this underlying bug rather than addressing it and which also lets
> > the old inode not get flagged as invalid. I would rather see the
> > real problem fixed and the old inode get flagged rather than just
> > ignoring it and creating another new inode.
> >
> > Before your patch, if because of a server bug reusing a file
> > handle, nfs_find_actor(), would match and iget5_locked() would
> > return an inode that matched on its file handle but that didn't
> > necessarily match the fattr info.
> >
> > If that inode returned by iget5_locked() was not I_NEW (the else
> > case), nfs_fhget() would invoke nfs_refresh_inode(), which would
> > call nfs_refresh_inode_locked(). It calls either nfs_update_inode()
> > or nfs_check_inode_attributes(). Both of those functions already
> > check for fattr/inode S_IFMT consistency catching bad states.
> >
> > > Why would it be preferable to use the same struct inode even though
> > > it's clearly a different one on the server? inodes generally don't
> > > change their type after they've been created.
> >
> > It's not preferable to have the same inode in a different state.
> > What you don't want is nfs_find_actor() as a side-effect to let the
> > problem go unnoticed. You want let the normal, existing error
> > handling paths deal with it, or if they're not fully dealing with
> > it, fix them.
> >
> > Since the I_NEW conditional false path (the else case) in nfs_fhget()
> > should catch the problem, I had thought that the problem must lay
> > with the I_NEW conditional true path. Now I don't think so.
> >
> > I see now that nfs_check_inode_attributes() is just returning an
> > error when the fattr and inode states are found to be inconsistent,
> > but is taking no action on the bad inode's state (i.e. flagging it
> > as invalid). And upon return, nfs_fhget() is not doing anything with
> > the bad return state passed up to nfs_refresh_inode(). Now I'm
> > thinking that this is problematic hole because it is ignoring the
> > error state and that let Benny's panic happen.
> >
> > It doesn't seem to me that it should be nfs_check_inode_attributes()
> > job to flag the inode as invalid. That seems outside its role. I
> > would think it should be nfs_refresh_inode_locked()'s job, but I'm
> > still reviewing the code.
> >
> > Thoughts?
>
> Quite frankly, all I care about in a situation like this is that the
> client doesn't Oops.
>
> If a server is really this utterly broken, then there is no way we can
> fix it on the client, and we're not even going to try.
>
Agreed. For the record, this problem came about due to a bug in the
server against which Benny was testing. It's not a situation that
should ever occur under normal conditions.
--
Jeff Layton <[email protected]>
On Wed, 31 Jul 2013 18:00:51 -0500
Quentin Barnes <[email protected]> wrote:
> > Quite frankly, all I care about in a situation like this is that the
> > client doesn't Oops.
>
> Certainly, and his patch does do that, but it's also pointing out
> there's another bug running around. And once you fix that bug, the
> original patch is no longer needed.
>
> > If a server is really this utterly broken, then there is no way we can
> > fix it on the client, and we're not even going to try.
>
> Of course. But you also don't want to unnecessarily leave the
> client with an invalid inode that's not properly flagged and
> possibly leave another bug unfixed.
>
> Here is a example patch that I feel better addresses the problem:
>
>
> commit 2d6b411eea04ae4398707b584b8d9e552606aaf7
> Author: Quentin Barnes <[email protected]>
> Date: Wed Jul 31 17:50:35 2013 -0500
>
> Have nfs_refresh_inode_locked() ensure that it doesn't return without
> flagging invalid inodes (ones that don't match its fattr type).
>
> nfs_refresh_inode() already does this, so we need to check the return
> status from nfs_check_inode_attributes() before returning from
> nfs_refresh_inode_locked().
>
> Once this hole is plugged, there will be no invalid inode references
> returned by nfs_fhget(), so no need to check in nfs_find_actor().
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index af6e806..d2263a5 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -244,8 +244,6 @@ nfs_find_actor(struct inode *inode, void *opaque)
>
> if (NFS_FILEID(inode) != fattr->fileid)
> return 0;
> - if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
> - return 0;
> if (nfs_compare_fh(NFS_FH(inode), fh))
> return 0;
> if (is_bad_inode(inode) || NFS_STALE(inode))
> @@ -1269,9 +1267,16 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
>
> static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
> {
> + int status;
> +
> if (nfs_inode_attrs_need_update(inode, fattr))
> return nfs_update_inode(inode, fattr);
> - return nfs_check_inode_attributes(inode, fattr);
> +
> + status = nfs_check_inode_attributes(inode, fattr);
> + if (status)
> + nfs_invalidate_inode(inode);
> +
> + return status;
> }
>
> /**
I don't think that's going to fix the problem.
The issue is that the i_fops are set when the inode is in I_NEW state,
and are never changed. An inode is only I_NEW when it's first created.
In this case, we did an atomic open against a filename and got back
attributes that trick the code into matching the new dentry up with an
inode that we previously found to be a directory.
The patch above doesn't address that. It just adds an extra cache
invalidation, which won't help this case. The ops are still set the
same way and the teardown of the atomic_open will still end up hitting
the panic.
--
Jeff Layton <[email protected]>
T24gV2VkLCAyMDEzLTA3LTMxIGF0IDE4OjAwIC0wNTAwLCBRdWVudGluIEJhcm5lcyB3cm90ZToN
Cj4gPiBRdWl0ZSBmcmFua2x5LCBhbGwgSSBjYXJlIGFib3V0IGluIGEgc2l0dWF0aW9uIGxpa2Ug
dGhpcyBpcyB0aGF0IHRoZQ0KPiA+IGNsaWVudCBkb2Vzbid0IE9vcHMuDQo+IA0KPiBDZXJ0YWlu
bHksIGFuZCBoaXMgcGF0Y2ggZG9lcyBkbyB0aGF0LCBidXQgaXQncyBhbHNvIHBvaW50aW5nIG91
dA0KPiB0aGVyZSdzIGFub3RoZXIgYnVnIHJ1bm5pbmcgYXJvdW5kLiAgQW5kIG9uY2UgeW91IGZp
eCB0aGF0IGJ1ZywgdGhlDQo+IG9yaWdpbmFsIHBhdGNoIGlzIG5vIGxvbmdlciBuZWVkZWQuDQo+
IA0KPiA+IElmIGEgc2VydmVyIGlzIHJlYWxseSB0aGlzIHV0dGVybHkgYnJva2VuLCB0aGVuIHRo
ZXJlIGlzIG5vIHdheSB3ZSBjYW4NCj4gPiBmaXggaXQgb24gdGhlIGNsaWVudCwgYW5kIHdlJ3Jl
IG5vdCBldmVuIGdvaW5nIHRvIHRyeS4NCj4gDQo+IE9mIGNvdXJzZS4gIEJ1dCB5b3UgYWxzbyBk
b24ndCB3YW50IHRvIHVubmVjZXNzYXJpbHkgbGVhdmUgdGhlDQo+IGNsaWVudCB3aXRoIGFuIGlu
dmFsaWQgaW5vZGUgdGhhdCdzIG5vdCBwcm9wZXJseSBmbGFnZ2VkIGFuZA0KPiBwb3NzaWJseSBs
ZWF2ZSBhbm90aGVyIGJ1ZyB1bmZpeGVkLg0KDQpXaHkgaXMgdGhlIGludmFsaWQgaW5vZGUgbm90
IGJlaW5nIGZsYWdnZWQgYSBwcm9ibGVtIHRoYXQgbmVlZHMgdG8gYmUNCnNvbHZlZD8NCkhvdyBk
b2VzIHRoZSBwYXRjaCB0aGF0IHlvdSd2ZSBwcm9wb3NlZCBjaGFuZ2UgbWF0dGVycyBmb3IgdGhl
IGVuZCB1c2VyDQpvciBhcHBsaWNhdGlvbj8NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4
IE5GUyBjbGllbnQgbWFpbnRhaW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAu
Y29tDQp3d3cubmV0YXBwLmNvbQ0K
I was reviewing patches I was integrating into my NFS source base
and this one popped up (commit f6488c9b). In reviewing it, it would
fix the problem, however, I think it's not the best fix for it.
Catching it in nfs_find_actor() is after the horse has left the
barn so to speak.
I think the problem is in nfs_fhget() when constructing the new
inode (inside the if (inode->i_state & I_NEW) {...}) and should be
addressed there.
The nfs module already has checks to ensure that the expression
"((S_IFMT & inode->i_mode) == (S_IFMT & fattr->mode))" is true
in nfs_update_inode() and nfs_check_inode_attributes(). I think
problem Benny hit was that the equivalent check is missing in
nfs_fhget() when constructing a new inode. The same check that's
in those other two functions needs to be added to nfs_fhget()'s "if
(inode->i_state & I_NEW) {...}" conditional block to prevent the
problem of an inconsistent fattr/inode state from occurring in the
first place.
I can come up with a patch if you'd like, but I wanted to make
sure my assertions were valid first. Is removing a check from
nfs_find_actor() and adding the check to nfs_fhget() to prevent the
inconsistency in the first place a better approach?
Quentin
On Wed, Feb 27, 2013 at 7:10 PM, Jeff Layton <[email protected]> wrote:
> Benny Halevy reported the following oops when testing RHEL6:
[...]
> This should fix the oops reported here:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=913660
>
> Reported-by: Benny Halevy <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfs/inode.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 6acc73c..f52c99f 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -237,6 +237,8 @@ nfs_find_actor(struct inode *inode, void *opaque)
>
> if (NFS_FILEID(inode) != fattr->fileid)
> return 0;
> + if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
> + return 0;
> if (nfs_compare_fh(NFS_FH(inode), fh))
> return 0;
> if (is_bad_inode(inode) || NFS_STALE(inode))
> --
> 1.7.11.7
>
> --
On Wed, Jul 31, 2013 at 1:46 PM, Jeff Layton <[email protected]> wrote:
> On Wed, 31 Jul 2013 13:36:21 -0500
> Quentin Barnes <[email protected]> wrote:
>
>> I was reviewing patches I was integrating into my NFS source base
>> and this one popped up (commit f6488c9b). In reviewing it, it would
>> fix the problem, however, I think it's not the best fix for it.
>> Catching it in nfs_find_actor() is after the horse has left the
>> barn so to speak.
>>
>> I think the problem is in nfs_fhget() when constructing the new
>> inode (inside the if (inode->i_state & I_NEW) {...}) and should be
>> addressed there.
>>
>> The nfs module already has checks to ensure that the expression
>> "((S_IFMT & inode->i_mode) == (S_IFMT & fattr->mode))" is true
>> in nfs_update_inode() and nfs_check_inode_attributes(). I think
>> problem Benny hit was that the equivalent check is missing in
>> nfs_fhget() when constructing a new inode. The same check that's
>> in those other two functions needs to be added to nfs_fhget()'s "if
>> (inode->i_state & I_NEW) {...}" conditional block to prevent the
>> problem of an inconsistent fattr/inode state from occurring in the
>> first place.
>>
>> I can come up with a patch if you'd like, but I wanted to make
>> sure my assertions were valid first. Is removing a check from
>> nfs_find_actor() and adding the check to nfs_fhget() to prevent the
>> inconsistency in the first place a better approach?
>>
>> Quentin
>
> I don't think I agree. The whole problem was that we were allowing
> iget5_locked to match an existing inode even though (S_IFMT & mode) was
> different.
I assert that when an inconsistent state happens you don't want
nfs_find_actor() to be the one responsible for noticing it
(actually, ignoring the problem). You want the existing consistency
and error recovery paths to handle it instead.
So one or two things are going on. The existing error recovery
paths are inadequate or that there's a hole in the fattr/inode
consistency checking (or possibly both). Your patch covers over
this underlying bug rather than addressing it and which also lets
the old inode not get flagged as invalid. I would rather see the
real problem fixed and the old inode get flagged rather than just
ignoring it and creating another new inode.
Before your patch, if because of a server bug reusing a file
handle, nfs_find_actor(), would match and iget5_locked() would
return an inode that matched on its file handle but that didn't
necessarily match the fattr info.
If that inode returned by iget5_locked() was not I_NEW (the else
case), nfs_fhget() would invoke nfs_refresh_inode(), which would
call nfs_refresh_inode_locked(). It calls either nfs_update_inode()
or nfs_check_inode_attributes(). Both of those functions already
check for fattr/inode S_IFMT consistency catching bad states.
> Why would it be preferable to use the same struct inode even though
> it's clearly a different one on the server? inodes generally don't
> change their type after they've been created.
It's not preferable to have the same inode in a different state.
What you don't want is nfs_find_actor() as a side-effect to let the
problem go unnoticed. You want let the normal, existing error
handling paths deal with it, or if they're not fully dealing with
it, fix them.
Since the I_NEW conditional false path (the else case) in nfs_fhget()
should catch the problem, I had thought that the problem must lay
with the I_NEW conditional true path. Now I don't think so.
I see now that nfs_check_inode_attributes() is just returning an
error when the fattr and inode states are found to be inconsistent,
but is taking no action on the bad inode's state (i.e. flagging it
as invalid). And upon return, nfs_fhget() is not doing anything with
the bad return state passed up to nfs_refresh_inode(). Now I'm
thinking that this is problematic hole because it is ignoring the
error state and that let Benny's panic happen.
It doesn't seem to me that it should be nfs_check_inode_attributes()
job to flag the inode as invalid. That seems outside its role. I
would think it should be nfs_refresh_inode_locked()'s job, but I'm
still reviewing the code.
Thoughts?
> That said, patches speak louder than words so if you have one to
> propose I can certainly take a look. Maybe it'll make sense to me
> then...
As you can see, I'm still trying to fully piece together the
underlying cause cause of the original bug.
The panic was against RHEL6.4. Do you want a patch against that
code base or against the linux git?
> Jeff Layton <[email protected]>
On Fri, Aug 02, 2013 at 02:29:24PM -0400, Jeff Layton wrote:
> On Fri, 2 Aug 2013 12:58:51 -0500
> Quentin Barnes <[email protected]> wrote:
> > On Wed, Jul 31, 2013 at 7:46 PM, Jeff Layton <[email protected]> wrote:
> > > The patch above doesn't address that.
> >
> > I disagree.
> >
> > Let's take a look at it from the NFSv4 angle --
> >
> > The NFSv4 atomic open call path is:
> > nfs4_atomic_open()->nfs4_do_open()->_nfs4_do_open()
> >
> > In _nfs4_do_open(), it calls nfs4_do_setattr() which fills in an
> > fattr structure then calls nfs_post_op_update_inode() with it to
> > update the inode. Then we call nfs_post_op_update_inode()->
> > nfs_post_op_update_inode_locked()->nfs_refresh_inode_locked().
> >
> > So, yes, I think even in Benny's case, fixing nfs_refresh_inode_locked()
> > to invalidate an inode when nfs_check_inode_attributes() catches that
> > an NFS server had pulled a switcheroo (dir to file with same fh), the
> > approach above would have caught it when it happened.
>
> No, it wouldn't have.
>
> First, nfs4_do_setattr only gets called on an O_EXCL open. That bug is
> still triggerable even without that flag set in the open() call.
I'll have to look at that path. Thanks for the additional details
on the conditions triggering of the problem.
> Second, opening a directory is different from opening a file. They
> create different open contexts and tearing those down at close time
> requires a different set of operations.
Of course.
> If you use the *same* inode in
> this situation, then you end up with the wrong set of operations for
> tearing down the open context.
Yes, I see your point. I'm not seeing that so far. I see it only
looking at the inode's parent directory or the superblock, but I'm
still reading.
> That's what causes the panic. So, marking the inode as stale makes no
> difference whatsoever because we still have to tear down the context,
> and that gets done using the fops associated with the inode.
I know what you mean. I'm thinking it won't get that far, but I
know I'm still trying to get familiar with the NFSv4 paths.
> > > It just adds an extra cache
> > > invalidation, which won't help this case. The ops are still set the
> > > same way and the teardown of the atomic_open will still end up hitting
> > > the panic.
> >
> > It would help me if you could explain how flagging the inode as
> > invalid doesn't help here. If that's the case, then other bugs
> > might be running around.
>
> That just flags the inode as stale and marks its attributes as invalid.
> The inode still has the same operations. Aside from that, I'm not
> convinced that marking that inode as stale is correct at all.
Marking the inode as stale _may not_ actually be correct as you
assert. However, that's exactly how this error is already being
handled when encountered by nfs_update_inode(). Do you consider
nfs_update_inode() inappropriate in this regards as well, or is
there a separate reason why how it handles the error is valid?
> A stale inode has a pretty well-defined meaning in NFS. It means that
> the server doesn't know what this filehandle refers to. That doesn't
> seem to be the case here. We (likely) have a filehandle that got
> recycled too quickly, or something along those lines. That's indicative
> of a broken server, but there's nothing to suggest that the server will
> return NFSERR_STALE on it in the future.
I would think of the NFS errors available with this situation,
ESTALE would fit about the best. Even though the file handle was
reused by the server, the server no longer has the original file
that went with that filehandle, hence that reference is stale.
In nfs_check_inode_attributes(), for this error (reusing a FH),
it returns EIO (instead of ESTALE). Given those two, I think
the latter fits this weird situation a little better.
> If you're serious about "fixing" this problem then I suggest:
>
> 1) describing what the problem is. We have a broken server here so all
> bets are off in working with it. Why should we care whether it's more
> "correct" to do what you suggest? What breakage can we avoid by doing
> this differently?
I have been somewhat surprised by your comment. I can certainly
understand, appreciate, and agree the skepticism of different
approaches, however, if a better approach is available and proven,
it should almost always be desired (unless, for some odd reason,
say, it triggers a massive rewrite).
(As to why I originally flagged this patch, I don't want to trigger
a tangent here, but if you're interested, I can send you an email.)
I do have a question for you about my example patch though. If you
completely ignore the portion modifying nfs_find_actor() and only
focus on the modification to nfs_refresh_inode_locked(), do you
think that change fixes an existing problem of how the error return
state from nfs_check_inode_attributes() is currently unchecked
(calling from nfs_fhget())?
> 2) consider rolling up a pynfs testcase or something that simulates
> such a broken server, so you can verify that the original bug is still
> fixed with the patch you propose.
Oh, yes, before a patch such as mine is accepted, I should
definitely have to prove that it still fixes the original panic with
hard evidence. No argument there. I would expect nothing less.
To emulate the bug, I was considering writing a systemtap script
to monitor the NFSv4 protocols and then for a known file handle for
a directory on a later call forge up a return for a regular file.
Would you expect that this approach (with some tweaking as needed
to get the protocol sequence right) be able to trigger the original
panic on the kernel version it was reported on?
To write the code to trigger the sequence of NFSv4 protocols, I
would expect to do say a stat() on a directory to trigger a LOOKUP
to load the dentry and inode, then do an open(...,O_CREAT|O_DIRECT)
to trigger an OPEN intercepting its return information overwriting
and forging it to look like a regular file. Is that the correct
sequence? What variations should also be tested?
> --
> Jeff Layton <[email protected]>
Quentin
On Wed, Jul 31, 2013 at 6:47 PM, Myklebust, Trond
<[email protected]> wrote:
> On Wed, 2013-07-31 at 18:00 -0500, Quentin Barnes wrote:
>> > Quite frankly, all I care about in a situation like this is that the
>> > client doesn't Oops.
>>
>> Certainly, and his patch does do that, but it's also pointing out
>> there's another bug running around. And once you fix that bug, the
>> original patch is no longer needed.
>>
>> > If a server is really this utterly broken, then there is no way we can
>> > fix it on the client, and we're not even going to try.
>>
>> Of course. But you also don't want to unnecessarily leave the
>> client with an invalid inode that's not properly flagged and
>> possibly leave another bug unfixed.
>
> Why is the invalid inode not being flagged a problem that needs to be
> solved?
> How does the patch that you've proposed change matters for the end user
> or application?
Ah, good questions! Let's see if I can address them to your satisfaction.
> Why is the invalid inode not being flagged a problem that needs to be
> solved?
It's giving nfs_refresh_inode_locked() a consistent behavior.
If nfs_refresh_inode_locked() has two paths depending on if
nfs_inode_attrs_need_update() happens to be true or not,
calling either nfs_update_inode() or nfs_check_inode_attributes().
If nfs_refresh_inode_locked() happens to go down the
nfs_update_inode() [which could simply be due to nothing other than
timing], the problem of having an NFS server reusing a file handle
would be caught, the inode flagged, and the error bubbled up to the
caller.
However, without the proposed change, if the path taken is to
nfs_check_inode_attributes(), the error it caught is silently
ignored (which I assert led to Benny's reported panic).
> How does the patch that you've proposed change matters for the end user
> or application?
With the original patch, user space could either get back from a
syscall on the file either a failure or success depending on how or
when the nfs subsystem noticed that the NFS server had reused a file
handle.
With the new patch, user space will consistently always get a failure.
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
On Wed, Jul 31, 2013 at 7:46 PM, Jeff Layton <[email protected]> wrote:
> On Wed, 31 Jul 2013 18:00:51 -0500
> Quentin Barnes <[email protected]> wrote:
>
>> > Quite frankly, all I care about in a situation like this is that the
>> > client doesn't Oops.
>>
>> Certainly, and his patch does do that, but it's also pointing out
>> there's another bug running around. And once you fix that bug, the
>> original patch is no longer needed.
>>
>> > If a server is really this utterly broken, then there is no way we can
>> > fix it on the client, and we're not even going to try.
>>
>> Of course. But you also don't want to unnecessarily leave the
>> client with an invalid inode that's not properly flagged and
>> possibly leave another bug unfixed.
>>
>> Here is a example patch that I feel better addresses the problem:
>>
>>
>> commit 2d6b411eea04ae4398707b584b8d9e552606aaf7
>> Author: Quentin Barnes <[email protected]>
>> Date: Wed Jul 31 17:50:35 2013 -0500
>>
>> Have nfs_refresh_inode_locked() ensure that it doesn't return without
>> flagging invalid inodes (ones that don't match its fattr type).
>>
>> nfs_refresh_inode() already does this, so we need to check the return
>> status from nfs_check_inode_attributes() before returning from
>> nfs_refresh_inode_locked().
>>
>> Once this hole is plugged, there will be no invalid inode references
>> returned by nfs_fhget(), so no need to check in nfs_find_actor().
>>
>> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
>> index af6e806..d2263a5 100644
>> --- a/fs/nfs/inode.c
>> +++ b/fs/nfs/inode.c
>> @@ -244,8 +244,6 @@ nfs_find_actor(struct inode *inode, void *opaque)
>>
>> if (NFS_FILEID(inode) != fattr->fileid)
>> return 0;
>> - if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
>> - return 0;
>> if (nfs_compare_fh(NFS_FH(inode), fh))
>> return 0;
>> if (is_bad_inode(inode) || NFS_STALE(inode))
>> @@ -1269,9 +1267,16 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
>>
>> static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
>> {
>> + int status;
>> +
>> if (nfs_inode_attrs_need_update(inode, fattr))
>> return nfs_update_inode(inode, fattr);
>> - return nfs_check_inode_attributes(inode, fattr);
>> +
>> + status = nfs_check_inode_attributes(inode, fattr);
>> + if (status)
>> + nfs_invalidate_inode(inode);
>> +
>> + return status;
>> }
>>
>> /**
>
> I don't think that's going to fix the problem.
>
> The issue is that the i_fops are set when the inode is in I_NEW state,
> and are never changed. An inode is only I_NEW when it's first created.
> In this case, we did an atomic open against a filename and got back
> attributes that trick the code into matching the new dentry up with an
> inode that we previously found to be a directory.
Oh, yes! The original reported mentioned an atomic open. We should
look at it from that NFSv4 perspective.
(I know I tend to inadvertently focus too much on the NFSv3 paths
since I know those best. Sigh.)
> The patch above doesn't address that.
I disagree.
Let's take a look at it from the NFSv4 angle --
The NFSv4 atomic open call path is:
nfs4_atomic_open()->nfs4_do_open()->_nfs4_do_open()
In _nfs4_do_open(), it calls nfs4_do_setattr() which fills in an
fattr structure then calls nfs_post_op_update_inode() with it to
update the inode. Then we call nfs_post_op_update_inode()->
nfs_post_op_update_inode_locked()->nfs_refresh_inode_locked().
So, yes, I think even in Benny's case, fixing nfs_refresh_inode_locked()
to invalidate an inode when nfs_check_inode_attributes() catches that
an NFS server had pulled a switcheroo (dir to file with same fh), the
approach above would have caught it when it happened.
> It just adds an extra cache
> invalidation, which won't help this case. The ops are still set the
> same way and the teardown of the atomic_open will still end up hitting
> the panic.
It would help me if you could explain how flagging the inode as
invalid doesn't help here. If that's the case, then other bugs
might be running around.
> --
> Jeff Layton <[email protected]>
On Fri, 2 Aug 2013 12:58:51 -0500
Quentin Barnes <[email protected]> wrote:
> On Wed, Jul 31, 2013 at 7:46 PM, Jeff Layton <[email protected]> wrote:
> > On Wed, 31 Jul 2013 18:00:51 -0500
> > Quentin Barnes <[email protected]> wrote:
> >
> >> > Quite frankly, all I care about in a situation like this is that the
> >> > client doesn't Oops.
> >>
> >> Certainly, and his patch does do that, but it's also pointing out
> >> there's another bug running around. And once you fix that bug, the
> >> original patch is no longer needed.
> >>
> >> > If a server is really this utterly broken, then there is no way we can
> >> > fix it on the client, and we're not even going to try.
> >>
> >> Of course. But you also don't want to unnecessarily leave the
> >> client with an invalid inode that's not properly flagged and
> >> possibly leave another bug unfixed.
> >>
> >> Here is a example patch that I feel better addresses the problem:
> >>
> >>
> >> commit 2d6b411eea04ae4398707b584b8d9e552606aaf7
> >> Author: Quentin Barnes <[email protected]>
> >> Date: Wed Jul 31 17:50:35 2013 -0500
> >>
> >> Have nfs_refresh_inode_locked() ensure that it doesn't return without
> >> flagging invalid inodes (ones that don't match its fattr type).
> >>
> >> nfs_refresh_inode() already does this, so we need to check the return
> >> status from nfs_check_inode_attributes() before returning from
> >> nfs_refresh_inode_locked().
> >>
> >> Once this hole is plugged, there will be no invalid inode references
> >> returned by nfs_fhget(), so no need to check in nfs_find_actor().
> >>
> >> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> >> index af6e806..d2263a5 100644
> >> --- a/fs/nfs/inode.c
> >> +++ b/fs/nfs/inode.c
> >> @@ -244,8 +244,6 @@ nfs_find_actor(struct inode *inode, void *opaque)
> >>
> >> if (NFS_FILEID(inode) != fattr->fileid)
> >> return 0;
> >> - if ((S_IFMT & inode->i_mode) != (S_IFMT & fattr->mode))
> >> - return 0;
> >> if (nfs_compare_fh(NFS_FH(inode), fh))
> >> return 0;
> >> if (is_bad_inode(inode) || NFS_STALE(inode))
> >> @@ -1269,9 +1267,16 @@ static int nfs_inode_attrs_need_update(const struct inode *inode, const struct n
> >>
> >> static int nfs_refresh_inode_locked(struct inode *inode, struct nfs_fattr *fattr)
> >> {
> >> + int status;
> >> +
> >> if (nfs_inode_attrs_need_update(inode, fattr))
> >> return nfs_update_inode(inode, fattr);
> >> - return nfs_check_inode_attributes(inode, fattr);
> >> +
> >> + status = nfs_check_inode_attributes(inode, fattr);
> >> + if (status)
> >> + nfs_invalidate_inode(inode);
> >> +
> >> + return status;
> >> }
> >>
> >> /**
> >
> > I don't think that's going to fix the problem.
> >
> > The issue is that the i_fops are set when the inode is in I_NEW state,
> > and are never changed. An inode is only I_NEW when it's first created.
> > In this case, we did an atomic open against a filename and got back
> > attributes that trick the code into matching the new dentry up with an
> > inode that we previously found to be a directory.
>
> Oh, yes! The original reported mentioned an atomic open. We should
> look at it from that NFSv4 perspective.
>
> (I know I tend to inadvertently focus too much on the NFSv3 paths
> since I know those best. Sigh.)
>
> > The patch above doesn't address that.
>
> I disagree.
>
> Let's take a look at it from the NFSv4 angle --
>
> The NFSv4 atomic open call path is:
> nfs4_atomic_open()->nfs4_do_open()->_nfs4_do_open()
>
> In _nfs4_do_open(), it calls nfs4_do_setattr() which fills in an
> fattr structure then calls nfs_post_op_update_inode() with it to
> update the inode. Then we call nfs_post_op_update_inode()->
> nfs_post_op_update_inode_locked()->nfs_refresh_inode_locked().
>
> So, yes, I think even in Benny's case, fixing nfs_refresh_inode_locked()
> to invalidate an inode when nfs_check_inode_attributes() catches that
> an NFS server had pulled a switcheroo (dir to file with same fh), the
> approach above would have caught it when it happened.
>
No, it wouldn't have.
First, nfs4_do_setattr only gets called on an O_EXCL open. That bug is
still triggerable even without that flag set in the open() call.
Second, opening a directory is different from opening a file. They
create different open contexts and tearing those down at close time
requires a different set of operations. If you use the *same* inode in
this situation, then you end up with the wrong set of operations for
tearing down the open context.
That's what causes the panic. So, marking the inode as stale makes no
difference whatsoever because we still have to tear down the context,
and that gets done using the fops associated with the inode.
> > It just adds an extra cache
> > invalidation, which won't help this case. The ops are still set the
> > same way and the teardown of the atomic_open will still end up hitting
> > the panic.
>
> It would help me if you could explain how flagging the inode as
> invalid doesn't help here. If that's the case, then other bugs
> might be running around.
>
That just flags the inode as stale and marks its attributes as invalid.
The inode still has the same operations. Aside from that, I'm not
convinced that marking that inode as stale is correct at all.
A stale inode has a pretty well-defined meaning in NFS. It means that
the server doesn't know what this filehandle refers to. That doesn't
seem to be the case here. We (likely) have a filehandle that got
recycled too quickly, or something along those lines. That's indicative
of a broken server, but there's nothing to suggest that the server will
return NFSERR_STALE on it in the future.
If you're serious about "fixing" this problem then I suggest:
1) describing what the problem is. We have a broken server here so all
bets are off in working with it. Why should we care whether it's more
"correct" to do what you suggest? What breakage can we avoid by doing
this differently?
2) consider rolling up a pynfs testcase or something that simulates
such a broken server, so you can verify that the original bug is still
fixed with the patch you propose.
--
Jeff Layton <[email protected]>