2023-12-05 14:11:35

by Scott Mayhew

[permalink] [raw]
Subject: [PATCH RFC 0/1] Use parent's objective cred in nfs_access_login_time()

We have a customer occasionally hitting a panic in cred_fscmp():

crash> bt
PID: 564685 TASK: ffff8a3016c2d8c0 CPU: 39 COMMAND: "ifind"
#0 [ffffa8548626b850] machine_kexec at ffffffffa3a77497
#1 [ffffa8548626b8a8] __crash_kexec at ffffffffa3bec25a
#2 [ffffa8548626b968] crash_kexec at ffffffffa3bed4e8
#3 [ffffa8548626b970] oops_end at ffffffffa3a2e8ab
#4 [ffffa8548626b990] exc_general_protection at ffffffffa465ed6c
#5 [ffffa8548626ba30] asm_exc_general_protection at ffffffffa4800ac2
[exception RIP: cred_fscmp+0x53]
RIP: ffffffffa3b3b4e3 RSP: ffffa8548626bae0 RFLAGS: 00010286
RAX: 0000000000000001 RBX: ffff8ab810ba19c0 RCX: ffffffffa565ed90
RDX: 6b6b6b6b6b6b6b6b RSI: ffff8ab810ba19c0 RDI: ffff8a25a2c5b940
RBP: ffff8a28290976e8 R8: 0000000000000050 R9: 0000000000000013
R10: ffff8a84d3734400 R11: ffff8a840c5a78c8 R12: ffffa8548626bb80
R13: 0000000000000081 R14: ffff8a1e00b91dc0 R15: ffff8a1e41229dc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#6 [ffffa8548626bae0] nfs_access_get_cached at ffffffffc172f198 [nfs]
#7 [ffffa8548626bb30] nfs_do_access at ffffffffc172fedf [nfs]
#8 [ffffa8548626bbc8] nfs_permission at ffffffffc17301f3 [nfs]
#9 [ffffa8548626bbf0] inode_permission at ffffffffa3e32410
#10 [ffffa8548626bc20] link_path_walk at ffffffffa3e37426
#11 [ffffa8548626bc80] path_lookupat at ffffffffa3e37b0e
#12 [ffffa8548626bcb8] filename_lookup at ffffffffa3e38f2f
#13 [ffffa8548626bdd8] vfs_statx at ffffffffa3e2ad9d
#14 [ffffa8548626be30] vfs_fstatat at ffffffffa3e2b1b4
#15 [ffffa8548626be58] __do_sys_newlstat at ffffffffa3e2b523
#16 [ffffa8548626bf10] do_syscall_64 at ffffffffa465e18c
#17 [ffffa8548626bf50] entry_SYSCALL_64_after_hwframe at ffffffffa48000aa
RIP: 00007f592f74ef5a RSP: 00007f59242eeae8 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 00007f59242eedf0 RCX: 00007f592f74ef5a
RDX: 00007f59242eedf0 RSI: 00007f59242eedf0 RDI: 00007f58d89812d0
RBP: 00007f59242eebb0 R8: 0000000000000001 R9: 00007f58d8983c90
R10: 0000000000000004 R11: 0000000000000246 R12: 00007f58d81101c0
R13: 00007f59242eedf0 R14: 00007f59242efb08 R15: 0000000000000000
ORIG_RAX: 0000000000000006 CS: 0033 SS: 002b

The instruction we crashed on was:
/usr/src/debug/kernel-5.14.0-378.el9/linux-5.14.0-378.el9.x86_64/kernel/cred.c: 651
0xffffffffa3b3b4e3 <cred_fscmp+0x53>: movslq 0x4(%rdx),%rsi

We were trying to access group_info->ngroups:
crash> group_info.ngroups -o
struct group_info {
[0x4] int ngroups;
}

But RDX has POISON_FREE instead of the address of a struct group_info.

RDX was populated slightly earlier:
/usr/src/debug/kernel-5.14.0-378.el9/linux-5.14.0-378.el9.x86_64/kernel/cred.c: 640
0xffffffffa3b3b4bf <cred_fscmp+0x2f>: mov $0x1,%eax
0xffffffffa3b3b4c4 <cred_fscmp+0x34>: jb 0xffffffffa3b3b524 <cred_fscmp+0x94>
0xffffffffa3b3b4c6 <cred_fscmp+0x36>: mov 0x98(%rdi),%rdx <----- HERE
0xffffffffa3b3b4cd <cred_fscmp+0x3d>: mov 0x98(%rsi),%rcx

So RDI has the address of the struct cred we were comparing:
crash> cred.group_info -o
struct cred {
[0x98] struct group_info *group_info;
}

Looking at the disassembly of nfs_access_get_cached to see how RDI gets populated:

/usr/src/debug/kernel-5.14.0-378.el9/linux-5.14.0-378.el9.x86_64/fs/nfs/dir.c: 2965
0xffffffffc172f19c <nfs_access_get_cached+0x3c>: mov %r14,%r15
0xffffffffc172f19f <nfs_access_get_cached+0x3f>: mov 0xae8(%r14),%r14 <----- parent
/usr/src/debug/kernel-5.14.0-378.el9/linux-5.14.0-378.el9.x86_64/fs/nfs/dir.c: 2966
0xffffffffc172f1a6 <nfs_access_get_cached+0x46>: mov 0xcd8(%r14),%rdi <----- parent's cred

So R15 has the address of a task_struct, R14 has the address of its parent task_struct, and RDI has the cred from the parent's task_struct.

crash> task_struct.real_parent -o
struct task_struct {
[0xae8] struct task_struct *real_parent;
}
crash> task_struct.cred -o
struct task_struct {
[0xcd8] const struct cred *cred;
}

Looking at the parent of the panic task:

crash> task -R real_parent ffff8a3016c2d8c0
PID: 564685 TASK: ffff8a3016c2d8c0 CPU: 39 COMMAND: "ifind"
real_parent = 0xffff8a1e41229dc0,

That matches the value in R15 in the exception frame, so we need to look at the next parent:

crash> task -R real_parent 0xffff8a1e41229dc0
PID: 2083 TASK: ffff8a1e41229dc0 CPU: 19 COMMAND: "cvlaunchd"
real_parent = 0xffff8a1e00b91dc0,

That matches the value in R14 in the exception frame, so let's look at the cred:

crash> task -R cred 0xffff8a1e00b91dc0
PID: 1 TASK: ffff8a1e00b91dc0 CPU: 4 COMMAND: "systemd"
cred = 0xffff8acc795371c0,

That does NOT match the value in RDI in the exception frame.

The value in RDI is indeed a struct cred, but contrast its refcount

crash> cred.usage ffff8a25a2c5b940
usage = {
counter = 0x3
},

with the refcount of the actual cred

crash> cred.usage 0xffff8acc795371c0
usage = {
counter = 0xad
},

It appears that the cred in RDI was freed, possibly via:

do_faccessat()
old_cred = access_override_creds();
override_cred->non_rcu = 1;
...
revert_creds(old_cred);
put_cred(override);
__put_cred
if (cred->non_rcu)
put_cred_rcu(&cred->rcu);

and subsequently re-used.

Note in particular this comment in access_override_creds():

/*
* The new set of credentials can *only* be used in
* task-synchronous circumstances, and does not need
* RCU freeing, unless somebody then takes a separate
* reference to it.
*
* NOTE! This is _only_ true because this credential
* is used purely for override_creds() that installs
* it as the subjective cred. Other threads will be
* accessing ->real_cred, not the subjective cred.
*
* If somebody _does_ make a copy of this (using the
* 'get_current_cred()' function), that will clear the
* non_rcu field, because now that other user may be
* expecting RCU freeing. But normal thread-synchronous
* cred accesses will keep things non-RCY.
*/

So it seems that nfs_access_login_time() should be using parent->real_cred.

-Scott

Scott Mayhew (1):
NFS: Use parent's objective cred in nfs_access_login_time()

fs/nfs/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--
2.41.0



2023-12-05 14:12:37

by Scott Mayhew

[permalink] [raw]
Subject: [PATCH RFC 1/1] NFS: Use parent's objective cred in nfs_access_login_time()

The subjective cred (task->cred) can potentially be overridden and
subsquently freed in non-RCU context, which could lead to a panic if we
try to use it in cred_fscmp(). Use __task_cred(), which returns the
objective cred (task->real_cred) instead.

Fixes: 0eb43812c027 ("NFS: Clear the file access cache upon login")
Fixes: 5e9a7b9c2ea1 ("NFS: Fix up a sparse warning")

Signed-off-by: Scott Mayhew <[email protected]>
---
fs/nfs/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 13dffe4201e6..273c0b68abf4 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2963,7 +2963,7 @@ static u64 nfs_access_login_time(const struct task_struct *task,
rcu_read_lock();
for (;;) {
parent = rcu_dereference(task->real_parent);
- pcred = rcu_dereference(parent->cred);
+ pcred = __task_cred(parent);
if (parent == task || cred_fscmp(pcred, cred) != 0)
break;
task = parent;
--
2.41.0