2022-12-31 05:34:48

by Waiman Long

Subject: [PATCH v6 0/2] sched: Fix dup_user_cpus_ptr() & do_set_cpus_allowed() bugs

v6:
- Update patch 2 to fix build error with !CONFIG_SMP configs.

v5:
- Add an alloc_user_cpus_ptr() helper and use it in patch 2.

v4:
- Make sure user_cpus_ptr allocation size is large enough for
rcu_head.

This series fixes a use-after-free (UAF) bug in dup_user_cpus_ptr() and
uses kfree_rcu() in do_set_cpus_allowed() to avoid lockdep splats.

Waiman Long (2):
sched: Fix use-after-free bug in dup_user_cpus_ptr()
sched: Use kfree_rcu() in do_set_cpus_allowed()

kernel/sched/core.c | 65 +++++++++++++++++++++++++++++++++++++++------
1 file changed, 57 insertions(+), 8 deletions(-)

--
2.31.1


2022-12-31 06:08:34

by Waiman Long

Subject: [PATCH v6 2/2] sched: Use kfree_rcu() in do_set_cpus_allowed()

Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") may call kfree() if user_cpus_ptr was previously
set. Unfortunately, some of the callers of do_set_cpus_allowed()
may have pi_lock held when calling it. So the following splats may be
printed especially when running with a PREEMPT_RT kernel:

WARNING: possible circular locking dependency detected
BUG: sleeping function called from invalid context

To avoid these problems, kfree_rcu() is used instead. An internal
cpumask_rcuhead union is created for the sole purpose of facilitating
the use of kfree_rcu() to free the cpumask.

Since user_cpus_ptr is not being used in non-SMP configs, the newly
introduced alloc_user_cpus_ptr() helper will return NULL in this case
and sched_setaffinity() is modified to handle this special case.

Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b93d030b9fd5..dc68c9a54a71 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2604,9 +2604,29 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		.user_mask = NULL,
 		.flags = SCA_USER, /* clear the user requested mask */
 	};
+	union cpumask_rcuhead {
+		cpumask_t cpumask;
+		struct rcu_head rcu;
+	};

 	__do_set_cpus_allowed(p, &ac);
-	kfree(ac.user_mask);
+
+	/*
+	 * Because this is called with p->pi_lock held, it is not possible
+	 * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
+	 * kfree_rcu().
+	 */
+	kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
+}
+
+static cpumask_t *alloc_user_cpus_ptr(int node)
+{
+	/*
+	 * See do_set_cpus_allowed() above for the rcu_head usage.
+	 */
+	int size = max_t(int, cpumask_size(), sizeof(struct rcu_head));
+
+	return kmalloc_node(size, GFP_KERNEL, node);
 }

 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
@@ -2629,7 +2649,7 @@ int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
 	if (data_race(!src->user_cpus_ptr))
 		return 0;

-	user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+	user_mask = alloc_user_cpus_ptr(node);
 	if (!user_mask)
 		return -ENOMEM;

@@ -3605,6 +3625,11 @@ static inline bool rq_has_pinned_tasks(struct rq *rq)
 	return false;
 }

+static inline cpumask_t *alloc_user_cpus_ptr(int node)
+{
+	return NULL;
+}
+
 #endif /* !CONFIG_SMP */

 static void
@@ -8263,8 +8288,8 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	if (retval)
 		goto out_put_task;

-	user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
-	if (!user_mask) {
+	user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+	if (IS_ENABLED(CONFIG_SMP) && !user_mask) {
 		retval = -ENOMEM;
 		goto out_put_task;
 	}
--
2.31.1
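
Note on the sizing rule in alloc_user_cpus_ptr(): kfree_rcu() is handed a
pointer to the union and stores its rcu_head inside the object being freed, so
the buffer must be at least as large as struct rcu_head even when the cpumask
itself is smaller. The following is a minimal userspace sketch of that rule,
not the kernel code. The names fake_rcu_head, fake_cpumask_size() and
user_mask_alloc_size() are hypothetical stand-ins, and the two-pointer layout
of the fake head is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_head: two pointer-sized fields. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

/*
 * Hypothetical stand-in for cpumask_size(): bytes needed for nr_cpus bits,
 * rounded up to a whole number of unsigned longs.
 */
static size_t fake_cpumask_size(unsigned int nr_cpus)
{
	size_t bits_per_long = 8 * sizeof(unsigned long);
	size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

	return longs * sizeof(unsigned long);
}

/* Same idea as the max_t() expression in the patch: take the larger size. */
static size_t user_mask_alloc_size(unsigned int nr_cpus)
{
	size_t mask = fake_cpumask_size(nr_cpus);
	size_t head = sizeof(struct fake_rcu_head);

	return mask > head ? mask : head;
}
```

With a small CPU count the mask is a single unsigned long, which is smaller
than the two-pointer head, so the head size wins. With a large count (for
example 4096 bits, 512 bytes) the mask size wins.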

2023-01-09 11:12:07

by tip-bot2 for Waiman Long

Subject: [tip: sched/urgent] sched/core: Use kfree_rcu() in do_set_cpus_allowed()

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: 9a5418bc48babb313d2a62df29ebe21ce8c06c59
Gitweb: https://git.kernel.org/tip/9a5418bc48babb313d2a62df29ebe21ce8c06c59
Author: Waiman Long <[email protected]>
AuthorDate: Fri, 30 Dec 2022 23:11:20 -05:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Mon, 09 Jan 2023 11:43:23 +01:00

sched/core: Use kfree_rcu() in do_set_cpus_allowed()

Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") may call kfree() if user_cpus_ptr was previously
set. Unfortunately, some of the callers of do_set_cpus_allowed()
may have pi_lock held when calling it. So the following splats may be
printed especially when running with a PREEMPT_RT kernel:

WARNING: possible circular locking dependency detected
BUG: sleeping function called from invalid context

To avoid these problems, kfree_rcu() is used instead. An internal
cpumask_rcuhead union is created for the sole purpose of facilitating
the use of kfree_rcu() to free the cpumask.

Since user_cpus_ptr is not being used in non-SMP configs, the newly
introduced alloc_user_cpus_ptr() helper will return NULL in this case
and sched_setaffinity() is modified to handle this special case.

Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Suggested-by: Peter Zijlstra <[email protected]>
Signed-off-by: Waiman Long <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/core.c | 33 +++++++++++++++++++++++++++++----
1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9f6e54..bb1ee6d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2604,9 +2604,29 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 		.user_mask = NULL,
 		.flags = SCA_USER, /* clear the user requested mask */
 	};
+	union cpumask_rcuhead {
+		cpumask_t cpumask;
+		struct rcu_head rcu;
+	};

 	__do_set_cpus_allowed(p, &ac);
-	kfree(ac.user_mask);
+
+	/*
+	 * Because this is called with p->pi_lock held, it is not possible
+	 * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
+	 * kfree_rcu().
+	 */
+	kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
+}
+
+static cpumask_t *alloc_user_cpus_ptr(int node)
+{
+	/*
+	 * See do_set_cpus_allowed() above for the rcu_head usage.
+	 */
+	int size = max_t(int, cpumask_size(), sizeof(struct rcu_head));
+
+	return kmalloc_node(size, GFP_KERNEL, node);
 }

 int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
@@ -2629,7 +2649,7 @@ int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
 	if (data_race(!src->user_cpus_ptr))
 		return 0;

-	user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+	user_mask = alloc_user_cpus_ptr(node);
 	if (!user_mask)
 		return -ENOMEM;

@@ -3605,6 +3625,11 @@ static inline bool rq_has_pinned_tasks(struct rq *rq)
 	return false;
 }

+static inline cpumask_t *alloc_user_cpus_ptr(int node)
+{
+	return NULL;
+}
+
 #endif /* !CONFIG_SMP */

 static void
@@ -8265,8 +8290,8 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
 	if (retval)
 		goto out_put_task;

-	user_mask = kmalloc(cpumask_size(), GFP_KERNEL);
-	if (!user_mask) {
+	user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+	if (IS_ENABLED(CONFIG_SMP) && !user_mask) {
 		retval = -ENOMEM;
 		goto out_put_task;
 	}
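
Note on the non-SMP case: the changelog says alloc_user_cpus_ptr() returns
NULL on !CONFIG_SMP builds and that sched_setaffinity() handles this. The hunk
above only relaxes the -ENOMEM check; whether every later use of user_mask in
that function tolerates NULL is not visible in this diff. As a hedged
userspace sketch of the general pattern (a pointer that may legitimately be
NULL must be guarded at each dereference, not only at the allocation check),
with hypothetical names alloc_user_mask_up(), copy_user_mask() and
FAKE_MASK_BYTES:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FAKE_MASK_BYTES 16	/* hypothetical mask size for this sketch */

/* Hypothetical stand-in for the !CONFIG_SMP stub: never provides a buffer. */
static unsigned char *alloc_user_mask_up(void)
{
	return NULL;
}

/*
 * Copy the requested mask only when a destination buffer exists.
 * Returns 1 if the copy happened and 0 if it was skipped.
 */
static int copy_user_mask(unsigned char *dst, const unsigned char *src)
{
	if (!dst)
		return 0;	/* no buffer in this configuration, nothing to copy */

	memcpy(dst, src, FAKE_MASK_BYTES);
	return 1;
}
```

Calling copy_user_mask(alloc_user_mask_up(), src) skips the copy instead of
writing through a NULL pointer. This illustrates the design concern only and
is not a claim about what the kernel function does after the shown hunk.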

2023-01-15 16:22:53

by Yujie Liu

Subject: Re: [PATCH v6 2/2] sched: Use kfree_rcu() in do_set_cpus_allowed()

Greetings,

FYI, we noticed a general protection fault caused by the following commit (built with gcc-11):

commit: 66f9c1813a72eecafa25492b551bb91b4fad59e1 ("[PATCH v6 2/2] sched: Use kfree_rcu() in do_set_cpus_allowed()")
url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/sched-Fix-use-after-free-bug-in-dup_user_cpus_ptr/20221231-121414
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git c89970202a1153b2fc230e89f90c180bd5bcbcef
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v6 2/2] sched: Use kfree_rcu() in do_set_cpus_allowed()

in testcase: trinity
version: trinity-x86_64-e63e4843-1_20220913
with the following parameters:

runtime: 300s
group: group-03

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

It caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


[ 84.410147][ T190]
[ 84.419943][ T190] Seeding trinity by 195457372 based on vm-snb/debian-11.1-x86_64-20220510.cgz/x86_64-randconfig-a011-20221226
[ 84.419977][ T190]
[ 96.648869][ T190] 2023-01-13 19:33:59 chroot --userspec nobody:nogroup / trinity -q -q -l off -s 195457372 -N 999999999 -c accept -c capget -c clock_settime -c clone3 -c fchmodat -c fchown16 -c fstat64 -c futex_waitv -c getgid -c getpgid -c getrlimit -c inotify_init -c io_uring_setup -c ipc -c kcmp -c kill -c madvise -c move_pages -c mq_timedsend -c munmap -c old_readdir -c open -c openat -c personality -c pidfd_getfd -c pipe -c preadv -c process_mrelease -c readv -c reboot -c rename -c semop -c semtimedop -c setfsuid16 -c setresuid16 -c shmctl -c signalfd4 -c sigprocmask -c sigsuspend -c ssetmask -c timer_delete -c times -c truncate64 -c userfaultfd
[ 96.648897][ T190]
[ 102.200867][ T3929] general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] KASAN
[ 102.202727][ T3929] KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
[ 102.204168][ T3929] CPU: 0 PID: 3929 Comm: trinity-main Tainted: G T 6.2.0-rc1-00030-g66f9c1813a72 #27 d46a36d033aa326de17d60a34c369156bd255876
[ 102.206484][ T3929] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 102.208121][ T3929] RIP: 0010:sched_setaffinity (??:?)
[ 102.209102][ T3929] Code: 4c 89 fa b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 80 3c 02 00 74 08 4c 89 ff e8 47 8c 2e 00 b8 ff ff 37 00 4d 8b 3f 48 c1 e0 2a <80> 38 00 74 07 31 ff e8 9c 8c 2e 00 48 c7 c0 48 26 b6 84 ba ff ff
All code
========
0: 4c 89 fa mov %r15,%rdx
3: b8 ff ff 37 00 mov $0x37ffff,%eax
8: 48 c1 ea 03 shr $0x3,%rdx
c: 48 c1 e0 2a shl $0x2a,%rax
10: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
14: 74 08 je 0x1e
16: 4c 89 ff mov %r15,%rdi
19: e8 47 8c 2e 00 callq 0x2e8c65
1e: b8 ff ff 37 00 mov $0x37ffff,%eax
23: 4d 8b 3f mov (%r15),%r15
26: 48 c1 e0 2a shl $0x2a,%rax
2a:* 80 38 00 cmpb $0x0,(%rax) <-- trapping instruction
2d: 74 07 je 0x36
2f: 31 ff xor %edi,%edi
31: e8 9c 8c 2e 00 callq 0x2e8cd2
36: 48 c7 c0 48 26 b6 84 mov $0xffffffff84b62648,%rax
3d: ba .byte 0xba
3e: ff (bad)
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: 80 38 00 cmpb $0x0,(%rax)
3: 74 07 je 0xc
5: 31 ff xor %edi,%edi
7: e8 9c 8c 2e 00 callq 0x2e8ca8
c: 48 c7 c0 48 26 b6 84 mov $0xffffffff84b62648,%rax
13: ba .byte 0xba
14: ff (bad)
15: ff .byte 0xff
[ 102.212105][ T3929] RSP: 0018:ffffc90005237e40 EFLAGS: 00010286
[ 102.213141][ T3929] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 102.214501][ T3929] RDX: 1ffff92000a46fdd RSI: ffffffff8410cde0 RDI: ffffffff839661d8
[ 102.215862][ T3929] RBP: ffffc90005237eb0 R08: 0000000000000000 R09: ffffffff8597ace7
[ 102.217213][ T3929] R10: 0000000000000000 R11: ffffffff811a63eb R12: ffff88816fe32880
[ 102.218596][ T3929] R13: ffffc90005237e88 R14: 1ffff92000a46fc9 R15: 0000000000000001
[ 102.219945][ T3929] FS: 00007f23de705600(0000) GS:ffffffff83cd4000(0000) knlGS:0000000000000000
[ 102.221395][ T3929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 102.222380][ T3929] CR2: 00007f23de624060 CR3: 000000013f39d000 CR4: 00000000000406f0
[ 102.223604][ T3929] DR0: 00007f23dc674000 DR1: 0000000000000000 DR2: 0000000000000000
[ 102.224807][ T3929] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000070602
[ 102.226095][ T3929] Call Trace:
[ 102.226701][ T3929] <TASK>
[ 102.227316][ T3929] ? sched_set_fifo_low (??:?)
[ 102.228240][ T3929] __x64_sys_sched_setaffinity (??:?)
[ 102.229285][ T3929] ? sched_setaffinity (??:?)
[ 102.230220][ T3929] ? lockdep_hardirqs_on_prepare (lockdep.c:?)
[ 102.231362][ T3929] do_syscall_64 (??:?)
[ 102.232178][ T3929] entry_SYSCALL_64_after_hwframe (??:?)
[ 102.233183][ T3929] RIP: 0033:0x7f23de6240d7
[ 102.234006][ T3929] Code: 1f 40 00 48 8b 15 b9 8d 0d 00 f7 d8 41 b9 ff ff ff ff 64 89 02 44 89 c8 c3 66 2e 0f 1f 84 00 00 00 00 00 b8 cb 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 29 41 89 c0 83 f8 ff 74 18 64 c7 04 25 38 00
All code
========
0: 1f (bad)
1: 40 00 48 8b add %cl,-0x75(%rax)
5: 15 b9 8d 0d 00 adc $0xd8db9,%eax
a: f7 d8 neg %eax
c: 41 b9 ff ff ff ff mov $0xffffffff,%r9d
12: 64 89 02 mov %eax,%fs:(%rdx)
15: 44 89 c8 mov %r9d,%eax
18: c3 retq
19: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
20: 00 00 00
23: b8 cb 00 00 00 mov $0xcb,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 29 ja 0x5b
32: 41 89 c0 mov %eax,%r8d
35: 83 f8 ff cmp $0xffffffff,%eax
38: 74 18 je 0x52
3a: 64 fs
3b: c7 .byte 0xc7
3c: 04 25 add $0x25,%al
3e: 38 00 cmp %al,(%rax)

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 29 ja 0x31
8: 41 89 c0 mov %eax,%r8d
b: 83 f8 ff cmp $0xffffffff,%eax
e: 74 18 je 0x28
10: 64 fs
11: c7 .byte 0xc7
12: 04 25 add $0x25,%al
14: 38 00 cmp %al,(%rax)
[ 102.236736][ T3929] RSP: 002b:00007ffcfaa833d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000cb
[ 102.238145][ T3929] RAX: ffffffffffffffda RBX: 00007f23dcfd5000 RCX: 00007f23de6240d7
[ 102.239362][ T3929] RDX: 00007ffcfaa83410 RSI: 0000000000000080 RDI: 0000000000000f59
[ 102.240707][ T3929] RBP: 0000000000000f59 R08: 0000000000000078 R09: 0000000000000000
[ 102.242100][ T3929] R10: 00007f23de7297c0 R11: 0000000000000206 R12: 000055abbb602180
[ 102.243459][ T3929] R13: 00007f23dcfd5000 R14: 0000000000000000 R15: 0000000000000000
[ 102.244777][ T3929] </TASK>
[ 102.245406][ T3929] Modules linked in: crc32c_intel polyval_clmulni polyval_generic input_leds ghash_clmulni_intel mac_hid processor fuse stm_p_basic ofpart cmdlinepart
[ 102.247940][ T3929] ---[ end trace 0000000000000000 ]---
[ 102.248887][ T3929] RIP: 0010:sched_setaffinity (??:?)
[ 102.249926][ T3929] Code: 4c 89 fa b8 ff ff 37 00 48 c1 ea 03 48 c1 e0 2a 80 3c 02 00 74 08 4c 89 ff e8 47 8c 2e 00 b8 ff ff 37 00 4d 8b 3f 48 c1 e0 2a <80> 38 00 74 07 31 ff e8 9c 8c 2e 00 48 c7 c0 48 26 b6 84 ba ff ff
All code
========
0: 4c 89 fa mov %r15,%rdx
3: b8 ff ff 37 00 mov $0x37ffff,%eax
8: 48 c1 ea 03 shr $0x3,%rdx
c: 48 c1 e0 2a shl $0x2a,%rax
10: 80 3c 02 00 cmpb $0x0,(%rdx,%rax,1)
14: 74 08 je 0x1e
16: 4c 89 ff mov %r15,%rdi
19: e8 47 8c 2e 00 callq 0x2e8c65
1e: b8 ff ff 37 00 mov $0x37ffff,%eax
23: 4d 8b 3f mov (%r15),%r15
26: 48 c1 e0 2a shl $0x2a,%rax
2a:* 80 38 00 cmpb $0x0,(%rax) <-- trapping instruction
2d: 74 07 je 0x36
2f: 31 ff xor %edi,%edi
31: e8 9c 8c 2e 00 callq 0x2e8cd2
36: 48 c7 c0 48 26 b6 84 mov $0xffffffff84b62648,%rax
3d: ba .byte 0xba
3e: ff (bad)
3f: ff .byte 0xff

Code starting with the faulting instruction
===========================================
0: 80 38 00 cmpb $0x0,(%rax)
3: 74 07 je 0xc
5: 31 ff xor %edi,%edi
7: e8 9c 8c 2e 00 callq 0x2e8ca8
c: 48 c7 c0 48 26 b6 84 mov $0xffffffff84b62648,%rax
13: ba .byte 0xba
14: ff (bad)
15: ff .byte 0xff
[ 102.253027][ T3929] RSP: 0018:ffffc90005237e40 EFLAGS: 00010286
[ 102.254116][ T3929] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 102.255456][ T3929] RDX: 1ffff92000a46fdd RSI: ffffffff8410cde0 RDI: ffffffff839661d8
[ 102.256858][ T3929] RBP: ffffc90005237eb0 R08: 0000000000000000 R09: ffffffff8597ace7
[ 102.258260][ T3929] R10: 0000000000000000 R11: ffffffff811a63eb R12: ffff88816fe32880
[ 102.261801][ T3929] R13: ffffc90005237e88 R14: 1ffff92000a46fc9 R15: 0000000000000001
[ 102.263250][ T3929] FS: 00007f23de705600(0000) GS:ffffffff83cd4000(0000) knlGS:0000000000000000
[ 102.264793][ T3929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 102.265904][ T3929] CR2: 00007f23de624060 CR3: 000000013f39d000 CR4: 00000000000406f0
[ 102.267328][ T3929] DR0: 00007f23dc674000 DR1: 0000000000000000 DR2: 0000000000000000
[ 102.268665][ T3929] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000070602
[ 102.270108][ T3929] Kernel panic - not syncing: Fatal exception
[ 102.271164][ T3929] Kernel Offset: disabled
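
Note on reading the fault address: the trapping instruction is
cmpb $0x0,(%rax) with RAX = 0xdffffc0000000000, and KASAN reports a
null-ptr-deref in the range 0x0 to 0x7. That is consistent with a KASAN shadow
check for address 0: the preceding mov $0x37ffff / shl $0x2a pair builds the
shadow offset, and (0 >> 3) plus that offset is the non-canonical address that
faulted. The arithmetic can be checked with the sketch below. The shift of 3
and the offset value are assumed from the x86_64 generic-KASAN layout and from
the register dump above; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86_64 generic-KASAN parameters (one shadow byte per 8 bytes). */
#define SHADOW_SCALE_SHIFT 3
#define SHADOW_OFFSET 0xdffffc0000000000ULL

/* Shadow byte address that a compiler-inserted check reads for addr. */
static uint64_t kasan_shadow_addr(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* The constant built by "mov $0x37ffff,%eax; shl $0x2a,%rax" in the dump. */
static uint64_t shadow_offset_from_insns(void)
{
	return (uint64_t)0x37ffff << 0x2a;
}
```

Every address from 0x0 to 0x7 maps to the same shadow byte, which matches the
eight-byte range printed in the KASAN line.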


If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/oe-lkp/[email protected]


To reproduce:

# build kernel
cd linux
cp config-6.2.0-rc1-00030-g66f9c1813a72 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Attachments:
(No filename) (11.42 kB)
config-6.2.0-rc1-00030-g66f9c1813a72 (145.94 kB)
job-script (4.91 kB)
dmesg.xz (34.38 kB)