Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932136AbbHLPJf (ORCPT ); Wed, 12 Aug 2015 11:09:35 -0400 Received: from orbit.nwl.cc ([176.31.251.142]:35352 "EHLO mail.nwl.cc" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753540AbbHLPJd (ORCPT ); Wed, 12 Aug 2015 11:09:33 -0400 Date: Wed, 12 Aug 2015 17:09:31 +0200 From: Phil Sutter To: linux-kernel@vger.kernel.org Cc: Phil Sutter Subject: kthreads: sporadic NULL pointer dereference in exit_creds() Message-ID: <20150812150931.GE32353@orbit.nwl.cc> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 9833 Lines: 220 [please keep me on Cc: since I am not subscribed to this list] Hi, While enhancing lib/test_rhashtable.c by a few threads to provoke concurrency issues, I encountered a bug in the kernel's cleanup routines for kthreads. Upon calling kthread_stop(), it would occasionally call exit_creds() for the same task_struct pointer twice, thereby crashing the kernel in the second invocation due to dereferencing tsk->cred and tsk->real_cred being NULL. I managed to isolate the bug trigger into a small kernel module which: 1) creates a number of kernel threads 2) uses two semaphores to synchronise the thread startup and make the module init code wait for them to actually run before continuing 3) calls kthread_stop() for all started threads. I could successfully verify the problem using Linus's tree at commit 58ccab91342c1cc1fe08da9b198ac5d763706c2e (4.2.0-rc6) on a qemu machine with only a single virtual CPU. In order to analyze the problem, I've added a printk() to exit_creds(): -----------------------------------8<----------------------------------- --- a/kernel/cred.c +++ b/kernel/cred.c @@ -152,6 +152,7 @@ void exit_creds(struct task_struct *tsk) { struct cred *cred; + printk(KERN_INFO "%s: called for tsk at %p\n", __func__, tsk); kdebug("exit_creds(%u,%p,%p,{%d,%d})", tsk->pid, tsk->real_cred, tsk->cred, atomic_read(&tsk->cred->usage), read_cred_subscribers(tsk->cred)); -----------------------------------8<----------------------------------- Here is the kernel log after calling 'insmod ./kthread_test.ko tcount=20': -----------------------------------8<----------------------------------- [ 49.566682] kthread_test: module verification failed: signature and/or required key missing - tainting kernel [ 49.570457] init_kthread_test: starting test run [ 49.576535] init_kthread_test: started thread at ffff88003ae18000 [ 49.577552] init_kthread_test: started thread at ffff88003ae1a300 [ 49.578585] init_kthread_test: started thread at ffff88003ae1c600 [ 49.579580] init_kthread_test: started thread at ffff88003ae1d780 [ 49.580565] init_kthread_test: started thread at ffff88003ae1e900 [ 49.581553] init_kthread_test: started thread at ffff88003ac96900 [ 49.582604] init_kthread_test: started thread at ffff88003ac95780 [ 49.583607] init_kthread_test: started thread at ffff88003ac94600 [ 49.584766] init_kthread_test: started thread at ffff88003ac93480 [ 49.585783] init_kthread_test: started thread at ffff88003ac92300 [ 49.586775] init_kthread_test: started thread at ffff88003ac91180 [ 49.587772] init_kthread_test: started thread at ffff88003ac90000 [ 49.588763] init_kthread_test: started thread at ffff88003a9f0000 [ 49.589760] init_kthread_test: started thread at ffff88003a9f1180 [ 49.590755] init_kthread_test: started thread at ffff88003a9f2300 [ 49.591747] init_kthread_test: started thread at ffff88003a9f3480 [ 49.592742] init_kthread_test: started thread at ffff88003a9f4600 [ 49.593731] init_kthread_test: started thread at ffff88003a9f5780 [ 49.594722] init_kthread_test: started thread at ffff88003a9f6900 [ 49.595719] init_kthread_test: started thread at ffff88003aa18000 [ 49.596711] thread: running [ 49.597329] thread: running [ 49.597918] thread: running [ 49.598553] thread: running [ 49.599166] thread: running [ 49.599755] thread: running [ 49.600356] thread: running [ 49.601161] thread: running [ 49.602028] thread: running [ 49.613245] thread: running [ 49.613850] thread: running [ 49.614459] thread: running [ 49.615063] thread: running [ 49.615647] thread: running [ 49.616248] thread: running [ 49.616834] thread: running [ 49.617438] thread: running [ 49.618156] thread: running [ 49.618737] thread: running [ 49.619877] thread: running [ 49.620562] exit_creds: called for tsk at ffff88003ae18000 [ 49.621438] exit_creds: called for tsk at ffff88003ae1a300 [ 49.622304] exit_creds: called for tsk at ffff88003ae1c600 [ 49.623152] exit_creds: called for tsk at ffff88003ae1d780 [ 49.623991] exit_creds: called for tsk at ffff88003ae1e900 [ 49.624838] exit_creds: called for tsk at ffff88003ac96900 [ 49.625698] exit_creds: called for tsk at ffff88003ac95780 [ 49.626557] exit_creds: called for tsk at ffff88003ac94600 [ 49.627412] exit_creds: called for tsk at ffff88003ac92300 [ 49.628256] exit_creds: called for tsk at ffff88003ac91180 [ 49.629118] exit_creds: called for tsk at ffff88003ac90000 [ 49.629957] exit_creds: called for tsk at ffff88003a9f0000 [ 49.630804] exit_creds: called for tsk at ffff88003a9f1180 [ 49.631687] exit_creds: called for tsk at ffff88003a9f2300 [ 49.632539] exit_creds: called for tsk at ffff88003a9f3480 [ 49.633399] exit_creds: called for tsk at ffff88003a9f4600 [ 49.635447] exit_creds: called for tsk at ffff88003a9f5780 [ 49.636686] exit_creds: called for tsk at ffff88003a9f6900 [ 49.638128] exit_creds: called for tsk at ffff88003ac92300 [ 49.638976] BUG: unable to handle kernel NULL pointer dereference at (null) [ 49.639008] IP: [] exit_creds+0x37/0x80 [ 49.639008] PGD 39473067 PUD 3965a067 PMD 0 [ 49.639008] Oops: 0000 [#1] SMP Entering kdb (current=0xffff88003ae19180, pid 10479) on processor 0 Oops: (null) due to oops @ 0xffffffff81092967 CPU: 0 PID: 10479 Comm: insmod Tainted: G OE 4.2.0-rc6+ #55 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 task: ffff88003ae19180 ti: ffff880039b64000 task.ti: ffff880039b64000 RIP: 0010:[] [] exit_creds+0x37/0x80 RSP: 0018:ffff880039b67c08 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88003ac92300 RCX: 000000000000002e RDX: 0000000000000000 RSI: ffff88003fc0e758 RDI: 0000000000000000 RBP: ffff880039b67c18 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000002adb R11: 00000000ffffffff R12: ffff88003ac92300 R13: 0000000000000000 R14: ffffffffa00a2000 R15: ffff88003a93bed8 FS: 00007f550d346740(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 000000003d276000 CR4: 00000000001406f0 Stack: ffff88003ac92300 ffff88003ac92300 ffff880039b67c38 ffffffff8106fb22 0000000000000000 ffff88003ac92300 ffff880039b67c68 ffffffff81090c99 ffff88003aa18000 0000000000000009 ffff88003aa18000 ffff88003a93be40 Call Trace: [] __put_task_struct+0x52/0x140 [] kthread_stop+0xd9/0xe0 [] init_kthread_test+0x151/0x1000 [kthread_test] [] ? 0xffffffffa00a2000 [] do_one_initcall+0xd3/0x200 [] ? kmem_cache_alloc_trace+0x17d/0x1f0 [] ? do_init_module+0x28/0x1ea [] do_init_module+0x61/0x1ea [] load_module+0x164d/0x1960 [] ? __symbol_put+0x60/0x60 [] ? copy_module_from_fd.isra.50+0x111/0x160 [] SyS_finit_module+0x9f/0xd0 [] entry_SYSCALL_64_fastpath+0x12/0x71 Code: 44 a2 5f 81 31 c0 48 89 e5 53 48 89 fb 48 c7 c7 7f 13 7f 81 48 83 ec 08 e8 df 7c 53 00 48 8b 83 10 09 00 00 48 8b bb 08 09 00 00 <8b> 00 48 c7 83 08 09 00 00 00 00 00 00 3e ff 0f 74 1e 48 8b bb -----------------------------------8<----------------------------------- The culprit here is the second invocation of exit_creds() for a task_struct at address 0xffff88003ac92300, although the logs show kthread_run() returned this address just once. I suspect a too optimistic usage of the cached entries so that sometimes an entry is reused before exit_creds() did run for it. Here is the reproducer code (kthread_test.c) I used: -----------------------------------8<----------------------------------- #include #include #include #include static int tcount = 100; module_param(tcount, int, 0); MODULE_PARM_DESC(tcount, "Number of threads to spawn (default: 100)"); static struct semaphore prestart_sem; static struct semaphore startup_sem = __SEMAPHORE_INITIALIZER(startup_sem, 0); static int threadfunc(void *unused) { up(&prestart_sem); if (down_interruptible(&startup_sem)) pr_warn("thread: down_interruptible failed!\n"); printk(KERN_INFO "thread: running\n"); return 0; } static int __init init_kthread_test(void) { struct task_struct **tsk; int i, err; tsk = kzalloc(tcount * sizeof(struct task_struct *), GFP_KERNEL); sema_init(&prestart_sem, 1 - tcount); printk(KERN_INFO "%s: starting test run\n", __func__); for (i = 0; i < tcount; i++) { tsk[i] = kthread_run(threadfunc, NULL, "thread[%d]", i); if (IS_ERR(tsk[i])) pr_warn("%s: kthread_run failed for thread %d\n", __func__, i); else printk(KERN_INFO "%s: started thread at %p\n", __func__, tsk[i]); } if (down_interruptible(&prestart_sem)) pr_warn("%s: down_interruptible failed!\n", __func__); for (i = 0; i < tcount; i++) up(&startup_sem); for (i = 0; i < tcount; i++) { if (IS_ERR(tsk[i])) continue; if ((err = kthread_stop(tsk[i]))) pr_warn("%s: kthread_stop failed for thread %d: %d\n", __func__, i, err); } kfree(tsk); printk(KERN_INFO "%s: test run ended\n", __func__); return 0; } static void __exit exit_kthread_test(void) { } module_init(init_kthread_test); module_exit(exit_kthread_test); MODULE_LICENSE("GPL"); -----------------------------------8<----------------------------------- Cheers, Phil -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/