Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753426AbbK3CU4 (ORCPT ); Sun, 29 Nov 2015 21:20:56 -0500 Received: from mail-lf0-f42.google.com ([209.85.215.42]:33314 "EHLO mail-lf0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752748AbbK3CUu (ORCPT ); Sun, 29 Nov 2015 21:20:50 -0500 MIME-Version: 1.0 In-Reply-To: <56584916.3080606@unitn.it> References: <20151127112647.GR20439@e106622-lin> <56584916.3080606@unitn.it> Date: Mon, 30 Nov 2015 10:20:48 +0800 Message-ID: Subject: Re: [PATCH v3] sched/deadline: fix earliest_dl.next logic From: Wanpeng Li To: Luca Abeni Cc: Juri Lelli , Wanpeng li , Ingo Molnar , Peter Zijlstra , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5426 Lines: 123 2015-11-27 20:14 GMT+08:00 Luca Abeni : > Hi all, > > I ran some quick tests on this patch (because I was working on something > related), and it seems to me that it triggers a bug. Here are some > information: > - I applied the patch to git master (to be sure that I did not corrupt the > patch, > I extracted it again with git format-patch, and posted it here: > > http://disi.unitn.it/~abeni/H-Test/0001-sched-deadline-fix-earliest_dl.next-logic.patch > ) > - I tried a simple test program creating 4 periodic threads and scheduling > them with > SCHED_DEADLINE. And I got something like: > [ 29.055688] ------------[ cut here ]------------ > [ 29.056563] kernel BUG at > /home/luca/Src/Kernel/linux/kernel/sched/deadline.c:1443! > [ 29.056563] invalid opcode: 0000 [#1] SMP > [ 29.056563] Modules linked in: > [ 29.056563] CPU: 0 PID: 1136 Comm: periodic_thread Not tainted 4.4.0-rc2+ > #3 > [ 29.056563] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > Bochs 01/01/2011 > [ 29.056563] task: ffff880006603d80 ti: ffff8800067f4000 task.ti: > ffff8800067f4000 > [ 29.056563] RIP: 0010:[] [] > pick_next_pushable_dl_task+0x5a/0x60 > [ 29.056563] RSP: 0000:ffff8800067f7e38 EFLAGS: 00010046 > [ 29.056563] RAX: ffff880006603d80 RBX: ffff880006a7ac50 RCX: > ffff8800067f4000 > [ 29.056563] RDX: ffff8800066040d0 RSI: 0000000000000000 RDI: > ffff880007a15ac0 > [ 29.056563] RBP: ffff8800067f7e38 R08: 0000000000000000 R09: > 0000000000000000 > [ 29.056563] R10: ffff880006603d80 R11: ffff880006604548 R12: > ffff880007a15ac0 > [ 29.056563] R13: ffff880006a7a900 R14: ffff8800066042e0 R15: > ffffffff81a09cc0 > [ 29.056563] FS: 00007f667e179700(0000) GS:ffff880007a00000(0000) > knlGS:0000000000000000 > [ 29.056563] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > [ 29.056563] CR2: 00007f71f3c221b8 CR3: 00000000067ae000 CR4: > 00000000000006f0 > [ 29.056563] Stack: > [ 29.056563] ffff8800067f7e60 ffffffff81097137 ffff880007a15ac0 > ffff880006a7ab20 > [ 29.056563] ffff880006a7a900 ffff8800067f7e90 ffffffff81098ef1 > ffff880006603d80 > [ 29.056563] ffff880007a15ac0 ffff880007a15ad8 0000000000000000 > ffff8800067f7ee8 > [ 29.056563] Call Trace: > [ 29.056563] [] dequeue_pushable_dl_task+0x47/0x90 > [ 29.056563] [] pick_next_task_dl+0x81/0x200 > [ 29.056563] [] __schedule+0xb3d/0xd00 > [ 29.056563] [] schedule+0x37/0x80 > [ 29.056563] [] exit_to_usermode_loop+0x39/0x7f > [ 29.056563] [] prepare_exit_to_usermode+0x46/0x50 > [ 29.056563] [] retint_user+0x8/0x20 > [ 29.056563] Code: 3b 87 18 09 00 00 74 23 83 7a 98 01 7e 1b 83 ba f4 fc > ff ff 01 75 10 8b 92 f8 fc ff ff 85 d2 78 04 0f 0b 31 c0 5d c3 0f 0b 0f 0b > <0f> 0b 0f 0b 66 90 55 48 89 e5 41 54 49 89 f4 53 48 8b b7 18 09 > [ 29.056563] RIP [] > pick_next_pushable_dl_task+0x5a/0x60 > [ 29.056563] RSP > [ 29.056563] ---[ end trace 9f49218ec54a0b55 ]--- > - How to reproduce: > + Download http://disi.unitn.it/~abeni/H-Test/newcore-3.gz > + Start KVM with: > qemu-system-x86_64 -initrd newcore-3.gz -machine > pc-q35-2.0,accel=kvm -smp cpus=2 -kernel bzImage -serial file:qlog.txt > -append "console=ttyS0" > + Inside the VM, do > cd / > sudo ./periodic_thread > + You should see a crash in few seconds > The "periodic_thread" program just creates 4 periodic threads (with > execution time about > 10ms and period about 100ms) scheduled with SCHED_DEADLINE (with runtime > 15ms and period > 100ms). I can provide the source code if needed. > - The full kernel log is available here: > http://disi.unitn.it/~abeni/H-Test/qlog.txt > > > Here is my understanding of the crash: > - schedule() invokes pick_next_task_dl() which wants to do a context switch > (by selecting > for execution a new task "p" which is different from "prev") > - pick_next_task_dl() invokes put_prev, which puts the "prev" task in the > pushable tasks > queue (WARNING! "prev" is still the "current" task in this rq, because the > scheduler is > still running... I think this is the source of the issue) > - then, pick_next_task_dl() invokes dequeue_pushable_dl_task() on p, to > remove the selected > task from the pushable tasks queue... > - ...But after your patch dequeue_pushable_dl_task() invokes > pick_next_pushable_dl_task(). > Which sees that the next pushable task is the "current" task (see above). > This happens > becuase "prev" has already been inserted in the pushable tasks queue, and > can be the > next pushable task... But "current" has not been updated yet. > - The BUG_ON() at line 1443 of deadline.c is just "BUG_ON(task_current(rq, > p))" Thanks for your report and great analyse, Luca, you are right. > > Summing up, I think pick_next_pushable_dl_task() cannot be called from > dequeue_pushable_dl_task() (at least, not without removing or modifying that > BUG_ON()). Juri, how about remove this BUG_ON() just like rt class? Regards, Wanpeng Li -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/