Date: Mon, 2 Nov 2015 16:33:00 +0100
From: Oleg Nesterov
To: Dmitry Vyukov
Cc: Roland McGrath , Andrew Morton , "Amanieu d'Antras" , pmoore@redhat.com, Ingo Molnar , vdavydov@parallels.com, qiaowei.ren@intel.com, dave@stgolabs.net, Palmer Dabbelt , LKML , syzkaller , Kostya Serebryany , Alexander Potapenko , Sasha Levin
Subject: Re: WARNING in task_participate_group_stop
Message-ID: <20151102153300.GA21006@redhat.com>
References: <20151102151333.GA17152@redhat.com>

On 11/02, Dmitry Vyukov wrote:
>
> On Mon, Nov 2, 2015 at 4:13 PM, Oleg Nesterov wrote:
> > Hi Dmitry,
> >
> > On 11/02, Dmitry Vyukov wrote:
> >>
> >> WARNING: CPU: 1 PID: 1 at kernel/signal.c:334
> >> task_participate_group_stop+0x157/0x1d0()
> >> Modules linked in:
> >> CPU: 1 PID: 1 Comm: init Not tainted 4.3.0 #48
> >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >>  ffffffff82e40280 ffff88003eb0fae0 ffffffff819efe55 0000000000000000
> >>  ffff88003eb0fb20 ffffffff810ec871 ffffffff8110f4d7 ffff88003eb00000
> >>  ffff88003eb20000 0000000000000000 ffff88003eb0fbf8 ffff88003eb20000
> >> Call Trace:
> >>  [] warn_slowpath_null+0x15/0x20 kernel/panic.c:480
> >>  [] task_participate_group_stop+0x157/0x1d0
> >> kernel/signal.c:334
> >>  [] do_signal_stop+0x1e7/0x6e0 kernel/signal.c:2060
> >>  [] get_signal+0x387/0x11b0 kernel/signal.c:2316
> >>  [] do_signal+0x8d/0x19e0 arch/x86/kernel/signal.c:707
> >>  [] prepare_exit_to_usermode+0x11d/0x170
> >> arch/x86/entry/common.c:251
> >>  [] syscall_return_slowpath+0xa3/0x2b0
> >> arch/x86/entry/common.c:317
> >>  [] int_ret_from_sys_call+0x25/0x8f
> >> arch/x86/entry/entry_64.S:281
> >> ---[ end trace f6697fd630b7c361 ]---
> >>
> >>
> >> The reproducer is (needs to be run as root):
> >>
> >> // autogenerated by syzkaller (http://github.com/google/syzkaller)
> >> #include
> >> #include
> >>
> >> int main()
> >> {
> >>   int pid = 1;
> >>   ptrace(PTRACE_ATTACH, pid, 0, 0);
> >>   ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL);
> >>   sleep(1);
> >>   return 0;
> >> }
> >
> > Thanks.
> >
> > Can't reproduce, but at first glance the problem looks clear...
>
> Humm... did you run as root?

Yes,

> It reproduces all the time on my 4.3 kernel VM. Also firmly killed my
> desktop running 3.13.

Yes, it kills init and crashes the kernel. But I do not see the warning.

> >> Yes, it is weird and it kills init right afterwards.
> >
> > Could you confirm that this WARN_ON() happens _after_ the reproducer exits?
> >
> >> But I wasn't able
> >> to figure out what's the root cause (why task does not have
> >> JOBCTL_STOP_PENDING) and maybe the same WARNING can be triggered
> >> without root and/or with other than init process. So still posting it
> >> here.
> >
> > Yes I think you are right. SIGSTOP can race with SIGKILL which (unlike SIGCONT)
> > doesn't clear JOBCTL_STOP_DEQUEUED/PENDING/etc.
> >
> > This is mostly fine, the task won't block in TASK_STOPPED if SIGKILL is pending,
> > but still is not right and leads to the warning above: JOBCTL_STOP_PENDING was not
> > set because do_signal_stop()->task_set_jobctl_pending() checks fatal_signal_pending().

On a second thought, in this particular case (your test-case), SIGSTOP/SIGKILL
do not race, although (so far) I think this doesn't matter.

JOBCTL_STOP_PENDING comes from __ptrace_unlink() when the tracee already has the
pending SIGKILL due to PTRACE_O_EXITKILL.
Now. If the tracee (init) wakes up and dequeues SIGKILL before __ptrace_unlink()
adds JOBCTL_STOP_PENDING, it won't see JOBCTL_STOP_PENDING and probably this is
what happens on my testing machine.

Perhaps __ptrace_unlink() should be more careful too...

> > Probably the patch below should fix the problem, but I'd like to think more before
> > I send the fix.
> >
> Will test it.

Great, thanks.

Oleg.
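
For readers following the thread, the interaction described above (a setter that refuses to set JOBCTL_STOP_PENDING once a fatal signal is pending, followed by a stop path that expects the bit to be set) can be illustrated with a toy userspace model. This is only a sketch and not kernel code: every toy_* name, the struct layout, and the bit value are hypothetical, and the control flow is a simplified reading of the do_signal_stop() path discussed in this mail.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model, NOT kernel code. All toy_* names and the bit
 * value below are hypothetical and chosen only for illustration. */
#define TOY_JOBCTL_STOP_PENDING (1u << 0)

struct toy_task {
    unsigned int jobctl;        /* stands in for task->jobctl */
    bool fatal_signal_pending;  /* stands in for a pending SIGKILL */
};

/* Mirrors the check described in the mail: the pending bit is not set
 * when a fatal signal is already pending. Returns true if it was set. */
static bool toy_set_jobctl_pending(struct toy_task *t, unsigned int mask)
{
    if (t->fatal_signal_pending)
        return false;
    t->jobctl |= mask;
    return true;
}

/* Mirrors the sanity check that fired: returns true when the task
 * enters the stop path without STOP_PENDING (the warning condition). */
static bool toy_participate_group_stop(struct toy_task *t)
{
    bool warned = !(t->jobctl & TOY_JOBCTL_STOP_PENDING);

    if (warned)
        fprintf(stderr, "WARNING: group stop without STOP_PENDING\n");
    t->jobctl &= ~TOY_JOBCTL_STOP_PENDING;
    return warned;
}

/* Assumption: the stop path continues even when the setter refused. */
static bool toy_do_signal_stop(struct toy_task *t)
{
    (void)toy_set_jobctl_pending(t, TOY_JOBCTL_STOP_PENDING);
    return toy_participate_group_stop(t);
}
```

Calling toy_do_signal_stop() on a task with fatal_signal_pending set to false returns false (no warning), while the same call with fatal_signal_pending set to true returns true, which corresponds to the WARN_ON() in the trace. The model has no concurrency, so it does not capture the wakeup ordering against __ptrace_unlink().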