Date: Fri, 10 May 2019 11:15:37 +0200
From: Petr Mladek
To: Daniel Vetter
Cc: Intel Graphics Development, DRI Development, Daniel Vetter,
    Peter Zijlstra, Ingo Molnar, Will Deacon, Sergey Senozhatsky,
    Steven Rostedt, John Ogness, Chris Wilson, Linux Kernel Mailing List
Subject: Re: [PATCH] RFC: console: hack up console_lock more v3
Message-ID: <20190510091537.44e3aeb7gcrcob76@pathway.suse.cz>
References: <20190509120903.28939-1-daniel.vetter@ffwll.ch>
 <20190509145620.2pjqko7copbxuzth@pathway.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 2019-05-09 18:43:12, Daniel Vetter wrote:
> One thing to keep in mind is that the kernel is already dying, and
> things will come crashing down later on

This is
important information. I haven't seen it mentioned earlier.

> (I've seen this only in dmesg
> tails capture in pstore in our CI, i.e. the box died for good). I just
> want to make sure that the useful information isn't overwritten by
> more dmesg splats that happen as a consequence of us somehow trying to
> run things on an offline cpu. Once console_unlock has completed in
> your above backtrace and the important stuff has gone out I'm totally
> fine with the kernel just dying. Pulling the wake_up_process out from
> under the semaphore.lock is enough to prevent lockdep from dumping
> more stuff while we're trying to print the important things,

By "more stuff", do you mean the lockdep splat? If yes, it might make
sense to call debug_locks_off() earlier in panic().

> and I think the untangling of the locking hierarchy is useful irrespective
> of this lockdep splat. Plus Peter doesn't sound like he likes to roll
> out more printk_deferred kind of things.
>
> But if you think I should do the printk_deferred thing too I can look
> into that. Just not quite sure what that's supposed to look like now.

Your patch might remove the particular lockdep splat. It might be worth
it (Peter also mentioned an optimization effect). Anyway, it will not
prevent the deadlock.

The only way to avoid the deadlock is to use printk_deferred()
with the current printk() code.

Finally, I have recently worked on a similar problem with a dying system.
I sent the following patch for testing. I wonder if it might be
acceptable upstream:

From: Petr Mladek
Subject: sched/x86: Do not warn about offline CPUs when all are being stopped
Patch-mainline: No, just for testing
References: bsc#1104406

The warning about rescheduling offline CPUs causes a deadlock when
the CPUs need to get stopped using NMI. It might happen with
logbuf_lock, locks used by console drivers, especially tty.
But it might also be caused by a registered kernel message
dumper, for example, pstore.

The warning is pretty common when there is a high load and CPUs
are being stopped by native_stop_other_cpus(). But it is not
really useful in this context, and it scrolls the really important
messages off the screen.

We need to fix printk() in the long term. But disabling
the message looks reasonable at least in the meantime.

Signed-off-by: Petr Mladek
---
 arch/x86/kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -124,7 +124,8 @@ static bool smp_no_nmi_ipi = false;
  */
 static void native_smp_send_reschedule(int cpu)
 {
-	if (unlikely(cpu_is_offline(cpu))) {
+	if (unlikely(cpu_is_offline(cpu) &&
+		     atomic_read(&stopping_cpu) < 0)) {
 		WARN(1, "sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
 		return;
 	}
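
To make the effect of the new condition easier to check, here is a small
userspace model of it. It is only a sketch: it assumes that stopping_cpu
holds a negative value while no stop is in progress and the stopping CPU's
number otherwise, which is what the "< 0" test in the patch implies. The
helper should_warn_offline_resched() is a made-up name for illustration
and does not exist in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel variable of the same name.
 * Assumption: negative means "no stop in progress"; otherwise it holds
 * the number of the CPU that is stopping the others.
 */
static atomic_int stopping_cpu = -1;

/*
 * Hypothetical helper mirroring the patched condition: warn about a
 * reschedule of an offline CPU only when the CPUs are NOT being stopped.
 */
static bool should_warn_offline_resched(bool cpu_offline)
{
	return cpu_offline && atomic_load(&stopping_cpu) < 0;
}

/* Walks through the three interesting cases. */
static void model_selfcheck(void)
{
	/* Normal operation: an offline target still triggers the warning. */
	atomic_store(&stopping_cpu, -1);
	assert(should_warn_offline_resched(true));
	assert(!should_warn_offline_resched(false));

	/* CPU 0 is stopping the others: the warning is suppressed. */
	atomic_store(&stopping_cpu, 0);
	assert(!should_warn_offline_resched(true));
}
```

Note that in the patched function a false condition means more than
"no warning": the early return is skipped as well, so the code falls
through to whatever follows the if block.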