Date: Wed, 24 Jan 2018 01:01:53 +0900
From: Sergey Senozhatsky
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek, Tejun Heo,
 akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang, Dave Hansen,
 Johannes Weiner, Mel Gorman, Michal Hocko, Vlastimil Babka,
 Peter Zijlstra, Linus Torvalds, Jan Kara, Mathieu Desnoyers,
 Tetsuo Handa, rostedt@home.goodmis.org, Byungchul Park, Pavel Machek,
 linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Message-ID: <20180123160153.GC429@tigerII.localdomain>
References: <20180117121251.7283a56e@gandalf.local.home>
 <20180117134201.0a9cbbbf@gandalf.local.home>
 <20180119132052.02b89626@gandalf.local.home>
 <20180120071402.GB8371@jagdpanzerIV>
 <20180120104931.1942483e@gandalf.local.home>
 <20180121141521.GA429@tigerII.localdomain>
 <20180123064023.GA492@jagdpanzerIV>
 <20180123095652.5e14da85@gandalf.local.home>
 <20180123152130.GB429@tigerII.localdomain>
 <20180123104121.2ef96d81@gandalf.local.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20180123104121.2ef96d81@gandalf.local.home>
User-Agent: Mutt/1.9.2
 (2017-12-15)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On (01/23/18 10:41), Steven Rostedt wrote:
[..]
> We can have more. But if printk is causing printks, that's a major bug.
> And work queues are not going to fix it, it will just spread out the
> pain. Have it be 100 printks, it needs to be fixed if it is happening.
> And having all printks just generate more printks is not helpful. Even
> if we slow them down. They will still never end.

Dropping the messages is not the solution either. The original bug
report was - this "locks up my kernel". That's it. That's all people
asked us to solve.

With WQ we don't lock up the kernel, because we flush printk_safe in
preemptible context. And people are very much expected to fix the
misbehaving consoles. But that should not be a printk_safe problem.

> A printk causing a printk is a special case, and we need to just show
> enough to let the user know that it's happening, and why printks are
> being throttled. Yes, we may lose data, but if every printk that goes
> out causes another printk, then there's going to be so much noise that
> we won't know what other things went wrong. Honestly, if someone showed
> me a report where the logs were filled with printks that caused
> printks, I'd stop right there and tell them that needs to be fixed
> before we do anything else. And if that recursion is happening because
> of another problem, I don't want to see the recursion printks. I want
> to see the printks that show what is causing the recursions.

I'll re-read this one tomorrow. Not quite following it.

> > The problem is - we flush printk_safe too soon and printing CPU ends up
> > in a lockup - it log_store()-s new messages while it's printing the pending
>
> No, the problem is that printks are causing more printks. Yes that will
> make flushing them soon more likely to lock up the system. But that's
> not the problem.
> The problem is printks causing printks.

Yes. And ignoring those printk()-s by simply dropping them does not fix
the problem by any means.

> > ones. It's fine to do so when CPU is in preemptible context. Really, we
> > should not care in printk_safe as long as we don't lock up the kernel. The
> > misbehaving console must be fixed. If CPU is not in preemptible context then
> > we do lock up the kernel. Because we flush printk_safe regardless of the
> > current CPU context. If we will flush printk_safe via WQ then we automatically
>
> And if we can throttle recursive printks, then we should be able to
> stop that from happening.

printk_safe was designed to be recursive. It was never designed to be
used to troubleshoot or debug consoles. But it was designed to be
recursive - because that's the sort of problem it was meant to handle:
recursive printks that would otherwise deadlock us. That's why we have
it in the first place.

> > add this "OK! The CPU is preemptible, we can log_store(), it's totally OK, we
> > will not lock it up." thing. Yes, we fill up the logbuf with probably needed
> > and appreciated or unneeded messages. But we should not care in printk_safe.
> > We don't lock up the kernel... And the misbehaving console must be fixed.
>
> I agree.

Good.

> > I disagree with "If we are having issues with irq_work, we are going to have
> > issues with a work queue". There is a tremendous difference between irq_work
> > on that CPU and queue_work_on(smp_processor_id()). One does not care about CPU
> > context, the other one does.
>
> But switching to work queue does not address the underlying problem
> that printks are causing more printks.

The only way to address those problems is to fix the console. That's
the only way. But that's not what I'm doing with my proposal. I fix the
lockup scenario, the only reported problem so far. Whilst also keeping
printk_safe around.

	-ss