Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754156AbcKWQ3f (ORCPT ); Wed, 23 Nov 2016 11:29:35 -0500 Received: from mx2.suse.de ([195.135.220.15]:38590 "EHLO mx2.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751010AbcKWQ3d (ORCPT ); Wed, 23 Nov 2016 11:29:33 -0500 Date: Wed, 23 Nov 2016 17:29:30 +0100 From: Petr Mladek To: Daniel Thompson Cc: Jason Wessel , Peter Zijlstra , Andrew Morton , Sergey Senozhatsky , linux-kernel@vger.kernel.org Subject: Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted Message-ID: <20161123162930.GG8220@pathway.suse.cz> References: <1477054235-1624-1-git-send-email-pmladek@suse.com> <1477054235-1624-3-git-send-email-pmladek@suse.com> <73b8fe23-4fc7-9d56-ed78-a3d6b398ad74@linaro.org> <20161122124516.GD8220@pathway.suse.cz> <795a631b-791e-14aa-8936-717b789b9b82@linaro.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <795a631b-791e-14aa-8936-717b789b9b82@linaro.org> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3484 Lines: 91 On Tue 2016-11-22 14:32:39, Daniel Thompson wrote: > On 22/11/16 12:45, Petr Mladek wrote: > >On Mon 2016-11-07 10:24:22, Daniel Thompson wrote: > >>On 21/10/16 13:50, Petr Mladek wrote: > >>>kdb_trap_printk allows to pass normal printk() messages to kdb via > >>>vkdb_printk(). For example, it is used to get backtrace using > >>>the classic show_stack(), see kdb_show_stack(). > >>> > >>>vkdb_printf() tries to avoid a potential infinite loop by disabling > >>>the trap. But this approach is racy, for example: > >>> > >>>CPU1 CPU2 > >>> > >>>vkdb_printf() > >>> // assume that kdb_trap_printk == 0 > >>> saved_trap_printk = kdb_trap_printk; > >>> kdb_trap_printk = 0; > >>> > >>> kdb_show_stack() > >>> kdb_trap_printk++; > >> > >>When kdb is running any of the commands that use kdb_trap_printk > >>there is a single active CPU and the other CPUs should be in a > >>holding pen inside kgdb_cpu_enter(). > >> > >>The only time this is violated is when there is a timeout waiting > >>for the other CPUs to report to the holding pen. > > > >It means that the race window is small but it is there. Do I get > >it correctly, please? > > I don't think the size of the race window is particularly > interesting; it does exist. > > It is more important to be clear that the circumstances when the > race exists are when an attempt to stop-the-world has failed. So > rather than close the race it might be more useful to make the race > unreachable by bringing the CPU that has failed to report under > control. Yup. The question is how the stop-the-world could fail and if we could prevent all failures now and in the future. > vkdb_printf()'s rough job is to: > > 1. Issue the string to the kdb console (not the same as kernel > console) > 2. Store the string in the kernel logs. > > Of the above, #2 is achieved by temporarily setting the console log > level to 0, calling printk() and restoring the log level. This is > why vkdb_printf() needs the silly dance to avoid the recursion. > > Right now I'm toying with the idea of using the printk_func code for > printk() interception. > > Firstly printk_func is per-cpu which removes a huge chain of complex > reasoning (for a case that doesn't really exist). > > Secondly we could call the saved_printk_func directly making > recursion impossible. Note that there are plans to define per-CPU printk_context variable, see https://lkml.kernel.org/r/20161027154933.1211-4-sergey.senozhatsky@gmail.com It will be used to choose the right printk_func. It will allow to have more contexts and enter some of them recursively. Then it should be easier to add support for KDB context. > Sadly, if we accept that a rouge CPU might still be calling printk() > then the approach above just opens up a race on the console log > level... We are also very close to add async printk. It will allow to delegate the console handling into separate kthread. Then printk() should not longer call console directly. The only exception would be a system panic, when we really want to get messages out ASAP. This last patchset can be seen at https://lkml.kernel.org/r/1461333180-2897-1-git-send-email-sergey.senozhatsky@gmail.com But Sergey plans to get in the printk_context/printk_safe stuff first. Anyway, then it should be easier to disable the console in kdb context. Best Regards, Petr PS: I am going to prepare v2 of this patchset. It might take some time as I am repeatedly interrupted by some other urgent work.