Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933594AbcKVOco (ORCPT ); Tue, 22 Nov 2016 09:32:44 -0500 Received: from mail-wm0-f52.google.com ([74.125.82.52]:38565 "EHLO mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933135AbcKVOcn (ORCPT ); Tue, 22 Nov 2016 09:32:43 -0500 Subject: Re: [PATCH 2/2] kdb: Call vkdb_printf() from vprintk_default() only when wanted To: Petr Mladek References: <1477054235-1624-1-git-send-email-pmladek@suse.com> <1477054235-1624-3-git-send-email-pmladek@suse.com> <73b8fe23-4fc7-9d56-ed78-a3d6b398ad74@linaro.org> <20161122124516.GD8220@pathway.suse.cz> Cc: Jason Wessel , Peter Zijlstra , Andrew Morton , Sergey Senozhatsky , linux-kernel@vger.kernel.org From: Daniel Thompson Message-ID: <795a631b-791e-14aa-8936-717b789b9b82@linaro.org> Date: Tue, 22 Nov 2016 14:32:39 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0 MIME-Version: 1.0 In-Reply-To: <20161122124516.GD8220@pathway.suse.cz> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4510 Lines: 129 On 22/11/16 12:45, Petr Mladek wrote: > On Mon 2016-11-07 10:24:22, Daniel Thompson wrote: >> On 21/10/16 13:50, Petr Mladek wrote: >>> kdb_trap_printk allows to pass normal printk() messages to kdb via >>> vkdb_printk(). For example, it is used to get backtrace using >>> the classic show_stack(), see kdb_show_stack(). >>> >>> vkdb_printf() tries to avoid a potential infinite loop by disabling >>> the trap. But this approach is racy, for example: >>> >>> CPU1 CPU2 >>> >>> vkdb_printf() >>> // assume that kdb_trap_printk == 0 >>> saved_trap_printk = kdb_trap_printk; >>> kdb_trap_printk = 0; >>> >>> kdb_show_stack() >>> kdb_trap_printk++; >> >> When kdb is running any of the commands that use kdb_trap_printk >> there is a single active CPU and the other CPUs should be in a >> holding pen inside kgdb_cpu_enter(). >> >> The only time this is violated is when there is a timeout waiting >> for the other CPUs to report to the holding pen. > > It means that the race window is small but it is there. Do I get > it correctly, please? I don't think the size of the race window is particularly interesting; it does exist. It is more important to be clear that the circumstances when the race exists are when an attempt to stop-the-world has failed. So rather than close the race it might be more useful to make the race unreachable by bringing the CPU that has failed to report under control. > Thanks a lot for explanation. I was not sure how exactly this worked. > I only saw the games with kdb_printf_cpu in vkdb_printf(). Therefore > I expected that some parallelism was possible. Yes. I'm coming round to the point-of-view that if the code looks like it expected to run in parallel if should be correct for it to run in parallel! Otherwise its leads to broken thinking when changing things. >>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c >>> index d5e397315473..db73e33811e7 100644 >>> --- a/kernel/printk/printk.c >>> +++ b/kernel/printk/printk.c >>> @@ -1941,7 +1941,9 @@ int vprintk_default(const char *fmt, va_list args) >>> int r; >>> >>> #ifdef CONFIG_KGDB_KDB >>> - if (unlikely(kdb_trap_printk)) { >>> + /* Allow to pass printk() to kdb but avoid a recursion. */ >>> + if (unlikely(kdb_trap_printk && >>> + kdb_printf_cpu != smp_processor_id())) { >> >> Firstly, why !=? Having re-read things, this comment was spurious. >> Secondly, if kdb_trap_printk is set and the "wrong" CPU calls printk >> then we have an opportunity to trap a rouge processor in the holding >> pen meaning the test should probably be part of vkdb_printk() >> anyway. > > I agree that it is confusing: > > On one hand, vkdb_printf() explicitly allows recursion on the same > CPU. See the handling of kdb_printf_lock before the 1st patch from > this series. Also it mentioned by the comment: > > /* Serialize kdb_printf if multiple cpus try to write at once. > * But if any cpu goes recursive in kdb, just print the output, > * even if it is interleaved with any other text. > */ > > On the other hand. The lines > > saved_trap_printk = kdb_trap_printk; > kdb_trap_printk = 0; > > means that someone wanted to explicitly disable recursion via > the generic printk(). This is the reason why I used "!=" and why > I added this check into vprintk_default(). vkdb_printf()'s rough job is to: 1. Issue the string to the kdb console (not the same as kernel console) 2. Store the string in the kernel logs. Of the above, #2 is achieved by temporarily setting the console log level to 0, calling printk() and restoring the log level. This is why vkdb_printf() needs the silly dance to avoid the recursion. > By other words, we allow recursion caused by kdb internal messages > that are printed directly by kdb_printf(). But we disable recursion > caused by all other messages that are printed using the generic > printk(). This patch keeps the logic. It might make some sense. > But it is hard for me to judge. Yes, I think it makes sense. Right now I'm toying with the idea of using the printk_func code for printk() interception. Firstly printk_func is per-cpu which removes a huge chain of complex reasoning (for a case that doesn't really exist). Secondly we could call the saved_printk_func directly making recursion impossible. Sadly, if we accept that a rouge CPU might still be calling printk() then the approach above just opens up a race on the console log level... Sigh... Daniel. > Best Regards, > Petr >