Received: by 10.223.176.46 with SMTP id f43csp1868017wra; Sun, 21 Jan 2018 06:16:07 -0800 (PST) X-Google-Smtp-Source: AH8x226/rCv2wrAuaAmChmVZ1B0k5KLwZIff0u+fn4Q3jAf+FrVKMn6+mM8oNdiAj7eNooxtPSs3 X-Received: by 10.99.104.131 with SMTP id d125mr1158961pgc.125.1516544167785; Sun, 21 Jan 2018 06:16:07 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516544167; cv=none; d=google.com; s=arc-20160816; b=rbRMV9yBnCJV3B8uxHh7MyGfxuhhQDmrgRn3XY2/uX7116o2/fbUtYLv0ihuPtjECC YD2/5lDQ4vEDQRg0kJFA/Cx0+GJuBUOlsnlF6HT3kJwSzoDl+AV73aa+QVadVss67FQS Guc9sVhrg79gypXfR4z+NnzKoHx+hyaoXxo/WrZmYsA6Fv761lEzT72gsVijUEtLzPDk NEToWEjwmEZFwWYRhrH9LbkWIICKfSHfEC8gnzrIdMcEFIbnxlyifyOwMFblc6s9w/ut IjmGOcDhIK2Lkiua76P3oemkhTd/zh9cmOz3kOD74nvN+glNWdmZCV/WzwfLlocm3z04 1d6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature:arc-authentication-results; bh=ZossGMs/65hCV0criHaEtRJr1+XJw/cOnW6yYhFlm8c=; b=IJ+XdGbpHNSrgGmVq3LJpoduO43tGtWAMTsl/pyQ/YAOAQkDiiU9b9Jg2kbhYHtMvW +uVFCMN3xjqiEiUliiD11hR/csj3EVR4PSjj+ntZdCvVo2njYSaZdhZ7noh6Y70sLhl1 9iuySyL0E8XOKiS4ZwxnaDV8vtCNFLm7ZU8hTDcjdp5M3cQ6XOGEOPXf0leCJI8ox7Vg 1xIbOai9w+22o6gR6aLMqRl9pVT99TDpLKRYSWe85FzPrHY3dZac3gvs6bupDsUu8VWr 6JhQkwc6TgID/avwU4gUBJU0AQwfQsrckXvzMpSuAmZJZJLFNWf242MkpnrwaxM+zKOZ pdZw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=P+RXBF/P; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q9si13593342pfk.189.2018.01.21.06.15.52; Sun, 21 Jan 2018 06:16:07 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=P+RXBF/P; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751291AbeAUOP3 (ORCPT + 99 others); Sun, 21 Jan 2018 09:15:29 -0500 Received: from mail-pf0-f193.google.com ([209.85.192.193]:43579 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750982AbeAUOP1 (ORCPT ); Sun, 21 Jan 2018 09:15:27 -0500 Received: by mail-pf0-f193.google.com with SMTP id y26so5028114pfi.10 for ; Sun, 21 Jan 2018 06:15:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=ZossGMs/65hCV0criHaEtRJr1+XJw/cOnW6yYhFlm8c=; b=P+RXBF/PHmq/38rfO6WHyx8I7yTd9/IjveNvqgZySjLcH7w5/U6+jljUGE1aNI3BBO IRF6mLCogBXVBIzP0DsleRPXYuI+Ho2xIIkQ5sTCxBiO4RQpmB5cgIjME+zMjBTa3GzU MYrh3NhHrkWWSahza6v7vDOchHSBfZkc12mYPdfiW4UKGNUkSY2l4IatKMe//ZbSOIU7 ddbs/MMw+N/SZqeD8tHhAw0WaxHRIIXBLWGmdoA04GL4lJeAFiF19yaPMjCaGkeql0na jx5wZutyeGv8B/eHqJBfay+SvYCwdAXqbJ35yExaCbZcDlQQZPH8dUQne4f1U7UQZW7A 7m5w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=ZossGMs/65hCV0criHaEtRJr1+XJw/cOnW6yYhFlm8c=; b=s9TsGp/6sb++YAcjA5LZieW50dcvXebh1P9PqguZrOol1BkL7FjuW9iQB7VyX3kbVj rTysWNc9kpP0r0ArVz4wNxZ2hMRSDHCan+7MbUMLgPlIyfjTJGV+raoEkHMW0lb/8nbr RiT8vvXxrne1eDfOSfLsBh1NL0ySTOuSExzhItXO8/HJpC0m4jC3y7MhSz1KsX0Pv7P8 SF0EeS76zoQUuBJNnoR6ktPzqIEQ+Z/b1s175/gLKSAkNivmrZ8fYJiygjxafekvh6hi s8/N3Sr3isZM3OsR9FLxr/OA/mALWxtMmo73eKuCHezRS7Gw7M7tyzd8eepvGPjtXCnS DQRA== X-Gm-Message-State: AKwxytdU28Ol3mx8UlF/lHILLastMOmfkdJWRd/CGQ6t3OHWvSDYTxqH GsgTUhJTNrCPv4Afwj2b8tU= X-Received: by 10.98.234.4 with SMTP id t4mr5413477pfh.74.1516544126281; Sun, 21 Jan 2018 06:15:26 -0800 (PST) Received: from localhost ([121.137.63.184]) by smtp.gmail.com with ESMTPSA id t7sm21942415pgu.77.2018.01.21.06.15.24 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sun, 21 Jan 2018 06:15:24 -0800 (PST) Date: Sun, 21 Jan 2018 23:15:21 +0900 From: Sergey Senozhatsky To: Steven Rostedt Cc: Sergey Senozhatsky , Tejun Heo , Petr Mladek , Sergey Senozhatsky , akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang , Dave Hansen , Johannes Weiner , Mel Gorman , Michal Hocko , Vlastimil Babka , Peter Zijlstra , Linus Torvalds , Jan Kara , Mathieu Desnoyers , Tetsuo Handa , rostedt@home.goodmis.org, Byungchul Park , Pavel Machek , linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Message-ID: <20180121141521.GA429@tigerII.localdomain> References: <20180111203057.5b1a8f8f@gandalf.local.home> <20180111215547.2f66a23a@gandalf.local.home> <20180116194456.GS3460072@devbig577.frc2.facebook.com> <20180117091208.ezvuhumnsarz5thh@pathway.suse.cz> <20180117151509.GT3460072@devbig577.frc2.facebook.com> <20180117121251.7283a56e@gandalf.local.home> <20180117134201.0a9cbbbf@gandalf.local.home> <20180119132052.02b89626@gandalf.local.home> <20180120071402.GB8371@jagdpanzerIV> <20180120104931.1942483e@gandalf.local.home> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180120104931.1942483e@gandalf.local.home> User-Agent: Mutt/1.9.2 (2017-12-15) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On (01/20/18 10:49), Steven Rostedt wrote: [..] > > printks from console_unlock()->call_console_drivers() are redirected > > to printk_safe buffer. we need irq_work on that CPU to flush its > > printk_safe buffer. > > So is the issue that we keep triggering this irq work then? Then this > solution does seem to be one that would work. Because after x amount of > recursive printks (printk called by printk) it would just stop printing > them, and end the irq work. > > Perhaps what Tejun is seeing is: > > printk() > net_console() > printk() --> redirected to irq work > > > printk > net_console() > printk() --> redirected to another irq work > > and so on and so on. it's a bit trickier than that, I think. we have printk recursion from console drivers. it's redirected to printk_safe and we queue an IRQ work to flush the buffer printk console_unlock call_console_drivers net_console printk printk_save -> irq_work queue now console_unlock() enables local IRQs, we have the printk_safe flush. but printk_safe flush does not call into the console_unlock(), it uses printk_deferred() version of printk IRQ work prink_safe_flush printk_deferred -> irq_work queue so we schedule another IRQ work (deferred printk work), which eventually tries to lock console_sem IRQ work wake_up_klogd_work_func() if (console_trylock()) console_unlock() if it succeeds then it goes to console_unlock(), where console driver can cause another printk recursion. but, once again, it will be redirected to printk_safe buffer first. if it fails then we have either the original CPU to print out those irq_work messages, which is sort of bad, or another CPU which already acquired the console_sem and will print out. > This solution would need to be tweaked to add a timer to allow only so > many nested printks in a given time. Otherwise it too would be an issue: [..] > > how are we going to distinguish between lockdep splats, for instance, > > or WARNs from call_console_drivers() -> foo_write(), which are valuable, > > and kmalloc() print outs, which might be less valuable? are we going to > > The problem is that printk causing more printks is extremely dangerous, > and ANY printk that is caused by a printk is of equal value, whether it > is a console driver running out of memory or a lockdep splat. And > the chances of having two hit at the same time is extremely low. so.... fix the console drivers ;) just kidding. ok... the problem is that we flush printk_safe right when console_unlock() printing loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs were enabled in the first place when the CPU went to console_unlock()]. this forces that CPU to loop in console_unlock() as long as we have printk-s coming from call_console_drivers(). but we probably can postpone printk_safe flush. basically, we can declare a new rule - we don't flush printk_safe buffer as long as console_sem is locked. because this is how that printing CPU stuck in the console_unlock() printing loop. printk_safe buffer is very important when it comes to storing a non-repetitive stuff, like a lockdep splat, which is a single shot event. but the more repetitive the message is, like millions of similar kmalloc() dump_stack()-s over and over again, the less value in it. we should have printk_safe buffer big enough for important info, like a lockdep splat, but millions of similar kmalloc() messages are pretty invaluable - one is already enough, we can drop the rest. and we should not flush new messages while there is a CPU looping in console_unlock(), because it already has messages to print, which were log_store()-ed the normal way. this is where the "postpone thing" jumps in. so how do we postpone printk_safe flush. we can't console_trylock()/console_unlock() in printk_safe flush code. but there is a `console_locked' flag and is_console_locked() function which tell us if the console_sem is locked. as long as we are in console_unlock() printing loop that flag is set, even if we enabled local IRQs and printk_safe flush work arrived. so now printk_safe flush does extra check and does not flush printk_safe buffer content as long as someone is currently printing or soon will start printing. but we need to take extra step and to re-queue flush on CPUs that did postpone it [console_unlock() can reschedule]. so now we flush only when printing CPU printed all pending logbuf messages, hit the "console_seq == log_next_seq" and up() console_sem. this sets a boundary -- no matter how many times during the current printing loop we called console drivers and how many times those drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. IOW, what we have now, looks like this: a) printk_safe is for important stuff, we don't guarantee that a flood of messages will be preserved. b) we extend the previously existing "will flush messages later on from a safer context" and now we also consider console_unlock() printing loop as unsafe context. so the unsafe context it's not only the one that can deadlock, but also the one that can lockup CPU in a printing loop because of recursive printk messages. so this printk console_unlock { for (;;) { call_console_drivers net_console printk printk_save -> irq_work queue IRQ work prink_safe_flush printk_deferred -> log_store() iret } up(); } // which can never break out, because we can always append new messages // from prink_safe_flush. becomes this printk console_unlock { for (;;) { call_console_drivers net_console printk printk_save -> irq_work queue } up(); IRQ work prink_safe_flush printk_deferred -> log_store() iret } something completely untested, sketchy and ugly. --- kernel/printk/internal.h | 2 ++ kernel/printk/printk.c | 1 + kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++-- 3 files changed, 38 insertions(+), 2 deletions(-) diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h index 2a7d04049af4..e85517818a49 100644 --- a/kernel/printk/internal.h +++ b/kernel/printk/internal.h @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args); void __printk_safe_enter(void); void __printk_safe_exit(void); +void printk_safe_requeue_flushing(void); + #define printk_safe_enter_irqsave(flags) \ do { \ local_irq_save(flags); \ diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 9cb943c90d98..7aca23e8d7b2 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2428,6 +2428,7 @@ void console_unlock(void) raw_spin_lock(&logbuf_lock); retry = console_seq != log_next_seq; raw_spin_unlock(&logbuf_lock); + printk_safe_requeue_flushing(); printk_safe_exit_irqrestore(flags); if (retry && console_trylock()) diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c index 3e3c2004bb23..45d5b292d7e1 100644 --- a/kernel/printk/printk_safe.c +++ b/kernel/printk/printk_safe.c @@ -22,6 +22,7 @@ #include #include #include +#include #include "internal.h" @@ -51,6 +52,7 @@ struct printk_safe_seq_buf { atomic_t message_lost; struct irq_work work; /* IRQ work that flushes the buffer */ unsigned char buffer[SAFE_LOG_BUF_LEN]; + bool need_requeue; }; static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq); @@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work) size_t len; int i; + s->need_requeue = false; /* * The lock has two functions. First, one reader has to flush all * available message to make the lockless synchronization with @@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work) raw_spin_unlock_irqrestore(&read_lock, flags); } +/* NMI buffers are always flushed */ +static void flush_nmi_buffer(struct irq_work *work) +{ + __printk_safe_flush(work); +} + +/* printk_safe buffers flushing, on the contrary, can be postponed */ +static void flush_printk_safe_buffer(struct irq_work *work) +{ + struct printk_safe_seq_buf *s = + container_of(work, struct printk_safe_seq_buf, work); + + if (is_console_locked()) { + s->need_requeue = true; + return; + } + + __printk_safe_flush(work); +} + +void printk_safe_requeue_flushing(void) +{ + int cpu; + + for_each_possible_cpu(cpu) { + if (per_cpu(safe_print_seq, cpu).need_requeue) + queue_flush_work(&per_cpu(safe_print_seq, cpu)); + } +} + /** * printk_safe_flush - flush all per-cpu nmi buffers. * @@ -387,11 +420,11 @@ void __init printk_safe_init(void) struct printk_safe_seq_buf *s; s = &per_cpu(safe_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->work, flush_printk_safe_buffer); #ifdef CONFIG_PRINTK_NMI s = &per_cpu(nmi_print_seq, cpu); - init_irq_work(&s->work, __printk_safe_flush); + init_irq_work(&s->work, flush_nmi_buffer); #endif } --- > > lose all of them now? then we can do a much simpler thing - steal one > > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which > > will be set around call_console_drivers(). vprintk_func() would redirect > > printks to vprintk_noop(fmt, args), which will do nothing. > > Not sure what you mean here. Have some pseudo code to demonstrate with? sure, I meant that if we want to disable printk recursion from call_console_drivers(), then we can add another printk_safe section, say printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP bit of `printk_context', so when we have printk() under PRINTK_NOOP then vprintk_func() goes to a special vprintk_noop(fmt, args), which simply drops the message [does not store any in the per-cpu printk safe buffer, so we don't flush it and don't add new messages to the logbuf]. and we annotate call_console_drivers() as a pintk_noop function. but that a no-brainer and I'd prefer to have another solution. -ss