Received: by 10.223.176.46 with SMTP id f43csp2172387wra; Sun, 21 Jan 2018 13:05:33 -0800 (PST) X-Google-Smtp-Source: AH8x227UFwmfG3y/r6pTx3OnsCIKNCWuIjQRGsTlldDnI5uSkfYonS+PIUCTD//HdSeqfzb5OMru X-Received: by 2002:a17:902:bd47:: with SMTP id b7-v6mr2368057plx.300.1516568733822; Sun, 21 Jan 2018 13:05:33 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516568733; cv=none; d=google.com; s=arc-20160816; b=YhMLooVFDGYJwZI9OwlQ6FAGxbrN2l7KX8u8Duo2Wb5kpj7aZMZwRkCl9DWdhXQ1zj CXdwJ/m2yj6sLowTx2CAckfkqhLZ9FMk5WQbYY9TYHOUfCcY7pJ29kBwDK4Ag9P6uPi/ z0QZE51iAL7D6/9ext9wMXfu+nBWzLXy8L6+3BVzRsHx/ORiSRkzZql2Q81zOsSUyqWU TyXH+98wOh6mPR6H54FsBH/haPIA2+TAux8SMdiPwNxP7jdhYto7+pQ1U8Vt2RArq22+ bYb0RGpqmklzxZbw3Jsq9OBNm5G6+uIx8+gtnHThrofRRw1twvDQC1FapGJArbMRUpK6 dOUA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:subject:cc:to:from:date :dmarc-filter:arc-authentication-results; bh=BLI8GkooXMWcrh4qILarene7Xq+m+DAeGhZJTXBelWs=; b=aXVpldGfj30HLg6O05zkOLAQOwYCX4JHwGuv4cAlx7C/1Oen7LxKTytAU8SUs4Dh7I 82z/bk5ZPN+1Co7RyZu/qrwase/vdyPPpKJ+Rcf4/ijDGminSD7o6azEbS1WVKw+bEoy 9MrgfURNd8k9BAfAYKj1K3mMkIwcKNQr/Ux7sN2d2gLg0TV6mIL+8KyXunu+NzVf5RX/ LJtukqIuEJ67j5p4z+HwYknjj3AUB+PRsoyj2clzwcrkqvJ57OgnFTsQO9jRIxEGVbFP Jk4FM8fWm2jV0ISq3yzpt4dl96Dir2LJJ9lvNrnuuumYHHLhvZTP0bWtfMNJYT0DpchI B56A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p10-v6si2767397plo.810.2018.01.21.13.05.19; Sun, 21 Jan 2018 13:05:33 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751111AbeAUVE4 (ORCPT + 99 others); Sun, 21 Jan 2018 16:04:56 -0500 Received: from mail.kernel.org ([198.145.29.99]:58818 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751026AbeAUVEz (ORCPT ); Sun, 21 Jan 2018 16:04:55 -0500 Received: from gandalf.local.home (cpe-172-100-180-131.stny.res.rr.com [172.100.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id A3DBF214AB; Sun, 21 Jan 2018 21:04:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A3DBF214AB Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=goodmis.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=rostedt@goodmis.org Date: Sun, 21 Jan 2018 16:04:41 -0500 From: Steven Rostedt To: Sergey Senozhatsky Cc: Sergey Senozhatsky , Tejun Heo , Petr Mladek , akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang , Dave Hansen , Johannes Weiner , Mel Gorman , Michal Hocko , Vlastimil Babka , Peter Zijlstra , Linus Torvalds , Jan Kara , Mathieu Desnoyers , Tetsuo Handa , rostedt@home.goodmis.org, Byungchul Park , Pavel Machek , linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Message-ID: <20180121160441.7ea4b6d9@gandalf.local.home> In-Reply-To: <20180121141521.GA429@tigerII.localdomain> References: <20180111203057.5b1a8f8f@gandalf.local.home> <20180111215547.2f66a23a@gandalf.local.home> <20180116194456.GS3460072@devbig577.frc2.facebook.com> <20180117091208.ezvuhumnsarz5thh@pathway.suse.cz> <20180117151509.GT3460072@devbig577.frc2.facebook.com> <20180117121251.7283a56e@gandalf.local.home> <20180117134201.0a9cbbbf@gandalf.local.home> <20180119132052.02b89626@gandalf.local.home> <20180120071402.GB8371@jagdpanzerIV> <20180120104931.1942483e@gandalf.local.home> <20180121141521.GA429@tigerII.localdomain> X-Mailer: Claws Mail 3.14.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sun, 21 Jan 2018 23:15:21 +0900 Sergey Senozhatsky wrote: > so.... fix the console drivers ;) Totally agree! > > > > > just kidding. ok... Darn it! ;-) > the problem is that we flush printk_safe right when console_unlock() printing > loop enables local IRQs via printk_safe_exit_irqrestore() [given that IRQs > were enabled in the first place when the CPU went to console_unlock()]. > this forces that CPU to loop in console_unlock() as long as we have > printk-s coming from call_console_drivers(). but we probably can postpone > printk_safe flush. basically, we can declare a new rule - we don't flush > printk_safe buffer as long as console_sem is locked. because this is how > that printing CPU stuck in the console_unlock() printing loop. printk_safe > buffer is very important when it comes to storing a non-repetitive stuff, like > a lockdep splat, which is a single shot event. but the more repetitive the > message is, like millions of similar kmalloc() dump_stack()-s over and over > again, the less value in it. we should have printk_safe buffer big enough for > important info, like a lockdep splat, but millions of similar kmalloc() > messages are pretty invaluable - one is already enough, we can drop the rest. > and we should not flush new messages while there is a CPU looping in > console_unlock(), because it already has messages to print, which were > log_store()-ed the normal way. The above is really hard to read without any capitalization. Everything seems to be a run-on sentence and gives me a head ache. So you lost me there. > > this is where the "postpone thing" jumps in. so how do we postpone printk_safe > flush. > > we can't console_trylock()/console_unlock() in printk_safe flush code. > but there is a `console_locked' flag and is_console_locked() function which > tell us if the console_sem is locked. as long as we are in console_unlock() > printing loop that flag is set, even if we enabled local IRQs and printk_safe > flush work arrived. so now printk_safe flush does extra check and does > not flush printk_safe buffer content as long as someone is currently > printing or soon will start printing. but we need to take extra step and > to re-queue flush on CPUs that did postpone it [console_unlock() can > reschedule]. so now we flush only when printing CPU printed all pending > logbuf messages, hit the "console_seq == log_next_seq" and up() > console_sem. this sets a boundary -- no matter how many times during the > current printing loop we called console drivers and how many times those > drivers caused printk recursion, we will flush only SAFE_LOG_BUF_LEN chars. Another big paragraph with no capitals (besides macros and CPU ;-) I guess this is what it is like when people listen to me talk too fast. > > > IOW, what we have now, looks like this: > > a) printk_safe is for important stuff, we don't guarantee that a flood > of messages will be preserved. > > b) we extend the previously existing "will flush messages later on from > a safer context" and now we also consider console_unlock() printing loop > as unsafe context. so the unsafe context it's not only the one that can > deadlock, but also the one that can lockup CPU in a printing loop because > of recursive printk messages. Sure. > > > so this > > printk > console_unlock > { > for (;;) { > call_console_drivers > net_console > printk > printk_save -> irq_work queue > > IRQ work > prink_safe_flush > printk_deferred -> log_store() > iret > } > up(); > } > > > // which can never break out, because we can always append new messages > // from prink_safe_flush. > > becomes this > > printk > console_unlock > { > for (;;) { > call_console_drivers > net_console > printk > printk_save -> irq_work queue > > } > up(); > > IRQ work > prink_safe_flush > printk_deferred -> log_store() > iret > } But we do eventually send this data out to the consoles, and if the consoles cause more printks, wouldn't this still never end? > > > > something completely untested, sketchy and ugly. > > --- > > kernel/printk/internal.h | 2 ++ > kernel/printk/printk.c | 1 + > kernel/printk/printk_safe.c | 37 +++++++++++++++++++++++++++++++++++-- > 3 files changed, 38 insertions(+), 2 deletions(-) > > diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h > index 2a7d04049af4..e85517818a49 100644 > --- a/kernel/printk/internal.h > +++ b/kernel/printk/internal.h > @@ -30,6 +30,8 @@ __printf(1, 0) int vprintk_func(const char *fmt, va_list args); > void __printk_safe_enter(void); > void __printk_safe_exit(void); > > +void printk_safe_requeue_flushing(void); > + > #define printk_safe_enter_irqsave(flags) \ > do { \ > local_irq_save(flags); \ > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c > index 9cb943c90d98..7aca23e8d7b2 100644 > --- a/kernel/printk/printk.c > +++ b/kernel/printk/printk.c > @@ -2428,6 +2428,7 @@ void console_unlock(void) > raw_spin_lock(&logbuf_lock); > retry = console_seq != log_next_seq; > raw_spin_unlock(&logbuf_lock); > + printk_safe_requeue_flushing(); > printk_safe_exit_irqrestore(flags); > > if (retry && console_trylock()) > diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c > index 3e3c2004bb23..45d5b292d7e1 100644 > --- a/kernel/printk/printk_safe.c > +++ b/kernel/printk/printk_safe.c > @@ -22,6 +22,7 @@ > #include > #include > #include > +#include > > #include "internal.h" > > @@ -51,6 +52,7 @@ struct printk_safe_seq_buf { > atomic_t message_lost; > struct irq_work work; /* IRQ work that flushes the buffer */ > unsigned char buffer[SAFE_LOG_BUF_LEN]; > + bool need_requeue; > }; > > static DEFINE_PER_CPU(struct printk_safe_seq_buf, safe_print_seq); > @@ -196,6 +198,7 @@ static void __printk_safe_flush(struct irq_work *work) > size_t len; > int i; > > + s->need_requeue = false; > /* > * The lock has two functions. First, one reader has to flush all > * available message to make the lockless synchronization with > @@ -243,6 +246,36 @@ static void __printk_safe_flush(struct irq_work *work) > raw_spin_unlock_irqrestore(&read_lock, flags); > } > > +/* NMI buffers are always flushed */ > +static void flush_nmi_buffer(struct irq_work *work) > +{ > + __printk_safe_flush(work); > +} > + > +/* printk_safe buffers flushing, on the contrary, can be postponed */ > +static void flush_printk_safe_buffer(struct irq_work *work) > +{ > + struct printk_safe_seq_buf *s = > + container_of(work, struct printk_safe_seq_buf, work); > + > + if (is_console_locked()) { > + s->need_requeue = true; > + return; > + } > + > + __printk_safe_flush(work); > +} > + > +void printk_safe_requeue_flushing(void) > +{ > + int cpu; > + > + for_each_possible_cpu(cpu) { > + if (per_cpu(safe_print_seq, cpu).need_requeue) > + queue_flush_work(&per_cpu(safe_print_seq, cpu)); > + } > +} > + > /** > * printk_safe_flush - flush all per-cpu nmi buffers. > * > @@ -387,11 +420,11 @@ void __init printk_safe_init(void) > struct printk_safe_seq_buf *s; > > s = &per_cpu(safe_print_seq, cpu); > - init_irq_work(&s->work, __printk_safe_flush); > + init_irq_work(&s->work, flush_printk_safe_buffer); > > #ifdef CONFIG_PRINTK_NMI > s = &per_cpu(nmi_print_seq, cpu); > - init_irq_work(&s->work, __printk_safe_flush); > + init_irq_work(&s->work, flush_nmi_buffer); > #endif > } > > --- > > > > > > lose all of them now? then we can do a much simpler thing - steal one > > > bit from `printk_context' and use if for a new PRINTK_NOOP_CONTEXT, which > > > will be set around call_console_drivers(). vprintk_func() would redirect > > > printks to vprintk_noop(fmt, args), which will do nothing. > > > > Not sure what you mean here. Have some pseudo code to demonstrate with? > > sure, I meant that if we want to disable printk recursion from > call_console_drivers(), then we can add another printk_safe section, say > printk_noop_begin()/printk_noop_end(), which would set a PRINTK_NOOP > bit of `printk_context', so when we have printk() under PRINTK_NOOP > then vprintk_func() goes to a special vprintk_noop(fmt, args), which > simply drops the message [does not store any in the per-cpu printk > safe buffer, so we don't flush it and don't add new messages to the > logbuf]. and we annotate call_console_drivers() as a pintk_noop > function. but that a no-brainer and I'd prefer to have another solution. > Another big paragraph without caps, but I figured it out. I say we try that solution and see if it fixes the current issues. Because right now, the bug I see Tejun presented was if something in printk causes printks, it will start a printk bomb and lock up the system. The only reasonable answer I see to that is to throttle printk in such a case. -- Steve