Date: Thu, 11 Jun 2015 16:55:47 +0200
From: Petr Mladek
To: Peter Zijlstra
Cc: Steven Rostedt, linux-kernel@vger.kernel.org, jkosina@suse.cz,
	paulmck@linux.vnet.ibm.com, Ingo Molnar, Thomas Gleixner
Subject: Re: [RFC][PATCH] printk: Fixup the nmi printk mess
Message-ID: <20150611145547.GA3234@dhcp128.suse.cz>
In-Reply-To: <20150610192304.GY18673@twins.programming.kicks-ass.net>
References: <20150610125509.GO19282@twins.programming.kicks-ass.net>
 <20150610143155.GD9409@pathway.suse.cz>
 <20150610152917.GI3644@twins.programming.kicks-ass.net>
 <20150610192304.GY18673@twins.programming.kicks-ass.net>

On Wed 2015-06-10 21:23:04, Peter Zijlstra wrote:
> Below a version which does x-cpu stuff to allow the
> trigger_all*_cpu_backtrace() initiator to flush buffers on behalf of
> other CPUs.
>
> Compile tested only.

The output from "echo l >/proc/sysrq-trigger" looks reasonable. It does
not mix output from different CPUs. This is expected because of the
@lock.

The other observation is that it prints the CPUs in _random_ order:
28, 24, 25, 1, 26, 2, 27, 3, ...

The order is fine when I disable the irq_work. It means that the
irq_works are usually faster than printk_nmi_flush()

  => printk_nmi_flush() is not that useful
  => all the complexity with the three atomic variables (head, tail,
     read) does not buy us much

Anyway, I think that the current solution is racy and cannot easily be
fixed, see below.

> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c099b082cd02..99bfc1e3f32a 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1821,13 +1821,200 @@ int vprintk_default(const char *fmt, va_list args)
> +static void __printk_nmi_flush(struct irq_work *work)
> +{
> +	static raw_spinlock_t lock = __RAW_SPIN_LOCK_INITIALIZER(lock);
> +	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
> +	int len, head, size, i, last_i;
> +
> +again:
> +	/*
> +	 * vprintk_nmi()	truncate
> +	 *
> +	 * [S] head		[S] head
> +	 *     wmb		    mb
> +	 * [S] tail		[S] read

BTW, this ordering diagram is quite cryptic to me. Coffee did not
help :-)

> +	 *
> +	 * therefore:
> +	 */
> +	i = atomic_read(&s->read);
> +	len = atomic_read(&s->tail); /* up to the tail is stable */
> +	smp_rmb();
> +	head = atomic_read(&s->head);
> +
> +	/*
> +	 * We cannot truncate tail because it could overwrite a store from
> +	 * vprintk_nmi(), however vprintk_nmi() will always update tail to the
> +	 * correct value.
> +	 *
> +	 * Therefore if head < tail, we missed a truncate and should do so now.
> +	 */
> +	if (head < len)
> +		len = 0;

This is a bit confusing. It is a complicated way of making the
"nothing to do" check just below return.

If I get it correctly, this can only happen inside __printk_nmi_flush()
called on another CPU (from printk_nmi_flush()) when it interferes with
the queued irq_work. The irq_work is faster and truncates the buffer.
So the early return is fine after all, because the irq_work has already
printed everything.
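Just to check that I decode the diagram correctly: the way I read it,
the loads in the flusher pair with the stores in the writer, so the
head value we see is at least as new as the tail value we see. Below is
a minimal userspace model with C11 atomics of how I understand it; the
names (writer_append, reader_snapshot, read_idx) are mine, not from the
patch, and the real code of course uses the kernel primitives instead:

#include <stdatomic.h>

/* Model of the per-CPU buffer indexes from the patch. */
static atomic_int head;      /* advanced by vprintk_nmi(), reset by truncate */
static atomic_int tail;      /* advanced by vprintk_nmi() once data is copied */
static atomic_int read_idx;  /* consumed so far, updated by flush / truncate */

/* vprintk_nmi() side: reserve space first, publish it afterwards. */
static void writer_append(int bytes)
{
	int new_head = atomic_load_explicit(&head, memory_order_relaxed) + bytes;

	atomic_store_explicit(&head, new_head, memory_order_relaxed);
	/* ... the message bytes are copied into the buffer here ... */
	atomic_thread_fence(memory_order_release);	/* the "wmb" */
	atomic_store_explicit(&tail, new_head, memory_order_relaxed);
}

/*
 * __printk_nmi_flush() side: tail is read before head, so the head
 * value we observe is at least as new as the tail value.
 */
static int reader_snapshot(int *i, int *len)
{
	int h;

	*i   = atomic_load_explicit(&read_idx, memory_order_relaxed);
	*len = atomic_load_explicit(&tail, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);	/* the "rmb" */
	h    = atomic_load_explicit(&head, memory_order_relaxed);

	/*
	 * If a truncate (head = 0, read = 0) ran in between, we observe
	 * head < tail and know that our tail snapshot is stale, so there
	 * is nothing safe to print.
	 */
	if (h < *len)
		*len = 0;

	return *len - *i > 0;
}

Please correct me if this reading is wrong; it is only how I managed to
make sense of the diagram.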
> +	if (len - i <= 0) /* nothing to do */
> +		return;
> +
> +	/*
> +	 * 'Consume' this chunk, avoids concurrent callers printing the same
> +	 * stuff.
> +	 */
> +	if (atomic_cmpxchg(&s->read, i, len) != i)
> +		goto again;

I think that this is racy:

CPU0						CPU7

printk_nmi_flush()
  __printk_nmi_flush(for CPU7)

    i = atomic_read(&s->read);		(100)
    len = atomic_read(&s->tail);	(200)
    head = atomic_read(&s->head);	(200)

    if (atomic_cmpxchg(&s->read, i, len) != i)

    we pass, but we get interrupted or
    rescheduled on a preemptive kernel

						another vprintk_nmi() leaves:
						  head(400), tail(400)

						__printk_nmi_flush() in the
						irq_work prints the string
						between 200-400 and truncates
						the buffer: head(0), read(0)

						another vprintk_nmi() returns:
						  head(150), tail(150)

    prints the string between (100-200)
      => part of a new and part of an
	 old message

    and modifies @head and @read in
    a wrong way

I think that such races are hard to avoid without indexing the printed
messages. But that would make the approach too complicated.

Also, I think that ordering the CPUs is not worth it. I would go back
to the first solution, add the @lock there, and double check the races
with seq_buf().

I will stop commenting on the code here for now.

Best Regards,
Petr

PS: I had two cups of coffee and hope that my comments are a smaller
fiasco than yesterday's.
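PPS: To make it clearer what I mean by "the first solution with the
@lock", roughly something like the sketch below. Only struct
nmi_seq_buf is taken from your patch; the other names and the buffer
size are made up for illustration, the per-CPU seq_buf_init() /
init_irq_work() setup is left out, and the remaining race between the
NMI writer and the flush clearing the buffer is exactly the part that
would still need the double checking:

#include <stdarg.h>
#include <linux/irq_work.h>
#include <linux/percpu.h>
#include <linux/printk.h>
#include <linux/seq_buf.h>
#include <linux/spinlock.h>

#define NMI_BUF_SIZE	4096

struct nmi_seq_buf {
	struct seq_buf		seq;			/* per-CPU text buffer */
	struct irq_work		work;			/* deferred flush */
	unsigned char		buffer[NMI_BUF_SIZE];
};
static DEFINE_PER_CPU(struct nmi_seq_buf, nmi_print_seq);

/* NMI side: only this CPU's NMIs write into s->seq, so no lock is needed. */
static int vprintk_nmi(const char *fmt, va_list args)
{
	struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
	int ret;

	ret = seq_buf_vprintf(&s->seq, fmt, args);
	irq_work_queue(&s->work);

	return ret;
}

/*
 * irq_work side: the single @lock only serializes the final printk(),
 * so the output from different CPUs does not get interleaved.
 */
static void __printk_nmi_flush(struct irq_work *work)
{
	static DEFINE_RAW_SPINLOCK(lock);
	struct nmi_seq_buf *s = container_of(work, struct nmi_seq_buf, work);
	unsigned long flags;

	raw_spin_lock_irqsave(&lock, flags);
	printk("%.*s", (int)seq_buf_used(&s->seq), s->buffer);
	seq_buf_clear(&s->seq);
	raw_spin_unlock_irqrestore(&lock, flags);
}

The point is that the NMI path stays lockless (only the local CPU ever
writes its own buffer), and the raw spinlock is taken only in the
irq_work, where it merely keeps the output of different CPUs from being
mixed.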