Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933207AbeALCzx (ORCPT + 1 other); Thu, 11 Jan 2018 21:55:53 -0500 Received: from mail.kernel.org ([198.145.29.99]:51110 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932638AbeALCzw (ORCPT ); Thu, 11 Jan 2018 21:55:52 -0500 DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 75C5B21746 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=goodmis.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=rostedt@goodmis.org Date: Thu, 11 Jan 2018 21:55:47 -0500 From: Steven Rostedt To: Sergey Senozhatsky Cc: Petr Mladek , Tejun Heo , Sergey Senozhatsky , akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang , Dave Hansen , Johannes Weiner , Mel Gorman , Michal Hocko , Vlastimil Babka , Peter Zijlstra , Linus Torvalds , Jan Kara , Mathieu Desnoyers , Tetsuo Handa , rostedt@home.goodmis.org, Byungchul Park , Pavel Machek , linux-kernel@vger.kernel.org Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup Message-ID: <20180111215547.2f66a23a@gandalf.local.home> In-Reply-To: <20180111203057.5b1a8f8f@gandalf.local.home> References: <20180110132418.7080-1-pmladek@suse.com> <20180110140547.GZ3668920@devbig577.frc2.facebook.com> <20180110130517.6ff91716@vmware.local.home> <20180111045817.GA494@jagdpanzerIV> <20180111093435.GA24497@linux.suse> <20180111103845.GB477@jagdpanzerIV> <20180111112908.50de440a@vmware.local.home> <20180111203057.5b1a8f8f@gandalf.local.home> X-Mailer: Claws Mail 3.14.0 (GTK+ 2.24.31; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Return-Path: On Thu, 11 Jan 2018 20:30:57 -0500 Steven Rostedt wrote: > I have to say that your analysis here really does point out the benefit > of my patch. > > Today, printk() can print for a time of A * B, where, as you state > above: > > A is the amount of data to print in the worst case > B the time call_console_drivers() needs to print a single > char to all registered and enabled consoles > > In the worse case, the current approach is A is infinite. That is, > printk() never stops, as long as there is a printk happening on another > CPU before B can finish. A will keep growing. The call to printk() will > never return. The more CPUs you have, the more likely this will occur. > All it takes is a few CPUs doing periodic printks. If there is a slow > console, where the periodic printk on other CPUs occur quicker than the > first can finish, the first one will be stuck forever. Doesn't take > much to have this happen. > > With my patch, A is fixed to the size of the buffer. A single printk() > can never print more than that. If another CPU comes in and does a > printk, then it will take over the task of printing, and release the > first printk. In fact, below is a module I made (starting with Tejun's crazy stress test, then removing all the craziness). This simple module locks up the system without my patch. After applying my patch, the system runs fine. All I did was start off a work queue on each CPU, and each CPU does one printk() followed by a millisecond sleep. No 10,000 printks, nothing in an interrupt handler. Preemption is disabled while the printk happens, but that's normal. This is much closer to an OOM happening all over the system, where OOMs stack dumps are occurring on different CPUS. I ran this on a box with 4 CPUs and a serial console (so it has a slow console). Again, all I have is each CPU doing exactly ONE printk()! then sleeping for a full millisecond! It will cause a lot of output, and perhaps slow the system down. But it should not lock up the system. But without my patch, it does! Try it! Test it on a box, and it will lock up. Then add my patch and see what the results are. I think this speaks very loudly in favor of applying my patch. Again, the below module locks up my system immediately without my patch. With my patch, no problem. In fact, it's still running, while I wrote this email, and it hardly shows a slow down in the system. -- Steve #include #include #include #include #include #include static bool stop_testing; static void preempt_printk_workfn(struct work_struct *work) { while (!READ_ONCE(stop_testing)) { preempt_disable(); printk("%5d%-75s\n", smp_processor_id(), " XXX PREEMPT"); preempt_enable(); msleep(1); } } static struct work_struct __percpu *works; static void finish(void) { int cpu; WRITE_ONCE(stop_testing, true); for_each_online_cpu(cpu) flush_work(per_cpu_ptr(works, cpu)); free_percpu(works); } static int __init test_init(void) { int cpu; works = alloc_percpu(struct work_struct); if (!works) return -ENOMEM; /* * This is just a test module. This will break if you * do any CPU hot plugging between loading and * unloading the module. */ for_each_online_cpu(cpu) { struct work_struct *work = per_cpu_ptr(works, cpu); INIT_WORK(work, &preempt_printk_workfn); schedule_work_on(cpu, work); } return 0; } static void __exit test_exit(void) { finish(); } module_init(test_init); module_exit(test_exit); MODULE_LICENSE("GPL");