From: Jan Kara
To: Andrew Morton
Cc: LKML, pmladek@suse.com, rostedt@goodmis.org, Gavin Hu, KY Srinivasan, Jan Kara
Subject: [PATCH 0/4] printk: Softlockup avoidance
Date: Wed, 19 Aug 2015 17:38:27 +0200
Message-Id: <1439998711-7013-1-git-send-email-jack@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

since there have recently been several attempts at dealing with
softlockups due to heavy printk traffic [1] [2], and I have also been
privately pinged by a couple of people about the state of the patch set,
I've decided to respin it.

To recall the original problem: Currently, console_unlock() prints
messages from the kernel printk buffer to the console for as long as the
buffer is non-empty. When a serial console is attached, printing is slow,
so other CPUs in the system have plenty of time to append new messages to
the buffer while one CPU is printing. Thus the CPU can spend an unbounded
amount of time printing in console_unlock(). This is especially serious
when printk() gets called under some critical spinlock or with interrupts
disabled.

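To make the "unbounded" part concrete, here is a toy model of the drain
loop. It is only a sketch and not kernel code: the function name
steps_to_drain() and its parameters (pending, drain, arrive, max_steps)
are made up for illustration. Each step stands for one slice of time in
which the printing CPU pushes `drain` characters to the console while
other CPUs append `arrive` characters to the buffer.

```c
/*
 * Toy model (hypothetical, not kernel code) of one CPU draining the
 * printk buffer while other CPUs keep appending to it.
 *
 * pending   - characters in the buffer when printing starts
 * drain     - characters the console can emit per step
 * arrive    - characters other CPUs append per step
 * max_steps - cap on the simulation, since the loop may never end
 *
 * Returns the number of steps until the buffer is empty, or -1 if the
 * buffer is still non-empty after max_steps steps.
 */
long steps_to_drain(long pending, long drain, long arrive, long max_steps)
{
	long steps = 0;

	while (pending > 0) {		/* "while the buffer is non-empty" */
		if (steps == max_steps)
			return -1;	/* still printing, give up */
		pending -= drain;	/* slow console output */
		pending += arrive;	/* concurrent printk() callers */
		steps++;
	}
	return steps;
}
```

In this model, steps_to_drain(1000, 100, 0, 100000) returns 10 and
steps_to_drain(1000, 100, 90, 100000) returns 100, while
steps_to_drain(1000, 100, 100, 100000) returns -1: once messages arrive
as fast as the console can emit them, the printing CPU never leaves the
loop.
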
In practice, users have observed that a CPU can spend tens of seconds
printing in console_unlock() (usually during boot, when hundreds of SCSI
devices are discovered), resulting in RCU stalls (the CPU doing the
printing doesn't reach a quiescent state for a long time), softlockup
reports (IPIs for the printing CPU don't get served, so other CPUs spin
waiting for the printing CPU to process them), and eventually the death
of the machine (as messages from stalls and lockups are appended to the
printk buffer faster than we are able to print them). So these machines
are unable to boot with a serial console attached. Also, during
artificial stress testing, a SATA disk disappears from the system because
its interrupts go unserved for too long.

This series addresses the problem in the following way: If a CPU has
printed more than printk_offload (defaults to 1000) characters, it wakes
up one of the dedicated printk kthreads (we don't use a workqueue because
that has deadlock potential if printk was called from workqueue code).
Once the printing CPU finds out that a kthread is spinning on a lock, it
stops printing, drops console_sem, and lets the kthread continue
printing. Since there are two printing kthreads, they will pass printing
between them and thus no CPU gets hogged by printing.

Changes since the last posting [3]:
* I have replaced the state machine for passing printing and spinning on
  console_sem with a simple spinlock, which makes the code somewhat
  easier to read and verify.
* Some of the patches were merged, so I dropped them.

								Honza

[1] https://lkml.org/lkml/2015/7/8/215
[2] http://marc.info/?l=linux-kernel&m=143929238407816&w=2
[3] https://lkml.org/lkml/2014/3/17/68