Date: Tue, 15 Aug 2017 09:29:02 -0400
From: Steven Rostedt
To: Daniel Lezcano
Cc: paulmck@linux.vnet.ibm.com, Pratyush Anand, 김동현, john.stultz@linaro.org, linux-kernel@vger.kernel.org
Subject: Re: RCU stall when using function_graph
Message-ID: <20170815092902.252f5e83@gandalf.local.home>
In-Reply-To: <208e981d-40ec-54fa-6293-5b8e6fe10a84@linaro.org>

[ I'm back from vacation! ]

On Wed, 9 Aug 2017 17:51:33 +0200
Daniel Lezcano wrote:

> Well, may be the instruction pointer thing is not a good idea.
>
> I learnt from this experience, an overloaded kernel with a lot of
> interrupts can hang the console and issue RCU stall.
>
> However, someone else can face the same situation. Even if he reads the
> RCU/stallwarn.txt documentation, it will be hard to figure out the issue.
>
> A message telling the grace period can't be reached because we are too
> busy processing interrupts would have helped but I understand it is not
> easy to implement.

What if the stall code triggered an irqwork first? The irqwork would
trigger as soon as interrupts were enabled again (or at the next tick,
depending on the arch), and then it would know that RCU stalled due to
an irq storm if the irqwork is being hit.

-- Steve

>
> Perhaps, adding a new bullet in the documentation can help:
>
> "If the interrupt processing time is longer than the interval between
> each interrupt, the CPU will keep processing the interrupts without
> allowing the RCU's grace period kthread. This situation can happen if
> there is a highly rated number of interrupts and the function_graph
> tracer is enabled".
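
The condition in the proposed bullet comes down to simple arithmetic: once
the time spent handling one interrupt reaches the interval between
interrupts, no CPU time is left for anything else, including the RCU grace
period kthread. Below is a rough userspace sketch of that arithmetic, not
kernel code. The function names and the timing figures are made up for
illustration and were not measured on any system.

```python
def cpu_fraction_left(handler_us: float, interval_us: float) -> float:
    """Fraction of CPU time not consumed by interrupt handling.

    handler_us:  time spent processing one interrupt (hypothetical figure)
    interval_us: time between two consecutive interrupts
    Returns 0.0 once the handlers consume the whole interval.
    """
    if handler_us < 0 or interval_us <= 0:
        raise ValueError("handler_us must be >= 0 and interval_us must be > 0")
    return max(0.0, 1.0 - handler_us / interval_us)


def is_irq_saturated(handler_us: float, interval_us: float) -> bool:
    """True when handling takes at least as long as the interrupt interval,
    i.e. the situation described in the proposed documentation bullet."""
    return cpu_fraction_left(handler_us, interval_us) == 0.0


if __name__ == "__main__":
    interval_us = 1000.0  # assumed: one interrupt every millisecond

    # Assumed handler costs: a cheap handler without tracing, and the same
    # handler slowed down by tracing overhead. Both values are illustrative.
    for label, handler_us in (("untraced", 200.0), ("traced", 1200.0)):
        left = cpu_fraction_left(handler_us, interval_us)
        print(f"{label}: {left:.0%} CPU left, "
              f"saturated={is_irq_saturated(handler_us, interval_us)}")
```

With these assumed numbers the untraced case leaves most of the CPU free,
while the traced case leaves none, which is the point at which a stall
warning would be expected.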