Subject: Re: RCU stall when using function_graph
To: Steven Rostedt <rostedt@goodmis.org>
Cc: paulmck@linux.vnet.ibm.com, Pratyush Anand <panand@redhat.com>,
        김동현 <austinkernel.kim@gmail.com>,
        john.stultz@linaro.org, linux-kernel@vger.kernel.org
From: Daniel Lezcano <daniel.lezcano@linaro.org>
Message-ID: <43e0a0bc-bdd4-6bd0-c970-336f2fb01c6d@linaro.org>
Date: Wed, 16 Aug 2017 10:42:15 +0200
In-Reply-To: <20170815092902.252f5e83@gandalf.local.home>


Hi Steven,


On 15/08/2017 15:29, Steven Rostedt wrote:
> 
> [ I'm back from vacation! ]

Did you get the tapes? :)

> On Wed, 9 Aug 2017 17:51:33 +0200
> Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
> 
>> Well, maybe the instruction pointer thing is not a good idea.
>>
>> I learnt from this experience that an overloaded kernel handling a lot
>> of interrupts can hang the console and trigger an RCU stall.
>>
>> However, someone else may face the same situation. Even after reading
>> the RCU/stallwarn.txt documentation, it will be hard to figure out the
>> issue.
>>
>> A message saying that the grace period cannot complete because the CPU
>> is too busy processing interrupts would have helped, but I understand
>> it is not easy to implement.
> 
> What if the stall code triggered an irqwork first? The irqwork would
> trigger as soon as interrupts were enabled again (or at the next tick,
> depending on the arch), and then it would know that RCU stalled due to
> an irq storm if the irqwork is being hit.

Is that condition enough to tell that the CPU is overutilized by
interrupt handling?
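
To make the question concrete, here is a rough userspace model of the
irq_work probe as I understand your suggestion. Nothing here is kernel
code: struct stall_probe, enum stall_hint and the stall_probe_*() helpers
are hypothetical names, and calling the callback by hand stands in for
the irq_work actually firing.

```c
#include <stdbool.h>

/* Hypothetical per-CPU probe state (not the kernel's RCU data structures). */
struct stall_probe {
	bool pending; /* probe queued, but its callback has not run yet */
};

enum stall_hint {
	STALL_IRQS_STAYED_DISABLED, /* callback never ran: irqs were off the whole time */
	STALL_IRQS_WERE_ENABLED,    /* callback ran: irqs got enabled, still no quiescent state */
};

/* First stall check: queue the probe on the stalled CPU. */
static void stall_probe_queue(struct stall_probe *p)
{
	p->pending = true;
}

/*
 * Stands in for the irq_work callback. In the kernel it would only run once
 * the stalled CPU re-enables interrupts (or at the next tick, depending on
 * the arch); in this model the caller invokes it directly to simulate that.
 */
static void stall_probe_callback(struct stall_probe *p)
{
	p->pending = false;
}

/*
 * Next stall check, with the grace period still stuck. Note the limit of the
 * signal: an irq storm yields STALL_IRQS_WERE_ENABLED, but so could a long
 * loop that runs with interrupts enabled and never reports a quiescent state.
 */
static enum stall_hint stall_probe_classify(const struct stall_probe *p)
{
	return p->pending ? STALL_IRQS_STAYED_DISABLED : STALL_IRQS_WERE_ENABLED;
}
```

As far as I can see, the probe only separates "interrupts stayed
disabled" from "interrupts were enabled at some point", which is why I
am not sure it is enough on its own to blame an irq storm.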

I'm also wondering whether it would make sense to put this detection in
the irq code. That way, whether or not the RCU stall warning option is
enabled, the irq framework would warn about the situation. If the RCU
stall warning option is enabled, a second message would follow, and it
would be easy to connect the first message with the second one, no?
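
For the irq-side detection, a minimal sketch of what I have in mind,
again as a userspace model with hypothetical names: account the time
spent in hardirq handlers per CPU over a sampling window and warn once
when it exceeds a threshold. The 1 s window and 90% threshold are
arbitrary assumptions, not values taken from existing kernel code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-CPU accounting; names and limits are illustrative only. */
struct irq_load {
	uint64_t window_start_ns; /* start of the current sampling window */
	uint64_t irq_ns;          /* time spent in hardirq handlers in this window */
	bool warned;              /* warn only once */
};

#define IRQ_LOAD_WINDOW_NS   1000000000ULL /* 1 s sampling window (assumed) */
#define IRQ_LOAD_MAX_PERCENT 90ULL         /* warn above 90% irq time (assumed) */

/*
 * Account one handler invocation that ran from start_ns to end_ns.
 * When the window has elapsed, compare the irq time against the threshold,
 * then start a new window. Returns true if this call emitted the warning.
 */
static bool irq_load_account(struct irq_load *l, uint64_t start_ns, uint64_t end_ns)
{
	bool fired = false;

	l->irq_ns += end_ns - start_ns;

	if (end_ns - l->window_start_ns >= IRQ_LOAD_WINDOW_NS) {
		uint64_t elapsed = end_ns - l->window_start_ns;

		if (!l->warned && l->irq_ns * 100 > elapsed * IRQ_LOAD_MAX_PERCENT) {
			fprintf(stderr, "irq: CPU spent %llu%% of the last window in interrupt handlers\n",
				(unsigned long long)(l->irq_ns * 100 / elapsed));
			l->warned = true;
			fired = true;
		}
		/* Start a new sampling window. */
		l->window_start_ns = end_ns;
		l->irq_ns = 0;
	}
	return fired;
}
```

Warning only once (in the spirit of pr_warn_once()) avoids flooding a
console that is already struggling, and a later RCU stall warning could
then be read in the light of this first message.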
