by Carsten Emde

[permalink] [raw]

Subject: Re: [PATCH 0/8] rcu: Ensure rcu read site is deadlock-immunity

Hi Paul,

>> Although all articles declare that rcu read site is deadlock-immunity.
>> It is not true for rcu-preempt, it will be deadlock if rcu read site
>> overlaps with scheduler lock.
>
> The real rule is that if the scheduler does its outermost rcu_read_unlock()
> with one of those locks held, it has to have avoided enabling preemption
> through the entire RCU read-side critical section.
>
> That said, avoiding the need for this rule would be a good thing.
>
> How did you test this? The rcutorture tests will not exercise this.
> (Intentionally so, given that it can deadlock!)
>
>> ec433f0c, 10f39bb1 and 016a8d5b just partially solve it. But rcu read site
>> is still not deadlock-immunity. And the problem described in 016a8d5b
>> is still existed(rcu_read_unlock_special() calls wake_up).
>>
>> The problem is fixed in patch5.
>
> This is going to require some serious review and testing. One requirement
> is that RCU priority boosting not persist significantly beyond the
> re-enabling of interrupts associated with the irq-disabled lock. To do
> otherwise breaks RCU priority boosting. At first glance, the added
> set_need_resched() might handle this, but that is part of the review
> and testing required.
>
> Steven, would you and Carsten be willing to try this and see if it
> helps with the issues you are seeing in -rt? (My guess is "no", since
> a deadlock would block forever rather than waking up after a couple
> thousand seconds, but worth a try.)
Your guess was correct, applying this patch doesn't heal the
NO_HZ_FULL+PREEMPT_RT_FULL 3.10.4 based system; it still is hanging at
-> synchronize_rcu -> wait_rcu_gp.

-Carsten.

2013-08-08 00:36:54

On Wed, Aug 07, 2013 at 06:24:58PM +0800, Lai Jiangshan wrote:
> After patch 10f39bb1, "special & RCU_READ_UNLOCK_BLOCKED" can't be true
> in irq nor softirq.(due to RCU_READ_UNLOCK_BLOCKED can only be set
> when preemption)
>
> Signed-off-by: Lai Jiangshan <[email protected]>
> ---
> kernel/rcutree_plugin.h | 6 ------
> 1 files changed, 0 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 8fd947e..54f7e45 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -361,12 +361,6 @@ void rcu_read_unlock_special(struct task_struct *t)
> rcu_preempt_qs(smp_processor_id());
> }
>
> - /* Hardware IRQ handlers cannot block. */
> - if (in_irq() || in_serving_softirq()) {
> - local_irq_restore(flags);
> - return;
> - }
> -

Good point, it is time to relax the redundant checking. Paranoid that
I am, I took an intermediate position, wrapping a WARN_ON_ONCE() around
the check as follows:

if (WARN_ON_ONCE(in_irq() || in_serving_softirq())) {
local_irq_restore(flags);
return;
}

If this warning never triggers over a period of some time, we can remove
it entirely.

I have queued this for 3.14 with your Signed-off-by. Please let me know
if you have any objections.

Thanx, Paul

> /* Clean up if blocked during RCU read-side critical section. */
> if (special & RCU_READ_UNLOCK_BLOCKED) {
> t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED;
> --
> 1.7.4.4
>