Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org B95C260112
MIME-Version: 1.0
In-Reply-To: <tip-4bdced5c9a2922521e325896a7bbbf0132c94e56@git.kernel.org>
References: <20170424114732.1aac6dc4@gandalf.local.home> <tip-4bdced5c9a2922521e325896a7bbbf0132c94e56@git.kernel.org>
From:   Pavan Kondeti <pkondeti@codeaurora.org>
Date:   Fri, 19 Jan 2018 14:53:05 +0530
Message-ID: <CAEU1=PkiHO35Dzna8EQqNSKW1fr1y1zRQ5y66X117MG06sQtNA@mail.gmail.com>
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
To:     williams@redhat.com, Steven Rostedt <rostedt@goodmis.org>,
        Ingo Molnar <mingo@kernel.org>,
        LKML <linux-kernel@vger.kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Thomas Gleixner <tglx@linutronix.de>, bristot@redhat.com,
        jkacur@redhat.com, efault@gmx.de, hpa@zytor.com,
        torvalds@linux-foundation.org, swood@redhat.com
Cc:     linux-tip-commits@vger.kernel.org,
        Pavan Kondeti <pkondeti@codeaurora.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk

Hi Steven,

>  /* Called from hardirq context */
> -static void try_to_push_tasks(void *arg)
> +void rto_push_irq_work_func(struct irq_work *work)
>  {
> -       struct rt_rq *rt_rq = arg;
> -       struct rq *rq, *src_rq;
> -       int this_cpu;
> +       struct rq *rq;
>         int cpu;
>
> -       this_cpu = rt_rq->push_cpu;
> +       rq = this_rq();
>
> -       /* Paranoid check */
> -       BUG_ON(this_cpu != smp_processor_id());
> -
> -       rq = cpu_rq(this_cpu);
> -       src_rq = rq_of_rt_rq(rt_rq);
> -
> -again:
> +       /*
> +        * We do not need to grab the lock to check for has_pushable_tasks.
> +        * When it gets updated, a check is made if a push is possible.
> +        */
>         if (has_pushable_tasks(rq)) {
>                 raw_spin_lock(&rq->lock);
> -               push_rt_task(rq);
> +               push_rt_tasks(rq);
>                 raw_spin_unlock(&rq->lock);
>         }
>
> -       /* Pass the IPI to the next rt overloaded queue */
> -       raw_spin_lock(&rt_rq->push_lock);
> -       /*
> -        * If the source queue changed since the IPI went out,
> -        * we need to restart the search from that CPU again.
> -        */
> -       if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
> -               rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
> -               rt_rq->push_cpu = src_rq->cpu;
> -       }
> +       raw_spin_lock(&rq->rd->rto_lock);
>
> -       cpu = find_next_push_cpu(src_rq);
> +       /* Pass the IPI to the next rt overloaded queue */
> +       cpu = rto_next_cpu(rq);
>
> -       if (cpu >= nr_cpu_ids)
> -               rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
> -       raw_spin_unlock(&rt_rq->push_lock);
> +       raw_spin_unlock(&rq->rd->rto_lock);
>
> -       if (cpu >= nr_cpu_ids)
> +       if (cpu < 0)
>                 return;

I am seeing "spinlock already unlocked" BUG for rd->rto_lock on a 4.9
stable kernel based system. This issue is observed only after
inclusion of this patch. It appears to me that rq->rd can change
between spinlock is acquired and released in rto_push_irq_work_func()
IRQ work if hotplug is in progress. It was only reported couple of
times during long stress testing. The issue can be easily reproduced
if an artificial delay is introduced between lock and unlock of
rto_lock. The rq->rd is changed under rq->lock, so we can protect this
race with rq->lock. The below patch solved the problem. we are taking
rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
here. Please let me know your thoughts on this.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d863d39..478192b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
                raw_spin_unlock(&rq->lock);
        }

+       raw_spin_lock(&rq->lock);
        raw_spin_lock(&rq->rd->rto_lock);

        /* Pass the IPI to the next rt overloaded queue */
@@ -2291,11 +2292,10 @@ void rto_push_irq_work_func(struct irq_work *work)

        raw_spin_unlock(&rq->rd->rto_lock);

-       if (cpu < 0)
-               return;
-
        /* Try the next RT overloaded CPU */
-       irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+       if (cpu >= 0)
+               irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+       raw_spin_unlock(&rq->lock);
 }
 #endif /* HAVE_RT_PUSH_IPI */


Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project