From: Pavan Kondeti
Date: Fri, 19 Jan 2018 14:53:05 +0530
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
To: williams@redhat.com, Steven Rostedt, Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com, efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org, swood@redhat.com
Cc: linux-tip-commits@vger.kernel.org, Pavan Kondeti
References: <20170424114732.1aac6dc4@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Steven,

>  /* Called from hardirq context */
> -static void try_to_push_tasks(void *arg)
> +void rto_push_irq_work_func(struct irq_work *work)
>  {
> -	struct rt_rq *rt_rq = arg;
> -	struct rq *rq, *src_rq;
> -	int this_cpu;
> +	struct rq *rq;
>  	int cpu;
>
> -	this_cpu = rt_rq->push_cpu;
> +	rq = this_rq();
>
> -	/* Paranoid check */
> -	BUG_ON(this_cpu != smp_processor_id());
> -
> -	rq = cpu_rq(this_cpu);
> -	src_rq = rq_of_rt_rq(rt_rq);
> -
> -again:
> +	/*
> +	 * We do not need to grab the lock to check for has_pushable_tasks.
> +	 * When it gets updated, a check is made if a push is possible.
> +	 */
>  	if (has_pushable_tasks(rq)) {
>  		raw_spin_lock(&rq->lock);
> -		push_rt_task(rq);
> +		push_rt_tasks(rq);
>  		raw_spin_unlock(&rq->lock);
>  	}
>
> -	/* Pass the IPI to the next rt overloaded queue */
> -	raw_spin_lock(&rt_rq->push_lock);
> -	/*
> -	 * If the source queue changed since the IPI went out,
> -	 * we need to restart the search from that CPU again.
> -	 */
> -	if (rt_rq->push_flags & RT_PUSH_IPI_RESTART) {
> -		rt_rq->push_flags &= ~RT_PUSH_IPI_RESTART;
> -		rt_rq->push_cpu = src_rq->cpu;
> -	}
> +	raw_spin_lock(&rq->rd->rto_lock);
>
> -	cpu = find_next_push_cpu(src_rq);
> +	/* Pass the IPI to the next rt overloaded queue */
> +	cpu = rto_next_cpu(rq);
>
> -	if (cpu >= nr_cpu_ids)
> -		rt_rq->push_flags &= ~RT_PUSH_IPI_EXECUTING;
> -	raw_spin_unlock(&rt_rq->push_lock);
> +	raw_spin_unlock(&rq->rd->rto_lock);
>
> -	if (cpu >= nr_cpu_ids)
> +	if (cpu < 0)
>  		return;

I am seeing a "spinlock already unlocked" BUG for rd->rto_lock on a
system based on the 4.9 stable kernel. The issue is observed only after
this patch was included. It appears to me that rq->rd can change between
the point where the spinlock is acquired and the point where it is
released in the rto_push_irq_work_func() IRQ work if a hotplug operation
is in progress. It has been reported only a couple of times during long
stress testing, but it can be reproduced easily by introducing an
artificial delay between the lock and unlock of rto_lock.

rq->rd is changed under rq->lock, so we can protect against this race by
holding rq->lock. The patch below solved the problem.
We already take rq->lock in pull_rt_task()->tell_cpu_to_push(), so I
extended the same approach here. Please let me know your thoughts on
this.

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index d863d39..478192b 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
 		raw_spin_unlock(&rq->lock);
 	}
 
+	raw_spin_lock(&rq->lock);
 	raw_spin_lock(&rq->rd->rto_lock);
 
 	/* Pass the IPI to the next rt overloaded queue */
@@ -2291,11 +2292,10 @@ void rto_push_irq_work_func(struct irq_work *work)
 
 	raw_spin_unlock(&rq->rd->rto_lock);
 
-	if (cpu < 0)
-		return;
-
 	/* Try the next RT overloaded CPU */
-	irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	if (cpu >= 0)
+		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	raw_spin_unlock(&rq->lock);
 }
 #endif /* HAVE_RT_PUSH_IPI */

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project
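
For illustration only, the suspected race can be modelled in user space.
This is a single-threaded toy sketch under stated assumptions, not
kernel code: every toy_* name is made up for this example, the hotplug
update is simulated by a direct call placed inside the lock/unlock
window, and a writer blocked on rq->lock is modelled as a refused
update rather than a spinning CPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a raw spinlock; counts bad unlocks instead of BUG(). */
struct toy_lock {
	bool held;
	int bad_unlocks;	/* releases of a lock that was not held */
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* single-threaded model: this would deadlock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	if (!l->held) {
		l->bad_unlocks++;	/* models "spinlock already unlocked" */
		return;
	}
	l->held = false;
}

/* Hypothetical, stripped-down analogues of root_domain and rq. */
struct toy_rd { struct toy_lock rto_lock; };
struct toy_rq { struct toy_lock lock; struct toy_rd *rd; };

/* Models hotplug moving rq to another root domain under rq->lock. */
static bool toy_switch_rd(struct toy_rq *rq, struct toy_rd *new_rd)
{
	if (rq->lock.held)
		return false;	/* writer is excluded while rq->lock is held */
	rq->rd = new_rd;
	return true;
}

/* rq->rd is dereferenced twice with nothing pinning it in between. */
static void toy_push_work_unprotected(struct toy_rq *rq, struct toy_rd *other)
{
	toy_lock_acquire(&rq->rd->rto_lock);
	toy_switch_rd(rq, other);		/* hotplug hits the window */
	toy_lock_release(&rq->rd->rto_lock);	/* releases the other rd's lock */
}

/* Same sequence with rq->lock held across both rq->rd dereferences. */
static void toy_push_work_protected(struct toy_rq *rq, struct toy_rd *other)
{
	toy_lock_acquire(&rq->lock);
	toy_lock_acquire(&rq->rd->rto_lock);
	toy_switch_rd(rq, other);		/* refused: rq->lock is held */
	toy_lock_release(&rq->rd->rto_lock);
	toy_lock_release(&rq->lock);
}
```

In this model, running toy_push_work_unprotected() on a toy_rq whose rd
points at one toy_rd, with a second toy_rd passed as the hotplug target,
leaves the first rto_lock held and records a bad unlock on the second.
The protected variant keeps rq->rd stable, so both locks are released
cleanly.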