Date: Fri, 19 Jan 2018 23:16:17 +0530
From: Pavan Kondeti
To: Steven Rostedt
Cc: williams@redhat.com, Ingo Molnar, LKML, Peter Zijlstra,
    Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com,
    efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org,
    swood@redhat.com, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Message-ID: <20180119174617.GA6563@codeaurora.org>
References: <20170424114732.1aac6dc4@gandalf.local.home>
    <20180119100353.7f9f5154@gandalf.local.home>
In-Reply-To: <20180119100353.7f9f5154@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 19, 2018 at 10:03:53AM -0500, Steven Rostedt wrote:
> On Fri, 19 Jan 2018 14:53:05 +0530
> Pavan Kondeti wrote:
>
> > I am seeing "spinlock already unlocked" BUG for rd->rto_lock on a 4.9
> > stable kernel based system. This issue is observed only after
> > inclusion of this patch.
> > It appears to me that rq->rd can change
> > between spinlock is acquired and released in rto_push_irq_work_func()
> > IRQ work if hotplug is in progress. It was only reported couple of
> > times during long stress testing. The issue can be easily reproduced
> > if an artificial delay is introduced between lock and unlock of
> > rto_lock. The rq->rd is changed under rq->lock, so we can protect this
> > race with rq->lock. The below patch solved the problem. we are taking
> > rq->lock in pull_rt_task()->tell_cpu_to_push(), so I extended the same
> > here. Please let me know your thoughts on this.
>
> As so rq->rd can change. Interesting.
>
> >
> > diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> > index d863d39..478192b 100644
> > --- a/kernel/sched/rt.c
> > +++ b/kernel/sched/rt.c
> > @@ -2284,6 +2284,7 @@ void rto_push_irq_work_func(struct irq_work *work)
> >  		raw_spin_unlock(&rq->lock);
> >  	}
> >
> > +	raw_spin_lock(&rq->lock);
>
>
> What about just saving the rd then?
>
> 	struct root_domain *rd;
>
> 	rd = READ_ONCE(rq->rd);
>
> then use that. Then we don't need to worry about it changing.
>

I am thinking of another problem caused by the race between
rto_push_irq_work_func() and rq_attach_root(), where rq->rd is modified.
Let's say we cache rq->rd here and queue the IRQ work on a remote CPU.
In the meantime, rq_attach_root() might drop all the references to this
cached (old) rd and want to free it. The old rd is freed in an RCU-sched
callback. If that remote CPU is in an RCU quiescent state, the rd can
get freed before the IRQ work is executed. This results in corruption
of the remote CPU's IRQ work list. Right?

Taking rq->lock in rto_push_irq_work_func() also does not help here.
Probably we have to wait for the IRQ work to finish before freeing the
older root domain in the RCU-sched callback.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project.
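
To make the lifetime problem above concrete, here is a rough userspace
sketch of one possible way to keep the old root domain alive until the
pending work has run: pin it with an extra reference when the work is
queued, and drop that reference when the work finishes. This is only a
model, not kernel code. struct fake_rd, fake_rd_get(), fake_rd_put(),
queue_push_work() and run_push_work() are hypothetical names made up for
the illustration, and the "free" is simulated with a flag so the ordering
can be checked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a root domain; not the kernel struct. */
struct fake_rd {
	atomic_int refcount;	/* held by the runqueue and by pending work */
	bool freed;		/* set instead of free() so the demo can inspect it */
};

static void fake_rd_init(struct fake_rd *rd)
{
	atomic_init(&rd->refcount, 1);	/* the runqueue's own reference */
	rd->freed = false;
}

static void fake_rd_get(struct fake_rd *rd)
{
	atomic_fetch_add(&rd->refcount, 1);
}

/* Drop one reference; returns true if it was the last one (rd "freed"). */
static bool fake_rd_put(struct fake_rd *rd)
{
	if (atomic_fetch_sub(&rd->refcount, 1) == 1) {
		rd->freed = true;
		return true;
	}
	return false;
}

/* Queue side: pin rd before handing it to a (simulated) remote CPU. */
static struct fake_rd *queue_push_work(struct fake_rd *rd)
{
	fake_rd_get(rd);
	return rd;
}

/* Work side: rd must still be alive here; drop the pin when done. */
static bool run_push_work(struct fake_rd *rd)
{
	assert(!rd->freed);
	return fake_rd_put(rd);
}

/*
 * Replays the sequence from the mail: work is queued, the runqueue then
 * detaches from rd, and only afterwards does the work run.
 * Returns 0 if rd stayed alive until the work finished, -1 otherwise.
 */
static int demo_detach_while_work_pending(void)
{
	struct fake_rd rd;
	struct fake_rd *pending;

	fake_rd_init(&rd);			/* refcount == 1 */
	pending = queue_push_work(&rd);		/* refcount == 2 */

	if (fake_rd_put(&rd))			/* runqueue detaches: refcount == 1 */
		return -1;			/* freed while work was pending */

	if (!run_push_work(pending))		/* work drops the last reference */
		return -1;

	return rd.freed ? 0 : -1;
}
```

Calling demo_detach_while_work_pending() from a small main() walks through
the detach-before-work ordering. Without the extra reference taken in
queue_push_work(), the detach step would drop the count to zero while the
work is still pending, which is the use-after-free window described above.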