Date: Sat, 20 Jan 2018 00:24:55 +0530
From: Pavan Kondeti
To: Steven Rostedt
Cc: williams@redhat.com, Ingo Molnar, LKML, Peter Zijlstra, Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com, efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org, swood@redhat.com, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Message-ID: <20180119185455.GB6563@codeaurora.org>
In-Reply-To: <20180119131121.22dac3d3@gandalf.local.home>

On Fri, Jan 19, 2018 at 01:11:21PM -0500, Steven Rostedt wrote:
> On Fri, 19 Jan 2018 23:16:17 +0530
> Pavan Kondeti wrote:
>
> > I am thinking of another problem because of the race between
> > rto_push_irq_work_func() and rq_attach_root() where rq->rd is modified.
> >
> > Let's say we cache the rq->rd here and queue the IRQ work on a remote
> > CPU. In the meantime, rq_attach_root() might drop all the references
> > to this cached (old) rd and want to free it. The rq->rd is freed in an
> > RCU-sched callback. If that remote CPU is in an RCU quiescent state, the
> > rq->rd can get freed before the IRQ work is executed. This results in the
> > corruption of the remote CPU's IRQ work list. Right?
> >
> > Taking rq->lock in rto_push_irq_work_func() also does not help here. Probably
> > we have to wait for the IRQ work to finish before freeing the older root domain
> > in the RCU-sched callback.
>
> I was wondering about this too. Yeah, it would require an RCU like
> update. Once the rd was unreferenced, it would need to wait for the
> irq works to finish before freeing it.
>
> The easy way to do this is to simply up the refcount when sending the
> domain. Something like this:
>
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index 862a513adca3..89a086ed2b16 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
>   * the rt_loop_next will cause the iterator to perform another scan.
>   *
>   */
> -static int rto_next_cpu(struct rq *rq)
> +static int rto_next_cpu(struct root_domain *rd)
>  {
> -	struct root_domain *rd = rq->rd;
>  	int next;
>  	int cpu;
>
> @@ -1985,19 +1984,24 @@ static void tell_cpu_to_push(struct rq *rq)
>  	 * Otherwise it is finishing up and an ipi needs to be sent.
>  	 */
>  	if (rq->rd->rto_cpu < 0)
> -		cpu = rto_next_cpu(rq);
> +		cpu = rto_next_cpu(rq->rd);
>
>  	raw_spin_unlock(&rq->rd->rto_lock);
>
>  	rto_start_unlock(&rq->rd->rto_loop_start);
>
> -	if (cpu >= 0)
> +	if (cpu >= 0) {
> +		/* Make sure the rd does not get freed while pushing */
> +		sched_get_rd(rq->rd);
>  		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
> +	}
>  }

Since this is covered by rq->lock, it is guaranteed that we increment the
refcount on the older rd before the RCU-sched callback is queued in
rq_attach_root(). Either we keep the older rd alive or we use the updated rd.
We are good here, I think.

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project.
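
For reference, the lifetime rule discussed above can be modelled in user space roughly as follows. This is only a sketch and not kernel code: struct demo_rd and all demo_* helpers are hypothetical names, the RCU-sched deferral of the actual free is left out, and the drop of the extra reference at the end of the work function is an assumption, since the quoted diff only shows the sched_get_rd() side.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct root_domain (not the kernel definition). */
struct demo_rd {
	atomic_int refcount;
	bool *freed;		/* demo only: records that the object was released */
};

/* Allocate an rd holding one reference, as if owned by a runqueue. */
struct demo_rd *demo_rd_alloc(bool *freed_flag)
{
	struct demo_rd *rd = malloc(sizeof(*rd));

	if (!rd)
		return NULL;
	atomic_init(&rd->refcount, 1);
	rd->freed = freed_flag;
	*freed_flag = false;
	return rd;
}

/* Take an extra reference (the role sched_get_rd() plays in the diff). */
void demo_rd_get(struct demo_rd *rd)
{
	atomic_fetch_add(&rd->refcount, 1);
}

/* Drop one reference; free the rd when the last one goes. Returns true if freed. */
bool demo_rd_put(struct demo_rd *rd)
{
	int old = atomic_fetch_sub(&rd->refcount, 1);

	assert(old > 0);
	if (old != 1)
		return false;
	*rd->freed = true;
	free(rd);
	return true;
}

/* Sender side: pin the rd before handing it to the deferred work. */
struct demo_rd *demo_queue_push_work(struct demo_rd *rd)
{
	demo_rd_get(rd);
	return rd;	/* pointer the deferred work will dereference later */
}

/* Deferred work: the rd is still valid here; drop the pinned reference last. */
bool demo_push_work_func(struct demo_rd *rd)
{
	/* ... the real work would walk the rd here ... */
	return demo_rd_put(rd);
}
```

In this model, if the owner drops its reference between demo_queue_push_work() and demo_push_work_func(), the rd stays allocated until the work function has run, which is the ordering argued for above.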