Date: Fri, 19 Jan 2018 13:11:21 -0500
From: Steven Rostedt <rostedt@goodmis.org>
To: Pavan Kondeti
Cc: williams@redhat.com, Ingo Molnar, LKML, Peter Zijlstra,
	Thomas Gleixner, bristot@redhat.com, jkacur@redhat.com,
	efault@gmx.de, hpa@zytor.com, torvalds@linux-foundation.org,
	swood@redhat.com, linux-tip-commits@vger.kernel.org
Subject: Re: [tip:sched/core] sched/rt: Simplify the IPI based RT balancing logic
Message-ID: <20180119131121.22dac3d3@gandalf.local.home>
In-Reply-To: <20180119174617.GA6563@codeaurora.org>
References: <20170424114732.1aac6dc4@gandalf.local.home>
	<20180119100353.7f9f5154@gandalf.local.home>
	<20180119174617.GA6563@codeaurora.org>

On Fri, 19 Jan 2018 23:16:17 +0530
Pavan Kondeti wrote:

> I am thinking of another problem, because of the race between
> rto_push_irq_work_func() and rq_attach_root(), where rq->rd is modified.
>
> Let's say we cache rq->rd here and queue the IRQ work on a remote
> CPU. In the meantime, rq_attach_root() might drop all the references
> to this cached (old) rd and want to free it. The rq->rd is freed in an
> RCU-sched callback. If that remote CPU is in an RCU quiescent state,
> the rq->rd can get freed before the IRQ work is executed. This results
> in corruption of the remote CPU's IRQ work list. Right?
>
> Taking rq->lock in rto_push_irq_work_func() also does not help here.
> Probably we have to wait for the IRQ work to finish before freeing the
> older root domain in the RCU-sched callback.

I was wondering about this too. Yeah, it would require an RCU-like
update. Once the rd is unreferenced, we would need to wait for any
pending irq works to finish before freeing it.
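To make the lifetime rule concrete, here is a minimal user-space sketch
of that get-before-queue / put-in-handler pattern. This is illustration
only, not kernel code: struct domain, domain_get(), domain_put(), and
the pthread standing in for the remote IRQ work are all made-up names.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for struct root_domain; only the refcount matters here. */
struct domain {
	atomic_int refcount;
};

static void domain_get(struct domain *d)	/* like sched_get_rd() */
{
	atomic_fetch_add(&d->refcount, 1);
}

static void domain_put(struct domain *d)	/* like sched_put_rd() */
{
	/* Whoever drops the last reference frees the object. */
	if (atomic_fetch_sub(&d->refcount, 1) == 1)
		free(d);
}

/* Plays the role of the IRQ work running on the remote CPU. */
static void *remote_work(void *arg)
{
	struct domain *d = arg;

	/* Safe: the sender's reference keeps @d alive until we put it. */
	printf("remote work ran, refcount=%d\n", atomic_load(&d->refcount));
	domain_put(d);
	return NULL;
}

int main(void)
{
	struct domain *d = malloc(sizeof(*d));
	pthread_t worker;

	atomic_init(&d->refcount, 1);

	domain_get(d);		/* take a reference *before* queueing */
	pthread_create(&worker, NULL, remote_work, d);

	domain_put(d);		/* owner detaches; object stays alive */
	pthread_join(worker, NULL);
	return 0;
}

In the kernel, the final put must additionally defer the actual free
through call_rcu_sched(), which is exactly what the patch below does.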
The easy way to do this is to simply up the refcount when sending the
domain. Something like this:

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 862a513adca3..89a086ed2b16 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1907,9 +1907,8 @@ static void push_rt_tasks(struct rq *rq)
  * the rt_loop_next will cause the iterator to perform another scan.
  *
  */
-static int rto_next_cpu(struct rq *rq)
+static int rto_next_cpu(struct root_domain *rd)
 {
-	struct root_domain *rd = rq->rd;
 	int next;
 	int cpu;
 
@@ -1985,19 +1984,24 @@ static void tell_cpu_to_push(struct rq *rq)
 	 * Otherwise it is finishing up and an ipi needs to be sent.
 	 */
 	if (rq->rd->rto_cpu < 0)
-		cpu = rto_next_cpu(rq);
+		cpu = rto_next_cpu(rq->rd);
 
 	raw_spin_unlock(&rq->rd->rto_lock);
 
 	rto_start_unlock(&rq->rd->rto_loop_start);
 
-	if (cpu >= 0)
+	if (cpu >= 0) {
+		/* Make sure the rd does not get freed while pushing */
+		sched_get_rd(rq->rd);
 		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	}
 }
 
 /* Called from hardirq context */
 void rto_push_irq_work_func(struct irq_work *work)
 {
+	struct root_domain *rd =
+		container_of(work, struct root_domain, rto_push_work);
 	struct rq *rq;
 	int cpu;
 
@@ -2013,18 +2017,20 @@ void rto_push_irq_work_func(struct irq_work *work)
 		raw_spin_unlock(&rq->lock);
 	}
 
-	raw_spin_lock(&rq->rd->rto_lock);
+	raw_spin_lock(&rd->rto_lock);
 
 	/* Pass the IPI to the next rt overloaded queue */
-	cpu = rto_next_cpu(rq);
+	cpu = rto_next_cpu(rd);
 
-	raw_spin_unlock(&rq->rd->rto_lock);
+	raw_spin_unlock(&rd->rto_lock);
 
-	if (cpu < 0)
+	if (cpu < 0) {
+		sched_put_rd(rd);
 		return;
+	}
 
 	/* Try the next RT overloaded CPU */
-	irq_work_queue_on(&rq->rd->rto_push_work, cpu);
+	irq_work_queue_on(&rd->rto_push_work, cpu);
 }
 #endif /* HAVE_RT_PUSH_IPI */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e95505e23c6..fb5fc458547f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -691,6 +691,8 @@ extern struct mutex sched_domains_mutex;
 extern void init_defrootdomain(void);
 extern int sched_init_domains(const struct cpumask *cpu_map);
 extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
+extern void sched_get_rd(struct root_domain *rd);
+extern void sched_put_rd(struct root_domain *rd);
 
 #ifdef HAVE_RT_PUSH_IPI
 extern void rto_push_irq_work_func(struct irq_work *work);
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 034cbed7f88b..519b024f4e94 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -259,6 +259,19 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
 		call_rcu_sched(&old_rd->rcu, free_rootdomain);
 }
 
+void sched_get_rd(struct root_domain *rd)
+{
+	atomic_inc(&rd->refcount);
+}
+
+void sched_put_rd(struct root_domain *rd)
+{
+	if (!atomic_dec_and_test(&rd->refcount))
+		return;
+
+	call_rcu_sched(&rd->rcu, free_rootdomain);
+}
+
 static int init_rootdomain(struct root_domain *rd)
 {
 	if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL))

-- 
Steve