Date: Wed, 23 Feb 2011 12:41:02 -0800
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Frederic Weisbecker
Cc: linux-kernel@vger.kernel.org, mingo@elte.hu, laijs@cn.fujitsu.com,
	dipankar@in.ibm.com, akpm@linux-foundation.org,
	mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com,
	tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org,
	Valdis.Kletnieks@vt.edu, dhowells@redhat.com, eric.dumazet@gmail.com,
	darren@dvhart.com, "Paul E. McKenney"
Subject: Re: [PATCH RFC tip/core/rcu 11/11] rcu: move TREE_RCU from softirq to kthread
Message-ID: <20110223204102.GY2163@linux.vnet.ibm.com>
References: <20110223013917.GA20996@linux.vnet.ibm.com>
	<1298425183-21265-11-git-send-email-paulmck@linux.vnet.ibm.com>
	<20110223165043.GA2529@nowhere>
	<20110223190601.GT2163@linux.vnet.ibm.com>
	<20110223191333.GD2591@nowhere>
In-Reply-To: <20110223191333.GD2591@nowhere>

On Wed, Feb 23, 2011 at 08:13:35PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 23, 2011 at 11:06:01AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 23, 2011 at 05:50:46PM +0100, Frederic Weisbecker wrote:
> > > On Tue, Feb 22, 2011 at 05:39:40PM -0800, Paul E. McKenney wrote:
> > > > +}
> > > > +
> > > > +/*
> > > > + * Drop to non-real-time priority and yield, but only after posting a
> > > > + * timer that will cause us to regain our real-time priority if we
> > > > + * remain preempted.  Either way, we restore our real-time priority
> > > > + * before returning.
> > > > + */
> > > > +static void rcu_yield(int cpu)
> > > > +{
> > > > +	struct rcu_data *rdp = per_cpu_ptr(rcu_sched_state.rda, cpu);
> > > > +	struct sched_param sp;
> > > > +	struct timer_list yield_timer;
> > > > +
> > > > +	setup_timer(&yield_timer, rcu_cpu_kthread_timer, (unsigned long)rdp);
> > > > +	mod_timer(&yield_timer, jiffies + 2);
> > > > +	sp.sched_priority = 0;
> > > > +	sched_setscheduler_nocheck(current, SCHED_NORMAL, &sp);
> > > > +	schedule();
> > > > +	sp.sched_priority = RCU_KTHREAD_PRIO;
> > > > +	sched_setscheduler_nocheck(current, SCHED_FIFO, &sp);
> > > > +	del_timer(&yield_timer);
> > > > +}
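As an aside for anyone following along at home: the pattern above (drop
to SCHED_NORMAL, yield, and rely on a posted timer to regain SCHED_FIFO
should we remain preempted) has a rough user-space analogue.  Below is a
minimal sketch, not kernel code; the function names and the 2ms backstop
are invented for illustration, and error handling is omitted:

/*
 * User-space analogue of rcu_yield(): drop to SCHED_OTHER, yield, and
 * arm a backstop timer that regains SCHED_FIFO if we stay preempted.
 * Build with "gcc yield.c -lrt"; needs CAP_SYS_NICE to run.
 */
#include <sched.h>
#include <signal.h>
#include <time.h>

#define KTHREAD_PRIO 1			/* stand-in for RCU_KTHREAD_PRIO */

static void regain_rt_prio(union sigval sv)
{
	struct sched_param sp = { .sched_priority = KTHREAD_PRIO };

	(void)sv;
	/* Backstop fired: we stayed preempted too long, go back to RT. */
	sched_setscheduler(0, SCHED_FIFO, &sp);
}

static void yield_with_backstop(void)
{
	struct sched_param sp;
	struct sigevent sev = {
		.sigev_notify = SIGEV_THREAD,
		.sigev_notify_function = regain_rt_prio,
	};
	struct itimerspec its = { .it_value = { .tv_nsec = 2000000 } };
	timer_t tid;

	timer_create(CLOCK_MONOTONIC, &sev, &tid);
	timer_settime(tid, 0, &its, NULL);	/* arm the 2ms backstop */
	sp.sched_priority = 0;
	sched_setscheduler(0, SCHED_OTHER, &sp);	/* drop to normal */
	sched_yield();				/* let others run */
	sp.sched_priority = KTHREAD_PRIO;
	sched_setscheduler(0, SCHED_FIFO, &sp);	/* regain RT priority */
	timer_delete(tid);			/* cancel the backstop */
}

int main(void)
{
	struct sched_param sp = { .sched_priority = KTHREAD_PRIO };

	/* Start out SCHED_FIFO, as the kthread would be. */
	if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
		return 1;
	yield_with_backstop();
	return 0;
}

Either way the thread ends up back at RT priority, which is the same
guarantee the kernel version provides.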
> > > > +
> > > > +/*
> > > > + * Handle cases where the rcu_cpu_kthread() ends up on the wrong CPU.
> > > > + * This can happen while the corresponding CPU is either coming online
> > > > + * or going offline.  We cannot wait until the CPU is fully online
> > > > + * before starting the kthread, because the various notifier functions
> > > > + * can wait for RCU grace periods.  So we park rcu_cpu_kthread() until
> > > > + * the corresponding CPU is online.
> > > > + *
> > > > + * Return 1 if the kthread needs to stop, 0 otherwise.
> > > > + *
> > > > + * Caller must disable bh.  This function can momentarily enable it.
> > > > + */
> > > > +static int rcu_cpu_kthread_should_stop(int cpu)
> > > > +{
> > > > +	while (cpu_is_offline(cpu) || smp_processor_id() != cpu) {
> > > > +		if (kthread_should_stop())
> > > > +			return 1;
> > > > +		local_bh_enable();
> > > > +		schedule_timeout_uninterruptible(1);
> > >
> > > Why is it uninterruptible?  Well, that doesn't change much anyway.
> > > It can be a problem for kernel threads that sleep for a long time,
> > > because of the hung-task detector, but certainly not for 1 jiffy.
> >
> > Yep, and the next patch does in fact change this to
> > schedule_timeout_interruptible().
> >
> > Good eyes, though!
> >
> > 							Thanx, Paul
>
> Ok.
>
> Don't forget what I wrote below ;)

But...  But...  I -already- forgot about it!!!  Thank you for the
reminder.  ;-)

> > > > +		if (smp_processor_id() != cpu)
> > > > +			set_cpus_allowed_ptr(current, cpumask_of(cpu));
> > > > +		local_bh_disable();
> > > > +	}
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +/*
> > > > + * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
> > > > + * earlier RCU softirq.
> > > > + */
> > > > +static int rcu_cpu_kthread(void *arg)
> > > > +{
> > > > +	int cpu = (int)(long)arg;
> > > > +	unsigned long flags;
> > > > +	int spincnt = 0;
> > > > +	wait_queue_head_t *wqp = &per_cpu(rcu_cpu_wq, cpu);
> > > > +	char work;
> > > > +	char *workp = &per_cpu(rcu_cpu_has_work, cpu);
> > > > +
> > > > +	for (;;) {
> > > > +		wait_event_interruptible(*wqp,
> > > > +					 *workp != 0 || kthread_should_stop());
> > > > +		local_bh_disable();
> > > > +		if (rcu_cpu_kthread_should_stop(cpu)) {
> > > > +			local_bh_enable();
> > > > +			break;
> > > > +		}
> > > > +		local_irq_save(flags);
> > > > +		work = *workp;
> > > > +		*workp = 0;
> > > > +		local_irq_restore(flags);
> > > > +		if (work)
> > > > +			rcu_process_callbacks();
> > > > +		local_bh_enable();
> > > > +		if (*workp != 0)
> > > > +			spincnt++;
> > > > +		else
> > > > +			spincnt = 0;
> > > > +		if (spincnt > 10) {
> > > > +			rcu_yield(cpu);
> > > > +			spincnt = 0;
> > > > +		}
> > > > +	}
> > > > +	return 0;
> > > > +}
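The loop above boils down to a flag-and-wakeup handshake plus a
starvation guard: sleep until the per-CPU rcu_cpu_has_work flag is set,
snapshot and clear the flag with interrupts off, invoke the callbacks,
and yield after more than ten consecutive busy passes.  Here is a rough
pthread-based sketch of the same shape; every name in it is invented,
and the mutex merely stands in for the irq/bh exclusion:

/* Rough user-space analogue of the rcu_cpu_kthread() loop above. */
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <unistd.h>

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cv = PTHREAD_COND_INITIALIZER;
static bool has_work;			/* analogue of rcu_cpu_has_work */
static bool should_stop;		/* analogue of kthread_should_stop() */

static void process_callbacks(void)
{
	/* Real work would go here. */
}

static void *worker(void *arg)
{
	int spincnt = 0;
	bool work;

	(void)arg;
	for (;;) {
		pthread_mutex_lock(&work_lock);
		while (!has_work && !should_stop)
			pthread_cond_wait(&work_cv, &work_lock);
		if (should_stop) {
			pthread_mutex_unlock(&work_lock);
			break;
		}
		work = has_work;	/* snapshot and clear, like *workp */
		has_work = false;
		pthread_mutex_unlock(&work_lock);
		if (work)
			process_callbacks();
		/*
		 * Unlocked recheck, mirroring the patch's *workp test:
		 * racy, but the worst case is one belated yield.
		 */
		spincnt = has_work ? spincnt + 1 : 0;
		if (spincnt > 10) {
			sched_yield();	/* starvation guard */
			spincnt = 0;
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, worker, NULL);
	pthread_mutex_lock(&work_lock);		/* post one unit of work */
	has_work = true;
	pthread_cond_signal(&work_cv);
	pthread_mutex_unlock(&work_lock);
	sleep(1);
	pthread_mutex_lock(&work_lock);		/* then ask it to stop */
	should_stop = true;
	pthread_cond_signal(&work_cv);
	pthread_mutex_unlock(&work_lock);
	pthread_join(tid, NULL);
	return 0;
}

A producer sets has_work under work_lock and signals work_cv, which
corresponds to setting rcu_cpu_has_work and waking rcu_cpu_wq in the
patch.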
> > > > +
> > > > +/*
> > > > + * Per-rcu_node kthread, which is in charge of waking up the per-CPU
> > > > + * kthreads when needed.
> > > > + */
> > > > +static int rcu_node_kthread(void *arg)
> > > > +{
> > > > +	int cpu;
> > > > +	unsigned long flags;
> > > > +	unsigned long mask;
> > > > +	struct rcu_node *rnp = (struct rcu_node *)arg;
> > > > +	struct sched_param sp;
> > > > +	struct task_struct *t;
> > > > +
> > > > +	for (;;) {
> > > > +		wait_event_interruptible(rnp->node_wq, rnp->wakemask != 0 ||
> > > > +					 kthread_should_stop());
> > > > +		if (kthread_should_stop())
> > > > +			break;
> > > > +		raw_spin_lock_irqsave(&rnp->lock, flags);
> > > > +		mask = rnp->wakemask;
> > > > +		rnp->wakemask = 0;
> > > > +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > > > +		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
> > > > +			if ((mask & 0x1) == 0)
> > > > +				continue;
> > > > +			preempt_disable();
> > > > +			per_cpu(rcu_cpu_has_work, cpu) = 1;
> > > > +			t = per_cpu(rcu_cpu_kthread_task, cpu);
> > > > +			if (t == NULL) {
> > > > +				preempt_enable();
> > > > +				continue;
> > > > +			}
> > > > +			sp.sched_priority = RCU_KTHREAD_PRIO;
> > > > +			sched_setscheduler_nocheck(t, cpu, &sp);
> > > > +			wake_up_process(t);
> > >
> > > My (mis?)understanding of the picture is that this node kthread is
> > > there to wake up the cpu threads that called rcu_yield().  But actually
> > > rcu_yield() doesn't put the cpu thread to sleep; instead it switches to
> > > SCHED_NORMAL, to avoid starving the system with callbacks.

Indeed.  My original plan was to make the per-CPU kthreads do RCU
priority boosting, but this turned out to be a non-starter.  I
apparently failed to make all the necessary adjustments when backing
away from this plan.

> > > So I wonder if this wake_up_process() is actually relevant.
> > > sched_setscheduler_nocheck() already handles the per-sched-policy rq
> > > migration, and the process is not sleeping.
> > >
> > > That said, by this time the process may have gone to sleep: if no
> > > other SCHED_NORMAL task was there, it just continued and may have
> > > flushed every callback.  So this wake_up_process() may actually wake
> > > up the task, but it will sleep again right away due to the condition
> > > in wait_event_interruptible() of the cpu thread.

But in that case, someone would have called invoke_rcu_cpu_kthread(),
which would do its own wake_up().

> > > Right?

Yep, removed the redundant wake_up_process().  Thank you!

Hmmm...  And that second argument to sched_setscheduler_nocheck() is
bogus as well...  Should be SCHED_FIFO.

							Thanx, Paul

> > > > +			preempt_enable();
> > > > +		}
> > > > +	}
> > > > +	return 0;
> > > > +}
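For concreteness, folding in the two fixes discussed above (no
wake_up_process(), and SCHED_FIFO rather than cpu as the second
argument), the inner loop would presumably end up looking something
like the following untested sketch:

		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask >>= 1) {
			if ((mask & 0x1) == 0)
				continue;
			preempt_disable();
			per_cpu(rcu_cpu_has_work, cpu) = 1;
			t = per_cpu(rcu_cpu_kthread_task, cpu);
			if (t == NULL) {
				preempt_enable();
				continue;
			}
			sp.sched_priority = RCU_KTHREAD_PRIO;
			/* Pass the policy, not the CPU number. */
			sched_setscheduler_nocheck(t, SCHED_FIFO, &sp);
			/*
			 * No wake_up_process() here: if the kthread is
			 * asleep, whoever queued the work will have called
			 * invoke_rcu_cpu_kthread(), which does its own
			 * wake_up().
			 */
			preempt_enable();
		}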