Date: Tue, 24 Sep 2013 18:03:59 +0200
From: Oleg Nesterov
To: Peter Zijlstra
Cc: Mel Gorman, Rik van Riel, Srikar Dronamraju, Ingo Molnar,
	Andrea Arcangeli, Johannes Weiner, Linux-MM, LKML,
	Paul McKenney, Thomas Gleixner, Steven Rostedt
Subject: Re: [PATCH] hotplug: Optimize {get,put}_online_cpus()
Message-ID: <20130924160359.GA2739@redhat.com>
References: <1378805550-29949-1-git-send-email-mgorman@suse.de>
	<1378805550-29949-38-git-send-email-mgorman@suse.de>
	<20130917143003.GA29354@twins.programming.kicks-ass.net>
	<20130917162050.GK22421@suse.de>
	<20130917164505.GG12926@twins.programming.kicks-ass.net>
	<20130918154939.GZ26785@twins.programming.kicks-ass.net>
	<20130919143241.GB26785@twins.programming.kicks-ass.net>
	<20130923175052.GA20991@redhat.com>
	<20130924123821.GT12926@twins.programming.kicks-ass.net>
In-Reply-To: <20130924123821.GT12926@twins.programming.kicks-ass.net>

On 09/24, Peter Zijlstra wrote:
>
> +static inline void get_online_cpus(void)
> +{
> +	might_sleep();
> +
> +	if (current->cpuhp_ref++) {
> +		barrier();
> +		return;

I don't understand this barrier()... we are going to return if we already
hold the lock, do we really need it?

The same for put_online_cpus().

> +void __get_online_cpus(void)
>  {
> -	if (cpu_hotplug.active_writer == current)
> +	if (cpuhp_writer_task == current)
>  		return;

Probably it would be better to simply inc/dec ->cpuhp_ref in
cpu_hotplug_begin/end and remove this check here and in
__put_online_cpus().

This also means that the writer doing get/put_online_cpus() will always
use the fast path, and __cpuhp_writer can go away; cpuhp_writer_task != NULL
can be used instead.

> +	atomic_inc(&cpuhp_waitcount);
> +
> +	/*
> +	 * We either call schedule() in the wait, or we'll fall through
> +	 * and reschedule on the preempt_enable() in get_online_cpus().
> +	 */
> +	preempt_enable_no_resched();
> +	wait_event(cpuhp_wq, !__cpuhp_writer);
> +	preempt_disable();
> +
> +	/*
> +	 * It would be possible for cpu_hotplug_done() to complete before
> +	 * the atomic_inc() above; in which case there is no writer waiting
> +	 * and doing a wakeup would be BAD (tm).
> +	 *
> +	 * If however we still observe cpuhp_writer_task here we know
> +	 * cpu_hotplug_done() is currently stuck waiting for cpuhp_waitcount.
> +	 */
> +	if (atomic_dec_and_test(&cpuhp_waitcount) && cpuhp_writer_task)
> +		cpuhp_writer_wake();

cpuhp_writer_wake() here and in __put_online_cpus() looks racy...
Not only can cpuhp_writer_wake() hit cpuhp_writer_task == NULL (we need
something like ACCESS_ONCE()), its task_struct can already be freed/reused
if the writer exits.

And I don't really understand the logic... This slow path succeeds without
incrementing any counter (except current->cpuhp_ref)? How can the next
writer notice that it should wait for this reader?
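Going back to the cpu_hotplug_begin/end suggestion above, I mean something
like this. Only a sketch to show the idea, not even compiled; it reuses the
cpuhp_writer_task / ->cpuhp_ref names from your patch and elides the
wait-for-readers logic which cpu_hotplug_begin/done obviously still need:

	void cpu_hotplug_begin(void)
	{
		cpuhp_writer_task = current;

		/*
		 * Take a reader reference for ourselves, so that any
		 * get/put_online_cpus() done by the writer always hits
		 * the current->cpuhp_ref fast path.
		 */
		current->cpuhp_ref++;

		/* ... wait for the pre-existing readers as before ... */
	}

	void cpu_hotplug_done(void)
	{
		/* ... wake up the waiting readers as before ... */

		current->cpuhp_ref--;
		cpuhp_writer_task = NULL;
	}

and then the "cpuhp_writer_task == current" checks in __get_online_cpus()
and __put_online_cpus() are simply not needed.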
>  void cpu_hotplug_done(void)
>  {
> -	cpu_hotplug.active_writer = NULL;
> -	mutex_unlock(&cpu_hotplug.lock);
> +	/* Signal the writer is done */
> +	cpuhp_writer = 0;
> +	wake_up_all(&cpuhp_wq);
> +
> +	/* Wait for any pending readers to be running */
> +	cpuhp_writer_wait(!atomic_read(&cpuhp_waitcount));
> +	cpuhp_writer_task = NULL;

We also need to ensure that the next reader sees all changes done by the
writer; iow, this lacks "release" semantics.

But, Peter, the main question is: why is this better than
percpu_rw_semaphore performance-wise? (Assuming we add
task_struct->cpuhp_ref.)

If the writer is pending, percpu_down_read() does

	down_read(&brw->rw_sem);
	atomic_inc(&brw->slow_read_ctr);
	__up_read(&brw->rw_sem);

is it really much worse than wait_event + atomic_dec_and_test?

And! please note that with your implementation the new readers will likely
be blocked while the writer sleeps in synchronize_sched(). This doesn't
happen with percpu_rw_semaphore.

Oleg.
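P.S. Just to make sure we are talking about the same thing, the reader side
I have in mind with percpu_rw_semaphore is roughly this. Again only a
sketch, assuming we add task_struct->cpuhp_ref for the recursive case and
that cpu_hotplug_begin() takes its own reference as above; cpu_hotplug_rwsem
is just a name I made up here:

	/* initialized elsewhere with percpu_init_rwsem() */
	static struct percpu_rw_semaphore cpu_hotplug_rwsem;

	void get_online_cpus(void)
	{
		might_sleep();

		/* recursive use by this task, or the writer itself */
		if (current->cpuhp_ref++)
			return;

		percpu_down_read(&cpu_hotplug_rwsem);
	}

	void put_online_cpus(void)
	{
		if (--current->cpuhp_ref)
			return;

		percpu_up_read(&cpu_hotplug_rwsem);
	}

and cpu_hotplug_begin/done simply do percpu_down_write/percpu_up_write
around the inc/dec of current->cpuhp_ref.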