Date: Tue, 13 Jun 2017 21:15:47 -0400
From: Steven Rostedt
To: "Paul E. McKenney"
Cc: Peter Zijlstra, Krister Johansen, Ingo Molnar,
 linux-kernel@vger.kernel.org, Paul Gortmaker, Thomas Gleixner
Subject: Re: [PATCH tip/sched/core] Add comments to aid in safer usage of swake_up.
Message-ID: <20170613211547.49814d25@gandalf.local.home>
In-Reply-To: <20170613234205.GD3721@linux.vnet.ibm.com>
References: <20170609032546.GF2553@templeofstupid.com>
 <20170609071957.GJ8337@worktop.programming.kicks-ass.net>
 <20170609124554.GF3721@linux.vnet.ibm.com>
 <20170613192308.173dd86a@gandalf.local.home>
 <20170613234205.GD3721@linux.vnet.ibm.com>

On Tue, 13 Jun 2017 16:42:05 -0700
"Paul E. McKenney" wrote:

> On Tue, Jun 13, 2017 at 07:23:08PM -0400, Steven Rostedt wrote:
> > On Fri, 9 Jun 2017 05:45:54 -0700
> > "Paul E. McKenney" wrote:
> >
> > > On Fri, Jun 09, 2017 at 09:19:57AM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jun 08, 2017 at 08:25:46PM -0700, Krister Johansen wrote:
> > > > > The behavior of swake_up() differs from that of wake_up(), and from the
> > > > > swake_up() that came from RT linux. A memory barrier, or some other
> > > > > synchronization, is needed prior to a swake_up so that the waiter sees
> > > > > the condition set by the waker, and so that the waker does not see an
> > > > > empty wait list.
> > > >
> > > > Urgh.. let me stare at that. But it sounds like the wrong solution since
> > > > we wanted to keep the wait and swait APIs as close as possible.
> > >
> > > But don't they both need some sort of ordering, be it memory barriers or
> > > locking, to handle the case where the wait/swait doesn't actually sleep?
> >
> > Looking at an RCU example, and assuming that ordering can move around
> > within a spin lock, and that changes can leak into a spin lock region
> > from both before and after, could we have:
> >
> > (looking at __call_rcu_core() and rcu_gp_kthread())
> >
> >  CPU0                          CPU1
> >  ----                          ----
> >                                __call_rcu_core() {
> >
> >                                 spin_lock(rnp_root)
> >                                 need_wake = __rcu_start_gp() {
> >                                  rcu_start_gp_advanced() {
> >                                   gp_flags = FLAG_INIT
> >                                  }
> >                                 }
> >
> >  rcu_gp_kthread() {
> >   swait_event_interruptible(wq,
> >         gp_flags & FLAG_INIT) {
> >    spin_lock(q->lock)
> >
> >                                * fetch wq->task_list here! *
> >
> >    list_add(wq->task_list, q->task_list)
> >    spin_unlock(q->lock);
> >
> >    * fetch old value of gp_flags here *
>
> Both reads of ->gp_flags are READ_ONCE(), so having seen the new value
> in swait_event_interruptible(), this task/CPU cannot see the old value
> from some later access.  You have to have accesses to two different
> variables to require a memory barrier (at least assuming consistent use
> of READ_ONCE(), WRITE_ONCE(), or equivalent).

If I'm not mistaken, READ_ONCE() and WRITE_ONCE() are just volatile
casts. The compiler may not leak or move the fetches, but what about
the hardware?

A spin_lock() only needs to make sure that what comes after it does not
leak before it. A spin_unlock() only needs to make sure that what comes
before it does not leak after it. From my understanding of
memory-barriers.txt, there's no guarantee that the hardware doesn't let
reads or writes issued before a spin_lock() happen after it, nor that
reads or writes issued after a spin_unlock() happen before it. The spin
locks only need to protect the inside of the critical section, not keep
the outside from leaking in.

I'm looking at this in particular:

====
 (1) ACQUIRE operation implication:

     Memory operations issued after the ACQUIRE will be completed after
     the ACQUIRE operation has completed.

     Memory operations issued before the ACQUIRE may be completed after
     the ACQUIRE operation has completed.  An smp_mb__before_spinlock(),
     combined with a following ACQUIRE, orders prior stores against
     subsequent loads and stores.  Note that this is weaker than smp_mb()!
     The smp_mb__before_spinlock() primitive is free on many architectures.

 (2) RELEASE operation implication:

     Memory operations issued before the RELEASE will be completed before
     the RELEASE operation has completed.

     Memory operations issued after the RELEASE may be completed before
     the RELEASE operation has completed.
====
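Putting (2) together with the diagram above, the waker side looks to me
like the classic store-buffering pattern. A hand-simplified sketch of
what I mean (names approximate, not the exact RCU code):

	/* CPU1, the waker: */

	raw_spin_lock_irqsave(&rnp_root->lock, flags);
	WRITE_ONCE(rsp->gp_flags, RCU_GP_FLAG_INIT);	/* store the condition */
	raw_spin_unlock_irqrestore(&rnp_root->lock, flags);	/* RELEASE */

	/*
	 * swake_up() starts with a bare swait_active(), i.e. a plain
	 * list_empty() load. By rule (2) that load may complete before
	 * the RELEASE does -- effectively inside the critical section --
	 * where nothing orders it against the gp_flags store. An
	 * smp_mb() here would forbid that.
	 */
	swake_up(&rsp->gp_wq);

That is, the waker's task_list load can be satisfied while the waiter's
list_add() is still invisible, at the same time as the waiter's
gp_flags load is satisfied with the old value.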
-- Steve

> >                                 spin_unlock(rnp_root)
> >
> >                                 rcu_gp_kthread_wake() {
> >                                  swake_up(wq) {
> >                                   swait_active(wq) {
> >                                    list_empty(wq->task_list)
> >
> >                                   } * return false *
> >
> >   if (condition) * false *
> >     schedule();
> >
> > Looks like a memory barrier is missing. Perhaps we should slap one into
> > swait_active()? I don't think it is wise to let users add their own, as
> > I think we currently have bugs now.
>
> I -know- I have bugs now.  ;-)
>
> But I don't believe this is one of them.  Or am I getting confused?
>
> 							Thanx, Paul
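To be concrete about the "slap one into swait_active()" idea quoted
above, I mean something like this (untested, and it pessimizes every
caller, so maybe the barrier belongs in swake_up() or its callers
instead):

static inline int swait_active(struct swait_queue_head *q)
{
	/*
	 * Order this CPU's prior condition store against the
	 * task_list load below, pairing with the waiter queueing
	 * itself under q->lock. Without this, the load can be
	 * satisfied early and we can miss a just-added waiter.
	 */
	smp_mb();
	return !list_empty(&q->task_list);
}

Whether the waiter side of swait_event_interruptible() then needs a
matching barrier is a separate question.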