Date: Thu, 16 Apr 2015 20:02:27 +0200
From: Ingo Molnar
To: Peter Zijlstra
Cc: Steven Rostedt, Mel Gorman, Rik van Riel, Jason Low, Linus Torvalds,
	Thomas Gleixner, linux-kernel@vger.kernel.org, "Paul E. McKenney",
	Andrew Morton, Oleg Nesterov, Mike Galbraith, Frederic Weisbecker,
	Mel Gorman, Preeti U Murthy, hideaki.kimura@hp.com,
	Aswin Chandramouleeswaran, Scott J Norton
Subject: Re: [PATCH 1/3] sched, timer: Remove usages of ACCESS_ONCE in the scheduler
Message-ID: <20150416180227.GB17401@gmail.com>
References: <1429052986-9420-1-git-send-email-jason.low2@hp.com>
	<1429052986-9420-2-git-send-email-jason.low2@hp.com>
	<20150414195906.3adc89d9@gandalf.local.home>
	<1429063953.7039.88.camel@j-VirtualBox>
	<20150414224059.061ec5bf@grimm.local.home>
	<20150415074601.GC13449@gmail.com>
	<20150416165224.GD12676@worktop.ger.corp.intel.com>
In-Reply-To: <20150416165224.GD12676@worktop.ger.corp.intel.com>


* Peter Zijlstra wrote:

> On Wed, Apr 15, 2015 at 09:46:01AM +0200, Ingo Molnar wrote:
> > > @@ -2088,7 +2088,7 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags)
> > >
> > >  static void reset_ptenuma_scan(struct task_struct *p)
> > >  {
> > > -	ACCESS_ONCE(p->mm->numa_scan_seq)++;
> > > +	WRITE_ONCE(p->mm->numa_scan_seq, READ_ONCE(p->mm->numa_scan_seq) + 1);
>
> vs
>
> 		seq = ACCESS_ONCE(p->mm->numa_scan_seq);
> 		if (p->numa_scan_seq == seq)
> 			return;
> 		p->numa_scan_seq = seq;
>
>
> > So the original ACCESS_ONCE() barriers were misguided to begin with: I
> > think they tried to handle races with the scheduler balancing softirq
> > and tried to avoid having to use atomics for the sequence counter
> > (which would be overkill), but things like ACCESS_ONCE(x)++ never
> > guaranteed atomicity (or even coherency) of the update.
> >
> > But since in reality this is only statistical sampling code, all these
> > compiler barriers can be removed I think. Peter, Mel, Rik, do you
> > agree?
>
> ACCESS_ONCE() is not a compiler barrier

It's not a general compiler barrier (and I didn't claim so) but it is
still a compiler barrier: it's documented as a weak, variable-specific
barrier in Documentation/memory-barriers.txt:

    COMPILER BARRIER
    ----------------

    The Linux kernel has an explicit compiler barrier function that prevents the
    compiler from moving the memory accesses either side of it to the other side:

	barrier();

    This is a general barrier -- there are no read-read or write-write variants
    of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
    for barrier() that affects only the specific accesses flagged by the
    ACCESS_ONCE().

[...]

> The 'read' side uses ACCESS_ONCE() for two purposes:
>  - to load the value once, we don't want the seq number to change under
>    us for obvious reasons
>  - to avoid load tearing and observe weird seq numbers
>
> The update side uses ACCESS_ONCE() to avoid write tearing, and
> strictly speaking it should also worry about read-tearing since its
> not hard serialized, although its very unlikely to actually have
> concurrency (IIRC).

So what bad effects can there be from the very unlikely read and write
tearing?

AFAICS nothing particularly bad.
On the read side:

		seq = ACCESS_ONCE(p->mm->numa_scan_seq);
		if (p->numa_scan_seq == seq)
			return;
		p->numa_scan_seq = seq;

If p->mm->numa_scan_seq gets loaded twice (very unlikely), and the two
loads return different values, then we might get a 'double' NUMA
placement run - i.e. statistical noise.

On the update side:

	ACCESS_ONCE(p->mm->numa_scan_seq)++;
	p->mm->numa_scan_offset = 0;

If the compiler tears that up we might skip an update - again
statistical noise at worst.

Nor is compiler tearing the only theoretical worry here: in theory,
with long cache coherency latencies we might get two updates 'mixed
up', resulting in a (single) missed update. Only atomics would
solve all the races, but I think that would be overdoing it.

This is what I meant when I said that there's no harm from this race.

Thanks,

	Ingo
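
For illustration only, the 'mixed up' update case can be replayed
deterministically in userspace. This is a minimal sketch, not kernel code:
the READ_ONCE()/WRITE_ONCE() macros below are simplified stand-ins
(assumed here to be plain volatile accesses), the function names are
hypothetical, and the two "CPUs" are just an interleaving written out by
hand in a single thread. It shows that a load followed by a store loses
one of two increments when both loads happen first, whereas an atomic
read-modify-write does not.

```c
#include <stdatomic.h>

/*
 * Simplified userspace stand-ins for the kernel macros (assumption:
 * reduced to a single volatile access; the kernel versions do more).
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/*
 * Hypothetical replay of two CPUs racing through an update like
 *   WRITE_ONCE(seq, READ_ONCE(seq) + 1);
 * where both loads complete before either store. Each access happens
 * exactly once and untorn, yet one increment is still lost.
 */
int racy_double_increment(int start)
{
	int numa_scan_seq = start;

	int cpu0 = READ_ONCE(numa_scan_seq);	/* CPU 0 loads */
	int cpu1 = READ_ONCE(numa_scan_seq);	/* CPU 1 loads the same value */

	WRITE_ONCE(numa_scan_seq, cpu0 + 1);	/* CPU 0 stores start + 1 */
	WRITE_ONCE(numa_scan_seq, cpu1 + 1);	/* CPU 1 stores start + 1 again */

	return numa_scan_seq;			/* start + 1, not start + 2 */
}

/*
 * The same two increments done as atomic read-modify-write operations:
 * no interleaving of the two calls can lose an update.
 */
int atomic_double_increment(int start)
{
	atomic_int numa_scan_seq;

	atomic_init(&numa_scan_seq, start);
	atomic_fetch_add(&numa_scan_seq, 1);
	atomic_fetch_add(&numa_scan_seq, 1);

	return atomic_load(&numa_scan_seq);	/* start + 2 */
}
```

With a starting value of 0, racy_double_increment() returns 1 and
atomic_double_increment() returns 2. For a counter that only feeds
statistical sampling, the single missed update in the first case is the
"noise" discussed above, which is why atomics look like overkill there.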