Date: Sat, 18 Oct 2008 03:53:23 +0200
From: Nick Piggin
To: Hugh Dickins
Cc: Linus Torvalds, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-arch@vger.kernel.org
Subject: Re: [patch] mm: fix anon_vma races
Message-ID: <20081018015323.GA11149@wotan.suse.de>
References: <20081016041033.GB10371@wotan.suse.de>

On Sat, Oct 18, 2008 at 01:13:16AM +0100, Hugh Dickins wrote:
> On Fri, 17 Oct 2008, Linus Torvalds wrote:
> > would be more obvious in the place where we actually fetch that "anon_vma"
> > pointer again and actually dereference it.
> >
> > HOWEVER:
> >
> >  - there are potentially multiple places that do that, and putting it in
> >    the anon_vma_prepare() thing not only matches things with the
> >    smp_wmb(), making that whole pairing much more obvious, but it also
> >    means that we're guaranteed that any anon_vma user will have done the
> >    smp_read_barrier_depends(), since they all have to do that prepare
> >    thing anyway.
>
> No, it's not so that any anon_vma user would have done the
> smp_read_barrier_depends() placed in anon_vma_prepare().
>
> Anyone faulting in a page would have done it (swapoff? that
> assumes it's been done, let's not worry about it right now).
>
> But they're doing it to make the page's ptes accessible to
> memory reclaim, and the CPU doing memory reclaim will not
> (unless by coincidence) have done that anon_vma_prepare() -
> it's just reading the links which the faulters are providing.

Yes, that's a very important flaw you point out with the fix. Good
spotting.

Actually another thing I was staying awake thinking about was the
pairwise consistency problem. "Apparently" Linux is supposed to
support arbitrary pairwise consistency.

This means:

CPU0
anon_vma->initialized = 1;
smp_wmb();
vma->anon_vma = anon_vma;

CPU1
if (vma->anon_vma)
	page->anon_vma = vma->anon_vma;

CPU2
if (page->anon_vma) {
	smp_read_barrier_depends();
	assert(page->anon_vma->initialized);
}

The assertion may trigger because the store from CPU0 may not have
propagated to CPU2 before the stores from CPU1.

But after thinking about this a bit more, I think Linux would be
broken all over the map under such ordering schemes. I think we'd
have to mandate causal consistency. Are there any architectures we
run on where this is not guaranteed? (I think recent clarifications
to x86 ordering give us CC on that architecture).

powerpc, ia64, alpha, sparc, arm, mips? (cc'ed linux-arch)
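
To make the three-CPU case concrete, here is a toy userspace model of
it. This is only a sketch and not kernel code: it assumes each CPU has
its own view of memory and that a store becomes visible to each other
CPU separately. The names view, deliver(), assertion_fails() and the
"causal" switch are all made up for illustration, and pointers are
modelled as 0 (NULL) or 1 (set). It replays the single interleaving
described above under the two propagation rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code: each CPU has its own view of three words. */
enum { INIT, VMA, PAGE, NLOC };	/* initialized flag, vma ptr, page ptr */
enum { NCPU = 3 };

static int view[NCPU][NLOC];

/* Make one store visible to one observing CPU. */
static void deliver(int to, int loc, int val)
{
	assert(to >= 0 && to < NCPU && loc >= 0 && loc < NLOC);
	view[to][loc] = val;
}

/*
 * Replay the interleaving from the mail. Returns true if CPU2's
 * assertion would fire. "causal" selects the propagation rule.
 */
static bool assertion_fails(bool causal)
{
	for (int cpu = 0; cpu < NCPU; cpu++)
		for (int loc = 0; loc < NLOC; loc++)
			view[cpu][loc] = 0;

	/* CPU0: set initialized; smp_wmb(); publish anon_vma in the vma. */
	deliver(0, INIT, 1);
	deliver(0, VMA, 1);
	/* smp_wmb(): each observer sees INIT before VMA. CPU1 has both... */
	deliver(1, INIT, 1);
	deliver(1, VMA, 1);
	/* ...while delivery of both stores to CPU2 is still pending. */

	/* CPU1: if (vma->anon_vma) page->anon_vma = vma->anon_vma; */
	if (view[1][VMA]) {
		deliver(1, PAGE, view[1][VMA]);
		if (causal) {
			/* Causal rule: what CPU1 had seen reaches CPU2 first. */
			deliver(2, INIT, view[1][INIT]);
			deliver(2, VMA, view[1][VMA]);
		}
		deliver(2, PAGE, view[1][PAGE]);
	}

	/*
	 * CPU2: the dependency barrier orders CPU2's own two loads, but it
	 * cannot make a store visible that has not arrived yet.
	 */
	if (view[2][PAGE])
		return view[2][INIT] == 0;
	return false;
}
```

In this model assertion_fails(false) returns true (pairwise ordering
only: CPU2 sees the page pointer but not the initialized flag), and
assertion_fails(true) returns false (causal propagation). It says
nothing about what any real architecture does; it only restates the
question being asked.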