Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753813AbYKMCbz (ORCPT ); Wed, 12 Nov 2008 21:31:55 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1752408AbYKMCbo (ORCPT ); Wed, 12 Nov 2008 21:31:44 -0500 Received: from mx2.redhat.com ([66.187.237.31]:35315 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752283AbYKMCbn (ORCPT ); Wed, 12 Nov 2008 21:31:43 -0500 Date: Thu, 13 Nov 2008 03:31:35 +0100 From: Andrea Arcangeli To: Lee Schermerhorn Cc: Christoph Lameter , Andrew Morton , Izik Eidus , linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, chrisw@redhat.com, avi@redhat.com, izike@qumranet.com Subject: Re: [PATCH 2/4] Add replace_page(), change the mapping of pte from one page into another Message-ID: <20081113023135.GD10818@random.random> References: <20081111221753.GK10818@random.random> <20081111231722.GR10818@random.random> <20081112022701.GT10818@random.random> <20081112173258.GX10818@random.random> <1226527744.7560.93.camel@lts-notebook> <20081113020059.GC10818@random.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20081113020059.GC10818@random.random> Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2132 Lines: 45 On Thu, Nov 13, 2008 at 03:00:59AM +0100, Andrea Arcangeli wrote: > CPU0 migrate.c CPU1 filemap.c > ------- ---------- > find_get_page > radix_tree_lookup_slot returns the oldpage > page_count still = expected_count > freeze_ref (oldpage->count = 0) > radix_tree_replace (too late, other side already got the oldpage) > unfreeze_ref (oldpage->count = 2) > page_cache_get_speculative(old_page) > set count to 3 and succeeds After reading more of this lockless radix tree code, I realized this below check is the one that was intended to restart find_get_page and prevent it to return the oldpage: if (unlikely(page != *pagep)) { But there's no barrier there, atomic_add_unless would need to provide an atomic smp_mb() _after_ atomic_add_unless executed. In the old days the atomic_* routines had no implied memory barriers, you had to use smp_mb__after_atomic_add_unless if you wanted to avoid the race. I don't see much in the ppc implementation of atomic_add_unless that would provide an implicit smb_mb after the page_cache_get_speculative returns, so I can't see why the pagep can't be by find_get_page read before the other cpu executes radix_tree_replace in the above timeline. I guess you intended to put an smp_mb() in between the page_cache_get_speculative and the *pagep to make the code safe on ppc too, but there isn't, and I think it must be fixed, either that or I don't understand ppc assembly right. The other side has a smp_wmb implicit inside radix_tree_replace_slot so it should be ok already to ensure we see the refcount going to 0 before we see the pagep changed (the fact the other side has a memory barrier, further confirms this side needs it too). BTW, the radix_tree_deref_slot might miss a rcu_barrier_depends() after radix_tree_deref_slot returns but I'm not entirely sure and only alpha would be affected in the worst case. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/