Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1765388AbYCGUA6 (ORCPT ); Fri, 7 Mar 2008 15:00:58 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1754196AbYCGUAt (ORCPT ); Fri, 7 Mar 2008 15:00:49 -0500 Received: from relay1.sgi.com ([192.48.171.29]:50577 "EHLO relay.sgi.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1753950AbYCGUAs (ORCPT ); Fri, 7 Mar 2008 15:00:48 -0500 Date: Fri, 7 Mar 2008 12:00:26 -0800 (PST) From: Christoph Lameter X-X-Sender: clameter@schroedinger.engr.sgi.com To: Andrea Arcangeli cc: Jack Steiner , Nick Piggin , akpm@linux-foundation.org, Robin Holt , Avi Kivity , kvm-devel@lists.sourceforge.net, Peter Zijlstra , general@lists.openfabrics.org, Steve Wise , Roland Dreier , Kanoj Sarcar , linux-kernel@vger.kernel.org, linux-mm@kvack.org, daniel.blueman@quadrics.com Subject: Re: [PATCH] 3/4 combine RCU with seqlock to allow mmu notifier methods to sleep (#v9 was 1/4) In-Reply-To: <20080307152328.GE24114@v2.random> Message-ID: References: <20080302155457.GK8091@v2.random> <20080303213707.GA8091@v2.random> <20080303220502.GA5301@v2.random> <47CC9B57.5050402@qumranet.com> <20080304133020.GC5301@v2.random> <20080304222030.GB8951@v2.random> <20080307151722.GD24114@v2.random> <20080307152328.GE24114@v2.random> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3443 Lines: 97 On Fri, 7 Mar 2008, Andrea Arcangeli wrote: > This combines the non-sleep-capable RCU locking of #v9 with a seqlock > so the mmu notifier fast path will require zero cacheline > writes/bouncing while still providing mmu_notifier_unregister and > allowing to schedule inside the mmu notifier methods. If we drop > mmu_notifier_unregister we can as well drop all seqlock and > rcu_read_lock()s. But this locking scheme combination is sexy enough > and 100% scalable (the mmu_notifier_list cacheline will be preloaded > anyway and that will most certainly include the sequence number value > in l1 for free even in Christoph's NUMA systems) so IMHO it worth to > keep mmu_notifier_unregister. Well its adds lots of processing. Not sure if its really worth it. Seems that this scheme cannot work since the existence of the structure passed to the callbacks is not guaranteed since the RCU locks are not held. You need some kind of a refcount to give the existence guarantee. > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c > --- a/mm/mmu_notifier.c > +++ b/mm/mmu_notifier.c > @@ -20,7 +20,9 @@ void __mmu_notifier_release(struct mm_st > void __mmu_notifier_release(struct mm_struct *mm) > { > struct mmu_notifier *mn; > + unsigned seq; > > + seq = read_seqbegin(&mm->mmu_notifier_lock); > while (unlikely(!hlist_empty(&mm->mmu_notifier_list))) { > mn = hlist_entry(mm->mmu_notifier_list.first, > struct mmu_notifier, > @@ -28,6 +30,7 @@ void __mmu_notifier_release(struct mm_st > hlist_del(&mn->hlist); > if (mn->ops->release) > mn->ops->release(mn, mm); > + BUG_ON(read_seqretry(&mm->mmu_notifier_lock, seq)); > } > } So this is only for sanity checking? The BUG_ON detects concurrent operations that should not happen? Need a comment here. > @@ -42,11 +45,19 @@ int __mmu_notifier_clear_flush_young(str > struct mmu_notifier *mn; > struct hlist_node *n; > int young = 0; > + unsigned seq; > > rcu_read_lock(); > +restart: > + seq = read_seqbegin(&mm->mmu_notifier_lock); > hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > - if (mn->ops->clear_flush_young) > + if (mn->ops->clear_flush_young) { > + rcu_read_unlock(); > young |= mn->ops->clear_flush_young(mn, mm, address); > + rcu_read_lock(); > + } > + if (read_seqretry(&mm->mmu_notifier_lock, seq)) > + goto restart; Great innovative idea of the seqlock for versioning checks. > } > rcu_read_unlock(); > Well that gets pretty sophisticated here. If you drop the rcu lock then the entity pointed to by mn can go away right? So how can you pass that structure to clear_flush_young? What is guaranteeing the existence of the structure? > @@ -58,11 +69,19 @@ void __mmu_notifier_invalidate_page(stru > { > struct mmu_notifier *mn; > struct hlist_node *n; > + unsigned seq; > > rcu_read_lock(); > +restart: > + seq = read_seqbegin(&mm->mmu_notifier_lock); > hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_list, hlist) { > - if (mn->ops->invalidate_page) > + if (mn->ops->invalidate_page) { > + rcu_read_unlock(); > mn->ops->invalidate_page(mn, mm, address); Ditto structure can vanish since no existence guarantee exists. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/