Date: Thu, 11 Jun 2015 10:23:14 -0400
From: Jerome Glisse
To: Mark Hairgrove
Cc: "akpm@linux-foundation.org", "linux-kernel@vger.kernel.org",
    "linux-mm@kvack.org", Linus Torvalds, "joro@8bytes.org", Mel Gorman,
    "H. Peter Anvin", Peter Zijlstra, Andrea Arcangeli, Johannes Weiner,
    Larry Woodman, Rik van Riel, Dave Airlie, Brendan Conoboy, Joe Donohue,
    Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard, Lucien Dunning,
    Cameron Buschardt, Arvind Gopalakrishnan, Haggai Eran, Shachar Raindel,
    Liran Liss, Roland Dreier, Ben Sander, Greg Stoner, John Bridgman,
    Michael Mantor, Paul Blinzer, Laurent Morichetti, Alexander Deucher,
    Oded Gabbay, Jérôme Glisse, Jatin Kumar, "linux-rdma@vger.kernel.org"
Subject: Re: [PATCH 05/36] HMM: introduce heterogeneous memory management v3.
Message-ID: <20150611142313.GA26195@gmail.com>
References: <1432236705-4209-1-git-send-email-j.glisse@gmail.com>
    <1432236705-4209-6-git-send-email-j.glisse@gmail.com>
    <20150608211740.GA5241@gmail.com> <20150609155601.GA3101@gmail.com>
    <20150610154237.GA13465@gmail.com>

On Wed, Jun 10, 2015 at 06:15:08PM -0700, Mark Hairgrove wrote:

[...]
> > There is no race here, the mirror struct will only be freed once as again
> > the list is a synchronization point.
> > Whoever remove the mirror from the
> > list is responsible to drop the list reference.
> >
> > In the fixed code the only thing that will happen twice is the ->release()
> > callback. Even that can be work around to garanty it is call only once.
> >
> > Anyway i do not see anyrace here.
> >
>
> The mirror lifetime is fine. The problem I see is with the device lifetime
> on a multi-core system. Imagine this sequence:
>
> - On CPU1 the mm associated with the mirror is going down
> - On CPU2 the driver unregisters the mirror then the device
>
> When this happens, the last device mutex_unlock on CPU1 is the only thing
> preventing the free of the device in CPU2. That doesn't work, as described
> in this thread: https://lkml.org/lkml/2013/12/2/997
>
> Here's the full sequence again with mutex_unlock split apart. Hopefully
> this shows the device_unregister problem more clearly:
>
> CPU1 (mm release)                   CPU2 (driver)
> ----------------------              ----------------------
> hmm_notifier_release
>   down_write(&hmm->rwsem);
>   hlist_del_init(&mirror->mlist);
>   up_write(&hmm->rwsem);
>
>   // CPU1 thread is preempted or
>   // something
>                                     hmm_mirror_unregister
>                                       hmm_mirror_kill
>                                         down_write(&hmm->rwsem);
>                                         // mirror removed by CPU1 already
>                                         // so hlist_unhashed returns 1
>                                         up_write(&hmm->rwsem);
>
>                                       hmm_mirror_unref(&mirror);
>                                       // Mirror ref now 1
>
>                                       // CPU2 thread is preempted or
>                                       // something
>   // CPU1 thread is scheduled
>
>   hmm_mirror_unref(&mirror);
>     // Mirror ref now 0, cleanup
>     hmm_mirror_destroy(&mirror)
>       mutex_lock(&device->mutex);
>       list_del_init(&mirror->dlist);
>       device->ops->release(mirror);
>       kfree(mirror);
>                                       // CPU2 thread is scheduled, now
>                                       // both CPU1 and CPU2 are running
>
>                                     hmm_device_unregister
>                                       mutex_lock(&device->mutex);
>                                         mutex_optimistic_spin()
>       mutex_unlock(&device->mutex);
>         [...]
>         __mutex_unlock_common_slowpath
>           // CPU2 releases lock
>           atomic_set(&lock->count, 1);
>                                         // Spinning CPU2 acquires now-
>                                         // free lock
>                                       // mutex_lock returns
>                                       // Device list empty
>                                       mutex_unlock(&device->mutex);
>                                       return 0;
>                                     kfree(hmm_device);
>       // CPU1 still accessing
>       // hmm_device->mutex in
>       //__mutex_unlock_common_slowpath

Ok, I see the race you are afraid of, and really it is an unlikely one.
__mutex_unlock_common_slowpath() takes a spinlock right after allowing
others to take the mutex. In your scenario there is no contention on that
spinlock, so it is taken right away, and as there is no one on the mutex
wait list it goes directly to unlocking the spinlock and returns. You can
ignore the debug functions: if debugging is enabled then mutex_lock() also
needs to take the spinlock, and thus you would have proper synchronization
between the 2 threads thanks to mutex.wait_lock.

So basically, while CPU1 is going through:

    spin_lock(mutex.wait_lock)
    if (!list_empty(mutex.wait_list)) {
        // wait_list is empty so branch not taken
    }
    spin_unlock(mutex.wait_lock)

CPU2 would have to test the mirror list, mutex_unlock() and return before
the spin_unlock() of CPU1. This is a tight race. I can add a
synchronize_rcu() to device_unregister after the mutex_unlock() so that we
also add a grace period before the device is potentially freed, which
should make that race extremely unlikely.

Moreover, for something really bad to happen, the freed memory would need
to be reallocated right away by some other thread, which really sounds
unlikely unless CPU1 is the slowest of all :)

Cheers,
Jérôme
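
To make the window under discussion concrete, here is a minimal userspace
sketch of the unlock ordering described above. It is only a model under
stated assumptions, not the kernel implementation: struct toy_mutex, the
toy_* helpers and the step names are all hypothetical, and the real slowpath
has more cases (waiters, debug hooks) than shown here. It records the steps
the unlocker performs so one can see that the mutex memory is still touched
after the count has been released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct mutex. */
struct toy_mutex {
    atomic_int count;      /* 1 = unlocked, 0 = locked, as in the quoted trace */
    atomic_bool wait_lock; /* stand-in for the wait_lock spinlock */
    int nr_waiters;        /* stand-in for the wait_list (0 = empty) */
};

/* Steps performed by the unlocker, recorded in order. */
enum toy_step {
    STEP_RELEASE_COUNT,
    STEP_LOCK_WAIT_LOCK,
    STEP_CHECK_WAIT_LIST,
    STEP_UNLOCK_WAIT_LOCK
};

struct toy_trace {
    enum toy_step steps[8];
    size_t len;
};

static void toy_record(struct toy_trace *t, enum toy_step s)
{
    if (t->len < sizeof(t->steps) / sizeof(t->steps[0]))
        t->steps[t->len++] = s;
}

static void toy_mutex_init(struct toy_mutex *m)
{
    atomic_init(&m->count, 1);
    atomic_init(&m->wait_lock, false);
    m->nr_waiters = 0;
}

/* Fast-path style acquire: succeeds only if count goes from 1 to 0. */
static bool toy_mutex_trylock(struct toy_mutex *m)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}

static void toy_mutex_unlock_slowpath(struct toy_mutex *m, struct toy_trace *t)
{
    /* From this store on, another CPU can acquire the mutex. */
    atomic_store(&m->count, 1);
    toy_record(t, STEP_RELEASE_COUNT);

    /* The unlocker nevertheless keeps touching the mutex memory. */
    while (atomic_exchange(&m->wait_lock, true))
        ; /* spin; uncontended in the scenario discussed */
    toy_record(t, STEP_LOCK_WAIT_LOCK);

    if (m->nr_waiters != 0) {
        /* would wake a waiter; branch not taken when the list is empty */
    }
    toy_record(t, STEP_CHECK_WAIT_LIST);

    atomic_store(&m->wait_lock, false);
    toy_record(t, STEP_UNLOCK_WAIT_LOCK);
}

/* Count the accesses made after the mutex became acquirable. */
static size_t toy_accesses_after_release(const struct toy_trace *t)
{
    size_t i, n = 0;
    bool released = false;

    for (i = 0; i < t->len; i++) {
        if (released)
            n++;
        if (t->steps[i] == STEP_RELEASE_COUNT)
            released = true;
    }
    return n;
}

/* Lock, unlock through the modelled slowpath, report the window size. */
static size_t toy_demo(void)
{
    struct toy_mutex m;
    struct toy_trace t = { .len = 0 };

    toy_mutex_init(&m);
    assert(toy_mutex_trylock(&m));
    toy_mutex_unlock_slowpath(&m, &t);
    return toy_accesses_after_release(&t);
}
```

In this model every step recorded after STEP_RELEASE_COUNT is an access that
would hit freed memory if the other CPU acquired the mutex, released it and
freed the containing object in between. That is the window the quoted trace
points at, and the one a grace period after mutex_unlock() is meant to cover.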