Date: Mon, 31 Aug 2015 20:27:17 -0700
From: Mark Hairgrove
To: Jérôme Glisse
CC: Linus Torvalds, Mel Gorman, "H. Peter Anvin", Peter Zijlstra,
    Andrea Arcangeli, Johannes Weiner, Larry Woodman, Rik van Riel,
    Dave Airlie, Brendan Conoboy, Joe Donohue, Christophe Harle,
    Duncan Poole, Sherry Cheung, Subhash Gutti, John Hubbard,
    Lucien Dunning, Cameron Buschardt, Arvind Gopalakrishnan,
    Haggai Eran, Shachar Raindel, Liran Liss, Roland Dreier,
    Ben Sander, Greg Stoner, John Bridgman, Michael Mantor,
    Paul Blinzer, Leonid Shamis, Laurent Morichetti, Alexander Deucher
Subject: Re: [PATCH 02/15] mmu_notifier: keep track of active invalidation ranges v4
In-Reply-To: <1439493328-1028-3-git-send-email-jglisse@redhat.com>
References: <1439493328-1028-1-git-send-email-jglisse@redhat.com>
            <1439493328-1028-3-git-send-email-jglisse@redhat.com>

On Thu, 13 Aug 2015, Jérôme Glisse wrote:

> The invalidate_range_start() and invalidate_range_end() can be
> considered as forming an "atomic" section for the cpu page table
> update point of view. Between this two function the cpu page
> table content is unreliable for the address range being
> invalidated.
>
> This patch use a structure define at all place doing range
> invalidation. This structure is added to a list for the duration
> of the update ie added with invalid_range_start() and removed
> with invalidate_range_end().
>
> Helpers allow querying if a range is valid and wait for it if
> necessary.
>
> For proper synchronization, user must block any new range
> invalidation from inside there invalidate_range_start() callback.

s/there/their/

> Otherwise there is no garanty that a new range invalidation will

s/garanty/guarantee/

> not be added after the call to the helper function to query for
> existing range.
>
> [...]
>
> +/* mmu_notifier_range_is_valid_locked() - test if range overlap with active

s/overlap/overlaps/

> + * invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + * Returns: false if overlap with an active invalidation, true otherwise.
> + *
> + * This function test whether any active invalidated range conflict with a

s/test/tests/
s/invalidated/invalidation/
s/conflict/conflicts/

> + * given range ([start, end[), active invalidation are added to a list inside

end[ -> end)
s/invalidation/invalidations/

> + * __mmu_notifier_invalidate_range_start() and removed from that list inside
> + * __mmu_notifier_invalidate_range_end().
> + */
> +static bool mmu_notifier_range_is_valid_locked(struct mm_struct *mm,
> +					       unsigned long start,
> +					       unsigned long end)
> +{
> +	struct mmu_notifier_range *range;
> +
> +	list_for_each_entry(range, &mm->mmu_notifier_mm->ranges, list) {
> +		if (range->end > start && range->start < end)
> +			return false;
> +	}
> +	return true;
> +}
> +
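As an aside, the overlap test above does read correctly for half-open
[start, end) intervals. For anyone who wants to convince themselves, here
is a tiny standalone userspace check of the same predicate (the values are
made up for illustration, not part of the patch):

/* Same test as (range->end > start && range->start < end), applied to
 * half-open [start, end) intervals. Build with: cc -o ov ov.c && ./ov */
#include <assert.h>
#include <stdbool.h>

static bool overlaps(unsigned long s1, unsigned long e1,
		     unsigned long s2, unsigned long e2)
{
	return e1 > s2 && s1 < e2;
}

int main(void)
{
	assert(overlaps(0x1000, 0x3000, 0x2000, 0x4000));	/* partial overlap */
	assert(overlaps(0x1000, 0x4000, 0x2000, 0x3000));	/* full containment */
	assert(!overlaps(0x1000, 0x2000, 0x2000, 0x3000));	/* adjacent: end is exclusive */
	return 0;
}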
> +/* mmu_notifier_range_is_valid() - test if range overlap with active

s/overlap/overlaps/

> + * invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + *
> + * This function wait for any active range invalidation that conflict with the
> + * given range, to end. See mmu_notifier_range_wait_valid() on how to use this
> + * function properly.

Bad copy/paste from range_wait_valid? mmu_notifier_range_is_valid just
queries the state, it doesn't wait.

> + */
> +bool mmu_notifier_range_is_valid(struct mm_struct *mm,
> +				 unsigned long start,
> +				 unsigned long end)
> +{
> +	bool valid;
> +
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	valid = mmu_notifier_range_is_valid_locked(mm, start, end);
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +	return valid;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_is_valid);
> +
> +/* mmu_notifier_range_wait_valid() - wait for a range to have no conflict with
> + * active invalidation.
> + *
> + * @mm: The mm struct.
> + * @start: Start address of the range (inclusive).
> + * @end: End address of the range (exclusive).
> + *
> + * This function wait for any active range invalidation that conflict with the
> + * given range, to end.
> + *
> + * Note by the time this function return a new range invalidation that conflict
> + * might have started. So you need to atomically block new range and query
> + * again if range is still valid with mmu_notifier_range_is_valid(). So call
> + * sequence should be :
> + *
> + * again:
> + *   mmu_notifier_range_wait_valid()
> + *   // block new invalidation using that lock inside your range_start callback
> + *   lock_block_new_invalidation()
> + *   if (!mmu_notifier_range_is_valid())
> + *     goto again;
> + *   unlock()

I think this example sequence can deadlock so I wouldn't want to encourage
its use. New invalidation regions are added to the list before the
range_start callback is invoked.

Thread A                                Thread B
-----------------                       -----------------
mmu_notifier_range_wait_valid
// returns
                                        __mmu_notifier_invalidate_range_start
                                            list_add_tail
lock_block_new_invalidation
                                            ->invalidate_range_start
                                            // invalidation blocked in callback
mmu_notifier_range_is_valid
// fails
goto again
mmu_notifier_range_wait_valid
// deadlock

mmu_notifier_range_wait_valid can't finish until thread B's callback
returns, but thread B's callback can't return because it's blocked.

I see that HMM in later patches takes the approach of not holding the lock
when mmu_notifier_range_is_valid returns false. Instead of stalling new
invalidations it returns -EAGAIN to the caller. While that resolves the
deadlock, it won't prevent the faulting thread from being starved in the
pathological case.
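To make the starvation point concrete, the caller-side retry loop would
look roughly like this (a sketch only, not HMM's actual code; struct
example_dev and its mutex are made-up names, and the same mutex is assumed
to be taken by the device's ->invalidate_range_start() callback):

#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/mutex.h>

/* Hypothetical per-device state. */
struct example_dev {
	struct mutex inval_lock;	/* also held by our range_start callback */
};

static int example_dev_fault_range(struct example_dev *dev,
				   struct mm_struct *mm,
				   unsigned long start, unsigned long end)
{
	/* Wait out any conflicting invalidations already in flight. */
	mmu_notifier_range_wait_valid(mm, start, end);

	mutex_lock(&dev->inval_lock);
	if (!mmu_notifier_range_is_valid(mm, start, end)) {
		/*
		 * A new invalidation slipped in between the two calls.
		 * Dropping the lock before retrying avoids the deadlock
		 * above, but a steady stream of invalidations can make
		 * this path spin on -EAGAIN indefinitely.
		 */
		mutex_unlock(&dev->inval_lock);
		return -EAGAIN;
	}
	/* ... safe to mirror the CPU page tables into the device ... */
	mutex_unlock(&dev->inval_lock);
	return 0;
}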
Is it out of the question to build a lock into the mmu notifier API
directly? It's a little worrisome to me that the complexity for this
locking is pushed into the callbacks rather than handled in the core.
Something like this:

mmu_notifier_range_lock(start, end)
mmu_notifier_range_unlock(start, end)

(A rough sketch of what I have in mind is at the bottom of this mail.)

If that's not feasible and we have to stick with the current approach,
then I suggest changing the "valid" name. "valid" doesn't have a clear
meaning at first glance because the reader doesn't know what would make a
range "valid." How about "active" instead? Then the names would look
something like this, assuming the polarity matches their current versions:

mmu_notifier_range_inactive_locked
mmu_notifier_range_inactive
mmu_notifier_range_wait_active

> + */
> +void mmu_notifier_range_wait_valid(struct mm_struct *mm,
> +				   unsigned long start,
> +				   unsigned long end)
> +{
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
> +		int nranges = mm->mmu_notifier_mm->nranges;
> +
> +		spin_unlock(&mm->mmu_notifier_mm->lock);
> +		wait_event(mm->mmu_notifier_mm->wait_queue,
> +			   nranges != mm->mmu_notifier_mm->nranges);
> +		spin_lock(&mm->mmu_notifier_mm->lock);
> +	}
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_wait_valid);
> +
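As promised, here is roughly what I'm picturing for a core-provided range
lock. This is purely illustrative and untested: it reuses the fields this
patch already adds to struct mmu_notifier_mm and is meant to sit in
mm/mmu_notifier.c next to the helpers quoted above. A locked range is
published on the same list the invalidations use, so the wait/recheck
window is closed while the spinlock is held.

void mmu_notifier_range_lock(struct mm_struct *mm,
			     struct mmu_notifier_range *range,
			     unsigned long start, unsigned long end)
{
	range->start = start;
	range->end = end;

	spin_lock(&mm->mmu_notifier_mm->lock);
	while (!mmu_notifier_range_is_valid_locked(mm, start, end)) {
		int nranges = mm->mmu_notifier_mm->nranges;

		spin_unlock(&mm->mmu_notifier_mm->lock);
		wait_event(mm->mmu_notifier_mm->wait_queue,
			   nranges != mm->mmu_notifier_mm->nranges);
		spin_lock(&mm->mmu_notifier_mm->lock);
	}
	/*
	 * Insert while still holding the spinlock so no invalidation can
	 * slip in between the conflict check and the insertion.
	 */
	list_add_tail(&range->list, &mm->mmu_notifier_mm->ranges);
	mm->mmu_notifier_mm->nranges++;
	spin_unlock(&mm->mmu_notifier_mm->lock);
}

void mmu_notifier_range_unlock(struct mm_struct *mm,
			       struct mmu_notifier_range *range)
{
	spin_lock(&mm->mmu_notifier_mm->lock);
	list_del(&range->list);
	mm->mmu_notifier_mm->nranges--;
	spin_unlock(&mm->mmu_notifier_mm->lock);
	wake_up(&mm->mmu_notifier_mm->wait_queue);
}

Of course, for this to actually work __mmu_notifier_invalidate_range_start()
would have to do the same wait-then-insert dance instead of inserting
unconditionally, so treat this as the shape of the idea rather than a
drop-in patch.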