Subject: Re: [PATCH v3 hmm 03/11] mm/mmu_notifiers: add a get/put scheme for the registration
To: Jason Gunthorpe
CC: Andrea Arcangeli, Christoph Hellwig, John Hubbard, Jérôme Glisse,
 "Kuehling, Felix", Alex Deucher, Christian König, "David (ChunMing) Zhou",
 Dimitri Sivanich, Gavin Shan, Andrea Righi, Jason Gunthorpe,
 Christoph Hellwig
From: Ralph Campbell
Message-ID: <0a23adb8-b827-cd90-503e-bfa84166c67e@nvidia.com>
Date: Wed, 14 Aug 2019 14:20:31 -0700
In-Reply-To: <20190806231548.25242-4-jgg@ziepe.ca>
References: <20190806231548.25242-1-jgg@ziepe.ca> <20190806231548.25242-4-jgg@ziepe.ca>

On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe
>
> Many places in the kernel have a flow where userspace will create some
> object and that object will need to connect to the subsystem's
> mmu_notifier subscription for the duration of its lifetime.
>
> In this case the subsystem is usually tracking multiple mm_structs and it
> is difficult to keep track of what struct mmu_notifier's have been
> allocated for what mm's.
>
> Since this has been open coded in a variety of exciting ways, provide core
> functionality to do this safely.
>
> This approach uses the strct mmu_notifier_ops * as a key to determine if

s/strct/struct

> the subsystem has a notifier registered on the mm or not. If there is a
> registration then the existing notifier struct is returned, otherwise the
> ops->alloc_notifiers() is used to create a new per-subsystem notifier for
> the mm.
>
> The destroy side incorporates an async call_srcu based destruction which
> will avoid bugs in the callers such as commit 6d7c3cde93c1 ("mm/hmm: fix
> use after free with struct hmm in the mmu notifiers").
>
> Since we are inside the mmu notifier core locking is fairly simple, the
> allocation uses the same approach as for mmu_notifier_mm, the write side
> of the mmap_sem makes everything deterministic and we only need to do
> hlist_add_head_rcu() under the mm_take_all_locks(). The new users count
> and the discoverability in the hlist is fully serialized by the
> mmu_notifier_mm->lock.
>
> Co-developed-by: Christoph Hellwig
> Signed-off-by: Christoph Hellwig
> Signed-off-by: Jason Gunthorpe

Reviewed-by: Ralph Campbell

> ---
>  include/linux/mmu_notifier.h |  35 ++++++++
>  mm/mmu_notifier.c            | 156 +++++++++++++++++++++++++++++++++--
>  2 files changed, 185 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index b6c004bd9f6ad9..31aa971315a142 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -211,6 +211,19 @@ struct mmu_notifier_ops {
>  	 */
>  	void (*invalidate_range)(struct mmu_notifier *mn, struct mm_struct *mm,
>  				 unsigned long start, unsigned long end);
> +
> +	/*
> +	 * These callbacks are used with the get/put interface to manage the
> +	 * lifetime of the mmu_notifier memory. alloc_notifier() returns a new
> +	 * notifier for use with the mm.
> +	 *
> +	 * free_notifier() is only called after the mmu_notifier has been
> +	 * fully put, calls to any ops callback are prevented and no ops
> +	 * callbacks are currently running. It is called from a SRCU callback
> +	 * and cannot sleep.
> +	 */
> +	struct mmu_notifier *(*alloc_notifier)(struct mm_struct *mm);
> +	void (*free_notifier)(struct mmu_notifier *mn);
>  };
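For readers new to this interface, here is a minimal sketch of what a
subsystem's implementation of these two ops might look like. This is
hypothetical "mydev" driver code, not part of this patch, and the usual
invalidate callbacks are omitted:

struct mydev_notifier {
	struct mmu_notifier mn;	/* embedded, recovered via container_of() */
	/* ... per-mm subsystem state ... */
};

static struct mmu_notifier *mydev_alloc_notifier(struct mm_struct *mm)
{
	struct mydev_notifier *p;

	/* Called under the write side of mmap_sem, so sleeping is OK */
	p = kzalloc(sizeof(*p), GFP_KERNEL);
	if (!p)
		return ERR_PTR(-ENOMEM);
	return &p->mn;
}

static void mydev_free_notifier(struct mmu_notifier *mn)
{
	/* Runs from a SRCU callback and must not sleep; kfree() is fine */
	kfree(container_of(mn, struct mydev_notifier, mn));
}

static const struct mmu_notifier_ops mydev_mmu_notifier_ops = {
	.alloc_notifier = mydev_alloc_notifier,
	.free_notifier = mydev_free_notifier,
};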
>
>  /*
> @@ -227,6 +240,9 @@ struct mmu_notifier_ops {
>  struct mmu_notifier {
>  	struct hlist_node hlist;
>  	const struct mmu_notifier_ops *ops;
> +	struct mm_struct *mm;
> +	struct rcu_head rcu;
> +	unsigned int users;
>  };
>
>  static inline int mm_has_notifiers(struct mm_struct *mm)
> @@ -234,6 +250,21 @@ static inline int mm_has_notifiers(struct mm_struct *mm)
>  	return unlikely(mm->mmu_notifier_mm);
>  }
>
> +struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,
> +					     struct mm_struct *mm);
> +static inline struct mmu_notifier *
> +mmu_notifier_get(const struct mmu_notifier_ops *ops, struct mm_struct *mm)
> +{
> +	struct mmu_notifier *ret;
> +
> +	down_write(&mm->mmap_sem);
> +	ret = mmu_notifier_get_locked(ops, mm);
> +	up_write(&mm->mmap_sem);
> +	return ret;
> +}
> +void mmu_notifier_put(struct mmu_notifier *mn);
> +void mmu_notifier_synchronize(void);
> +
>  extern int mmu_notifier_register(struct mmu_notifier *mn,
>  				 struct mm_struct *mm);
>  extern int __mmu_notifier_register(struct mmu_notifier *mn,
> @@ -581,6 +612,10 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
>  #define pudp_huge_clear_flush_notify pudp_huge_clear_flush
>  #define set_pte_at_notify set_pte_at
>
> +static inline void mmu_notifier_synchronize(void)
> +{
> +}
> +
>  #endif /* CONFIG_MMU_NOTIFIER */
>
>  #endif /* _LINUX_MMU_NOTIFIER_H */
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 696810f632ade1..4a770b5211b71d 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -248,6 +248,9 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
>  	lockdep_assert_held_write(&mm->mmap_sem);
>  	BUG_ON(atomic_read(&mm->mm_users) <= 0);
>
> +	mn->mm = mm;
> +	mn->users = 1;
> +
>  	if (!mm->mmu_notifier_mm) {
>  		/*
>  		 * kmalloc cannot be called under mm_take_all_locks(), but we
> @@ -295,18 +298,24 @@ int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(__mmu_notifier_register);
>
> -/*
> +/**
> + * mmu_notifier_register - Register a notifier on a mm
> + * @mn: The notifier to attach
> + * @mm : The mm to attach the notifier to

Why the space before the colon? I know, super tiny nit.

> + *
>   * Must not hold mmap_sem nor any other VM related lock when calling
>   * this registration function. Must also ensure mm_users can't go down
>   * to zero while this runs to avoid races with mmu_notifier_release,
>   * so mm has to be current->mm or the mm should be pinned safely such
>   * as with get_task_mm(). If the mm is not current->mm, the mm_users
>   * pin should be released by calling mmput after mmu_notifier_register
> - * returns. mmu_notifier_unregister must be always called to
> - * unregister the notifier. mm_count is automatically pinned to allow
> - * mmu_notifier_unregister to safely run at any time later, before or
> - * after exit_mmap. ->release will always be called before exit_mmap
> - * frees the pages.
> + * returns.
> + *
> + * mmu_notifier_unregister() or mmu_notifier_put() must be always called to
> + * unregister the notifier.
> + *
> + * While the caller has a mmu_notifer get the mm pointer will remain valid,

s/mmu_notifer/mmu_notifier

Maybe "While the caller has a mmu_notifier, the mn->mm pointer will remain
valid,"

> + * and can be converted to an active mm pointer via mmget_not_zero().
>   */
>  int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
> @@ -319,6 +328,72 @@ int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
>  }
>  EXPORT_SYMBOL_GPL(mmu_notifier_register);
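Since the lifetime comment above is subtle, maybe an example would help.
Assuming the hypothetical mydev_mmu_notifier_ops from my earlier comment,
a caller could do something like:

	struct mmu_notifier *mn;

	mn = mmu_notifier_get(&mydev_mmu_notifier_ops, current->mm);
	if (IS_ERR(mn))
		return PTR_ERR(mn);

	/*
	 * mn->mm stays valid until the matching mmu_notifier_put(), but
	 * the mm may no longer be active, so take an active reference
	 * before operating on the address space.
	 */
	if (mmget_not_zero(mn->mm)) {
		/* ... operate on the address space ... */
		mmput(mn->mm);
	}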
>
> +static struct mmu_notifier *
> +find_get_mmu_notifier(struct mm_struct *mm, const struct mmu_notifier_ops *ops)
> +{
> +	struct mmu_notifier *mn;
> +
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	hlist_for_each_entry_rcu (mn, &mm->mmu_notifier_mm->list, hlist) {
> +		if (mn->ops != ops)
> +			continue;
> +
> +		if (likely(mn->users != UINT_MAX))
> +			mn->users++;
> +		else
> +			mn = ERR_PTR(-EOVERFLOW);
> +		spin_unlock(&mm->mmu_notifier_mm->lock);
> +		return mn;
> +	}
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +	return NULL;
> +}
> +
> +/**
> + * mmu_notifier_get_locked - Return the single struct mmu_notifier for
> + *                           the mm & ops
> + * @ops: The operations struct being subscribe with
> + * @mm : The mm to attach notifiers too
> + *
> + * This function either allocates a new mmu_notifier via
> + * ops->alloc_notifier(), or returns an already existing notifier on the
> + * list. The value of the ops pointer is used to determine when two notifiers
> + * are the same.
> + *
> + * Each call to mmu_notifier_get() must be paired with a caller to

s/caller/call

> + * mmu_notifier_put(). The caller must hold the write side of mm->mmap_sem.
> + *
> + * While the caller has a mmu_notifer get the mm pointer will remain valid,

same as "mmu_notifer" above.

> + * and can be converted to an active mm pointer via mmget_not_zero().
> + */
> +struct mmu_notifier *mmu_notifier_get_locked(const struct mmu_notifier_ops *ops,
> +					     struct mm_struct *mm)
> +{
> +	struct mmu_notifier *mn;
> +	int ret;
> +
> +	lockdep_assert_held_write(&mm->mmap_sem);
> +
> +	if (mm->mmu_notifier_mm) {
> +		mn = find_get_mmu_notifier(mm, ops);
> +		if (mn)
> +			return mn;
> +	}
> +
> +	mn = ops->alloc_notifier(mm);
> +	if (IS_ERR(mn))
> +		return mn;
> +	mn->ops = ops;
> +	ret = __mmu_notifier_register(mn, mm);
> +	if (ret)
> +		goto out_free;
> +	return mn;
> +out_free:
> +	mn->ops->free_notifier(mn);
> +	return ERR_PTR(ret);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_get_locked);
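It might be worth spelling out in the kerneldoc that a second get with the
same ops just bumps mn->users and returns the same notifier, so callers
don't need their own per-mm deduplication. Roughly (hypothetical ops again,
error handling omitted for brevity):

	struct mmu_notifier *a, *b;

	down_write(&mm->mmap_sem);
	a = mmu_notifier_get_locked(&mydev_mmu_notifier_ops, mm);
	b = mmu_notifier_get_locked(&mydev_mmu_notifier_ops, mm);
	up_write(&mm->mmap_sem);

	/* a == b here and users == 2, so two puts are owed */
	mmu_notifier_put(a);
	mmu_notifier_put(b);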
> +
>  /* this is called after the last mmu_notifier_unregister() returned */
>  void __mmu_notifier_mm_destroy(struct mm_struct *mm)
>  {
> @@ -397,6 +472,75 @@ void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
>  }
>  EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
>
> +static void mmu_notifier_free_rcu(struct rcu_head *rcu)
> +{
> +	struct mmu_notifier *mn = container_of(rcu, struct mmu_notifier, rcu);
> +	struct mm_struct *mm = mn->mm;
> +
> +	mn->ops->free_notifier(mn);
> +	/* Pairs with the get in __mmu_notifier_register() */
> +	mmdrop(mm);
> +}
> +
> +/**
> + * mmu_notifier_put - Release the reference on the notifier
> + * @mn: The notifier to act on
> + *
> + * This function must be paired with each mmu_notifier_get(), it releases the
> + * reference obtained by the get. If this is the last reference then process
> + * to free the notifier will be run asynchronously.
> + *
> + * Unlike mmu_notifier_unregister() the get/put flow only calls ops->release
> + * when the mm_struct is destroyed. Instead free_notifier is always called to
> + * release any resources held by the user.
> + *
> + * As ops->release is not guaranteed to be called, the user must ensure that
> + * all sptes are dropped, and no new sptes can be established before
> + * mmu_notifier_put() is called.
> + *
> + * This function can be called from the ops->release callback, however the
> + * caller must still ensure it is called pairwise with mmu_notifier_get().
> + *
> + * Modules calling this function must call mmu_notifier_module_unload() in

I think you mean mmu_notifier_synchronize().

> + * their __exit functions to ensure the async work is completed.
> + */
> +void mmu_notifier_put(struct mmu_notifier *mn)
> +{
> +	struct mm_struct *mm = mn->mm;
> +
> +	spin_lock(&mm->mmu_notifier_mm->lock);
> +	if (WARN_ON(!mn->users) || --mn->users)
> +		goto out_unlock;
> +	hlist_del_init_rcu(&mn->hlist);
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +
> +	call_srcu(&srcu, &mn->rcu, mmu_notifier_free_rcu);
> +	return;
> +
> +out_unlock:
> +	spin_unlock(&mm->mmu_notifier_mm->lock);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_put);
> +
> +/**
> + * mmu_notifier_synchronize - Ensure all mmu_notifiers are freed
> + *
> + * This function ensures that all outsanding async SRU work from

s/outsanding/outstanding

> + * mmu_notifier_put() is completed. After it returns any mmu_notifier_ops
> + * associated with an unused mmu_notifier will no longer be called.
> + *
> + * Before using the caller must ensure that all of its mmu_notifiers have been
> + * fully released via mmu_notifier_put().
> + *
> + * Modules using the mmu_notifier_put() API should call this in their __exit
> + * function to avoid module unloading races.
> + */
> +void mmu_notifier_synchronize(void)
> +{
> +	synchronize_srcu(&srcu);
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_synchronize);
> +
>  bool
>  mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range)
>  {
>
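Given the stale function name above, maybe the kerneldoc could also show the
intended __exit pairing explicitly. Something like this (hypothetical module,
assuming every get has already been put):

static void __exit mydev_exit(void)
{
	/*
	 * All mmu_notifier_put() calls have been made by this point.
	 * Wait for the async SRCU work so that no free_notifier()
	 * callback can still be in flight once the module text is gone.
	 */
	mmu_notifier_synchronize();
}
module_exit(mydev_exit);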