From: Jason Gunthorpe <[email protected]>
This series introduces a new registration flow for mmu_notifiers based on
the idea that the user would like to get a single refcounted piece of
memory for a mm, keyed to its use.
For instance many users of mmu_notifiers use an interval tree or similar
to dispatch notifications to some object. There are many objects but only
one notifier subscription per mm holding the tree.
Of the 12 places that call mmu_notifier_register:
- 7 are maintaining some kind of obvious mapping of mm_struct to
mmu_notifier registration, ie in some linked list or hash table. Of
the 7 this series converts 4 (gru, hmm, RDMA, radeon)
- 3 (hfi1, gntdev, vhost) are registering multiple notifiers, but each
one immediately does some VA range filtering, ie with an interval tree.
These would be better with a global subsystem-wide range filter and
could convert to this API.
- 2 (kvm, amd_iommu) are deliberately using a single mm at a time, and
really can't use this API. One of the intel-svm's modes is also in this
list
The 3/7 unconverted drivers are:
- intel-svm
This driver tracks mm's in a global linked list 'global_svm_list'
and would benefit from this API.
Its flow is a bit complex, since it also wants a set of non-shared
notifiers.
- i915_gem_usrptr
This driver tracks mm's in a per-device hash
table (dev_priv->mm_structs), but only has an optional use of
mmu_notifiers. Since it still seems to need the hash table it is
difficult to convert.
- amdkfd/kfd_process
This driver is using a global SRCU hash table to track mm's
The control flow here is very complicated and the driver is relying on
this hash table to be fast on the ioctl syscall path.
It would definitely benefit, but only if the ioctl path didn't need to
do the search so often.
This series is already entangled with patches in the hmm & RDMA tree and
will require some git topic branches for the RDMA ODP stuff. I intend for
it to go through the hmm tree.
There is a git version here:
https://github.com/jgunthorpe/linux/commits/mmu_notifier
Which has the required pre-patches for the RDMA ODP conversion that are
still being reviewed.
Jason Gunthorpe (11):
mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the
caller
mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm
mm/mmu_notifiers: add a get/put scheme for the registration
misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
hmm: use mmu_notifier_get/put for 'struct hmm'
RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
RDMA/odp: remove ib_ucontext from ib_umem
drm/radeon: use mmu_notifier_get/put for struct radeon_mn
drm/amdkfd: fix a use after free race with mmu_notifer unregister
drm/amdkfd: use mmu_notifier_put
mm/mmu_notifiers: remove unregister_no_release
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 -
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 88 ++++-----
drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +
drivers/gpu/drm/radeon/radeon.h | 3 -
drivers/gpu/drm/radeon/radeon_device.c | 2 -
drivers/gpu/drm/radeon/radeon_drv.c | 2 +
drivers/gpu/drm/radeon/radeon_mn.c | 157 ++++------------
drivers/infiniband/core/umem.c | 4 +-
drivers/infiniband/core/umem_odp.c | 183 ++++++------------
drivers/infiniband/core/uverbs_cmd.c | 3 -
drivers/infiniband/core/uverbs_main.c | 1 +
drivers/infiniband/hw/mlx5/main.c | 5 -
drivers/misc/sgi-gru/grufile.c | 1 +
drivers/misc/sgi-gru/grutables.h | 2 -
drivers/misc/sgi-gru/grutlbpurge.c | 84 +++------
include/linux/hmm.h | 12 +-
include/linux/mm_types.h | 6 -
include/linux/mmu_notifier.h | 40 +++-
include/rdma/ib_umem.h | 2 +-
include/rdma/ib_umem_odp.h | 10 +-
include/rdma/ib_verbs.h | 3 -
kernel/fork.c | 1 -
mm/hmm.c | 121 +++---------
mm/mmu_notifier.c | 230 +++++++++++++++++------
25 files changed, 408 insertions(+), 559 deletions(-)
--
2.22.0
From: Jason Gunthorpe <[email protected]>
mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no
longer have any users, they have all been converted to use
mmu_notifier_put().
So delete this difficult to use interface.
Signed-off-by: Jason Gunthorpe <[email protected]>
---
include/linux/mmu_notifier.h | 5 -----
mm/mmu_notifier.c | 31 -------------------------------
2 files changed, 36 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 31aa971315a142..52929e5ef70826 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -271,8 +271,6 @@ extern int __mmu_notifier_register(struct mmu_notifier *mn,
struct mm_struct *mm);
extern void mmu_notifier_unregister(struct mmu_notifier *mn,
struct mm_struct *mm);
-extern void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
- struct mm_struct *mm);
extern void __mmu_notifier_mm_destroy(struct mm_struct *mm);
extern void __mmu_notifier_release(struct mm_struct *mm);
extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
@@ -513,9 +511,6 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
set_pte_at(___mm, ___address, __ptep, ___pte); \
})
-extern void mmu_notifier_call_srcu(struct rcu_head *rcu,
- void (*func)(struct rcu_head *rcu));
-
#else /* CONFIG_MMU_NOTIFIER */
struct mmu_notifier_range {
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 4a770b5211b71d..2ec48f8ba9e288 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -21,18 +21,6 @@
/* global SRCU for all MMs */
DEFINE_STATIC_SRCU(srcu);
-/*
- * This function allows mmu_notifier::release callback to delay a call to
- * a function that will free appropriate resources. The function must be
- * quick and must not block.
- */
-void mmu_notifier_call_srcu(struct rcu_head *rcu,
- void (*func)(struct rcu_head *rcu))
-{
- call_srcu(&srcu, rcu, func);
-}
-EXPORT_SYMBOL_GPL(mmu_notifier_call_srcu);
-
/*
* This function can't run concurrently against mmu_notifier_register
* because mm->mm_users > 0 during mmu_notifier_register and exit_mmap
@@ -453,25 +441,6 @@ void mmu_notifier_unregister(struct mmu_notifier *mn, struct mm_struct *mm)
}
EXPORT_SYMBOL_GPL(mmu_notifier_unregister);
-/*
- * Same as mmu_notifier_unregister but no callback and no srcu synchronization.
- */
-void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
- struct mm_struct *mm)
-{
- spin_lock(&mm->mmu_notifier_mm->lock);
- /*
- * Can not use list_del_rcu() since __mmu_notifier_release
- * can delete it before we hold the lock.
- */
- hlist_del_init_rcu(&mn->hlist);
- spin_unlock(&mm->mmu_notifier_mm->lock);
-
- BUG_ON(atomic_read(&mm->mm_count) <= 0);
- mmdrop(mm);
-}
-EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
-
static void mmu_notifier_free_rcu(struct rcu_head *rcu)
{
struct mmu_notifier *mn = container_of(rcu, struct mmu_notifier, rcu);
--
2.22.0
From: Jason Gunthorpe <[email protected]>
When using mmu_notifer_unregister_no_release() the caller must ensure
there is a SRCU synchronize before the mn memory is freed, otherwise use
after free races are possible, for instance:
CPU0 CPU1
invalidate_range_start
hlist_for_each_entry_rcu(..)
mmu_notifier_unregister_no_release(&p->mn)
kfree(mn)
if (mn->ops->invalidate_range_end)
The error unwind in amdkfd misses the SRCU synchronization.
amdkfd keeps the kfd_process around until the mm is released, so split the
flow to fully initialize the kfd_process and register it for find_process,
and with the notifier. Past this point the kfd_process does not need to be
cleaned up as it is fully ready.
The final failable step does a vm_mmap() and does not seem to impact the
kfd_process global state. Since it also cannot be undone (and already has
problems with undo if it internally fails), it has to be last.
This way we don't have to try to unwind the mmu_notifier_register() and
avoid the problem with the SRCU.
Along the way this also fixes various other error unwind bugs in the flow.
Fixes: 45102048f77e ("amdkfd: Add process queue manager module")
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 78 +++++++++++-------------
1 file changed, 37 insertions(+), 41 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 8f1076c0c88a25..c06e6190f21ffa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -62,8 +62,8 @@ static struct workqueue_struct *kfd_restore_wq;
static struct kfd_process *find_process(const struct task_struct *thread);
static void kfd_process_ref_release(struct kref *ref);
-static struct kfd_process *create_process(const struct task_struct *thread,
- struct file *filep);
+static struct kfd_process *create_process(const struct task_struct *thread);
+static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep);
static void evict_process_worker(struct work_struct *work);
static void restore_process_worker(struct work_struct *work);
@@ -289,7 +289,15 @@ struct kfd_process *kfd_create_process(struct file *filep)
if (process) {
pr_debug("Process already found\n");
} else {
- process = create_process(thread, filep);
+ process = create_process(thread);
+ if (IS_ERR(process))
+ goto out;
+
+ ret = kfd_process_init_cwsr_apu(process, filep);
+ if (ret) {
+ process = ERR_PTR(ret);
+ goto out;
+ }
if (!procfs.kobj)
goto out;
@@ -609,81 +617,69 @@ static int kfd_process_device_init_cwsr_dgpu(struct kfd_process_device *pdd)
return 0;
}
-static struct kfd_process *create_process(const struct task_struct *thread,
- struct file *filep)
+/*
+ * On return the kfd_process is fully operational and will be freed when the
+ * mm is released
+ */
+static struct kfd_process *create_process(const struct task_struct *thread)
{
struct kfd_process *process;
int err = -ENOMEM;
process = kzalloc(sizeof(*process), GFP_KERNEL);
-
if (!process)
goto err_alloc_process;
- process->pasid = kfd_pasid_alloc();
- if (process->pasid == 0)
- goto err_alloc_pasid;
-
- if (kfd_alloc_process_doorbells(process) < 0)
- goto err_alloc_doorbells;
-
kref_init(&process->ref);
-
mutex_init(&process->mutex);
-
process->mm = thread->mm;
-
- /* register notifier */
- process->mmu_notifier.ops = &kfd_process_mmu_notifier_ops;
- err = mmu_notifier_register(&process->mmu_notifier, process->mm);
- if (err)
- goto err_mmu_notifier;
-
- hash_add_rcu(kfd_processes_table, &process->kfd_processes,
- (uintptr_t)process->mm);
-
process->lead_thread = thread->group_leader;
- get_task_struct(process->lead_thread);
-
INIT_LIST_HEAD(&process->per_device_data);
-
+ INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
+ INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
+ process->last_restore_timestamp = get_jiffies_64();
kfd_event_init_process(process);
+ process->is_32bit_user_mode = in_compat_syscall();
+
+ process->pasid = kfd_pasid_alloc();
+ if (process->pasid == 0)
+ goto err_alloc_pasid;
+
+ if (kfd_alloc_process_doorbells(process) < 0)
+ goto err_alloc_doorbells;
err = pqm_init(&process->pqm, process);
if (err != 0)
goto err_process_pqm_init;
/* init process apertures*/
- process->is_32bit_user_mode = in_compat_syscall();
err = kfd_init_apertures(process);
if (err != 0)
goto err_init_apertures;
- INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
- INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
- process->last_restore_timestamp = get_jiffies_64();
-
- err = kfd_process_init_cwsr_apu(process, filep);
+ /* Must be last, have to use release destruction after this */
+ process->mmu_notifier.ops = &kfd_process_mmu_notifier_ops;
+ err = mmu_notifier_register(&process->mmu_notifier, process->mm);
if (err)
- goto err_init_cwsr;
+ goto err_register_notifier;
+
+ get_task_struct(process->lead_thread);
+ hash_add_rcu(kfd_processes_table, &process->kfd_processes,
+ (uintptr_t)process->mm);
return process;
-err_init_cwsr:
+err_register_notifier:
kfd_process_free_outstanding_kfd_bos(process);
kfd_process_destroy_pdds(process);
err_init_apertures:
pqm_uninit(&process->pqm);
err_process_pqm_init:
- hash_del_rcu(&process->kfd_processes);
- synchronize_rcu();
- mmu_notifier_unregister_no_release(&process->mmu_notifier, process->mm);
-err_mmu_notifier:
- mutex_destroy(&process->mutex);
kfd_free_process_doorbells(process);
err_alloc_doorbells:
kfd_pasid_free(process->pasid);
err_alloc_pasid:
+ mutex_destroy(&process->mutex);
kfree(process);
err_alloc_process:
return ERR_PTR(err);
--
2.22.0
From: Jason Gunthorpe <[email protected]>
This simplifies the code to not have so many one line functions and extra
logic. __mmu_notifier_register() simply becomes the entry point to
register the notifier, and the other one calls it under lock.
Also add a lockdep_assert to check that the callers are holding the lock
as expected.
Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
---
mm/mmu_notifier.c | 35 ++++++++++++++---------------------
1 file changed, 14 insertions(+), 21 deletions(-)
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index b5670620aea0fc..218a6f108bc2d0 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -236,22 +236,22 @@ void __mmu_notifier_invalidate_range(struct mm_struct *mm,
}
EXPORT_SYMBOL_GPL(__mmu_notifier_invalidate_range);
-static int do_mmu_notifier_register(struct mmu_notifier *mn,
- struct mm_struct *mm,
- int take_mmap_sem)
+/*
+ * Same as mmu_notifier_register but here the caller must hold the
+ * mmap_sem in write mode.
+ */
+int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
{
struct mmu_notifier_mm *mmu_notifier_mm;
int ret;
+ lockdep_assert_held_write(&mm->mmap_sem);
BUG_ON(atomic_read(&mm->mm_users) <= 0);
- ret = -ENOMEM;
mmu_notifier_mm = kmalloc(sizeof(struct mmu_notifier_mm), GFP_KERNEL);
if (unlikely(!mmu_notifier_mm))
- goto out;
+ return -ENOMEM;
- if (take_mmap_sem)
- down_write(&mm->mmap_sem);
ret = mm_take_all_locks(mm);
if (unlikely(ret))
goto out_clean;
@@ -279,13 +279,11 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
mm_drop_all_locks(mm);
out_clean:
- if (take_mmap_sem)
- up_write(&mm->mmap_sem);
kfree(mmu_notifier_mm);
-out:
BUG_ON(atomic_read(&mm->mm_users) <= 0);
return ret;
}
+EXPORT_SYMBOL_GPL(__mmu_notifier_register);
/*
* Must not hold mmap_sem nor any other VM related lock when calling
@@ -302,19 +300,14 @@ static int do_mmu_notifier_register(struct mmu_notifier *mn,
*/
int mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
{
- return do_mmu_notifier_register(mn, mm, 1);
-}
-EXPORT_SYMBOL_GPL(mmu_notifier_register);
+ int ret;
-/*
- * Same as mmu_notifier_register but here the caller must hold the
- * mmap_sem in write mode.
- */
-int __mmu_notifier_register(struct mmu_notifier *mn, struct mm_struct *mm)
-{
- return do_mmu_notifier_register(mn, mm, 0);
+ down_write(&mm->mmap_sem);
+ ret = __mmu_notifier_register(mn, mm);
+ up_write(&mm->mmap_sem);
+ return ret;
}
-EXPORT_SYMBOL_GPL(__mmu_notifier_register);
+EXPORT_SYMBOL_GPL(mmu_notifier_register);
/* this is called after the last mmu_notifier_unregister() returned */
void __mmu_notifier_mm_destroy(struct mm_struct *mm)
--
2.22.0
From: Jason Gunthorpe <[email protected]>
GRU is already using almost the same algorithm as get/put, it even
helpfully has a 10 year old comment to make this algorithm common, which
is finally happening.
There are a few differences and fixes from this conversion:
- GRU used rcu not srcu to read the hlist
- Unclear how the locking worked to prevent gru_register_mmu_notifier()
from running concurrently with gru_drop_mmu_notifier() - this version is
safe
- GRU had a release function which only set a variable without any locking
that skiped the synchronize_srcu during unregister which looks racey,
but this makes it reliable via the integrated call_srcu().
- It is unclear if the mmap_sem is actually held when
__mmu_notifier_register() was called, lockdep will now warn if this is
wrong
Signed-off-by: Jason Gunthorpe <[email protected]>
---
drivers/misc/sgi-gru/grufile.c | 1 +
drivers/misc/sgi-gru/grutables.h | 2 -
drivers/misc/sgi-gru/grutlbpurge.c | 84 +++++++++---------------------
3 files changed, 25 insertions(+), 62 deletions(-)
diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index a2a142ae087bfa..9d042310214ff9 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -573,6 +573,7 @@ static void __exit gru_exit(void)
gru_free_tables();
misc_deregister(&gru_miscdev);
gru_proc_exit();
+ mmu_notifier_synchronize();
}
static const struct file_operations gru_fops = {
diff --git a/drivers/misc/sgi-gru/grutables.h b/drivers/misc/sgi-gru/grutables.h
index 438191c220570c..a7e44b2eb413f6 100644
--- a/drivers/misc/sgi-gru/grutables.h
+++ b/drivers/misc/sgi-gru/grutables.h
@@ -307,10 +307,8 @@ struct gru_mm_tracker { /* pack to reduce size */
struct gru_mm_struct {
struct mmu_notifier ms_notifier;
- atomic_t ms_refcnt;
spinlock_t ms_asid_lock; /* protects ASID assignment */
atomic_t ms_range_active;/* num range_invals active */
- char ms_released;
wait_queue_head_t ms_wait_queue;
DECLARE_BITMAP(ms_asidmap, GRU_MAX_GRUS);
struct gru_mm_tracker ms_asids[GRU_MAX_GRUS];
diff --git a/drivers/misc/sgi-gru/grutlbpurge.c b/drivers/misc/sgi-gru/grutlbpurge.c
index 59ba0adf23cee4..10921cd2608dfa 100644
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -235,83 +235,47 @@ static void gru_invalidate_range_end(struct mmu_notifier *mn,
gms, range->start, range->end);
}
-static void gru_release(struct mmu_notifier *mn, struct mm_struct *mm)
+static struct mmu_notifier *gru_alloc_notifier(struct mm_struct *mm)
{
- struct gru_mm_struct *gms = container_of(mn, struct gru_mm_struct,
- ms_notifier);
+ struct gru_mm_struct *gms;
+
+ gms = kzalloc(sizeof(*gms), GFP_KERNEL);
+ if (!gms)
+ return ERR_PTR(-ENOMEM);
+ STAT(gms_alloc);
+ spin_lock_init(&gms->ms_asid_lock);
+ init_waitqueue_head(&gms->ms_wait_queue);
- gms->ms_released = 1;
- gru_dbg(grudev, "gms %p\n", gms);
+ return &gms->ms_notifier;
}
+static void gru_free_notifier(struct mmu_notifier *mn)
+{
+ kfree(container_of(mn, struct gru_mm_struct, ms_notifier));
+ STAT(gms_free);
+}
static const struct mmu_notifier_ops gru_mmuops = {
.invalidate_range_start = gru_invalidate_range_start,
.invalidate_range_end = gru_invalidate_range_end,
- .release = gru_release,
+ .alloc_notifier = gru_alloc_notifier,
+ .free_notifier = gru_free_notifier,
};
-/* Move this to the basic mmu_notifier file. But for now... */
-static struct mmu_notifier *mmu_find_ops(struct mm_struct *mm,
- const struct mmu_notifier_ops *ops)
-{
- struct mmu_notifier *mn, *gru_mn = NULL;
-
- if (mm->mmu_notifier_mm) {
- rcu_read_lock();
- hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list,
- hlist)
- if (mn->ops == ops) {
- gru_mn = mn;
- break;
- }
- rcu_read_unlock();
- }
- return gru_mn;
-}
-
struct gru_mm_struct *gru_register_mmu_notifier(void)
{
- struct gru_mm_struct *gms;
struct mmu_notifier *mn;
- int err;
-
- mn = mmu_find_ops(current->mm, &gru_mmuops);
- if (mn) {
- gms = container_of(mn, struct gru_mm_struct, ms_notifier);
- atomic_inc(&gms->ms_refcnt);
- } else {
- gms = kzalloc(sizeof(*gms), GFP_KERNEL);
- if (!gms)
- return ERR_PTR(-ENOMEM);
- STAT(gms_alloc);
- spin_lock_init(&gms->ms_asid_lock);
- gms->ms_notifier.ops = &gru_mmuops;
- atomic_set(&gms->ms_refcnt, 1);
- init_waitqueue_head(&gms->ms_wait_queue);
- err = __mmu_notifier_register(&gms->ms_notifier, current->mm);
- if (err)
- goto error;
- }
- if (gms)
- gru_dbg(grudev, "gms %p, refcnt %d\n", gms,
- atomic_read(&gms->ms_refcnt));
- return gms;
-error:
- kfree(gms);
- return ERR_PTR(err);
+
+ mn = mmu_notifier_get_locked(&gru_mmuops, current->mm);
+ if (IS_ERR(mn))
+ return ERR_CAST(mn);
+
+ return container_of(mn, struct gru_mm_struct, ms_notifier);
}
void gru_drop_mmu_notifier(struct gru_mm_struct *gms)
{
- gru_dbg(grudev, "gms %p, refcnt %d, released %d\n", gms,
- atomic_read(&gms->ms_refcnt), gms->ms_released);
- if (atomic_dec_return(&gms->ms_refcnt) == 0) {
- if (!gms->ms_released)
- mmu_notifier_unregister(&gms->ms_notifier, current->mm);
- kfree(gms);
- STAT(gms_free);
- }
+ mmu_notifier_put(&gms->ms_notifier);
}
/*
--
2.22.0
From: Jason Gunthorpe <[email protected]>
The sequence of mmu_notifier_unregister_no_release(),
mmu_notifier_call_srcu() is identical to mmu_notifier_put() with the
free_notifier callback.
As this is the last user of those APIs, converting it means we can drop
them.
Signed-off-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 ++++------
2 files changed, 4 insertions(+), 9 deletions(-)
I'm really not sure what this is doing, but it is very strange to have a
release with no other callback. It would be good if this would change to use
get as well.
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 3933fb6a371efb..9450e20d17093b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -686,9 +686,6 @@ struct kfd_process {
/* We want to receive a notification when the mm_struct is destroyed */
struct mmu_notifier mmu_notifier;
- /* Use for delayed freeing of kfd_process structure */
- struct rcu_head rcu;
-
unsigned int pasid;
unsigned int doorbell_index;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index c06e6190f21ffa..e5e326f2f2675e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -486,11 +486,9 @@ static void kfd_process_ref_release(struct kref *ref)
queue_work(kfd_process_wq, &p->release_work);
}
-static void kfd_process_destroy_delayed(struct rcu_head *rcu)
+static void kfd_process_free_notifier(struct mmu_notifier *mn)
{
- struct kfd_process *p = container_of(rcu, struct kfd_process, rcu);
-
- kfd_unref_process(p);
+ kfd_unref_process(container_of(mn, struct kfd_process, mmu_notifier));
}
static void kfd_process_notifier_release(struct mmu_notifier *mn,
@@ -542,12 +540,12 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
mutex_unlock(&p->mutex);
- mmu_notifier_unregister_no_release(&p->mmu_notifier, mm);
- mmu_notifier_call_srcu(&p->rcu, &kfd_process_destroy_delayed);
+ mmu_notifier_put(&p->mmu_notifier);
}
static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
.release = kfd_process_notifier_release,
+ .free_notifier = kfd_process_free_notifier,
};
static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
--
2.22.0
From: Jason Gunthorpe <[email protected]>
This is a significant simplification, it eliminates all the remaining
'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
paths, and takes away all the ugly locking and abuse of page_table_lock.
mmu_notifier_get() provides the single struct hmm per struct mm which
eliminates mm->hmm.
It also directly guarantees that no mmu_notifier op callback is callable
while concurrent free is possible, this eliminates all the krefs inside
the mmu_notifier callbacks.
The remaining krefs in the range code were overly cautious, drivers are
already not permitted to free the mirror while a range exists.
Signed-off-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +
include/linux/hmm.h | 12 +--
include/linux/mm_types.h | 6 --
kernel/fork.c | 1 -
mm/hmm.c | 121 ++++++------------------
6 files changed, 33 insertions(+), 111 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index f2e8b4238efd49..d50774a5f98ef7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1464,6 +1464,7 @@ static void __exit amdgpu_exit(void)
amdgpu_unregister_atpx_handler();
amdgpu_sync_fini();
amdgpu_fence_slab_fini();
+ mmu_notifier_synchronize();
}
module_init(amdgpu_init);
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 7c2fcaba42d6c3..a0e48a482452d7 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -28,6 +28,7 @@
#include <linux/pci.h>
#include <linux/pm_runtime.h>
#include <linux/vga_switcheroo.h>
+#include <linux/mmu_notifier.h>
#include <drm/drmP.h>
#include <drm/drm_crtc_helper.h>
@@ -1292,6 +1293,8 @@ nouveau_drm_exit(void)
#ifdef CONFIG_NOUVEAU_PLATFORM_DRIVER
platform_driver_unregister(&nouveau_platform_driver);
#endif
+ if (IS_ENABLED(CONFIG_DRM_NOUVEAU_SVM))
+ mmu_notifier_synchronize();
}
module_init(nouveau_drm_init);
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 82265118d94abd..c3902449db6412 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -84,15 +84,12 @@
* @notifiers: count of active mmu notifiers
*/
struct hmm {
- struct mm_struct *mm;
- struct kref kref;
+ struct mmu_notifier mmu_notifier;
spinlock_t ranges_lock;
struct list_head ranges;
struct list_head mirrors;
- struct mmu_notifier mmu_notifier;
struct rw_semaphore mirrors_sem;
wait_queue_head_t wq;
- struct rcu_head rcu;
long notifiers;
};
@@ -436,13 +433,6 @@ long hmm_range_dma_unmap(struct hmm_range *range,
*/
#define HMM_RANGE_DEFAULT_TIMEOUT 1000
-/* Below are for HMM internal use only! Not to be used by device driver! */
-static inline void hmm_mm_init(struct mm_struct *mm)
-{
- mm->hmm = NULL;
-}
-#else /* IS_ENABLED(CONFIG_HMM_MIRROR) */
-static inline void hmm_mm_init(struct mm_struct *mm) {}
#endif /* IS_ENABLED(CONFIG_HMM_MIRROR) */
#endif /* LINUX_HMM_H */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3a37a89eb7a7c3..525d25d93330f2 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -25,7 +25,6 @@
struct address_space;
struct mem_cgroup;
-struct hmm;
/*
* Each physical page in the system has a struct page associated with
@@ -502,11 +501,6 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
#endif
struct work_struct async_put_work;
-
-#ifdef CONFIG_HMM_MIRROR
- /* HMM needs to track a few things per mm */
- struct hmm *hmm;
-#endif
} __randomize_layout;
/*
diff --git a/kernel/fork.c b/kernel/fork.c
index d8ae0f1b414802..bd4a0762f12f3e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,7 +1007,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_init_owner(mm, p);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_mm_init(mm);
- hmm_mm_init(mm);
init_tlb_flush_pending(mm);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS
mm->pmd_huge_pte = NULL;
diff --git a/mm/hmm.c b/mm/hmm.c
index 9a908902e4cc38..00f94f94906afc 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -26,101 +26,37 @@
#include <linux/mmu_notifier.h>
#include <linux/memory_hotplug.h>
-static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
-
-/**
- * hmm_get_or_create - register HMM against an mm (HMM internal)
- *
- * @mm: mm struct to attach to
- * Return: an HMM object, either by referencing the existing
- * (per-process) object, or by creating a new one.
- *
- * This is not intended to be used directly by device drivers. If mm already
- * has an HMM struct then it get a reference on it and returns it. Otherwise
- * it allocates an HMM struct, initializes it, associate it with the mm and
- * returns it.
- */
-static struct hmm *hmm_get_or_create(struct mm_struct *mm)
+static struct mmu_notifier *hmm_alloc_notifier(struct mm_struct *mm)
{
struct hmm *hmm;
- lockdep_assert_held_write(&mm->mmap_sem);
-
- /* Abuse the page_table_lock to also protect mm->hmm. */
- spin_lock(&mm->page_table_lock);
- hmm = mm->hmm;
- if (mm->hmm && kref_get_unless_zero(&mm->hmm->kref))
- goto out_unlock;
- spin_unlock(&mm->page_table_lock);
-
- hmm = kmalloc(sizeof(*hmm), GFP_KERNEL);
+ hmm = kzalloc(sizeof(*hmm), GFP_KERNEL);
if (!hmm)
- return NULL;
+ return ERR_PTR(-ENOMEM);
+
init_waitqueue_head(&hmm->wq);
INIT_LIST_HEAD(&hmm->mirrors);
init_rwsem(&hmm->mirrors_sem);
- hmm->mmu_notifier.ops = NULL;
INIT_LIST_HEAD(&hmm->ranges);
spin_lock_init(&hmm->ranges_lock);
- kref_init(&hmm->kref);
hmm->notifiers = 0;
- hmm->mm = mm;
-
- hmm->mmu_notifier.ops = &hmm_mmu_notifier_ops;
- if (__mmu_notifier_register(&hmm->mmu_notifier, mm)) {
- kfree(hmm);
- return NULL;
- }
-
- mmgrab(hmm->mm);
-
- /*
- * We hold the exclusive mmap_sem here so we know that mm->hmm is
- * still NULL or 0 kref, and is safe to update.
- */
- spin_lock(&mm->page_table_lock);
- mm->hmm = hmm;
-
-out_unlock:
- spin_unlock(&mm->page_table_lock);
- return hmm;
+ return &hmm->mmu_notifier;
}
-static void hmm_free_rcu(struct rcu_head *rcu)
+static void hmm_free_notifier(struct mmu_notifier *mn)
{
- struct hmm *hmm = container_of(rcu, struct hmm, rcu);
+ struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
- mmdrop(hmm->mm);
+ WARN_ON(!list_empty(&hmm->ranges));
+ WARN_ON(!list_empty(&hmm->mirrors));
kfree(hmm);
}
-static void hmm_free(struct kref *kref)
-{
- struct hmm *hmm = container_of(kref, struct hmm, kref);
-
- spin_lock(&hmm->mm->page_table_lock);
- if (hmm->mm->hmm == hmm)
- hmm->mm->hmm = NULL;
- spin_unlock(&hmm->mm->page_table_lock);
-
- mmu_notifier_unregister_no_release(&hmm->mmu_notifier, hmm->mm);
- mmu_notifier_call_srcu(&hmm->rcu, hmm_free_rcu);
-}
-
-static inline void hmm_put(struct hmm *hmm)
-{
- kref_put(&hmm->kref, hmm_free);
-}
-
static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
struct hmm_mirror *mirror;
- /* Bail out if hmm is in the process of being freed */
- if (!kref_get_unless_zero(&hmm->kref))
- return;
-
/*
* Since hmm_range_register() holds the mmget() lock hmm_release() is
* prevented as long as a range exists.
@@ -137,8 +73,6 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
mirror->ops->release(mirror);
}
up_read(&hmm->mirrors_sem);
-
- hmm_put(hmm);
}
static void notifiers_decrement(struct hmm *hmm)
@@ -169,9 +103,6 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
unsigned long flags;
int ret = 0;
- if (!kref_get_unless_zero(&hmm->kref))
- return 0;
-
spin_lock_irqsave(&hmm->ranges_lock, flags);
hmm->notifiers++;
list_for_each_entry(range, &hmm->ranges, list) {
@@ -206,7 +137,6 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
out:
if (ret)
notifiers_decrement(hmm);
- hmm_put(hmm);
return ret;
}
@@ -215,17 +145,15 @@ static void hmm_invalidate_range_end(struct mmu_notifier *mn,
{
struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
- if (!kref_get_unless_zero(&hmm->kref))
- return;
-
notifiers_decrement(hmm);
- hmm_put(hmm);
}
static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
.release = hmm_release,
.invalidate_range_start = hmm_invalidate_range_start,
.invalidate_range_end = hmm_invalidate_range_end,
+ .alloc_notifier = hmm_alloc_notifier,
+ .free_notifier = hmm_free_notifier,
};
/*
@@ -237,18 +165,27 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
*
* To start mirroring a process address space, the device driver must register
* an HMM mirror struct.
+ *
+ * The caller cannot unregister the hmm_mirror while any ranges are
+ * registered.
+ *
+ * Callers using this function must put a call to mmu_notifier_synchronize()
+ * in their module exit functions.
*/
int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
{
+ struct mmu_notifier *mn;
+
lockdep_assert_held_write(&mm->mmap_sem);
/* Sanity check */
if (!mm || !mirror || !mirror->ops)
return -EINVAL;
- mirror->hmm = hmm_get_or_create(mm);
- if (!mirror->hmm)
- return -ENOMEM;
+ mn = mmu_notifier_get_locked(&hmm_mmu_notifier_ops, mm);
+ if (IS_ERR(mn))
+ return PTR_ERR(mn);
+ mirror->hmm = container_of(mn, struct hmm, mmu_notifier);
down_write(&mirror->hmm->mirrors_sem);
list_add(&mirror->list, &mirror->hmm->mirrors);
@@ -272,7 +209,7 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror)
down_write(&hmm->mirrors_sem);
list_del(&mirror->list);
up_write(&hmm->mirrors_sem);
- hmm_put(hmm);
+ mmu_notifier_put(&hmm->mmu_notifier);
}
EXPORT_SYMBOL(hmm_mirror_unregister);
@@ -880,14 +817,13 @@ int hmm_range_register(struct hmm_range *range,
range->end = end;
/* Prevent hmm_release() from running while the range is valid */
- if (!mmget_not_zero(hmm->mm))
+ if (!mmget_not_zero(hmm->mmu_notifier.mm))
return -EFAULT;
/* Initialize range to track CPU page table updates. */
spin_lock_irqsave(&hmm->ranges_lock, flags);
range->hmm = hmm;
- kref_get(&hmm->kref);
list_add(&range->list, &hmm->ranges);
/*
@@ -919,8 +855,7 @@ void hmm_range_unregister(struct hmm_range *range)
spin_unlock_irqrestore(&hmm->ranges_lock, flags);
/* Drop reference taken by hmm_range_register() */
- mmput(hmm->mm);
- hmm_put(hmm);
+ mmput(hmm->mmu_notifier.mm);
/*
* The range is now invalid and the ref on the hmm is dropped, so
@@ -970,14 +905,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
struct mm_walk mm_walk;
int ret;
- lockdep_assert_held(&hmm->mm->mmap_sem);
+ lockdep_assert_held(&hmm->mmu_notifier.mm->mmap_sem);
do {
/* If range is no longer valid force retry. */
if (!range->valid)
return -EBUSY;
- vma = find_vma(hmm->mm, start);
+ vma = find_vma(hmm->mmu_notifier.mm, start);
if (vma == NULL || (vma->vm_flags & device_vma))
return -EFAULT;
--
2.22.0
On 2019-08-06 19:15, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> The sequence of mmu_notifier_unregister_no_release(),
> mmu_notifier_call_srcu() is identical to mmu_notifier_put() with the
> free_notifier callback.
>
> As this is the last user of those APIs, converting it means we can drop
> them.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ---
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 ++++------
> 2 files changed, 4 insertions(+), 9 deletions(-)
>
> I'm really not sure what this is doing, but it is very strange to have a
> release with no other callback. It would be good if this would change to use
> get as well.
KFD uses the MMU notifier to detect process termination and free all the
resources associated with the process. This was first added for APUs
where the IOMMUv2 is set up to perform address translations using the
CPU page table for device memory access. That's where the association of
KFD process resources with the lifetime of the mm_struct comes from.
Regards,
Felix
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 3933fb6a371efb..9450e20d17093b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -686,9 +686,6 @@ struct kfd_process {
> /* We want to receive a notification when the mm_struct is destroyed */
> struct mmu_notifier mmu_notifier;
>
> - /* Use for delayed freeing of kfd_process structure */
> - struct rcu_head rcu;
> -
> unsigned int pasid;
> unsigned int doorbell_index;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index c06e6190f21ffa..e5e326f2f2675e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -486,11 +486,9 @@ static void kfd_process_ref_release(struct kref *ref)
> queue_work(kfd_process_wq, &p->release_work);
> }
>
> -static void kfd_process_destroy_delayed(struct rcu_head *rcu)
> +static void kfd_process_free_notifier(struct mmu_notifier *mn)
> {
> - struct kfd_process *p = container_of(rcu, struct kfd_process, rcu);
> -
> - kfd_unref_process(p);
> + kfd_unref_process(container_of(mn, struct kfd_process, mmu_notifier));
> }
>
> static void kfd_process_notifier_release(struct mmu_notifier *mn,
> @@ -542,12 +540,12 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>
> mutex_unlock(&p->mutex);
>
> - mmu_notifier_unregister_no_release(&p->mmu_notifier, mm);
> - mmu_notifier_call_srcu(&p->rcu, &kfd_process_destroy_delayed);
> + mmu_notifier_put(&p->mmu_notifier);
> }
>
> static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
> .release = kfd_process_notifier_release,
> + .free_notifier = kfd_process_free_notifier,
> };
>
> static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
On Tue, Aug 06, 2019 at 11:47:44PM +0000, Kuehling, Felix wrote:
> On 2019-08-06 19:15, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <[email protected]>
> >
> > The sequence of mmu_notifier_unregister_no_release(),
> > mmu_notifier_call_srcu() is identical to mmu_notifier_put() with the
> > free_notifier callback.
> >
> > As this is the last user of those APIs, converting it means we can drop
> > them.
> >
> > Signed-off-by: Jason Gunthorpe <[email protected]>
>
> Reviewed-by: Felix Kuehling <[email protected]>
>
> > drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 ---
> > drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 ++++------
> > 2 files changed, 4 insertions(+), 9 deletions(-)
> >
> > I'm really not sure what this is doing, but it is very strange to have a
> > release with no other callback. It would be good if this would change to use
> > get as well.
> KFD uses the MMU notifier to detect process termination and free all the
> resources associated with the process. This was first added for APUs
> where the IOMMUv2 is set up to perform address translations using the
> CPU page table for device memory access. That's where the association of
> KFD process resources with the lifetime of the mm_struct comes from.
When all the HW objects that could do DMA to this process are
destroyed then the mmu notififer should be torn down. The module
should remain locked until the DMA objects are destroyed.
I'm still unclear why this is needed, the IOMMU for PASID already has
notififers, and already blocks access when the mm_struct goes away,
why add a second layer of tracking?
Jason
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Tue, Aug 06, 2019 at 08:15:48PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no
> longer have any users, they have all been converted to use
> mmu_notifier_put().
>
> So delete this difficult to use interface.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Tue, Aug 06, 2019 at 08:15:38PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> This simplifies the code to not have so many one line functions and extra
> logic. __mmu_notifier_register() simply becomes the entry point to
> register the notifier, and the other one calls it under lock.
>
> Also add a lockdep_assert to check that the callers are holding the lock
> as expected.
>
> Suggested-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote:
> Looks good,
>
> Reviewed-by: Christoph Hellwig <[email protected]>
Dimitri, are you OK with this patch?
Thanks,
Jason
On Wed, Aug 14, 2019 at 03:58:34PM +0000, Jason Gunthorpe wrote:
> On Thu, Aug 08, 2019 at 12:25:56PM +0200, Christoph Hellwig wrote:
> > Looks good,
> >
> > Reviewed-by: Christoph Hellwig <[email protected]>
>
> Dimitri, are you OK with this patch?
>
I think this looks OK.
Reviewed-by: Dimitri Sivanich <[email protected]>
On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> This simplifies the code to not have so many one line functions and extra
> logic. __mmu_notifier_register() simply becomes the entry point to
> register the notifier, and the other one calls it under lock.
>
> Also add a lockdep_assert to check that the callers are holding the lock
> as expected.
>
> Suggested-by: Christoph Hellwig <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Nice clean up.
Reviewed-by: Ralph Campbell <[email protected]>
On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> This is a significant simplification, it eliminates all the remaining
> 'hmm' stuff in mm_struct, eliminates krefing along the critical notifier
> paths, and takes away all the ugly locking and abuse of page_table_lock.
>
> mmu_notifier_get() provides the single struct hmm per struct mm which
> eliminates mm->hmm.
>
> It also directly guarantees that no mmu_notifier op callback is callable
> while concurrent free is possible, this eliminates all the krefs inside
> the mmu_notifier callbacks.
>
> The remaining krefs in the range code were overly cautious, drivers are
> already not permitted to free the mirror while a range exists.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Looks good.
Reviewed-by: Ralph Campbell <[email protected]>
On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> mmu_notifier_unregister_no_release() and mmu_notifier_call_srcu() no
> longer have any users, they have all been converted to use
> mmu_notifier_put().
>
> So delete this difficult to use interface.
>
> Signed-off-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
On 8/6/19 4:15 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <[email protected]>
>
> This series introduces a new registration flow for mmu_notifiers based on
> the idea that the user would like to get a single refcounted piece of
> memory for a mm, keyed to its use.
>
> For instance many users of mmu_notifiers use an interval tree or similar
> to dispatch notifications to some object. There are many objects but only
> one notifier subscription per mm holding the tree.
>
> Of the 12 places that call mmu_notifier_register:
> - 7 are maintaining some kind of obvious mapping of mm_struct to
> mmu_notifier registration, ie in some linked list or hash table. Of
> the 7 this series converts 4 (gru, hmm, RDMA, radeon)
>
> - 3 (hfi1, gntdev, vhost) are registering multiple notifiers, but each
> one immediately does some VA range filtering, ie with an interval tree.
> These would be better with a global subsystem-wide range filter and
> could convert to this API.
>
> - 2 (kvm, amd_iommu) are deliberately using a single mm at a time, and
> really can't use this API. One of the intel-svm's modes is also in this
> list
>
> The 3/7 unconverted drivers are:
> - intel-svm
> This driver tracks mm's in a global linked list 'global_svm_list'
> and would benefit from this API.
>
> Its flow is a bit complex, since it also wants a set of non-shared
> notifiers.
>
> - i915_gem_usrptr
> This driver tracks mm's in a per-device hash
> table (dev_priv->mm_structs), but only has an optional use of
> mmu_notifiers. Since it still seems to need the hash table it is
> difficult to convert.
>
> - amdkfd/kfd_process
> This driver is using a global SRCU hash table to track mm's
>
> The control flow here is very complicated and the driver is relying on
> this hash table to be fast on the ioctl syscall path.
>
> It would definitely benefit, but only if the ioctl path didn't need to
> do the search so often.
>
> This series is already entangled with patches in the hmm & RDMA tree and
> will require some git topic branches for the RDMA ODP stuff. I intend for
> it to go through the hmm tree.
>
> There is a git version here:
>
> https://github.com/jgunthorpe/linux/commits/mmu_notifier
>
> Which has the required pre-patches for the RDMA ODP conversion that are
> still being reviewed.
>
> Jason Gunthorpe (11):
> mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the
> caller
> mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm
> mm/mmu_notifiers: add a get/put scheme for the registration
> misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
> hmm: use mmu_notifier_get/put for 'struct hmm'
> RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
> RDMA/odp: remove ib_ucontext from ib_umem
> drm/radeon: use mmu_notifier_get/put for struct radeon_mn
> drm/amdkfd: fix a use after free race with mmu_notifer unregister
> drm/amdkfd: use mmu_notifier_put
> mm/mmu_notifiers: remove unregister_no_release
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 +
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 -
> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 88 ++++-----
> drivers/gpu/drm/nouveau/nouveau_drm.c | 3 +
> drivers/gpu/drm/radeon/radeon.h | 3 -
> drivers/gpu/drm/radeon/radeon_device.c | 2 -
> drivers/gpu/drm/radeon/radeon_drv.c | 2 +
> drivers/gpu/drm/radeon/radeon_mn.c | 157 ++++------------
> drivers/infiniband/core/umem.c | 4 +-
> drivers/infiniband/core/umem_odp.c | 183 ++++++------------
> drivers/infiniband/core/uverbs_cmd.c | 3 -
> drivers/infiniband/core/uverbs_main.c | 1 +
> drivers/infiniband/hw/mlx5/main.c | 5 -
> drivers/misc/sgi-gru/grufile.c | 1 +
> drivers/misc/sgi-gru/grutables.h | 2 -
> drivers/misc/sgi-gru/grutlbpurge.c | 84 +++------
> include/linux/hmm.h | 12 +-
> include/linux/mm_types.h | 6 -
> include/linux/mmu_notifier.h | 40 +++-
> include/rdma/ib_umem.h | 2 +-
> include/rdma/ib_umem_odp.h | 10 +-
> include/rdma/ib_verbs.h | 3 -
> kernel/fork.c | 1 -
> mm/hmm.c | 121 +++---------
> mm/mmu_notifier.c | 230 +++++++++++++++++------
> 25 files changed, 408 insertions(+), 559 deletions(-)
For the core MM, HMM, and nouveau changes you can add:
Tested-by: Ralph Campbell <[email protected]>
On Tue, Aug 06, 2019 at 08:15:37PM -0300, Jason Gunthorpe wrote:
> This series is already entangled with patches in the hmm & RDMA tree and
> will require some git topic branches for the RDMA ODP stuff. I intend for
> it to go through the hmm tree.
> Jason Gunthorpe (11):
> mm/mmu_notifiers: hoist do_mmu_notifier_register down_write to the
> caller
> mm/mmu_notifiers: do not speculatively allocate a mmu_notifier_mm
> mm/mmu_notifiers: add a get/put scheme for the registration
> misc/sgi-gru: use mmu_notifier_get/put for struct gru_mm_struct
> hmm: use mmu_notifier_get/put for 'struct hmm'
> drm/radeon: use mmu_notifier_get/put for struct radeon_mn
> drm/amdkfd: fix a use after free race with mmu_notifer unregister
> drm/amdkfd: use mmu_notifier_put
Other than these patches:
> RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
> RDMA/odp: remove ib_ucontext from ib_umem
> mm/mmu_notifiers: remove unregister_no_release
This series has been applied.
I will apply the ODP patches when the series they depend on is merged
to the RDMA tree
Any further acks/remarks I will annotate, thanks in advance
Thanks to all reviewers,
Jason
On Tue, Aug 06, 2019 at 08:15:37PM -0300, Jason Gunthorpe wrote:
> This series is already entangled with patches in the hmm & RDMA tree and
> will require some git topic branches for the RDMA ODP stuff. I intend for
> it to go through the hmm tree.
The RDMA related patches have been applied to the RDMA tree on a
shared topic branch, so I've merged that into hmm.git and applied the
last patches from this series on top:
> RDMA/odp: use mmu_notifier_get/put for 'struct ib_ucontext_per_mm'
> RDMA/odp: remove ib_ucontext from ib_umem
> mm/mmu_notifiers: remove unregister_no_release
There was some conflict churn in the RDMA ODP patches vs what was used
to the patches from this series, I fixed it up. Now I'm waiting for
some testing feedback before pushing it to linux-next
Thanks,
Jason