From: Jérôme Glisse <[email protected]>
Since the last version [4] I added the extra bits needed for the
change_pte optimization (which is a KSM thing). I am not posting users
of this here; they will be posted to the appropriate sub-systems (KVM,
GPU, RDMA, ...) once this series gets upstream. If you want to look at
users of this, see [5] [6]. If this gets in 5.1 then I will be submitting
those users for 5.2 (including KVM if the KVM folks feel comfortable
with it).
Note that this series does not change any behavior for any existing
code. It just passes down more information to mmu notifier listeners.
The rationale for this patchset:
CPU page table updates can happen for many reasons, not only as a
result of a syscall (munmap(), mprotect(), mremap(), madvise(), ...)
but also as a result of kernel activities (memory compression, reclaim,
migration, ...).
This patchset introduces a set of enums that can be associated with
each of the events triggering a mmu notifier:
- UNMAP: munmap() or mremap()
- CLEAR: page table is cleared (migration, compaction, reclaim, ...)
- PROTECTION_VMA: change in access protections for the range
- PROTECTION_PAGE: change in access protections for pages in the range
- SOFT_DIRTY: soft dirtiness tracking
Being able to distinguish munmap() and mremap() from the other reasons
why the page table is cleared is important: it allows users of mmu
notifier to update their own internal tracking structures accordingly
(on munmap or mremap it is no longer necessary to track the range of
virtual addresses, as it becomes invalid). Without this series, drivers
are forced to assume that every notification is an munmap, which
triggers useless thrashing within drivers that associate a structure
with a range of virtual addresses. Each driver is forced to free up its
tracking structure and then restore it on the next device page fault.
With this series we can also optimize device page table updates [5].
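A minimal user-space sketch of how a listener might use the event to
decide what to do with its tracking structure. The enum mirrors the one
added by this series; struct mock_range, enum tracker_action and
tracker_action_for() are hypothetical names used only for illustration
and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the enum introduced by this series. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
};

/* Hypothetical stand-in for struct mmu_notifier_range. */
struct mock_range {
	unsigned long start;
	unsigned long end;
	enum mmu_notifier_event event;
};

/* Hypothetical actions a driver could take on its tracking structure. */
enum tracker_action {
	TRACKER_FREE,       /* range is gone: drop the tracking structure */
	TRACKER_INVALIDATE, /* range stays valid: keep it, drop device mappings */
};

static enum tracker_action tracker_action_for(const struct mock_range *range)
{
	assert(range != NULL);

	switch (range->event) {
	case MMU_NOTIFY_UNMAP:
		/* munmap() or mremap(): the virtual range becomes invalid. */
		return TRACKER_FREE;
	default:
		/* Any other reason: the range may be faulted in again. */
		return TRACKER_INVALIDATE;
	}
}
```

Without the event, the sketch would have to return TRACKER_FREE for
every notification, which is the thrashing described above.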
Moreover this can also be used to optimize out some page table updates,
like for KVM where we can update the secondary MMU directly from the
callback instead of clearing it.
Patches to leverage this series will be posted separately to each
sub-system.
Cheers,
Jérôme
[1] v1 https://lkml.org/lkml/2018/3/23/1049
[2] v2 https://lkml.org/lkml/2018/12/5/10
[3] v3 https://lkml.org/lkml/2018/12/13/620
[4] v4 https://lkml.org/lkml/2019/1/23/838
[5] patches to use this:
https://lkml.org/lkml/2019/1/23/833
https://lkml.org/lkml/2019/1/23/834
https://lkml.org/lkml/2019/1/23/832
https://lkml.org/lkml/2019/1/23/831
[6] KVM restore change pte optimization
https://patchwork.kernel.org/cover/10791179/
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
Jérôme Glisse (9):
mm/mmu_notifier: helper to test if a range invalidation is blockable
mm/mmu_notifier: convert user range->blockable to helper function
mm/mmu_notifier: convert mmu_notifier_range->blockable to a flags
mm/mmu_notifier: contextual information for event enums
mm/mmu_notifier: contextual information for event triggering
invalidation v2
mm/mmu_notifier: use correct mmu_notifier events for each invalidation
mm/mmu_notifier: pass down vma and reasons why mmu notifier is
happening v2
mm/mmu_notifier: mmu_notifier_range_update_to_read_only() helper
mm/mmu_notifier: set MMU_NOTIFIER_USE_CHANGE_PTE flag where
appropriate v2
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 +--
drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +-
drivers/gpu/drm/radeon/radeon_mn.c | 4 +-
drivers/infiniband/core/umem_odp.c | 5 +-
drivers/xen/gntdev.c | 6 +-
fs/proc/task_mmu.c | 3 +-
include/linux/mmu_notifier.h | 93 +++++++++++++++++++++++--
kernel/events/uprobes.c | 3 +-
mm/hmm.c | 6 +-
mm/huge_memory.c | 14 ++--
mm/hugetlb.c | 12 ++--
mm/khugepaged.c | 3 +-
mm/ksm.c | 9 ++-
mm/madvise.c | 3 +-
mm/memory.c | 26 ++++---
mm/migrate.c | 5 +-
mm/mmu_notifier.c | 12 +++-
mm/mprotect.c | 4 +-
mm/mremap.c | 3 +-
mm/oom_kill.c | 3 +-
mm/rmap.c | 6 +-
virt/kvm/kvm_main.c | 3 +-
22 files changed, 180 insertions(+), 53 deletions(-)
--
2.17.2
From: Jérôme Glisse <[email protected]>
Simple helper to test if a range invalidation is blockable. Later
patches use coccinelle to convert all direct dereferences of
range->blockable to use this function instead so that we can convert
the blockable field to an unsigned for more flags.
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 4050ec1c3b45..e630def131ce 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -226,6 +226,12 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
unsigned long start, unsigned long end);
+static inline bool
+mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
+{
+ return range->blockable;
+}
+
static inline void mmu_notifier_release(struct mm_struct *mm)
{
if (mm_has_notifiers(mm))
@@ -455,6 +461,11 @@ static inline void _mmu_notifier_range_init(struct mmu_notifier_range *range,
#define mmu_notifier_range_init(range, mm, start, end) \
_mmu_notifier_range_init(range, start, end)
+static inline bool
+mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
+{
+ return true;
+}
static inline int mm_has_notifiers(struct mm_struct *mm)
{
--
2.17.2
From: Jérôme Glisse <[email protected]>
Use an unsigned field for flags other than blockable and convert
the blockable field to be one of those flags.
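A minimal user-space sketch of the bool-to-flag conversion, assuming a
cut-down stand-in for the range structure. struct mock_range and the
mock_*() functions are hypothetical; only the flag name and the bit
operations are taken from the patch below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)

/* Hypothetical stand-in for struct mmu_notifier_range. */
struct mock_range {
	unsigned long start;
	unsigned long end;
	unsigned flags;
};

/* Same test as mmu_notifier_range_blockable() after the conversion. */
static bool mock_range_blockable(const struct mock_range *range)
{
	assert(range != NULL);
	return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE) != 0;
}

/* Blocking start: set the bit, leave every other flag untouched. */
static void mock_range_start(struct mock_range *range)
{
	range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
}

/* Non-blocking start: clear the bit, leave every other flag untouched. */
static void mock_range_start_nonblock(struct mock_range *range)
{
	range->flags &= ~MMU_NOTIFIER_RANGE_BLOCKABLE;
}
```

Setting and clearing a single bit, rather than assigning the whole
field, is what leaves room for further flags in the same word.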
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index e630def131ce..c8672c366f67 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -25,11 +25,13 @@ struct mmu_notifier_mm {
spinlock_t lock;
};
+#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+
struct mmu_notifier_range {
struct mm_struct *mm;
unsigned long start;
unsigned long end;
- bool blockable;
+ unsigned flags;
};
struct mmu_notifier_ops {
@@ -229,7 +231,7 @@ extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
static inline bool
mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
{
- return range->blockable;
+ return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
}
static inline void mmu_notifier_release(struct mm_struct *mm)
@@ -275,7 +277,7 @@ static inline void
mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
{
if (mm_has_notifiers(range->mm)) {
- range->blockable = true;
+ range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
__mmu_notifier_invalidate_range_start(range);
}
}
@@ -284,7 +286,7 @@ static inline int
mmu_notifier_invalidate_range_start_nonblock(struct mmu_notifier_range *range)
{
if (mm_has_notifiers(range->mm)) {
- range->blockable = false;
+ range->flags &= ~MMU_NOTIFIER_RANGE_BLOCKABLE;
return __mmu_notifier_invalidate_range_start(range);
}
return 0;
@@ -331,6 +333,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
range->mm = mm;
range->start = start;
range->end = end;
+ range->flags = 0;
}
#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
--
2.17.2
From: Jérôme Glisse <[email protected]>
Use the mmu_notifier_range_blockable() helper function instead of
directly dereferencing the range->blockable field. This is done to
make it easier to change the mmu_notifier range field.
This patch is the outcome of the following coccinelle patch:
%<-------------------------------------------------------------------
@@
identifier I1, FN;
@@
FN(..., struct mmu_notifier_range *I1, ...) {
<...
-I1->blockable
+mmu_notifier_range_blockable(I1)
...>
}
------------------------------------------------------------------->%
spatch --in-place --sp-file blockable.spatch --dir .
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 ++++----
drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +-
drivers/gpu/drm/radeon/radeon_mn.c | 4 ++--
drivers/infiniband/core/umem_odp.c | 5 +++--
drivers/xen/gntdev.c | 6 +++---
mm/hmm.c | 6 +++---
mm/mmu_notifier.c | 2 +-
virt/kvm/kvm_main.c | 3 ++-
8 files changed, 19 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 3e6823fdd939..58ed401c5996 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -256,14 +256,14 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
/* TODO we should be able to split locking for interval tree and
* amdgpu_mn_invalidate_node
*/
- if (amdgpu_mn_read_lock(amn, range->blockable))
+ if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range)))
return -EAGAIN;
it = interval_tree_iter_first(&amn->objects, range->start, end);
while (it) {
struct amdgpu_mn_node *node;
- if (!range->blockable) {
+ if (!mmu_notifier_range_blockable(range)) {
amdgpu_mn_read_unlock(amn);
return -EAGAIN;
}
@@ -299,7 +299,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
/* notification is exclusive, but interval is inclusive */
end = range->end - 1;
- if (amdgpu_mn_read_lock(amn, range->blockable))
+ if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range)))
return -EAGAIN;
it = interval_tree_iter_first(&amn->objects, range->start, end);
@@ -307,7 +307,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
struct amdgpu_mn_node *node;
struct amdgpu_bo *bo;
- if (!range->blockable) {
+ if (!mmu_notifier_range_blockable(range)) {
amdgpu_mn_read_unlock(amn);
return -EAGAIN;
}
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 1d3f9a31ad61..777b3f8727e7 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -122,7 +122,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
while (it) {
struct drm_i915_gem_object *obj;
- if (!range->blockable) {
+ if (!mmu_notifier_range_blockable(range)) {
ret = -EAGAIN;
break;
}
diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index b3019505065a..c9bd1278f573 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -133,7 +133,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
/* TODO we should be able to split locking for interval tree and
* the tear down.
*/
- if (range->blockable)
+ if (mmu_notifier_range_blockable(range))
mutex_lock(&rmn->lock);
else if (!mutex_trylock(&rmn->lock))
return -EAGAIN;
@@ -144,7 +144,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
struct radeon_bo *bo;
long r;
- if (!range->blockable) {
+ if (!mmu_notifier_range_blockable(range)) {
ret = -EAGAIN;
goto out_unlock;
}
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 012044f16d1c..3a3f1538d295 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -151,7 +151,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
struct ib_ucontext_per_mm *per_mm =
container_of(mn, struct ib_ucontext_per_mm, mn);
- if (range->blockable)
+ if (mmu_notifier_range_blockable(range))
down_read(&per_mm->umem_rwsem);
else if (!down_read_trylock(&per_mm->umem_rwsem))
return -EAGAIN;
@@ -169,7 +169,8 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
range->end,
invalidate_range_start_trampoline,
- range->blockable, NULL);
+ mmu_notifier_range_blockable(range),
+ NULL);
}
static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 5efc5eee9544..9da8f7192f46 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -526,20 +526,20 @@ static int mn_invl_range_start(struct mmu_notifier *mn,
struct gntdev_grant_map *map;
int ret = 0;
- if (range->blockable)
+ if (mmu_notifier_range_blockable(range))
mutex_lock(&priv->lock);
else if (!mutex_trylock(&priv->lock))
return -EAGAIN;
list_for_each_entry(map, &priv->maps, next) {
ret = unmap_if_in_range(map, range->start, range->end,
- range->blockable);
+ mmu_notifier_range_blockable(range));
if (ret)
goto out_unlock;
}
list_for_each_entry(map, &priv->freeable_maps, next) {
ret = unmap_if_in_range(map, range->start, range->end,
- range->blockable);
+ mmu_notifier_range_blockable(range));
if (ret)
goto out_unlock;
}
diff --git a/mm/hmm.c b/mm/hmm.c
index 3c9781037918..a03b5083d880 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -205,9 +205,9 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
update.start = nrange->start;
update.end = nrange->end;
update.event = HMM_UPDATE_INVALIDATE;
- update.blockable = nrange->blockable;
+ update.blockable = mmu_notifier_range_blockable(nrange);
- if (nrange->blockable)
+ if (mmu_notifier_range_blockable(nrange))
mutex_lock(&hmm->lock);
else if (!mutex_trylock(&hmm->lock)) {
ret = -EAGAIN;
@@ -222,7 +222,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
}
mutex_unlock(&hmm->lock);
- if (nrange->blockable)
+ if (mmu_notifier_range_blockable(nrange))
down_read(&hmm->mirrors_sem);
else if (!down_read_trylock(&hmm->mirrors_sem)) {
ret = -EAGAIN;
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 9c884abc7850..abd88c466eb2 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -180,7 +180,7 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
if (_ret) {
pr_info("%pS callback failed with %d in %sblockable context.\n",
mn->ops->invalidate_range_start, _ret,
- !range->blockable ? "non-" : "");
+ !mmu_notifier_range_blockable(range) ? "non-" : "");
ret = _ret;
}
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 38df17b7760e..629760c0fb95 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -386,7 +386,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
spin_unlock(&kvm->mmu_lock);
ret = kvm_arch_mmu_notifier_invalidate_range(kvm, range->start,
- range->end, range->blockable);
+ range->end,
+ mmu_notifier_range_blockable(range));
srcu_read_unlock(&kvm->srcu, idx);
--
2.17.2
From: Jérôme Glisse <[email protected]>
CPU page table updates can happen for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).
This patch introduces a set of enums that can be associated with each of
the events triggering a mmu notifier. Later patches take advantage of
those enum values.
- UNMAP: munmap() or mremap()
- CLEAR: page table is cleared (migration, compaction, reclaim, ...)
- PROTECTION_VMA: change in access protections for the range
- PROTECTION_PAGE: change in access protections for pages in the range
- SOFT_DIRTY: soft dirtiness tracking
Being able to distinguish munmap() and mremap() from the other reasons
why the page table is cleared is important: it allows users of mmu
notifier to update their own internal tracking structures accordingly
(on munmap or mremap it is no longer necessary to track the range of
virtual addresses, as it becomes invalid).
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index c8672c366f67..2386e71ac1b8 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -10,6 +10,36 @@
struct mmu_notifier;
struct mmu_notifier_ops;
+/**
+ * enum mmu_notifier_event - reason for the mmu notifier callback
+ * @MMU_NOTIFY_UNMAP: either munmap() that unmap the range or a mremap() that
+ * move the range
+ *
+ * @MMU_NOTIFY_CLEAR: clear page table entry (many reasons for this like
+ * madvise() or replacing a page by another one, ...).
+ *
+ * @MMU_NOTIFY_PROTECTION_VMA: update is due to protection change for the range
+ * ie using the vma access permission (vm_page_prot) to update the whole range
+ * is enough no need to inspect changes to the CPU page table (mprotect()
+ * syscall)
+ *
+ * @MMU_NOTIFY_PROTECTION_PAGE: update is due to change in read/write flag for
+ * pages in the range so to mirror those changes the user must inspect the CPU
+ * page table (from the end callback).
+ *
+ * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same
+ * access flags). User should soft dirty the page in the end callback to make
+ * sure that anyone relying on soft dirtyness catch pages that might be written
+ * through non CPU mappings.
+ */
+enum mmu_notifier_event {
+ MMU_NOTIFY_UNMAP = 0,
+ MMU_NOTIFY_CLEAR,
+ MMU_NOTIFY_PROTECTION_VMA,
+ MMU_NOTIFY_PROTECTION_PAGE,
+ MMU_NOTIFY_SOFT_DIRTY,
+};
+
#ifdef CONFIG_MMU_NOTIFIER
/*
--
2.17.2
From: Jérôme Glisse <[email protected]>
Helper to test if a range is updated to read only (it is still valid
to read from the range). This is useful for device drivers or anyone
who wishes to optimize out an update when they know that they already
have the range mapped read only.
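A minimal user-space sketch of that optimization. The check in
mock_range_update_to_read_only() follows the helper added by this patch;
struct mock_vma, struct mock_range, MOCK_VM_READ (a stand-in for the
kernel's VM_READ) and device_must_invalidate() are hypothetical and exist
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the enum introduced earlier in this series. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
};

#define MOCK_VM_READ 0x1UL /* hypothetical stand-in for VM_READ */

/* Hypothetical stand-ins for the vma and the notifier range. */
struct mock_vma {
	unsigned long vm_flags;
};

struct mock_range {
	const struct mock_vma *vma;
	enum mmu_notifier_event event;
};

/* Same logic as the helper in this patch, on the mock types. */
static bool mock_range_update_to_read_only(const struct mock_range *range)
{
	assert(range != NULL);
	if (!range->vma || range->event != MMU_NOTIFY_PROTECTION_VMA)
		return false;
	/* True if the vma still has the read flag set. */
	return (range->vma->vm_flags & MOCK_VM_READ) != 0;
}

/*
 * Hypothetical driver decision: a device that already maps the range
 * read only has nothing to do when the range merely stays readable.
 */
static bool device_must_invalidate(const struct mock_range *range,
				   bool device_maps_read_only)
{
	if (device_maps_read_only && mock_range_update_to_read_only(range))
		return false;
	return true;
}
```

Any event other than MMU_NOTIFY_PROTECTION_VMA, or a range without a
vma, falls back to a full invalidation in this sketch.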
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 4 ++++
mm/mmu_notifier.c | 10 ++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 0379956fff23..b6c004bd9f6a 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -259,6 +259,8 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
bool only_end);
extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
unsigned long start, unsigned long end);
+extern bool
+mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range);
static inline bool
mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
@@ -568,6 +570,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
{
}
+#define mmu_notifier_range_update_to_read_only(r) false
+
#define ptep_clear_flush_young_notify ptep_clear_flush_young
#define pmdp_clear_flush_young_notify pmdp_clear_flush_young
#define ptep_clear_young_notify ptep_test_and_clear_young
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index abd88c466eb2..ee36068077b6 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -395,3 +395,13 @@ void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
mmdrop(mm);
}
EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
+
+bool
+mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range)
+{
+ if (!range->vma || range->event != MMU_NOTIFY_PROTECTION_VMA)
+ return false;
+ /* Return true if the vma still have the read flag set. */
+ return range->vma->vm_flags & VM_READ;
+}
+EXPORT_SYMBOL_GPL(mmu_notifier_range_update_to_read_only);
--
2.17.2
From: Jérôme Glisse <[email protected]>
When notifying a change for a range, use the MMU_NOTIFIER_USE_CHANGE_PTE
flag for page table updates that use set_pte_at_notify() and where we
are going either from read and write to read only with the same pfn, or
from read only to read and write with a new pfn.
Note that set_pte_at_notify() itself should only be used in rare cases,
i.e. we do not want to use it when we are updating a significant range
of virtual addresses and thus a significant number of ptes. Instead, for
those cases the event provided to the mmu notifier
invalidate_range_start() callback should be used for optimization.
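A minimal user-space sketch of how a listener could consume the flag,
assuming it also implements change_pte() and can therefore leave its
secondary mapping in place until that callback updates it. The flag
names come from this patch; struct mock_range, struct mock_listener and
mock_invalidate_range_start() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
#define MMU_NOTIFIER_USE_CHANGE_PTE  (1 << 1)

/* Hypothetical stand-in for struct mmu_notifier_range. */
struct mock_range {
	unsigned long start;
	unsigned long end;
	unsigned flags;
};

/* Same test as mmu_notifier_range_use_change_pte() in this patch. */
static bool mock_range_use_change_pte(const struct mock_range *range)
{
	assert(range != NULL);
	return (range->flags & MMU_NOTIFIER_USE_CHANGE_PTE) != 0;
}

/* Hypothetical listener that counts the invalidations it performs. */
struct mock_listener {
	unsigned invalidations;
};

static void mock_invalidate_range_start(struct mock_listener *listener,
					const struct mock_range *range)
{
	/*
	 * Assumption: the listener implements change_pte(), so the
	 * secondary mapping is updated there and need not be torn down.
	 */
	if (mock_range_use_change_pte(range))
		return;
	listener->invalidations++;
}
```

Whether a real listener can skip the invalidation this way depends on
its own change_pte() implementation; the sketch only shows the flag
test.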
Changes since v1:
- Use the new unsigned flags field in struct mmu_notifier_range
- Use the new flags parameter to mmu_notifier_range_init()
- Explicitly list all the patterns where we can use change_pte()
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 34 ++++++++++++++++++++++++++++++++--
mm/ksm.c | 11 ++++++-----
mm/memory.c | 5 +++--
3 files changed, 41 insertions(+), 9 deletions(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index b6c004bd9f6a..0230a4b06b46 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -40,6 +40,26 @@ enum mmu_notifier_event {
MMU_NOTIFY_SOFT_DIRTY,
};
+/*
+ * @MMU_NOTIFIER_RANGE_BLOCKABLE: can the mmu notifier range_start/range_end
+ * callback block or not ? If set then the callback can block.
+ *
+ * @MMU_NOTIFIER_USE_CHANGE_PTE: only set when the page table it updated with
+ * the set_pte_at_notify() the valid patterns for this are:
+ * - pte read and write to read only same pfn
+ * - pte read only to read and write (pfn can change or stay the same)
+ * - pte read only to read only with different pfn
+ * It is illegal to set in any other circumstances.
+ *
+ * Note that set_pte_at_notify() should not be use outside of the above cases.
+ * When updating a range in batch (like write protecting a range) it is better
+ * to rely on invalidate_range_start() and struct mmu_notifier_range to infer
+ * the kind of update that is happening (as an example you can look at the
+ * mmu_notifier_range_update_to_read_only() function).
+ */
+#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_USE_CHANGE_PTE (1 << 1)
+
#ifdef CONFIG_MMU_NOTIFIER
/*
@@ -55,8 +75,6 @@ struct mmu_notifier_mm {
spinlock_t lock;
};
-#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
-
struct mmu_notifier_range {
struct vm_area_struct *vma;
struct mm_struct *mm;
@@ -268,6 +286,12 @@ mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
}
+static inline bool
+mmu_notifier_range_use_change_pte(const struct mmu_notifier_range *range)
+{
+ return (range->flags & MMU_NOTIFIER_USE_CHANGE_PTE);
+}
+
static inline void mmu_notifier_release(struct mm_struct *mm)
{
if (mm_has_notifiers(mm))
@@ -509,6 +533,12 @@ mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
return true;
}
+static inline bool
+mmu_notifier_range_use_change_pte(const struct mmu_notifier_range *range)
+{
+ return false;
+}
+
static inline int mm_has_notifiers(struct mm_struct *mm)
{
return 0;
diff --git a/mm/ksm.c b/mm/ksm.c
index b782fadade8f..41e51882f999 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1066,9 +1066,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
BUG_ON(PageTransCompound(page));
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
- pvmw.address,
- pvmw.address + PAGE_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
+ MMU_NOTIFIER_USE_CHANGE_PTE, vma, mm,
+ pvmw.address, pvmw.address + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
if (!page_vma_mapped_walk(&pvmw))
@@ -1155,8 +1155,9 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
if (!pmd)
goto out;
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
- addr + PAGE_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
+ MMU_NOTIFIER_USE_CHANGE_PTE,
+ vma, mm, addr, addr + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
diff --git a/mm/memory.c b/mm/memory.c
index 45dbc174a88c..cb71d3ff1b97 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2282,8 +2282,9 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
- vmf->address & PAGE_MASK,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
+ MMU_NOTIFIER_USE_CHANGE_PTE,
+ vma, mm, vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
--
2.17.2
From: Jérôme Glisse <[email protected]>
CPU page table updates can happen for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).
Users of the mmu notifier API track changes to the CPU page table and
take specific actions for them. However, the current API only provides
the range of virtual addresses affected by the change, not why the
change is happening.
This patch just passes down the new information by adding it to the
mmu_notifier_range structure.
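A minimal user-space sketch of the extended initializer. The argument
order (range, event, flags, vma, mm, start, end) is the one used by the
call sites in this series; struct mock_vma, struct mock_mm, struct
mock_range and mock_range_init() are hypothetical stand-ins for the
kernel types and for mmu_notifier_range_init().

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the enum introduced earlier in this series. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
};

/* Hypothetical stand-ins for vm_area_struct and mm_struct. */
struct mock_vma {
	unsigned long vm_flags;
};

struct mock_mm {
	int unused;
};

/* Same fields as struct mmu_notifier_range after this patch. */
struct mock_range {
	struct mock_vma *vma;
	struct mock_mm *mm;
	unsigned long start;
	unsigned long end;
	unsigned flags;
	enum mmu_notifier_event event;
};

static void mock_range_init(struct mock_range *range,
			    enum mmu_notifier_event event,
			    unsigned flags,
			    struct mock_vma *vma,
			    struct mock_mm *mm,
			    unsigned long start,
			    unsigned long end)
{
	assert(range != NULL);
	range->vma = vma;     /* may be NULL when no single vma applies */
	range->event = event; /* why the page table is being updated */
	range->mm = mm;
	range->start = start;
	range->end = end;
	range->flags = flags; /* taken from the caller, no longer forced to 0 */
}
```

A caller with no single vma, such as one covering a whole address
space, would pass NULL for vma in this sketch.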
Changes since v1:
- Initialize flags field from mmu_notifier_range_init() arguments
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
include/linux/mmu_notifier.h | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 62f94cd85455..0379956fff23 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -58,10 +58,12 @@ struct mmu_notifier_mm {
#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
struct mmu_notifier_range {
+ struct vm_area_struct *vma;
struct mm_struct *mm;
unsigned long start;
unsigned long end;
unsigned flags;
+ enum mmu_notifier_event event;
};
struct mmu_notifier_ops {
@@ -363,10 +365,12 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
unsigned long start,
unsigned long end)
{
+ range->vma = vma;
+ range->event = event;
range->mm = mm;
range->start = start;
range->end = end;
- range->flags = 0;
+ range->flags = flags;
}
#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
--
2.17.2
From: Jérôme Glisse <[email protected]>
This updates each existing invalidation to use the correct mmu notifier
event that represents what is happening to the CPU page table. See the
patch which introduced the events for the rationale behind this.
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
fs/proc/task_mmu.c | 4 ++--
kernel/events/uprobes.c | 2 +-
mm/huge_memory.c | 14 ++++++--------
mm/hugetlb.c | 8 ++++----
mm/khugepaged.c | 2 +-
mm/ksm.c | 4 ++--
mm/madvise.c | 2 +-
mm/memory.c | 14 +++++++-------
mm/migrate.c | 4 ++--
mm/mprotect.c | 5 +++--
mm/rmap.c | 6 +++---
11 files changed, 32 insertions(+), 33 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index fcbd0e574917..3b93ce496dd4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1151,8 +1151,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
break;
}
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
- NULL, mm, 0, -1UL);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY,
+ 0, NULL, mm, 0, -1UL);
mmu_notifier_invalidate_range_start(&range);
}
walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 46f546bdba00..8e8342080013 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -161,7 +161,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
struct mmu_notifier_range range;
struct mem_cgroup *memcg;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
addr + PAGE_SIZE);
VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c9d638f1b34e..1da6ca0f0f6d 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1184,9 +1184,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
cond_resched();
}
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
- haddr,
- haddr + HPAGE_PMD_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ haddr, haddr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
@@ -1349,9 +1348,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
vma, HPAGE_PMD_NR);
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
- haddr,
- haddr + HPAGE_PMD_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ haddr, haddr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
spin_lock(vmf->ptl);
@@ -2028,7 +2026,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
address & HPAGE_PUD_MASK,
(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -2247,7 +2245,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
address & HPAGE_PMD_MASK,
(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d9e5c5a4c004..a58115c6b0a3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3250,7 +3250,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
if (cow) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, src,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
vma->vm_start,
vma->vm_end);
mmu_notifier_invalidate_range_start(&range);
@@ -3631,7 +3631,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
pages_per_huge_page(h));
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, haddr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, haddr,
haddr + huge_page_size(h));
mmu_notifier_invalidate_range_start(&range);
@@ -4357,8 +4357,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
* start/end. Set range.start/range.end to cover the maximum possible
* range if PMD sharing is possible.
*/
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
- end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA,
+ 0, vma, mm, start, end);
adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
BUG_ON(address >= end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e7944f5e6258..579699d2b347 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1016,7 +1016,7 @@ static void collapse_huge_page(struct mm_struct *mm,
pte = pte_offset_map(pmd, address);
pte_ptl = pte_lockptr(mm, pmd);
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
address, address + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
diff --git a/mm/ksm.c b/mm/ksm.c
index 2ea25fc0befb..b782fadade8f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1066,7 +1066,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
BUG_ON(PageTransCompound(page));
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
pvmw.address,
pvmw.address + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -1155,7 +1155,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
if (!pmd)
goto out;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
addr + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/madvise.c b/mm/madvise.c
index c617f53a9c09..a692d2a893b5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -472,7 +472,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
range.end = min(vma->vm_end, end_addr);
if (range.end <= vma->vm_start)
return -EINVAL;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
range.start, range.end);
lru_add_drain();
diff --git a/mm/memory.c b/mm/memory.c
index 4565f636cca3..45dbc174a88c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1010,8 +1010,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
is_cow = is_cow_mapping(vma->vm_flags);
if (is_cow) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma,
- src_mm, addr, end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
+ 0, vma, src_mm, addr, end);
mmu_notifier_invalidate_range_start(&range);
}
@@ -1358,7 +1358,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
struct mmu_gather tlb;
lru_add_drain();
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
start, start + size);
tlb_gather_mmu(&tlb, vma->vm_mm, start, range.end);
update_hiwater_rss(vma->vm_mm);
@@ -1385,7 +1385,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
struct mmu_gather tlb;
lru_add_drain();
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
address, address + size);
tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
update_hiwater_rss(vma->vm_mm);
@@ -2282,7 +2282,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -4105,7 +4105,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
goto out;
if (range) {
- mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0,
+ mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0,
NULL, mm, address & PMD_MASK,
(address & PMD_MASK) + PMD_SIZE);
mmu_notifier_invalidate_range_start(range);
@@ -4124,7 +4124,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
goto out;
if (range) {
- mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
+ mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
address & PAGE_MASK,
(address & PAGE_MASK) + PAGE_SIZE);
mmu_notifier_invalidate_range_start(range);
diff --git a/mm/migrate.c b/mm/migrate.c
index 81eb307b2b5b..8e6d00541b3c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2340,7 +2340,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
mm_walk.mm = migrate->vma->vm_mm;
mm_walk.private = migrate;
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm_walk.mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm_walk.mm,
migrate->start,
migrate->end);
mmu_notifier_invalidate_range_start(&range);
@@ -2749,7 +2749,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
notified = true;
mmu_notifier_range_init(&range,
- MMU_NOTIFY_UNMAP, 0,
+ MMU_NOTIFY_CLEAR, 0,
NULL,
migrate->vma->vm_mm,
addr, migrate->end);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index b10984052ae9..65242f1e4457 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -185,8 +185,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
/* invoke the mmu notifier if the pmd is populated */
if (!range.start) {
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
- vma, vma->vm_mm, addr, end);
+ mmu_notifier_range_init(&range,
+ MMU_NOTIFY_PROTECTION_VMA, 0,
+ vma, vma->vm_mm, addr, end);
mmu_notifier_invalidate_range_start(&range);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index c6535a6ec850..627b38ad5052 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,8 +896,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
* We have to assume the worse case ie pmd for invalidation. Note that
* the page can not be free from this function.
*/
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
- address,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
+ 0, vma, vma->vm_mm, address,
min(vma->vm_end, address +
(PAGE_SIZE << compound_order(page))));
mmu_notifier_invalidate_range_start(&range);
@@ -1372,7 +1372,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* Note that the page can not be free in this function as call of
* try_to_unmap() must hold a reference on the page.
*/
- mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
address,
min(vma->vm_end, address +
(PAGE_SIZE << compound_order(page))));
--
2.17.2
From: Jérôme Glisse <[email protected]>
CPU page table updates can happen for many reasons, not only as a result
of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
as a result of kernel activities (memory compression, reclaim, migration,
...).
Users of the mmu notifier API track changes to the CPU page table and take
specific actions for them. However, the current API only provides the range
of virtual addresses affected by the change, not why the change is happening.
This patchset does the initial mechanical conversion of all the places that
call mmu_notifier_range_init() to also provide the default MMU_NOTIFY_UNMAP
event as well as the vma if it is known (most invalidations happen against
a given vma). Passing down the vma allows the users of mmu notifier to
inspect the new vma page protection.
MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
should assume that everything in the range is going away when that event
happens. A later patch converts the mm call paths to use a more appropriate
event for each call.
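A minimal userspace sketch of the resulting calling convention follows. It pairs the seven-argument signature from this patch with the field assignments that the separate header patch in this series adds. The stand-in struct definitions, EXAMPLE_PAGE_SIZE and example_page_range() are hypothetical and only serve to make the sketch compile outside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for kernel types; only what the sketch needs. */
struct mm_struct { int unused; };
struct vm_area_struct { struct mm_struct *vm_mm; };

/* Only the default event used by this patch. */
enum mmu_notifier_event { MMU_NOTIFY_UNMAP = 0 };

/* Hypothetical page size for the example call site. */
#define EXAMPLE_PAGE_SIZE 4096UL

struct mmu_notifier_range {
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	unsigned long start;
	unsigned long end;
	unsigned flags;
	enum mmu_notifier_event event;
};

/* Same parameter order as the new mmu_notifier_range_init(). */
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
					   enum mmu_notifier_event event,
					   unsigned flags,
					   struct vm_area_struct *vma,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
{
	range->vma = vma;
	range->event = event;
	range->mm = mm;
	range->start = start;
	range->end = end;
	range->flags = flags;
}

/*
 * Hypothetical call site, converted the way the semantic patch converts
 * the real ones: default event, zero flags, and the vma when one is in
 * scope (NULL otherwise).
 */
static struct mmu_notifier_range
example_page_range(struct vm_area_struct *vma, unsigned long addr)
{
	struct mmu_notifier_range range;

	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
				addr, addr + EXAMPLE_PAGE_SIZE);
	return range;
}
```

A call site that has no vma in scope passes NULL in the vma slot instead, as the task_mmu.c and khugepaged.c hunks below do.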
Changes since v1:
- add the flags parameter to init range flags
This is done as 2 patches so that no call site is forgotten, especially
as it uses the following coccinelle patch:
%<----------------------------------------------------------------------
@@
identifier I1, I2, I3, I4;
@@
static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
+enum mmu_notifier_event event,
+unsigned flags,
+struct vm_area_struct *vma,
struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
@@
@@
-#define mmu_notifier_range_init(range, mm, start, end)
+#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
@@
expression E1, E3, E4;
identifier I1;
@@
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, I1,
I1->vm_mm, E3, E4)
...>
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(..., struct vm_area_struct *VMA, ...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN, VMA;
@@
FN(...) {
struct vm_area_struct *VMA;
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, VMA,
E2, E3, E4)
...> }
@@
expression E1, E2, E3, E4;
identifier FN;
@@
FN(...) {
<...
mmu_notifier_range_init(E1,
+MMU_NOTIFY_UNMAP, 0, NULL,
E2, E3, E4)
...> }
---------------------------------------------------------------------->%
Applied with:
spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
spatch --sp-file mmu-notifier.spatch --dir mm --in-place
Signed-off-by: Jérôme Glisse <[email protected]>
Cc: Christian König <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: Felix Kuehling <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Arnd Bergmann <[email protected]>
---
fs/proc/task_mmu.c | 3 ++-
include/linux/mmu_notifier.h | 5 ++++-
kernel/events/uprobes.c | 3 ++-
mm/huge_memory.c | 12 ++++++++----
mm/hugetlb.c | 12 ++++++++----
mm/khugepaged.c | 3 ++-
mm/ksm.c | 6 ++++--
mm/madvise.c | 3 ++-
mm/memory.c | 25 ++++++++++++++++---------
mm/migrate.c | 5 ++++-
mm/mprotect.c | 3 ++-
mm/mremap.c | 3 ++-
mm/oom_kill.c | 3 ++-
mm/rmap.c | 6 ++++--
14 files changed, 62 insertions(+), 30 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 92a91e7816d8..fcbd0e574917 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1151,7 +1151,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
break;
}
- mmu_notifier_range_init(&range, mm, 0, -1UL);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
+ NULL, mm, 0, -1UL);
mmu_notifier_invalidate_range_start(&range);
}
walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 2386e71ac1b8..62f94cd85455 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -356,6 +356,9 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
+ enum mmu_notifier_event event,
+ unsigned flags,
+ struct vm_area_struct *vma,
struct mm_struct *mm,
unsigned long start,
unsigned long end)
@@ -491,7 +494,7 @@ static inline void _mmu_notifier_range_init(struct mmu_notifier_range *range,
range->end = end;
}
-#define mmu_notifier_range_init(range, mm, start, end) \
+#define mmu_notifier_range_init(range,event,flags,vma,mm,start,end) \
_mmu_notifier_range_init(range, start, end)
static inline bool
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index affa830a198c..46f546bdba00 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -161,7 +161,8 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
struct mmu_notifier_range range;
struct mem_cgroup *memcg;
- mmu_notifier_range_init(&range, mm, addr, addr + PAGE_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
+ addr + PAGE_SIZE);
VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d4847026d4b1..c9d638f1b34e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1184,7 +1184,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
cond_resched();
}
- mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ haddr,
haddr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -1348,7 +1349,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
vma, HPAGE_PMD_NR);
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, vma->vm_mm, haddr,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ haddr,
haddr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -2026,7 +2028,8 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, vma->vm_mm, address & HPAGE_PUD_MASK,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ address & HPAGE_PUD_MASK,
(address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptl = pud_lock(vma->vm_mm, pud);
@@ -2244,7 +2247,8 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
spinlock_t *ptl;
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, vma->vm_mm, address & HPAGE_PMD_MASK,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ address & HPAGE_PMD_MASK,
(address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptl = pmd_lock(vma->vm_mm, pmd);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1c5219193b9e..d9e5c5a4c004 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3250,7 +3250,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
if (cow) {
- mmu_notifier_range_init(&range, src, vma->vm_start,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, src,
+ vma->vm_start,
vma->vm_end);
mmu_notifier_invalidate_range_start(&range);
}
@@ -3362,7 +3363,8 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
/*
* If sharing possible, alert mmu notifiers of worst case.
*/
- mmu_notifier_range_init(&range, mm, start, end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
+ end);
adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
mmu_notifier_invalidate_range_start(&range);
address = start;
@@ -3629,7 +3631,8 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
pages_per_huge_page(h));
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, mm, haddr, haddr + huge_page_size(h));
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, haddr,
+ haddr + huge_page_size(h));
mmu_notifier_invalidate_range_start(&range);
/*
@@ -4354,7 +4357,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
* start/end. Set range.start/range.end to cover the maximum possible
* range if PMD sharing is possible.
*/
- mmu_notifier_range_init(&range, mm, start, end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
+ end);
adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
BUG_ON(address >= end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 449044378782..e7944f5e6258 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1016,7 +1016,8 @@ static void collapse_huge_page(struct mm_struct *mm,
pte = pte_offset_map(pmd, address);
pte_ptl = pte_lockptr(mm, pmd);
- mmu_notifier_range_init(&range, mm, address, address + HPAGE_PMD_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
+ address, address + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
/*
diff --git a/mm/ksm.c b/mm/ksm.c
index fa78626da9f0..2ea25fc0befb 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1066,7 +1066,8 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
BUG_ON(PageTransCompound(page));
- mmu_notifier_range_init(&range, mm, pvmw.address,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ pvmw.address,
pvmw.address + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -1154,7 +1155,8 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
if (!pmd)
goto out;
- mmu_notifier_range_init(&range, mm, addr, addr + PAGE_SIZE);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
+ addr + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
diff --git a/mm/madvise.c b/mm/madvise.c
index 21a7881a2db4..c617f53a9c09 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -472,7 +472,8 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
range.end = min(vma->vm_end, end_addr);
if (range.end <= vma->vm_start)
return -EINVAL;
- mmu_notifier_range_init(&range, mm, range.start, range.end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ range.start, range.end);
lru_add_drain();
tlb_gather_mmu(&tlb, mm, range.start, range.end);
diff --git a/mm/memory.c b/mm/memory.c
index 34ced1369883..4565f636cca3 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1010,7 +1010,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
is_cow = is_cow_mapping(vma->vm_flags);
if (is_cow) {
- mmu_notifier_range_init(&range, src_mm, addr, end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma,
+ src_mm, addr, end);
mmu_notifier_invalidate_range_start(&range);
}
@@ -1334,7 +1335,8 @@ void unmap_vmas(struct mmu_gather *tlb,
{
struct mmu_notifier_range range;
- mmu_notifier_range_init(&range, vma->vm_mm, start_addr, end_addr);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ start_addr, end_addr);
mmu_notifier_invalidate_range_start(&range);
for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
@@ -1356,7 +1358,8 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
struct mmu_gather tlb;
lru_add_drain();
- mmu_notifier_range_init(&range, vma->vm_mm, start, start + size);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ start, start + size);
tlb_gather_mmu(&tlb, vma->vm_mm, start, range.end);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
@@ -1382,7 +1385,8 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
struct mmu_gather tlb;
lru_add_drain();
- mmu_notifier_range_init(&range, vma->vm_mm, address, address + size);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ address, address + size);
tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
update_hiwater_rss(vma->vm_mm);
mmu_notifier_invalidate_range_start(&range);
@@ -2278,7 +2282,8 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
__SetPageUptodate(new_page);
- mmu_notifier_range_init(&range, mm, vmf->address & PAGE_MASK,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
+ vmf->address & PAGE_MASK,
(vmf->address & PAGE_MASK) + PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);
@@ -4100,8 +4105,9 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
goto out;
if (range) {
- mmu_notifier_range_init(range, mm, address & PMD_MASK,
- (address & PMD_MASK) + PMD_SIZE);
+ mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0,
+ NULL, mm, address & PMD_MASK,
+ (address & PMD_MASK) + PMD_SIZE);
mmu_notifier_invalidate_range_start(range);
}
*ptlp = pmd_lock(mm, pmd);
@@ -4118,8 +4124,9 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
goto out;
if (range) {
- mmu_notifier_range_init(range, mm, address & PAGE_MASK,
- (address & PAGE_MASK) + PAGE_SIZE);
+ mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
+ address & PAGE_MASK,
+ (address & PAGE_MASK) + PAGE_SIZE);
mmu_notifier_invalidate_range_start(range);
}
ptep = pte_offset_map_lock(mm, pmd, address, ptlp);
diff --git a/mm/migrate.c b/mm/migrate.c
index 76517bf03621..81eb307b2b5b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2340,7 +2340,8 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
mm_walk.mm = migrate->vma->vm_mm;
mm_walk.private = migrate;
- mmu_notifier_range_init(&range, mm_walk.mm, migrate->start,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm_walk.mm,
+ migrate->start,
migrate->end);
mmu_notifier_invalidate_range_start(&range);
walk_page_range(migrate->start, migrate->end, &mm_walk);
@@ -2748,6 +2749,8 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
notified = true;
mmu_notifier_range_init(&range,
+ MMU_NOTIFY_UNMAP, 0,
+ NULL,
migrate->vma->vm_mm,
addr, migrate->end);
mmu_notifier_invalidate_range_start(&range);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 028c724dcb1a..b10984052ae9 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -185,7 +185,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
/* invoke the mmu notifier if the pmd is populated */
if (!range.start) {
- mmu_notifier_range_init(&range, vma->vm_mm, addr, end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
+ vma, vma->vm_mm, addr, end);
mmu_notifier_invalidate_range_start(&range);
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 3320616ed93f..364e79bcc1ff 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -249,7 +249,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
old_end = old_addr + len;
flush_cache_range(vma, old_addr, old_end);
- mmu_notifier_range_init(&range, vma->vm_mm, old_addr, old_end);
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ old_addr, old_end);
mmu_notifier_invalidate_range_start(&range);
for (; old_addr < old_end; old_addr += extent, new_addr += extent) {
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3a2484884cfd..539c91d0b26a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -531,7 +531,8 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
struct mmu_notifier_range range;
struct mmu_gather tlb;
- mmu_notifier_range_init(&range, mm, vma->vm_start,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
+ vma, mm, vma->vm_start,
vma->vm_end);
tlb_gather_mmu(&tlb, mm, range.start, range.end);
if (mmu_notifier_invalidate_range_start_nonblock(&range)) {
diff --git a/mm/rmap.c b/mm/rmap.c
index 0454ecc29537..c6535a6ec850 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -896,7 +896,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
* We have to assume the worse case ie pmd for invalidation. Note that
* the page can not be free from this function.
*/
- mmu_notifier_range_init(&range, vma->vm_mm, address,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ address,
min(vma->vm_end, address +
(PAGE_SIZE << compound_order(page))));
mmu_notifier_invalidate_range_start(&range);
@@ -1371,7 +1372,8 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
* Note that the page can not be free in this function as call of
* try_to_unmap() must hold a reference on the page.
*/
- mmu_notifier_range_init(&range, vma->vm_mm, address,
+ mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
+ address,
min(vma->vm_end, address +
(PAGE_SIZE << compound_order(page))));
if (PageHuge(page)) {
--
2.17.2
On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
>
> From: Jérôme Glisse <[email protected]>
>
> Since last version [4] i added the extra bits needed for the change_pte
> optimization (which is a KSM thing). Here i am not posting users of
> this, they will be posted to the appropriate sub-systems (KVM, GPU,
> RDMA, ...) once this serie get upstream. If you want to look at users
> of this see [5] [6]. If this gets in 5.1 then i will be submitting
> those users for 5.2 (including KVM if KVM folks feel comfortable with
> it).
The users look small and straightforward. Why not await acks and
reviewed-by's for the users like a typical upstream submission and
merge them together? Is all of the functionality of this
infrastructure consumed by the proposed users? Last time I checked it
was only a subset.
On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> >
> > > From: Jérôme Glisse <[email protected]>
> >
> > Since last version [4] i added the extra bits needed for the change_pte
> > optimization (which is a KSM thing). Here i am not posting users of
> > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > RDMA, ...) once this serie get upstream. If you want to look at users
> > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > it).
>
> The users look small and straightforward. Why not await acks and
> reviewed-by's for the users like a typical upstream submission and
> merge them together? Is all of the functionality of this
> infrastructure consumed by the proposed users? Last time I checked it
> was only a subset.
Yes, pretty much all of it is used; the unused cases are SOFT_DIRTY and
CLEAR vs UNMAP, both of which I intend to use. The RDMA folks already acked
the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
were ok with it too. I do not want to merge all of this through Andrew;
we discussed that in the past: merge the mm bits through Andrew in one
release and the bits that use them in the next release.
Cheers,
Jérôme
On Tue, Feb 19, 2019 at 03:30:33PM -0500, Jerome Glisse wrote:
> On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > >
> > > From: Jérôme Glisse <[email protected]>
> > >
> > > Since last version [4] i added the extra bits needed for the change_pte
> > > optimization (which is a KSM thing). Here i am not posting users of
> > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > it).
> >
> > The users look small and straightforward. Why not await acks and
> > reviewed-by's for the users like a typical upstream submission and
> > merge them together? Is all of the functionality of this
> > infrastructure consumed by the proposed users? Last time I checked it
> > was only a subset.
>
> Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> were ok with it too. I do not want to merge things through Andrew
> for all of this we discussed that in the past, merge mm bits through
> Andrew in one release and bits that use things in the next release.
It is usually cleaner for everyone to split patches like this, for
instance I always prefer to merge RDMA patches via RDMA when
possible. Fewer conflicts.
The other somewhat reasonable option is to get acks and send your own
complete PR to Linus next week? That works OK for tree-wide changes.
Jason
On Tue, Feb 19, 2019 at 12:30 PM Jerome Glisse <[email protected]> wrote:
>
> On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > >
> > > From: Jérôme Glisse <[email protected]>
> > >
> > > Since last version [4] i added the extra bits needed for the change_pte
> > > optimization (which is a KSM thing). Here i am not posting users of
> > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > it).
> >
> > The users look small and straightforward. Why not await acks and
> > reviewed-by's for the users like a typical upstream submission and
> > merge them together? Is all of the functionality of this
> > infrastructure consumed by the proposed users? Last time I checked it
> > was only a subset.
>
> Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> were ok with it too. I do not want to merge things through Andrew
> for all of this we discussed that in the past, merge mm bits through
> Andrew in one release and bits that use things in the next release.
Ok, I was trying to find the links to the acks on the mailing list,
those references would address my concerns. I see no reason to rush
SOFT_DIRTY and CLEAR ahead of the upstream user.
On Tue, Feb 19, 2019 at 12:41 PM Jason Gunthorpe <[email protected]> wrote:
>
> On Tue, Feb 19, 2019 at 03:30:33PM -0500, Jerome Glisse wrote:
> > On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > > >
> > > > From: Jérôme Glisse <[email protected]>
> > > >
> > > > Since last version [4] i added the extra bits needed for the change_pte
> > > > optimization (which is a KSM thing). Here i am not posting users of
> > > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > > it).
> > >
> > > The users look small and straightforward. Why not await acks and
> > > reviewed-by's for the users like a typical upstream submission and
> > > merge them together? Is all of the functionality of this
> > > infrastructure consumed by the proposed users? Last time I checked it
> > > was only a subset.
> >
> > Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> > vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> > the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> > were ok with it too. I do not want to merge things through Andrew
> > for all of this we discussed that in the past, merge mm bits through
> > Andrew in one release and bits that use things in the next release.
>
> It is usually cleaner for everyone to split patches like this, for
> instance I always prefer to merge RDMA patches via RDMA when
> possible. Less conflicts.
>
> The other somewhat reasonable option is to get acks and send your own
> complete PR to Linus next week? That works OK for tree-wide changes.
Yes, I'm not proposing that they be merged together, instead I'm just
looking for the acked-by / reviewed-by tags even if those patches are
targeting the next merge window.
On Tue, Feb 19, 2019 at 12:40:37PM -0800, Dan Williams wrote:
> On Tue, Feb 19, 2019 at 12:30 PM Jerome Glisse <[email protected]> wrote:
> >
> > On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > > >
> > > > > From: Jérôme Glisse <[email protected]>
> > > >
> > > > Since last version [4] i added the extra bits needed for the change_pte
> > > > optimization (which is a KSM thing). Here i am not posting users of
> > > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > > it).
> > >
> > > The users look small and straightforward. Why not await acks and
> > > reviewed-by's for the users like a typical upstream submission and
> > > merge them together? Is all of the functionality of this
> > > infrastructure consumed by the proposed users? Last time I checked it
> > > was only a subset.
> >
> > Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> > vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> > the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> > were ok with it too. I do not want to merge things through Andrew
> > for all of this we discussed that in the past, merge mm bits through
> > Andrew in one release and bits that use things in the next release.
>
> Ok, I was trying to find the links to the acks on the mailing list,
> those references would address my concerns. I see no reason to rush
> SOFT_DIRTY and CLEAR ahead of the upstream user.
I intend to post users for those in the next couple of weeks for the 5.2
HMM bits. So users for this (CLEAR/UNMAP/SOFT_DIRTY) will definitely
materialize in time for 5.2.
ACKS AMD/RADEON https://lkml.org/lkml/2019/2/1/395
ACKS RDMA https://lkml.org/lkml/2018/12/6/1473
For KVM, Andrea Arcangeli seems to like the whole idea of restoring the
change_pte optimization, but I have not got an ACK from Radim or Paolo.
However, given the small performance improvement figure I get with it,
I do not see why they would not ACK.
https://lkml.org/lkml/2019/2/18/1530
Cheers,
Jérôme
On Tue, Feb 19, 2019 at 12:58 PM Jerome Glisse <[email protected]> wrote:
>
> On Tue, Feb 19, 2019 at 12:40:37PM -0800, Dan Williams wrote:
> > On Tue, Feb 19, 2019 at 12:30 PM Jerome Glisse <[email protected]> wrote:
> > >
> > > On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > > > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > > > >
> > > > > From: Jérôme Glisse <[email protected]>
> > > > >
> > > > > Since last version [4] i added the extra bits needed for the change_pte
> > > > > optimization (which is a KSM thing). Here i am not posting users of
> > > > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > > > it).
> > > >
> > > > The users look small and straightforward. Why not await acks and
> > > > reviewed-by's for the users like a typical upstream submission and
> > > > merge them together? Is all of the functionality of this
> > > > infrastructure consumed by the proposed users? Last time I checked it
> > > > was only a subset.
> > >
> > > Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> > > vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> > > the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> > > were ok with it too. I do not want to merge things through Andrew
> > > for all of this we discussed that in the past, merge mm bits through
> > > Andrew in one release and bits that use things in the next release.
> >
> > Ok, I was trying to find the links to the acks on the mailing list,
> > those references would address my concerns. I see no reason to rush
> > SOFT_DIRTY and CLEAR ahead of the upstream user.
>
> I intend to post user for those in next couple weeks for 5.2 HMM bits.
> So user for this (CLEAR/UNMAP/SOFTDIRTY) will definitly materialize in
> time for 5.2.
>
> ACKS AMD/RADEON https://lkml.org/lkml/2019/2/1/395
> ACKS RDMA https://lkml.org/lkml/2018/12/6/1473
Nice, thanks!
> For KVM Andrea Arcangeli seems to like the whole idea to restore the
> change_pte optimization but i have not got ACK from Radim or Paolo,
> however given the small performance improvement figure i get with it
> i do not see while they would not ACK.
Sure, but no need to push ahead without that confirmation, right? At
least for the piece that KVM cares about, maybe that's already covered
in the infrastructure RDMA and RADEON are using?
On Tue, Feb 19, 2019 at 01:19:09PM -0800, Dan Williams wrote:
> On Tue, Feb 19, 2019 at 12:58 PM Jerome Glisse <[email protected]> wrote:
> >
> > On Tue, Feb 19, 2019 at 12:40:37PM -0800, Dan Williams wrote:
> > > On Tue, Feb 19, 2019 at 12:30 PM Jerome Glisse <[email protected]> wrote:
> > > >
> > > > On Tue, Feb 19, 2019 at 12:15:55PM -0800, Dan Williams wrote:
> > > > > On Tue, Feb 19, 2019 at 12:04 PM <[email protected]> wrote:
> > > > > >
> > > > > > > From: Jérôme Glisse <[email protected]>
> > > > > >
> > > > > > Since last version [4] i added the extra bits needed for the change_pte
> > > > > > optimization (which is a KSM thing). Here i am not posting users of
> > > > > > this, they will be posted to the appropriate sub-systems (KVM, GPU,
> > > > > > RDMA, ...) once this serie get upstream. If you want to look at users
> > > > > > of this see [5] [6]. If this gets in 5.1 then i will be submitting
> > > > > > those users for 5.2 (including KVM if KVM folks feel comfortable with
> > > > > > it).
> > > > >
> > > > > The users look small and straightforward. Why not await acks and
> > > > > reviewed-by's for the users like a typical upstream submission and
> > > > > merge them together? Is all of the functionality of this
> > > > > infrastructure consumed by the proposed users? Last time I checked it
> > > > > was only a subset.
> > > >
> > > > Yes pretty much all is use, the unuse case is SOFT_DIRTY and CLEAR
> > > > vs UNMAP. Both of which i intend to use. The RDMA folks already ack
> > > > the patches IIRC, so did radeon and amdgpu. I believe the i915 folks
> > > > were ok with it too. I do not want to merge things through Andrew
> > > > for all of this we discussed that in the past, merge mm bits through
> > > > Andrew in one release and bits that use things in the next release.
> > >
> > > Ok, I was trying to find the links to the acks on the mailing list,
> > > those references would address my concerns. I see no reason to rush
> > > SOFT_DIRTY and CLEAR ahead of the upstream user.
> >
> > I intend to post user for those in next couple weeks for 5.2 HMM bits.
> > So user for this (CLEAR/UNMAP/SOFTDIRTY) will definitly materialize in
> > time for 5.2.
> >
> > ACKS AMD/RADEON https://lkml.org/lkml/2019/2/1/395
> > ACKS RDMA https://lkml.org/lkml/2018/12/6/1473
>
> Nice, thanks!
>
> > For KVM Andrea Arcangeli seems to like the whole idea to restore the
> > change_pte optimization but i have not got ACK from Radim or Paolo,
> > however given the small performance improvement figure i get with it
> > i do not see while they would not ACK.
>
> Sure, but no need to push ahead without that confirmation, right? At
> least for the piece that KVM cares about, maybe that's already covered
> in the infrastructure RDMA and RADEON are using?
The change_pte() for KVM is just one bit flag on top of the rest, so
I don't see much value in saving this last patch. I will be working
with the KVM folks to merge the KVM bits in 5.2. If they do not want
that, then removing that extra flag is not much work.
But if you prefer, then Andrew can drop the last patch in the series.
Cheers,
Jérôme
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> Simple helper to test if range invalidation is blockable. Later
> patches use coccinelle to convert all direct dereferences of
> range->blockable to use this function instead, so that we can convert
> the blockable field to an unsigned for more flags.
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 4050ec1c3b45..e630def131ce 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -226,6 +226,12 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
> extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> unsigned long start, unsigned long end);
>
> +static inline bool
> +mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> +{
> + return range->blockable;
> +}
> +
> static inline void mmu_notifier_release(struct mm_struct *mm)
> {
> if (mm_has_notifiers(mm))
> @@ -455,6 +461,11 @@ static inline void _mmu_notifier_range_init(struct mmu_notifier_range *range,
> #define mmu_notifier_range_init(range, mm, start, end) \
> _mmu_notifier_range_init(range, start, end)
>
> +static inline bool
> +mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> +{
> + return true;
> +}
>
> static inline int mm_has_notifiers(struct mm_struct *mm)
> {
Reviewed-by: Ralph Campbell <[email protected]>
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> Use the mmu_notifier_range_blockable() helper function instead of
> directly dereferencing the range->blockable field. This is done to
> make it easier to change the mmu_notifier range field.
>
> This patch is the outcome of the following coccinelle patch:
>
> %<-------------------------------------------------------------------
> @@
> identifier I1, FN;
> @@
> FN(..., struct mmu_notifier_range *I1, ...) {
> <...
> -I1->blockable
> +mmu_notifier_range_blockable(I1)
> ...>
> }
> ------------------------------------------------------------------->%
>
> spatch --in-place --sp-file blockable.spatch --dir .
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 ++++----
> drivers/gpu/drm/i915/i915_gem_userptr.c | 2 +-
> drivers/gpu/drm/radeon/radeon_mn.c | 4 ++--
> drivers/infiniband/core/umem_odp.c | 5 +++--
> drivers/xen/gntdev.c | 6 +++---
> mm/hmm.c | 6 +++---
> mm/mmu_notifier.c | 2 +-
> virt/kvm/kvm_main.c | 3 ++-
> 8 files changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 3e6823fdd939..58ed401c5996 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -256,14 +256,14 @@ static int amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
> /* TODO we should be able to split locking for interval tree and
> * amdgpu_mn_invalidate_node
> */
> - if (amdgpu_mn_read_lock(amn, range->blockable))
> + if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range)))
> return -EAGAIN;
>
> it = interval_tree_iter_first(&amn->objects, range->start, end);
> while (it) {
> struct amdgpu_mn_node *node;
>
> - if (!range->blockable) {
> + if (!mmu_notifier_range_blockable(range)) {
> amdgpu_mn_read_unlock(amn);
> return -EAGAIN;
> }
> @@ -299,7 +299,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> /* notification is exclusive, but interval is inclusive */
> end = range->end - 1;
>
> - if (amdgpu_mn_read_lock(amn, range->blockable))
> + if (amdgpu_mn_read_lock(amn, mmu_notifier_range_blockable(range)))
> return -EAGAIN;
>
> it = interval_tree_iter_first(&amn->objects, range->start, end);
> @@ -307,7 +307,7 @@ static int amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
> struct amdgpu_mn_node *node;
> struct amdgpu_bo *bo;
>
> - if (!range->blockable) {
> + if (!mmu_notifier_range_blockable(range)) {
> amdgpu_mn_read_unlock(amn);
> return -EAGAIN;
> }
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 1d3f9a31ad61..777b3f8727e7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -122,7 +122,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> while (it) {
> struct drm_i915_gem_object *obj;
>
> - if (!range->blockable) {
> + if (!mmu_notifier_range_blockable(range)) {
> ret = -EAGAIN;
> break;
> }
> diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
> index b3019505065a..c9bd1278f573 100644
> --- a/drivers/gpu/drm/radeon/radeon_mn.c
> +++ b/drivers/gpu/drm/radeon/radeon_mn.c
> @@ -133,7 +133,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
> /* TODO we should be able to split locking for interval tree and
> * the tear down.
> */
> - if (range->blockable)
> + if (mmu_notifier_range_blockable(range))
> mutex_lock(&rmn->lock);
> else if (!mutex_trylock(&rmn->lock))
> return -EAGAIN;
> @@ -144,7 +144,7 @@ static int radeon_mn_invalidate_range_start(struct mmu_notifier *mn,
> struct radeon_bo *bo;
> long r;
>
> - if (!range->blockable) {
> + if (!mmu_notifier_range_blockable(range)) {
> ret = -EAGAIN;
> goto out_unlock;
> }
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 012044f16d1c..3a3f1538d295 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -151,7 +151,7 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
> struct ib_ucontext_per_mm *per_mm =
> container_of(mn, struct ib_ucontext_per_mm, mn);
>
> - if (range->blockable)
> + if (mmu_notifier_range_blockable(range))
> down_read(&per_mm->umem_rwsem);
> else if (!down_read_trylock(&per_mm->umem_rwsem))
> return -EAGAIN;
> @@ -169,7 +169,8 @@ static int ib_umem_notifier_invalidate_range_start(struct mmu_notifier *mn,
> return rbt_ib_umem_for_each_in_range(&per_mm->umem_tree, range->start,
> range->end,
> invalidate_range_start_trampoline,
> - range->blockable, NULL);
> + mmu_notifier_range_blockable(range),
> + NULL);
> }
>
> static int invalidate_range_end_trampoline(struct ib_umem_odp *item, u64 start,
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 5efc5eee9544..9da8f7192f46 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -526,20 +526,20 @@ static int mn_invl_range_start(struct mmu_notifier *mn,
> struct gntdev_grant_map *map;
> int ret = 0;
>
> - if (range->blockable)
> + if (mmu_notifier_range_blockable(range))
> mutex_lock(&priv->lock);
> else if (!mutex_trylock(&priv->lock))
> return -EAGAIN;
>
> list_for_each_entry(map, &priv->maps, next) {
> ret = unmap_if_in_range(map, range->start, range->end,
> - range->blockable);
> + mmu_notifier_range_blockable(range));
> if (ret)
> goto out_unlock;
> }
> list_for_each_entry(map, &priv->freeable_maps, next) {
> ret = unmap_if_in_range(map, range->start, range->end,
> - range->blockable);
> + mmu_notifier_range_blockable(range));
> if (ret)
> goto out_unlock;
> }
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 3c9781037918..a03b5083d880 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -205,9 +205,9 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
> update.start = nrange->start;
> update.end = nrange->end;
> update.event = HMM_UPDATE_INVALIDATE;
> - update.blockable = nrange->blockable;
> + update.blockable = mmu_notifier_range_blockable(nrange);
>
> - if (nrange->blockable)
> + if (mmu_notifier_range_blockable(nrange))
> mutex_lock(&hmm->lock);
> else if (!mutex_trylock(&hmm->lock)) {
> ret = -EAGAIN;
> @@ -222,7 +222,7 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
> }
> mutex_unlock(&hmm->lock);
>
> - if (nrange->blockable)
> + if (mmu_notifier_range_blockable(nrange))
> down_read(&hmm->mirrors_sem);
> else if (!down_read_trylock(&hmm->mirrors_sem)) {
> ret = -EAGAIN;
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 9c884abc7850..abd88c466eb2 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -180,7 +180,7 @@ int __mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> if (_ret) {
> pr_info("%pS callback failed with %d in %sblockable context.\n",
> mn->ops->invalidate_range_start, _ret,
> - !range->blockable ? "non-" : "");
> + !mmu_notifier_range_blockable(range) ? "non-" : "");
> ret = _ret;
> }
> }
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 38df17b7760e..629760c0fb95 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -386,7 +386,8 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
> spin_unlock(&kvm->mmu_lock);
>
> ret = kvm_arch_mmu_notifier_invalidate_range(kvm, range->start,
> - range->end, range->blockable);
> + range->end,
> + mmu_notifier_range_blockable(range));
>
> srcu_read_unlock(&kvm->srcu, idx);
>
>
Reviewed-by: Ralph Campbell <[email protected]>
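
For readers following along, the pattern the converted callbacks share can be shown outside the kernel. The following is a minimal userspace sketch, not kernel code: `toy_range`, `toy_lock` and every `toy_*` function are hypothetical stand-ins, and the spin loop only approximates a sleeping lock. It assumes the semantics visible in the hunks above: wait for the lock when the range is blockable, otherwise try once and return -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct mmu_notifier_range. */
struct toy_range {
	unsigned long start;
	unsigned long end;
	bool blockable;
};

/* Mirrors the helper from the series: callers never touch the field. */
static inline bool toy_range_blockable(const struct toy_range *range)
{
	return range->blockable;
}

/* Toy lock standing in for the driver's mutex or rwsem. */
struct toy_lock {
	atomic_flag held;
};

static bool toy_trylock(struct toy_lock *lock)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&lock->held);
}

static void toy_lock_blocking(struct toy_lock *lock)
{
	/* Spin until the flag is free; a real driver would sleep instead. */
	while (atomic_flag_test_and_set(&lock->held))
		;
}

static void toy_unlock(struct toy_lock *lock)
{
	atomic_flag_clear(&lock->held);
}

/*
 * Same shape as the converted callbacks: wait for the lock only when the
 * range is blockable, otherwise try once and report -EAGAIN on contention.
 */
static int toy_invalidate_range_start(struct toy_lock *lock,
				      const struct toy_range *range)
{
	assert(range->start < range->end);

	if (toy_range_blockable(range))
		toy_lock_blocking(lock);
	else if (!toy_trylock(lock))
		return -EAGAIN;

	/* ... device-side invalidation of [start, end) would go here ... */

	toy_unlock(lock);
	return 0;
}
```

Because every caller goes through `toy_range_blockable()`, the representation of the field can change later without touching the callbacks, which is the stated purpose of the conversion.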
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> Use an unsigned field for flags other than blockable and convert
> the blockable field to be one of those flags.
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 11 +++++++----
> 1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index e630def131ce..c8672c366f67 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -25,11 +25,13 @@ struct mmu_notifier_mm {
> spinlock_t lock;
> };
>
> +#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> +
> struct mmu_notifier_range {
> struct mm_struct *mm;
> unsigned long start;
> unsigned long end;
> - bool blockable;
> + unsigned flags;
> };
>
> struct mmu_notifier_ops {
> @@ -229,7 +231,7 @@ extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> static inline bool
> mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> {
> - return range->blockable;
> + return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
> }
>
> static inline void mmu_notifier_release(struct mm_struct *mm)
> @@ -275,7 +277,7 @@ static inline void
> mmu_notifier_invalidate_range_start(struct mmu_notifier_range *range)
> {
> if (mm_has_notifiers(range->mm)) {
> - range->blockable = true;
> + range->flags |= MMU_NOTIFIER_RANGE_BLOCKABLE;
> __mmu_notifier_invalidate_range_start(range);
> }
> }
> @@ -284,7 +286,7 @@ static inline int
> mmu_notifier_invalidate_range_start_nonblock(struct mmu_notifier_range *range)
> {
> if (mm_has_notifiers(range->mm)) {
> - range->blockable = false;
> + range->flags &= ~MMU_NOTIFIER_RANGE_BLOCKABLE;
> return __mmu_notifier_invalidate_range_start(range);
> }
> return 0;
> @@ -331,6 +333,7 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
> range->mm = mm;
> range->start = start;
> range->end = end;
> + range->flags = 0;
> }
>
> #define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
>
Reviewed-by: Ralph Campbell <[email protected]>
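
The bool-to-flags conversion in this patch can be illustrated with a small userspace sketch. This is only an approximation under stated assumptions: the `toy_*` names are hypothetical, and `TOY_RANGE_EXTRA` is an invented second bit that does not exist in the series; it is there only to show that setting or clearing the blockable bit leaves other bits alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit position as MMU_NOTIFIER_RANGE_BLOCKABLE in the patch. */
#define TOY_RANGE_BLOCKABLE (1u << 0)
/* Hypothetical second flag, only to show that bits coexist. */
#define TOY_RANGE_EXTRA     (1u << 1)

/* Hypothetical userspace stand-in for struct mmu_notifier_range. */
struct toy_range {
	unsigned long start;
	unsigned long end;
	unsigned flags;
};

static void toy_range_init(struct toy_range *range,
			   unsigned long start, unsigned long end)
{
	assert(range != NULL);
	range->start = start;
	range->end = end;
	range->flags = 0;	/* start with no flag set, as in the patch */
}

/* Accessor: the only place that knows how "blockable" is stored. */
static bool toy_range_blockable(const struct toy_range *range)
{
	return (range->flags & TOY_RANGE_BLOCKABLE) != 0;
}

/* Blocking start: set the bit without disturbing other flags. */
static void toy_invalidate_start(struct toy_range *range)
{
	range->flags |= TOY_RANGE_BLOCKABLE;
}

/* Non-blocking start: clear only the blockable bit. */
static void toy_invalidate_start_nonblock(struct toy_range *range)
{
	range->flags &= ~TOY_RANGE_BLOCKABLE;
}
```

Using `|=` and `&= ~` rather than plain assignment is what makes room for additional flags later in the series.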
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> CPU page table update can happens for many reasons, not only as a result
s/update/updates
s/happens/happen
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
>
> This patch introduce a set of enums that can be associated with each of
s/introduce/introduces
> the events triggering a mmu notifier. Latter patches take advantages of
> those enum values.
s/advantages/advantage
>
> - UNMAP: munmap() or mremap()
> - CLEAR: page table is cleared (migration, compaction, reclaim, ...)
> - PROTECTION_VMA: change in access protections for the range
> - PROTECTION_PAGE: change in access protections for page in the range
> - SOFT_DIRTY: soft dirtyness tracking
>
s/dirtyness/dirtiness
> Being able to identify munmap() and mremap() from other reasons why the
> page table is cleared is important to allow user of mmu notifier to
> update their own internal tracking structure accordingly (on munmap or
> mremap it is not longer needed to track range of virtual address as it
> becomes invalid).
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index c8672c366f67..2386e71ac1b8 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -10,6 +10,36 @@
> struct mmu_notifier;
> struct mmu_notifier_ops;
>
> +/**
> + * enum mmu_notifier_event - reason for the mmu notifier callback
> + * @MMU_NOTIFY_UNMAP: either munmap() that unmap the range or a mremap() that
> + * move the range
I would say something about the VMA for the notifier range
being deleted.
MMU notifier clients can then use this case to remove any policy or
access counts associated with the range.
Just changing the PTE to "no access" as in the CLEAR case
doesn't mean a policy which prefers device private memory
over system memory should be cleared.
> + *
> + * @MMU_NOTIFY_CLEAR: clear page table entry (many reasons for this like
> + * madvise() or replacing a page by another one, ...).
> + *
> + * @MMU_NOTIFY_PROTECTION_VMA: update is due to protection change for the range
> + * ie using the vma access permission (vm_page_prot) to update the whole range
> + * is enough no need to inspect changes to the CPU page table (mprotect()
> + * syscall)
> + *
> + * @MMU_NOTIFY_PROTECTION_PAGE: update is due to change in read/write flag for
> + * pages in the range so to mirror those changes the user must inspect the CPU
> + * page table (from the end callback).
> + *
> + * @MMU_NOTIFY_SOFT_DIRTY: soft dirty accounting (still same page and same
> + * access flags). User should soft dirty the page in the end callback to make
> + * sure that anyone relying on soft dirtyness catch pages that might be written
> + * through non CPU mappings.
> + */
> +enum mmu_notifier_event {
> + MMU_NOTIFY_UNMAP = 0,
> + MMU_NOTIFY_CLEAR,
> + MMU_NOTIFY_PROTECTION_VMA,
> + MMU_NOTIFY_PROTECTION_PAGE,
> + MMU_NOTIFY_SOFT_DIRTY,
> +};
> +
> #ifdef CONFIG_MMU_NOTIFIER
>
> /*
>
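
The UNMAP vs CLEAR distinction raised in the comment above can be sketched as a listener that reacts differently per event. This is a minimal userspace illustration, not code from any driver: the enum is copied from the patch, while `toy_tracking` and `toy_handle_event()` are hypothetical, and the handling chosen for the two PROTECTION events is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names copied from the patch under review. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
};

/* Hypothetical per-range state a device driver might keep. */
struct toy_tracking {
	bool mapped;		/* device page table entries are present */
	bool has_policy;	/* e.g. "prefer device private memory" */
};

static void toy_handle_event(struct toy_tracking *t,
			     enum mmu_notifier_event event)
{
	assert(t != NULL);

	switch (event) {
	case MMU_NOTIFY_UNMAP:
		/* The virtual range is going away: drop everything. */
		t->mapped = false;
		t->has_policy = false;
		break;
	case MMU_NOTIFY_CLEAR:
		/* Only the PTEs are cleared: invalidate, keep the policy. */
		t->mapped = false;
		break;
	case MMU_NOTIFY_PROTECTION_VMA:
	case MMU_NOTIFY_PROTECTION_PAGE:
		/*
		 * Assumption for this sketch: drop the device mapping so it
		 * is re-established with the new protection on next fault.
		 */
		t->mapped = false;
		break;
	case MMU_NOTIFY_SOFT_DIRTY:
		/* Same page, same access flags: nothing to tear down. */
		break;
	}
}
```

Without the event, a listener has to treat every notification like the UNMAP case and rebuild its tracking structure on the next device fault.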
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> CPU page table updates can happen for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
>
> Users of the mmu notifier API track changes to the CPU page table and take
> specific actions for them. The current API only provides the range of
> virtual addresses affected by the change, not why the change is happening.
>
> This patchset does the initial mechanical conversion of all the places that
> call mmu_notifier_range_init to also provide the default MMU_NOTIFY_UNMAP
> event as well as the vma if it is known (most invalidations happen against
> a given vma). Passing down the vma allows the users of mmu notifier to
> inspect the new vma page protection.
>
> MMU_NOTIFY_UNMAP is always the safe default, as users of mmu notifier
> should assume that everything for the range is going away when that event
> happens. A later patch converts the mm call paths to use a more
> appropriate event for each call.
>
> Changes since v1:
> - add the flags parameter to init range flags
>
> This is done as 2 patches so that no call site is forgotten, especially
> as it uses the following coccinelle patch:
>
> %<----------------------------------------------------------------------
> @@
> identifier I1, I2, I3, I4;
> @@
> static inline void mmu_notifier_range_init(struct mmu_notifier_range *I1,
> +enum mmu_notifier_event event,
> +unsigned flags,
> +struct vm_area_struct *vma,
> struct mm_struct *I2, unsigned long I3, unsigned long I4) { ... }
>
> @@
> @@
> -#define mmu_notifier_range_init(range, mm, start, end)
> +#define mmu_notifier_range_init(range, event, flags, vma, mm, start, end)
>
> @@
> expression E1, E3, E4;
> identifier I1;
> @@
> <...
> mmu_notifier_range_init(E1,
> +MMU_NOTIFY_UNMAP, 0, I1,
> I1->vm_mm, E3, E4)
> ...>
>
> @@
> expression E1, E2, E3, E4;
> identifier FN, VMA;
> @@
> FN(..., struct vm_area_struct *VMA, ...) {
> <...
> mmu_notifier_range_init(E1,
> +MMU_NOTIFY_UNMAP, 0, VMA,
> E2, E3, E4)
> ...> }
>
> @@
> expression E1, E2, E3, E4;
> identifier FN, VMA;
> @@
> FN(...) {
> struct vm_area_struct *VMA;
> <...
> mmu_notifier_range_init(E1,
> +MMU_NOTIFY_UNMAP, 0, VMA,
> E2, E3, E4)
> ...> }
>
> @@
> expression E1, E2, E3, E4;
> identifier FN;
> @@
> FN(...) {
> <...
> mmu_notifier_range_init(E1,
> +MMU_NOTIFY_UNMAP, 0, NULL,
> E2, E3, E4)
> ...> }
> ---------------------------------------------------------------------->%
>
> Applied with:
> spatch --all-includes --sp-file mmu-notifier.spatch fs/proc/task_mmu.c --in-place
> spatch --sp-file mmu-notifier.spatch --dir kernel/events/ --in-place
> spatch --sp-file mmu-notifier.spatch --dir mm --in-place
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
Reviewed-by: Ralph Campbell <[email protected]>
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> This updates each existing invalidation to use the correct mmu notifier
> event that represents what is happening to the CPU page table. See the
> patch which introduced the events for the rationale behind this.
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> fs/proc/task_mmu.c | 4 ++--
> kernel/events/uprobes.c | 2 +-
> mm/huge_memory.c | 14 ++++++--------
> mm/hugetlb.c | 8 ++++----
> mm/khugepaged.c | 2 +-
> mm/ksm.c | 4 ++--
> mm/madvise.c | 2 +-
> mm/memory.c | 14 +++++++-------
> mm/migrate.c | 4 ++--
> mm/mprotect.c | 5 +++--
> mm/rmap.c | 6 +++---
> 11 files changed, 32 insertions(+), 33 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index fcbd0e574917..3b93ce496dd4 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -1151,8 +1151,8 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
> break;
> }
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
> - NULL, mm, 0, -1UL);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY,
> + 0, NULL, mm, 0, -1UL);
> mmu_notifier_invalidate_range_start(&range);
> }
> walk_page_range(0, mm->highest_vm_end, &clear_refs_walk);
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 46f546bdba00..8e8342080013 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -161,7 +161,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> struct mmu_notifier_range range;
> struct mem_cgroup *memcg;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
> addr + PAGE_SIZE);
>
> VM_BUG_ON_PAGE(PageTransHuge(old_page), old_page);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index c9d638f1b34e..1da6ca0f0f6d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1184,9 +1184,8 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf,
> cond_resched();
> }
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> - haddr,
> - haddr + HPAGE_PMD_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> + haddr, haddr + HPAGE_PMD_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
> vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> @@ -1349,9 +1348,8 @@ vm_fault_t do_huge_pmd_wp_page(struct vm_fault *vmf, pmd_t orig_pmd)
> vma, HPAGE_PMD_NR);
> __SetPageUptodate(new_page);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> - haddr,
> - haddr + HPAGE_PMD_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> + haddr, haddr + HPAGE_PMD_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
> spin_lock(vmf->ptl);
> @@ -2028,7 +2026,7 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
> spinlock_t *ptl;
> struct mmu_notifier_range range;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> address & HPAGE_PUD_MASK,
> (address & HPAGE_PUD_MASK) + HPAGE_PUD_SIZE);
> mmu_notifier_invalidate_range_start(&range);
> @@ -2247,7 +2245,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> spinlock_t *ptl;
> struct mmu_notifier_range range;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> address & HPAGE_PMD_MASK,
> (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
> mmu_notifier_invalidate_range_start(&range);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index d9e5c5a4c004..a58115c6b0a3 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3250,7 +3250,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>
> if (cow) {
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, src,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, src,
> vma->vm_start,
> vma->vm_end);
> mmu_notifier_invalidate_range_start(&range);
> @@ -3631,7 +3631,7 @@ static vm_fault_t hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma,
> pages_per_huge_page(h));
> __SetPageUptodate(new_page);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, haddr,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, haddr,
> haddr + huge_page_size(h));
> mmu_notifier_invalidate_range_start(&range);
>
> @@ -4357,8 +4357,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
> * start/end. Set range.start/range.end to cover the maximum possible
> * range if PMD sharing is possible.
> */
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, start,
> - end);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_VMA,
> + 0, vma, mm, start, end);
> adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
>
> BUG_ON(address >= end);
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index e7944f5e6258..579699d2b347 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1016,7 +1016,7 @@ static void collapse_huge_page(struct mm_struct *mm,
> pte = pte_offset_map(pmd, address);
> pte_ptl = pte_lockptr(mm, pmd);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
The vma is revalidated so you can s/NULL/vma here.
> address, address + HPAGE_PMD_SIZE);
> mmu_notifier_invalidate_range_start(&range);
> pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 2ea25fc0befb..b782fadade8f 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1066,7 +1066,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
>
> BUG_ON(PageTransCompound(page));
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> pvmw.address,
> pvmw.address + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
> @@ -1155,7 +1155,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
> if (!pmd)
> goto out;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm, addr,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
> addr + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c617f53a9c09..a692d2a893b5 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -472,7 +472,7 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
> range.end = min(vma->vm_end, end_addr);
> if (range.end <= vma->vm_start)
> return -EINVAL;
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> range.start, range.end);
>
> lru_add_drain();
> diff --git a/mm/memory.c b/mm/memory.c
> index 4565f636cca3..45dbc174a88c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1010,8 +1010,8 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> is_cow = is_cow_mapping(vma->vm_flags);
>
> if (is_cow) {
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma,
> - src_mm, addr, end);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> + 0, vma, src_mm, addr, end);
> mmu_notifier_invalidate_range_start(&range);
> }
>
> @@ -1358,7 +1358,7 @@ void zap_page_range(struct vm_area_struct *vma, unsigned long start,
> struct mmu_gather tlb;
>
> lru_add_drain();
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> start, start + size);
> tlb_gather_mmu(&tlb, vma->vm_mm, start, range.end);
> update_hiwater_rss(vma->vm_mm);
> @@ -1385,7 +1385,7 @@ static void zap_page_range_single(struct vm_area_struct *vma, unsigned long addr
> struct mmu_gather tlb;
>
> lru_add_drain();
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> address, address + size);
> tlb_gather_mmu(&tlb, vma->vm_mm, address, range.end);
> update_hiwater_rss(vma->vm_mm);
> @@ -2282,7 +2282,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>
> __SetPageUptodate(new_page);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> vmf->address & PAGE_MASK,
> (vmf->address & PAGE_MASK) + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
> @@ -4105,7 +4105,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
> goto out;
>
> if (range) {
> - mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0,
> + mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0,
> NULL, mm, address & PMD_MASK,
> (address & PMD_MASK) + PMD_SIZE);
> mmu_notifier_invalidate_range_start(range);
> @@ -4124,7 +4124,7 @@ static int __follow_pte_pmd(struct mm_struct *mm, unsigned long address,
> goto out;
>
> if (range) {
> - mmu_notifier_range_init(range, MMU_NOTIFY_UNMAP, 0, NULL, mm,
> + mmu_notifier_range_init(range, MMU_NOTIFY_CLEAR, 0, NULL, mm,
> address & PAGE_MASK,
> (address & PAGE_MASK) + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(range);
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 81eb307b2b5b..8e6d00541b3c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2340,7 +2340,7 @@ static void migrate_vma_collect(struct migrate_vma *migrate)
> mm_walk.mm = migrate->vma->vm_mm;
> mm_walk.private = migrate;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, NULL, mm_walk.mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm_walk.mm,
You can s/NULL/mm_walk.vma here.
> migrate->start,
> migrate->end);
> mmu_notifier_invalidate_range_start(&range);
> @@ -2749,7 +2749,7 @@ static void migrate_vma_pages(struct migrate_vma *migrate)
> notified = true;
>
> mmu_notifier_range_init(&range,
> - MMU_NOTIFY_UNMAP, 0,
> + MMU_NOTIFY_CLEAR, 0,
> NULL,
You can s/NULL/migrate->vma here.
> migrate->vma->vm_mm,
> addr, migrate->end);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index b10984052ae9..65242f1e4457 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -185,8 +185,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
>
> /* invoke the mmu notifier if the pmd is populated */
> if (!range.start) {
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0,
> - vma, vma->vm_mm, addr, end);
> + mmu_notifier_range_init(&range,
> + MMU_NOTIFY_PROTECTION_VMA, 0,
> + vma, vma->vm_mm, addr, end);
> mmu_notifier_invalidate_range_start(&range);
> }
>
The call to mmu_notifier_range_init(MMU_NOTIFY_UNMAP) in
move_page_tables() in mm/mremap.c should probably be
mmu_notifier_range_init(MMU_NOTIFY_CLEAR), since
do_munmap() is called a bit later in move_vma().
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c6535a6ec850..627b38ad5052 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -896,8 +896,8 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
> * We have to assume the worse case ie pmd for invalidation. Note that
> * the page can not be free from this function.
> */
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> - address,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
> + 0, vma, vma->vm_mm, address,
> min(vma->vm_end, address +
> (PAGE_SIZE << compound_order(page))));
> mmu_notifier_invalidate_range_start(&range);
> @@ -1372,7 +1372,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> * Note that the page can not be free in this function as call of
> * try_to_unmap() must hold a reference on the page.
> */
> - mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
> address,
> min(vma->vm_end, address +
> (PAGE_SIZE << compound_order(page))));
>
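To illustrate how a listener might act on the events chosen above, here is
a minimal userspace sketch, not kernel code. The enum mirrors the event names
from this series; struct mirror and mirror_invalidate() are hypothetical
names invented for the example, and the policy (always drop device mappings,
free tracking only on a real unmap) is an assumption based on the cover
letter, not something this patch implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Event names mirror the series; redeclared locally for this sketch. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
	MMU_NOTIFY_PROTECTION_PAGE,
	MMU_NOTIFY_SOFT_DIRTY,
};

/* Hypothetical per-range state a driver might keep. */
struct mirror {
	bool tracked;	/* a tracking structure exists for the range */
	bool mapped;	/* device page table entries are populated */
};

/* Hypothetical handler: free the tracking structure only on a real unmap. */
static void mirror_invalidate(struct mirror *m, enum mmu_notifier_event event)
{
	assert(m != NULL);
	m->mapped = false;		/* conservative: always drop device ptes */
	if (event == MMU_NOTIFY_UNMAP)
		m->tracked = false;	/* the virtual range itself is going away */
}
```

With MMU_NOTIFY_CLEAR the tracking structure survives, so the next device
page fault would only have to repopulate the mapping.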
Reviewed-by: Ralph Campbell <[email protected]>
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> CPU page table updates can happen for many reasons, not only as a result
> of a syscall (munmap(), mprotect(), mremap(), madvise(), ...) but also
> as a result of kernel activities (memory compression, reclaim, migration,
> ...).
>
> Users of the mmu notifier API track changes to the CPU page table and take
> specific action for them. However, the current API only provides the range
> of virtual addresses affected by the change, not why the change is happening.
>
> This patch just passes down the new information by adding it to the
> mmu_notifier_range structure.
>
> Changes since v1:
> - Initialize flags field from mmu_notifier_range_init() arguments
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 62f94cd85455..0379956fff23 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -58,10 +58,12 @@ struct mmu_notifier_mm {
> #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
>
> struct mmu_notifier_range {
> + struct vm_area_struct *vma;
> struct mm_struct *mm;
> unsigned long start;
> unsigned long end;
> unsigned flags;
> + enum mmu_notifier_event event;
> };
>
> struct mmu_notifier_ops {
> @@ -363,10 +365,12 @@ static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
> unsigned long start,
> unsigned long end)
> {
> + range->vma = vma;
> + range->event = event;
> range->mm = mm;
> range->start = start;
> range->end = end;
> - range->flags = 0;
> + range->flags = flags;
> }
>
> #define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
>
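For readers following along outside the tree, the resulting structure and
initializer can be sketched in userspace as below. This is a simplified copy
for illustration only: the kernel types are replaced by opaque forward
declarations, the enum is cut down to two values, and the assert() is an
addition of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Opaque stand-ins for the kernel types; only pointers are stored. */
struct vm_area_struct;
struct mm_struct;

/* Subset of the event enum from this series, redeclared for the sketch. */
enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
};

#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)

struct mmu_notifier_range {
	struct vm_area_struct *vma;
	struct mm_struct *mm;
	unsigned long start;
	unsigned long end;
	unsigned flags;
	enum mmu_notifier_event event;
};

/* Userspace copy of the initializer with the event, flags and vma arguments. */
static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
					   enum mmu_notifier_event event,
					   unsigned flags,
					   struct vm_area_struct *vma,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
{
	assert(range != NULL);
	range->vma = vma;
	range->event = event;
	range->mm = mm;
	range->start = start;
	range->end = end;
	range->flags = flags;	/* was hard-coded to 0 before this patch */
}
```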
Reviewed-by: Ralph Campbell <[email protected]>
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> Helper to test if a range is updated to read only (it is still valid
> to read from the range). This is useful for device drivers or anyone
> who wishes to optimize out updates when they know that they already have
> the range mapped read only.
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 4 ++++
> mm/mmu_notifier.c | 10 ++++++++++
> 2 files changed, 14 insertions(+)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index 0379956fff23..b6c004bd9f6a 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -259,6 +259,8 @@ extern void __mmu_notifier_invalidate_range_end(struct mmu_notifier_range *r,
> bool only_end);
> extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
> unsigned long start, unsigned long end);
> +extern bool
> +mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range);
>
> static inline bool
> mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> @@ -568,6 +570,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
> {
> }
>
> +#define mmu_notifier_range_update_to_read_only(r) false
> +
> #define ptep_clear_flush_young_notify ptep_clear_flush_young
> #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
> #define ptep_clear_young_notify ptep_test_and_clear_young
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index abd88c466eb2..ee36068077b6 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -395,3 +395,13 @@ void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
> mmdrop(mm);
> }
> EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
> +
> +bool
> +mmu_notifier_range_update_to_read_only(const struct mmu_notifier_range *range)
> +{
> + if (!range->vma || range->event != MMU_NOTIFY_PROTECTION_VMA)
> + return false;
> + /* Return true if the vma still have the read flag set. */
> + return range->vma->vm_flags & VM_READ;
> +}
> +EXPORT_SYMBOL_GPL(mmu_notifier_range_update_to_read_only);
>
Don't you have to check for !WRITE & READ?
mprotect() can change the permissions from R/O to RW and
end up calling mmu_notifier_range_init() and
mmu_notifier_invalidate_range_start()/end().
I'm not sure how useful this is since it only applies to the
MMU_NOTIFY_PROTECTION_VMA case.
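To make the !WRITE & READ suggestion concrete, here is a rough userspace
sketch of the stricter test. The structures are cut-down stand-ins for the
kernel ones, range_update_to_read_only() is a hypothetical name, and the
sketch assumes vm_flags already holds the new protection when the notifier
runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit values as the kernel's VM_READ and VM_WRITE. */
#define VM_READ		0x00000001UL
#define VM_WRITE	0x00000002UL

/* Cut-down stand-ins holding only the fields the check needs. */
struct vm_area_struct {
	unsigned long vm_flags;
};

enum mmu_notifier_event {
	MMU_NOTIFY_UNMAP = 0,
	MMU_NOTIFY_CLEAR,
	MMU_NOTIFY_PROTECTION_VMA,
};

struct mmu_notifier_range {
	struct vm_area_struct *vma;
	enum mmu_notifier_event event;
};

/* Hypothetical variant: true only for a readable, non-writable vma. */
static bool range_update_to_read_only(const struct mmu_notifier_range *range)
{
	assert(range != NULL);
	if (!range->vma || range->event != MMU_NOTIFY_PROTECTION_VMA)
		return false;
	/* A R/O to RW mprotect() has VM_WRITE set and must not match. */
	return (range->vma->vm_flags & (VM_READ | VM_WRITE)) == VM_READ;
}
```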
Anyway, you can add
Reviewed-by: Ralph Campbell <[email protected]>
On 2/19/19 12:04 PM, [email protected] wrote:
> From: Jérôme Glisse <[email protected]>
>
> When notifying a change for a range, use the MMU_NOTIFIER_USE_CHANGE_PTE
> flag for page table updates that use set_pte_at_notify() and where we are
> going either from read and write to read only with the same pfn, or from
> read only to read and write with a new pfn.
>
> Note that set_pte_at_notify() itself should only be used in rare cases,
> i.e. we do not want to use it when we are updating a significant range of
> virtual addresses and thus a significant number of ptes. Instead, for
> those cases the event provided to the mmu notifier invalidate_range_start()
> callback should be used for optimization.
>
> Changes since v1:
> - Use the new unsigned flags field in struct mmu_notifier_range
> - Use the new flags parameter to mmu_notifier_range_init()
> - Explicitly list all the patterns where we can use change_pte()
>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Joonas Lahtinen <[email protected]>
> Cc: Jani Nikula <[email protected]>
> Cc: Rodrigo Vivi <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: Peter Xu <[email protected]>
> Cc: Felix Kuehling <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Ross Zwisler <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: Christian Koenig <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Arnd Bergmann <[email protected]>
> ---
> include/linux/mmu_notifier.h | 34 ++++++++++++++++++++++++++++++++--
> mm/ksm.c | 11 ++++++-----
> mm/memory.c | 5 +++--
> 3 files changed, 41 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index b6c004bd9f6a..0230a4b06b46 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -40,6 +40,26 @@ enum mmu_notifier_event {
> MMU_NOTIFY_SOFT_DIRTY,
> };
>
> +/*
> + * @MMU_NOTIFIER_RANGE_BLOCKABLE: can the mmu notifier range_start/range_end
> + * callback block or not ? If set then the callback can block.
> + *
> + * @MMU_NOTIFIER_USE_CHANGE_PTE: only set when the page table it updated with
> + * the set_pte_at_notify() the valid patterns for this are:
> + * - pte read and write to read only same pfn
> + * - pte read only to read and write (pfn can change or stay the same)
> + * - pte read only to read only with different pfn
> + * It is illegal to set in any other circumstances.
> + *
> + * Note that set_pte_at_notify() should not be use outside of the above cases.
> + * When updating a range in batch (like write protecting a range) it is better
> + * to rely on invalidate_range_start() and struct mmu_notifier_range to infer
> + * the kind of update that is happening (as an example you can look at the
> + * mmu_notifier_range_update_to_read_only() function).
> + */
> +#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> +#define MMU_NOTIFIER_USE_CHANGE_PTE (1 << 1)
> +
> #ifdef CONFIG_MMU_NOTIFIER
>
> /*
> @@ -55,8 +75,6 @@ struct mmu_notifier_mm {
> spinlock_t lock;
> };
>
> -#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
> -
> struct mmu_notifier_range {
> struct vm_area_struct *vma;
> struct mm_struct *mm;
> @@ -268,6 +286,12 @@ mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
> }
>
> +static inline bool
> +mmu_notifier_range_use_change_pte(const struct mmu_notifier_range *range)
> +{
> + return (range->flags & MMU_NOTIFIER_USE_CHANGE_PTE);
> +}
> +
> static inline void mmu_notifier_release(struct mm_struct *mm)
> {
> if (mm_has_notifiers(mm))
> @@ -509,6 +533,12 @@ mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
> return true;
> }
>
> +static inline bool
> +mmu_notifier_range_use_change_pte(const struct mmu_notifier_range *range)
> +{
> + return false;
> +}
> +
> static inline int mm_has_notifiers(struct mm_struct *mm)
> {
> return 0;
> diff --git a/mm/ksm.c b/mm/ksm.c
> index b782fadade8f..41e51882f999 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1066,9 +1066,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
>
> BUG_ON(PageTransCompound(page));
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> - pvmw.address,
> - pvmw.address + PAGE_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
> + MMU_NOTIFIER_USE_CHANGE_PTE, vma, mm,
> + pvmw.address, pvmw.address + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
> if (!page_vma_mapped_walk(&pvmw))
> @@ -1155,8 +1155,9 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
> if (!pmd)
> goto out;
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, addr,
> - addr + PAGE_SIZE);
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
> + MMU_NOTIFIER_USE_CHANGE_PTE,
> + vma, mm, addr, addr + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
> ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
> diff --git a/mm/memory.c b/mm/memory.c
> index 45dbc174a88c..cb71d3ff1b97 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2282,8 +2282,9 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
>
> __SetPageUptodate(new_page);
>
> - mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
> - vmf->address & PAGE_MASK,
> + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR,
> + MMU_NOTIFIER_USE_CHANGE_PTE,
> + vma, mm, vmf->address & PAGE_MASK,
> (vmf->address & PAGE_MASK) + PAGE_SIZE);
> mmu_notifier_invalidate_range_start(&range);
>
>
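As a rough illustration of how a listener could consume the new flag, here
is a userspace sketch. The flag values and the helper are copied from the
patch; listener_must_invalidate() is a hypothetical name, and the policy it
encodes (skip the invalidation in range_start because the change_pte()
callback will update the secondary MMU) is an assumption about the intended
use, not something this patch implements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MMU_NOTIFIER_RANGE_BLOCKABLE	(1 << 0)
#define MMU_NOTIFIER_USE_CHANGE_PTE	(1 << 1)

/* Cut-down stand-in holding only the flags field. */
struct mmu_notifier_range {
	unsigned flags;
};

/* Same test as the helper added by this patch. */
static bool
mmu_notifier_range_use_change_pte(const struct mmu_notifier_range *range)
{
	assert(range != NULL);
	return (range->flags & MMU_NOTIFIER_USE_CHANGE_PTE) != 0;
}

/* Hypothetical listener policy for invalidate_range_start(). */
static bool listener_must_invalidate(const struct mmu_notifier_range *range)
{
	/* Assumed: change_pte() will install the new pte, so skip the unmap. */
	return !mmu_notifier_range_use_change_pte(range);
}
```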
Reviewed-by: Ralph Campbell <[email protected]>