2023-01-25 23:36:03

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 0/7] introduce vm_flags modifier functions

This patchset was originally published as a part of per-VMA locking [1] and
was split after suggestion that it's viable on its own and to facilitate
the review process. It is now a preprequisite for the next version of per-VMA
lock patchset, which reuses vm_flags modifier functions to lock the VMA when
vm_flags are being updated.

VMA vm_flags modifications are usually done under exclusive mmap_lock
protection because this attrubute affects other decisions like VMA merging
or splitting and races should be prevented. Introduce vm_flags modifier
functions to enforce correct locking.

The patchset applies cleanly over mm-unstable branch of mm tree.

My apologies for an extremely large distribution list. The patch touches
lots of files and many are in arch/ and drivers/.

Changes since v2 [2]
- Add an additional patch to convert vma assignment to a memcpy.
- Change vm_flags to a union of __private and const fields,
per Peter Zijlstra and Matthew Wilcox.
- Changed vm_flags type to vm_flags_t, per Matthew Wilcox.
- Removed unnecessary BUG_ON's from ksm_madvise and hugepage_madvise,
per Michal Hocko.
- Documented mod_vm_flags_nolock usage, per Michal Hocko.

[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Suren Baghdasaryan (7):
kernel/fork: convert vma assignment to a memcpy
mm: introduce vma->vm_flags wrapper functions
mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK
mm: replace vma->vm_flags direct modifications with modifier calls
mm: replace vma->vm_flags indirect modification in ksm_madvise
mm: introduce mod_vm_flags_nolock and use it in untrack_pfn
mm: export dump_mm()

arch/arm/kernel/process.c | 2 +-
arch/ia64/mm/init.c | 8 +--
arch/loongarch/include/asm/tlb.h | 2 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 5 +-
arch/powerpc/kvm/book3s_xive_native.c | 2 +-
arch/powerpc/mm/book3s64/subpage_prot.c | 2 +-
arch/powerpc/platforms/book3s/vas-api.c | 2 +-
arch/powerpc/platforms/cell/spufs/file.c | 14 ++---
arch/s390/mm/gmap.c | 8 ++-
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/kernel/cpu/sgx/virt.c | 2 +-
arch/x86/mm/pat/memtype.c | 14 +++--
arch/x86/um/mem_32.c | 2 +-
drivers/acpi/pfr_telemetry.c | 2 +-
drivers/android/binder.c | 3 +-
drivers/char/mspec.c | 2 +-
drivers/crypto/hisilicon/qm.c | 2 +-
drivers/dax/device.c | 2 +-
drivers/dma/idxd/cdev.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 +-
drivers/gpu/drm/drm_gem.c | 2 +-
drivers/gpu/drm/drm_gem_dma_helper.c | 3 +-
drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +-
drivers/gpu/drm/drm_vm.c | 8 +--
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +-
drivers/gpu/drm/exynos/exynos_drm_gem.c | 4 +-
drivers/gpu/drm/gma500/framebuffer.c | 2 +-
drivers/gpu/drm/i810/i810_dma.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 +-
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 2 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 3 +-
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 3 +-
drivers/gpu/drm/tegra/gem.c | 5 +-
drivers/gpu/drm/ttm/ttm_bo_vm.c | 3 +-
drivers/gpu/drm/virtio/virtgpu_vram.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2 +-
drivers/gpu/drm/xen/xen_drm_front_gem.c | 3 +-
drivers/hsi/clients/cmt_speech.c | 2 +-
drivers/hwtracing/intel_th/msu.c | 2 +-
drivers/hwtracing/stm/core.c | 2 +-
drivers/infiniband/hw/hfi1/file_ops.c | 4 +-
drivers/infiniband/hw/mlx5/main.c | 4 +-
drivers/infiniband/hw/qib/qib_file_ops.c | 13 ++---
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +-
.../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c | 2 +-
.../common/videobuf2/videobuf2-dma-contig.c | 2 +-
.../common/videobuf2/videobuf2-vmalloc.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 4 +-
drivers/media/v4l2-core/videobuf-vmalloc.c | 2 +-
drivers/misc/cxl/context.c | 2 +-
drivers/misc/habanalabs/common/memory.c | 2 +-
drivers/misc/habanalabs/gaudi/gaudi.c | 4 +-
drivers/misc/habanalabs/gaudi2/gaudi2.c | 8 +--
drivers/misc/habanalabs/goya/goya.c | 4 +-
drivers/misc/ocxl/context.c | 4 +-
drivers/misc/ocxl/sysfs.c | 2 +-
drivers/misc/open-dice.c | 4 +-
drivers/misc/sgi-gru/grufile.c | 4 +-
drivers/misc/uacce/uacce.c | 2 +-
drivers/sbus/char/oradax.c | 2 +-
drivers/scsi/cxlflash/ocxl_hw.c | 2 +-
drivers/scsi/sg.c | 2 +-
.../staging/media/atomisp/pci/hmm/hmm_bo.c | 2 +-
drivers/staging/media/deprecated/meye/meye.c | 4 +-
.../media/deprecated/stkwebcam/stk-webcam.c | 2 +-
drivers/target/target_core_user.c | 2 +-
drivers/uio/uio.c | 2 +-
drivers/usb/core/devio.c | 3 +-
drivers/usb/mon/mon_bin.c | 3 +-
drivers/vdpa/vdpa_user/iova_domain.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
drivers/video/fbdev/68328fb.c | 2 +-
drivers/video/fbdev/core/fb_defio.c | 4 +-
drivers/xen/gntalloc.c | 2 +-
drivers/xen/gntdev.c | 4 +-
drivers/xen/privcmd-buf.c | 2 +-
drivers/xen/privcmd.c | 4 +-
fs/aio.c | 2 +-
fs/cramfs/inode.c | 2 +-
fs/erofs/data.c | 2 +-
fs/exec.c | 4 +-
fs/ext4/file.c | 2 +-
fs/fuse/dax.c | 2 +-
fs/hugetlbfs/inode.c | 4 +-
fs/orangefs/file.c | 3 +-
fs/proc/task_mmu.c | 2 +-
fs/proc/vmcore.c | 3 +-
fs/userfaultfd.c | 2 +-
fs/xfs/xfs_file.c | 2 +-
include/linux/mm.h | 55 +++++++++++++++++--
include/linux/mm_types.h | 10 +++-
include/linux/pgtable.h | 5 +-
kernel/bpf/ringbuf.c | 4 +-
kernel/bpf/syscall.c | 4 +-
kernel/events/core.c | 2 +-
kernel/fork.c | 4 +-
kernel/kcov.c | 2 +-
kernel/relay.c | 2 +-
mm/debug.c | 1 +
mm/hugetlb.c | 4 +-
mm/madvise.c | 2 +-
mm/memory.c | 19 ++++---
mm/memremap.c | 4 +-
mm/mlock.c | 12 ++--
mm/mmap.c | 32 ++++++-----
mm/mprotect.c | 2 +-
mm/mremap.c | 8 +--
mm/nommu.c | 11 ++--
mm/secretmem.c | 2 +-
mm/shmem.c | 2 +-
mm/vmalloc.c | 2 +-
net/ipv4/tcp.c | 4 +-
security/selinux/selinuxfs.c | 6 +-
sound/core/oss/pcm_oss.c | 2 +-
sound/core/pcm_native.c | 9 +--
sound/soc/pxa/mmp-sspa.c | 2 +-
sound/usb/usx2y/us122l.c | 4 +-
sound/usb/usx2y/usX2Yhwdep.c | 2 +-
sound/usb/usx2y/usx2yhwdeppcm.c | 2 +-
127 files changed, 295 insertions(+), 234 deletions(-)

--
2.39.1



2023-01-25 23:36:06

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
errors when we add a const modifier to vma->vm_flags.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
kernel/fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 6683c1b0f460..a531901859d9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
* orig->shared.rb may be modified concurrently, but the clone
* will be reinitialized.
*/
- *new = data_race(*orig);
+ memcpy(new, orig, sizeof(*new));
INIT_LIST_HEAD(&new->anon_vma_chain);
dup_anon_vma_name(orig, new);
}
--
2.39.1


2023-01-25 23:36:13

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

vm_flags are among VMA attributes which affect decisions like VMA merging
and splitting. Therefore all vm_flags modifications are performed after
taking exclusive mmap_lock to prevent vm_flags updates racing with such
operations. Introduce modifier functions for vm_flags to be used whenever
flags are updated. This way we can better check and control correct
locking behavior during these updates.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++++++
include/linux/mm_types.h | 10 +++++++++-
2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c2f62bdce134..bf16ddd544a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
INIT_LIST_HEAD(&vma->anon_vma_chain);
}

+/* Use when VMA is not part of the VMA tree and needs no locking */
+static inline void init_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ ACCESS_PRIVATE(vma, __vm_flags) = flags;
+}
+
+/* Use when VMA is part of the VMA tree and modifications need coordination */
+static inline void reset_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ mmap_assert_write_locked(vma->vm_mm);
+ init_vm_flags(vma, flags);
+}
+
+static inline void set_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ mmap_assert_write_locked(vma->vm_mm);
+ ACCESS_PRIVATE(vma, __vm_flags) |= flags;
+}
+
+static inline void clear_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t flags)
+{
+ mmap_assert_write_locked(vma->vm_mm);
+ ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
+}
+
+static inline void mod_vm_flags(struct vm_area_struct *vma,
+ vm_flags_t set, vm_flags_t clear)
+{
+ mmap_assert_write_locked(vma->vm_mm);
+ ACCESS_PRIVATE(vma, __vm_flags) |= set;
+ ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+}
+
static inline void vma_set_anonymous(struct vm_area_struct *vma)
{
vma->vm_ops = NULL;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2d6d790d9bed..bccbd5896850 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -491,7 +491,15 @@ struct vm_area_struct {
* See vmf_insert_mixed_prot() for discussion.
*/
pgprot_t vm_page_prot;
- unsigned long vm_flags; /* Flags, see mm.h. */
+
+ /*
+ * Flags, see mm.h.
+ * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
+ */
+ union {
+ const vm_flags_t vm_flags;
+ vm_flags_t __private __vm_flags;
+ };

/*
* For areas with an address space and backing store,
--
2.39.1


2023-01-25 23:36:16

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 3/7] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
replace it with VM_LOCKED_MASK bitmask and convert all users.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
include/linux/mm.h | 4 ++--
kernel/fork.c | 2 +-
mm/hugetlb.c | 4 ++--
mm/mlock.c | 6 +++---
mm/mmap.c | 6 +++---
mm/mremap.c | 2 +-
6 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf16ddd544a5..cccbc2811827 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -421,8 +421,8 @@ extern unsigned int kobjsize(const void *objp);
/* This mask defines which mm->def_flags a process can inherit its parent */
#define VM_INIT_DEF_MASK VM_NOHUGEPAGE

-/* This mask is used to clear all the VMA flags used by mlock */
-#define VM_LOCKED_CLEAR_MASK (~(VM_LOCKED | VM_LOCKONFAULT))
+/* This mask represents all the VMA flag bits used by mlock */
+#define VM_LOCKED_MASK (VM_LOCKED | VM_LOCKONFAULT)

/* Arch-specific flags to clear when updating VM flags on protection change */
#ifndef VM_ARCH_CLEAR
diff --git a/kernel/fork.c b/kernel/fork.c
index a531901859d9..4f097999a570 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -669,7 +669,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
tmp->anon_vma = NULL;
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
- tmp->vm_flags &= ~(VM_LOCKED | VM_LOCKONFAULT);
+ clear_vm_flags(tmp, VM_LOCKED_MASK);
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d20c8b09890e..4ecdbad9a451 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6973,8 +6973,8 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
unsigned long s_end = sbase + PUD_SIZE;

/* Allow segments to share if only one is marked locked */
- unsigned long vm_flags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
- unsigned long svm_flags = svma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ unsigned long vm_flags = vma->vm_flags & ~VM_LOCKED_MASK;
+ unsigned long svm_flags = svma->vm_flags & ~VM_LOCKED_MASK;

/*
* match the virtual addresses, permission and the alignment of the
diff --git a/mm/mlock.c b/mm/mlock.c
index 0336f52e03d7..5c4fff93cd6b 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -497,7 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
if (vma->vm_start != tmp)
return -ENOMEM;

- newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= flags;
/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
tmp = vma->vm_end;
@@ -661,7 +661,7 @@ static int apply_mlockall_flags(int flags)
struct vm_area_struct *vma, *prev = NULL;
vm_flags_t to_add = 0;

- current->mm->def_flags &= VM_LOCKED_CLEAR_MASK;
+ current->mm->def_flags &= ~VM_LOCKED_MASK;
if (flags & MCL_FUTURE) {
current->mm->def_flags |= VM_LOCKED;

@@ -681,7 +681,7 @@ static int apply_mlockall_flags(int flags)
for_each_vma(vmi, vma) {
vm_flags_t newflags;

- newflags = vma->vm_flags & VM_LOCKED_CLEAR_MASK;
+ newflags = vma->vm_flags & ~VM_LOCKED_MASK;
newflags |= to_add;

/* Ignore errors */
diff --git a/mm/mmap.c b/mm/mmap.c
index d4abc6feced1..323bd253b25a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2671,7 +2671,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
if ((vm_flags & VM_SPECIAL) || vma_is_dax(vma) ||
is_vm_hugetlb_page(vma) ||
vma == get_gate_vma(current->mm))
- vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+ clear_vm_flags(vma, VM_LOCKED_MASK);
else
mm->locked_vm += (len >> PAGE_SHIFT);
}
@@ -3340,8 +3340,8 @@ static struct vm_area_struct *__install_special_mapping(
vma->vm_start = addr;
vma->vm_end = addr + len;

- vma->vm_flags = vm_flags | mm->def_flags | VM_DONTEXPAND | VM_SOFTDIRTY;
- vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+ init_vm_flags(vma, (vm_flags | mm->def_flags |
+ VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);

vma->vm_ops = ops;
diff --git a/mm/mremap.c b/mm/mremap.c
index 1b3ee02bead7..35db9752cb6a 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -687,7 +687,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,

if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
/* We always clear VM_LOCKED[ONFAULT] on the old vma */
- vma->vm_flags &= VM_LOCKED_CLEAR_MASK;
+ clear_vm_flags(vma, VM_LOCKED_MASK);

/*
* anon_vma links of the old vma is no longer needed after its page
--
2.39.1


2023-01-25 23:36:19

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 5/7] mm: replace vma->vm_flags indirect modification in ksm_madvise

Replace indirect modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
arch/powerpc/kvm/book3s_hv_uvmem.c | 5 ++++-
arch/s390/mm/gmap.c | 5 ++++-
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 1d67baa5557a..325a7a47d348 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
{
unsigned long gfn = memslot->base_gfn;
unsigned long end, start = gfn_to_hva(kvm, gfn);
+ unsigned long vm_flags;
int ret = 0;
struct vm_area_struct *vma;
int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
@@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
ret = H_STATE;
break;
}
+ vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- merge_flag, &vma->vm_flags);
+ merge_flag, &vm_flags);
if (ret) {
ret = H_STATE;
break;
}
+ reset_vm_flags(vma, vm_flags);
start = vma->vm_end;
} while (end > vma->vm_end);

diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 3a695b8a1e3c..d5eb47dcdacb 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
+ unsigned long vm_flags;
int ret;
VMA_ITERATOR(vmi, mm, 0);

for_each_vma(vmi, vma) {
+ vm_flags = vma->vm_flags;
ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
- MADV_UNMERGEABLE, &vma->vm_flags);
+ MADV_UNMERGEABLE, &vm_flags);
if (ret)
return ret;
+ reset_vm_flags(vma, vm_flags);
}
mm->def_flags &= ~VM_MERGEABLE;
return 0;
--
2.39.1


2023-01-25 23:36:26

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

Replace direct modifications to vma->vm_flags with calls to modifier
functions to be able to track flag changes and to keep vma locking
correctness.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
arch/arm/kernel/process.c | 2 +-
arch/ia64/mm/init.c | 8 ++++----
arch/loongarch/include/asm/tlb.h | 2 +-
arch/powerpc/kvm/book3s_xive_native.c | 2 +-
arch/powerpc/mm/book3s64/subpage_prot.c | 2 +-
arch/powerpc/platforms/book3s/vas-api.c | 2 +-
arch/powerpc/platforms/cell/spufs/file.c | 14 +++++++-------
arch/s390/mm/gmap.c | 3 +--
arch/x86/entry/vsyscall/vsyscall_64.c | 2 +-
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/kernel/cpu/sgx/virt.c | 2 +-
arch/x86/mm/pat/memtype.c | 6 +++---
arch/x86/um/mem_32.c | 2 +-
drivers/acpi/pfr_telemetry.c | 2 +-
drivers/android/binder.c | 3 +--
drivers/char/mspec.c | 2 +-
drivers/crypto/hisilicon/qm.c | 2 +-
drivers/dax/device.c | 2 +-
drivers/dma/idxd/cdev.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
drivers/gpu/drm/drm_gem.c | 2 +-
drivers/gpu/drm/drm_gem_dma_helper.c | 3 +--
drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +-
drivers/gpu/drm/drm_vm.c | 8 ++++----
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +-
drivers/gpu/drm/exynos/exynos_drm_gem.c | 4 ++--
drivers/gpu/drm/gma500/framebuffer.c | 2 +-
drivers/gpu/drm/i810/i810_dma.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 ++--
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +-
drivers/gpu/drm/msm/msm_gem.c | 2 +-
drivers/gpu/drm/omapdrm/omap_gem.c | 3 +--
drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 3 +--
drivers/gpu/drm/tegra/gem.c | 5 ++---
drivers/gpu/drm/ttm/ttm_bo_vm.c | 3 +--
drivers/gpu/drm/virtio/virtgpu_vram.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2 +-
drivers/gpu/drm/xen/xen_drm_front_gem.c | 3 +--
drivers/hsi/clients/cmt_speech.c | 2 +-
drivers/hwtracing/intel_th/msu.c | 2 +-
drivers/hwtracing/stm/core.c | 2 +-
drivers/infiniband/hw/hfi1/file_ops.c | 4 ++--
drivers/infiniband/hw/mlx5/main.c | 4 ++--
drivers/infiniband/hw/qib/qib_file_ops.c | 13 ++++++-------
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +-
drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c | 2 +-
.../media/common/videobuf2/videobuf2-dma-contig.c | 2 +-
drivers/media/common/videobuf2/videobuf2-vmalloc.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 4 ++--
drivers/media/v4l2-core/videobuf-vmalloc.c | 2 +-
drivers/misc/cxl/context.c | 2 +-
drivers/misc/habanalabs/common/memory.c | 2 +-
drivers/misc/habanalabs/gaudi/gaudi.c | 4 ++--
drivers/misc/habanalabs/gaudi2/gaudi2.c | 8 ++++----
drivers/misc/habanalabs/goya/goya.c | 4 ++--
drivers/misc/ocxl/context.c | 4 ++--
drivers/misc/ocxl/sysfs.c | 2 +-
drivers/misc/open-dice.c | 4 ++--
drivers/misc/sgi-gru/grufile.c | 4 ++--
drivers/misc/uacce/uacce.c | 2 +-
drivers/sbus/char/oradax.c | 2 +-
drivers/scsi/cxlflash/ocxl_hw.c | 2 +-
drivers/scsi/sg.c | 2 +-
drivers/staging/media/atomisp/pci/hmm/hmm_bo.c | 2 +-
drivers/staging/media/deprecated/meye/meye.c | 4 ++--
.../media/deprecated/stkwebcam/stk-webcam.c | 2 +-
drivers/target/target_core_user.c | 2 +-
drivers/uio/uio.c | 2 +-
drivers/usb/core/devio.c | 3 +--
drivers/usb/mon/mon_bin.c | 3 +--
drivers/vdpa/vdpa_user/iova_domain.c | 2 +-
drivers/vfio/pci/vfio_pci_core.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
drivers/video/fbdev/68328fb.c | 2 +-
drivers/video/fbdev/core/fb_defio.c | 4 ++--
drivers/xen/gntalloc.c | 2 +-
drivers/xen/gntdev.c | 4 ++--
drivers/xen/privcmd-buf.c | 2 +-
drivers/xen/privcmd.c | 4 ++--
fs/aio.c | 2 +-
fs/cramfs/inode.c | 2 +-
fs/erofs/data.c | 2 +-
fs/exec.c | 4 ++--
fs/ext4/file.c | 2 +-
fs/fuse/dax.c | 2 +-
fs/hugetlbfs/inode.c | 4 ++--
fs/orangefs/file.c | 3 +--
fs/proc/task_mmu.c | 2 +-
fs/proc/vmcore.c | 3 +--
fs/userfaultfd.c | 2 +-
fs/xfs/xfs_file.c | 2 +-
include/linux/mm.h | 2 +-
kernel/bpf/ringbuf.c | 4 ++--
kernel/bpf/syscall.c | 4 ++--
kernel/events/core.c | 2 +-
kernel/kcov.c | 2 +-
kernel/relay.c | 2 +-
mm/madvise.c | 2 +-
mm/memory.c | 6 +++---
mm/mlock.c | 6 +++---
mm/mmap.c | 10 +++++-----
mm/mprotect.c | 2 +-
mm/mremap.c | 6 +++---
mm/nommu.c | 11 ++++++-----
mm/secretmem.c | 2 +-
mm/shmem.c | 2 +-
mm/vmalloc.c | 2 +-
net/ipv4/tcp.c | 4 ++--
security/selinux/selinuxfs.c | 6 +++---
sound/core/oss/pcm_oss.c | 2 +-
sound/core/pcm_native.c | 9 +++++----
sound/soc/pxa/mmp-sspa.c | 2 +-
sound/usb/usx2y/us122l.c | 4 ++--
sound/usb/usx2y/usX2Yhwdep.c | 2 +-
sound/usb/usx2y/usx2yhwdeppcm.c | 2 +-
120 files changed, 188 insertions(+), 199 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f811733a8fc5..ec65f3ea3150 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -316,7 +316,7 @@ static int __init gate_vma_init(void)
gate_vma.vm_page_prot = PAGE_READONLY_EXEC;
gate_vma.vm_start = 0xffff0000;
gate_vma.vm_end = 0xffff0000 + PAGE_SIZE;
- gate_vma.vm_flags = VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC;
+ init_vm_flags(&gate_vma, VM_READ | VM_EXEC | VM_MAYREAD | VM_MAYEXEC);
return 0;
}
arch_initcall(gate_vma_init);
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index fc4e4217e87f..d355e0ce28ab 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -109,7 +109,7 @@ ia64_init_addr_space (void)
vma_set_anonymous(vma);
vma->vm_start = current->thread.rbs_bot & PAGE_MASK;
vma->vm_end = vma->vm_start + PAGE_SIZE;
- vma->vm_flags = VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT;
+ init_vm_flags(vma, VM_DATA_DEFAULT_FLAGS|VM_GROWSUP|VM_ACCOUNT);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
mmap_write_lock(current->mm);
if (insert_vm_struct(current->mm, vma)) {
@@ -127,8 +127,8 @@ ia64_init_addr_space (void)
vma_set_anonymous(vma);
vma->vm_end = PAGE_SIZE;
vma->vm_page_prot = __pgprot(pgprot_val(PAGE_READONLY) | _PAGE_MA_NAT);
- vma->vm_flags = VM_READ | VM_MAYREAD | VM_IO |
- VM_DONTEXPAND | VM_DONTDUMP;
+ init_vm_flags(vma, VM_READ | VM_MAYREAD | VM_IO |
+ VM_DONTEXPAND | VM_DONTDUMP);
mmap_write_lock(current->mm);
if (insert_vm_struct(current->mm, vma)) {
mmap_write_unlock(current->mm);
@@ -272,7 +272,7 @@ static int __init gate_vma_init(void)
vma_init(&gate_vma, NULL);
gate_vma.vm_start = FIXADDR_USER_START;
gate_vma.vm_end = FIXADDR_USER_END;
- gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
+ init_vm_flags(&gate_vma, VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC);
gate_vma.vm_page_prot = __pgprot(__ACCESS_BITS | _PAGE_PL_3 | _PAGE_AR_RX);

return 0;
diff --git a/arch/loongarch/include/asm/tlb.h b/arch/loongarch/include/asm/tlb.h
index dd24f5898f65..51e35b44d105 100644
--- a/arch/loongarch/include/asm/tlb.h
+++ b/arch/loongarch/include/asm/tlb.h
@@ -149,7 +149,7 @@ static inline void tlb_flush(struct mmu_gather *tlb)
struct vm_area_struct vma;

vma.vm_mm = tlb->mm;
- vma.vm_flags = 0;
+ init_vm_flags(&vma, 0);
if (tlb->fullmm) {
flush_tlb_mm(tlb->mm);
return;
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index 4f566bea5e10..7976af0f5ff8 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -324,7 +324,7 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev,
return -EINVAL;
}

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);

/*
diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c
index d73b3b4176e8..72948cdb1911 100644
--- a/arch/powerpc/mm/book3s64/subpage_prot.c
+++ b/arch/powerpc/mm/book3s64/subpage_prot.c
@@ -156,7 +156,7 @@ static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr,
* VM_NOHUGEPAGE and split them.
*/
for_each_vma_range(vmi, vma, addr + len) {
- vma->vm_flags |= VM_NOHUGEPAGE;
+ set_vm_flags(vma, VM_NOHUGEPAGE);
walk_page_vma(vma, &subpage_walk_ops, NULL);
}
}
diff --git a/arch/powerpc/platforms/book3s/vas-api.c b/arch/powerpc/platforms/book3s/vas-api.c
index 9580e8e12165..d5b8e55d010a 100644
--- a/arch/powerpc/platforms/book3s/vas-api.c
+++ b/arch/powerpc/platforms/book3s/vas-api.c
@@ -525,7 +525,7 @@ static int coproc_mmap(struct file *fp, struct vm_area_struct *vma)
pfn = paste_addr >> PAGE_SHIFT;

/* flags, page_prot from cxl_mmap(), except we want cachable */
- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_cached(vma->vm_page_prot);

prot = __pgprot(pgprot_val(vma->vm_page_prot) | _PAGE_DIRTY);
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index 62d90a5e23d1..784fa39a484a 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -291,7 +291,7 @@ static int spufs_mem_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);

vma->vm_ops = &spufs_mem_mmap_vmops;
@@ -381,7 +381,7 @@ static int spufs_cntl_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_cntl_mmap_vmops;
@@ -1043,7 +1043,7 @@ static int spufs_signal1_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_signal1_mmap_vmops;
@@ -1179,7 +1179,7 @@ static int spufs_signal2_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_signal2_mmap_vmops;
@@ -1302,7 +1302,7 @@ static int spufs_mss_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_mss_mmap_vmops;
@@ -1364,7 +1364,7 @@ static int spufs_psmap_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_psmap_mmap_vmops;
@@ -1424,7 +1424,7 @@ static int spufs_mfc_mmap(struct file *file, struct vm_area_struct *vma)
if (!(vma->vm_flags & VM_SHARED))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

vma->vm_ops = &spufs_mfc_mmap_vmops;
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 69af6cdf1a2a..3a695b8a1e3c 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2522,8 +2522,7 @@ static inline void thp_split_mm(struct mm_struct *mm)
VMA_ITERATOR(vmi, mm, 0);

for_each_vma(vmi, vma) {
- vma->vm_flags &= ~VM_HUGEPAGE;
- vma->vm_flags |= VM_NOHUGEPAGE;
+ mod_vm_flags(vma, VM_NOHUGEPAGE, VM_HUGEPAGE);
walk_page_vma(vma, &thp_split_walk_ops, NULL);
}
mm->def_flags |= VM_NOHUGEPAGE;
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c
index 4af81df133ee..e2a1626d86d8 100644
--- a/arch/x86/entry/vsyscall/vsyscall_64.c
+++ b/arch/x86/entry/vsyscall/vsyscall_64.c
@@ -391,7 +391,7 @@ void __init map_vsyscall(void)
}

if (vsyscall_mode == XONLY)
- gate_vma.vm_flags = VM_EXEC;
+ init_vm_flags(&gate_vma, VM_EXEC);

BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) !=
(unsigned long)VSYSCALL_ADDR);
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index aa9b8b868867..42c0bded93b6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -95,7 +95,7 @@ static int sgx_mmap(struct file *file, struct vm_area_struct *vma)
return ret;

vma->vm_ops = &sgx_vm_ops;
- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+ set_vm_flags(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);
vma->vm_private_data = encl;

return 0;
diff --git a/arch/x86/kernel/cpu/sgx/virt.c b/arch/x86/kernel/cpu/sgx/virt.c
index 6a77a14eee38..0774a0bfeb28 100644
--- a/arch/x86/kernel/cpu/sgx/virt.c
+++ b/arch/x86/kernel/cpu/sgx/virt.c
@@ -105,7 +105,7 @@ static int sgx_vepc_mmap(struct file *file, struct vm_area_struct *vma)

vma->vm_ops = &sgx_vepc_vm_ops;
/* Don't copy VMA in fork() */
- vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY;
+ set_vm_flags(vma, VM_PFNMAP | VM_IO | VM_DONTDUMP | VM_DONTCOPY);
vma->vm_private_data = vepc;

return 0;
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index fb4b1b5e0dea..ae9645c900fa 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1000,7 +1000,7 @@ int track_pfn_remap(struct vm_area_struct *vma, pgprot_t *prot,

ret = reserve_pfn_range(paddr, size, prot, 0);
if (ret == 0 && vma)
- vma->vm_flags |= VM_PAT;
+ set_vm_flags(vma, VM_PAT);
return ret;
}

@@ -1066,7 +1066,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
}
free_pfn_range(paddr, size);
if (vma)
- vma->vm_flags &= ~VM_PAT;
+ clear_vm_flags(vma, VM_PAT);
}

/*
@@ -1076,7 +1076,7 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
*/
void untrack_pfn_moved(struct vm_area_struct *vma)
{
- vma->vm_flags &= ~VM_PAT;
+ clear_vm_flags(vma, VM_PAT);
}

pgprot_t pgprot_writecombine(pgprot_t prot)
diff --git a/arch/x86/um/mem_32.c b/arch/x86/um/mem_32.c
index cafd01f730da..bfd2c320ad25 100644
--- a/arch/x86/um/mem_32.c
+++ b/arch/x86/um/mem_32.c
@@ -16,7 +16,7 @@ static int __init gate_vma_init(void)
vma_init(&gate_vma, NULL);
gate_vma.vm_start = FIXADDR_USER_START;
gate_vma.vm_end = FIXADDR_USER_END;
- gate_vma.vm_flags = VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC;
+ init_vm_flags(&gate_vma, VM_READ | VM_MAYREAD | VM_EXEC | VM_MAYEXEC);
gate_vma.vm_page_prot = PAGE_READONLY;

return 0;
diff --git a/drivers/acpi/pfr_telemetry.c b/drivers/acpi/pfr_telemetry.c
index 27fb6cdad75f..9e339c705b5b 100644
--- a/drivers/acpi/pfr_telemetry.c
+++ b/drivers/acpi/pfr_telemetry.c
@@ -310,7 +310,7 @@ pfrt_log_mmap(struct file *file, struct vm_area_struct *vma)
return -EROFS;

/* changing from read to write with mprotect is not allowed */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

pfrt_log_dev = to_pfrt_log_dev(file);

diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 880224ec6abb..dd6c99223b8c 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -5572,8 +5572,7 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma)
proc->pid, vma->vm_start, vma->vm_end, "bad vm_flags", -EPERM);
return -EPERM;
}
- vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP;
- vma->vm_flags &= ~VM_MAYWRITE;
+ mod_vm_flags(vma, VM_DONTCOPY | VM_MIXEDMAP, VM_MAYWRITE);

vma->vm_ops = &binder_vm_ops;
vma->vm_private_data = proc;
diff --git a/drivers/char/mspec.c b/drivers/char/mspec.c
index f8231e2e84be..57bd36a28f95 100644
--- a/drivers/char/mspec.c
+++ b/drivers/char/mspec.c
@@ -206,7 +206,7 @@ mspec_mmap(struct file *file, struct vm_area_struct *vma,
refcount_set(&vdata->refcnt, 1);
vma->vm_private_data = vdata;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
if (vdata->type == MSPEC_UNCACHED)
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_ops = &mspec_vm_ops;
diff --git a/drivers/crypto/hisilicon/qm.c b/drivers/crypto/hisilicon/qm.c
index 007ac7a69ce7..57ecdb5c97fb 100644
--- a/drivers/crypto/hisilicon/qm.c
+++ b/drivers/crypto/hisilicon/qm.c
@@ -2363,7 +2363,7 @@ static int hisi_qm_uacce_mmap(struct uacce_queue *q,
return -EINVAL;
}

- vma->vm_flags |= VM_IO;
+ set_vm_flags(vma, VM_IO);

return remap_pfn_range(vma, vma->vm_start,
phys_base >> PAGE_SHIFT,
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 5494d745ced5..6e9726dfaa7e 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -308,7 +308,7 @@ static int dax_mmap(struct file *filp, struct vm_area_struct *vma)
return rc;

vma->vm_ops = &dax_vm_ops;
- vma->vm_flags |= VM_HUGEPAGE;
+ set_vm_flags(vma, VM_HUGEPAGE);
return 0;
}

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index e13e92609943..51cf836cf329 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -201,7 +201,7 @@ static int idxd_cdev_mmap(struct file *filp, struct vm_area_struct *vma)
if (rc < 0)
return rc;

- vma->vm_flags |= VM_DONTCOPY;
+ set_vm_flags(vma, VM_DONTCOPY);
pfn = (base + idxd_get_wq_portal_full_offset(wq->id,
IDXD_PORTAL_LIMITED)) >> PAGE_SHIFT;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bb7350ea1d75..70b08a0d13cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -257,7 +257,7 @@ static int amdgpu_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_str
*/
if (is_cow_mapping(vma->vm_flags) &&
!(vma->vm_flags & VM_ACCESS_FLAGS))
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

return drm_gem_ttm_mmap(obj, vma);
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6d291aa6386b..7beb8dd6a5e6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2879,8 +2879,8 @@ static int kfd_mmio_mmap(struct kfd_dev *dev, struct kfd_process *process,

address = dev->adev->rmmio_remap.bus_addr;

- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
- VM_DONTDUMP | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
+ VM_DONTDUMP | VM_PFNMAP);

vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index cd4e61bf0493..6cbe47cf9be5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -159,8 +159,8 @@ int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
address = kfd_get_process_doorbells(pdd);
if (!address)
return -ENOMEM;
- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
- VM_DONTDUMP | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
+ VM_DONTDUMP | VM_PFNMAP);

vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 729d26d648af..95cd20056cea 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -1052,8 +1052,8 @@ int kfd_event_mmap(struct kfd_process *p, struct vm_area_struct *vma)
pfn = __pa(page->kernel_address);
pfn >>= PAGE_SHIFT;

- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE
- | VM_DONTDUMP | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE
+ | VM_DONTDUMP | VM_PFNMAP);

pr_debug("Mapping signal page\n");
pr_debug(" start user address == 0x%08lx\n", vma->vm_start);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 51b1683ac5c1..b40f4b122918 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1978,8 +1978,8 @@ int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
return -ENOMEM;
}

- vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND
- | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND
+ | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP);
/* Mapping pages to user process */
return remap_pfn_range(vma, vma->vm_start,
PFN_DOWN(__pa(qpd->cwsr_kaddr)),
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index b8db675e7fb5..6ea7bcaa592b 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -1047,7 +1047,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
goto err_drm_gem_object_put;
}

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
}
diff --git a/drivers/gpu/drm/drm_gem_dma_helper.c b/drivers/gpu/drm/drm_gem_dma_helper.c
index 1e658c448366..41f241b9a581 100644
--- a/drivers/gpu/drm/drm_gem_dma_helper.c
+++ b/drivers/gpu/drm/drm_gem_dma_helper.c
@@ -530,8 +530,7 @@ int drm_gem_dma_mmap(struct drm_gem_dma_object *dma_obj, struct vm_area_struct *
* the whole buffer.
*/
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
- vma->vm_flags &= ~VM_PFNMAP;
- vma->vm_flags |= VM_DONTEXPAND;
+ mod_vm_flags(vma, VM_DONTEXPAND, VM_PFNMAP);

if (dma_obj->map_noncoherent) {
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index b602cd72a120..a5032dfac492 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -633,7 +633,7 @@ int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct
if (ret)
return ret;

- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
if (shmem->map_wc)
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index f024dc93939e..8867bb6c40e3 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -476,7 +476,7 @@ static int drm_mmap_dma(struct file *filp, struct vm_area_struct *vma)

if (!capable(CAP_SYS_ADMIN) &&
(dma->flags & _DRM_DMA_USE_PCI_RO)) {
- vma->vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
+ clear_vm_flags(vma, VM_WRITE | VM_MAYWRITE);
#if defined(__i386__) || defined(__x86_64__)
pgprot_val(vma->vm_page_prot) &= ~_PAGE_RW;
#else
@@ -492,7 +492,7 @@ static int drm_mmap_dma(struct file *filp, struct vm_area_struct *vma)

vma->vm_ops = &drm_vm_dma_ops;

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);

drm_vm_open_locked(dev, vma);
return 0;
@@ -560,7 +560,7 @@ static int drm_mmap_locked(struct file *filp, struct vm_area_struct *vma)
return -EINVAL;

if (!capable(CAP_SYS_ADMIN) && (map->flags & _DRM_READ_ONLY)) {
- vma->vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
+ clear_vm_flags(vma, VM_WRITE | VM_MAYWRITE);
#if defined(__i386__) || defined(__x86_64__)
pgprot_val(vma->vm_page_prot) &= ~_PAGE_RW;
#else
@@ -628,7 +628,7 @@ static int drm_mmap_locked(struct file *filp, struct vm_area_struct *vma)
default:
return -EINVAL; /* This should never happen. */
}
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);

drm_vm_open_locked(dev, vma);
return 0;
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index c5ae5492e1af..9a5a317038a4 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -130,7 +130,7 @@ static int etnaviv_gem_mmap_obj(struct etnaviv_gem_object *etnaviv_obj,
{
pgprot_t vm_page_prot;

- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);

vm_page_prot = vm_get_page_prot(vma->vm_flags);

diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c
index 3e493f48e0d4..c330d415729c 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_gem.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c
@@ -274,7 +274,7 @@ static int exynos_drm_gem_mmap_buffer(struct exynos_drm_gem *exynos_gem,
unsigned long vm_size;
int ret;

- vma->vm_flags &= ~VM_PFNMAP;
+ clear_vm_flags(vma, VM_PFNMAP);
vma->vm_pgoff = 0;

vm_size = vma->vm_end - vma->vm_start;
@@ -368,7 +368,7 @@ static int exynos_drm_gem_mmap(struct drm_gem_object *obj, struct vm_area_struct
if (obj->import_attach)
return dma_buf_mmap(obj->dma_buf, vma, 0);

- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);

DRM_DEV_DEBUG_KMS(to_dma_dev(obj->dev), "flags = 0x%x\n",
exynos_gem->flags);
diff --git a/drivers/gpu/drm/gma500/framebuffer.c b/drivers/gpu/drm/gma500/framebuffer.c
index 8d5a37b8f110..471d5b3c1535 100644
--- a/drivers/gpu/drm/gma500/framebuffer.c
+++ b/drivers/gpu/drm/gma500/framebuffer.c
@@ -139,7 +139,7 @@ static int psbfb_mmap(struct fb_info *info, struct vm_area_struct *vma)
*/
vma->vm_ops = &psbfb_vm_ops;
vma->vm_private_data = (void *)fb;
- vma->vm_flags |= VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_MIXEDMAP | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}

diff --git a/drivers/gpu/drm/i810/i810_dma.c b/drivers/gpu/drm/i810/i810_dma.c
index 9fb4dd63342f..bced8c30709e 100644
--- a/drivers/gpu/drm/i810/i810_dma.c
+++ b/drivers/gpu/drm/i810/i810_dma.c
@@ -102,7 +102,7 @@ static int i810_mmap_buffers(struct file *filp, struct vm_area_struct *vma)
buf = dev_priv->mmap_buffer;
buf_priv = buf->dev_private;

- vma->vm_flags |= VM_DONTCOPY;
+ set_vm_flags(vma, VM_DONTCOPY);

buf_priv->currently_mapped = I810_BUF_MAPPED;

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 0ad44f3868de..71b9e0485cb9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -979,7 +979,7 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
i915_gem_object_put(obj);
return -EINVAL;
}
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
}

anon = mmap_singleton(to_i915(dev));
@@ -988,7 +988,7 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct *vma)
return PTR_ERR(anon);
}

- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO;
+ set_vm_flags(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP | VM_IO);

/*
* We keep the ref on mmo->obj, not vm_file, but we require
diff --git a/drivers/gpu/drm/mediatek/mtk_drm_gem.c b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
index 47e96b0289f9..427089733b87 100644
--- a/drivers/gpu/drm/mediatek/mtk_drm_gem.c
+++ b/drivers/gpu/drm/mediatek/mtk_drm_gem.c
@@ -158,7 +158,7 @@ static int mtk_drm_gem_object_mmap(struct drm_gem_object *obj,
* dma_alloc_attrs() allocated a struct page table for mtk_gem, so clear
* VM_PFNMAP flag that was set by drm_gem_mmap_obj()/drm_gem_mmap().
*/
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);

diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 1dee0d18abbb..8aff3ae909af 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -1012,7 +1012,7 @@ static int msm_gem_object_mmap(struct drm_gem_object *obj, struct vm_area_struct
{
struct msm_gem_object *msm_obj = to_msm_bo(obj);

- vma->vm_flags |= VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = msm_gem_pgprot(msm_obj, vm_get_page_prot(vma->vm_flags));

return 0;
diff --git a/drivers/gpu/drm/omapdrm/omap_gem.c b/drivers/gpu/drm/omapdrm/omap_gem.c
index cf571796fd26..9c0e7d6a3784 100644
--- a/drivers/gpu/drm/omapdrm/omap_gem.c
+++ b/drivers/gpu/drm/omapdrm/omap_gem.c
@@ -543,8 +543,7 @@ int omap_gem_mmap_obj(struct drm_gem_object *obj,
{
struct omap_gem_object *omap_obj = to_omap_bo(obj);

- vma->vm_flags &= ~VM_PFNMAP;
- vma->vm_flags |= VM_MIXEDMAP;
+ mod_vm_flags(vma, VM_MIXEDMAP, VM_PFNMAP);

if (omap_obj->flags & OMAP_BO_WC) {
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
index 6edb7c52cb3d..735b64bbdcf2 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_gem.c
@@ -251,8 +251,7 @@ static int rockchip_drm_gem_object_mmap(struct drm_gem_object *obj,
* We allocated a struct page table for rk_obj, so clear
* VM_PFNMAP flag that was set by drm_gem_mmap_obj()/drm_gem_mmap().
*/
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
- vma->vm_flags &= ~VM_PFNMAP;
+ mod_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP, VM_PFNMAP);

vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 979e7bc902f6..6cdc6c45ef27 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -574,7 +574,7 @@ int __tegra_gem_mmap(struct drm_gem_object *gem, struct vm_area_struct *vma)
* and set the vm_pgoff (used as a fake buffer offset by DRM)
* to 0 as we want to map the whole buffer.
*/
- vma->vm_flags &= ~VM_PFNMAP;
+ clear_vm_flags(vma, VM_PFNMAP);
vma->vm_pgoff = 0;

err = dma_mmap_wc(gem->dev->dev, vma, bo->vaddr, bo->iova,
@@ -588,8 +588,7 @@ int __tegra_gem_mmap(struct drm_gem_object *gem, struct vm_area_struct *vma)
} else {
pgprot_t prot = vm_get_page_prot(vma->vm_flags);

- vma->vm_flags |= VM_MIXEDMAP;
- vma->vm_flags &= ~VM_PFNMAP;
+ mod_vm_flags(vma, VM_MIXEDMAP, VM_PFNMAP);

vma->vm_page_prot = pgprot_writecombine(prot);
}
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 5a3e4b891377..0861e6e33964 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -468,8 +468,7 @@ int ttm_bo_mmap_obj(struct vm_area_struct *vma, struct ttm_buffer_object *bo)

vma->vm_private_data = bo;

- vma->vm_flags |= VM_PFNMAP;
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}
EXPORT_SYMBOL(ttm_bo_mmap_obj);
diff --git a/drivers/gpu/drm/virtio/virtgpu_vram.c b/drivers/gpu/drm/virtio/virtgpu_vram.c
index 6b45b0429fef..5498a1dbef63 100644
--- a/drivers/gpu/drm/virtio/virtgpu_vram.c
+++ b/drivers/gpu/drm/virtio/virtgpu_vram.c
@@ -46,7 +46,7 @@ static int virtio_gpu_vram_mmap(struct drm_gem_object *obj,
return -EINVAL;

vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
- vma->vm_flags |= VM_MIXEDMAP | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_MIXEDMAP | VM_DONTEXPAND);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
vma->vm_ops = &virtio_gpu_vram_vm_ops;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
index 265f7c48d856..8c8015528b6f 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c
@@ -97,7 +97,7 @@ int vmw_mmap(struct file *filp, struct vm_area_struct *vma)

/* Use VM_PFNMAP rather than VM_MIXEDMAP if not a COW mapping */
if (!is_cow_mapping(vma->vm_flags))
- vma->vm_flags = (vma->vm_flags & ~VM_MIXEDMAP) | VM_PFNMAP;
+ mod_vm_flags(vma, VM_PFNMAP, VM_MIXEDMAP);

ttm_bo_put(bo); /* release extra ref taken by ttm_bo_mmap_obj() */

diff --git a/drivers/gpu/drm/xen/xen_drm_front_gem.c b/drivers/gpu/drm/xen/xen_drm_front_gem.c
index 4c95ebcdcc2d..18a93ad4aa1f 100644
--- a/drivers/gpu/drm/xen/xen_drm_front_gem.c
+++ b/drivers/gpu/drm/xen/xen_drm_front_gem.c
@@ -69,8 +69,7 @@ static int xen_drm_front_gem_object_mmap(struct drm_gem_object *gem_obj,
* vm_pgoff (used as a fake buffer offset by DRM) to 0 as we want to map
* the whole buffer.
*/
- vma->vm_flags &= ~VM_PFNMAP;
- vma->vm_flags |= VM_MIXEDMAP | VM_DONTEXPAND;
+ mod_vm_flags(vma, VM_MIXEDMAP | VM_DONTEXPAND, VM_PFNMAP);
vma->vm_pgoff = 0;

/*
diff --git a/drivers/hsi/clients/cmt_speech.c b/drivers/hsi/clients/cmt_speech.c
index 8069f795c864..952a31e742a1 100644
--- a/drivers/hsi/clients/cmt_speech.c
+++ b/drivers/hsi/clients/cmt_speech.c
@@ -1264,7 +1264,7 @@ static int cs_char_mmap(struct file *file, struct vm_area_struct *vma)
if (vma_pages(vma) != 1)
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_DONTDUMP | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_IO | VM_DONTDUMP | VM_DONTEXPAND);
vma->vm_ops = &cs_char_vm_ops;
vma->vm_private_data = file->private_data;

diff --git a/drivers/hwtracing/intel_th/msu.c b/drivers/hwtracing/intel_th/msu.c
index 6c8215a47a60..a6f178bf3ded 100644
--- a/drivers/hwtracing/intel_th/msu.c
+++ b/drivers/hwtracing/intel_th/msu.c
@@ -1659,7 +1659,7 @@ static int intel_th_msc_mmap(struct file *file, struct vm_area_struct *vma)
atomic_dec(&msc->user_count);

vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTCOPY;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTCOPY);
vma->vm_ops = &msc_mmap_ops;
return ret;
}
diff --git a/drivers/hwtracing/stm/core.c b/drivers/hwtracing/stm/core.c
index 2712e699ba08..9a59e61c4194 100644
--- a/drivers/hwtracing/stm/core.c
+++ b/drivers/hwtracing/stm/core.c
@@ -715,7 +715,7 @@ static int stm_char_mmap(struct file *file, struct vm_area_struct *vma)
pm_runtime_get_sync(&stm->dev);

vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &stm_mmap_vmops;
vm_iomap_memory(vma, phys, size);

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index f5f9269fdc16..7294f2d33bc6 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -403,7 +403,7 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
ret = -EPERM;
goto done;
}
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
addr = vma->vm_start;
for (i = 0 ; i < uctxt->egrbufs.numbufs; i++) {
memlen = uctxt->egrbufs.buffers[i].len;
@@ -528,7 +528,7 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
goto done;
}

- vma->vm_flags = flags;
+ reset_vm_flags(vma, flags);
hfi1_cdbg(PROC,
"%u:%u type:%u io/vf:%d/%d, addr:0x%llx, len:%lu(%lu), flags:0x%lx\n",
ctxt, subctxt, type, mapio, vmf, memaddr, memlen,
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index c669ef6e47e7..538318c809b3 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2087,7 +2087,7 @@ static int mlx5_ib_mmap_clock_info_page(struct mlx5_ib_dev *dev,

if (vma->vm_flags & (VM_WRITE | VM_EXEC))
return -EPERM;
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

if (!dev->mdev->clock_info)
return -EOPNOTSUPP;
@@ -2311,7 +2311,7 @@ static int mlx5_ib_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vm

if (vma->vm_flags & VM_WRITE)
return -EPERM;
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

/* Don't expose to user-space information it shouldn't have */
if (PAGE_SIZE > 4096)
diff --git a/drivers/infiniband/hw/qib/qib_file_ops.c b/drivers/infiniband/hw/qib/qib_file_ops.c
index 3937144b2ae5..16ef80df4b7f 100644
--- a/drivers/infiniband/hw/qib/qib_file_ops.c
+++ b/drivers/infiniband/hw/qib/qib_file_ops.c
@@ -733,7 +733,7 @@ static int qib_mmap_mem(struct vm_area_struct *vma, struct qib_ctxtdata *rcd,
}

/* don't allow them to later change with mprotect */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
}

pfn = virt_to_phys(kvaddr) >> PAGE_SHIFT;
@@ -769,7 +769,7 @@ static int mmap_ureg(struct vm_area_struct *vma, struct qib_devdata *dd,
phys = dd->physaddr + ureg;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTCOPY | VM_DONTEXPAND);
ret = io_remap_pfn_range(vma, vma->vm_start,
phys >> PAGE_SHIFT,
vma->vm_end - vma->vm_start,
@@ -810,8 +810,7 @@ static int mmap_piobufs(struct vm_area_struct *vma,
* don't allow them to later change to readable with mprotect (for when
* not initially mapped readable, as is normally the case)
*/
- vma->vm_flags &= ~VM_MAYREAD;
- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+ mod_vm_flags(vma, VM_DONTCOPY | VM_DONTEXPAND, VM_MAYREAD);

/* We used PAT if wc_cookie == 0 */
if (!dd->wc_cookie)
@@ -852,7 +851,7 @@ static int mmap_rcvegrbufs(struct vm_area_struct *vma,
goto bail;
}
/* don't allow them to later change to writable with mprotect */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

start = vma->vm_start;

@@ -944,7 +943,7 @@ static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr,
* Don't allow permission to later change to writable
* with mprotect.
*/
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
} else
goto bail;
len = vma->vm_end - vma->vm_start;
@@ -955,7 +954,7 @@ static int mmap_kvaddr(struct vm_area_struct *vma, u64 pgaddr,

vma->vm_pgoff = (unsigned long) addr >> PAGE_SHIFT;
vma->vm_ops = &qib_file_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
ret = 1;

bail:
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 6e8c4fbb8083..6f9237c2a26b 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -672,7 +672,7 @@ int usnic_ib_mmap(struct ib_ucontext *context,
usnic_dbg("\n");

us_ibdev = to_usdev(context->device);
- vma->vm_flags |= VM_IO;
+ set_vm_flags(vma, VM_IO);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vfid = vma->vm_pgoff;
usnic_dbg("Page Offset %lu PAGE_SHIFT %u VFID %u\n",
diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
index 19176583dbde..7f1b7b5dd3f4 100644
--- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
+++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_verbs.c
@@ -408,7 +408,7 @@ int pvrdma_mmap(struct ib_ucontext *ibcontext, struct vm_area_struct *vma)
}

/* Map UAR to kernel space, VM_LOCKED? */
- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTCOPY | VM_DONTEXPAND);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
if (io_remap_pfn_range(vma, start, context->uar.pfn, size,
vma->vm_page_prot))
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 5f1175f8b349..e66ae399749e 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -293,7 +293,7 @@ static int vb2_dc_mmap(void *buf_priv, struct vm_area_struct *vma)
return ret;
}

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_private_data = &buf->handler;
vma->vm_ops = &vb2_common_vm_ops;

diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 959b45beb1f3..edb47240ec17 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -185,7 +185,7 @@ static int vb2_vmalloc_mmap(void *buf_priv, struct vm_area_struct *vma)
/*
* Make sure that vm_areas for 2 buffers won't be merged together
*/
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);

/*
* Use common vm_area operations to track buffer refcount.
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index f2c439359557..c030823185ba 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -314,7 +314,7 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
}

vma->vm_ops = &videobuf_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);
vma->vm_private_data = map;

dev_dbg(q->dev, "mmap %p: q=%p %08lx-%08lx (%lx) pgoff %08lx buf %d\n",
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 234e9f647c96..9adac4875f29 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -630,8 +630,8 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
map->count = 1;
map->q = q;
vma->vm_ops = &videobuf_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
- vma->vm_flags &= ~VM_IO; /* using shared anonymous pages */
+ /* using shared anonymous pages */
+ mod_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP, VM_IO);
vma->vm_private_data = map;
dprintk(1, "mmap %p: q=%p %08lx-%08lx pgoff %08lx bufs %d-%d\n",
map, q, vma->vm_start, vma->vm_end, vma->vm_pgoff, first, last);
diff --git a/drivers/media/v4l2-core/videobuf-vmalloc.c b/drivers/media/v4l2-core/videobuf-vmalloc.c
index 9b2443720ab0..48d439ccd414 100644
--- a/drivers/media/v4l2-core/videobuf-vmalloc.c
+++ b/drivers/media/v4l2-core/videobuf-vmalloc.c
@@ -247,7 +247,7 @@ static int __videobuf_mmap_mapper(struct videobuf_queue *q,
}

vma->vm_ops = &videobuf_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_private_data = map;

dprintk(1, "mmap %p: q=%p %08lx-%08lx (%lx) pgoff %08lx buf %d\n",
diff --git a/drivers/misc/cxl/context.c b/drivers/misc/cxl/context.c
index acaa44809c58..17562e4efcb2 100644
--- a/drivers/misc/cxl/context.c
+++ b/drivers/misc/cxl/context.c
@@ -220,7 +220,7 @@ int cxl_context_iomap(struct cxl_context *ctx, struct vm_area_struct *vma)
pr_devel("%s: mmio physical: %llx pe: %i master:%i\n", __func__,
ctx->psn_phys, ctx->pe , ctx->master);

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_ops = &cxl_mmap_vmops;
return 0;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5e9ae7600d75..ad8eae764b9b 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -2082,7 +2082,7 @@ static int hl_ts_mmap(struct hl_mmap_mem_buf *buf, struct vm_area_struct *vma, v
{
struct hl_ts_buff *ts_buff = buf->private;

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_DONTCOPY | VM_NORESERVE);
return remap_vmalloc_range(vma, ts_buff->user_buff_address, 0);
}

diff --git a/drivers/misc/habanalabs/gaudi/gaudi.c b/drivers/misc/habanalabs/gaudi/gaudi.c
index 9f5e208701ba..4186f04da224 100644
--- a/drivers/misc/habanalabs/gaudi/gaudi.c
+++ b/drivers/misc/habanalabs/gaudi/gaudi.c
@@ -4236,8 +4236,8 @@ static int gaudi_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
{
int rc;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
+ VM_DONTCOPY | VM_NORESERVE);

rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
(dma_addr - HOST_PHYS_BASE), size);
diff --git a/drivers/misc/habanalabs/gaudi2/gaudi2.c b/drivers/misc/habanalabs/gaudi2/gaudi2.c
index e793fb2bdcbe..7311c3053944 100644
--- a/drivers/misc/habanalabs/gaudi2/gaudi2.c
+++ b/drivers/misc/habanalabs/gaudi2/gaudi2.c
@@ -5538,8 +5538,8 @@ static int gaudi2_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
{
int rc;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
+ VM_DONTCOPY | VM_NORESERVE);

#ifdef _HAS_DMA_MMAP_COHERENT

@@ -10116,8 +10116,8 @@ static int gaudi2_block_mmap(struct hl_device *hdev, struct vm_area_struct *vma,

address = pci_resource_start(hdev->pdev, SRAM_CFG_BAR_ID) + offset_in_bar;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
+ VM_DONTCOPY | VM_NORESERVE);

rc = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
block_size, vma->vm_page_prot);
diff --git a/drivers/misc/habanalabs/goya/goya.c b/drivers/misc/habanalabs/goya/goya.c
index 0f083fcf81a6..5e2aaa26ea29 100644
--- a/drivers/misc/habanalabs/goya/goya.c
+++ b/drivers/misc/habanalabs/goya/goya.c
@@ -2880,8 +2880,8 @@ static int goya_mmap(struct hl_device *hdev, struct vm_area_struct *vma,
{
int rc;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
- VM_DONTCOPY | VM_NORESERVE;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP |
+ VM_DONTCOPY | VM_NORESERVE);

rc = dma_mmap_coherent(hdev->dev, vma, cpu_addr,
(dma_addr - HOST_PHYS_BASE), size);
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 9eb0d93b01c6..e6f941248e93 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -180,7 +180,7 @@ static int check_mmap_afu_irq(struct ocxl_context *ctx,
if ((vma->vm_flags & VM_READ) || (vma->vm_flags & VM_EXEC) ||
!(vma->vm_flags & VM_WRITE))
return -EINVAL;
- vma->vm_flags &= ~(VM_MAYREAD | VM_MAYEXEC);
+ clear_vm_flags(vma, VM_MAYREAD | VM_MAYEXEC);
return 0;
}

@@ -204,7 +204,7 @@ int ocxl_context_mmap(struct ocxl_context *ctx, struct vm_area_struct *vma)
if (rc)
return rc;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_ops = &ocxl_vmops;
return 0;
diff --git a/drivers/misc/ocxl/sysfs.c b/drivers/misc/ocxl/sysfs.c
index 25c78df8055d..9398246cac79 100644
--- a/drivers/misc/ocxl/sysfs.c
+++ b/drivers/misc/ocxl/sysfs.c
@@ -134,7 +134,7 @@ static int global_mmio_mmap(struct file *filp, struct kobject *kobj,
(afu->config.global_mmio_size >> PAGE_SHIFT))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_ops = &global_mmio_vmops;
vma->vm_private_data = afu;
diff --git a/drivers/misc/open-dice.c b/drivers/misc/open-dice.c
index 9dda47b3fd70..61b4747270aa 100644
--- a/drivers/misc/open-dice.c
+++ b/drivers/misc/open-dice.c
@@ -95,12 +95,12 @@ static int open_dice_mmap(struct file *filp, struct vm_area_struct *vma)
if (vma->vm_flags & VM_WRITE)
return -EPERM;
/* Ensure userspace cannot acquire VM_WRITE later. */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYSHARE);
}

/* Create write-combine mapping so all clients observe a wipe. */
vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
- vma->vm_flags |= VM_DONTCOPY | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTCOPY | VM_DONTDUMP);
return vm_iomap_memory(vma, drvdata->rmem->base, drvdata->rmem->size);
}

diff --git a/drivers/misc/sgi-gru/grufile.c b/drivers/misc/sgi-gru/grufile.c
index 7ffcfc0bb587..8b777286d3b2 100644
--- a/drivers/misc/sgi-gru/grufile.c
+++ b/drivers/misc/sgi-gru/grufile.c
@@ -101,8 +101,8 @@ static int gru_file_mmap(struct file *file, struct vm_area_struct *vma)
vma->vm_end & (GRU_GSEG_PAGESIZE - 1))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_LOCKED |
- VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_LOCKED |
+ VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = PAGE_SHARED;
vma->vm_ops = &gru_vm_ops;

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 905eff1f840e..f57e91cdb0f6 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -229,7 +229,7 @@ static int uacce_fops_mmap(struct file *filep, struct vm_area_struct *vma)
if (!qfr)
return -ENOMEM;

- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_WIPEONFORK;
+ set_vm_flags(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_WIPEONFORK);
vma->vm_ops = &uacce_vm_ops;
vma->vm_private_data = q;
qfr->type = type;
diff --git a/drivers/sbus/char/oradax.c b/drivers/sbus/char/oradax.c
index 21b7cb6e7e70..a096734daad0 100644
--- a/drivers/sbus/char/oradax.c
+++ b/drivers/sbus/char/oradax.c
@@ -389,7 +389,7 @@ static int dax_devmap(struct file *f, struct vm_area_struct *vma)
/* completion area is mapped read-only for user */
if (vma->vm_flags & VM_WRITE)
return -EPERM;
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

if (remap_pfn_range(vma, vma->vm_start, ctx->ca_buf_ra >> PAGE_SHIFT,
len, vma->vm_page_prot))
diff --git a/drivers/scsi/cxlflash/ocxl_hw.c b/drivers/scsi/cxlflash/ocxl_hw.c
index 631eda2d467e..d386c25c2699 100644
--- a/drivers/scsi/cxlflash/ocxl_hw.c
+++ b/drivers/scsi/cxlflash/ocxl_hw.c
@@ -1167,7 +1167,7 @@ static int afu_mmap(struct file *file, struct vm_area_struct *vma)
(ctx->psn_size >> PAGE_SHIFT))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
vma->vm_ops = &ocxlflash_vmops;
return 0;
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index ff9854f59964..7438adfe3bdc 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1288,7 +1288,7 @@ sg_mmap(struct file *filp, struct vm_area_struct *vma)
}

sfp->mmap_called = 1;
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_private_data = sfp;
vma->vm_ops = &sg_mmap_vm_ops;
out:
diff --git a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
index 5e53eed8ae95..df1c944e5058 100644
--- a/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
+++ b/drivers/staging/media/atomisp/pci/hmm/hmm_bo.c
@@ -1072,7 +1072,7 @@ int hmm_bo_mmap(struct vm_area_struct *vma, struct hmm_buffer_object *bo)
vma->vm_private_data = bo;

vma->vm_ops = &hmm_bo_vm_ops;
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);

/*
* call hmm_bo_vm_open explicitly.
diff --git a/drivers/staging/media/deprecated/meye/meye.c b/drivers/staging/media/deprecated/meye/meye.c
index 5d87efd9b95c..2505e64d7119 100644
--- a/drivers/staging/media/deprecated/meye/meye.c
+++ b/drivers/staging/media/deprecated/meye/meye.c
@@ -1476,8 +1476,8 @@ static int meye_mmap(struct file *file, struct vm_area_struct *vma)
}

vma->vm_ops = &meye_vm_ops;
- vma->vm_flags &= ~VM_IO; /* not I/O memory */
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ /* not I/O memory */
+ mod_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP, VM_IO);
vma->vm_private_data = (void *) (offset / gbufsize);
meye_vm_open(vma);

diff --git a/drivers/staging/media/deprecated/stkwebcam/stk-webcam.c b/drivers/staging/media/deprecated/stkwebcam/stk-webcam.c
index 787edb3d47c2..196d1034f104 100644
--- a/drivers/staging/media/deprecated/stkwebcam/stk-webcam.c
+++ b/drivers/staging/media/deprecated/stkwebcam/stk-webcam.c
@@ -779,7 +779,7 @@ static int v4l_stk_mmap(struct file *fp, struct vm_area_struct *vma)
ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
if (ret)
return ret;
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);
vma->vm_private_data = sbuf;
vma->vm_ops = &stk_v4l_vm_ops;
sbuf->v4lbuf.flags |= V4L2_BUF_FLAG_MAPPED;
diff --git a/drivers/target/target_core_user.c b/drivers/target/target_core_user.c
index 2940559c3086..9fd64259904c 100644
--- a/drivers/target/target_core_user.c
+++ b/drivers/target/target_core_user.c
@@ -1928,7 +1928,7 @@ static int tcmu_mmap(struct uio_info *info, struct vm_area_struct *vma)
{
struct tcmu_dev *udev = container_of(info, struct tcmu_dev, uio_info);

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &tcmu_vm_ops;

vma->vm_private_data = udev;
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 43afbb7c5ab9..08802744f3b7 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -713,7 +713,7 @@ static const struct vm_operations_struct uio_logical_vm_ops = {

static int uio_mmap_logical(struct vm_area_struct *vma)
{
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &uio_logical_vm_ops;
return 0;
}
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index 837f3e57f580..d9aefa259883 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -279,8 +279,7 @@ static int usbdev_mmap(struct file *file, struct vm_area_struct *vma)
}
}

- vma->vm_flags |= VM_IO;
- vma->vm_flags |= (VM_DONTEXPAND | VM_DONTDUMP);
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &usbdev_vm_ops;
vma->vm_private_data = usbm;

diff --git a/drivers/usb/mon/mon_bin.c b/drivers/usb/mon/mon_bin.c
index 094e812e9e69..9b2d48a65fdf 100644
--- a/drivers/usb/mon/mon_bin.c
+++ b/drivers/usb/mon/mon_bin.c
@@ -1272,8 +1272,7 @@ static int mon_bin_mmap(struct file *filp, struct vm_area_struct *vma)
if (vma->vm_flags & VM_WRITE)
return -EPERM;

- vma->vm_flags &= ~VM_MAYWRITE;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ mod_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP, VM_MAYWRITE);
vma->vm_private_data = filp->private_data;
mon_bin_vma_open(vma);
return 0;
diff --git a/drivers/vdpa/vdpa_user/iova_domain.c b/drivers/vdpa/vdpa_user/iova_domain.c
index e682bc7ee6c9..39dcce2e455b 100644
--- a/drivers/vdpa/vdpa_user/iova_domain.c
+++ b/drivers/vdpa/vdpa_user/iova_domain.c
@@ -512,7 +512,7 @@ static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma)
{
struct vduse_iova_domain *domain = file->private_data;

- vma->vm_flags |= VM_DONTDUMP | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTDUMP | VM_DONTEXPAND);
vma->vm_private_data = domain;
vma->vm_ops = &vduse_domain_mmap_ops;

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 26a541cc64d1..86eb3fc9ffb4 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1799,7 +1799,7 @@ int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma
* See remap_pfn_range(), called from vfio_pci_fault() but we can't
* change vm_flags within the fault handler. Set them now.
*/
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &vfio_pci_mmap_ops;

return 0;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index ec32f785dfde..7b81994a7d02 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -1315,7 +1315,7 @@ static int vhost_vdpa_mmap(struct file *file, struct vm_area_struct *vma)
if (vma->vm_end - vma->vm_start != notify.size)
return -ENOTSUPP;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &vhost_vdpa_vm_ops;
return 0;
}
diff --git a/drivers/video/fbdev/68328fb.c b/drivers/video/fbdev/68328fb.c
index 7db03ed77c76..a794a740af10 100644
--- a/drivers/video/fbdev/68328fb.c
+++ b/drivers/video/fbdev/68328fb.c
@@ -391,7 +391,7 @@ static int mc68x328fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
#ifndef MMU
/* this is uClinux (no MMU) specific code */

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_start = videomemory;

return 0;
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index c730253ab85c..af0bfaa2d014 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -232,9 +232,9 @@ static const struct address_space_operations fb_deferred_io_aops = {
int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
vma->vm_ops = &fb_deferred_io_vm_ops;
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
if (!(info->flags & FBINFO_VIRTFB))
- vma->vm_flags |= VM_IO;
+ set_vm_flags(vma, VM_IO);
vma->vm_private_data = info;
return 0;
}
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c
index a15729beb9d1..ee4a8958dc68 100644
--- a/drivers/xen/gntalloc.c
+++ b/drivers/xen/gntalloc.c
@@ -525,7 +525,7 @@ static int gntalloc_mmap(struct file *filp, struct vm_area_struct *vma)

vma->vm_private_data = vm_priv;

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);

vma->vm_ops = &gntalloc_vmops;

diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4d9a3050de6a..6d5bb1ebb661 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1055,10 +1055,10 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)

vma->vm_ops = &gntdev_vmops;

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP | VM_MIXEDMAP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP | VM_MIXEDMAP);

if (use_ptemod)
- vma->vm_flags |= VM_DONTCOPY;
+ set_vm_flags(vma, VM_DONTCOPY);

vma->vm_private_data = map;
if (map->flags) {
diff --git a/drivers/xen/privcmd-buf.c b/drivers/xen/privcmd-buf.c
index dd5bbb6e1b6b..037547918630 100644
--- a/drivers/xen/privcmd-buf.c
+++ b/drivers/xen/privcmd-buf.c
@@ -156,7 +156,7 @@ static int privcmd_buf_mmap(struct file *file, struct vm_area_struct *vma)
vma_priv->file_priv = file_priv;
vma_priv->users = 1;

- vma->vm_flags |= VM_IO | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_IO | VM_DONTEXPAND);
vma->vm_ops = &privcmd_buf_vm_ops;
vma->vm_private_data = vma_priv;

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 1edf45ee9890..4c8cfc6f86d8 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -934,8 +934,8 @@ static int privcmd_mmap(struct file *file, struct vm_area_struct *vma)
{
/* DONTCOPY is essential for Xen because copy_page_range doesn't know
* how to recreate these mappings */
- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTCOPY |
- VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTCOPY |
+ VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &privcmd_vm_ops;
vma->vm_private_data = NULL;

diff --git a/fs/aio.c b/fs/aio.c
index 650cd795aa7e..4106b3209e5e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -392,7 +392,7 @@ static const struct vm_operations_struct aio_ring_vm_ops = {

static int aio_ring_mmap(struct file *file, struct vm_area_struct *vma)
{
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);
vma->vm_ops = &aio_ring_vm_ops;
return 0;
}
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 50e4e060db68..aa8695e685aa 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -408,7 +408,7 @@ static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
* unpopulated ptes via cramfs_read_folio().
*/
int i;
- vma->vm_flags |= VM_MIXEDMAP;
+ set_vm_flags(vma, VM_MIXEDMAP);
for (i = 0; i < pages && !ret; i++) {
vm_fault_t vmf;
unsigned long off = i * PAGE_SIZE;
diff --git a/fs/erofs/data.c b/fs/erofs/data.c
index f57f921683d7..e6413ced2bb1 100644
--- a/fs/erofs/data.c
+++ b/fs/erofs/data.c
@@ -429,7 +429,7 @@ static int erofs_file_mmap(struct file *file, struct vm_area_struct *vma)
return -EINVAL;

vma->vm_ops = &erofs_dax_vm_ops;
- vma->vm_flags |= VM_HUGEPAGE;
+ set_vm_flags(vma, VM_HUGEPAGE);
return 0;
}
#else
diff --git a/fs/exec.c b/fs/exec.c
index c0df813d2b45..ed499e850970 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -270,7 +270,7 @@ static int __bprm_mm_init(struct linux_binprm *bprm)
BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
vma->vm_end = STACK_TOP_MAX;
vma->vm_start = vma->vm_end - PAGE_SIZE;
- vma->vm_flags = VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
+ init_vm_flags(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);

err = insert_vm_struct(mm, vma);
@@ -834,7 +834,7 @@ int setup_arg_pages(struct linux_binprm *bprm,
}

/* mprotect_fixup is overkill to remove the temporary stack flags */
- vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
+ clear_vm_flags(vma, VM_STACK_INCOMPLETE_SETUP);

stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
stack_size = vma->vm_end - vma->vm_start;
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 7ac0a81bd371..baeb385b07c7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -801,7 +801,7 @@ static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
file_accessed(file);
if (IS_DAX(file_inode(file))) {
vma->vm_ops = &ext4_dax_vm_ops;
- vma->vm_flags |= VM_HUGEPAGE;
+ set_vm_flags(vma, VM_HUGEPAGE);
} else {
vma->vm_ops = &ext4_file_vm_ops;
}
diff --git a/fs/fuse/dax.c b/fs/fuse/dax.c
index e23e802a8013..599969edc869 100644
--- a/fs/fuse/dax.c
+++ b/fs/fuse/dax.c
@@ -860,7 +860,7 @@ int fuse_dax_mmap(struct file *file, struct vm_area_struct *vma)
{
file_accessed(file);
vma->vm_ops = &fuse_dax_vm_ops;
- vma->vm_flags |= VM_MIXEDMAP | VM_HUGEPAGE;
+ set_vm_flags(vma, VM_MIXEDMAP | VM_HUGEPAGE);
return 0;
}

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 48f1a8ad2243..b0f59d8a7b09 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -132,7 +132,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
* way when do_mmap unwinds (may be important on powerpc
* and ia64).
*/
- vma->vm_flags |= VM_HUGETLB | VM_DONTEXPAND;
+ set_vm_flags(vma, VM_HUGETLB | VM_DONTEXPAND);
vma->vm_ops = &hugetlb_vm_ops;

ret = seal_check_future_write(info->seals, vma);
@@ -811,7 +811,7 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
* as input to create an allocation policy.
*/
vma_init(&pseudo_vma, mm);
- pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
+ init_vm_flags(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
pseudo_vma.vm_file = file;

for (index = start; index < end; index++) {
diff --git a/fs/orangefs/file.c b/fs/orangefs/file.c
index 167fa43b24f9..0f668db6bcf3 100644
--- a/fs/orangefs/file.c
+++ b/fs/orangefs/file.c
@@ -389,8 +389,7 @@ static int orangefs_file_mmap(struct file *file, struct vm_area_struct *vma)
"orangefs_file_mmap: called on %pD\n", file);

/* set the sequential readahead hint */
- vma->vm_flags |= VM_SEQ_READ;
- vma->vm_flags &= ~VM_RAND_READ;
+ mod_vm_flags(vma, VM_SEQ_READ, VM_RAND_READ);

file_accessed(file);
vma->vm_ops = &orangefs_file_vm_ops;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f937c4cd0214..9018645a359c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1301,7 +1301,7 @@ static ssize_t clear_refs_write(struct file *file, const char __user *buf,
for_each_vma(vmi, vma) {
if (!(vma->vm_flags & VM_SOFTDIRTY))
continue;
- vma->vm_flags &= ~VM_SOFTDIRTY;
+ clear_vm_flags(vma, VM_SOFTDIRTY);
vma_set_page_prot(vma);
}

diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
index 09a81e4b1273..858e4e804f85 100644
--- a/fs/proc/vmcore.c
+++ b/fs/proc/vmcore.c
@@ -582,8 +582,7 @@ static int mmap_vmcore(struct file *file, struct vm_area_struct *vma)
if (vma->vm_flags & (VM_WRITE | VM_EXEC))
return -EPERM;

- vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
- vma->vm_flags |= VM_MIXEDMAP;
+ mod_vm_flags(vma, VM_MIXEDMAP, VM_MAYWRITE | VM_MAYEXEC);
vma->vm_ops = &vmcore_mmap_ops;

len = 0;
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f3c75c6222de..40030059db53 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -113,7 +113,7 @@ static void userfaultfd_set_vm_flags(struct vm_area_struct *vma,
{
const bool uffd_wp_changed = (vma->vm_flags ^ flags) & VM_UFFD_WP;

- vma->vm_flags = flags;
+ reset_vm_flags(vma, flags);
/*
* For shared mappings, we want to enable writenotify while
* userfaultfd-wp is enabled (see vma_wants_writenotify()). We'll simply
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 595a5bcf46b9..bf777fed0dd4 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1429,7 +1429,7 @@ xfs_file_mmap(
file_accessed(file);
vma->vm_ops = &xfs_file_vm_ops;
if (IS_DAX(inode))
- vma->vm_flags |= VM_HUGEPAGE;
+ set_vm_flags(vma, VM_HUGEPAGE);
return 0;
}

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cccbc2811827..1ab5f73360f2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3652,7 +3652,7 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma)
* VM_MAYWRITE as we still want them to be COW-writable.
*/
if (vma->vm_flags & VM_SHARED)
- vma->vm_flags &= ~(VM_MAYWRITE);
+ clear_vm_flags(vma, VM_MAYWRITE);
}

return 0;
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index 80f4b4d88aaf..d2c967cc2873 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -269,7 +269,7 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EPERM;
} else {
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
}
/* remap_vmalloc_range() checks size and offset constraints */
return remap_vmalloc_range(vma, rb_map->rb,
@@ -290,7 +290,7 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
*/
return -EPERM;
} else {
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);
}
/* remap_vmalloc_range() checks size and offset constraints */
return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64131f88c553..db19094c7ac7 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -882,10 +882,10 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
/* set default open/close callbacks */
vma->vm_ops = &bpf_map_default_vmops;
vma->vm_private_data = map;
- vma->vm_flags &= ~VM_MAYEXEC;
+ clear_vm_flags(vma, VM_MAYEXEC);
if (!(vma->vm_flags & VM_WRITE))
/* disallow re-mapping with PROT_WRITE */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

err = map->ops->map_mmap(map, vma);
if (err)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d56328e5080e..6745460dcf49 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6573,7 +6573,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
* Since pinned accounting is per vm we cannot allow fork() to copy our
* vma.
*/
- vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &perf_mmap_vmops;

if (event->pmu->event_mapped)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index e5cd09fd8a05..27fc1e26e1e1 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -489,7 +489,7 @@ static int kcov_mmap(struct file *filep, struct vm_area_struct *vma)
goto exit;
}
spin_unlock_irqrestore(&kcov->lock, flags);
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);
for (off = 0; off < size; off += PAGE_SIZE) {
page = vmalloc_to_page(kcov->area + off);
res = vm_insert_page(vma, vma->vm_start + off, page);
diff --git a/kernel/relay.c b/kernel/relay.c
index ef12532168d9..085aa8707bc2 100644
--- a/kernel/relay.c
+++ b/kernel/relay.c
@@ -91,7 +91,7 @@ static int relay_mmap_buf(struct rchan_buf *buf, struct vm_area_struct *vma)
return -EINVAL;

vma->vm_ops = &relay_file_mmap_ops;
- vma->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(vma, VM_DONTEXPAND);
vma->vm_private_data = buf;

return 0;
diff --git a/mm/madvise.c b/mm/madvise.c
index 7db6622f8293..74941a9784b4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -176,7 +176,7 @@ static int madvise_update_vma(struct vm_area_struct *vma,
/*
* vm_flags is protected by the mmap_lock held in write mode.
*/
- vma->vm_flags = new_flags;
+ reset_vm_flags(vma, new_flags);
if (!vma->vm_file || vma_is_anon_shmem(vma)) {
error = replace_anon_vma_name(vma, anon_name);
if (error)
diff --git a/mm/memory.c b/mm/memory.c
index ec833a2e0601..d6902065e558 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1928,7 +1928,7 @@ int vm_insert_pages(struct vm_area_struct *vma, unsigned long addr,
if (!(vma->vm_flags & VM_MIXEDMAP)) {
BUG_ON(mmap_read_trylock(vma->vm_mm));
BUG_ON(vma->vm_flags & VM_PFNMAP);
- vma->vm_flags |= VM_MIXEDMAP;
+ set_vm_flags(vma, VM_MIXEDMAP);
}
/* Defer page refcount checking till we're about to map that page. */
return insert_pages(vma, addr, pages, num, vma->vm_page_prot);
@@ -1986,7 +1986,7 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
if (!(vma->vm_flags & VM_MIXEDMAP)) {
BUG_ON(mmap_read_trylock(vma->vm_mm));
BUG_ON(vma->vm_flags & VM_PFNMAP);
- vma->vm_flags |= VM_MIXEDMAP;
+ set_vm_flags(vma, VM_MIXEDMAP);
}
return insert_page(vma, addr, page, vma->vm_page_prot);
}
@@ -2452,7 +2452,7 @@ int remap_pfn_range_notrack(struct vm_area_struct *vma, unsigned long addr,
vma->vm_pgoff = pfn;
}

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);

BUG_ON(addr >= end);
pfn -= addr >> PAGE_SHIFT;
diff --git a/mm/mlock.c b/mm/mlock.c
index 5c4fff93cd6b..5b2431221b4d 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -380,7 +380,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,
*/
if (newflags & VM_LOCKED)
newflags |= VM_IO;
- WRITE_ONCE(vma->vm_flags, newflags);
+ reset_vm_flags(vma, newflags);

lru_add_drain();
walk_page_range(vma->vm_mm, start, end, &mlock_walk_ops, NULL);
@@ -388,7 +388,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma,

if (newflags & VM_IO) {
newflags &= ~VM_IO;
- WRITE_ONCE(vma->vm_flags, newflags);
+ reset_vm_flags(vma, newflags);
}
}

@@ -457,7 +457,7 @@ static int mlock_fixup(struct vma_iterator *vmi, struct vm_area_struct *vma,

if ((newflags & VM_LOCKED) && (oldflags & VM_LOCKED)) {
/* No work to do, and mlocking twice would be wrong */
- vma->vm_flags = newflags;
+ reset_vm_flags(vma, newflags);
} else {
mlock_vma_pages_range(vma, start, end, newflags);
}
diff --git a/mm/mmap.c b/mm/mmap.c
index 323bd253b25a..2c6e9072e6a8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2558,7 +2558,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_iter_set(&vmi, addr);
vma->vm_start = addr;
vma->vm_end = end;
- vma->vm_flags = vm_flags;
+ init_vm_flags(vma, vm_flags);
vma->vm_page_prot = vm_get_page_prot(vm_flags);
vma->vm_pgoff = pgoff;

@@ -2686,7 +2686,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
* then new mapped in-place (which must be aimed as
* a completely new data area).
*/
- vma->vm_flags |= VM_SOFTDIRTY;
+ set_vm_flags(vma, VM_SOFTDIRTY);

vma_set_page_prot(vma);

@@ -2911,7 +2911,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
init_vma_prep(&vp, vma);
vma_prepare(&vp);
vma->vm_end = addr + len;
- vma->vm_flags |= VM_SOFTDIRTY;
+ set_vm_flags(vma, VM_SOFTDIRTY);
vma_iter_store(vmi, vma);

vma_complete(&vp, vmi, mm);
@@ -2928,7 +2928,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
vma->vm_start = addr;
vma->vm_end = addr + len;
vma->vm_pgoff = addr >> PAGE_SHIFT;
- vma->vm_flags = flags;
+ init_vm_flags(vma, flags);
vma->vm_page_prot = vm_get_page_prot(flags);
if (vma_iter_store_gfp(vmi, vma, GFP_KERNEL))
goto mas_store_fail;
@@ -2940,7 +2940,7 @@ static int do_brk_flags(struct vma_iterator *vmi, struct vm_area_struct *vma,
mm->data_vm += len >> PAGE_SHIFT;
if (flags & VM_LOCKED)
mm->locked_vm += (len >> PAGE_SHIFT);
- vma->vm_flags |= VM_SOFTDIRTY;
+ set_vm_flags(vma, VM_SOFTDIRTY);
validate_mm(mm);
return 0;

diff --git a/mm/mprotect.c b/mm/mprotect.c
index cce6a0e58fb5..fba770d889ea 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -670,7 +670,7 @@ mprotect_fixup(struct vma_iterator *vmi, struct mmu_gather *tlb,
* vm_flags and vm_page_prot are protected by the mmap_lock
* held in write mode.
*/
- vma->vm_flags = newflags;
+ reset_vm_flags(vma, newflags);
if (vma_wants_manual_pte_write_upgrade(vma))
mm_cp_flags |= MM_CP_TRY_CHANGE_WRITABLE;
vma_set_page_prot(vma);
diff --git a/mm/mremap.c b/mm/mremap.c
index 35db9752cb6a..0f3c78e8eea5 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -662,7 +662,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,

/* Conceal VM_ACCOUNT so old reservation is not undone */
if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
- vma->vm_flags &= ~VM_ACCOUNT;
+ clear_vm_flags(vma, VM_ACCOUNT);
if (vma->vm_start < old_addr)
account_start = vma->vm_start;
if (vma->vm_end > old_addr + old_len)
@@ -718,12 +718,12 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Restore VM_ACCOUNT if one or two pieces of vma left */
if (account_start) {
vma = vma_prev(&vmi);
- vma->vm_flags |= VM_ACCOUNT;
+ set_vm_flags(vma, VM_ACCOUNT);
}

if (account_end) {
vma = vma_next(&vmi);
- vma->vm_flags |= VM_ACCOUNT;
+ set_vm_flags(vma, VM_ACCOUNT);
}

return new_addr;
diff --git a/mm/nommu.c b/mm/nommu.c
index 9a166738909e..93d052b5a0c2 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -173,7 +173,7 @@ static void *__vmalloc_user_flags(unsigned long size, gfp_t flags)
mmap_write_lock(current->mm);
vma = find_vma(current->mm, (unsigned long)ret);
if (vma)
- vma->vm_flags |= VM_USERMAP;
+ set_vm_flags(vma, VM_USERMAP);
mmap_write_unlock(current->mm);
}

@@ -950,7 +950,8 @@ static int do_mmap_private(struct vm_area_struct *vma,

atomic_long_add(total, &mmap_pages_allocated);

- region->vm_flags = vma->vm_flags |= VM_MAPPED_COPY;
+ set_vm_flags(vma, VM_MAPPED_COPY);
+ region->vm_flags = vma->flags;
region->vm_start = (unsigned long) base;
region->vm_end = region->vm_start + len;
region->vm_top = region->vm_start + (total << PAGE_SHIFT);
@@ -1047,7 +1048,7 @@ unsigned long do_mmap(struct file *file,
region->vm_flags = vm_flags;
region->vm_pgoff = pgoff;

- vma->vm_flags = vm_flags;
+ init_vm_flags(vma, vm_flags);
vma->vm_pgoff = pgoff;

if (file) {
@@ -1111,7 +1112,7 @@ unsigned long do_mmap(struct file *file,
vma->vm_end = start + len;

if (pregion->vm_flags & VM_MAPPED_COPY)
- vma->vm_flags |= VM_MAPPED_COPY;
+ set_vm_flags(vma, VM_MAPPED_COPY);
else {
ret = do_mmap_shared_file(vma);
if (ret < 0) {
@@ -1601,7 +1602,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
if (addr != (pfn << PAGE_SHIFT))
return -EINVAL;

- vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}
EXPORT_SYMBOL(remap_pfn_range);
diff --git a/mm/secretmem.c b/mm/secretmem.c
index be3fff86ba00..236a1b6b4100 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -128,7 +128,7 @@ static int secretmem_mmap(struct file *file, struct vm_area_struct *vma)
if (mlock_future_check(vma->vm_mm, vma->vm_flags | VM_LOCKED, len))
return -EAGAIN;

- vma->vm_flags |= VM_LOCKED | VM_DONTDUMP;
+ set_vm_flags(vma, VM_LOCKED | VM_DONTDUMP);
vma->vm_ops = &secretmem_vm_ops;

return 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index 9e1015cbad29..3d7fc7a979c6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2304,7 +2304,7 @@ static int shmem_mmap(struct file *file, struct vm_area_struct *vma)
return ret;

/* arm64 - allow memory tagging on RAM-based files */
- vma->vm_flags |= VM_MTE_ALLOWED;
+ set_vm_flags(vma, VM_MTE_ALLOWED);

file_accessed(file);
/* This is anonymous shared memory if it is unlinked at the time of mmap */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index dfde5324e480..8fccba7aa514 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3682,7 +3682,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr,
size -= PAGE_SIZE;
} while (size > 0);

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);

return 0;
}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f713c0422f0f..cfa2e8a92fcb 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1890,10 +1890,10 @@ int tcp_mmap(struct file *file, struct socket *sock,
{
if (vma->vm_flags & (VM_WRITE | VM_EXEC))
return -EPERM;
- vma->vm_flags &= ~(VM_MAYWRITE | VM_MAYEXEC);
+ clear_vm_flags(vma, VM_MAYWRITE | VM_MAYEXEC);

/* Instruct vm_insert_page() to not mmap_read_lock(mm) */
- vma->vm_flags |= VM_MIXEDMAP;
+ set_vm_flags(vma, VM_MIXEDMAP);

vma->vm_ops = &tcp_vm_ops;
return 0;
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 0a6894cdc54d..9037deb5979e 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -262,7 +262,7 @@ static int sel_mmap_handle_status(struct file *filp,
if (vma->vm_flags & VM_WRITE)
return -EPERM;
/* disallow mprotect() turns it into writable */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

return remap_pfn_range(vma, vma->vm_start,
page_to_pfn(status),
@@ -506,13 +506,13 @@ static int sel_mmap_policy(struct file *filp, struct vm_area_struct *vma)
{
if (vma->vm_flags & VM_SHARED) {
/* do not allow mprotect to make mapping writable */
- vma->vm_flags &= ~VM_MAYWRITE;
+ clear_vm_flags(vma, VM_MAYWRITE);

if (vma->vm_flags & VM_WRITE)
return -EACCES;
}

- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_ops = &sel_mmap_policy_ops;

return 0;
diff --git a/sound/core/oss/pcm_oss.c b/sound/core/oss/pcm_oss.c
index ac2efeb63a39..52473e2acd07 100644
--- a/sound/core/oss/pcm_oss.c
+++ b/sound/core/oss/pcm_oss.c
@@ -2910,7 +2910,7 @@ static int snd_pcm_oss_mmap(struct file *file, struct vm_area_struct *area)
}
/* set VM_READ access as well to fix memset() routines that do
reads before writes (to improve performance) */
- area->vm_flags |= VM_READ;
+ set_vm_flags(area, VM_READ);
if (substream == NULL)
return -ENXIO;
runtime = substream->runtime;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 9c122e757efe..f716bdb70afe 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -3675,8 +3675,9 @@ static int snd_pcm_mmap_status(struct snd_pcm_substream *substream, struct file
return -EINVAL;
area->vm_ops = &snd_pcm_vm_ops_status;
area->vm_private_data = substream;
- area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
- area->vm_flags &= ~(VM_WRITE | VM_MAYWRITE);
+ mod_vm_flags(area, VM_DONTEXPAND | VM_DONTDUMP,
+ VM_WRITE | VM_MAYWRITE);
+
return 0;
}

@@ -3712,7 +3713,7 @@ static int snd_pcm_mmap_control(struct snd_pcm_substream *substream, struct file
return -EINVAL;
area->vm_ops = &snd_pcm_vm_ops_control;
area->vm_private_data = substream;
- area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(area, VM_DONTEXPAND | VM_DONTDUMP);
return 0;
}

@@ -3828,7 +3829,7 @@ static const struct vm_operations_struct snd_pcm_vm_ops_data_fault = {
int snd_pcm_lib_default_mmap(struct snd_pcm_substream *substream,
struct vm_area_struct *area)
{
- area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(area, VM_DONTEXPAND | VM_DONTDUMP);
if (!substream->ops->page &&
!snd_dma_buffer_mmap(snd_pcm_get_dma_buf(substream), area))
return 0;
diff --git a/sound/soc/pxa/mmp-sspa.c b/sound/soc/pxa/mmp-sspa.c
index fb5a4390443f..fdd72d9bb46c 100644
--- a/sound/soc/pxa/mmp-sspa.c
+++ b/sound/soc/pxa/mmp-sspa.c
@@ -404,7 +404,7 @@ static int mmp_pcm_mmap(struct snd_soc_component *component,
struct snd_pcm_substream *substream,
struct vm_area_struct *vma)
{
- vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(vma, VM_DONTEXPAND | VM_DONTDUMP);
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
return remap_pfn_range(vma, vma->vm_start,
substream->dma_buffer.addr >> PAGE_SHIFT,
diff --git a/sound/usb/usx2y/us122l.c b/sound/usb/usx2y/us122l.c
index e558931cce16..b51db622a69b 100644
--- a/sound/usb/usx2y/us122l.c
+++ b/sound/usb/usx2y/us122l.c
@@ -224,9 +224,9 @@ static int usb_stream_hwdep_mmap(struct snd_hwdep *hw,
}

area->vm_ops = &usb_stream_hwdep_vm_ops;
- area->vm_flags |= VM_DONTDUMP;
+ set_vm_flags(area, VM_DONTDUMP);
if (!read)
- area->vm_flags |= VM_DONTEXPAND;
+ set_vm_flags(area, VM_DONTEXPAND);
area->vm_private_data = us122l;
atomic_inc(&us122l->mmap_count);
out:
diff --git a/sound/usb/usx2y/usX2Yhwdep.c b/sound/usb/usx2y/usX2Yhwdep.c
index c29da0341bc5..3abe6d891f98 100644
--- a/sound/usb/usx2y/usX2Yhwdep.c
+++ b/sound/usb/usx2y/usX2Yhwdep.c
@@ -61,7 +61,7 @@ static int snd_us428ctls_mmap(struct snd_hwdep *hw, struct file *filp, struct vm
}

area->vm_ops = &us428ctls_vm_ops;
- area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(area, VM_DONTEXPAND | VM_DONTDUMP);
area->vm_private_data = hw->private_data;
return 0;
}
diff --git a/sound/usb/usx2y/usx2yhwdeppcm.c b/sound/usb/usx2y/usx2yhwdeppcm.c
index 767a227d54da..22ce93b2fb24 100644
--- a/sound/usb/usx2y/usx2yhwdeppcm.c
+++ b/sound/usb/usx2y/usx2yhwdeppcm.c
@@ -706,7 +706,7 @@ static int snd_usx2y_hwdep_pcm_mmap(struct snd_hwdep *hw, struct file *filp, str
return -ENODEV;

area->vm_ops = &snd_usx2y_hwdep_pcm_vm_ops;
- area->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+ set_vm_flags(area, VM_DONTEXPAND | VM_DONTDUMP);
area->vm_private_data = hw->private_data;
return 0;
}
--
2.39.1


2023-01-25 23:36:38

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

In cases when VMA flags are modified after VMA was isolated and mmap_lock
was downgraded, flags modifications would result in an assertion because
mmap write lock is not held.
Introduce mod_vm_flags_nolock to be used in such situation, when VMA is
not part of VMA tree and locking it is not required.
Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
flags modification and to avoid assertion.

Signed-off-by: Suren Baghdasaryan <[email protected]>
---
arch/x86/mm/pat/memtype.c | 10 +++++++---
include/linux/mm.h | 16 +++++++++++++---
include/linux/pgtable.h | 5 +++--
mm/memory.c | 13 +++++++------
mm/memremap.c | 4 ++--
mm/mmap.c | 16 ++++++++++------
6 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index ae9645c900fa..d8adc0b42cf2 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
* can be for the entire vma (in which case pfn, size are zero).
*/
void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
- unsigned long size)
+ unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
unsigned long prot;
@@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
- if (vma)
- clear_vm_flags(vma, VM_PAT);
+ if (vma) {
+ if (mm_wr_locked)
+ clear_vm_flags(vma, VM_PAT);
+ else
+ mod_vm_flags_nolock(vma, 0, VM_PAT);
+ }
}

/*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1ab5f73360f2..86bf043136f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -656,12 +656,22 @@ static inline void clear_vm_flags(struct vm_area_struct *vma,
ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
}

+/*
+ * Use only if VMA has been previously isolated, is not part of the VMA tree
+ * and therefore needs no locking.
+ */
+static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
+ vm_flags_t set, vm_flags_t clear)
+{
+ ACCESS_PRIVATE(vma, __vm_flags) |= set;
+ ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+}
+
static inline void mod_vm_flags(struct vm_area_struct *vma,
vm_flags_t set, vm_flags_t clear)
{
mmap_assert_write_locked(vma->vm_mm);
- ACCESS_PRIVATE(vma, __vm_flags) |= set;
- ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
+ mod_vm_flags_nolock(vma, set, clear);
}

static inline void vma_set_anonymous(struct vm_area_struct *vma)
@@ -2087,7 +2097,7 @@ static inline void zap_vma_pages(struct vm_area_struct *vma)
}
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *start_vma, unsigned long start,
- unsigned long end);
+ unsigned long end, bool mm_wr_locked);

struct mmu_notifier_range;

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5fd45454c073..c63cd44777ec 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct *vma)
* can be for the entire vma (in which case pfn, size are zero).
*/
static inline void untrack_pfn(struct vm_area_struct *vma,
- unsigned long pfn, unsigned long size)
+ unsigned long pfn, unsigned long size,
+ bool mm_wr_locked)
{
}

@@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
pfn_t pfn);
extern int track_pfn_copy(struct vm_area_struct *vma);
extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
- unsigned long size);
+ unsigned long size, bool mm_wr_locked);
extern void untrack_pfn_moved(struct vm_area_struct *vma);
#endif

diff --git a/mm/memory.c b/mm/memory.c
index d6902065e558..5b11b50e2c4a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
static void unmap_single_vma(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start_addr,
unsigned long end_addr,
- struct zap_details *details)
+ struct zap_details *details, bool mm_wr_locked)
{
unsigned long start = max(vma->vm_start, start_addr);
unsigned long end;
@@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
uprobe_munmap(vma, start, end);

if (unlikely(vma->vm_flags & VM_PFNMAP))
- untrack_pfn(vma, 0, 0);
+ untrack_pfn(vma, 0, 0, mm_wr_locked);

if (start != end) {
if (unlikely(is_vm_hugetlb_page(vma))) {
@@ -1675,7 +1675,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
*/
void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
struct vm_area_struct *vma, unsigned long start_addr,
- unsigned long end_addr)
+ unsigned long end_addr, bool mm_wr_locked)
{
struct mmu_notifier_range range;
struct zap_details details = {
@@ -1689,7 +1689,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
start_addr, end_addr);
mmu_notifier_invalidate_range_start(&range);
do {
- unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
+ unmap_single_vma(tlb, vma, start_addr, end_addr, &details,
+ mm_wr_locked);
} while ((vma = mas_find(&mas, end_addr - 1)) != NULL);
mmu_notifier_invalidate_range_end(&range);
}
@@ -1723,7 +1724,7 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
* unmap 'address-end' not 'range.start-range.end' as range
* could have been expanded for hugetlb pmd sharing.
*/
- unmap_single_vma(&tlb, vma, address, end, details);
+ unmap_single_vma(&tlb, vma, address, end, details, false);
mmu_notifier_invalidate_range_end(&range);
tlb_finish_mmu(&tlb);
}
@@ -2492,7 +2493,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,

err = remap_pfn_range_notrack(vma, addr, pfn, size, prot);
if (err)
- untrack_pfn(vma, pfn, PAGE_ALIGN(size));
+ untrack_pfn(vma, pfn, PAGE_ALIGN(size), true);
return err;
}
EXPORT_SYMBOL(remap_pfn_range);
diff --git a/mm/memremap.c b/mm/memremap.c
index 08cbf54fe037..2f88f43d4a01 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -129,7 +129,7 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
}
mem_hotplug_done();

- untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
+ untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range), true);
pgmap_array_delete(range);
}

@@ -276,7 +276,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
if (!is_private)
kasan_remove_zero_shadow(__va(range->start), range_len(range));
err_kasan:
- untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
+ untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range), true);
err_pfn_remap:
pgmap_array_delete(range);
return error;
diff --git a/mm/mmap.c b/mm/mmap.c
index 2c6e9072e6a8..69d440997648 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -78,7 +78,7 @@ core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
struct vm_area_struct *vma, struct vm_area_struct *prev,
struct vm_area_struct *next, unsigned long start,
- unsigned long end);
+ unsigned long end, bool mm_wr_locked);

static pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags)
{
@@ -2136,14 +2136,14 @@ static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas)
static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
struct vm_area_struct *vma, struct vm_area_struct *prev,
struct vm_area_struct *next,
- unsigned long start, unsigned long end)
+ unsigned long start, unsigned long end, bool mm_wr_locked)
{
struct mmu_gather tlb;

lru_add_drain();
tlb_gather_mmu(&tlb, mm);
update_hiwater_rss(mm);
- unmap_vmas(&tlb, mt, vma, start, end);
+ unmap_vmas(&tlb, mt, vma, start, end, mm_wr_locked);
free_pgtables(&tlb, mt, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
next ? next->vm_start : USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb);
@@ -2391,7 +2391,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
mmap_write_downgrade(mm);
}

- unmap_region(mm, &mt_detach, vma, prev, next, start, end);
+ /*
+ * We can free page tables without write-locking mmap_lock because VMAs
+ * were isolated before we downgraded mmap_lock.
+ */
+ unmap_region(mm, &mt_detach, vma, prev, next, start, end, !downgrade);
/* Statistics and freeing VMAs */
mas_set(&mas_detach, start);
remove_mt(mm, &mas_detach);
@@ -2704,7 +2708,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

/* Undo any partial mapping done by a device driver. */
unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
- vma->vm_end);
+ vma->vm_end, true);
}
if (file && (vm_flags & VM_SHARED))
mapping_unmap_writable(file->f_mapping);
@@ -3031,7 +3035,7 @@ void exit_mmap(struct mm_struct *mm)
tlb_gather_mmu_fullmm(&tlb, mm);
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
- unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
+ unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX, false);
mmap_read_unlock(mm);

/*
--
2.39.1


2023-01-25 23:36:45

by Suren Baghdasaryan

[permalink] [raw]
Subject: [PATCH v3 7/7] mm: export dump_mm()

mmap_assert_write_locked() is used in vm_flags modifiers. Because
mmap_assert_write_locked() uses dump_mm() and vm_flags are sometimes
modified from inside a module, it's necessary to export dump_mm()
function.

Signed-off-by: Suren Baghdasaryan <[email protected]>
Acked-by: Michal Hocko <[email protected]>
---
mm/debug.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/debug.c b/mm/debug.c
index 9d3d893dc7f4..96d594e16292 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -215,6 +215,7 @@ void dump_mm(const struct mm_struct *mm)
mm->def_flags, &mm->def_flags
);
}
+EXPORT_SYMBOL(dump_mm);

static bool page_init_poisoning __read_mostly = true;

--
2.39.1


2023-01-26 00:22:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:

> Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> errors when we add a const modifier to vma->vm_flags.
>
> ...
>
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> * orig->shared.rb may be modified concurrently, but the clone
> * will be reinitialized.
> */
> - *new = data_race(*orig);
> + memcpy(new, orig, sizeof(*new));

The data_race() removal is unchangelogged?

> INIT_LIST_HEAD(&new->anon_vma_chain);
> dup_anon_vma_name(orig, new);
> }


2023-01-26 00:24:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:

> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> +static inline void reset_vm_flags(struct vm_area_struct *vma,
> +static inline void set_vm_flags(struct vm_area_struct *vma,
> +static inline void clear_vm_flags(struct vm_area_struct *vma,
> +static inline void mod_vm_flags(struct vm_area_struct *vma,

vm_flags_init(), vm_flags_reset(), etc?

This would be more idiomatic and I do think the most-significant-first
naming style is preferable.


2023-01-26 00:28:34

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:

> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,15 @@ struct vm_area_struct {
> * See vmf_insert_mixed_prot() for discussion.
> */
> pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> + * Flags, see mm.h.
> + * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> + */
> + union {
> + const vm_flags_t vm_flags;
> + vm_flags_t __private __vm_flags;
> + };

Typically when making a change like this we'll rename the affected
field/variable/function/etc. This will reliably and deliberately break
unconverted usage sites.

This const trick will get us partway there, by breaking setters. But
renaming it will break both setters and getters.


2023-01-26 00:50:19

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > errors when we add a const modifier to vma->vm_flags.
> >
> > ...
> >
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > * orig->shared.rb may be modified concurrently, but the clone
> > * will be reinitialized.
> > */
> > - *new = data_race(*orig);
> > + memcpy(new, orig, sizeof(*new));
>
> The data_race() removal is unchangelogged?

True. I'll add a note in the changelog about that. Ideally I would
like to preserve it but I could not find a way to do that.

>
> > INIT_LIST_HEAD(&new->anon_vma_chain);
> > dup_anon_vma_name(orig, new);
> > }
>

2023-01-26 00:52:01

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, Jan 25, 2023 at 4:24 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > vm_flags are among VMA attributes which affect decisions like VMA merging
> > and splitting. Therefore all vm_flags modifications are performed after
> > taking exclusive mmap_lock to prevent vm_flags updates racing with such
> > operations. Introduce modifier functions for vm_flags to be used whenever
> > flags are updated. This way we can better check and control correct
> > locking behavior during these updates.
> >
> > ...
> >
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > +static inline void init_vm_flags(struct vm_area_struct *vma,
> > +static inline void reset_vm_flags(struct vm_area_struct *vma,
> > +static inline void set_vm_flags(struct vm_area_struct *vma,
> > +static inline void clear_vm_flags(struct vm_area_struct *vma,
> > +static inline void mod_vm_flags(struct vm_area_struct *vma,
>
> vm_flags_init(), vm_flags_reset(), etc?
>
> This would be more idiomatic and I do think the most-significant-first
> naming style is preferable.

Thanks for the suggestion! I will rename them in the next version.

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2023-01-26 00:56:34

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, Jan 25, 2023 at 4:28 PM Andrew Morton <[email protected]> wrote:
>
> On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -491,7 +491,15 @@ struct vm_area_struct {
> > * See vmf_insert_mixed_prot() for discussion.
> > */
> > pgprot_t vm_page_prot;
> > - unsigned long vm_flags; /* Flags, see mm.h. */
> > +
> > + /*
> > + * Flags, see mm.h.
> > + * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> > + */
> > + union {
> > + const vm_flags_t vm_flags;
> > + vm_flags_t __private __vm_flags;
> > + };
>
> Typically when making a change like this we'll rename the affected
> field/variable/function/etc. This will reliably and deliberately break
> unconverted usage sites.
>
> This const trick will get us partway there, by breaking setters. But
> renaming it will break both setters and getters.

My intent here is to break setters but to allow getters to keep
reading vma->vm_flags directly. We could provide get_vm_flags() and
convert all getters as well but it would introduce a huge additional
churn (800+ hits) with no obvious benefits, I think. Does that clarify
the intent of this trick?

>
> --
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>

2023-01-26 01:35:10

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:

> On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> >
> > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > errors when we add a const modifier to vma->vm_flags.
> > >
> > > ...
> > >
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > * orig->shared.rb may be modified concurrently, but the clone
> > > * will be reinitialized.
> > > */
> > > - *new = data_race(*orig);
> > > + memcpy(new, orig, sizeof(*new));
> >
> > The data_race() removal is unchangelogged?
>
> True. I'll add a note in the changelog about that. Ideally I would
> like to preserve it but I could not find a way to do that.
>

Perhaps Paul can comment?

I wonder if KCSAN knows how to detect this race, given that it's now in
a memcpy. I assume so.

2023-01-26 08:00:22

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed 25-01-23 16:56:17, Suren Baghdasaryan wrote:
> On Wed, Jan 25, 2023 at 4:28 PM Andrew Morton <[email protected]> wrote:
> >
> > On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -491,7 +491,15 @@ struct vm_area_struct {
> > > * See vmf_insert_mixed_prot() for discussion.
> > > */
> > > pgprot_t vm_page_prot;
> > > - unsigned long vm_flags; /* Flags, see mm.h. */
> > > +
> > > + /*
> > > + * Flags, see mm.h.
> > > + * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> > > + */
> > > + union {
> > > + const vm_flags_t vm_flags;
> > > + vm_flags_t __private __vm_flags;
> > > + };
> >
> > Typically when making a change like this we'll rename the affected
> > field/variable/function/etc. This will reliably and deliberately break
> > unconverted usage sites.
> >
> > This const trick will get us partway there, by breaking setters. But
> > renaming it will break both setters and getters.
>
> My intent here is to break setters but to allow getters to keep
> reading vma->vm_flags directly. We could provide get_vm_flags() and
> convert all getters as well but it would introduce a huge additional
> churn (800+ hits) with no obvious benefits, I think. Does that clarify
> the intent of this trick?

I think that makes sense at this stage. The conversion patch is quite
large already. Maybe the final renaming could be done on top of
everything and patch generated by coccinele. The const union is a neat
trick but a potential lockdep assert is a nice plus as well. I wouldn't
see it as a top priority though.
--
Michal Hocko
SUSE Labs

2023-01-26 08:34:55

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed 25-01-23 15:35:49, Suren Baghdasaryan wrote:
> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

with or without the proposed renaming (it seems we are not consistent
on that much even in the core kernel - e.g. init_rwsem vs. mutex_init)

Acked-by: Michal Hocko <[email protected]>

> ---
> include/linux/mm.h | 37 +++++++++++++++++++++++++++++++++++++
> include/linux/mm_types.h | 10 +++++++++-
> 2 files changed, 46 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c2f62bdce134..bf16ddd544a5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -627,6 +627,43 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
> INIT_LIST_HEAD(&vma->anon_vma_chain);
> }
>
> +/* Use when VMA is not part of the VMA tree and needs no locking */
> +static inline void init_vm_flags(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + ACCESS_PRIVATE(vma, __vm_flags) = flags;
> +}
> +
> +/* Use when VMA is part of the VMA tree and modifications need coordination */
> +static inline void reset_vm_flags(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + init_vm_flags(vma, flags);
> +}
> +
> +static inline void set_vm_flags(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + ACCESS_PRIVATE(vma, __vm_flags) |= flags;
> +}
> +
> +static inline void clear_vm_flags(struct vm_area_struct *vma,
> + vm_flags_t flags)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
> +}
> +
> +static inline void mod_vm_flags(struct vm_area_struct *vma,
> + vm_flags_t set, vm_flags_t clear)
> +{
> + mmap_assert_write_locked(vma->vm_mm);
> + ACCESS_PRIVATE(vma, __vm_flags) |= set;
> + ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
> +}
> +
> static inline void vma_set_anonymous(struct vm_area_struct *vma)
> {
> vma->vm_ops = NULL;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2d6d790d9bed..bccbd5896850 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -491,7 +491,15 @@ struct vm_area_struct {
> * See vmf_insert_mixed_prot() for discussion.
> */
> pgprot_t vm_page_prot;
> - unsigned long vm_flags; /* Flags, see mm.h. */
> +
> + /*
> + * Flags, see mm.h.
> + * To modify use {init|reset|set|clear|mod}_vm_flags() functions.
> + */
> + union {
> + const vm_flags_t vm_flags;
> + vm_flags_t __private __vm_flags;
> + };
>
> /*
> * For areas with an address space and backing store,
> --
> 2.39.1

--
Michal Hocko
SUSE Labs

2023-01-26 08:35:53

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

On Wed 25-01-23 15:35:53, Suren Baghdasaryan wrote:
> In cases when VMA flags are modified after VMA was isolated and mmap_lock
> was downgraded, flags modifications would result in an assertion because
> mmap write lock is not held.
> Introduce mod_vm_flags_nolock to be used in such situation, when VMA is
> not part of VMA tree and locking it is not required.
> Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> flags modification and to avoid assertion.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Acked-by: Michal Hocko <[email protected]>
Thanks!

> ---
> arch/x86/mm/pat/memtype.c | 10 +++++++---
> include/linux/mm.h | 16 +++++++++++++---
> include/linux/pgtable.h | 5 +++--
> mm/memory.c | 13 +++++++------
> mm/memremap.c | 4 ++--
> mm/mmap.c | 16 ++++++++++------
> 6 files changed, 42 insertions(+), 22 deletions(-)
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index ae9645c900fa..d8adc0b42cf2 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -1046,7 +1046,7 @@ void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot, pfn_t pfn)
> * can be for the entire vma (in which case pfn, size are zero).
> */
> void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> - unsigned long size)
> + unsigned long size, bool mm_wr_locked)
> {
> resource_size_t paddr;
> unsigned long prot;
> @@ -1065,8 +1065,12 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> size = vma->vm_end - vma->vm_start;
> }
> free_pfn_range(paddr, size);
> - if (vma)
> - clear_vm_flags(vma, VM_PAT);
> + if (vma) {
> + if (mm_wr_locked)
> + clear_vm_flags(vma, VM_PAT);
> + else
> + mod_vm_flags_nolock(vma, 0, VM_PAT);
> + }
> }
>
> /*
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 1ab5f73360f2..86bf043136f3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -656,12 +656,22 @@ static inline void clear_vm_flags(struct vm_area_struct *vma,
> ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
> }
>
> +/*
> + * Use only if VMA has been previously isolated, is not part of the VMA tree
> + * and therefore needs no locking.
> + */
> +static inline void mod_vm_flags_nolock(struct vm_area_struct *vma,
> + vm_flags_t set, vm_flags_t clear)
> +{
> + ACCESS_PRIVATE(vma, __vm_flags) |= set;
> + ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
> +}
> +
> static inline void mod_vm_flags(struct vm_area_struct *vma,
> vm_flags_t set, vm_flags_t clear)
> {
> mmap_assert_write_locked(vma->vm_mm);
> - ACCESS_PRIVATE(vma, __vm_flags) |= set;
> - ACCESS_PRIVATE(vma, __vm_flags) &= ~clear;
> + mod_vm_flags_nolock(vma, set, clear);
> }
>
> static inline void vma_set_anonymous(struct vm_area_struct *vma)
> @@ -2087,7 +2097,7 @@ static inline void zap_vma_pages(struct vm_area_struct *vma)
> }
> void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
> struct vm_area_struct *start_vma, unsigned long start,
> - unsigned long end);
> + unsigned long end, bool mm_wr_locked);
>
> struct mmu_notifier_range;
>
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 5fd45454c073..c63cd44777ec 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1185,7 +1185,8 @@ static inline int track_pfn_copy(struct vm_area_struct *vma)
> * can be for the entire vma (in which case pfn, size are zero).
> */
> static inline void untrack_pfn(struct vm_area_struct *vma,
> - unsigned long pfn, unsigned long size)
> + unsigned long pfn, unsigned long size,
> + bool mm_wr_locked)
> {
> }
>
> @@ -1203,7 +1204,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
> pfn_t pfn);
> extern int track_pfn_copy(struct vm_area_struct *vma);
> extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
> - unsigned long size);
> + unsigned long size, bool mm_wr_locked);
> extern void untrack_pfn_moved(struct vm_area_struct *vma);
> #endif
>
> diff --git a/mm/memory.c b/mm/memory.c
> index d6902065e558..5b11b50e2c4a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1613,7 +1613,7 @@ void unmap_page_range(struct mmu_gather *tlb,
> static void unmap_single_vma(struct mmu_gather *tlb,
> struct vm_area_struct *vma, unsigned long start_addr,
> unsigned long end_addr,
> - struct zap_details *details)
> + struct zap_details *details, bool mm_wr_locked)
> {
> unsigned long start = max(vma->vm_start, start_addr);
> unsigned long end;
> @@ -1628,7 +1628,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> uprobe_munmap(vma, start, end);
>
> if (unlikely(vma->vm_flags & VM_PFNMAP))
> - untrack_pfn(vma, 0, 0);
> + untrack_pfn(vma, 0, 0, mm_wr_locked);
>
> if (start != end) {
> if (unlikely(is_vm_hugetlb_page(vma))) {
> @@ -1675,7 +1675,7 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> */
> void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
> struct vm_area_struct *vma, unsigned long start_addr,
> - unsigned long end_addr)
> + unsigned long end_addr, bool mm_wr_locked)
> {
> struct mmu_notifier_range range;
> struct zap_details details = {
> @@ -1689,7 +1689,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt,
> start_addr, end_addr);
> mmu_notifier_invalidate_range_start(&range);
> do {
> - unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
> + unmap_single_vma(tlb, vma, start_addr, end_addr, &details,
> + mm_wr_locked);
> } while ((vma = mas_find(&mas, end_addr - 1)) != NULL);
> mmu_notifier_invalidate_range_end(&range);
> }
> @@ -1723,7 +1724,7 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
> * unmap 'address-end' not 'range.start-range.end' as range
> * could have been expanded for hugetlb pmd sharing.
> */
> - unmap_single_vma(&tlb, vma, address, end, details);
> + unmap_single_vma(&tlb, vma, address, end, details, false);
> mmu_notifier_invalidate_range_end(&range);
> tlb_finish_mmu(&tlb);
> }
> @@ -2492,7 +2493,7 @@ int remap_pfn_range(struct vm_area_struct *vma, unsigned long addr,
>
> err = remap_pfn_range_notrack(vma, addr, pfn, size, prot);
> if (err)
> - untrack_pfn(vma, pfn, PAGE_ALIGN(size));
> + untrack_pfn(vma, pfn, PAGE_ALIGN(size), true);
> return err;
> }
> EXPORT_SYMBOL(remap_pfn_range);
> diff --git a/mm/memremap.c b/mm/memremap.c
> index 08cbf54fe037..2f88f43d4a01 100644
> --- a/mm/memremap.c
> +++ b/mm/memremap.c
> @@ -129,7 +129,7 @@ static void pageunmap_range(struct dev_pagemap *pgmap, int range_id)
> }
> mem_hotplug_done();
>
> - untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
> + untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range), true);
> pgmap_array_delete(range);
> }
>
> @@ -276,7 +276,7 @@ static int pagemap_range(struct dev_pagemap *pgmap, struct mhp_params *params,
> if (!is_private)
> kasan_remove_zero_shadow(__va(range->start), range_len(range));
> err_kasan:
> - untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range));
> + untrack_pfn(NULL, PHYS_PFN(range->start), range_len(range), true);
> err_pfn_remap:
> pgmap_array_delete(range);
> return error;
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 2c6e9072e6a8..69d440997648 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -78,7 +78,7 @@ core_param(ignore_rlimit_data, ignore_rlimit_data, bool, 0644);
> static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
> struct vm_area_struct *vma, struct vm_area_struct *prev,
> struct vm_area_struct *next, unsigned long start,
> - unsigned long end);
> + unsigned long end, bool mm_wr_locked);
>
> static pgprot_t vm_pgprot_modify(pgprot_t oldprot, unsigned long vm_flags)
> {
> @@ -2136,14 +2136,14 @@ static inline void remove_mt(struct mm_struct *mm, struct ma_state *mas)
> static void unmap_region(struct mm_struct *mm, struct maple_tree *mt,
> struct vm_area_struct *vma, struct vm_area_struct *prev,
> struct vm_area_struct *next,
> - unsigned long start, unsigned long end)
> + unsigned long start, unsigned long end, bool mm_wr_locked)
> {
> struct mmu_gather tlb;
>
> lru_add_drain();
> tlb_gather_mmu(&tlb, mm);
> update_hiwater_rss(mm);
> - unmap_vmas(&tlb, mt, vma, start, end);
> + unmap_vmas(&tlb, mt, vma, start, end, mm_wr_locked);
> free_pgtables(&tlb, mt, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS,
> next ? next->vm_start : USER_PGTABLES_CEILING);
> tlb_finish_mmu(&tlb);
> @@ -2391,7 +2391,11 @@ do_vmi_align_munmap(struct vma_iterator *vmi, struct vm_area_struct *vma,
> mmap_write_downgrade(mm);
> }
>
> - unmap_region(mm, &mt_detach, vma, prev, next, start, end);
> + /*
> + * We can free page tables without write-locking mmap_lock because VMAs
> + * were isolated before we downgraded mmap_lock.
> + */
> + unmap_region(mm, &mt_detach, vma, prev, next, start, end, !downgrade);
> /* Statistics and freeing VMAs */
> mas_set(&mas_detach, start);
> remove_mt(mm, &mas_detach);
> @@ -2704,7 +2708,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> /* Undo any partial mapping done by a device driver. */
> unmap_region(mm, &mm->mm_mt, vma, prev, next, vma->vm_start,
> - vma->vm_end);
> + vma->vm_end, true);
> }
> if (file && (vm_flags & VM_SHARED))
> mapping_unmap_writable(file->f_mapping);
> @@ -3031,7 +3035,7 @@ void exit_mmap(struct mm_struct *mm)
> tlb_gather_mmu_fullmm(&tlb, mm);
> /* update_hiwater_rss(mm) here? but nobody should be looking */
> /* Use ULONG_MAX here to ensure all VMAs in the mm are unmapped */
> - unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX);
> + unmap_vmas(&tlb, &mm->mm_mt, vma, 0, ULONG_MAX, false);
> mmap_read_unlock(mm);
>
> /*
> --
> 2.39.1

--
Michal Hocko
SUSE Labs

2023-01-26 11:52:38

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > >
> > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > >
> > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > errors when we add a const modifier to vma->vm_flags.
> > > >
> > > > ...
> > > >
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > * will be reinitialized.
> > > > */
> > > > - *new = data_race(*orig);
> > > > + memcpy(new, orig, sizeof(*new));
> > >
> > > The data_race() removal is unchangelogged?
> >
> > True. I'll add a note in the changelog about that. Ideally I would
> > like to preserve it but I could not find a way to do that.
> >
>
> Perhaps Paul can comment?
>
> I wonder if KCSAN knows how to detect this race, given that it's now in
> a memcpy. I assume so.

data_race() is just wrapping an expression around
__kcsan_[en|dis]able_current and ensuring the expression is evaluated once
and returning the correct type. I believe the following should be sufficient.

diff --git a/kernel/fork.c b/kernel/fork.c
index 9f7fe3541897..1b30ee568e02 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -472,7 +472,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
* orig->shared.rb may be modified concurrently, but the clone
* will be reinitialized.
*/
- *new = data_race(*orig);
+ data_race(memcpy(new, orig, sizeof(*new)));
INIT_LIST_HEAD(&new->anon_vma_chain);
dup_anon_vma_name(orig, new);
}

I don't see how memcpy could automagically figure out whether the memcpy
is prone to races or not in an arbitrary context.

Assuming using data_race this way is ok then

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-01-26 13:58:33

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, Jan 25, 2023 at 03:35:49PM -0800, Suren Baghdasaryan wrote:
> vm_flags are among VMA attributes which affect decisions like VMA merging
> and splitting. Therefore all vm_flags modifications are performed after
> taking exclusive mmap_lock to prevent vm_flags updates racing with such
> operations. Introduce modifier functions for vm_flags to be used whenever
> flags are updated. This way we can better check and control correct
> locking behavior during these updates.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

With or without the suggested rename;

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-01-26 14:00:35

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 3/7] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

On Wed, Jan 25, 2023 at 03:35:50PM -0800, Suren Baghdasaryan wrote:
> To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
> replace it with VM_LOCKED_MASK bitmask and convert all users.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Michal Hocko <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-01-26 14:17:40

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 3/7] mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK

On Wed, Jan 25, 2023 at 03:35:50PM -0800, Suren Baghdasaryan wrote:
> To simplify the usage of VM_LOCKED_CLEAR_MASK in clear_vm_flags(),
> replace it with VM_LOCKED_MASK bitmask and convert all users.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Michal Hocko <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-01-26 15:10:27

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

On Wed, Jan 25, 2023 at 03:35:51PM -0800, Suren Baghdasaryan wrote:
> Replace direct modifications to vma->vm_flags with calls to modifier
> functions to be able to track flag changes and to keep vma locking
> correctness.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Michal Hocko <[email protected]>

Acked-by: Mel Gorman <[email protected]>

Minor comments that are safe to ignore.

I think a better name for mod_vm_flags is set_clear_vm_flags to hint that
the first flags are to be set and the second flags are to be cleared.
For this patch, it doesn't matter, but it might avoid accidental swapping
in the future.

reset_vm_flags might also be better named as reinit_vma_flags (or
vma_flags_reinit). Maybe also encourage the use of [set|clear_mod]_vm_flags
where possible in the comment to track exactly what is changing and
why. Some cases like userfaultfd just want to clear __VM_UFFD_FLAGS but
altering the flow in this patch is inappropriate and error prone. Others
such as the infiniband changes and madvise are a lot more complex.

--
Mel Gorman
SUSE Labs

2023-01-26 15:29:12

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 5/7] mm: replace vma->vm_flags indirect modification in ksm_madvise

On Wed, Jan 25, 2023 at 03:35:52PM -0800, Suren Baghdasaryan wrote:
> Replace indirect modifications to vma->vm_flags with calls to modifier
> functions to be able to track flag changes and to keep vma locking
> correctness.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>
> Acked-by: Michal Hocko <[email protected]>
> ---
> arch/powerpc/kvm/book3s_hv_uvmem.c | 5 ++++-
> arch/s390/mm/gmap.c | 5 ++++-
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 1d67baa5557a..325a7a47d348 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
> {
> unsigned long gfn = memslot->base_gfn;
> unsigned long end, start = gfn_to_hva(kvm, gfn);
> + unsigned long vm_flags;
> int ret = 0;
> struct vm_area_struct *vma;
> int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> @@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
> ret = H_STATE;
> break;
> }
> + vm_flags = vma->vm_flags;
> ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> - merge_flag, &vma->vm_flags);
> + merge_flag, &vm_flags);
> if (ret) {
> ret = H_STATE;
> break;
> }
> + reset_vm_flags(vma, vm_flags);
> start = vma->vm_end;
> } while (end > vma->vm_end);

Add a comment on why the vm_flags are copied in case someone "optimises"
this in the future? Something like

/* Copy vm_flags to avoid any partial modifications in ksm_madvise. */

>
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index 3a695b8a1e3c..d5eb47dcdacb 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
> {
> struct mm_struct *mm = current->mm;
> struct vm_area_struct *vma;
> + unsigned long vm_flags;
> int ret;
> VMA_ITERATOR(vmi, mm, 0);
>
> for_each_vma(vmi, vma) {
> + vm_flags = vma->vm_flags;
> ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> - MADV_UNMERGEABLE, &vma->vm_flags);
> + MADV_UNMERGEABLE, &vm_flags);
> if (ret)
> return ret;
> + reset_vm_flags(vma, vm_flags);

Same.

Not necessary as such as there are few users of ksm_madvise and I doubt
it'll introduce new surprises.

With or without the comment;

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-01-26 15:47:50

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

On Wed, Jan 25, 2023 at 03:35:53PM -0800, Suren Baghdasaryan wrote:
> In cases when VMA flags are modified after VMA was isolated and mmap_lock
> was downgraded, flags modifications would result in an assertion because
> mmap write lock is not held.

Add note that it's also used during exit when the locking of the VMAs
becomes irrelevant (mm users is 0, should be no VMA modifications taking
place other than zap).

The typical naming pattern when a caller either knows it holds the necessary
lock or knows it does not matter is __mod_vm_flags()

> Introduce mod_vm_flags_nolock to be used in such situation, when VMA is
> not part of VMA tree and locking it is not required.

Instead of such situations, describe in as "used when the caller takes
responsibility for the required locking".

> Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> flags modification and to avoid assertion.
>
> Signed-off-by: Suren Baghdasaryan <[email protected]>

Patch itself looks ok. It strays close to being "conditional locking"
though which might attract some complaints.

--
Mel Gorman
SUSE Labs

2023-01-26 15:59:50

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Thu, Jan 26, 2023 at 3:52 AM Mel Gorman <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> > On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > > >
> > > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > >
> > > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > > errors when we add a const modifier to vma->vm_flags.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/kernel/fork.c
> > > > > +++ b/kernel/fork.c
> > > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > > * will be reinitialized.
> > > > > */
> > > > > - *new = data_race(*orig);
> > > > > + memcpy(new, orig, sizeof(*new));
> > > >
> > > > The data_race() removal is unchangelogged?
> > >
> > > True. I'll add a note in the changelog about that. Ideally I would
> > > like to preserve it but I could not find a way to do that.
> > >
> >
> > Perhaps Paul can comment?
> >
> > I wonder if KCSAN knows how to detect this race, given that it's now in
> > a memcpy. I assume so.
>
> data_race() is just wrapping an expression around
> __kcsan_[en|dis]able_current and ensuring the expression is evaluated once
> and returning the correct type. I believe the following should be sufficient.

Thanks for the suggestion, Mel! I'll try that.

>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9f7fe3541897..1b30ee568e02 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -472,7 +472,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> * orig->shared.rb may be modified concurrently, but the clone
> * will be reinitialized.
> */
> - *new = data_race(*orig);
> + data_race(memcpy(new, orig, sizeof(*new)));
> INIT_LIST_HEAD(&new->anon_vma_chain);
> dup_anon_vma_name(orig, new);
> }
>
> I don't see how memcpy could automagically figure out whether the memcpy
> is prone to races or not in an arbitrary context.
>
> Assuming using data_race this way is ok then
>
> Acked-by: Mel Gorman <[email protected]>

Thanks!

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 16:01:27

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Thu, Jan 26, 2023 at 5:58 AM Mel Gorman <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 03:35:49PM -0800, Suren Baghdasaryan wrote:
> > vm_flags are among VMA attributes which affect decisions like VMA merging
> > and splitting. Therefore all vm_flags modifications are performed after
> > taking exclusive mmap_lock to prevent vm_flags updates racing with such
> > operations. Introduce modifier functions for vm_flags to be used whenever
> > flags are updated. This way we can better check and control correct
> > locking behavior during these updates.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> With or without the suggested rename;
>
> Acked-by: Mel Gorman <[email protected]>

Thanks! I'll make the renames and repost the patchset.
vm_flags_init(), vm_flags_reset(), etc. sounds like a good naming for
this.

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 16:10:42

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

On Thu, Jan 26, 2023 at 7:10 AM Mel Gorman <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 03:35:51PM -0800, Suren Baghdasaryan wrote:
> > Replace direct modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > Acked-by: Michal Hocko <[email protected]>
>
> Acked-by: Mel Gorman <[email protected]>
>
> Minor comments that are safe to ignore.
>
> I think a better name for mod_vm_flags is set_clear_vm_flags to hint that
> the first flags are to be set and the second flags are to be cleared.
> For this patch, it doesn't matter, but it might avoid accidental swapping
> in the future.
>
> reset_vm_flags might also be better named as reinit_vma_flags (or
> vma_flags_reinit). Maybe also encourage the use of [set|clear_mod]_vm_flags
> where possible in the comment to track exactly what is changing and
> why. Some cases like userfaultfd just want to clear __VM_UFFD_FLAGS but
> altering the flow in this patch is inappropriate and error prone. Others
> such as the infiniband changes and madvise are a lot more complex.

That's a good point, but I don't want people to use mod_vm_flags() for
the cases when the order of set/clear really matters. In such cases
set_vm_flags() and clear_vm_flags() should be explicitly used. Maybe
to make that clear I should add a comment and rewrite the functions
as:

void mod_vm_flags(vma, set, clear) {
vma.vm_flags = vma.vm_flags | set & clear;
}

In this patchset it's not that obvious but mod_vm_flags() was really
introduced in the original per-VMA lock patchset for efficiency to
avoid taking extra per-VMA locks. A combo of
set_vm_flags()+clear_vm_flags() would try to retake the same per-VMA
lock in the second call while mod_vm_flags() takes the lock only once
and does both operations. Not a huge overhead because we check if the
lock is already taken and bail out early but still...
So, would the above modification to mod_vm_flags() address your concern?

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 16:12:09

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 5/7] mm: replace vma->vm_flags indirect modification in ksm_madvise

On Thu, Jan 26, 2023 at 7:19 AM Mel Gorman <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 03:35:52PM -0800, Suren Baghdasaryan wrote:
> > Replace indirect modifications to vma->vm_flags with calls to modifier
> > functions to be able to track flag changes and to keep vma locking
> > correctness.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > Acked-by: Michal Hocko <[email protected]>
> > ---
> > arch/powerpc/kvm/book3s_hv_uvmem.c | 5 ++++-
> > arch/s390/mm/gmap.c | 5 ++++-
> > 2 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > index 1d67baa5557a..325a7a47d348 100644
> > --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> > +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> > @@ -393,6 +393,7 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
> > {
> > unsigned long gfn = memslot->base_gfn;
> > unsigned long end, start = gfn_to_hva(kvm, gfn);
> > + unsigned long vm_flags;
> > int ret = 0;
> > struct vm_area_struct *vma;
> > int merge_flag = (merge) ? MADV_MERGEABLE : MADV_UNMERGEABLE;
> > @@ -409,12 +410,14 @@ static int kvmppc_memslot_page_merge(struct kvm *kvm,
> > ret = H_STATE;
> > break;
> > }
> > + vm_flags = vma->vm_flags;
> > ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > - merge_flag, &vma->vm_flags);
> > + merge_flag, &vm_flags);
> > if (ret) {
> > ret = H_STATE;
> > break;
> > }
> > + reset_vm_flags(vma, vm_flags);
> > start = vma->vm_end;
> > } while (end > vma->vm_end);
>
> Add a comment on why the vm_flags are copied in case someone "optimises"
> this in the future? Something like
>
> /* Copy vm_flags to avoid any partial modifications in ksm_madvise. */

Ack.

>
> >
> > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> > index 3a695b8a1e3c..d5eb47dcdacb 100644
> > --- a/arch/s390/mm/gmap.c
> > +++ b/arch/s390/mm/gmap.c
> > @@ -2587,14 +2587,17 @@ int gmap_mark_unmergeable(void)
> > {
> > struct mm_struct *mm = current->mm;
> > struct vm_area_struct *vma;
> > + unsigned long vm_flags;
> > int ret;
> > VMA_ITERATOR(vmi, mm, 0);
> >
> > for_each_vma(vmi, vma) {
> > + vm_flags = vma->vm_flags;
> > ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
> > - MADV_UNMERGEABLE, &vma->vm_flags);
> > + MADV_UNMERGEABLE, &vm_flags);
> > if (ret)
> > return ret;
> > + reset_vm_flags(vma, vm_flags);
>
> Same.
>
> Not necessary as such as there are few users of ksm_madvise and I doubt
> it'll introduce new surprises.
>
> With or without the comment;
>
> Acked-by: Mel Gorman <[email protected]>

Thanks!

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 16:19:11

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

On Thu, Jan 26, 2023 at 7:47 AM Mel Gorman <[email protected]> wrote:
>
> On Wed, Jan 25, 2023 at 03:35:53PM -0800, Suren Baghdasaryan wrote:
> > In cases when VMA flags are modified after VMA was isolated and mmap_lock
> > was downgraded, flags modifications would result in an assertion because
> > mmap write lock is not held.
>
> Add note that it's also used during exit when the locking of the VMAs
> becomes irrelevant (mm users is 0, should be no VMA modifications taking
> place other than zap).

Ack.

>
> The typical naming pattern when a caller either knows it holds the necessary
> lock or knows it does not matter is __mod_vm_flags()

Ok. It sounds less explicit but plenty of examples, so I'm fine with
such rename. Will apply in the next version.

>
> > Introduce mod_vm_flags_nolock to be used in such situation, when VMA is
> > not part of VMA tree and locking it is not required.
>
> Instead of such situations, describe in as "used when the caller takes
> responsibility for the required locking".

Ack.

>
> > Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> > flags modification and to avoid assertion.
> >
> > Signed-off-by: Suren Baghdasaryan <[email protected]>
>
> Patch itself looks ok. It strays close to being "conditional locking"
> though which might attract some complaints.

The description seems to accurately describe what's done here but I'm
open to better suggestions.
Thanks!

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 17:27:55

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

On Thu, Jan 26, 2023 at 08:10:26AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 26, 2023 at 7:10 AM Mel Gorman <[email protected]> wrote:
> >
> > On Wed, Jan 25, 2023 at 03:35:51PM -0800, Suren Baghdasaryan wrote:
> > > Replace direct modifications to vma->vm_flags with calls to modifier
> > > functions to be able to track flag changes and to keep vma locking
> > > correctness.
> > >
> > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > > Acked-by: Michal Hocko <[email protected]>
> >
> > Acked-by: Mel Gorman <[email protected]>
> >
> > Minor comments that are safe to ignore.
> >
> > I think a better name for mod_vm_flags is set_clear_vm_flags to hint that
> > the first flags are to be set and the second flags are to be cleared.
> > For this patch, it doesn't matter, but it might avoid accidental swapping
> > in the future.
> >
> > reset_vm_flags might also be better named as reinit_vma_flags (or
> > vma_flags_reinit). Maybe also encourage the use of [set|clear_mod]_vm_flags
> > where possible in the comment to track exactly what is changing and
> > why. Some cases like userfaultfd just want to clear __VM_UFFD_FLAGS but
> > altering the flow in this patch is inappropriate and error prone. Others
> > such as the infiniband changes and madvise are a lot more complex.
>
> That's a good point, but I don't want people to use mod_vm_flags() for
> the cases when the order of set/clear really matters. In such cases
> set_vm_flags() and clear_vm_flags() should be explicitly used. Maybe
> to make that clear I should add a comment and rewrite the functions
> as:
>
> void mod_vm_flags(vma, set, clear) {
> vma.vm_flags = vma.vm_flags | set & clear;
> }
>

Offhand, I'm not thinking of a case where that really matters and as they
are not necessarily ordered, it's raising a read flag so yes, it definitely
it needs a comment if the ordering matters.

> In this patchset it's not that obvious but mod_vm_flags() was really
> introduced in the original per-VMA lock patchset for efficiency to
> avoid taking extra per-VMA locks. A combo of
> set_vm_flags()+clear_vm_flags() would try to retake the same per-VMA
> lock in the second call while mod_vm_flags() takes the lock only once
> and does both operations.

Ok, that seems fair but still needs a comment on why a mod_vm_flags is
not necessarily equivalent to a set_vm_flags + clear_vm_flags in terms of
correctness if that is indeed the case.

> Not a huge overhead because we check if the
> lock is already taken and bail out early but still...
> So, would the above modification to mod_vm_flags() address your concern?
>

My concerns are entirely with the callers, not the implementation. If
someone is modifying a call site using mod_vm_flags, they have to read
through all the preceding logic to ensure the final combination of flags
is valid. It's a code maintenance issue, not a correctness issue.

--
Mel Gorman
SUSE Labs

2023-01-26 17:28:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
> > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > >
> > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > >
> > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > errors when we add a const modifier to vma->vm_flags.
> > > >
> > > > ...
> > > >
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > * will be reinitialized.
> > > > */
> > > > - *new = data_race(*orig);
> > > > + memcpy(new, orig, sizeof(*new));
> > >
> > > The data_race() removal is unchangelogged?
> >
> > True. I'll add a note in the changelog about that. Ideally I would
> > like to preserve it but I could not find a way to do that.
>
> Perhaps Paul can comment?
>
> I wonder if KCSAN knows how to detect this race, given that it's now in
> a memcpy. I assume so.

I ran an experiment memcpy()ing between a static array and an onstack
array, and KCSAN did not complain. But maybe I was setting it up wrong.

This is what I did:

long myid = (long)arg; /* different value for each task */
static unsigned long z1[10] = { 0 };
unsigned long z2[10];

...

memcpy(z1, z2, ARRAY_SIZE(z1) * sizeof(z1[0]));
for (zi = 0; zi < ARRAY_SIZE(z1); zi++)
z2[zi] += myid;
memcpy(z2, z1, ARRAY_SIZE(z1) * sizeof(z1[0]));

Adding Marco on CC for his thoughts.

Thanx, Paul

2023-01-26 17:29:17

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 4/7] mm: replace vma->vm_flags direct modifications with modifier calls

On Thu, Jan 26, 2023 at 9:27 AM Mel Gorman <[email protected]> wrote:
>
> On Thu, Jan 26, 2023 at 08:10:26AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 26, 2023 at 7:10 AM Mel Gorman <[email protected]> wrote:
> > >
> > > On Wed, Jan 25, 2023 at 03:35:51PM -0800, Suren Baghdasaryan wrote:
> > > > Replace direct modifications to vma->vm_flags with calls to modifier
> > > > functions to be able to track flag changes and to keep vma locking
> > > > correctness.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > > > Acked-by: Michal Hocko <[email protected]>
> > >
> > > Acked-by: Mel Gorman <[email protected]>
> > >
> > > Minor comments that are safe to ignore.
> > >
> > > I think a better name for mod_vm_flags is set_clear_vm_flags to hint that
> > > the first flags are to be set and the second flags are to be cleared.
> > > For this patch, it doesn't matter, but it might avoid accidental swapping
> > > in the future.
> > >
> > > reset_vm_flags might also be better named as reinit_vma_flags (or
> > > vma_flags_reinit). Maybe also encourage the use of [set|clear_mod]_vm_flags
> > > where possible in the comment to track exactly what is changing and
> > > why. Some cases like userfaultfd just want to clear __VM_UFFD_FLAGS but
> > > altering the flow in this patch is inappropriate and error prone. Others
> > > such as the infiniband changes and madvise are a lot more complex.
> >
> > That's a good point, but I don't want people to use mod_vm_flags() for
> > the cases when the order of set/clear really matters. In such cases
> > set_vm_flags() and clear_vm_flags() should be explicitly used. Maybe
> > to make that clear I should add a comment and rewrite the functions
> > as:
> >
> > void mod_vm_flags(vma, set, clear) {
> > vma.vm_flags = vma.vm_flags | set & clear;
> > }
> >
>
> Offhand, I'm not thinking of a case where that really matters and as they
> are not necessarily ordered, it's raising a read flag so yes, it definitely
> it needs a comment if the ordering matters.
>
> > In this patchset it's not that obvious but mod_vm_flags() was really
> > introduced in the original per-VMA lock patchset for efficiency to
> > avoid taking extra per-VMA locks. A combo of
> > set_vm_flags()+clear_vm_flags() would try to retake the same per-VMA
> > lock in the second call while mod_vm_flags() takes the lock only once
> > and does both operations.
>
> Ok, that seems fair but still needs a comment on why a mod_vm_flags is
> not necessarily equivalent to a set_vm_flags + clear_vm_flags in terms of
> correctness if that is indeed the case.
>
> > Not a huge overhead because we check if the
> > lock is already taken and bail out early but still...
> > So, would the above modification to mod_vm_flags() address your concern?
> >
>
> My concerns are entirely with the callers, not the implementation. If
> someone is modifying a call site using mod_vm_flags, they have to read
> through all the preceding logic to ensure the final combination of flags
> is valid. It's a code maintenance issue, not a correctness issue.

Got it. I'll modify the implementation to make a single assignment and
will add a comment to use only when order doesn't matter.
Thanks!

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 17:33:03

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

On Thu, Jan 26, 2023 at 08:18:31AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 26, 2023 at 7:47 AM Mel Gorman <[email protected]> wrote:
> >
> > On Wed, Jan 25, 2023 at 03:35:53PM -0800, Suren Baghdasaryan wrote:
> > > In cases when VMA flags are modified after VMA was isolated and mmap_lock
> > > was downgraded, flags modifications would result in an assertion because
> > > mmap write lock is not held.
> >
> > Add note that it's also used during exit when the locking of the VMAs
> > becomes irrelevant (mm users is 0, should be no VMA modifications taking
> > place other than zap).
>
> Ack.
>
> >
> > The typical naming pattern when a caller either knows it holds the necessary
> > lock or knows it does not matter is __mod_vm_flags()
>
> Ok. It sounds less explicit but plenty of examples, so I'm fine with
> such rename. Will apply in the next version.
>

It might be a personal thing. nolock to me is ambigious because it might
mean "lock is already held", "no lock is necessary" or "no lock is acquired"
where as *for me*, calling foo vs __foo *usually* means "direct callers of
__foo take care of the locking, memory ordering, per-cpu pinning details etc"
depending on the context. Of course, this convention is not universally true.

> > > Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> > > flags modification and to avoid assertion.
> > >
> > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> >
> > Patch itself looks ok. It strays close to being "conditional locking"
> > though which might attract some complaints.
>
> The description seems to accurately describe what's done here but I'm
> open to better suggestions.

I don't have alternative suggestions but if someone else reads the patch and
says "this is conditional locking", you can at least claim that someone
else considered "conditional locking" and didn't think it was a major
problem in this specific patch.

--
Mel Gorman
SUSE Labs

2023-01-26 17:35:02

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 6/7] mm: introduce mod_vm_flags_nolock and use it in untrack_pfn

On Thu, Jan 26, 2023 at 9:32 AM Mel Gorman <[email protected]> wrote:
>
> On Thu, Jan 26, 2023 at 08:18:31AM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 26, 2023 at 7:47 AM Mel Gorman <[email protected]> wrote:
> > >
> > > On Wed, Jan 25, 2023 at 03:35:53PM -0800, Suren Baghdasaryan wrote:
> > > > In cases when VMA flags are modified after VMA was isolated and mmap_lock
> > > > was downgraded, flags modifications would result in an assertion because
> > > > mmap write lock is not held.
> > >
> > > Add note that it's also used during exit when the locking of the VMAs
> > > becomes irrelevant (mm users is 0, should be no VMA modifications taking
> > > place other than zap).
> >
> > Ack.
> >
> > >
> > > The typical naming pattern when a caller either knows it holds the necessary
> > > lock or knows it does not matter is __mod_vm_flags()
> >
> > Ok. It sounds less explicit but plenty of examples, so I'm fine with
> > such rename. Will apply in the next version.
> >
>
> It might be a personal thing. nolock to me is ambigious because it might
> mean "lock is already held", "no lock is necessary" or "no lock is acquired"
> where as *for me*, calling foo vs __foo *usually* means "direct callers of
> __foo take care of the locking, memory ordering, per-cpu pinning details etc"
> depending on the context. Of course, this convention is not universally true.
>
> > > > Pass a hint to untrack_pfn to conditionally use mod_vm_flags_nolock for
> > > > flags modification and to avoid assertion.
> > > >
> > > > Signed-off-by: Suren Baghdasaryan <[email protected]>
> > >
> > > Patch itself looks ok. It strays close to being "conditional locking"
> > > though which might attract some complaints.
> >
> > The description seems to accurately describe what's done here but I'm
> > open to better suggestions.
>
> I don't have alternative suggestions but if someone else reads the patch and
> says "this is conditional locking", you can at least claim that someone
> else considered "conditional locking" and didn't think it was a major
> problem in this specific patch.

Perfect. Thanks!

>
> --
> Mel Gorman
> SUSE Labs

2023-01-26 18:16:23

by Davidlohr Bueso

[permalink] [raw]
Subject: Re: [PATCH v3 2/7] mm: introduce vma->vm_flags wrapper functions

On Wed, 25 Jan 2023, Andrew Morton wrote:

>On Wed, 25 Jan 2023 15:35:49 -0800 Suren Baghdasaryan <[email protected]> wrote:
>
>> vm_flags are among VMA attributes which affect decisions like VMA merging
>> and splitting. Therefore all vm_flags modifications are performed after
>> taking exclusive mmap_lock to prevent vm_flags updates racing with such
>> operations. Introduce modifier functions for vm_flags to be used whenever
>> flags are updated. This way we can better check and control correct
>> locking behavior during these updates.
>>
>> ...
>>
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> +static inline void init_vm_flags(struct vm_area_struct *vma,
>> +static inline void reset_vm_flags(struct vm_area_struct *vma,
>> +static inline void set_vm_flags(struct vm_area_struct *vma,
>> +static inline void clear_vm_flags(struct vm_area_struct *vma,
>> +static inline void mod_vm_flags(struct vm_area_struct *vma,
>
>vm_flags_init(), vm_flags_reset(), etc?
>
>This would be more idiomatic and I do think the most-significant-first
>naming style is preferable.

I tend to prefer this naming yes, but lgtm regardless.

Reviewed-by: Davidlohr Bueso <[email protected]>

2023-02-07 17:17:31

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Thu, Jan 26, 2023 at 09:27AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> > On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
> >
> > > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > > >
> > > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > >
> > > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > > errors when we add a const modifier to vma->vm_flags.
> > > > >
> > > > > ...
> > > > >
> > > > > --- a/kernel/fork.c
> > > > > +++ b/kernel/fork.c
> > > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > > * will be reinitialized.
> > > > > */
> > > > > - *new = data_race(*orig);
> > > > > + memcpy(new, orig, sizeof(*new));
> > > >
> > > > The data_race() removal is unchangelogged?
> > >
> > > True. I'll add a note in the changelog about that. Ideally I would
> > > like to preserve it but I could not find a way to do that.
> >
> > Perhaps Paul can comment?
> >
> > I wonder if KCSAN knows how to detect this race, given that it's now in
> > a memcpy. I assume so.
>
> I ran an experiment memcpy()ing between a static array and an onstack
> array, and KCSAN did not complain. But maybe I was setting it up wrong.
>
> This is what I did:
>
> long myid = (long)arg; /* different value for each task */
> static unsigned long z1[10] = { 0 };
> unsigned long z2[10];
>
> ...
>
> memcpy(z1, z2, ARRAY_SIZE(z1) * sizeof(z1[0]));
> for (zi = 0; zi < ARRAY_SIZE(z1); zi++)
> z2[zi] += myid;
> memcpy(z2, z1, ARRAY_SIZE(z1) * sizeof(z1[0]));
>
> Adding Marco on CC for his thoughts.

( Sorry for not seeing it earlier - just saw this by chance. )

memcpy() data races will be detected as of (given a relatively recent
Clang compiler):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7c201739beef

Also beware that the compiler is free to "optimize" things by either
inlining memcpy() (turning an explicit memcpy() into just a bunch of
loads/stores), or outline plain assignments into memcpy() calls. So the
only way to be sure what ends up there is to look at the disassembled
code.

The data_race() was introduced by:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cda099b37d716

It says:
"vm_area_dup() blindly copies all fields of original VMA to the new one.
This includes coping vm_area_struct::shared.rb which is normally
protected by i_mmap_lock. But this is fine because the read value will
be overwritten on the following __vma_link_file() under proper
protection. Thus, mark it as an intentional data race and insert a few
assertions for the fields that should not be modified concurrently."

And as far as I can tell this hasn't changed.

Thanks,
-- Marco

2023-02-07 17:24:09

by Suren Baghdasaryan

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Tue, Feb 7, 2023 at 9:16 AM Marco Elver <[email protected]> wrote:
>
> On Thu, Jan 26, 2023 at 09:27AM -0800, Paul E. McKenney wrote:
> > On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> > > On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > >
> > > > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > > > >
> > > > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > > >
> > > > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > > > errors when we add a const modifier to vma->vm_flags.
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > --- a/kernel/fork.c
> > > > > > +++ b/kernel/fork.c
> > > > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > > > * will be reinitialized.
> > > > > > */
> > > > > > - *new = data_race(*orig);
> > > > > > + memcpy(new, orig, sizeof(*new));
> > > > >
> > > > > The data_race() removal is unchangelogged?
> > > >
> > > > True. I'll add a note in the changelog about that. Ideally I would
> > > > like to preserve it but I could not find a way to do that.
> > >
> > > Perhaps Paul can comment?
> > >
> > > I wonder if KCSAN knows how to detect this race, given that it's now in
> > > a memcpy. I assume so.
> >
> > I ran an experiment memcpy()ing between a static array and an onstack
> > array, and KCSAN did not complain. But maybe I was setting it up wrong.
> >
> > This is what I did:
> >
> > long myid = (long)arg; /* different value for each task */
> > static unsigned long z1[10] = { 0 };
> > unsigned long z2[10];
> >
> > ...
> >
> > memcpy(z1, z2, ARRAY_SIZE(z1) * sizeof(z1[0]));
> > for (zi = 0; zi < ARRAY_SIZE(z1); zi++)
> > z2[zi] += myid;
> > memcpy(z2, z1, ARRAY_SIZE(z1) * sizeof(z1[0]));
> >
> > Adding Marco on CC for his thoughts.
>
> ( Sorry for not seeing it earlier - just saw this by chance. )
>
> memcpy() data races will be detected as of (given a relatively recent
> Clang compiler):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7c201739beef
>
> Also beware that the compiler is free to "optimize" things by either
> inlining memcpy() (turning an explicit memcpy() into just a bunch of
> loads/stores), or outline plain assignments into memcpy() calls. So the
> only way to be sure what ends up there is to look at the disassembled
> code.
>
> The data_race() was introduced by:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cda099b37d716
>
> It says:
> "vm_area_dup() blindly copies all fields of original VMA to the new one.
> This includes coping vm_area_struct::shared.rb which is normally
> protected by i_mmap_lock. But this is fine because the read value will
> be overwritten on the following __vma_link_file() under proper
> protection. Thus, mark it as an intentional data race and insert a few
> assertions for the fields that should not be modified concurrently."
>
> And as far as I can tell this hasn't changed.

Thanks for the feedback, Marco!
So, IIUC Mel's proposal to use data_race(memcpy(new, orig,
sizeof(*new))); is fine in this case, right?
Thanks,
Suren.

>
> Thanks,
> -- Marco

2023-02-07 17:52:21

by Marco Elver

[permalink] [raw]
Subject: Re: [PATCH v3 1/7] kernel/fork: convert vma assignment to a memcpy

On Tue, 7 Feb 2023 at 18:24, Suren Baghdasaryan <[email protected]> wrote:
>
> On Tue, Feb 7, 2023 at 9:16 AM Marco Elver <[email protected]> wrote:
> >
> > On Thu, Jan 26, 2023 at 09:27AM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 25, 2023 at 05:34:49PM -0800, Andrew Morton wrote:
> > > > On Wed, 25 Jan 2023 16:50:01 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > >
> > > > > On Wed, Jan 25, 2023 at 4:22 PM Andrew Morton <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, 25 Jan 2023 15:35:48 -0800 Suren Baghdasaryan <[email protected]> wrote:
> > > > > >
> > > > > > > Convert vma assignment in vm_area_dup() to a memcpy() to prevent compiler
> > > > > > > errors when we add a const modifier to vma->vm_flags.
> > > > > > >
> > > > > > > ...
> > > > > > >
> > > > > > > --- a/kernel/fork.c
> > > > > > > +++ b/kernel/fork.c
> > > > > > > @@ -482,7 +482,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
> > > > > > > * orig->shared.rb may be modified concurrently, but the clone
> > > > > > > * will be reinitialized.
> > > > > > > */
> > > > > > > - *new = data_race(*orig);
> > > > > > > + memcpy(new, orig, sizeof(*new));
> > > > > >
> > > > > > The data_race() removal is unchangelogged?
> > > > >
> > > > > True. I'll add a note in the changelog about that. Ideally I would
> > > > > like to preserve it but I could not find a way to do that.
> > > >
> > > > Perhaps Paul can comment?
> > > >
> > > > I wonder if KCSAN knows how to detect this race, given that it's now in
> > > > a memcpy. I assume so.
> > >
> > > I ran an experiment memcpy()ing between a static array and an onstack
> > > array, and KCSAN did not complain. But maybe I was setting it up wrong.
> > >
> > > This is what I did:
> > >
> > > long myid = (long)arg; /* different value for each task */
> > > static unsigned long z1[10] = { 0 };
> > > unsigned long z2[10];
> > >
> > > ...
> > >
> > > memcpy(z1, z2, ARRAY_SIZE(z1) * sizeof(z1[0]));
> > > for (zi = 0; zi < ARRAY_SIZE(z1); zi++)
> > > z2[zi] += myid;
> > > memcpy(z2, z1, ARRAY_SIZE(z1) * sizeof(z1[0]));
> > >
> > > Adding Marco on CC for his thoughts.
> >
> > ( Sorry for not seeing it earlier - just saw this by chance. )
> >
> > memcpy() data races will be detected as of (given a relatively recent
> > Clang compiler):
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7c201739beef
> >
> > Also beware that the compiler is free to "optimize" things by either
> > inlining memcpy() (turning an explicit memcpy() into just a bunch of
> > loads/stores), or outline plain assignments into memcpy() calls. So the
> > only way to be sure what ends up there is to look at the disassembled
> > code.
> >
> > The data_race() was introduced by:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cda099b37d716
> >
> > It says:
> > "vm_area_dup() blindly copies all fields of original VMA to the new one.
> > This includes coping vm_area_struct::shared.rb which is normally
> > protected by i_mmap_lock. But this is fine because the read value will
> > be overwritten on the following __vma_link_file() under proper
> > protection. Thus, mark it as an intentional data race and insert a few
> > assertions for the fields that should not be modified concurrently."
> >
> > And as far as I can tell this hasn't changed.
>
> Thanks for the feedback, Marco!
> So, IIUC Mel's proposal to use data_race(memcpy(new, orig,
> sizeof(*new))); is fine in this case, right?

Yes, that'd work.