Hi,
For v4, is just some small non-functional changes from Christophe Leroy,
including splitting two patches.
Also, rebase to v6.9-rc1. This included converting two new
mm->get_unmapped_area() callsites that popped up.
v3:
https://lore.kernel.org/lkml/[email protected]/
v2:
https://lore.kernel.org/lkml/[email protected]/
v1:
https://lore.kernel.org/lkml/[email protected]/
=======
In working on x86’s shadow stack feature, I came across some limitations
around the kernel’s handling of guard gaps. AFAICT these limitations are
not too important for the traditional stack usage of guard gaps, but have
bigger impact on shadow stack’s usage. And now in addition to x86, we have
two other architectures implementing shadow stack like features that plan
to use guard gaps. I wanted to see about addressing them, but I have not
worked on mmap() placement related code before, so would greatly
appreciate if people could take a look and point me in the right
direction.
The nature of the limitations of concern is as follows. In order to ensure
guard gaps between mappings, mmap() would need to consider two things:
1. That the new mapping isn’t placed in an any existing mapping’s guard
gap.
2. That the new mapping isn’t placed such that any existing mappings are
not in *its* guard gaps
Currently mmap never considers (2), and (1) is not considered in some
situations.
When not passing an address hint, or passing one without
MAP_FIXED_NOREPLACE, (1) is enforced. With MAP_FIXED_NOREPLACE, (1) is not
enforced. With MAP_FIXED, (1) is not considered, but this seems to be
expected since MAP_FIXED can already clobber existing mappings. For
MAP_FIXED_NOREPLACE I would have guessed it should respect the guard gaps
of existing mappings, but it is probably a little ambiguous.
In this series I just tried to add enforcement of (2) for the normal (no
address hint) case and only for the newer shadow stack memory (not
stacks). The reason is that with the no-address-hint situation, landing
next to a guard gap could come up naturally and so be more influencable by
attackers such that two shadow stacks could be adjacent without a guard
gap. Where as the address-hint scenarios would require more control -
being able to call mmap() with specific arguments. As for why not just fix
the other corner cases anyway, I thought it might have some greater
possibility of affecting existing apps.
Thanks,
Rick
Rick Edgecombe (14):
proc: Refactor pde_get_unmapped_area as prep
mm: Switch mm->get_unmapped_area() to a flag
mm: Introduce arch_get_unmapped_area_vmflags()
mm: Remove export for get_unmapped_area()
mm: Use get_unmapped_area_vmflags()
thp: Add thp_get_unmapped_area_vmflags()
csky: Use initializer for struct vm_unmapped_area_info
parisc: Use initializer for struct vm_unmapped_area_info
powerpc: Use initializer for struct vm_unmapped_area_info
treewide: Use initializer for struct vm_unmapped_area_info
mm: Take placement mappings gap into account
x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
x86/mm: Care about shadow stack guard gap during placement
selftests/x86: Add placement guard gap test for shstk
arch/alpha/kernel/osf_sys.c | 5 +-
arch/arc/mm/mmap.c | 4 +-
arch/arm/mm/mmap.c | 5 +-
arch/csky/abiv1/mmap.c | 12 +-
arch/loongarch/mm/mmap.c | 3 +-
arch/mips/mm/mmap.c | 3 +-
arch/parisc/kernel/sys_parisc.c | 6 +-
arch/powerpc/mm/book3s64/slice.c | 20 ++--
arch/s390/mm/hugetlbpage.c | 9 +-
arch/s390/mm/mmap.c | 9 +-
arch/sh/mm/mmap.c | 5 +-
arch/sparc/kernel/sys_sparc_32.c | 3 +-
arch/sparc/kernel/sys_sparc_64.c | 20 ++--
arch/sparc/mm/hugetlbpage.c | 9 +-
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/kernel/sys_x86_64.c | 42 +++++--
arch/x86/mm/hugetlbpage.c | 9 +-
arch/x86/mm/mmap.c | 4 +-
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +-
fs/hugetlbfs/inode.c | 11 +-
fs/proc/inode.c | 10 +-
fs/ramfs/file-mmu.c | 2 +-
include/linux/huge_mm.h | 11 ++
include/linux/mm.h | 12 +-
include/linux/mm_types.h | 6 +-
include/linux/sched/coredump.h | 5 +-
include/linux/sched/mm.h | 22 ++++
io_uring/io_uring.c | 2 +-
kernel/bpf/arena.c | 2 +-
kernel/bpf/syscall.c | 2 +-
mm/debug.c | 6 -
mm/huge_memory.c | 26 +++--
mm/mmap.c | 106 +++++++++++++-----
mm/shmem.c | 11 +-
mm/util.c | 6 +-
.../testing/selftests/x86/test_shadow_stack.c | 67 ++++++++++-
38 files changed, 312 insertions(+), 174 deletions(-)
--
2.34.1
The mm/mmap.c function get_unmapped_area() is not used from any modules,
so it doesn't need to be exported. Remove the export.
Signed-off-by: Rick Edgecombe <[email protected]>
---
v4:
- New patch split from "mm: Use get_unmapped_area_vmflags()"
(Christophe Leroy)
---
mm/mmap.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index 2bd7580b8f0b..d160e88b1b1e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1887,8 +1887,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
return error ? error : addr;
}
-EXPORT_SYMBOL(get_unmapped_area);
-
unsigned long
mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
unsigned long addr, unsigned long len,
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
In order to take the start gap into account, the maple tree search needs
to know the size of start gap the new mapping will need. The call chain
from do_mmap() to the actual maple tree search looks like this:
do_mmap(size, vm_flags, map_flags, ..)
mm/mmap.c:get_unmapped_area(size, map_flags, ...)
arch_get_unmapped_area(size, map_flags, ...)
vm_unmapped_area(struct vm_unmapped_area_info)
One option would be to add another MAP_ flag to mean a one page start gap
(as is for shadow stack), but this consumes a flag unnecessarily. Another
option could be to simply increase the size passed in do_mmap() by the
start gap size, and adjust after the fact, but this will interfere with
the alignment requirements passed in struct vm_unmapped_area_info, and
unknown to mmap.c. Instead, introduce variants of
arch_get_unmapped_area/_topdown() that take vm_flags. In future changes,
these variants can be used in mmap.c:get_unmapped_area() to allow the
vm_flags to be passed through to vm_unmapped_area(), while preserving the
normal arch_get_unmapped_area/_topdown() for the existing callers.
Signed-off-by: Rick Edgecombe <[email protected]>
---
v4:
- Remove externs (Christophe Leroy)
---
include/linux/sched/mm.h | 17 +++++++++++++++++
mm/mmap.c | 28 ++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index ed1caa26c8be..91546493c43d 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -191,6 +191,23 @@ unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
+unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags);
+unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t);
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm,
+ struct file *filp,
+ unsigned long addr,
+ unsigned long len,
+ unsigned long pgoff,
+ unsigned long flags,
+ vm_flags_t vm_flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index 224e9ce1e2fd..2bd7580b8f0b 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1808,6 +1808,34 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
}
#endif
+#ifndef HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
+unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+
+unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area_topdown(filp, addr, len, pgoff, flags);
+}
+#endif
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff,
+ flags, vm_flags);
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, vm_flags);
+}
+
unsigned long
get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
--
2.34.1
The mm_struct contains a function pointer *get_unmapped_area(), which
is set to either arch_get_unmapped_area() or
arch_get_unmapped_area_topdown() during the initialization of the mm.
Since the function pointer only ever points to two functions that are named
the same across all arch's, a function pointer is not really required. In
addition future changes will want to add versions of the functions that
take additional arguments. So to save a pointers worth of bytes in
mm_struct, and prevent adding additional function pointers to mm_struct in
future changes, remove it and keep the information about which
get_unmapped_area() to use in a flag.
Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by
mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing
behavior mm->get_unmapped_area() would get copied to the new mm in
dup_mm(), so not clobbering the flag preserves the existing behavior
around inheriting the topdown-ness.
Introduce a helper, mm_get_unmapped_area(), to easily convert code that
refers to the old function pointer to instead select and call either
arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
get_unmapped_area() pointer in struct file_operations alone. The main
purpose of this change is to reorganize in preparation for future changes,
but it also converts the calls of mm->get_unmapped_area() from indirect
branches into a direct ones.
The stress-ng bigheap benchmark calls realloc a lot, which calls through
get_unmapped_area() in the kernel. On x86, the change yielded a ~1%
improvement there on a retpoline config.
In testing a few x86 configs, removing the pointer unfortunately didn't
result in any actual size reductions in the compiled layout of mm_struct.
But depending on compiler or arch alignment requirements, the change could
shrink the size of mm_struct.
Signed-off-by: Rick Edgecombe <[email protected]>
Acked-by: Dave Hansen <[email protected]>
Acked-by: Liam R. Howlett <[email protected]>
Reviewed-by: Kirill A. Shutemov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
v4:
- Split out pde_get_unmapped_area() refactor into separate patch
(Christophe Leroy)
v3:
- Fix comment that still referred to mm->get_unmapped_area()
- Resolve trivial rebase conflicts with "mm: thp_get_unmapped_area must
honour topdown preference"
- Spelling fix in log
v2:
- Fix comment on MMF_TOPDOWN (Kirill, rppt)
- Move MMF_TOPDOWN to actually unused bit
- Add MMF_TOPDOWN to MMF_INIT_MASK so it doesn't get clobbered on fork,
and result in the children using the search up path.
- New lower performance results after above bug fix
- Add Reviews and Acks
---
arch/s390/mm/hugetlbpage.c | 2 +-
arch/s390/mm/mmap.c | 4 ++--
arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
arch/sparc/mm/hugetlbpage.c | 2 +-
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/mm/hugetlbpage.c | 2 +-
arch/x86/mm/mmap.c | 4 ++--
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +++---
fs/hugetlbfs/inode.c | 4 ++--
fs/proc/inode.c | 3 ++-
fs/ramfs/file-mmu.c | 2 +-
include/linux/mm_types.h | 6 +-----
include/linux/sched/coredump.h | 5 ++++-
include/linux/sched/mm.h | 5 +++++
io_uring/io_uring.c | 2 +-
kernel/bpf/arena.c | 2 +-
kernel/bpf/syscall.c | 2 +-
mm/debug.c | 6 ------
mm/huge_memory.c | 9 ++++-----
mm/mmap.c | 21 ++++++++++++++++++---
mm/shmem.c | 11 +++++------
mm/util.c | 6 +++---
23 files changed, 66 insertions(+), 57 deletions(-)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index c2e8242bd15d..219d906fe830 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -328,7 +328,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
goto check_asce_limit;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
addr = hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index b14fc0887654..6b2e4436ad4a 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -185,10 +185,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
*/
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = mmap_base_legacy(random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 1e9a9e016237..1dbf7211666e 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -218,14 +218,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
unsigned long align_goal, addr = -ENOMEM;
- unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
-
- get_area = current->mm->get_unmapped_area;
if (flags & MAP_FIXED) {
/* Ok, don't mess with it. */
- return get_area(NULL, orig_addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
}
flags &= ~MAP_SHARED;
@@ -238,7 +234,8 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
align_goal = (64UL * 1024);
do {
- addr = get_area(NULL, orig_addr, len + (align_goal - PAGE_SIZE), pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
+ len + (align_goal - PAGE_SIZE), pgoff, flags);
if (!(addr & ~PAGE_MASK)) {
addr = (addr + (align_goal - 1UL)) & ~(align_goal - 1UL);
break;
@@ -256,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
* be obtained.
*/
if (addr & ~PAGE_MASK)
- addr = get_area(NULL, orig_addr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
return addr;
}
@@ -292,7 +289,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap == RLIM_INFINITY ||
sysctl_legacy_va_layout) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
/* We know it's 32-bit */
unsigned long task_size = STACK_TOP32;
@@ -303,7 +300,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap = (task_size / 6 * 5);
mm->mmap_base = PAGE_ALIGN(task_size - gap - random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index b432500c13a5..38a1bef47efb 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -123,7 +123,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 262f5fb18d74..22b65a5f5ec6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
if (flags & MAP_FIXED)
return addr;
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 5804bbae4f01..6d77c0039617 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -141,7 +141,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
get_unmapped_area:
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..a2cabb1c81e1 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -129,9 +129,9 @@ static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
if (mmap_is_legacy())
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
else
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
arch_rnd(mmap64_rnd_bits), task_size_64bit(0),
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 3c6670cf905f..9b80e622ae80 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -544,7 +544,7 @@ static unsigned long get_unmapped_area_zero(struct file *file,
}
/* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
#else
return -ENOSYS;
#endif
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 93ebedc5ec8c..47c126d37b59 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
if ((off + len_align) < off)
goto out;
- addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
- pgoff, flags);
+ addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
+ pgoff, flags);
if (!IS_ERR_VALUE(addr_align)) {
addr_align += (off - addr_align) & (align - 1);
return addr_align;
}
out:
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
static const struct address_space_operations dev_dax_aops = {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 6502c7e776d1..3dee18bf47ed 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -249,11 +249,11 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
/*
- * Use mm->get_unmapped_area value as a hint to use topdown routine.
+ * Use MMF_TOPDOWN flag as a hint to use topdown routine.
* If architectures have special needs, they should define their own
* version of hugetlb_get_unmapped_area.
*/
- if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_topdown(file, addr, len,
pgoff, flags);
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 75396a24fd8c..d19434e2a58e 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -455,8 +455,9 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo
return pde->proc_ops->proc_get_unmapped_area(file, orig_addr, len, pgoff, flags);
#ifdef CONFIG_MMU
- return current->mm->get_unmapped_area(file, orig_addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, orig_addr, len, pgoff, flags);
#endif
+
return orig_addr;
}
diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
index c7a1aa3c882b..b45c7edc3225 100644
--- a/fs/ramfs/file-mmu.c
+++ b/fs/ramfs/file-mmu.c
@@ -35,7 +35,7 @@ static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
const struct file_operations ramfs_file_operations = {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5240bd7bca33..9313e43123d4 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -777,11 +777,7 @@ struct mm_struct {
} ____cacheline_aligned_in_smp;
struct maple_tree mm_mt;
-#ifdef CONFIG_MMU
- unsigned long (*get_unmapped_area) (struct file *filp,
- unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags);
-#endif
+
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..e62ff805cfc9 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -92,9 +92,12 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_VM_MERGE_ANY 30
#define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
+#define MMF_TOPDOWN 31 /* mm searches top down by default */
+#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
+
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
- MMF_VM_MERGE_ANY_MASK)
+ MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
static inline unsigned long mmf_init_flags(unsigned long flags)
{
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index b6543f9d78d6..ed1caa26c8be 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
#include <linux/mm_types.h>
#include <linux/gfp.h>
#include <linux/sync_core.h>
+#include <linux/sched/coredump.h>
/*
* Routines for handling mm_structs
@@ -186,6 +187,10 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
+unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5d4b448fdc50..405bab0a560c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3520,7 +3520,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
#else
addr = 0UL;
#endif
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
#else /* !CONFIG_MMU */
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 86571e760dd6..74d566dcd2cb 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -314,7 +314,7 @@ static unsigned long arena_get_unmapped_area(struct file *filp, unsigned long ad
return -EINVAL;
}
- ret = current->mm->get_unmapped_area(filp, addr, len * 2, 0, flags);
+ ret = mm_get_unmapped_area(current->mm, filp, addr, len * 2, 0, flags);
if (IS_ERR_VALUE(ret))
return ret;
if ((ret >> 32) == ((ret + len - 1) >> 32))
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..dead5e1977d8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -980,7 +980,7 @@ static unsigned long bpf_get_unmapped_area(struct file *filp, unsigned long addr
if (map->ops->map_get_unmapped_area)
return map->ops->map_get_unmapped_area(filp, addr, len, pgoff, flags);
#ifdef CONFIG_MMU
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
#else
return addr;
#endif
diff --git a/mm/debug.c b/mm/debug.c
index c1c1a6a484e4..37a17f77df9f 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -180,9 +180,6 @@ EXPORT_SYMBOL(dump_vma);
void dump_mm(const struct mm_struct *mm)
{
pr_emerg("mm %px task_size %lu\n"
-#ifdef CONFIG_MMU
- "get_unmapped_area %px\n"
-#endif
"mmap_base %lu mmap_legacy_base %lu\n"
"pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
@@ -208,9 +205,6 @@ void dump_mm(const struct mm_struct *mm)
"def_flags: %#lx(%pGv)\n",
mm, mm->task_size,
-#ifdef CONFIG_MMU
- mm->get_unmapped_area,
-#endif
mm->mmap_base, mm->mmap_legacy_base,
mm->pgd, atomic_read(&mm->mm_users),
atomic_read(&mm->mm_count),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..cede9ccb84dc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -824,8 +824,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = current->mm->get_unmapped_area(filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -843,8 +843,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
off_sub = (off - ret) & (size - 1);
- if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
- !off_sub)
+ if (test_bit(MMF_TOPDOWN, ¤t->mm->flags) && !off_sub)
return ret + size;
ret += off_sub;
@@ -861,7 +860,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
if (ret)
return ret;
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index 6dbda99a47da..224e9ce1e2fd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1813,7 +1813,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
+ unsigned long, unsigned long, unsigned long)
+ = NULL;
unsigned long error = arch_mmap_check(addr, len, flags);
if (error)
@@ -1823,7 +1824,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
if (file) {
if (file->f_op->get_unmapped_area)
get_area = file->f_op->get_unmapped_area;
@@ -1842,7 +1842,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (!file)
pgoff = 0;
- addr = get_area(file, addr, len, pgoff, flags);
+ if (get_area)
+ addr = get_area(file, addr, len, pgoff, flags);
+ else
+ addr = mm_get_unmapped_area(current->mm, file, addr, len,
+ pgoff, flags);
if (IS_ERR_VALUE(addr))
return addr;
@@ -1857,6 +1861,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
EXPORT_SYMBOL(get_unmapped_area);
+unsigned long
+mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
+ return arch_get_unmapped_area(file, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL(mm_get_unmapped_area);
+
/**
* find_vma_intersection() - Look up the first VMA which intersects the interval
* @mm: The process address space.
diff --git a/mm/shmem.c b/mm/shmem.c
index 0aad0d9a621b..4078c3a1b2d0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2273,8 +2273,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long uaddr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- unsigned long (*get_area)(struct file *,
- unsigned long, unsigned long, unsigned long, unsigned long);
unsigned long addr;
unsigned long offset;
unsigned long inflated_len;
@@ -2284,8 +2282,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
- addr = get_area(file, uaddr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, file, uaddr, len, pgoff,
+ flags);
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
return addr;
@@ -2342,7 +2340,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (inflated_len < len)
return addr;
- inflated_addr = get_area(NULL, uaddr, inflated_len, 0, flags);
+ inflated_addr = mm_get_unmapped_area(current->mm, NULL, uaddr,
+ inflated_len, 0, flags);
if (IS_ERR_VALUE(inflated_addr))
return addr;
if (inflated_addr & ~PAGE_MASK)
@@ -4807,7 +4806,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#endif
diff --git a/mm/util.c b/mm/util.c
index 669397235787..8619d353a1aa 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -469,17 +469,17 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
#elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
mm->mmap_base = TASK_UNMAPPED_BASE;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
}
#endif
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Add a THP implementations of the vm_flags variant of get_unmapped_area().
Future changes will call this from mmap.c in the do_mmap() path to allow
shadow stacks to be placed with consideration taken for the start guard
gap. Shadow stack memory is always private and anonymous and so special
guard gap logic is not needed in a lot of caseis, but it can be mapped by
THP, so needs to be handled.
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
include/linux/huge_mm.h | 11 +++++++++++
mm/huge_memory.c | 23 ++++++++++++++++-------
mm/mmap.c | 12 +++++++-----
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..cc599de5e397 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -262,6 +262,9 @@ unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags);
void folio_prep_large_rmappable(struct folio *folio);
bool can_split_folio(struct folio *folio, int *pextra_pins);
@@ -417,6 +420,14 @@ static inline void folio_prep_large_rmappable(struct folio *folio) {}
#define thp_get_unmapped_area NULL
+static inline unsigned long
+thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return 0;
+}
+
static inline bool
can_split_folio(struct folio *folio, int *pextra_pins)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cede9ccb84dc..b29f3e456888 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -808,7 +808,8 @@ static inline bool is_transparent_hugepage(struct folio *folio)
static unsigned long __thp_get_unmapped_area(struct file *filp,
unsigned long addr, unsigned long len,
- loff_t off, unsigned long flags, unsigned long size)
+ loff_t off, unsigned long flags, unsigned long size,
+ vm_flags_t vm_flags)
{
loff_t off_end = off + len;
loff_t off_align = round_up(off, size);
@@ -824,8 +825,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area_vmflags(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags, vm_flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -850,17 +851,25 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
return ret;
}
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
{
unsigned long ret;
loff_t off = (loff_t)pgoff << PAGE_SHIFT;
- ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
+ ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
if (ret)
return ret;
- return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags,
+ vm_flags);
+}
+
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index 68b5bfcebadd..f734e4fa6d94 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1861,20 +1861,22 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
* so use shmem's get_unmapped_area in case it can be huge.
*/
get_area = shmem_get_unmapped_area;
- } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
- /* Ensures that larger anonymous mappings are THP aligned. */
- get_area = thp_get_unmapped_area;
}
/* Always treat pgoff as zero for anonymous memory. */
if (!file)
pgoff = 0;
- if (get_area)
+ if (get_area) {
addr = get_area(file, addr, len, pgoff, flags);
- else
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ /* Ensures that larger anonymous mappings are THP aligned. */
+ addr = thp_get_unmapped_area_vmflags(file, addr, len,
+ pgoff, flags, vm_flags);
+ } else {
addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
pgoff, flags, vm_flags);
+ }
if (IS_ERR_VALUE(addr))
return addr;
--
2.34.1
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initiators.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to be reduce the chance of bugs via uninitialized members, instead
simply continue the process to initialize the struct this way tree wide.
This will zero any unspecified members. Move the member initializers to the
struct declaration when they are known at that time. Leave the members out
that were manually initialized to zero, as this would be redundant for
designated initializers.
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Guo Ren <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help shadow stack guard gaps. Currently, there is only one
arch with shadow stacks, but two more are in progress. It is compile tested
only.
There was further discussion that this method of initializing the structs
while nice in some ways has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was reviewed my arch
maintainers already, leave it as was already acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/[email protected]/
---
arch/csky/abiv1/mmap.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/arch/csky/abiv1/mmap.c b/arch/csky/abiv1/mmap.c
index 6792aca49999..7f826331d409 100644
--- a/arch/csky/abiv1/mmap.c
+++ b/arch/csky/abiv1/mmap.c
@@ -28,7 +28,12 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int do_align = 0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .length = len,
+ .low_limit = mm->mmap_base,
+ .high_limit = TASK_SIZE,
+ .align_offset = pgoff << PAGE_SHIFT
+ };
/*
* We only need to do colour alignment if either the I or D
@@ -61,11 +66,6 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
- info.length = len;
- info.low_limit = mm->mmap_base;
- info.high_limit = TASK_SIZE;
info.align_mask = do_align ? (PAGE_MASK & (SHMLBA - 1)) : 0;
- info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Use mm_get_unmapped_area_vmflags() in the do_mmap() so future changes
can cause shadow stack mappings to be placed with a guard gap. Also use
the THP variant that takes vm_flags, such that THP shadow stack can get the
same treatment. Adjust the vm_flags calculation to happen earlier so that
the vm_flags can be passed into __get_unmapped_area().
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
v4:
- Split removal of get_unmapped_area() export into a separate patch
(Christophe Leroy)
v2:
- Make get_unmapped_area() a static inline (Kirill)
---
include/linux/mm.h | 11 ++++++++++-
mm/mmap.c | 32 ++++++++++++++++----------------
2 files changed, 26 insertions(+), 17 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0436b919f1c7..8b13cd891b53 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3383,7 +3383,16 @@ extern int install_special_mapping(struct mm_struct *mm,
unsigned long randomize_stack_top(unsigned long stack_top);
unsigned long randomize_page(unsigned long start, unsigned long range);
-extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+unsigned long
+__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags);
+
+static inline unsigned long
+get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
+}
extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index d160e88b1b1e..68b5bfcebadd 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1255,18 +1255,6 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;
- /* Obtain the address to map to. we verify (or select) it and ensure
- * that it represents a valid section of the address space.
- */
- addr = get_unmapped_area(file, addr, len, pgoff, flags);
- if (IS_ERR_VALUE(addr))
- return addr;
-
- if (flags & MAP_FIXED_NOREPLACE) {
- if (find_vma_intersection(mm, addr, addr + len))
- return -EEXIST;
- }
-
if (prot == PROT_EXEC) {
pkey = execute_only_pkey(mm);
if (pkey < 0)
@@ -1280,6 +1268,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+ /* Obtain the address to map to. we verify (or select) it and ensure
+ * that it represents a valid section of the address space.
+ */
+ addr = __get_unmapped_area(file, addr, len, pgoff, flags, vm_flags);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ if (flags & MAP_FIXED_NOREPLACE) {
+ if (find_vma_intersection(mm, addr, addr + len))
+ return -EEXIST;
+ }
+
if (flags & MAP_LOCKED)
if (!can_do_mlock())
return -EPERM;
@@ -1837,8 +1837,8 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
}
unsigned long
-get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags)
+__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
unsigned long, unsigned long, unsigned long)
@@ -1873,8 +1873,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (get_area)
addr = get_area(file, addr, len, pgoff, flags);
else
- addr = mm_get_unmapped_area(current->mm, file, addr, len,
- pgoff, flags);
+ addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
+ pgoff, flags, vm_flags);
if (IS_ERR_VALUE(addr))
return addr;
--
2.34.1
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initiators.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to be reduce the chance of bugs via uninitialized members, instead
simply continue the process to initialize the struct this way tree wide.
This will zero any unspecified members. Move the member initializers to the
struct declaration when they are known at that time. Leave the members out
that were manually initialized to zero, as this would be redundant for
designated initializers.
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
Acked-by: Helge Deller <[email protected]>
Cc: "James E.J. Bottomley" <[email protected]>
Cc: Helge Deller <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help shadow stack guard gaps. Currently, there is only one
arch with shadow stacks, but two more are in progress. It is compile tested
only.
There was further discussion that this method of initializing the structs
while nice in some ways has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was reviewed my arch
maintainers already, leave it as was already acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/[email protected]/
---
arch/parisc/kernel/sys_parisc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/parisc/kernel/sys_parisc.c b/arch/parisc/kernel/sys_parisc.c
index 98af719d5f85..f7722451276e 100644
--- a/arch/parisc/kernel/sys_parisc.c
+++ b/arch/parisc/kernel/sys_parisc.c
@@ -104,7 +104,9 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma, *prev;
unsigned long filp_pgoff;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .length = len
+ };
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -139,7 +141,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
return addr;
}
- info.length = len;
info.align_mask = do_color_align ? (PAGE_MASK & (SHM_COLOUR - 1)) : 0;
info.align_offset = shared_align_offset(filp_pgoff, pgoff);
@@ -160,7 +161,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = mmap_upper_limit(NULL);
return vm_unmapped_area(&info);
--
2.34.1
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new members are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed, and a working
consensus (see links) was that in general the best way to accomplish this
would be via static initialization with designated member initiators.
Having some struct vm_unmapped_area_info instances not zero initialized
will put those sites at risk of feeding garbage into vm_unmapped_area() if
the convention is to zero initialize the struct and any new member addition
misses a call site that initializes each member manually.
It could be possible to leave the code mostly untouched, and just change
the line:
struct vm_unmapped_area_info info
to:
struct vm_unmapped_area_info info = {};
However, that would leave cleanup for the members that are manually set
to zero, as it would no longer be required.
So to be reduce the chance of bugs via uninitialized members, instead
simply continue the process to initialize the struct this way tree wide.
This will zero any unspecified members. Move the member initializers to the
struct declaration when they are known at that time. Leave the members out
that were manually initialized to zero, as this would be redundant for
designated initializers.
Signed-off-by: Rick Edgecombe <[email protected]>
Acked-by: Michael Ellerman <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Naveen N. Rao <[email protected]>
Cc: [email protected]
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
---
v4:
- Remove designated zero initialization (Christophe Leroy)
v3:
- Fixed spelling errors in log
- Be consistent about field vs member in log
Hi,
This patch was split and refactored out of a tree-wide change [0] to just
zero-init each struct vm_unmapped_area_info. The overall goal of the
series is to help shadow stack guard gaps. Currently, there is only one
arch with shadow stacks, but two more are in progress. It is compile tested
only.
There was further discussion that this method of initializing the structs
while nice in some ways has a greater risk of introducing bugs in some of
the more complicated callers. Since this version was reviewed my arch
maintainers already, leave it as was already acknowledged.
Thanks,
Rick
[0] https://lore.kernel.org/lkml/[email protected]/
---
arch/powerpc/mm/book3s64/slice.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/slice.c b/arch/powerpc/mm/book3s64/slice.c
index c0b58afb9a47..ef3ce37f1bb3 100644
--- a/arch/powerpc/mm/book3s64/slice.c
+++ b/arch/powerpc/mm/book3s64/slice.c
@@ -282,12 +282,10 @@ static unsigned long slice_find_area_bottomup(struct mm_struct *mm,
{
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, next_end;
- struct vm_unmapped_area_info info;
-
- info.flags = 0;
- info.length = len;
- info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
- info.align_offset = 0;
+ struct vm_unmapped_area_info info = {
+ .length = len,
+ .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+ };
/*
* Check till the allow max value for this mmap request
*/
@@ -326,13 +324,13 @@ static unsigned long slice_find_area_topdown(struct mm_struct *mm,
{
int pshift = max_t(int, mmu_psize_defs[psize].shift, PAGE_SHIFT);
unsigned long found, prev;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {
+ .flags = VM_UNMAPPED_AREA_TOPDOWN,
+ .length = len,
+ .align_mask = PAGE_MASK & ((1ul << pshift) - 1),
+ };
unsigned long min_addr = max(PAGE_SIZE, mmap_min_addr);
- info.flags = VM_UNMAPPED_AREA_TOPDOWN;
- info.length = len;
- info.align_mask = PAGE_MASK & ((1ul << pshift) - 1);
- info.align_offset = 0;
/*
* If we are trying to allocate above DEFAULT_MAP_WINDOW
* Add the different to the mmap_base.
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Now that the vm_flags is passed into the arch get_unmapped_area()'s, and
vm_unmapped_area() is ready to consider it, have VM_SHADOW_STACK's get
guard gap consideration for scenario 2.
Signed-off-by: Rick Edgecombe <[email protected]>
---
arch/x86/kernel/sys_x86_64.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 75966afb6251..01d7cd85ef97 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -112,6 +112,14 @@ static void find_start_end(unsigned long addr, unsigned long flags,
*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
}
+static inline unsigned long stack_guard_placement(vm_flags_t vm_flags)
+{
+ if (vm_flags & VM_SHADOW_STACK)
+ return PAGE_SIZE;
+
+ return 0;
+}
+
unsigned long
arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
@@ -141,6 +149,7 @@ arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned l
info.low_limit = begin;
info.high_limit = end;
info.align_offset = pgoff << PAGE_SHIFT;
+ info.start_gap = stack_guard_placement(vm_flags);
if (filp) {
info.align_mask = get_align_mask();
info.align_offset += get_align_bits();
@@ -190,6 +199,7 @@ arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
info.low_limit = PAGE_SIZE;
info.high_limit = get_mmap_base(0);
+ info.start_gap = stack_guard_placement(vm_flags);
/*
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
--
2.34.1
Future changes will need to add a new member to struct
vm_unmapped_area_info. This would cause trouble for any call site that
doesn't initialize the struct. Currently every caller sets each member
manually, so if new ones are added they will be uninitialized and the
core code parsing the struct will see garbage in the new member.
It could be possible to initialize the new member manually to 0 at each
call site. This and a couple other options were discussed. Having some
struct vm_unmapped_area_info instances not zero initialized will put
those sites at risk of feeding garbage into vm_unmapped_area(), if the
convention is to zero initialize the struct and any new field addition
missed a call site that initializes each field manually. So it is
useful to do things similar across the kernel.
The consensus (see links) was that in general the best way to accomplish
taking into account both code cleanliness and minimizing the chance of
introducing bugs, was to do C99 static initialization. As in:
struct vm_unmapped_area_info info = {};
With this method of initialization, the whole struct will be zero
initialized, and any statements setting fields to zero will be unneeded.
The change should not leave cleanup at the call sides.
While iterating though the possible solutions a few archs kindly acked
other variations that still zero initialized the struct. These sites have
been modified in previous changes using the pattern acked by the respective
arch.
So to be reduce the chance of bugs via uninitialized fields, perform a
tree wide change using the consensus for the best general way to do this
change. Use C99 static initializing to zero the struct and remove and
statements that simply set members to zero.
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: https://lore.kernel.org/lkml/202402280912.33AEE7A9CF@keescook/#t
Link: https://lore.kernel.org/lkml/j7bfvig3gew3qruouxrh7z7ehjjafrgkbcmg6tcghhfh3rhmzi@wzlcoecgy5rs/
Link: https://lore.kernel.org/lkml/[email protected]/
---
v4:
- Trivial rebase conflict in s390
Hi archs,
For some context, this is part of a larger series to improve shadow stack
guard gaps. It involves plumbing a new field via
struct vm_unmapped_area_info. The first user is x86, but arm and riscv may
likely use it as well. The change is compile tested only for non-x86.
Thanks,
Rick
---
arch/alpha/kernel/osf_sys.c | 5 +----
arch/arc/mm/mmap.c | 4 +---
arch/arm/mm/mmap.c | 5 ++---
arch/loongarch/mm/mmap.c | 3 +--
arch/mips/mm/mmap.c | 3 +--
arch/s390/mm/hugetlbpage.c | 7 ++-----
arch/s390/mm/mmap.c | 5 ++---
arch/sh/mm/mmap.c | 5 ++---
arch/sparc/kernel/sys_sparc_32.c | 3 +--
arch/sparc/kernel/sys_sparc_64.c | 5 ++---
arch/sparc/mm/hugetlbpage.c | 7 ++-----
arch/x86/kernel/sys_x86_64.c | 7 ++-----
arch/x86/mm/hugetlbpage.c | 7 ++-----
fs/hugetlbfs/inode.c | 7 ++-----
mm/mmap.c | 9 ++-------
15 files changed, 25 insertions(+), 57 deletions(-)
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 5db88b627439..e5f881bc8288 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -1218,14 +1218,11 @@ static unsigned long
arch_get_unmapped_area_1(unsigned long addr, unsigned long len,
unsigned long limit)
{
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = addr;
info.high_limit = limit;
- info.align_mask = 0;
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
diff --git a/arch/arc/mm/mmap.c b/arch/arc/mm/mmap.c
index 3c1c7ae73292..69a915297155 100644
--- a/arch/arc/mm/mmap.c
+++ b/arch/arc/mm/mmap.c
@@ -27,7 +27,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We enforce the MAP_FIXED case.
@@ -51,11 +51,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
return vm_unmapped_area(&info);
}
diff --git a/arch/arm/mm/mmap.c b/arch/arm/mm/mmap.c
index a0f8a0ca0788..d65d0e6ed10a 100644
--- a/arch/arm/mm/mmap.c
+++ b/arch/arm/mm/mmap.c
@@ -34,7 +34,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct vm_area_struct *vma;
int do_align = 0;
int aliasing = cache_is_vipt_aliasing();
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We only need to do colour alignment if either the I or D
@@ -68,7 +68,6 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
@@ -87,7 +86,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long addr = addr0;
int do_align = 0;
int aliasing = cache_is_vipt_aliasing();
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/*
* We only need to do colour alignment if either the I or D
diff --git a/arch/loongarch/mm/mmap.c b/arch/loongarch/mm/mmap.c
index a9630a81b38a..4bbd449b4a47 100644
--- a/arch/loongarch/mm/mmap.c
+++ b/arch/loongarch/mm/mmap.c
@@ -24,7 +24,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -82,7 +82,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
return vm_unmapped_area(&info);
diff --git a/arch/mips/mm/mmap.c b/arch/mips/mm/mmap.c
index 00fe90c6db3e..7e11d7b58761 100644
--- a/arch/mips/mm/mmap.c
+++ b/arch/mips/mm/mmap.c
@@ -34,7 +34,7 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
struct vm_area_struct *vma;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (unlikely(len > TASK_SIZE))
return -ENOMEM;
@@ -92,7 +92,6 @@ static unsigned long arch_get_unmapped_area_common(struct file *filp,
*/
}
- info.flags = 0;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
return vm_unmapped_area(&info);
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 219d906fe830..46de7a4c0309 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -258,14 +258,12 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
info.high_limit = TASK_SIZE;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -274,7 +272,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
unsigned long addr;
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
@@ -282,7 +280,6 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
info.low_limit = PAGE_SIZE;
info.high_limit = current->mm->mmap_base;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index 6b2e4436ad4a..206756946589 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -86,7 +86,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (len > TASK_SIZE - mmap_min_addr)
return -ENOMEM;
@@ -102,7 +102,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
goto check_asce_limit;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = TASK_SIZE;
@@ -122,7 +121,7 @@ unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long ad
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* requested length too big for entire address space */
if (len > TASK_SIZE - mmap_min_addr)
diff --git a/arch/sh/mm/mmap.c b/arch/sh/mm/mmap.c
index b82199878b45..bee329d4149a 100644
--- a/arch/sh/mm/mmap.c
+++ b/arch/sh/mm/mmap.c
@@ -57,7 +57,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int do_colour_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -88,7 +88,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = TASK_SIZE;
@@ -106,7 +105,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
int do_colour_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
diff --git a/arch/sparc/kernel/sys_sparc_32.c b/arch/sparc/kernel/sys_sparc_32.c
index 082a551897ed..08a19727795c 100644
--- a/arch/sparc/kernel/sys_sparc_32.c
+++ b/arch/sparc/kernel/sys_sparc_32.c
@@ -41,7 +41,7 @@ SYSCALL_DEFINE0(getpagesize)
unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -59,7 +59,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
if (!addr)
addr = TASK_UNMAPPED_BASE;
- info.flags = 0;
info.length = len;
info.low_limit = addr;
info.high_limit = TASK_SIZE;
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 1dbf7211666e..d9c3b34ca744 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -93,7 +93,7 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
struct vm_area_struct * vma;
unsigned long task_size = TASK_SIZE;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (flags & MAP_FIXED) {
/* We do not accept a shared mapping if it would violate
@@ -126,7 +126,6 @@ unsigned long arch_get_unmapped_area(struct file *filp, unsigned long addr, unsi
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = min(task_size, VA_EXCLUDE_START);
@@ -154,7 +153,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long task_size = STACK_TOP32;
unsigned long addr = addr0;
int do_color_align;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* This should only ever run for 32-bit processes. */
BUG_ON(!test_thread_flag(TIF_32BIT));
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index 38a1bef47efb..4caf56b32e26 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -31,17 +31,15 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *filp,
{
struct hstate *h = hstate_file(filp);
unsigned long task_size = TASK_SIZE;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
if (test_thread_flag(TIF_32BIT))
task_size = STACK_TOP32;
- info.flags = 0;
info.length = len;
info.low_limit = TASK_UNMAPPED_BASE;
info.high_limit = min(task_size, VA_EXCLUDE_START);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
if ((addr & ~PAGE_MASK) && task_size > VA_EXCLUDE_END) {
@@ -63,7 +61,7 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct hstate *h = hstate_file(filp);
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* This should only ever run for 32-bit processes. */
BUG_ON(!test_thread_flag(TIF_32BIT));
@@ -73,7 +71,6 @@ hugetlb_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
info.low_limit = PAGE_SIZE;
info.high_limit = mm->mmap_base;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index cb9fa1d5c66f..96b9d29aead0 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -118,7 +118,7 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
unsigned long begin, end;
if (flags & MAP_FIXED)
@@ -137,11 +137,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = begin;
info.high_limit = end;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
info.align_mask = get_align_mask();
@@ -158,7 +156,7 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
unsigned long addr = addr0;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
/* requested length too big for entire address space */
if (len > TASK_SIZE)
@@ -203,7 +201,6 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
if (addr > DEFAULT_MAP_WINDOW && !in_32bit_syscall())
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
- info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
if (filp) {
info.align_mask = get_align_mask();
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 6d77c0039617..fb600949a355 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -51,9 +51,8 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = get_mmap_base(1);
@@ -65,7 +64,6 @@ static unsigned long hugetlb_get_unmapped_area_bottomup(struct file *file,
task_size_32bit() : task_size_64bit(addr > DEFAULT_MAP_WINDOW);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -74,7 +72,7 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
@@ -89,7 +87,6 @@ static unsigned long hugetlb_get_unmapped_area_topdown(struct file *file,
info.high_limit += TASK_SIZE_MAX - DEFAULT_MAP_WINDOW;
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 3dee18bf47ed..2f4e88552d3f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -176,14 +176,12 @@ hugetlb_get_unmapped_area_bottomup(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
- info.flags = 0;
info.length = len;
info.low_limit = current->mm->mmap_base;
info.high_limit = arch_get_mmap_end(addr, len, flags);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -192,14 +190,13 @@ hugetlb_get_unmapped_area_topdown(struct file *file, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags)
{
struct hstate *h = hstate_file(file);
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
info.flags = VM_UNMAPPED_AREA_TOPDOWN;
info.length = len;
info.low_limit = PAGE_SIZE;
info.high_limit = arch_get_mmap_base(addr, current->mm->mmap_base);
info.align_mask = PAGE_MASK & ~huge_page_mask(h);
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
diff --git a/mm/mmap.c b/mm/mmap.c
index f734e4fa6d94..609c087bba8e 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1705,7 +1705,7 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
if (len > mmap_end - mmap_min_addr)
@@ -1723,12 +1723,9 @@ generic_get_unmapped_area(struct file *filp, unsigned long addr,
return addr;
}
- info.flags = 0;
info.length = len;
info.low_limit = mm->mmap_base;
info.high_limit = mmap_end;
- info.align_mask = 0;
- info.align_offset = 0;
return vm_unmapped_area(&info);
}
@@ -1753,7 +1750,7 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
{
struct vm_area_struct *vma, *prev;
struct mm_struct *mm = current->mm;
- struct vm_unmapped_area_info info;
+ struct vm_unmapped_area_info info = {};
const unsigned long mmap_end = arch_get_mmap_end(addr, len, flags);
/* requested length too big for entire address space */
@@ -1777,8 +1774,6 @@ generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
info.length = len;
info.low_limit = PAGE_SIZE;
info.high_limit = arch_get_mmap_base(addr, mm->mmap_base);
- info.align_mask = 0;
- info.align_offset = 0;
addr = vm_unmapped_area(&info);
/*
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
For MAP_GROWSDOWN/VM_GROWSDOWN and MAP_GROWSUP/VM_GROWSUP this has not
been a problem in practice because applications place these kinds of
mappings very early, when there is not many mappings to find a space
between. But for shadow stacks, they may be placed throughout the lifetime
of the application.
Use the start_gap field to find a space that includes the guard gap for
the new mapping. Take care to not interfere with the alignment.
Signed-off-by: Rick Edgecombe <[email protected]>
Reviewed-by: Christophe Leroy <[email protected]>
---
v3:
- Spelling fix in comment
v2:
- Remove VM_UNMAPPED_START_GAP_SET and have struct vm_unmapped_area_info
initialized with zeros (in another patch). (Kirill)
- Drop unrelated space change (Kirill)
- Add comment around interactions of alignment and start gap step
(Kirill)
---
include/linux/mm.h | 1 +
mm/mmap.c | 12 +++++++++---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8b13cd891b53..5c7f75edfde1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3438,6 +3438,7 @@ struct vm_unmapped_area_info {
unsigned long high_limit;
unsigned long align_mask;
unsigned long align_offset;
+ unsigned long start_gap;
};
extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
diff --git a/mm/mmap.c b/mm/mmap.c
index 609c087bba8e..2d9e7a999774 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1580,7 +1580,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + info->start_gap;
if (length < info->length)
return -ENOMEM;
@@ -1592,7 +1592,13 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
if (mas_empty_area(&mas, low_limit, high_limit - 1, length))
return -ENOMEM;
- gap = mas.index;
+ /*
+ * Adjust for the gap first so it doesn't interfere with the
+ * later alignment. The first step is the minimum needed to
+ * fulill the start gap, the next steps is the minimum to align
+ * that. It is the minimum needed to fulill both.
+ */
+ gap = mas.index + info->start_gap;
gap += (info->align_offset - gap) & info->align_mask;
tmp = mas_next(&mas, ULONG_MAX);
if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
@@ -1631,7 +1637,7 @@ static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + info->start_gap;
if (length < info->length)
return -ENOMEM;
--
2.34.1
The existing shadow stack test for guard gaps just checks that new
mappings are not placed in an existing mapping's guard gap. Add one that
checks that new mappings are not placed such that preexisting mappings are
in the new mappings guard gap.
Signed-off-by: Rick Edgecombe <[email protected]>
---
.../testing/selftests/x86/test_shadow_stack.c | 67 +++++++++++++++++--
1 file changed, 63 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 757e6527f67e..ee909a7927f9 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -556,7 +556,7 @@ struct node {
* looked at the shadow stack gaps.
* 5. See if it landed in the gap.
*/
-int test_guard_gap(void)
+int test_guard_gap_other_gaps(void)
{
void *free_area, *shstk, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
struct node *head = NULL, *cur;
@@ -593,11 +593,64 @@ int test_guard_gap(void)
if (shstk - test_map - PAGE_SIZE != PAGE_SIZE)
return 1;
- printf("[OK]\tGuard gap test\n");
+ printf("[OK]\tGuard gap test, other mapping's gaps\n");
return 0;
}
+/* Tests respecting the guard gap of the mapping getting placed */
+int test_guard_gap_new_mappings_gaps(void)
+{
+ void *free_area, *shstk_start, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
+ struct node *head = NULL, *cur;
+ int ret = 0;
+
+ free_area = mmap(0, PAGE_SIZE * 4, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ munmap(free_area, PAGE_SIZE * 4);
+
+ /* Test letting map_shadow_stack find a free space */
+ shstk_start = mmap(free_area, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (shstk_start == MAP_FAILED || shstk_start != free_area)
+ return 1;
+
+ while (test_map > shstk_start) {
+ test_map = (void *)syscall(__NR_map_shadow_stack, 0, PAGE_SIZE, 0);
+ if (test_map == MAP_FAILED) {
+ printf("[INFO]\tmap_shadow_stack MAP_FAILED\n");
+ ret = 1;
+ break;
+ }
+
+ cur = malloc(sizeof(*cur));
+ cur->mapping = test_map;
+
+ cur->next = head;
+ head = cur;
+
+ if (test_map == free_area + PAGE_SIZE) {
+ printf("[INFO]\tNew mapping has other mapping in guard gap!\n");
+ ret = 1;
+ break;
+ }
+ }
+
+ while (head) {
+ cur = head;
+ head = cur->next;
+ munmap(cur->mapping, PAGE_SIZE);
+ free(cur);
+ }
+
+ munmap(shstk_start, PAGE_SIZE);
+
+ if (!ret)
+ printf("[OK]\tGuard gap test, placement mapping's gaps\n");
+
+ return ret;
+}
+
/*
* Too complicated to pull it out of the 32 bit header, but also get the
* 64 bit one needed above. Just define a copy here.
@@ -850,9 +903,15 @@ int main(int argc, char *argv[])
goto out;
}
- if (test_guard_gap()) {
+ if (test_guard_gap_other_gaps()) {
ret = 1;
- printf("[FAIL]\tGuard gap test\n");
+ printf("[FAIL]\tGuard gap test, other mappings' gaps\n");
+ goto out;
+ }
+
+ if (test_guard_gap_new_mappings_gaps()) {
+ ret = 1;
+ printf("[FAIL]\tGuard gap test, placement mapping's gaps\n");
goto out;
}
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Add x86 arch implementations of arch_get_unmapped_area_vmflags/_topdown()
so future changes can allow the guard gap of type of vma being placed to
be taken into account. This will be used for shadow stack memory.
Signed-off-by: Rick Edgecombe <[email protected]>
---
v3:
- Commit log grammar
v2:
- Remove unnecessary added extern
---
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/sys_x86_64.c | 25 ++++++++++++++++++++-----
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 7e9db77231ac..3c4407271d08 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -245,6 +245,7 @@ extern void cleanup_highmap(void);
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#define HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index 96b9d29aead0..75966afb6251 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -113,8 +113,8 @@ static void find_start_end(unsigned long addr, unsigned long flags,
}
unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -149,9 +149,9 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
}
unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
- const unsigned long len, const unsigned long pgoff,
- const unsigned long flags)
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
@@ -220,3 +220,18 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
*/
return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
}
+
+unsigned long
+arch_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
+}
+
+unsigned long
+arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
+ const unsigned long len, const unsigned long pgoff,
+ const unsigned long flags)
+{
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
+}
--
2.34.1
On Mon, Mar 25, 2024 at 7:17 PM Rick Edgecombe
<[email protected]> wrote:
>
>
> diff --git a/mm/util.c b/mm/util.c
> index 669397235787..8619d353a1aa 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -469,17 +469,17 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
>
> if (mmap_is_legacy(rlim_stack)) {
> mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> mm->mmap_base = mmap_base(random_factor, rlim_stack);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
> #elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
> void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> {
> mm->mmap_base = TASK_UNMAPPED_BASE;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> }
> #endif
Makes sense to me.
Acked-by: Alexei Starovoitov <[email protected]>
for the idea and for bpf bits.
On Tue Mar 26, 2024 at 4:16 AM EET, Rick Edgecombe wrote:
> The mm_struct contains a function pointer *get_unmapped_area(), which
> is set to either arch_get_unmapped_area() or
> arch_get_unmapped_area_topdown() during the initialization of the mm.
In which conditions which path is used during the initialization of mm
and why is this the case? It is an open claim in the current form.
That would be nice to have documented for the sake of being complete
description. I have zero doubts of the claim being untrue.
BR, Jarkko
On Tue, 2024-03-26 at 13:57 +0200, Jarkko Sakkinen wrote:
> In which conditions which path is used during the initialization of mm
> and why is this the case? It is an open claim in the current form.
There is an arch_pick_mmap_layout() that arch's can have their own rules for. There is also a
generic one. It gets called during exec.
>
> That would be nice to have documented for the sake of being complete
> description. I have zero doubts of the claim being untrue.
...being untrue?
Rick Edgecombe wrote:
> The mm_struct contains a function pointer *get_unmapped_area(), which
> is set to either arch_get_unmapped_area() or
> arch_get_unmapped_area_topdown() during the initialization of the mm.
>
> Since the function pointer only ever points to two functions that are named
> the same across all arch's, a function pointer is not really required. In
> addition future changes will want to add versions of the functions that
> take additional arguments. So to save a pointers worth of bytes in
> mm_struct, and prevent adding additional function pointers to mm_struct in
> future changes, remove it and keep the information about which
> get_unmapped_area() to use in a flag.
>
> Add the new flag to MMF_INIT_MASK so it doesn't get clobbered on fork by
> mmf_init_flags(). Most MM flags get clobbered on fork. In the pre-existing
> behavior mm->get_unmapped_area() would get copied to the new mm in
> dup_mm(), so not clobbering the flag preserves the existing behavior
> around inheriting the topdown-ness.
>
> Introduce a helper, mm_get_unmapped_area(), to easily convert code that
> refers to the old function pointer to instead select and call either
> arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
> flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
> get_unmapped_area() pointer in struct file_operations alone. The main
> purpose of this change is to reorganize in preparation for future changes,
> but it also converts the calls of mm->get_unmapped_area() from indirect
> branches into a direct ones.
>
> The stress-ng bigheap benchmark calls realloc a lot, which calls through
> get_unmapped_area() in the kernel. On x86, the change yielded a ~1%
> improvement there on a retpoline config.
>
> In testing a few x86 configs, removing the pointer unfortunately didn't
> result in any actual size reductions in the compiled layout of mm_struct.
> But depending on compiler or arch alignment requirements, the change could
> shrink the size of mm_struct.
>
> Signed-off-by: Rick Edgecombe <[email protected]>
> Acked-by: Dave Hansen <[email protected]>
> Acked-by: Liam R. Howlett <[email protected]>
> Reviewed-by: Kirill A. Shutemov <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> ---
> v4:
> - Split out pde_get_unmapped_area() refactor into separate patch
> (Christophe Leroy)
>
> v3:
> - Fix comment that still referred to mm->get_unmapped_area()
> - Resolve trivial rebase conflicts with "mm: thp_get_unmapped_area must
> honour topdown preference"
> - Spelling fix in log
>
> v2:
> - Fix comment on MMF_TOPDOWN (Kirill, rppt)
> - Move MMF_TOPDOWN to actually unused bit
> - Add MMF_TOPDOWN to MMF_INIT_MASK so it doesn't get clobbered on fork,
> and result in the children using the search up path.
> - New lower performance results after above bug fix
> - Add Reviews and Acks
> ---
> arch/s390/mm/hugetlbpage.c | 2 +-
> arch/s390/mm/mmap.c | 4 ++--
> arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
> arch/sparc/mm/hugetlbpage.c | 2 +-
> arch/x86/kernel/cpu/sgx/driver.c | 2 +-
> arch/x86/mm/hugetlbpage.c | 2 +-
> arch/x86/mm/mmap.c | 4 ++--
> drivers/char/mem.c | 2 +-
> drivers/dax/device.c | 6 +++---
> fs/hugetlbfs/inode.c | 4 ++--
> fs/proc/inode.c | 3 ++-
> fs/ramfs/file-mmu.c | 2 +-
> include/linux/mm_types.h | 6 +-----
> include/linux/sched/coredump.h | 5 ++++-
> include/linux/sched/mm.h | 5 +++++
> io_uring/io_uring.c | 2 +-
> kernel/bpf/arena.c | 2 +-
> kernel/bpf/syscall.c | 2 +-
> mm/debug.c | 6 ------
> mm/huge_memory.c | 9 ++++-----
> mm/mmap.c | 21 ++++++++++++++++++---
> mm/shmem.c | 11 +++++------
> mm/util.c | 6 +++---
> 23 files changed, 66 insertions(+), 57 deletions(-)
>
> diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
> index c2e8242bd15d..219d906fe830 100644
> --- a/arch/s390/mm/hugetlbpage.c
> +++ b/arch/s390/mm/hugetlbpage.c
> @@ -328,7 +328,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> goto check_asce_limit;
> }
>
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> addr = hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
> index b14fc0887654..6b2e4436ad4a 100644
> --- a/arch/s390/mm/mmap.c
> +++ b/arch/s390/mm/mmap.c
> @@ -185,10 +185,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> */
> if (mmap_is_legacy(rlim_stack)) {
> mm->mmap_base = mmap_base_legacy(random_factor);
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> mm->mmap_base = mmap_base(random_factor, rlim_stack);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
>
> diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
> index 1e9a9e016237..1dbf7211666e 100644
> --- a/arch/sparc/kernel/sys_sparc_64.c
> +++ b/arch/sparc/kernel/sys_sparc_64.c
> @@ -218,14 +218,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
> unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
> {
> unsigned long align_goal, addr = -ENOMEM;
> - unsigned long (*get_area)(struct file *, unsigned long,
> - unsigned long, unsigned long, unsigned long);
> -
> - get_area = current->mm->get_unmapped_area;
>
> if (flags & MAP_FIXED) {
> /* Ok, don't mess with it. */
> - return get_area(NULL, orig_addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
> }
> flags &= ~MAP_SHARED;
>
> @@ -238,7 +234,8 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
> align_goal = (64UL * 1024);
>
> do {
> - addr = get_area(NULL, orig_addr, len + (align_goal - PAGE_SIZE), pgoff, flags);
> + addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
> + len + (align_goal - PAGE_SIZE), pgoff, flags);
> if (!(addr & ~PAGE_MASK)) {
> addr = (addr + (align_goal - 1UL)) & ~(align_goal - 1UL);
> break;
> @@ -256,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
> * be obtained.
> */
> if (addr & ~PAGE_MASK)
> - addr = get_area(NULL, orig_addr, len, pgoff, flags);
> + addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
>
> return addr;
> }
> @@ -292,7 +289,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> gap == RLIM_INFINITY ||
> sysctl_legacy_va_layout) {
> mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> } else {
> /* We know it's 32-bit */
> unsigned long task_size = STACK_TOP32;
> @@ -303,7 +300,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> gap = (task_size / 6 * 5);
>
> mm->mmap_base = PAGE_ALIGN(task_size - gap - random_factor);
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
> }
> }
>
> diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
> index b432500c13a5..38a1bef47efb 100644
> --- a/arch/sparc/mm/hugetlbpage.c
> +++ b/arch/sparc/mm/hugetlbpage.c
> @@ -123,7 +123,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> (!vma || addr + len <= vm_start_gap(vma)))
> return addr;
> }
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
> index 262f5fb18d74..22b65a5f5ec6 100644
> --- a/arch/x86/kernel/cpu/sgx/driver.c
> +++ b/arch/x86/kernel/cpu/sgx/driver.c
> @@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
> if (flags & MAP_FIXED)
> return addr;
>
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> }
>
> #ifdef CONFIG_COMPAT
> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
> index 5804bbae4f01..6d77c0039617 100644
> --- a/arch/x86/mm/hugetlbpage.c
> +++ b/arch/x86/mm/hugetlbpage.c
> @@ -141,7 +141,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> }
>
> get_unmapped_area:
> - if (mm->get_unmapped_area == arch_get_unmapped_area)
> + if (!test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> pgoff, flags);
> else
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index c90c20904a60..a2cabb1c81e1 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -129,9 +129,9 @@ static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
> void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
> {
> if (mmap_is_legacy())
> - mm->get_unmapped_area = arch_get_unmapped_area;
> + clear_bit(MMF_TOPDOWN, &mm->flags);
> else
> - mm->get_unmapped_area = arch_get_unmapped_area_topdown;
> + set_bit(MMF_TOPDOWN, &mm->flags);
>
> arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
> arch_rnd(mmap64_rnd_bits), task_size_64bit(0),
> diff --git a/drivers/char/mem.c b/drivers/char/mem.c
> index 3c6670cf905f..9b80e622ae80 100644
> --- a/drivers/char/mem.c
> +++ b/drivers/char/mem.c
> @@ -544,7 +544,7 @@ static unsigned long get_unmapped_area_zero(struct file *file,
> }
>
> /* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> #else
> return -ENOSYS;
> #endif
> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
> index 93ebedc5ec8c..47c126d37b59 100644
> --- a/drivers/dax/device.c
> +++ b/drivers/dax/device.c
> @@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
> if ((off + len_align) < off)
> goto out;
>
> - addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
> - pgoff, flags);
> + addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
> + pgoff, flags);
> if (!IS_ERR_VALUE(addr_align)) {
> addr_align += (off - addr_align) & (align - 1);
> return addr_align;
> }
> out:
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
>
> static const struct address_space_operations dev_dax_aops = {
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 6502c7e776d1..3dee18bf47ed 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -249,11 +249,11 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> }
>
> /*
> - * Use mm->get_unmapped_area value as a hint to use topdown routine.
> + * Use MMF_TOPDOWN flag as a hint to use topdown routine.
> * If architectures have special needs, they should define their own
> * version of hugetlb_get_unmapped_area.
> */
> - if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
> + if (test_bit(MMF_TOPDOWN, &mm->flags))
> return hugetlb_get_unmapped_area_topdown(file, addr, len,
> pgoff, flags);
> return hugetlb_get_unmapped_area_bottomup(file, addr, len,
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 75396a24fd8c..d19434e2a58e 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -455,8 +455,9 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo
> return pde->proc_ops->proc_get_unmapped_area(file, orig_addr, len, pgoff, flags);
>
> #ifdef CONFIG_MMU
> - return current->mm->get_unmapped_area(file, orig_addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, orig_addr, len, pgoff, flags);
> #endif
> +
> return orig_addr;
> }
>
> diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
> index c7a1aa3c882b..b45c7edc3225 100644
> --- a/fs/ramfs/file-mmu.c
> +++ b/fs/ramfs/file-mmu.c
> @@ -35,7 +35,7 @@ static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
> unsigned long addr, unsigned long len, unsigned long pgoff,
> unsigned long flags)
> {
> - return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
> }
>
> const struct file_operations ramfs_file_operations = {
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5240bd7bca33..9313e43123d4 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -777,11 +777,7 @@ struct mm_struct {
> } ____cacheline_aligned_in_smp;
>
> struct maple_tree mm_mt;
> -#ifdef CONFIG_MMU
> - unsigned long (*get_unmapped_area) (struct file *filp,
> - unsigned long addr, unsigned long len,
> - unsigned long pgoff, unsigned long flags);
> -#endif
> +
> unsigned long mmap_base; /* base of mmap area */
> unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
> #ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 02f5090ffea2..e62ff805cfc9 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -92,9 +92,12 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_VM_MERGE_ANY 30
> #define MMF_VM_MERGE_ANY_MASK (1 << MMF_VM_MERGE_ANY)
>
> +#define MMF_TOPDOWN 31 /* mm searches top down by default */
> +#define MMF_TOPDOWN_MASK (1 << MMF_TOPDOWN)
> +
> #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK |\
> - MMF_VM_MERGE_ANY_MASK)
> + MMF_VM_MERGE_ANY_MASK | MMF_TOPDOWN_MASK)
>
> static inline unsigned long mmf_init_flags(unsigned long flags)
> {
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index b6543f9d78d6..ed1caa26c8be 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -8,6 +8,7 @@
> #include <linux/mm_types.h>
> #include <linux/gfp.h>
> #include <linux/sync_core.h>
> +#include <linux/sched/coredump.h>
>
> /*
> * Routines for handling mm_structs
> @@ -186,6 +187,10 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff,
> unsigned long flags);
>
> +unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
> + unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags);
> +
> unsigned long
> generic_get_unmapped_area(struct file *filp, unsigned long addr,
> unsigned long len, unsigned long pgoff,
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 5d4b448fdc50..405bab0a560c 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3520,7 +3520,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
> #else
> addr = 0UL;
> #endif
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
>
> #else /* !CONFIG_MMU */
> diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
> index 86571e760dd6..74d566dcd2cb 100644
> --- a/kernel/bpf/arena.c
> +++ b/kernel/bpf/arena.c
> @@ -314,7 +314,7 @@ static unsigned long arena_get_unmapped_area(struct file *filp, unsigned long ad
> return -EINVAL;
> }
>
> - ret = current->mm->get_unmapped_area(filp, addr, len * 2, 0, flags);
> + ret = mm_get_unmapped_area(current->mm, filp, addr, len * 2, 0, flags);
> if (IS_ERR_VALUE(ret))
> return ret;
> if ((ret >> 32) == ((ret + len - 1) >> 32))
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index ae2ff73bde7e..dead5e1977d8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -980,7 +980,7 @@ static unsigned long bpf_get_unmapped_area(struct file *filp, unsigned long addr
> if (map->ops->map_get_unmapped_area)
> return map->ops->map_get_unmapped_area(filp, addr, len, pgoff, flags);
> #ifdef CONFIG_MMU
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> #else
> return addr;
> #endif
> diff --git a/mm/debug.c b/mm/debug.c
> index c1c1a6a484e4..37a17f77df9f 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -180,9 +180,6 @@ EXPORT_SYMBOL(dump_vma);
> void dump_mm(const struct mm_struct *mm)
> {
> pr_emerg("mm %px task_size %lu\n"
> -#ifdef CONFIG_MMU
> - "get_unmapped_area %px\n"
> -#endif
> "mmap_base %lu mmap_legacy_base %lu\n"
> "pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
> "hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
> @@ -208,9 +205,6 @@ void dump_mm(const struct mm_struct *mm)
> "def_flags: %#lx(%pGv)\n",
>
> mm, mm->task_size,
> -#ifdef CONFIG_MMU
> - mm->get_unmapped_area,
> -#endif
> mm->mmap_base, mm->mmap_legacy_base,
> mm->pgd, atomic_read(&mm->mm_users),
> atomic_read(&mm->mm_count),
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9859aa4f7553..cede9ccb84dc 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -824,8 +824,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
> if (len_pad < len || (off + len_pad) < off)
> return 0;
>
> - ret = current->mm->get_unmapped_area(filp, addr, len_pad,
> - off >> PAGE_SHIFT, flags);
> + ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
> + off >> PAGE_SHIFT, flags);
>
> /*
> * The failure might be due to length padding. The caller will retry
> @@ -843,8 +843,7 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
>
> off_sub = (off - ret) & (size - 1);
>
> - if (current->mm->get_unmapped_area == arch_get_unmapped_area_topdown &&
> - !off_sub)
> + if (test_bit(MMF_TOPDOWN, ¤t->mm->flags) && !off_sub)
> return ret + size;
>
> ret += off_sub;
> @@ -861,7 +860,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> if (ret)
> return ret;
>
> - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
> + return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
> }
> EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 6dbda99a47da..224e9ce1e2fd 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1813,7 +1813,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> unsigned long pgoff, unsigned long flags)
> {
> unsigned long (*get_area)(struct file *, unsigned long,
> - unsigned long, unsigned long, unsigned long);
> + unsigned long, unsigned long, unsigned long)
> + = NULL;
>
> unsigned long error = arch_mmap_check(addr, len, flags);
> if (error)
> @@ -1823,7 +1824,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> if (len > TASK_SIZE)
> return -ENOMEM;
>
> - get_area = current->mm->get_unmapped_area;
> if (file) {
> if (file->f_op->get_unmapped_area)
> get_area = file->f_op->get_unmapped_area;
> @@ -1842,7 +1842,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> if (!file)
> pgoff = 0;
>
> - addr = get_area(file, addr, len, pgoff, flags);
> + if (get_area)
> + addr = get_area(file, addr, len, pgoff, flags);
> + else
> + addr = mm_get_unmapped_area(current->mm, file, addr, len,
> + pgoff, flags);
> if (IS_ERR_VALUE(addr))
> return addr;
>
> @@ -1857,6 +1861,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
> EXPORT_SYMBOL(get_unmapped_area);
>
> +unsigned long
> +mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> + unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags)
> +{
Seems like a small waste to have all call sites now need to push an
additional @mm argument onto the stack just to figure out what function
to call.
> + if (test_bit(MMF_TOPDOWN, &mm->flags))
> + return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
> + return arch_get_unmapped_area(file, addr, len, pgoff, flags);
This seems small enough to be amenable to a static inline, but that
would require exporting the arch calls which might be painful.
Would it not be possible to drop the @mm argument and just reference
current->mm internal to this function? Just call this funcion
current_get_unmapped_area()?
Otherwise looks like a useful change to me.
On Wed, 2024-03-27 at 02:42 +0000, Edgecombe, Rick P wrote:
> On Tue, 2024-03-26 at 13:57 +0200, Jarkko Sakkinen wrote:
> > In which conditions which path is used during the initialization of
> > mm
> > and why is this the case? It is an open claim in the current form.
>
> There is an arch_pick_mmap_layout() that arch's can have their own
> rules for. There is also a
> generic one. It gets called during exec.
>
> >
> > That would be nice to have documented for the sake of being
> > complete
> > description. I have zero doubts of the claim being untrue.
>
> ...being untrue?
>
I mean I believe the change itself makes sense, it is just not
fully documented in the commit message.
BR, Jarkko
On Tue, 2024-03-26 at 23:38 -0700, Dan Williams wrote:
> > +unsigned long
> > +mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
> > + unsigned long addr, unsigned long len,
> > + unsigned long pgoff, unsigned long flags)
> > +{
>
> Seems like a small waste to have all call sites now need to push an
> additional @mm argument onto the stack just to figure out what function
> to call.
>
> > + if (test_bit(MMF_TOPDOWN, &mm->flags))
> > + return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
> > + return arch_get_unmapped_area(file, addr, len, pgoff, flags);
>
> This seems small enough to be amenable to a static inline, but that
> would require exporting the arch calls which might be painful.
>
> Would it not be possible to drop the @mm argument and just reference
> current->mm internal to this function? Just call this funcion
> current_get_unmapped_area()?
Hmm, you are right. The callers all pass current->mm. The mm argument could be removed.
On Wed, 2024-03-27 at 15:15 +0200, Jarkko Sakkinen wrote:
> I mean I believe the change itself makes sense, it is just not
> fully documented in the commit message.
Ah, I see. Yes, there could be more background on arch_pick_mmap_layout().