Hi,
In working on x86’s shadow stack feature, I came across some limitations
around the kernel’s handling of guard gaps. AFAICT these limitations are
not too important for the traditional stack usage of guard gaps, but have
bigger impact on shadow stack’s usage. And now in addition to x86, we have
two other architectures implementing shadow stack like features that plan
to use guard gaps. I wanted to see about addressing them, but I have not
worked on mmap() placement related code before, so would greatly
appreciate if people could take a look and point me in the right
direction.
The nature of the limitations of concern is as follows. In order to ensure
guard gaps between mappings, mmap() would need to consider two things:
1. That the new mapping isn’t placed in an any existing mapping’s guard
gap.
2. That the new mapping isn’t placed such that any existing mappings are
not in *its* guard gaps
Currently mmap never considers (2), and (1) is not considered in some
situations.
When not passing an address hint, or passing one without
MAP_FIXED_NOREPLACE, (1) is enforced. With MAP_FIXED_NOREPLACE, (1) is not
enforced. With MAP_FIXED, (1) is not considered, but this seems to be
expected since MAP_FIXED can already clobber existing mappings. For
MAP_FIXED_NOREPLACE I would have guessed it should respect the guard gaps
of existing mappings, but it is probably a little ambiguous.
In this RFC I just tried to add enforcement of (2) for the normal (no
address hint) case and only for the newer shadow stack memory (not
stacks). The reason is that with the no-address-hint situation, landing
next to a guard gap could come up naturally and so be more influencable by
attackers such that two shadow stacks could be adjacent without a guard
gap. Where as the address-hint scenarios would require more control -
being able to call mmap() with specific arguments. As for why not just fix
the other corner cases anyway, I thought it might have some greater
possibility of affecting existing apps.
Thanks,
Rick
Rick Edgecombe (8):
mm: Switch mm->get_unmapped_area() to a flag
mm: Introduce arch_get_unmapped_area_vmflags()
mm: Use get_unmapped_area_vmflags()
thp: Add thp_get_unmapped_area_vmflags()
mm: Take placement mappings gap into account
x86/mm: Implement HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
x86/mm: Care about shadow stack guard gap during placement
selftests/x86: Add placement guard gap test for shstk
arch/s390/mm/hugetlbpage.c | 2 +-
arch/s390/mm/mmap.c | 4 +-
arch/sparc/kernel/sys_sparc_64.c | 15 +--
arch/sparc/mm/hugetlbpage.c | 2 +-
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/kernel/sys_x86_64.c | 43 +++++--
arch/x86/mm/hugetlbpage.c | 2 +-
arch/x86/mm/mmap.c | 4 +-
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +-
fs/hugetlbfs/inode.c | 2 +-
fs/proc/inode.c | 15 +--
fs/ramfs/file-mmu.c | 2 +-
include/linux/huge_mm.h | 11 ++
include/linux/mm.h | 4 +
include/linux/mm_types.h | 6 +-
include/linux/sched/coredump.h | 1 +
include/linux/sched/mm.h | 22 ++++
io_uring/io_uring.c | 2 +-
mm/debug.c | 6 -
mm/huge_memory.c | 23 ++--
mm/mmap.c | 108 ++++++++++++++----
mm/shmem.c | 11 +-
mm/util.c | 6 +-
.../testing/selftests/x86/test_shadow_stack.c | 67 ++++++++++-
26 files changed, 273 insertions(+), 96 deletions(-)
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
In order to take the start gap into account, the maple tree search needs
to know the size of start gap the new mapping will need. The call chain
from do_mmap() to the actual maple tree search looks like this:
do_mmap(size, vm_flags, map_flags, ..)
mm/mmap.c:get_unmapped_area(size, map_flags, ...)
arch_get_unmapped_area(size, map_flags, ...)
vm_unmapped_area(struct vm_unmapped_area_info)
One option would be to add another MAP_ flag to mean a one page start gap
(as is for shadow stack), but this consumes a flag unnecessarily. Another
option could be to simply increase the size passed in do_mmap() by the
start gap size, and adjust after the fact, but this will interfere with
the alignment requirements passed in struct vm_unmapped_area_info, and
unknown to mmap.c. Instead, introduce variants of
arch_get_unmapped_area/_topdown() that take vm_flags. In future changes,
these variants can be used in mmap.c:get_unmapped_area() to allow the
vm_flags to be passed through to vm_unmapped_area(), while preserving the
normal arch_get_unmapped_area/_topdown() for the existing callers.
Signed-off-by: Rick Edgecombe <[email protected]>
---
include/linux/sched/mm.h | 17 +++++++++++++++++
mm/mmap.c | 28 ++++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index cde946e926d8..7b44441865c5 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -191,6 +191,23 @@ unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags);
+extern unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags);
+extern unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t);
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm,
+ struct file *filp,
+ unsigned long addr,
+ unsigned long len,
+ unsigned long pgoff,
+ unsigned long flags,
+ vm_flags_t vm_flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index b61bc515c729..2021bc040e81 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1802,6 +1802,34 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
}
#endif
+#ifndef HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
+extern unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area(filp, addr, len, pgoff, flags);
+}
+
+extern unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return arch_get_unmapped_area_topdown(filp, addr, len, pgoff, flags);
+}
+#endif
+
+unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff,
+ flags, vm_flags);
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, vm_flags);
+}
+
unsigned long
get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Use mm_get_unmapped_area_vmflags() in the do_mmap() so future changes
can cause shadow stack mappings to be placed with a guard gap. Also use
the THP variant that takes vm_flags, such that THP shadow stack can get the
same treatment. Adjust the vm_flags calculation to happen earlier so that
the vm_flags can be passed into __get_unmapped_area().
Signed-off-by: Rick Edgecombe <[email protected]>
---
include/linux/mm.h | 2 ++
mm/mmap.c | 38 ++++++++++++++++++++++----------------
2 files changed, 24 insertions(+), 16 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index da5219b48d52..9addf16dbf18 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3352,6 +3352,8 @@ unsigned long randomize_stack_top(unsigned long stack_top);
unsigned long randomize_page(unsigned long start, unsigned long range);
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
+extern unsigned long __get_unmapped_area(struct file *, unsigned long, unsigned long,
+ unsigned long, unsigned long, vm_flags_t);
extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
diff --git a/mm/mmap.c b/mm/mmap.c
index 2021bc040e81..2723f26f7c62 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1249,18 +1249,6 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
if (mm->map_count > sysctl_max_map_count)
return -ENOMEM;
- /* Obtain the address to map to. we verify (or select) it and ensure
- * that it represents a valid section of the address space.
- */
- addr = get_unmapped_area(file, addr, len, pgoff, flags);
- if (IS_ERR_VALUE(addr))
- return addr;
-
- if (flags & MAP_FIXED_NOREPLACE) {
- if (find_vma_intersection(mm, addr, addr + len))
- return -EEXIST;
- }
-
if (prot == PROT_EXEC) {
pkey = execute_only_pkey(mm);
if (pkey < 0)
@@ -1274,6 +1262,18 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) |
mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC;
+ /* Obtain the address to map to. we verify (or select) it and ensure
+ * that it represents a valid section of the address space.
+ */
+ addr = __get_unmapped_area(file, addr, len, pgoff, flags, vm_flags);
+ if (IS_ERR_VALUE(addr))
+ return addr;
+
+ if (flags & MAP_FIXED_NOREPLACE) {
+ if (find_vma_intersection(mm, addr, addr + len))
+ return -EEXIST;
+ }
+
if (flags & MAP_LOCKED)
if (!can_do_mlock())
return -EPERM;
@@ -1831,8 +1831,8 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
}
unsigned long
-get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags)
+__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
unsigned long, unsigned long, unsigned long)
@@ -1865,8 +1865,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (get_area)
addr = get_area(file, addr, len, pgoff, flags);
else
- addr = mm_get_unmapped_area(current->mm, file, addr, len,
- pgoff, flags);
+ addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
+ pgoff, flags, vm_flags);
if (IS_ERR_VALUE(addr))
return addr;
@@ -1879,6 +1879,12 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
return error ? error : addr;
}
+unsigned long
+get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
+}
EXPORT_SYMBOL(get_unmapped_area);
unsigned long
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Add a THP implementations of the vm_flags variant of get_unmapped_area().
Future changes will call this from mmap.c in the do_mmap() path to allow
shadow stacks to be placed with consideration taken for the start guard
gap. Shadow stack memory is always private and anonymous and so special
guard gap logic is not needed in a lot of caseis, but it can be mapped by
THP, so needs to be handled.
Signed-off-by: Rick Edgecombe <[email protected]>
---
include/linux/huge_mm.h | 11 +++++++++++
mm/huge_memory.c | 23 ++++++++++++++++-------
mm/mmap.c | 12 +++++++-----
3 files changed, 34 insertions(+), 12 deletions(-)
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fa0350b0812a..ef7251dfd9f9 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -139,6 +139,9 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff, unsigned long flags);
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags);
void folio_prep_large_rmappable(struct folio *folio);
bool can_split_folio(struct folio *folio, int *pextra_pins);
@@ -286,6 +289,14 @@ static inline void folio_prep_large_rmappable(struct folio *folio) {}
#define thp_get_unmapped_area NULL
+static inline unsigned long
+thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
+{
+ return 0;
+}
+
static inline bool
can_split_folio(struct folio *folio, int *pextra_pins)
{
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index e9ef43a719a5..f235f6d3ff62 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -628,7 +628,8 @@ static inline bool is_transparent_hugepage(struct folio *folio)
static unsigned long __thp_get_unmapped_area(struct file *filp,
unsigned long addr, unsigned long len,
- loff_t off, unsigned long flags, unsigned long size)
+ loff_t off, unsigned long flags, unsigned long size,
+ vm_flags_t vm_flags)
{
loff_t off_end = off + len;
loff_t off_align = round_up(off, size);
@@ -641,8 +642,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area_vmflags(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags, vm_flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -662,17 +663,25 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
return ret;
}
-unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+unsigned long thp_get_unmapped_area_vmflags(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags,
+ vm_flags_t vm_flags)
{
unsigned long ret;
loff_t off = (loff_t)pgoff << PAGE_SHIFT;
- ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE);
+ ret = __thp_get_unmapped_area(filp, addr, len, off, flags, PMD_SIZE, vm_flags);
if (ret)
return ret;
- return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area_vmflags(current->mm, filp, addr, len, pgoff, flags,
+ vm_flags);
+}
+
+unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index 2723f26f7c62..936d728ba1ca 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1857,16 +1857,18 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
*/
pgoff = 0;
get_area = shmem_get_unmapped_area;
- } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
- /* Ensures that larger anonymous mappings are THP aligned. */
- get_area = thp_get_unmapped_area;
}
- if (get_area)
+ if (get_area) {
addr = get_area(file, addr, len, pgoff, flags);
- else
+ } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
+ /* Ensures that larger anonymous mappings are THP aligned. */
+ addr = thp_get_unmapped_area_vmflags(file, addr, len,
+ pgoff, flags, vm_flags);
+ } else {
addr = mm_get_unmapped_area_vmflags(current->mm, file, addr, len,
pgoff, flags, vm_flags);
+ }
if (IS_ERR_VALUE(addr))
return addr;
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
For MAP_GROWSDOWN/VM_GROWSDOWN and MAP_GROWSUP/VM_GROWSUP this has not
been a problem in practice because applications place these kinds of
mappings very early, when there is not many mappings to find a space
between. But for shadow stacks, they may be placed throughout the lifetime
of the application.
So define a VM_UNMAPPED_START_GAP_SET flag to specify that a start_gap
field has been set, as most vm_unmapped_area_info structs are not zeroed,
so the added field will often contain garbage. Use
VM_UNMAPPED_START_GAP_SET in unmapped_area/_topdown() to find a space that
includes the guard gap for the new mapping. Take care to not interfere
with the alignment.
Signed-off-by: Rick Edgecombe <[email protected]>
---
include/linux/mm.h | 2 ++
mm/mmap.c | 21 ++++++++++++++-------
2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9addf16dbf18..160bb6db7a16 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3393,12 +3393,14 @@ extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
struct vm_unmapped_area_info {
#define VM_UNMAPPED_AREA_TOPDOWN 1
+#define VM_UNMAPPED_START_GAP_SET 2
unsigned long flags;
unsigned long length;
unsigned long low_limit;
unsigned long high_limit;
unsigned long align_mask;
unsigned long align_offset;
+ unsigned long start_gap;
};
extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
diff --git a/mm/mmap.c b/mm/mmap.c
index 936d728ba1ca..1b6c333656f9 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1567,14 +1567,17 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
*/
static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
{
- unsigned long length, gap;
+ unsigned long length, gap, start_gap = 0;
unsigned long low_limit, high_limit;
struct vm_area_struct *tmp;
MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
+ if (info->flags & VM_UNMAPPED_START_GAP_SET)
+ start_gap = info->start_gap;
+
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + start_gap;
if (length < info->length)
return -ENOMEM;
@@ -1586,7 +1589,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
if (mas_empty_area(&mas, low_limit, high_limit - 1, length))
return -ENOMEM;
- gap = mas.index;
+ gap = mas.index + start_gap;
gap += (info->align_offset - gap) & info->align_mask;
tmp = mas_next(&mas, ULONG_MAX);
if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
@@ -1619,13 +1622,17 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
*/
static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
{
- unsigned long length, gap, gap_end;
+ unsigned long length, gap, gap_end, start_gap = 0;
unsigned long low_limit, high_limit;
struct vm_area_struct *tmp;
MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
+
+ if (info->flags & VM_UNMAPPED_START_GAP_SET)
+ start_gap = info->start_gap;
+
/* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
+ length = info->length + info->align_mask + start_gap;
if (length < info->length)
return -ENOMEM;
@@ -1832,7 +1839,7 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
unsigned long
__get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
unsigned long, unsigned long, unsigned long)
@@ -1883,7 +1890,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long
get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags)
+ unsigned long pgoff, unsigned long flags)
{
return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
}
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Add x86 arch implementations of arch_get_unmapped_area_vmflags/_topdown()
so future changes can allow the the guard gap of type of vma being placed
to be taken into account. This will be used for shadow stack memory.
Signed-off-by: Rick Edgecombe <[email protected]>
---
arch/x86/include/asm/pgtable_64.h | 1 +
arch/x86/kernel/sys_x86_64.c | 29 ++++++++++++++++++++++-------
2 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index a629b1b9f65a..eb09a11621ad 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -244,6 +244,7 @@ extern void cleanup_highmap(void);
#define HAVE_ARCH_UNMAPPED_AREA
#define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
+#define HAVE_ARCH_UNMAPPED_AREA_VMFLAGS
#define PAGE_AGP PAGE_KERNEL_NOCACHE
#define HAVE_PAGE_AGP 1
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index c783aeb37dce..f92780cf9662 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -119,9 +119,9 @@ static void find_start_end(unsigned long addr, unsigned long flags,
*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
}
-unsigned long
-arch_get_unmapped_area(struct file *filp, unsigned long addr,
- unsigned long len, unsigned long pgoff, unsigned long flags)
+extern unsigned long
+arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -157,10 +157,10 @@ arch_get_unmapped_area(struct file *filp, unsigned long addr,
return vm_unmapped_area(&info);
}
-unsigned long
-arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
- const unsigned long len, const unsigned long pgoff,
- const unsigned long flags)
+extern unsigned long
+arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags, vm_flags_t vm_flags)
{
struct vm_area_struct *vma;
struct mm_struct *mm = current->mm;
@@ -230,3 +230,18 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
*/
return arch_get_unmapped_area(filp, addr0, len, pgoff, flags);
}
+
+unsigned long
+arch_get_unmapped_area(struct file *filp, unsigned long addr,
+ unsigned long len, unsigned long pgoff, unsigned long flags)
+{
+ return arch_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
+}
+
+unsigned long
+arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr,
+ const unsigned long len, const unsigned long pgoff,
+ const unsigned long flags)
+{
+ return arch_get_unmapped_area_topdown_vmflags(filp, addr, len, pgoff, flags, 0);
+}
--
2.34.1
The mm_struct contains a function pointer *get_unmapped_area(), which
is set to either arch_get_unmapped_area() or
arch_get_unmapped_area_topdown() during the initialization of the mm.
Since the function pointer only ever points to two functions that are named
the same across all arch's, a function pointer is not really required. In
addition future changes will want to add versions of the functions that
take additional arguments. So to save a pointers worth of bytes in
mm_struct, and prevent adding additional function pointers to mm_struct in
future changes, remove it and keep the information about which
get_unmapped_area() to use in a flag.
Introduce a helper, mm_get_unmapped_area(), to easily convert code that
refers to the old function pointer to instead select and call either
arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
get_unmapped_area() pointer in struct file_operations alone. The main
purpose of this change is to reorganize in preparation for future changes,
but it also converts the calls of mm->get_unmapped_area() from indirect
branches into a direct ones.
The stress-ng bigheap benchmark calls realloc a lot, which calls through
get_unmapped_area() in the kernel. On x86, the change yielded a ~4%
improvement there. (bogo ops/s (usr+sys time))
In testing a few x86 configs, removing the pointer unfortunately didn't
result in any actual size reductions in the compiled layout of mm_struct.
But depending on compiler or arch alignment requirements, the change could
possibly shrink the size of mm_struct.
Signed-off-by: Rick Edgecombe <[email protected]>
---
arch/s390/mm/hugetlbpage.c | 2 +-
arch/s390/mm/mmap.c | 4 ++--
arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
arch/sparc/mm/hugetlbpage.c | 2 +-
arch/x86/kernel/cpu/sgx/driver.c | 2 +-
arch/x86/mm/hugetlbpage.c | 2 +-
arch/x86/mm/mmap.c | 4 ++--
drivers/char/mem.c | 2 +-
drivers/dax/device.c | 6 +++---
fs/hugetlbfs/inode.c | 2 +-
fs/proc/inode.c | 15 ++++++++-------
fs/ramfs/file-mmu.c | 2 +-
include/linux/mm_types.h | 6 +-----
include/linux/sched/coredump.h | 1 +
include/linux/sched/mm.h | 5 +++++
io_uring/io_uring.c | 2 +-
mm/debug.c | 6 ------
mm/huge_memory.c | 6 +++---
mm/mmap.c | 21 ++++++++++++++++++---
mm/shmem.c | 11 +++++------
mm/util.c | 6 +++---
21 files changed, 65 insertions(+), 57 deletions(-)
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 297a6d897d5a..c2d2850ec8d5 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -328,7 +328,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
goto check_asce_limit;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
addr = hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/s390/mm/mmap.c b/arch/s390/mm/mmap.c
index fc9a7dc26c5e..cd52d72b59cf 100644
--- a/arch/s390/mm/mmap.c
+++ b/arch/s390/mm/mmap.c
@@ -182,10 +182,10 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
*/
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = mmap_base_legacy(random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/kernel/sys_sparc_64.c b/arch/sparc/kernel/sys_sparc_64.c
index 1e9a9e016237..1dbf7211666e 100644
--- a/arch/sparc/kernel/sys_sparc_64.c
+++ b/arch/sparc/kernel/sys_sparc_64.c
@@ -218,14 +218,10 @@ arch_get_unmapped_area_topdown(struct file *filp, const unsigned long addr0,
unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, unsigned long len, unsigned long pgoff, unsigned long flags)
{
unsigned long align_goal, addr = -ENOMEM;
- unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
-
- get_area = current->mm->get_unmapped_area;
if (flags & MAP_FIXED) {
/* Ok, don't mess with it. */
- return get_area(NULL, orig_addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
}
flags &= ~MAP_SHARED;
@@ -238,7 +234,8 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
align_goal = (64UL * 1024);
do {
- addr = get_area(NULL, orig_addr, len + (align_goal - PAGE_SIZE), pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr,
+ len + (align_goal - PAGE_SIZE), pgoff, flags);
if (!(addr & ~PAGE_MASK)) {
addr = (addr + (align_goal - 1UL)) & ~(align_goal - 1UL);
break;
@@ -256,7 +253,7 @@ unsigned long get_fb_unmapped_area(struct file *filp, unsigned long orig_addr, u
* be obtained.
*/
if (addr & ~PAGE_MASK)
- addr = get_area(NULL, orig_addr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, NULL, orig_addr, len, pgoff, flags);
return addr;
}
@@ -292,7 +289,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap == RLIM_INFINITY ||
sysctl_legacy_va_layout) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
/* We know it's 32-bit */
unsigned long task_size = STACK_TOP32;
@@ -303,7 +300,7 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
gap = (task_size / 6 * 5);
mm->mmap_base = PAGE_ALIGN(task_size - gap - random_factor);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index b432500c13a5..38a1bef47efb 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -123,7 +123,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
(!vma || addr + len <= vm_start_gap(vma)))
return addr;
}
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/kernel/cpu/sgx/driver.c b/arch/x86/kernel/cpu/sgx/driver.c
index 262f5fb18d74..22b65a5f5ec6 100644
--- a/arch/x86/kernel/cpu/sgx/driver.c
+++ b/arch/x86/kernel/cpu/sgx/driver.c
@@ -113,7 +113,7 @@ static unsigned long sgx_get_unmapped_area(struct file *file,
if (flags & MAP_FIXED)
return addr;
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#ifdef CONFIG_COMPAT
diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
index 5804bbae4f01..6d77c0039617 100644
--- a/arch/x86/mm/hugetlbpage.c
+++ b/arch/x86/mm/hugetlbpage.c
@@ -141,7 +141,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
}
get_unmapped_area:
- if (mm->get_unmapped_area == arch_get_unmapped_area)
+ if (!test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
pgoff, flags);
else
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index c90c20904a60..a2cabb1c81e1 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -129,9 +129,9 @@ static void arch_pick_mmap_base(unsigned long *base, unsigned long *legacy_base,
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
if (mmap_is_legacy())
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
else
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
arch_pick_mmap_base(&mm->mmap_base, &mm->mmap_legacy_base,
arch_rnd(mmap64_rnd_bits), task_size_64bit(0),
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 3c6670cf905f..9b80e622ae80 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -544,7 +544,7 @@ static unsigned long get_unmapped_area_zero(struct file *file,
}
/* Otherwise flags & MAP_PRIVATE: with no shmem object beneath it */
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
#else
return -ENOSYS;
#endif
diff --git a/drivers/dax/device.c b/drivers/dax/device.c
index 93ebedc5ec8c..47c126d37b59 100644
--- a/drivers/dax/device.c
+++ b/drivers/dax/device.c
@@ -329,14 +329,14 @@ static unsigned long dax_get_unmapped_area(struct file *filp,
if ((off + len_align) < off)
goto out;
- addr_align = current->mm->get_unmapped_area(filp, addr, len_align,
- pgoff, flags);
+ addr_align = mm_get_unmapped_area(current->mm, filp, addr, len_align,
+ pgoff, flags);
if (!IS_ERR_VALUE(addr_align)) {
addr_align += (off - addr_align) & (align - 1);
return addr_align;
}
out:
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
static const struct address_space_operations dev_dax_aops = {
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index f757d4f7ad98..a63d2eee086f 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -242,7 +242,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
* If architectures have special needs, they should define their own
* version of hugetlb_get_unmapped_area.
*/
- if (mm->get_unmapped_area == arch_get_unmapped_area_topdown)
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
return hugetlb_get_unmapped_area_topdown(file, addr, len,
pgoff, flags);
return hugetlb_get_unmapped_area_bottomup(file, addr, len,
diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index b33e490e3fd9..6f4c2e21e68f 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -454,15 +454,16 @@ pde_get_unmapped_area(struct proc_dir_entry *pde, struct file *file, unsigned lo
unsigned long len, unsigned long pgoff,
unsigned long flags)
{
- typeof_member(struct proc_ops, proc_get_unmapped_area) get_area;
-
- get_area = pde->proc_ops->proc_get_unmapped_area;
+ if (pde->proc_ops->proc_get_unmapped_area)
+ return pde->proc_ops->proc_get_unmapped_area(file, orig_addr,
+ len, pgoff,
+ flags);
#ifdef CONFIG_MMU
- if (!get_area)
- get_area = current->mm->get_unmapped_area;
+ else
+ return mm_get_unmapped_area(current->mm, file, orig_addr,
+ len, pgoff, flags);
#endif
- if (get_area)
- return get_area(file, orig_addr, len, pgoff, flags);
+
return orig_addr;
}
diff --git a/fs/ramfs/file-mmu.c b/fs/ramfs/file-mmu.c
index c7a1aa3c882b..b45c7edc3225 100644
--- a/fs/ramfs/file-mmu.c
+++ b/fs/ramfs/file-mmu.c
@@ -35,7 +35,7 @@ static unsigned long ramfs_mmu_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len, unsigned long pgoff,
unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
const struct file_operations ramfs_file_operations = {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 957ce38768b2..f01c01b4a4fc 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -745,11 +745,7 @@ struct mm_struct {
} ____cacheline_aligned_in_smp;
struct maple_tree mm_mt;
-#ifdef CONFIG_MMU
- unsigned long (*get_unmapped_area) (struct file *filp,
- unsigned long addr, unsigned long len,
- unsigned long pgoff, unsigned long flags);
-#endif
+
unsigned long mmap_base; /* base of mmap area */
unsigned long mmap_legacy_base; /* base of mmap area in bottom-up allocations */
#ifdef CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
index 02f5090ffea2..428e440424c5 100644
--- a/include/linux/sched/coredump.h
+++ b/include/linux/sched/coredump.h
@@ -74,6 +74,7 @@ static inline int get_dumpable(struct mm_struct *mm)
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */
#define MMF_MULTIPROCESS 26 /* mm is shared between processes */
+#define MMF_TOPDOWN 27 /* mm is shared between processes */
/*
* MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either
* replaced in the future by mm.pinned_vm when it becomes stable, or grow into
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 9a19f1b42f64..cde946e926d8 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -8,6 +8,7 @@
#include <linux/mm_types.h>
#include <linux/gfp.h>
#include <linux/sync_core.h>
+#include <linux/sched/coredump.h>
/*
* Routines for handling mm_structs
@@ -186,6 +187,10 @@ arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
unsigned long flags);
+unsigned long mm_get_unmapped_area(struct mm_struct *mm, struct file *filp,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags);
+
unsigned long
generic_get_unmapped_area(struct file *filp, unsigned long addr,
unsigned long len, unsigned long pgoff,
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 9626a363f121..b3c4ec115989 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3569,7 +3569,7 @@ static unsigned long io_uring_mmu_get_unmapped_area(struct file *filp,
#else
addr = 0UL;
#endif
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
#else /* !CONFIG_MMU */
diff --git a/mm/debug.c b/mm/debug.c
index ee533a5ceb79..32db5de8e1e7 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -162,9 +162,6 @@ EXPORT_SYMBOL(dump_vma);
void dump_mm(const struct mm_struct *mm)
{
pr_emerg("mm %px task_size %lu\n"
-#ifdef CONFIG_MMU
- "get_unmapped_area %px\n"
-#endif
"mmap_base %lu mmap_legacy_base %lu\n"
"pgd %px mm_users %d mm_count %d pgtables_bytes %lu map_count %d\n"
"hiwater_rss %lx hiwater_vm %lx total_vm %lx locked_vm %lx\n"
@@ -190,9 +187,6 @@ void dump_mm(const struct mm_struct *mm)
"def_flags: %#lx(%pGv)\n",
mm, mm->task_size,
-#ifdef CONFIG_MMU
- mm->get_unmapped_area,
-#endif
mm->mmap_base, mm->mmap_legacy_base,
mm->pgd, atomic_read(&mm->mm_users),
atomic_read(&mm->mm_count),
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 86ee29b5c39c..e9ef43a719a5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -641,8 +641,8 @@ static unsigned long __thp_get_unmapped_area(struct file *filp,
if (len_pad < len || (off + len_pad) < off)
return 0;
- ret = current->mm->get_unmapped_area(filp, addr, len_pad,
- off >> PAGE_SHIFT, flags);
+ ret = mm_get_unmapped_area(current->mm, filp, addr, len_pad,
+ off >> PAGE_SHIFT, flags);
/*
* The failure might be due to length padding. The caller will retry
@@ -672,7 +672,7 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
if (ret)
return ret;
- return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags);
}
EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
diff --git a/mm/mmap.c b/mm/mmap.c
index aa82eec17489..b61bc515c729 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1807,7 +1807,8 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
unsigned long (*get_area)(struct file *, unsigned long,
- unsigned long, unsigned long, unsigned long);
+ unsigned long, unsigned long, unsigned long)
+ = NULL;
unsigned long error = arch_mmap_check(addr, len, flags);
if (error)
@@ -1817,7 +1818,6 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
if (file) {
if (file->f_op->get_unmapped_area)
get_area = file->f_op->get_unmapped_area;
@@ -1834,7 +1834,11 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
get_area = thp_get_unmapped_area;
}
- addr = get_area(file, addr, len, pgoff, flags);
+ if (get_area)
+ addr = get_area(file, addr, len, pgoff, flags);
+ else
+ addr = mm_get_unmapped_area(current->mm, file, addr, len,
+ pgoff, flags);
if (IS_ERR_VALUE(addr))
return addr;
@@ -1849,6 +1853,17 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
EXPORT_SYMBOL(get_unmapped_area);
+unsigned long
+mm_get_unmapped_area(struct mm_struct *mm, struct file *file,
+ unsigned long addr, unsigned long len,
+ unsigned long pgoff, unsigned long flags)
+{
+ if (test_bit(MMF_TOPDOWN, &mm->flags))
+ return arch_get_unmapped_area_topdown(file, addr, len, pgoff, flags);
+ return arch_get_unmapped_area(file, addr, len, pgoff, flags);
+}
+EXPORT_SYMBOL(mm_get_unmapped_area);
+
/**
* find_vma_intersection() - Look up the first VMA which intersects the interval
* @mm: The process address space.
diff --git a/mm/shmem.c b/mm/shmem.c
index 0d1ce70bce38..a8cac92dff88 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2242,8 +2242,6 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long uaddr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- unsigned long (*get_area)(struct file *,
- unsigned long, unsigned long, unsigned long, unsigned long);
unsigned long addr;
unsigned long offset;
unsigned long inflated_len;
@@ -2253,8 +2251,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (len > TASK_SIZE)
return -ENOMEM;
- get_area = current->mm->get_unmapped_area;
- addr = get_area(file, uaddr, len, pgoff, flags);
+ addr = mm_get_unmapped_area(current->mm, file, uaddr, len, pgoff,
+ flags);
if (!IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE))
return addr;
@@ -2311,7 +2309,8 @@ unsigned long shmem_get_unmapped_area(struct file *file,
if (inflated_len < len)
return addr;
- inflated_addr = get_area(NULL, uaddr, inflated_len, 0, flags);
+ inflated_addr = mm_get_unmapped_area(current->mm, NULL, uaddr,
+ inflated_len, 0, flags);
if (IS_ERR_VALUE(inflated_addr))
return addr;
if (inflated_addr & ~PAGE_MASK)
@@ -4757,7 +4756,7 @@ unsigned long shmem_get_unmapped_area(struct file *file,
unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags)
{
- return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+ return mm_get_unmapped_area(current->mm, file, addr, len, pgoff, flags);
}
#endif
diff --git a/mm/util.c b/mm/util.c
index 744b4d7e3fae..10c836a75e66 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -452,17 +452,17 @@ void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
if (mmap_is_legacy(rlim_stack)) {
mm->mmap_base = TASK_UNMAPPED_BASE + random_factor;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
} else {
mm->mmap_base = mmap_base(random_factor, rlim_stack);
- mm->get_unmapped_area = arch_get_unmapped_area_topdown;
+ set_bit(MMF_TOPDOWN, &mm->flags);
}
}
#elif defined(CONFIG_MMU) && !defined(HAVE_ARCH_PICK_MMAP_LAYOUT)
void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack)
{
mm->mmap_base = TASK_UNMAPPED_BASE;
- mm->get_unmapped_area = arch_get_unmapped_area;
+ clear_bit(MMF_TOPDOWN, &mm->flags);
}
#endif
--
2.34.1
When memory is being placed, mmap() will take care to respect the guard
gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
needs to consider two things:
1. That the new mapping isn’t placed in an any existing mappings guard
gaps.
2. That the new mapping isn’t placed such that any existing mappings
are not in *its* guard gaps.
The long standing behavior of mmap() is to ensure 1, but not take any care
around 2. So for example, if there is a PAGE_SIZE free area, and a
mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
the mapping that is supposed to have a guard gap will not have a gap to
the adjacent VMA.
Now that the vm_flags is passed into the arch get_unmapped_area()'s, and
vm_unmapped_area() is ready to consider it, have VM_SHADOW_STACK's get
guard gap consideration for scenario 2.
Signed-off-by: Rick Edgecombe <[email protected]>
---
arch/x86/kernel/sys_x86_64.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/sys_x86_64.c b/arch/x86/kernel/sys_x86_64.c
index f92780cf9662..3b78fdc235fc 100644
--- a/arch/x86/kernel/sys_x86_64.c
+++ b/arch/x86/kernel/sys_x86_64.c
@@ -119,6 +119,14 @@ static void find_start_end(unsigned long addr, unsigned long flags,
*end = task_size_64bit(addr > DEFAULT_MAP_WINDOW);
}
+static inline unsigned long stack_guard_placement(vm_flags_t vm_flags)
+{
+ if (vm_flags & VM_SHADOW_STACK)
+ return PAGE_SIZE;
+
+ return 0;
+}
+
extern unsigned long
arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned long len,
unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
@@ -144,12 +152,13 @@ arch_get_unmapped_area_vmflags(struct file *filp, unsigned long addr, unsigned l
return addr;
}
- info.flags = 0;
+ info.flags = VM_UNMAPPED_START_GAP_SET;
info.length = len;
info.low_limit = begin;
info.high_limit = end;
info.align_mask = 0;
info.align_offset = pgoff << PAGE_SHIFT;
+ info.start_gap = stack_guard_placement(vm_flags);
if (filp) {
info.align_mask = get_align_mask();
info.align_offset += get_align_bits();
@@ -191,7 +200,7 @@ arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
}
get_unmapped_area:
- info.flags = VM_UNMAPPED_AREA_TOPDOWN;
+ info.flags = VM_UNMAPPED_AREA_TOPDOWN | VM_UNMAPPED_START_GAP_SET;
info.length = len;
if (!in_32bit_syscall() && (flags & MAP_ABOVE4G))
info.low_limit = SZ_4G;
@@ -199,6 +208,7 @@ arch_get_unmapped_area_topdown_vmflags(struct file *filp, unsigned long addr0,
info.low_limit = PAGE_SIZE;
info.high_limit = get_mmap_base(0);
+ info.start_gap = stack_guard_placement(vm_flags);
/*
* If hint address is above DEFAULT_MAP_WINDOW, look for unmapped area
--
2.34.1
The existing shadow stack test for guard gaps just checks that new
mappings are not placed in an existing mapping's guard gap. Add one that
checks that new mappings are not placed such that preexisting mappings are
in the new mappings guard gap.
Signed-off-by: Rick Edgecombe <[email protected]>
---
.../testing/selftests/x86/test_shadow_stack.c | 67 +++++++++++++++++--
1 file changed, 63 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/x86/test_shadow_stack.c b/tools/testing/selftests/x86/test_shadow_stack.c
index 757e6527f67e..ee909a7927f9 100644
--- a/tools/testing/selftests/x86/test_shadow_stack.c
+++ b/tools/testing/selftests/x86/test_shadow_stack.c
@@ -556,7 +556,7 @@ struct node {
* looked at the shadow stack gaps.
* 5. See if it landed in the gap.
*/
-int test_guard_gap(void)
+int test_guard_gap_other_gaps(void)
{
void *free_area, *shstk, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
struct node *head = NULL, *cur;
@@ -593,11 +593,64 @@ int test_guard_gap(void)
if (shstk - test_map - PAGE_SIZE != PAGE_SIZE)
return 1;
- printf("[OK]\tGuard gap test\n");
+ printf("[OK]\tGuard gap test, other mapping's gaps\n");
return 0;
}
+/* Tests respecting the guard gap of the mapping getting placed */
+int test_guard_gap_new_mappings_gaps(void)
+{
+ void *free_area, *shstk_start, *test_map = (void *)0xFFFFFFFFFFFFFFFF;
+ struct node *head = NULL, *cur;
+ int ret = 0;
+
+ free_area = mmap(0, PAGE_SIZE * 4, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ munmap(free_area, PAGE_SIZE * 4);
+
+ /* Test letting map_shadow_stack find a free space */
+ shstk_start = mmap(free_area, PAGE_SIZE, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (shstk_start == MAP_FAILED || shstk_start != free_area)
+ return 1;
+
+ while (test_map > shstk_start) {
+ test_map = (void *)syscall(__NR_map_shadow_stack, 0, PAGE_SIZE, 0);
+ if (test_map == MAP_FAILED) {
+ printf("[INFO]\tmap_shadow_stack MAP_FAILED\n");
+ ret = 1;
+ break;
+ }
+
+ cur = malloc(sizeof(*cur));
+ cur->mapping = test_map;
+
+ cur->next = head;
+ head = cur;
+
+ if (test_map == free_area + PAGE_SIZE) {
+ printf("[INFO]\tNew mapping has other mapping in guard gap!\n");
+ ret = 1;
+ break;
+ }
+ }
+
+ while (head) {
+ cur = head;
+ head = cur->next;
+ munmap(cur->mapping, PAGE_SIZE);
+ free(cur);
+ }
+
+ munmap(shstk_start, PAGE_SIZE);
+
+ if (!ret)
+ printf("[OK]\tGuard gap test, placement mapping's gaps\n");
+
+ return ret;
+}
+
/*
* Too complicated to pull it out of the 32 bit header, but also get the
* 64 bit one needed above. Just define a copy here.
@@ -850,9 +903,15 @@ int main(int argc, char *argv[])
goto out;
}
- if (test_guard_gap()) {
+ if (test_guard_gap_other_gaps()) {
ret = 1;
- printf("[FAIL]\tGuard gap test\n");
+ printf("[FAIL]\tGuard gap test, other mappings' gaps\n");
+ goto out;
+ }
+
+ if (test_guard_gap_new_mappings_gaps()) {
+ ret = 1;
+ printf("[FAIL]\tGuard gap test, placement mapping's gaps\n");
goto out;
}
--
2.34.1
On 2/15/24 15:13, Rick Edgecombe wrote:
> The mm_struct contains a function pointer *get_unmapped_area(), which
> is set to either arch_get_unmapped_area() or
> arch_get_unmapped_area_topdown() during the initialization of the mm.
>
> Since the function pointer only ever points to two functions that are named
> the same across all arch's, a function pointer is not really required. In
> addition future changes will want to add versions of the functions that
> take additional arguments. So to save a pointers worth of bytes in
> mm_struct, and prevent adding additional function pointers to mm_struct in
> future changes, remove it and keep the information about which
> get_unmapped_area() to use in a flag.
Indirect calls are just all kinds of evil, especially when Spectre-v2 is
in play. This is a really good idea.
Acked-by: Dave Hansen <[email protected]>
* Dave Hansen <[email protected]> [240215 19:30]:
> On 2/15/24 15:13, Rick Edgecombe wrote:
> > The mm_struct contains a function pointer *get_unmapped_area(), which
> > is set to either arch_get_unmapped_area() or
> > arch_get_unmapped_area_topdown() during the initialization of the mm.
> >
> > Since the function pointer only ever points to two functions that are named
> > the same across all arch's, a function pointer is not really required. In
> > addition future changes will want to add versions of the functions that
> > take additional arguments. So to save a pointers worth of bytes in
> > mm_struct, and prevent adding additional function pointers to mm_struct in
> > future changes, remove it and keep the information about which
> > get_unmapped_area() to use in a flag.
>
> Indirect calls are just all kinds of evil, especially when Spectre-v2 is
> in play. This is a really good idea.
>
> Acked-by: Dave Hansen <[email protected]>
Yes, this change can stand on its own I believe.
Acked-by: Liam R. Howlett <[email protected]>
On Thu, Feb 15, 2024 at 03:13:25PM -0800, Rick Edgecombe wrote:
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 02f5090ffea2..428e440424c5 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -74,6 +74,7 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> #define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */
> #define MMF_MULTIPROCESS 26 /* mm is shared between processes */
> +#define MMF_TOPDOWN 27 /* mm is shared between processes */
> /*
> * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either
> * replaced in the future by mm.pinned_vm when it becomes stable, or grow into
Comment has to be updated.
Otherwise, looks good:
Reviewed-by: Kirill A. Shutemov <[email protected]>
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Feb 15, 2024 at 03:13:27PM -0800, Rick Edgecombe wrote:
> @@ -1879,6 +1879,12 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> return error ? error : addr;
> }
>
> +unsigned long
> +get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> + unsigned long pgoff, unsigned long flags)
> +{
> + return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
> +}
> EXPORT_SYMBOL(get_unmapped_area);
Any reason it is not a static inline function in the header file?
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Feb 15, 2024 at 03:13:28PM -0800, Rick Edgecombe wrote:
> +unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr,
> + unsigned long len, unsigned long pgoff, unsigned long flags)
> +{
> + return thp_get_unmapped_area_vmflags(filp, addr, len, pgoff, flags, 0);
> }
> EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
Again, static inline should be fine.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Thu, Feb 15, 2024 at 03:13:29PM -0800, Rick Edgecombe wrote:
> When memory is being placed, mmap() will take care to respect the guard
> gaps of certain types of memory (VM_SHADOWSTACK, VM_GROWSUP and
> VM_GROWSDOWN). In order to ensure guard gaps between mappings, mmap()
> needs to consider two things:
> 1. That the new mapping isn’t placed in an any existing mappings guard
> gaps.
> 2. That the new mapping isn’t placed such that any existing mappings
> are not in *its* guard gaps.
>
> The long standing behavior of mmap() is to ensure 1, but not take any care
> around 2. So for example, if there is a PAGE_SIZE free area, and a
> mmap() with a PAGE_SIZE size, and a type that has a guard gap is being
> placed, mmap() may place the shadow stack in the PAGE_SIZE free area. Then
> the mapping that is supposed to have a guard gap will not have a gap to
> the adjacent VMA.
>
> For MAP_GROWSDOWN/VM_GROWSDOWN and MAP_GROWSUP/VM_GROWSUP this has not
> been a problem in practice because applications place these kinds of
> mappings very early, when there is not many mappings to find a space
> between. But for shadow stacks, they may be placed throughout the lifetime
> of the application.
>
> So define a VM_UNMAPPED_START_GAP_SET flag to specify that a start_gap
> field has been set, as most vm_unmapped_area_info structs are not zeroed,
> so the added field will often contain garbage. Use
> VM_UNMAPPED_START_GAP_SET in unmapped_area/_topdown() to find a space that
> includes the guard gap for the new mapping. Take care to not interfere
> with the alignment.
>
> Signed-off-by: Rick Edgecombe <[email protected]>
> ---
> include/linux/mm.h | 2 ++
> mm/mmap.c | 21 ++++++++++++++-------
> 2 files changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 9addf16dbf18..160bb6db7a16 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3393,12 +3393,14 @@ extern unsigned long __must_check vm_mmap(struct file *, unsigned long,
>
> struct vm_unmapped_area_info {
> #define VM_UNMAPPED_AREA_TOPDOWN 1
> +#define VM_UNMAPPED_START_GAP_SET 2
The flag seems to be an workaround not to clear the structure. I think
users need to be updated to clear the structure. In most cases rework code
to use C99 struct initializer would do the trick.
> unsigned long flags;
> unsigned long length;
> unsigned long low_limit;
> unsigned long high_limit;
> unsigned long align_mask;
> unsigned long align_offset;
> + unsigned long start_gap;
> };
>
> extern unsigned long vm_unmapped_area(struct vm_unmapped_area_info *info);
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 936d728ba1ca..1b6c333656f9 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1567,14 +1567,17 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
> */
> static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> {
> - unsigned long length, gap;
> + unsigned long length, gap, start_gap = 0;
> unsigned long low_limit, high_limit;
> struct vm_area_struct *tmp;
>
> MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
>
> + if (info->flags & VM_UNMAPPED_START_GAP_SET)
> + start_gap = info->start_gap;
> +
> /* Adjust search length to account for worst case alignment overhead */
> - length = info->length + info->align_mask;
> + length = info->length + info->align_mask + start_gap;
> if (length < info->length)
> return -ENOMEM;
>
> @@ -1586,7 +1589,7 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> if (mas_empty_area(&mas, low_limit, high_limit - 1, length))
> return -ENOMEM;
>
> - gap = mas.index;
> + gap = mas.index + start_gap;
> gap += (info->align_offset - gap) & info->align_mask;
Do we care to check if alignment itself would satisfy start_gap
requirement?
> tmp = mas_next(&mas, ULONG_MAX);
> if (tmp && (tmp->vm_flags & VM_STARTGAP_FLAGS)) { /* Avoid prev check if possible */
> @@ -1619,13 +1622,17 @@ static unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> */
> static unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
> {
> - unsigned long length, gap, gap_end;
> + unsigned long length, gap, gap_end, start_gap = 0;
> unsigned long low_limit, high_limit;
> struct vm_area_struct *tmp;
>
> MA_STATE(mas, ¤t->mm->mm_mt, 0, 0);
> +
> + if (info->flags & VM_UNMAPPED_START_GAP_SET)
> + start_gap = info->start_gap;
> +
> /* Adjust search length to account for worst case alignment overhead */
> - length = info->length + info->align_mask;
> + length = info->length + info->align_mask + start_gap;
> if (length < info->length)
> return -ENOMEM;
>
> @@ -1832,7 +1839,7 @@ unsigned long mm_get_unmapped_area_vmflags(struct mm_struct *mm, struct file *fi
>
> unsigned long
> __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> - unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
> + unsigned long pgoff, unsigned long flags, vm_flags_t vm_flags)
Unrelated space change.
> {
> unsigned long (*get_area)(struct file *, unsigned long,
> unsigned long, unsigned long, unsigned long)
> @@ -1883,7 +1890,7 @@ __get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
>
> unsigned long
> get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
> - unsigned long pgoff, unsigned long flags)
> + unsigned long pgoff, unsigned long flags)
Ditto.
> {
> return __get_unmapped_area(file, addr, len, pgoff, flags, 0);
> }
> --
> 2.34.1
>
--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, 2024-02-16 at 14:30 +0200, Kirill A. Shutemov wrote:
> Comment has to be updated.
Doh! I thought I fixed that.
>
> Otherwise, looks good:
>
> Reviewed-by: Kirill A. Shutemov <[email protected]>
Thanks.
On Fri, 2024-02-16 at 14:56 +0200, Kirill A. Shutemov wrote:
> > +unsigned long
> > +get_unmapped_area(struct file *file, unsigned long addr, unsigned
> > long len,
> > + unsigned long pgoff, unsigned long flags)
> > +{
> > + return __get_unmapped_area(file, addr, len, pgoff, flags,
> > 0);
> > +}
> > EXPORT_SYMBOL(get_unmapped_area);
>
> Any reason it is not a static inline function in the header file?
get_unmapped_area() doesn't seem to be referenced from any modules. I
don't think it needs to be exported actually. Maybe it used to be?
It could be a static inline it seems. Why are you thinking it would be
better?
I think maybe get_unmapped_area() should stay as is, static-inline
wise, but remove the export, and the newly added __get_unmapped_area()
should be made static. Does it sound reasonable?
On Fri, 2024-02-16 at 14:59 +0200, Kirill A. Shutemov wrote:
> On Thu, Feb 15, 2024 at 03:13:28PM -0800, Rick Edgecombe wrote:
> > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned
> > long addr,
> > + unsigned long len, unsigned long pgoff, unsigned
> > long flags)
> > +{
> > + return thp_get_unmapped_area_vmflags(filp, addr, len,
> > pgoff, flags, 0);
> > }
> > EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
>
> Again, static inline should be fine.
Not sure if the diff might be a bit misleading here, the signature of
thp_get_unmapped_area() didn't actually change.
If thp_get_unmapped_area() is made into a static inline, then
thp_get_unmapped_area_vmflags() would need to be exported instead so it
could be used in some modules. Unlike get_unmapped_area()
thp_get_unmapped_area is actually is used that way.
Better to export the more limited version?
On Fri, 2024-02-16 at 15:12 +0200, Kirill A. Shutemov wrote:
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 9addf16dbf18..160bb6db7a16 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3393,12 +3393,14 @@ extern unsigned long __must_check
> > vm_mmap(struct file *, unsigned long,
> >
> > struct vm_unmapped_area_info {
> > #define VM_UNMAPPED_AREA_TOPDOWN 1
> > +#define VM_UNMAPPED_START_GAP_SET 2
>
> The flag seems to be an workaround not to clear the structure. I
> think
> users need to be updated to clear the structure. In most cases rework
> code
> to use C99 struct initializer would do the trick.
Yea, it's just a treewide change to initialize them all. It should be
easy to review at least. I'll add a patch to do this.
> > @@ -1586,7 +1589,7 @@ static unsigned long unmapped_area(struct
> > vm_unmapped_area_info *info)
> > if (mas_empty_area(&mas, low_limit, high_limit - 1,
> > length))
> > return -ENOMEM;
> >
> > - gap = mas.index;
> > + gap = mas.index + start_gap;
> > gap += (info->align_offset - gap) & info->align_mask;
>
> Do we care to check if alignment itself would satisfy start_gap
> requirement?
Ugh, I think actually the alignment stuff clobbers the guard gap in the
search up scenario. I'm also seeing some weird results as I throw test
values into the existing logic, but very likely I just need to look at
this not late on a Friday. Thanks for pointing it out.
> > unsigned long
> > __get_unmapped_area(struct file *file, unsigned long addr,
> > unsigned long len,
> > - unsigned long pgoff, unsigned long flags,
> > vm_flags_t vm_flags)
> > + unsigned long pgoff, unsigned long flags,
> > vm_flags_t vm_flags)
>
> Unrelated space change.
Sure.
On Fri, Feb 16, 2024 at 10:15:05PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2024-02-16 at 14:56 +0200, Kirill A. Shutemov wrote:
> > > +unsigned long
> > > +get_unmapped_area(struct file *file, unsigned long addr, unsigned
> > > long len,
> > > +???????????????unsigned long pgoff, unsigned long flags)
> > > +{
> > > +???????return __get_unmapped_area(file, addr, len, pgoff, flags,
> > > 0);
> > > +}
> > > ? EXPORT_SYMBOL(get_unmapped_area);
> >
> > Any reason it is not a static inline function in the header file?
>
> get_unmapped_area() doesn't seem to be referenced from any modules. I
> don't think it needs to be exported actually. Maybe it used to be?
>
> It could be a static inline it seems. Why are you thinking it would be
> better?
That's just what I would do for legacy interface wrapper for new function
interface. And save a function call for caller (it shouldn't matter in
this case, but still).
> I think maybe get_unmapped_area() should stay as is, static-inline
> wise, but remove the export, and the newly added __get_unmapped_area()
> should be made static. Does it sound reasonable?
Up to you.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, Feb 16, 2024 at 10:21:13PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2024-02-16 at 14:59 +0200, Kirill A. Shutemov wrote:
> > On Thu, Feb 15, 2024 at 03:13:28PM -0800, Rick Edgecombe wrote:
> > > +unsigned long thp_get_unmapped_area(struct file *filp, unsigned
> > > long addr,
> > > +???????????????unsigned long len, unsigned long pgoff, unsigned
> > > long flags)
> > > +{
> > > +???????return thp_get_unmapped_area_vmflags(filp, addr, len,
> > > pgoff, flags, 0);
> > > ? }
> > > ? EXPORT_SYMBOL_GPL(thp_get_unmapped_area);
> >
> > Again, static inline should be fine.
>
> Not sure if the diff might be a bit misleading here, the signature of
> thp_get_unmapped_area() didn't actually change.
>
> If thp_get_unmapped_area() is made into a static inline, then
> thp_get_unmapped_area_vmflags() would need to be exported instead so it
> could be used in some modules. Unlike get_unmapped_area()
> thp_get_unmapped_area is actually is used that way.
>
> Better to export the more limited version?
Okay, it is a valid point.
--
Kiryl Shutsemau / Kirill A. Shutemov
On Fri, 2024-02-16 at 17:11 -0800, Rick Edgecombe wrote:
> > Do we care to check if alignment itself would satisfy start_gap
> > requirement?
>
> Ugh, I think actually the alignment stuff clobbers the guard gap in
> the
> search up scenario. I'm also seeing some weird results as I throw
> test
> values into the existing logic, but very likely I just need to look
> at
> this not late on a Friday. Thanks for pointing it out.
Ok, playing around with the address adjustment math in a separate test
program, I think it is all ok functionally. But there are two gotchas:
1. The existing math for search up assumes that the requested length is
bigger than the alignment mask. If the length is smaller, non-
cannonical addresses can result (more than ->high_limit). I don't think
any callers can call with this combination so it's fine functionally.
2. The newly added code can only hit the scenario you highlight if the
start gap is more than the alignment size. If alignment mask is more
than the start gap, the alignment will only shift the address more than
the adjustment made for the start gap.
So if it skips the start gap adjustment in the case of alignment adding
the necessary gap it won't change the result and just add a branch.
Similarly, if the start gap fulfills the alignment, there is no
adjustment during the alignment step.
I think maybe I'll add a comment covering both gotchas and leave the
logic as is, unless there are any objections. Or maybe a VM_WARN_ON,
hmm.
On Thu, Feb 15, 2024 at 03:13:25PM -0800, Rick Edgecombe wrote:
> The mm_struct contains a function pointer *get_unmapped_area(), which
> is set to either arch_get_unmapped_area() or
> arch_get_unmapped_area_topdown() during the initialization of the mm.
>
> Since the function pointer only ever points to two functions that are named
> the same across all arch's, a function pointer is not really required. In
> addition future changes will want to add versions of the functions that
> take additional arguments. So to save a pointers worth of bytes in
> mm_struct, and prevent adding additional function pointers to mm_struct in
> future changes, remove it and keep the information about which
> get_unmapped_area() to use in a flag.
>
> Introduce a helper, mm_get_unmapped_area(), to easily convert code that
> refers to the old function pointer to instead select and call either
> arch_get_unmapped_area() or arch_get_unmapped_area_topdown() based on the
> flag. Then drop the mm->get_unmapped_area() function pointer. Leave the
> get_unmapped_area() pointer in struct file_operations alone. The main
> purpose of this change is to reorganize in preparation for future changes,
> but it also converts the calls of mm->get_unmapped_area() from indirect
> branches into a direct ones.
>
> The stress-ng bigheap benchmark calls realloc a lot, which calls through
> get_unmapped_area() in the kernel. On x86, the change yielded a ~4%
> improvement there. (bogo ops/s (usr+sys time))
>
> In testing a few x86 configs, removing the pointer unfortunately didn't
> result in any actual size reductions in the compiled layout of mm_struct.
> But depending on compiler or arch alignment requirements, the change could
> possibly shrink the size of mm_struct.
>
> Signed-off-by: Rick Edgecombe <[email protected]>
> ---
> arch/s390/mm/hugetlbpage.c | 2 +-
> arch/s390/mm/mmap.c | 4 ++--
> arch/sparc/kernel/sys_sparc_64.c | 15 ++++++---------
> arch/sparc/mm/hugetlbpage.c | 2 +-
> arch/x86/kernel/cpu/sgx/driver.c | 2 +-
> arch/x86/mm/hugetlbpage.c | 2 +-
> arch/x86/mm/mmap.c | 4 ++--
> drivers/char/mem.c | 2 +-
> drivers/dax/device.c | 6 +++---
> fs/hugetlbfs/inode.c | 2 +-
> fs/proc/inode.c | 15 ++++++++-------
> fs/ramfs/file-mmu.c | 2 +-
> include/linux/mm_types.h | 6 +-----
> include/linux/sched/coredump.h | 1 +
> include/linux/sched/mm.h | 5 +++++
> io_uring/io_uring.c | 2 +-
> mm/debug.c | 6 ------
> mm/huge_memory.c | 6 +++---
> mm/mmap.c | 21 ++++++++++++++++++---
> mm/shmem.c | 11 +++++------
> mm/util.c | 6 +++---
> 21 files changed, 65 insertions(+), 57 deletions(-)
>
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 02f5090ffea2..428e440424c5 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -74,6 +74,7 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
> #define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */
> #define MMF_MULTIPROCESS 26 /* mm is shared between processes */
> +#define MMF_TOPDOWN 27 /* mm is shared between processes */
Nit: you may want to update the comment here ;-)
> /*
> * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either
> * replaced in the future by mm.pinned_vm when it becomes stable, or grow into
--
Sincerely yours,
Mike.
On Wed, 2024-02-21 at 09:10 +0200, Mike Rapoport wrote:
> > #define MMF_MULTIPROCESS 26 /* mm is shared between
> > processes */
> > +#define MMF_TOPDOWN 27 /* mm is shared between
> > processes */
>
> Nit: you may want to update the comment here ;-)
Yep, thanks.