2018-02-27 13:17:27

by Ilya Smith

Subject: [RFC PATCH] Randomization of address chosen by mmap.

This is more of a proof of concept. The current implementation doesn't randomize
the address returned by mmap. All the entropy ends with the choice of mmap_base_addr
at process creation. After that, mmap builds a very predictable layout of the
address space, which allows ASLR to be bypassed in many cases.
This patch randomizes the address on every mmap call.
It works well on 64-bit systems, but its use on 32-bit systems is not
recommended. The approach reuses the current implementation to simplify the
address search.

Here I would like to discuss this approach.

Signed-off-by: Ilya Smith <[email protected]>
---
include/linux/mm.h | 4 ++
mm/mmap.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 175 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ad06d42adb1a..f81b6c8a0bc5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -25,6 +25,7 @@
#include <linux/err.h>
#include <linux/page_ref.h>
#include <linux/memremap.h>
+#include <linux/sched.h>

struct mempolicy;
struct anon_vma;
@@ -2253,6 +2254,7 @@ struct vm_unmapped_area_info {
unsigned long align_offset;
};

+extern unsigned long unmapped_area_random(struct vm_unmapped_area_info *info);
extern unsigned long unmapped_area(struct vm_unmapped_area_info *info);
extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);

@@ -2268,6 +2270,8 @@ extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
static inline unsigned long
vm_unmapped_area(struct vm_unmapped_area_info *info)
{
+ if (current->flags & PF_RANDOMIZE)
+ return unmapped_area_random(info);
if (info->flags & VM_UNMAPPED_AREA_TOPDOWN)
return unmapped_area_topdown(info);
else
diff --git a/mm/mmap.c b/mm/mmap.c
index 9efdc021ad22..58110e065417 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -45,6 +45,7 @@
#include <linux/moduleparam.h>
#include <linux/pkeys.h>
#include <linux/oom.h>
+#include <linux/random.h>

#include <linux/uaccess.h>
#include <asm/cacheflush.h>
@@ -1780,6 +1781,176 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return error;
}

+unsigned long unmapped_area_random(struct vm_unmapped_area_info *info)
+{
+ // first lets find right border with unmapped_area_topdown
+ struct mm_struct *mm = current->mm;
+ struct vm_area_struct *vma;
+ struct vm_area_struct *right_vma = 0;
+ unsigned long entropy;
+ unsigned int entropy_count;
+ unsigned long length, low_limit, high_limit, gap_start, gap_end;
+ unsigned long addr, low, high;
+
+ /* Adjust search length to account for worst case alignment overhead */
+ length = info->length + info->align_mask;
+ if (length < info->length)
+ return -ENOMEM;
+
+ /*
+ * Adjust search limits by the desired length.
+ * See implementation comment at top of unmapped_area().
+ */
+ gap_end = info->high_limit;
+ if (gap_end < length)
+ return -ENOMEM;
+ high_limit = gap_end - length;
+
+ info->low_limit = 0x10000;
+ if (info->low_limit > high_limit)
+ return -ENOMEM;
+ low_limit = info->low_limit + length;
+
+ /* Check highest gap, which does not precede any rbtree node */
+ gap_start = mm->highest_vm_end;
+ if (gap_start <= high_limit)
+ goto found;
+
+ /* Check if rbtree root looks promising */
+ if (RB_EMPTY_ROOT(&mm->mm_rb))
+ return -ENOMEM;
+ vma = rb_entry(mm->mm_rb.rb_node, struct vm_area_struct, vm_rb);
+ if (vma->rb_subtree_gap < length)
+ return -ENOMEM;
+
+ while (true) {
+ /* Visit right subtree if it looks promising */
+ gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : 0;
+ if (gap_start <= high_limit && vma->vm_rb.rb_right) {
+ struct vm_area_struct *right =
+ rb_entry(vma->vm_rb.rb_right,
+ struct vm_area_struct, vm_rb);
+ if (right->rb_subtree_gap >= length) {
+ vma = right;
+ continue;
+ }
+ }
+
+check_current_down:
+ /* Check if current node has a suitable gap */
+ gap_end = vm_start_gap(vma);
+ if (gap_end < low_limit)
+ return -ENOMEM;
+ if (gap_start <= high_limit &&
+ gap_end > gap_start && gap_end - gap_start >= length)
+ goto found;
+
+ /* Visit left subtree if it looks promising */
+ if (vma->vm_rb.rb_left) {
+ struct vm_area_struct *left =
+ rb_entry(vma->vm_rb.rb_left,
+ struct vm_area_struct, vm_rb);
+ if (left->rb_subtree_gap >= length) {
+ vma = left;
+ continue;
+ }
+ }
+
+ /* Go back up the rbtree to find next candidate node */
+ while (true) {
+ struct rb_node *prev = &vma->vm_rb;
+
+ if (!rb_parent(prev))
+ return -ENOMEM;
+ vma = rb_entry(rb_parent(prev),
+ struct vm_area_struct, vm_rb);
+ if (prev == vma->vm_rb.rb_right) {
+ gap_start = vma->vm_prev ?
+ vm_end_gap(vma->vm_prev) : 0;
+ goto check_current_down;
+ }
+ }
+ }
+
+found:
+ right_vma = vma;
+ low = gap_start;
+ high = gap_end - length;
+
+ entropy = get_random_long();
+ entropy_count = 0;
+
+ // from left node to right we check if node is fine and
+ // randomly select it.
+ vma = mm->mmap;
+ while (vma != right_vma) {
+ /* Visit left subtree if it looks promising */
+ gap_end = vm_start_gap(vma);
+ if (gap_end >= low_limit && vma->vm_rb.rb_left) {
+ struct vm_area_struct *left =
+ rb_entry(vma->vm_rb.rb_left,
+ struct vm_area_struct, vm_rb);
+ if (left->rb_subtree_gap >= length) {
+ vma = left;
+ continue;
+ }
+ }
+
+ gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : low_limit;
+check_current_up:
+ /* Check if current node has a suitable gap */
+ if (gap_start > high_limit)
+ break;
+ if (gap_end >= low_limit &&
+ gap_end > gap_start && gap_end - gap_start >= length) {
+ if (entropy & 1) {
+ low = gap_start;
+ high = gap_end - length;
+ }
+ entropy >>= 1;
+ if (++entropy_count == 64) {
+ entropy = get_random_long();
+ entropy_count = 0;
+ }
+ }
+
+ /* Visit right subtree if it looks promising */
+ if (vma->vm_rb.rb_right) {
+ struct vm_area_struct *right =
+ rb_entry(vma->vm_rb.rb_right,
+ struct vm_area_struct, vm_rb);
+ if (right->rb_subtree_gap >= length) {
+ vma = right;
+ continue;
+ }
+ }
+
+ /* Go back up the rbtree to find next candidate node */
+ while (true) {
+ struct rb_node *prev = &vma->vm_rb;
+
+ if (!rb_parent(prev))
+ BUG(); // this should not happen
+ vma = rb_entry(rb_parent(prev),
+ struct vm_area_struct, vm_rb);
+ if (prev == vma->vm_rb.rb_left) {
+ gap_start = vm_end_gap(vma->vm_prev);
+ gap_end = vm_start_gap(vma);
+ if (vma == right_vma)
+ break;
+ goto check_current_up;
+ }
+ }
+ }
+
+ if (high == low)
+ return low;
+
+ addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
+ addr = low + (addr << PAGE_SHIFT);
+ return addr;
+}
+
unsigned long unmapped_area(struct vm_unmapped_area_info *info)
{
/*
--
2.14.1
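
To see the effect of the patch from userspace, a small test program (not part of the patch, shown only for illustration) can print the addresses returned by consecutive anonymous mmap() calls; on a stock kernel they normally come out adjacent, while with this patch and PF_RANDOMIZE set the distances between them should vary:

#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

int main(void)
{
	/* Map a few anonymous regions and print where they landed. */
	for (int i = 0; i < 8; i++) {
		void *p = mmap(NULL, 16 * 4096, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			return EXIT_FAILURE;
		}
		printf("mapping %d at %p\n", i, p);
	}
	return 0;
}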



2018-02-27 20:53:57

by Kees Cook

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Tue, Feb 27, 2018 at 5:13 AM, Ilya Smith <[email protected]> wrote:
> This is more of a proof of concept. The current implementation doesn't randomize
> the address returned by mmap. All the entropy ends with the choice of mmap_base_addr
> at process creation. After that, mmap builds a very predictable layout of the
> address space, which allows ASLR to be bypassed in many cases.

I'd like more details on the threat model here; if it's just a matter
of .so loading order, I wonder if load order randomization would get a
comparable level of uncertainty without the memory fragmentation,
like:
https://android-review.googlesource.com/c/platform/bionic/+/178130/2
If glibc, for example, could do this too, it would go a long way to
improving things. Obviously, it's not as extreme as loading stuff all
over the place, but it seems like the effect for an attack would be
similar. The search _area_ remains small, but the ordering wouldn't be
deterministic any more.
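
For reference, the load-order idea boils down to shuffling the list of needed libraries before they are mapped, so their relative order in memory stops being deterministic. A toy sketch (not the bionic implementation; the library names and the entropy source here are only illustrative):

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Toy illustration of load-order randomization: shuffle the list of
 * needed libraries before mapping them, so their relative order in the
 * address space is no longer deterministic. */
static void shuffle(const char **v, size_t n)
{
	for (size_t i = n - 1; i > 0; i--) {	/* Fisher-Yates shuffle */
		size_t j = (size_t)random() % (i + 1);
		const char *tmp = v[i];
		v[i] = v[j];
		v[j] = tmp;
	}
}

int main(void)
{
	const char *libs[] = { "libc.so.6", "libm.so.6", "libpthread.so.0",
			       "libdl.so.2", "librt.so.1" };
	size_t n = sizeof(libs) / sizeof(libs[0]);

	srandom((unsigned)time(NULL));	/* a real linker would use better entropy */
	shuffle(libs, n);
	for (size_t i = 0; i < n; i++)
		printf("map %s\n", libs[i]);	/* then map/dlopen in this order */
	return 0;
}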

> This patch randomizes the address on every mmap call.
> It works well on 64-bit systems, but its use on 32-bit systems is not
> recommended. The approach reuses the current implementation to simplify the
> address search.

It would be worth spelling out the "not recommended" bit some more
too: this fragments the mmap space, which has some serious issues on
smaller address spaces if you get into a situation where you cannot
allocate a hole large enough between the other allocations.

>
> Here I would like to discuss this approach.
>
> Signed-off-by: Ilya Smith <[email protected]>
> ---
> include/linux/mm.h | 4 ++
> mm/mmap.c | 171 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 175 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ad06d42adb1a..f81b6c8a0bc5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -25,6 +25,7 @@
> #include <linux/err.h>
> #include <linux/page_ref.h>
> #include <linux/memremap.h>
> +#include <linux/sched.h>
>
> struct mempolicy;
> struct anon_vma;
> @@ -2253,6 +2254,7 @@ struct vm_unmapped_area_info {
> unsigned long align_offset;
> };
>
> +extern unsigned long unmapped_area_random(struct vm_unmapped_area_info *info);
> extern unsigned long unmapped_area(struct vm_unmapped_area_info *info);
> extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
>
> @@ -2268,6 +2270,8 @@ extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
> static inline unsigned long
> vm_unmapped_area(struct vm_unmapped_area_info *info)
> {
> + if (current->flags & PF_RANDOMIZE)
> + return unmapped_area_random(info);

I think this will need a larger knob -- doing this by default is
likely to break stuff, I'd imagine? Bikeshedding: I'm not sure if this
should be setting "3" for /proc/sys/kernel/randomize_va_space, or a
separate one like /proc/sys/mm/randomize_mmap_allocation.
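
For illustration, the gating described here might look roughly like the sketch below; a randomize_va_space level 3 (or a separate sysctl) is an assumption, not an existing interface:

static inline unsigned long
vm_unmapped_area(struct vm_unmapped_area_info *info)
{
	/* Hypothetical opt-in: only use the randomizing allocator when the
	 * process has PF_RANDOMIZE and the (assumed) sysctl level asks for it. */
	if ((current->flags & PF_RANDOMIZE) && randomize_va_space >= 3)
		return unmapped_area_random(info);
	if (info->flags & VM_UNMAPPED_AREA_TOPDOWN)
		return unmapped_area_topdown(info);
	return unmapped_area(info);
}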

> if (info->flags & VM_UNMAPPED_AREA_TOPDOWN)
> return unmapped_area_topdown(info);
> else
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 9efdc021ad22..58110e065417 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -45,6 +45,7 @@
> #include <linux/moduleparam.h>
> #include <linux/pkeys.h>
> #include <linux/oom.h>
> +#include <linux/random.h>
>
> #include <linux/uaccess.h>
> #include <asm/cacheflush.h>
> @@ -1780,6 +1781,176 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> return error;
> }
>
> +unsigned long unmapped_area_random(struct vm_unmapped_area_info *info)
> +{
> + // first lets find right border with unmapped_area_topdown

Nit: kernel comments are /* */. (It's a good idea to run patches
through scripts/checkpatch.pl first.)

> + struct mm_struct *mm = current->mm;
> + struct vm_area_struct *vma;
> + struct vm_area_struct *right_vma = 0;
> + unsigned long entropy;
> + unsigned int entropy_count;
> + unsigned long length, low_limit, high_limit, gap_start, gap_end;
> + unsigned long addr, low, high;
> +
> + /* Adjust search length to account for worst case alignment overhead */
> + length = info->length + info->align_mask;
> + if (length < info->length)
> + return -ENOMEM;
> +
> + /*
> + * Adjust search limits by the desired length.
> + * See implementation comment at top of unmapped_area().
> + */
> + gap_end = info->high_limit;
> + if (gap_end < length)
> + return -ENOMEM;
> + high_limit = gap_end - length;
> +
> + info->low_limit = 0x10000;
> + if (info->low_limit > high_limit)
> + return -ENOMEM;
> + low_limit = info->low_limit + length;
> +
> + /* Check highest gap, which does not precede any rbtree node */
> + gap_start = mm->highest_vm_end;
> + if (gap_start <= high_limit)
> + goto found;
> +
> + /* Check if rbtree root looks promising */
> + if (RB_EMPTY_ROOT(&mm->mm_rb))
> + return -ENOMEM;
> + vma = rb_entry(mm->mm_rb.rb_node, struct vm_area_struct, vm_rb);
> + if (vma->rb_subtree_gap < length)
> + return -ENOMEM;
> +
> + while (true) {
> + /* Visit right subtree if it looks promising */
> + gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : 0;
> + if (gap_start <= high_limit && vma->vm_rb.rb_right) {
> + struct vm_area_struct *right =
> + rb_entry(vma->vm_rb.rb_right,
> + struct vm_area_struct, vm_rb);
> + if (right->rb_subtree_gap >= length) {
> + vma = right;
> + continue;
> + }
> + }
> +
> +check_current_down:
> + /* Check if current node has a suitable gap */
> + gap_end = vm_start_gap(vma);
> + if (gap_end < low_limit)
> + return -ENOMEM;
> + if (gap_start <= high_limit &&
> + gap_end > gap_start && gap_end - gap_start >= length)
> + goto found;
> +
> + /* Visit left subtree if it looks promising */
> + if (vma->vm_rb.rb_left) {
> + struct vm_area_struct *left =
> + rb_entry(vma->vm_rb.rb_left,
> + struct vm_area_struct, vm_rb);
> + if (left->rb_subtree_gap >= length) {
> + vma = left;
> + continue;
> + }
> + }
> +
> + /* Go back up the rbtree to find next candidate node */
> + while (true) {
> + struct rb_node *prev = &vma->vm_rb;
> +
> + if (!rb_parent(prev))
> + return -ENOMEM;
> + vma = rb_entry(rb_parent(prev),
> + struct vm_area_struct, vm_rb);
> + if (prev == vma->vm_rb.rb_right) {
> + gap_start = vma->vm_prev ?
> + vm_end_gap(vma->vm_prev) : 0;
> + goto check_current_down;
> + }
> + }
> + }

Everything from here up is identical to the existing
unmapped_area_topdown(), yes? This likely needs to be refactored
instead of copy/pasted, and adjust to handle both unmapped_area() and
unmapped_area_topdown().

> +
> +found:
> + right_vma = vma;
> + low = gap_start;
> + high = gap_end - length;
> +
> + entropy = get_random_long();
> + entropy_count = 0;
> +
> + // from left node to right we check if node is fine and
> + // randomly select it.
> + vma = mm->mmap;
> + while (vma != right_vma) {
> + /* Visit left subtree if it looks promising */
> + gap_end = vm_start_gap(vma);
> + if (gap_end >= low_limit && vma->vm_rb.rb_left) {
> + struct vm_area_struct *left =
> + rb_entry(vma->vm_rb.rb_left,
> + struct vm_area_struct, vm_rb);
> + if (left->rb_subtree_gap >= length) {
> + vma = left;
> + continue;
> + }
> + }
> +
> + gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : low_limit;
> +check_current_up:
> + /* Check if current node has a suitable gap */
> + if (gap_start > high_limit)
> + break;
> + if (gap_end >= low_limit &&
> + gap_end > gap_start && gap_end - gap_start >= length) {
> + if (entropy & 1) {
> + low = gap_start;
> + high = gap_end - length;
> + }
> + entropy >>= 1;
> + if (++entropy_count == 64) {
> + entropy = get_random_long();
> + entropy_count = 0;
> + }
> + }
> +
> + /* Visit right subtree if it looks promising */
> + if (vma->vm_rb.rb_right) {
> + struct vm_area_struct *right =
> + rb_entry(vma->vm_rb.rb_right,
> + struct vm_area_struct, vm_rb);
> + if (right->rb_subtree_gap >= length) {
> + vma = right;
> + continue;
> + }
> + }
> +
> + /* Go back up the rbtree to find next candidate node */
> + while (true) {
> + struct rb_node *prev = &vma->vm_rb;
> +
> + if (!rb_parent(prev))
> + BUG(); // this should not happen
> + vma = rb_entry(rb_parent(prev),
> + struct vm_area_struct, vm_rb);
> + if (prev == vma->vm_rb.rb_left) {
> + gap_start = vm_end_gap(vma->vm_prev);
> + gap_end = vm_start_gap(vma);
> + if (vma == right_vma)

mm/mmap.c: In function ‘unmapped_area_random’:
mm/mmap.c:1939:8: warning: ‘vma’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
if (vma == right_vma)
^

> + break;
> + goto check_current_up;
> + }
> + }
> + }

What are the two phases here? Could this second one get collapsed into
the first?

> +
> + if (high == low)
> + return low;
> +
> + addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
> + addr = low + (addr << PAGE_SHIFT);
> + return addr;
> +}
> +
> unsigned long unmapped_area(struct vm_unmapped_area_info *info)
> {
> /*

How large are the gaps intended to be? Looking at the gaps on
something like Xorg they differ a lot.

Otherwise, looks interesting!

-Kees

--
Kees Cook
Pixel Security

2018-02-27 21:32:57

by lazytyped

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.



On 2/27/18 9:52 PM, Kees Cook wrote:
> I'd like more details on the threat model here; if it's just a matter
> of .so loading order, I wonder if load order randomization would get a
> comparable level of uncertainty without the memory fragmentation,

This also seems to assume that leaking the address of one single library
isn't enough to mount a ROP attack to either gain enough privileges or
generate a primitive that can leak further information. Is this really
the case? Do you have some further data around this?


       -  twiz

2018-02-28 17:14:29

by Ilya Smith

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

Hello Kees,

Thanks for your time spent on that!

> On 27 Feb 2018, at 23:52, Kees Cook <[email protected]> wrote:
>
> I'd like more details on the threat model here; if it's just a matter
> of .so loading order, I wonder if load order randomization would get a
> comparable level of uncertainty without the memory fragmentation,
> like:
> https://android-review.googlesource.com/c/platform/bionic/+/178130/2
> If glibc, for example, could do this too, it would go a long way to
> improving things. Obviously, it's not as extreme as loading stuff all
> over the place, but it seems like the effect for an attack would be
> similar. The search _area_ remains small, but the ordering wouldn't be
> deterministic any more.
>

I’m afraid library load-order randomization wouldn’t help much; there are several
cases described in chapter 2 here:
http://www.openwall.com/lists/oss-security/2018/02/27/5
where it is still possible to bypass ASLR.

I agree library randomization is a good improvement, but after my patch
I don't think it adds much value. On my GitHub https://github.com/blackzert/aslur
I provide tests and will combine them into an 'all in one' chain later.

> It would be worth spelling out the "not recommended" bit some more
> too: this fragments the mmap space, which has some serious issues on
> smaller address spaces if you get into a situation where you cannot
> allocate a hole large enough between the other allocations.
>

I agree, that's the point.

>> vm_unmapped_area(struct vm_unmapped_area_info *info)
>> {
>> + if (current->flags & PF_RANDOMIZE)
>> + return unmapped_area_random(info);
>
> I think this will need a larger knob -- doing this by default is
> likely to break stuff, I'd imagine? Bikeshedding: I'm not sure if this
> should be setting "3" for /proc/sys/kernel/randomize_va_space, or a
> separate one like /proc/sys/mm/randomize_mmap_allocation.

I will improve it like you said. It looks like a better option.

>> + // first lets find right border with unmapped_area_topdown
>
> Nit: kernel comments are /* */. (It's a good idea to run patches
> through scripts/checkpatch.pl first.)
>

Sorry, I will fix it. Thanks!


>> + if (!rb_parent(prev))
>> + return -ENOMEM;
>> + vma = rb_entry(rb_parent(prev),
>> + struct vm_area_struct, vm_rb);
>> + if (prev == vma->vm_rb.rb_right) {
>> + gap_start = vma->vm_prev ?
>> + vm_end_gap(vma->vm_prev) : 0;
>> + goto check_current_down;
>> + }
>> + }
>> + }
>
> Everything from here up is identical to the existing
> unmapped_area_topdown(), yes? This likely needs to be refactored
> instead of copy/pasted, and adjust to handle both unmapped_area() and
> unmapped_area_topdown().
>

This part also keeps ‘right_vma’ as a border. If it is OK for the combined version
to return a vma struct, I’ll do it.

>> + /* Go back up the rbtree to find next candidate node */
>> + while (true) {
>> + struct rb_node *prev = &vma->vm_rb;
>> +
>> + if (!rb_parent(prev))
>> + BUG(); // this should not happen
>> + vma = rb_entry(rb_parent(prev),
>> + struct vm_area_struct, vm_rb);
>> + if (prev == vma->vm_rb.rb_left) {
>> + gap_start = vm_end_gap(vma->vm_prev);
>> + gap_end = vm_start_gap(vma);
>> + if (vma == right_vma)
>
> mm/mmap.c: In function ‘unmapped_area_random’:
> mm/mmap.c:1939:8: warning: ‘vma’ may be used uninitialized in this
> function [-Wmaybe-uninitialized]
> if (vma == right_vma)
> ^

Thanks, fixed!

>> + break;
>> + goto check_current_up;
>> + }
>> + }
>> + }
>
> What are the two phases here? Could this second one get collapsed into
> the first?
>

Let me explain.
1. We use the current implementation to find the highest suitable gap; remember its
bounding vma as ‘right_vma’.
2. We walk the tree starting from mm->mmap, which is the lowest vma.
3. We check whether the current vma's gap satisfies the length and the low/high constraints.
4. If so, we call random() to decide whether to choose it. This is how we randomly
choose the vma and the gap.
5. We walk the tree from the lowest vma to the highest, ignoring subtrees with smaller
gaps, until we reach ‘right_vma’.

Once we have found a gap, we randomly choose an address inside it.

>> + addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
>> + addr = low + (addr << PAGE_SHIFT);
>> + return addr;
>>
>
> How large are the gaps intended to be? Looking at the gaps on
> something like Xorg they differ a lot.

Sorry, I don't quite follow. What's the context? Did you try the patch, or what case do you mean?

Thanks,
Ilya




2018-02-28 18:34:59

by Matthew Wilcox

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Wed, Feb 28, 2018 at 08:13:00PM +0300, Ilya Smith wrote:
> > It would be worth spelling out the "not recommended" bit some more
> > too: this fragments the mmap space, which has some serious issues on
> > smaller address spaces if you get into a situation where you cannot
> > allocate a hole large enough between the other allocations.
> >
>
> I agree, that's the point.

Would it be worth randomising the address returned just ever so slightly?
ie instead of allocating exactly the next address, put in a guard hole
of (configurable, by default maybe) 1-15 pages? Is that enough extra
entropy to foil an interesting number of attacks, or do we need the full
randomise-the-address-space approach in order to be useful?


2018-02-28 19:56:18

by Kees Cook

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Wed, Feb 28, 2018 at 9:13 AM, Ilya Smith <[email protected]> wrote:
>> On 27 Feb 2018, at 23:52, Kees Cook <[email protected]> wrote:
>> What are the two phases here? Could this second one get collapsed into
>> the first?
>>
>
> Let me explain.
> 1. We use the current implementation to find the highest suitable gap; remember its
> bounding vma as ‘right_vma’.
> 2. We walk the tree starting from mm->mmap, which is the lowest vma.
> 3. We check whether the current vma's gap satisfies the length and the low/high constraints.
> 4. If so, we call random() to decide whether to choose it. This is how we randomly choose the vma and the gap.
> 5. We walk the tree from the lowest vma to the highest, ignoring subtrees with smaller gaps,
> until we reach ‘right_vma’.
>
> Once we have found a gap, we randomly choose an address inside it.
>
>>> + addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
>>> + addr = low + (addr << PAGE_SHIFT);
>>> + return addr;
>>>
>>
>> How large are the gaps intended to be? Looking at the gaps on
>> something like Xorg they differ a lot.
>
> Sorry, I don't quite follow. What's the context? Did you try the patch, or what case do you mean?

I was trying to understand the target entropy level, and I'm worried
it's a bit biased. For example, if the first allocation lands at 1/4th
of the memory space, the next allocation (IIUC) has a 50% chance of
falling on either side of it. If it goes on the small side, it then
has much less entropy than if it had gone on the other side. I think
this may be less entropy than choosing a random address and just
seeing if it fits or not. Dealing with collisions could be done either
by pushing the address until it doesn't collide or picking another
random address, etc. This is probably more expensive, though, since it
would need to walk the vma tree repeatedly. Anyway, I was ultimately
curious about your measured entropy and what alternatives you
considered.

-Kees

--
Kees Cook
Pixel Security

2018-02-28 21:03:49

by Daniel Micay

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

The option to add at least one guard page would be useful whether or
not it's tied to randomization. It's not feasible to do that in
userspace for mmap as a whole, only specific users of mmap like malloc
and it adds significant overhead vs. a kernel implementation. It could
optionally let you choose a minimum and maximum guard region size with
it picking random sizes if they're not equal. It's important for it to
be an enforced gap rather than something that can be filled in by
another allocation. It will obviously help a lot more when it's being
used with a hardened allocator designed to take advantage of this
rather than glibc malloc or jemalloc.
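
For comparison, the userspace workaround looks roughly like the sketch below (an illustration, not the allocator code referred to above): reserve an over-sized PROT_NONE region, then enable access only on the interior, at the cost of two system calls per allocation and only for callers that go through the wrapper:

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch of a userspace guard-page allocation: map size bytes plus
 * guard_pages trailing inaccessible pages.  Costs two syscalls per
 * allocation, and only helps code that actually uses this wrapper. */
static void *mmap_with_guard(size_t size, size_t guard_pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t total = size + guard_pages * page;

	void *base = mmap(NULL, total, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base, size, PROT_READ | PROT_WRITE) != 0) {
		munmap(base, total);
		return NULL;
	}
	return base;	/* bytes [size, total) remain an inaccessible guard */
}

int main(void)
{
	void *p = mmap_with_guard(4 * 4096, 1);
	printf("usable region at %p\n", p);
	return 0;
}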

I don't think it makes sense for the kernel to attempt mitigations to
hide libraries. The best way to do that is in userspace, by having the
linker reserve a large PROT_NONE region for mapping libraries (both at
initialization and for dlopen) including a random gap to act as a
separate ASLR base. If an attacker has library addresses, it's hard to
see much point in hiding the other libraries from them. It does make
sense to keep them from knowing the location of any executable code if
they leak non-library addresses. An isolated library region + gap is a
feature we implemented in CopperheadOS and it works well, although we
haven't ported it to Android 7.x or 8.x. I don't think the kernel can
bring much / anything to the table for it. It's inherently the
responsibility of libc to randomize the lower bits for secondary
stacks too.
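
A rough sketch of that linker-side scheme (purely illustrative; the region size, gap bound and helper names here are assumptions, not the CopperheadOS code): reserve one large PROT_NONE region up front, skip a random gap inside it, and carve library mappings out of the rest with MAP_FIXED so they get an ASLR base independent of other mmap activity:

#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/mman.h>

#define LIB_REGION_SIZE	(1UL << 30)	/* 1 GiB reservation, assumption */
#define PAGE		4096UL

static uint8_t *lib_region_next;
static uint8_t *lib_region_end;

/* Reserve the library region once, with a random gap acting as a
 * separate ASLR base for everything mapped inside it. */
static int lib_region_init(void)
{
	void *base = mmap(NULL, LIB_REGION_SIZE, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	size_t gap = ((size_t)random() % 65536) * PAGE;	/* up to ~256 MiB */
	lib_region_next = (uint8_t *)base + gap;
	lib_region_end = (uint8_t *)base + LIB_REGION_SIZE;
	return 0;
}

/* Carve the next library mapping out of the reservation; MAP_FIXED over
 * our own PROT_NONE pages replaces them with the new mapping. */
static void *lib_region_alloc(size_t len, int prot, int flags, int fd, off_t off)
{
	len = (len + PAGE - 1) & ~(PAGE - 1);
	if (lib_region_next + len > lib_region_end)
		return MAP_FAILED;
	void *p = mmap(lib_region_next, len, prot, flags | MAP_FIXED, fd, off);
	if (p != MAP_FAILED)
		lib_region_next += len;
	return p;
}

int main(void)
{
	if (lib_region_init() != 0)
		return 1;
	/* A real linker would map library images; use anonymous memory here. */
	void *lib = lib_region_alloc(40 * PAGE, PROT_READ | PROT_WRITE,
				     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return lib == MAP_FAILED;
}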

Fine-grained randomized mmap isn't going to be used if it causes
unpredictable levels of fragmentation or has a high / unpredictable
performance cost. I don't think it makes sense to approach it
aggressively in a way that people can't use. The OpenBSD randomized
mmap is a fairly conservative implementation to avoid causing
excessive fragmentation. I think they do a bit more than adding random
gaps by switching between different 'pivots' but that isn't very high
benefit. The main benefit is having random bits of unmapped space all
over the heap when combined with their hardened allocator which
heavily uses small mmap mappings and has a fair bit of malloc-level
randomization (it's a bitmap / hash table based slab allocator using
4k regions with a page span cache and we use a port of it to Android
with added hardening features but we're missing the fine-grained mmap
rand it's meant to have underneath what it does itself).

The default vm.max_map_count = 65530 is also a major problem for doing
fine-grained mmap randomization of any kind and there's the 32-bit
reference count overflow issue on high memory machines with
max_map_count * pid_max which isn't resolved yet.

2018-03-01 13:53:47

by Ilya Smith

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.


> On 28 Feb 2018, at 22:54, Kees Cook <[email protected]> wrote:
>
> I was trying to understand the target entropy level, and I'm worried
> it's a bit biased. For example, if the first allocation lands at 1/4th
> of the memory space, the next allocation (IIUC) has a 50% chance of
> falling on either side of it. If it goes on the small side, it then
> has much less entropy than if it had gone on the other side. I think
> this may be less entropy than choosing a random address and just
> seeing if it fits or not. Dealing with collisions could be done either
> by pushing the address until it doesn't collide or picking another
> random address, etc. This is probably more expensive, though, since it
> would need to walk the vma tree repeatedly. Anyway, I was ultimately
> curious about your measured entropy and what alternatives you
> considered.

Let me start with the options we have here.
Suppose we need to choose a random address from the pool of free memory, and
suppose we keep an array of gaps sorted by gap size in descending order. First we
find the highest index that satisfies the requested length. For each suitable gap
(with a lower index) we count how many pages in that gap satisfy the request, and
compute the total count of pages that satisfy it. Then we take a random number
modulo that total and subtract each gap's count of suitable pages from it until the
remainder fits inside a gap; that gives us the gap and the offset inside it. Adding
the gap start to the offset yields a randomly chosen suitable address.
In this scheme we have to keep an array of gaps, and every time the address space
changes we have to keep that array consistent. That is a very big overhead on any
change.

Pure random selection looks really expensive. Let's try to improve on it.

We can't just choose a random address and retry again and again until we find a
free spot - that approach has non-deterministic behaviour; nobody knows when it
stops. The same goes for walking the tree in a random direction.

We can walk the tree, build an array of suitable gaps and choose something from
there. In my current approach (proof of concept) the length of that array is 1,
which is why the last gaps are chosen with higher probability - I agree. It is
possible to enlarge the array at the cost of some memory. For example, struct mm
could hold an array of 1024 gaps. We do the same: walk the tree and randomly fill
this array (everything locked under the mmap write semaphore). When we have filled
it or walked the whole tree, we choose a gap randomly. What do you think about it?
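
One more option worth mentioning (a sketch of a standard technique, weighted reservoir sampling, not something the patch currently does): a single pass over the suitable gaps can pick a start page exactly uniformly without keeping any gap array, by letting each gap replace the current candidate with probability proportional to how many start pages it offers:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Model gaps as (start pfn, page count).  Visit each suitable gap once,
 * keep a running total of usable start pages, and replace the current
 * candidate with probability usable/total.  The final pick is uniform
 * over all suitable start pages, with no gap array kept around. */
struct gap { unsigned long start_pfn, pages; };

static unsigned long pick_uniform(const struct gap *gaps, int n,
				  unsigned long need)
{
	unsigned long total = 0, choice = 0;
	int found = 0;

	for (int i = 0; i < n; i++) {
		if (gaps[i].pages < need)
			continue;
		unsigned long usable = gaps[i].pages - need + 1;
		total += usable;
		/* take this gap with probability usable / total */
		if ((unsigned long)random() % total < usable) {
			choice = gaps[i].start_pfn +
				 (unsigned long)random() % usable;
			found = 1;
		}
	}
	return found ? choice : 0;
}

int main(void)
{
	struct gap gaps[] = { { 0x1000, 16 }, { 0x9000, 4 }, { 0x20000, 64 } };

	srandom((unsigned)time(NULL));
	printf("chosen start pfn: 0x%lx\n",
	       pick_uniform(gaps, 3, 4));	/* request of 4 pages */
	return 0;
}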

Thanks,
Ilya




2018-03-02 07:19:05

by Fengguang Wu

Subject: 097eb0af45: kernel_BUG_at_mm/hugetlb.c

FYI, we noticed the following commit (built with gcc-7):

commit: 097eb0af45c0010f9d5cbbc5f623058b3a275950 ("Randomization of address chosen by mmap.")
url: https://github.com/0day-ci/linux/commits/Ilya-Smith/Randomization-of-address-chosen-by-mmap/20180302-092859
base: git://git.cmpxchg.org/linux-mmotm.git master

in testcase: trinity
with following parameters:

runtime: 300s

test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/


on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 1G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+------------------------------------------+------------+------------+
| | 745388a346 | 097eb0af45 |
+------------------------------------------+------------+------------+
| boot_successes | 6 | 9 |
| boot_failures | 0 | 4 |
| kernel_BUG_at_mm/hugetlb.c | 0 | 4 |
| invalid_opcode:#[##] | 0 | 4 |
| RIP:__unmap_hugepage_range | 0 | 4 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 4 |
+------------------------------------------+------------+------------+



[ 21.297686] kernel BUG at mm/hugetlb.c:3329!
[ 21.299026] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 21.300197] CPU: 1 PID: 507 Comm: trinity-c3 Not tainted 4.16.0-rc2-mm1-00153-g097eb0a #101
[ 21.304957] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 21.306766] RIP: 0010:__unmap_hugepage_range+0x5f/0x274
[ 21.308305] RSP: 0018:ffffa333c0bf7d20 EFLAGS: 00010206
[ 21.309410] RAX: 00000000001fffff RBX: ffff8d51ff3a1170 RCX: 0000000000000009
[ 21.310950] RDX: 00007f6e7bf10000 RSI: ffff8d51ff3a1170 RDI: ffffa333c0bf7df0
[ 21.312471] RBP: 00007f6e7c110000 R08: 0000000000000000 R09: 00007f6e7c110000
[ 21.313961] R10: ffffa333c0bf7cc0 R11: 0000000000000000 R12: 00007f6e7bf10000
[ 21.315541] R13: ffffa333c0bf7df0 R14: ffff8d51fe8e06f8 R15: ffffffffa4ad4d20
[ 21.317080] FS: 0000000000000000(0000) GS:ffff8d51f5800000(0000) knlGS:0000000000000000
[ 21.318828] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.320055] CR2: 0000560f12c38000 CR3: 000000002a816000 CR4: 00000000000006e0
[ 21.322177] DR0: 00007f66fb684000 DR1: 0000000000000000 DR2: 0000000000000000
[ 21.324102] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 21.325642] Call Trace:
[ 21.326268] __unmap_hugepage_range_final+0x9/0x13
[ 21.327314] unmap_single_vma+0x8d/0xcd
[ 21.328143] unmap_vmas+0x30/0x3d
[ 21.328840] exit_mmap+0x93/0x13d
[ 21.329553] mmput+0x64/0xe5
[ 21.330227] do_exit+0x3f1/0x995
[ 21.330908] do_group_exit+0xad/0xad
[ 21.331691] SyS_exit_group+0xb/0xb
[ 21.332450] do_syscall_64+0x6d/0x103
[ 21.333246] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 21.334358] RIP: 0033:0x6f45afc331c8
[ 21.335126] RSP: 002b:00007ffd436fcaa8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
[ 21.336525] RAX: ffffffffffffffda RBX: 4a4a4a4a4a4a4a4a RCX: 00006f45afc331c8
[ 21.337836] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 21.339366] RBP: 00006bd156196064 R08: 00000000000000e7 R09: ffffffffffffff98
[ 21.340895] R10: 0000000000000207 R11: 0000000000000202 R12: 0000000000000045
[ 21.342406] R13: 000000000000001a R14: 0000560f120153a0 R15: 00000000cccccccd
[ 21.343895] Code: 07 00 00 4c 8b 78 58 b8 00 10 00 00 41 8b 4f 08 48 d3 e0 f6 46 52 40 48 89 04 24 75 02 0f 0b 49 8b 47 10 48 f7 d0 48 85 d0 74 02 <0f> 0b 4c 85 c8 74 02 0f 0b 8b 04 24 48 8b 6e 40 49 89 fc 4c 89
[ 21.346557] RIP: __unmap_hugepage_range+0x5f/0x274 RSP: ffffa333c0bf7d20
[ 21.348945] 01 00 00 00 48 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 39 f2 07 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 21.348955]
[ 21.350744] ---[ end trace 685bd0bde9f67ae5 ]---


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
(No filename) (4.45 kB)
config-4.16.0-rc2-mm1-00153-g097eb0a (130.57 kB)
job-script (3.83 kB)
dmesg.xz (15.32 kB)

2018-03-02 21:53:07

by Matthew Wilcox

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Fri, Mar 02, 2018 at 11:30:28PM +0300, Ilya Smith wrote:
> This is a really good question. Let's say we choose an address with a random-length
> guard hole, where the length is limited by some configuration as you described - for
> instance, 1MB. With the current implementation we may still fill this gap with small
> allocations smaller than 1MB. An attacker will build the attack on this predictable
> behaviour - he just needs to spray with 1MB chunks (or smaller ones, with some
> expectation). The attack is harder but not
> impossible.

Ah, I didn't mean that. I was thinking that we can change the
implementation to reserve 1-N pages after the end of the mapping.
So you can't map anything else in there, and any load/store into that
region will segfault.


2018-03-03 03:29:05

by Ilya Smith

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

> On 28 Feb 2018, at 21:33, Matthew Wilcox <[email protected]> wrote:
>
> On Wed, Feb 28, 2018 at 08:13:00PM +0300, Ilya Smith wrote:
>>> It would be worth spelling out the "not recommended" bit some more
>>> too: this fragments the mmap space, which has some serious issues on
>>> smaller address spaces if you get into a situation where you cannot
>>> allocate a hole large enough between the other allocations.
>>>
>>
>> I agree, that's the point.
>
> Would it be worth randomising the address returned just ever so slightly?
> ie instead of allocating exactly the next address, put in a guard hole
> of (configurable, by default maybe) 1-15 pages? Is that enough extra
> entropy to foil an interesting number of attacks, or do we need the full
> randomise-the-address-space approach in order to be useful?
>

This is a really good question. Let's say we choose an address with a random-length
guard hole, where the length is limited by some configuration as you described - for
instance, 1MB. With the current implementation we may still fill this gap with small
allocations smaller than 1MB. An attacker will build the attack on this predictable
behaviour - he just needs to spray with 1MB chunks (or smaller ones, with some
expectation). The attack is harder but not impossible.

Now let's say we increase this 1MB to 128MB. The attack is the same; the success
rate is lower and more regions are needed. Increase the value to 48 bits of entropy
and you get my patch (in some form ;))

I hope the full randomise-the-address-space approach will work for a long time.

Thanks,
Ilya


2018-03-03 14:00:11

by Ilya Smith

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

Hello Daniel, thanks for sharing your experience!

> On 1 Mar 2018, at 00:02, Daniel Micay <[email protected]> wrote:
>
> I don't think it makes sense for the kernel to attempt mitigations to
> hide libraries. The best way to do that is in userspace, by having the
> linker reserve a large PROT_NONE region for mapping libraries (both at
> initialization and for dlopen) including a random gap to act as a
> separate ASLR base.
Why is this the best approach, and what is the limit of this large region?
Let's think outside the box.
What you describe means you made a separate memory region for libraries without
changing the kernel. But the basic idea is that you have a separate region for libraries
only. You probably also want separate regions for each thread stack, for
mmapped files, shared memory, etc. That would protect memory regions of
different types from each other. It is impossible to implement this without
keeping track of the whole memory map, and that map must be safe from any leak attack to
prevent an ASLR bypass. The only way to do that is to implement it in the kernel
and provide different syscalls like uselib or allocstack, etc., which is
really hard in the current kernel implementation.

My approach was to hide memory regions from the attacker and from each other.

> If an attacker has library addresses, it's hard to
> see much point in hiding the other libraries from them.

In some cases the attacker has only one leak for the whole attack, and we should do our
best to make even that leak useless.

> It does make
> sense to keep them from knowing the location of any executable code if
> they leak non-library addresses. An isolated library region + gap is a
> feature we implemented in CopperheadOS and it works well, although we
> haven't ported it to Android 7.x or 8.x.
This is interesting to know and I would like to try to attack it, but it's out of the
scope of the current conversation.

> I don't think the kernel can
> bring much / anything to the table for it. It's inherently the
> responsibility of libc to randomize the lower bits for secondary
> stacks too.

I think every bit of a secondary stack address should be randomized, to give the
attacker as little information as we can.

> Fine-grained randomized mmap isn't going to be used if it causes
> unpredictable levels of fragmentation or has a high / unpredictable
> performance cost.

Let's pretend any chosen address is purely random and always satisfies the request.
At some point we fail to mmap a new chunk of size N. What does this mean? It means
that all regions of size N are occupied and we can't even find a place between them.
Now let's count the memory that is already allocated. Assume each of these occupied
regions holds at least one page, so the number of such regions is TASK_SIZE / N and
the total already allocated is at least PAGE_SIZE * TASK_SIZE / N. Now we can
calculate: TASK_SIZE is 2^48 bytes and PAGE_SIZE is 4096. If N is 1MB, the minimum
allocated memory is about 1TB, which is a very big number. If N is 256MB, we have
already consumed 4GB of memory, which is still fine. If N is 1GB, we have allocated
only 1GB, and that looks like a problem: having allocated just 1GB of memory, we
can't mmap a 1GB chunk. Sounds scary, but this is the absolute worst case, where we
consume only one page per 1GB region. In reality the number would be much bigger
and random, according to this patch.
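
For concreteness, a quick way to check those numbers (TASK_SIZE taken as 2^48 as above; this is just arithmetic, not kernel code):

#include <stdio.h>

/* Quick check of the estimate above: with at least one resident page per
 * N-sized region, a failing mmap of size N implies roughly
 * PAGE_SIZE * TASK_SIZE / N bytes already allocated. */
int main(void)
{
	const unsigned long long task_size = 1ULL << 48, page = 4096;
	const unsigned long long n[] = { 1ULL << 20, 256ULL << 20, 1ULL << 30 };

	for (int i = 0; i < 3; i++)
		printf("N = %10llu bytes -> at least %llu bytes in use\n",
		       n[i], page * task_size / n[i]);
	return 0;
}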

Here let's stop and think: suppose we know the application is going to consume that
much memory. The question then is whether we can protect it at all. The attacker
knows he has a good probability of guessing an address with read permissions, so in
this case ASLR may not work at all. For such applications we could turn off address
randomization or decrease the entropy level, since it won't help much anyway.

It would be good to know what performance costs you see here. Could you please tell
me?

> I don't think it makes sense to approach it
> aggressively in a way that people can't use. The OpenBSD randomized
> mmap is a fairly conservative implementation to avoid causing
> excessive fragmentation. I think they do a bit more than adding random
> gaps by switching between different 'pivots' but that isn't very high
> benefit. The main benefit is having random bits of unmapped space all
> over the heap when combined with their hardened allocator which
> heavily uses small mmap mappings and has a fair bit of malloc-level
> randomization (it's a bitmap / hash table based slab allocator using
> 4k regions with a page span cache and we use a port of it to Android
> with added hardening features but we're missing the fine-grained mmap
> rand it's meant to have underneath what it does itself).
>

So you think the OpenBSD implementation is even better? It seems like you like it
after all.

> The default vm.max_map_count = 65530 is also a major problem for doing
> fine-grained mmap randomization of any kind and there's the 32-bit
> reference count overflow issue on high memory machines with
> max_map_count * pid_max which isn't resolved yet.

I’ve read a thread about it. That one should be fixed anyway.

Thanks,
Ilya


2018-03-03 15:14:27

by Ilya Smith

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

> On 2 Mar 2018, at 23:48, Matthew Wilcox <[email protected]> wrote:
> Ah, I didn't mean that. I was thinking that we can change the
> implementation to reserve 1-N pages after the end of the mapping.
> So you can't map anything else in there, and any load/store into that
> region will segfault.
>

I’m afraid it will still allow many attacks. The formula for the new address would
be something like address_next = address_prev - mmap_size - random(N), as you suggested.
To prevent brute-force attacks, N should be big enough - more than 2^32, for
example. That 2^32 is just an example; right now I don't know the
exact value. What I'm trying to say is that the address computation formula
depends on a concrete, predictable address. In my scheme even address_prev is
chosen randomly.

Best regards,
Ilya

2018-03-03 21:01:48

by Daniel Micay

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On 3 March 2018 at 08:58, Ilya Smith <[email protected]> wrote:
> Hello Daniel, thanks for sharing your experience!
>
>> On 1 Mar 2018, at 00:02, Daniel Micay <[email protected]> wrote:
>>
>> I don't think it makes sense for the kernel to attempt mitigations to
>> hide libraries. The best way to do that is in userspace, by having the
>> linker reserve a large PROT_NONE region for mapping libraries (both at
>> initialization and for dlopen) including a random gap to act as a
>> separate ASLR base.
> Why is this the best approach, and what is the limit of this large region?
> Let's think outside the box.
> What you describe means you made a separate memory region for libraries without
> changing the kernel. But the basic idea is that you have a separate region for libraries
> only. You probably also want separate regions for each thread stack, for
> mmapped files, shared memory, etc. That would protect memory regions of
> different types from each other. It is impossible to implement this without
> keeping track of the whole memory map, and that map must be safe from any leak attack to
> prevent an ASLR bypass. The only way to do that is to implement it in the kernel
> and provide different syscalls like uselib or allocstack, etc., which is
> really hard in the current kernel implementation.

There's the option of reserving PROT_NONE regions and managing memory
within them using a similar best-fit allocation scheme to get separate
random bases. The kernel could offer something like that but it's
already possible to do it for libc mmap usage within libc as we did
for libraries.

The kernel's help is needed to cover non-libc users of mmap, i.e. not
the linker, malloc, etc. It's not possible for libc to assume that
everything goes through the libc mmap/mremap/munmap wrappers and it
would be a mess so I'm not saying the kernel doesn't have a part to
play. I'm only saying it makes sense to look at the whole picture and
if something can be done better in libc or the linker, to do it there
instead. There isn't an API for dividing stuff up into regions, so it
has to be done in userspace right now and I think it works a lot
better when it's an option.

>
> My approach was to hide memory regions from the attacker and from each other.
>
>> If an attacker has library addresses, it's hard to
>> see much point in hiding the other libraries from them.
>
> In some cases the attacker has only one leak for the whole attack, and we should do our
> best to make even that leak useless.
>
>> It does make
>> sense to keep them from knowing the location of any executable code if
>> they leak non-library addresses. An isolated library region + gap is a
>> feature we implemented in CopperheadOS and it works well, although we
>> haven't ported it to Android 7.x or 8.x.
> This is interesting to know and I would like to try to attack it, but it's out of the
> scope of the current conversation.

I don't think it's out-of-scope. There are different approaches to
this kind of finer-grained randomization and they can be done
together.

>> I don't think the kernel can
>> bring much / anything to the table for it. It's inherently the
>> responsibility of libc to randomize the lower bits for secondary
>> stacks too.
>
> I think every bit of a secondary stack address should be randomized, to give the
> attacker as little information as we can.

The issue is that the kernel is only providing a mapping so it can add
a random gap or randomize it in other ways but it's ultimately up to
libc and other userspace code to do randomization without those
mappings.

A malloc implementation is similarly going to request fairly large
mappings from the kernel to manage a bunch of stuff within them
itself. The kernel can't protect against stuff like heap spray attacks
very well all by itself. It definitely has a part to play in that but
is a small piece of it (unless the malloc impl actually manages
virtual memory regions itself, which is already done by
performance-oriented allocators for very different reasons).

>> Fine-grained randomized mmap isn't going to be used if it causes
>> unpredictable levels of fragmentation or has a high / unpredictable
>> performance cost.
>
> Let's pretend any chosen address is purely random and always satisfies the request.
> At some point we fail to mmap a new chunk of size N. What does this mean? It means
> that all regions of size N are occupied and we can't even find a place between them.
> Now let's count the memory that is already allocated. Assume each of these occupied
> regions holds at least one page, so the number of such regions is TASK_SIZE / N and
> the total already allocated is at least PAGE_SIZE * TASK_SIZE / N. Now we can
> calculate: TASK_SIZE is 2^48 bytes and PAGE_SIZE is 4096. If N is 1MB, the minimum
> allocated memory is about 1TB, which is a very big number. If N is 256MB, we have
> already consumed 4GB of memory, which is still fine. If N is 1GB, we have allocated
> only 1GB, and that looks like a problem: having allocated just 1GB of memory, we
> can't mmap a 1GB chunk. Sounds scary, but this is the absolute worst case, where we
> consume only one page per 1GB region. In reality the number would be much bigger
> and random, according to this patch.
>
> Here let's stop and think: suppose we know the application is going to consume that
> much memory. The question then is whether we can protect it at all. The attacker
> knows he has a good probability of guessing an address with read permissions, so in
> this case ASLR may not work at all. For such applications we could turn off address
> randomization or decrease the entropy level, since it won't help much anyway.
>
> It would be good to know what performance costs you see here. Could you please tell
> me?

Fragmenting the virtual address space means having more TLB cache
misses, etc. Spreading out the mappings more also increases memory
usage and overhead for anything tied to the number of VMAs, which is
hopefully all O(log n) where it matters but O(log n) doesn't mean
increasing `n` is free.

>> I don't think it makes sense to approach it
>> aggressively in a way that people can't use. The OpenBSD randomized
>> mmap is a fairly conservative implementation to avoid causing
>> excessive fragmentation. I think they do a bit more than adding random
>> gaps by switching between different 'pivots' but that isn't very high
>> benefit. The main benefit is having random bits of unmapped space all
>> over the heap when combined with their hardened allocator which
>> heavily uses small mmap mappings and has a fair bit of malloc-level
>> randomization (it's a bitmap / hash table based slab allocator using
>> 4k regions with a page span cache and we use a port of it to Android
>> with added hardening features but we're missing the fine-grained mmap
>> rand it's meant to have underneath what it does itself).
>>
>
> So you think the OpenBSD implementation is even better? It seems like you like it
> after all.

I think they found a good compromise between low fragmentation vs.
some security benefits.

The main thing I'd like to see is just the option to get a guarantee
of enforced gaps around mappings, without necessarily even having
randomization of the gap size. It's possible to add guard pages in
userspace but it adds overhead by doubling the number of system calls
to map memory (mmap PROT_NONE region, mprotect the inner portion to
PROT_READ|PROT_WRITE) and *everything* using mmap would need to
cooperate which is unrealistic.

>> The default vm.max_map_count = 65530 is also a major problem for doing
>> fine-grained mmap randomization of any kind and there's the 32-bit
>> reference count overflow issue on high memory machines with
>> max_map_count * pid_max which isn't resolved yet.
>
> I’ve read a thread about it. That one should be fixed anyway.
>
> Thanks,
> Ilya
>

Yeah, the correctness issue should definitely be fixed. The default
value would *really* need to be raised if the number of VMAs is greatly
increased by moving away from a performance / low-fragmentation focused
best-fit algorithm to one with randomization or enforced gaps.

2018-03-04 03:50:15

by Matthew Wilcox

Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Sat, Mar 03, 2018 at 04:00:45PM -0500, Daniel Micay wrote:
> The main thing I'd like to see is just the option to get a guarantee
> of enforced gaps around mappings, without necessarily even having
> randomization of the gap size. It's possible to add guard pages in
> userspace but it adds overhead by doubling the number of system calls
> to map memory (mmap PROT_NONE region, mprotect the inner portion to
> PROT_READ|PROT_WRITE) and *everything* using mmap would need to
> cooperate which is unrealistic.

So something like this?

To use it, OR in PROT_GUARD(n) to the PROT flags of mmap, and it should
pad the map by n pages. I haven't tested it, so I'm sure it's buggy,
but it seems like a fairly cheap way to give us padding after every
mapping.

Running it on an old kernel will result in no padding, so to see if it
worked or not, try mapping something immediately after it.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4ef7fb1726ab..9da6df7f62fc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2183,8 +2183,8 @@ extern int install_special_mapping(struct mm_struct *mm,
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);

extern unsigned long mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf);
+ unsigned long len, unsigned long pad_len, vm_flags_t vm_flags,
+ unsigned long pgoff, struct list_head *uf);
extern unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff, unsigned long *populate,
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 1c5dea402501..9c2b66fa0561 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -299,6 +299,7 @@ struct vm_area_struct {
struct mm_struct *vm_mm; /* The address space we belong to. */
pgprot_t vm_page_prot; /* Access permissions of this VMA. */
unsigned long vm_flags; /* Flags, see mm.h. */
+ unsigned int vm_guard; /* Number of trailing guard pages */

/*
* For areas with an address space and backing store,
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index f8b134f5608f..d88babdf97f9 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -12,6 +12,7 @@
#define PROT_EXEC 0x4 /* page can be executed */
#define PROT_SEM 0x8 /* page may be used for atomic ops */
#define PROT_NONE 0x0 /* page can not be accessed */
+#define PROT_GUARD(x) ((x) & 0xffff) << 4 /* guard pages */
#define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */
#define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */

diff --git a/mm/memory.c b/mm/memory.c
index 1cfc4699db42..5b0f87afa0af 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4125,6 +4125,9 @@ int handle_mm_fault(struct vm_area_struct *vma, unsigned long address,
flags & FAULT_FLAG_REMOTE))
return VM_FAULT_SIGSEGV;

+ if (DIV_ROUND_UP(vma->vm_end - address, PAGE_SIZE) < vma->vm_guard)
+ return VM_FAULT_SIGSEGV;
+
/*
* Enable the memcg OOM handling for faults triggered in user
* space. Kernel faults are handled more gracefully.
diff --git a/mm/mmap.c b/mm/mmap.c
index 575766ec02f8..b9844b810ee7 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1433,6 +1433,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
unsigned long pgoff, unsigned long *populate,
struct list_head *uf)
{
+ unsigned int guard_len = ((prot >> 4) & 0xffff) << PAGE_SHIFT;
struct mm_struct *mm = current->mm;
int pkey = 0;

@@ -1458,6 +1459,8 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
len = PAGE_ALIGN(len);
if (!len)
return -ENOMEM;
+ if (len + guard_len < len)
+ return -ENOMEM;

/* offset overflow? */
if ((pgoff + (len >> PAGE_SHIFT)) < pgoff)
@@ -1472,7 +1475,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
/* Obtain the address to map to. we verify (or select) it and ensure
* that it represents a valid section of the address space.
*/
- addr = get_unmapped_area(file, addr, len, pgoff, flags);
+ addr = get_unmapped_area(file, addr, len + guard_len, pgoff, flags);
if (offset_in_page(addr))
return addr;

@@ -1591,7 +1594,7 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
vm_flags |= VM_NORESERVE;
}

- addr = mmap_region(file, addr, len, vm_flags, pgoff, uf);
+ addr = mmap_region(file, addr, len, len + guard_len, vm_flags, pgoff, uf);
if (!IS_ERR_VALUE(addr) &&
((vm_flags & VM_LOCKED) ||
(flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE))
@@ -1727,8 +1730,8 @@ static inline int accountable_mapping(struct file *file, vm_flags_t vm_flags)
}

unsigned long mmap_region(struct file *file, unsigned long addr,
- unsigned long len, vm_flags_t vm_flags, unsigned long pgoff,
- struct list_head *uf)
+ unsigned long len, unsigned long pad_len, vm_flags_t vm_flags,
+ unsigned long pgoff, struct list_head *uf)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma, *prev;
@@ -1737,24 +1740,24 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long charged = 0;

/* Check against address space limit. */
- if (!may_expand_vm(mm, vm_flags, len >> PAGE_SHIFT)) {
+ if (!may_expand_vm(mm, vm_flags, pad_len >> PAGE_SHIFT)) {
unsigned long nr_pages;

/*
* MAP_FIXED may remove pages of mappings that intersects with
* requested mapping. Account for the pages it would unmap.
*/
- nr_pages = count_vma_pages_range(mm, addr, addr + len);
+ nr_pages = count_vma_pages_range(mm, addr, addr + pad_len);

if (!may_expand_vm(mm, vm_flags,
- (len >> PAGE_SHIFT) - nr_pages))
+ (pad_len >> PAGE_SHIFT) - nr_pages))
return -ENOMEM;
}

/* Clear old maps */
- while (find_vma_links(mm, addr, addr + len, &prev, &rb_link,
+ while (find_vma_links(mm, addr, addr + pad_len, &prev, &rb_link,
&rb_parent)) {
- if (do_munmap(mm, addr, len, uf))
+ if (do_munmap(mm, addr, pad_len, uf))
return -ENOMEM;
}

@@ -1771,7 +1774,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
/*
* Can we just expand an old mapping?
*/
- vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
+ vma = vma_merge(mm, prev, addr, addr + pad_len, vm_flags,
NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
if (vma)
goto out;
@@ -1789,9 +1792,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,

vma->vm_mm = mm;
vma->vm_start = addr;
- vma->vm_end = addr + len;
+ vma->vm_end = addr + pad_len;
vma->vm_flags = vm_flags;
vma->vm_page_prot = vm_get_page_prot(vm_flags);
+ vma->vm_guard = (pad_len - len) >> PAGE_SHIFT;
vma->vm_pgoff = pgoff;
INIT_LIST_HEAD(&vma->anon_vma_chain);
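
For illustration, here is a minimal userspace sketch of how the padding above
could be exercised. PROT_GUARD(n) is the macro this patch proposes; the
encoding below (page count in bits 4..19 of prot) is only inferred from the
((prot >> 4) & 0xffff) decode in the do_mmap() hunk and is an assumption, not
the patch's actual header change:

#include <stdio.h>
#include <sys/mman.h>

/* Assumed encoding, mirroring the decode in do_mmap() above. */
#ifndef PROT_GUARD
#define PROT_GUARD(n)	(((n) & 0xffff) << 4)
#endif

int main(void)
{
	/* Map 16 pages (assuming 4 KiB pages) and ask for 4 pages of padding. */
	void *p = mmap(NULL, 16 * 4096,
		       PROT_READ | PROT_WRITE | PROT_GUARD(4),
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	/*
	 * An unpatched kernel silently ignores the extra prot bits, so the
	 * only way to see whether padding happened is to try mapping
	 * something immediately after this region.
	 */
	printf("mapped at %p\n", p);
	return 0;
}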


2018-03-04 21:48:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Sat, Mar 03, 2018 at 07:47:04PM -0800, Matthew Wilcox wrote:
> On Sat, Mar 03, 2018 at 04:00:45PM -0500, Daniel Micay wrote:
> > The main thing I'd like to see is just the option to get a guarantee
> > of enforced gaps around mappings, without necessarily even having
> > randomization of the gap size. It's possible to add guard pages in
> > userspace but it adds overhead by doubling the number of system calls
> > to map memory (mmap PROT_NONE region, mprotect the inner portion to
> > PROT_READ|PROT_WRITE) and *everything* using mmap would need to
> > cooperate which is unrealistic.
>
> So something like this?
>
> To use it, OR in PROT_GUARD(n) to the PROT flags of mmap, and it should
> pad the map by n pages. I haven't tested it, so I'm sure it's buggy,
> but it seems like a fairly cheap way to give us padding after every
> mapping.
>
> Running it on an old kernel will result in no padding, so to see if it
> worked or not, try mapping something immediately after it.

Thinking about this more ...

- When you call munmap, if you pass in the same (addr, length) that were
used for mmap, then it should unmap the guard pages as well (that
wasn't part of the patch, so it would have to be added)
- If 'addr' is higher than the mapped address, and length at least
reaches the end of the mapping, then I would expect the guard pages to
"move down" and be after the end of the newly-shortened mapping.
- If 'addr' is higher than the mapped address, and the length doesn't
reach the end of the old mapping, we split the old mapping into two.
I would expect the guard pages to apply to both mappings, insofar as
they'll fit. For an example, suppose we have a five-page mapping with
two guard pages (MMMMMGG), and then we unmap the fourth page. Now we
have a three-page mapping with one guard page followed immediately
by a one-page mapping with two guard pages (MMMGMGG).

I would say that mremap cannot change the number of guard pages.
Although I'm a little tempted to add an mremap flag to permit the mapping
to expand into the guard pages. That would give us a nice way to reserve
address space for a mapping we think is going to expand.
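
For concreteness, a sketch of those semantics in code (illustrative only: it
assumes the PROT_GUARD(n) encoding implied by the patch and, like the patch
itself, is untested):

#include <sys/mman.h>

#ifndef PROT_GUARD
#define PROT_GUARD(n)	(((n) & 0xffff) << 4)
#endif

int main(void)
{
	long page = 4096;	/* assuming 4 KiB pages for brevity */

	/* Five-page mapping plus two guard pages: MMMMMGG. */
	char *m = mmap(NULL, 5 * page,
		       PROT_READ | PROT_WRITE | PROT_GUARD(2),
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (m == MAP_FAILED)
		return 1;

	/*
	 * Unmap the fourth page.  Per the rules above, the VMA splits into
	 * a three-page mapping keeping one guard page, followed by a
	 * one-page mapping keeping the original two: MMMGMGG.
	 */
	return munmap(m + 3 * page, page) ? 1 : 0;
}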


2018-03-05 14:18:46

by Ilya Smith

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.


> On 4 Mar 2018, at 23:56, Matthew Wilcox <[email protected]> wrote:
> Thinking about this more ...
>
> - When you call munmap, if you pass in the same (addr, length) that were
> used for mmap, then it should unmap the guard pages as well (that
> wasn't part of the patch, so it would have to be added)
> - If 'addr' is higher than the mapped address, and length at least
> reaches the end of the mapping, then I would expect the guard pages to
> "move down" and be after the end of the newly-shortened mapping.
> - If 'addr' is higher than the mapped address, and the length doesn't
> reach the end of the old mapping, we split the old mapping into two.
> I would expect the guard pages to apply to both mappings, insofar as
> they'll fit. For an example, suppose we have a five-page mapping with
> two guard pages (MMMMMGG), and then we unmap the fourth page. Now we
> have a three-page mapping with one guard page followed immediately
> by a one-page mapping with two guard pages (MMMGMGG).

I’m analysing that approach and see much more problems:
- each time you call mmap like this, you still increase count of vmas as my
patch did
- now feature vma_merge shouldn’t work at all, until MAP_FIXED is set or
PROT_GUARD(0)
- the entropy you provide is like 16 bit, that is really not so hard to brute
- in your patch you don’t use vm_guard at address searching, I see many roots
of bugs here
- if you unmap/remap one page inside region, field vma_guard will show head
or tail pages for vma, not both; kernel don’t know how to handle it
- user mode now choose entropy with PROT_GUARD macro, where did he gets it?
User mode shouldn’t be responsible for entropy at all

I can’t understand what direction this conversation is going to. I was talking
about weak implementation in Linux kernel but got many comments about ASLR
should be implemented in user mode what is really weird to me.

I think it is possible to add GUARD pages into my implementations, but initially
problem was about entropy of address choosing. I would like to resolve it step by
step.

Thanks,
Ilya

2018-03-05 14:25:12

by Daniel Micay

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On 5 March 2018 at 08:09, Ilya Smith <[email protected]> wrote:
>
>> On 4 Mar 2018, at 23:56, Matthew Wilcox <[email protected]> wrote:
>> Thinking about this more ...
>>
>> - When you call munmap, if you pass in the same (addr, length) that were
>> used for mmap, then it should unmap the guard pages as well (that
>> wasn't part of the patch, so it would have to be added)
>> - If 'addr' is higher than the mapped address, and length at least
>> reaches the end of the mapping, then I would expect the guard pages to
>> "move down" and be after the end of the newly-shortened mapping.
>> - If 'addr' is higher than the mapped address, and the length doesn't
>> reach the end of the old mapping, we split the old mapping into two.
>> I would expect the guard pages to apply to both mappings, insofar as
>> they'll fit. For an example, suppose we have a five-page mapping with
>> two guard pages (MMMMMGG), and then we unmap the fourth page. Now we
>> have a three-page mapping with one guard page followed immediately
>> by a one-page mapping with two guard pages (MMMGMGG).
>
> I’m analysing that approach and see much more problems:
> - each time you call mmap like this, you still increase count of vmas as my
> patch did
> - now feature vma_merge shouldn’t work at all, until MAP_FIXED is set or
> PROT_GUARD(0)
> - the entropy you provide is like 16 bit, that is really not so hard to brute
> - in your patch you don’t use vm_guard at address searching, I see many roots
> of bugs here
> - if you unmap/remap one page inside region, field vma_guard will show head
> or tail pages for vma, not both; kernel don’t know how to handle it
> - user mode now choose entropy with PROT_GUARD macro, where did he gets it?
> User mode shouldn’t be responsible for entropy at all

I didn't suggest this as the way of implementing fine-grained
randomization but rather a small starting point for hardening address
space layout further. I don't think it should be tied to a mmap flag
but rather something like a personality flag or a global sysctl. It
doesn't need to be random at all to be valuable, and it's just a first
step. It doesn't mean there can't be switches between random pivots
like OpenBSD mmap, etc. I'm not so sure that randomly switching around
is going to result in isolating things very well though.

The VMA count issue is at least something very predictable with a
performance cost only for kernel operations.

> I can’t understand what direction this conversation is going to. I was talking
> about weak implementation in Linux kernel but got many comments about ASLR
> should be implemented in user mode what is really weird to me.

That's not what I said. I was saying that splitting things into
regions based on the type of allocation works really well and allows
for high entropy bases, but that the kernel can't really do that right
now. It could split up code that starts as PROT_EXEC into a region but
that's generally not how libraries are mapped in so it won't know
until mprotect which is obviously too late. Unless it had some kind of
type key passed from userspace, it can't really do that.

> I think it is possible to add GUARD pages into my implementations, but initially
> problem was about entropy of address choosing. I would like to resolve it step by
> step.

Starting with fairly aggressive fragmentation of the address space is
going to be a really hard sell. The costs of a very spread out address
space in terms of TLB misses, etc. are unclear. Starting with enforced
gaps (1 page) and randomization for those wouldn't rule out having
finer-grained randomization, like randomly switching between different
regions. This needs to be cheap enough that people want to enable it,
and the goals need to be clearly spelled out. The goal needs to be
clearer than "more randomization == good" and then accepting a high
performance cost for that.

I'm not dictating how things should be done, I don't have any say
about that. I'm just trying to discuss it.

2018-03-05 16:07:45

by Ilya Smith

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

> On 5 Mar 2018, at 17:23, Daniel Micay <[email protected]> wrote:
> I didn't suggest this as the way of implementing fine-grained
> randomization but rather a small starting point for hardening address
> space layout further. I don't think it should be tied to a mmap flag
> but rather something like a personality flag or a global sysctl. It
> doesn't need to be random at all to be valuable, and it's just a first
> step. It doesn't mean there can't be switches between random pivots
> like OpenBSD mmap, etc. I'm not so sure that randomly switching around
> is going to result in isolating things very well though.
>

Here I like the idea of Kees Cook:
> I think this will need a larger knob -- doing this by default is
> likely to break stuff, I'd imagine? Bikeshedding: I'm not sure if this
> should be setting "3" for /proc/sys/kernel/randomize_va_space, or a
> separate one like /proc/sys/mm/randomize_mmap_allocation.
I mean there should be a way to turn the randomization off, since some
applications really need huge amounts of memory.
If you have suggestions here, they would be really helpful to discuss.
I think one switch could be global, for system administrators, like
/proc/sys/mm/randomize_mmap_allocation, and another one could be an ioctl to
switch it off in case the application knows what it is doing.

I would like to implement it in v2 of the patch.
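
As a rough sketch of what the global switch could look like on the kernel side
(everything below is an assumption for discussion: the knob name follows Kees'
suggestion, while the variable name and the registration point are made up):

#include <linux/cache.h>
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/sysctl.h>

/* 1 = randomize addresses chosen by mmap, 0 = legacy behaviour. */
static int sysctl_randomize_mmap_allocation __read_mostly = 1;

static int rnd_mmap_min;
static int rnd_mmap_max = 1;

static struct ctl_table randomize_mmap_table[] = {
	{
		.procname	= "randomize_mmap_allocation",
		.data		= &sysctl_randomize_mmap_allocation,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= &rnd_mmap_min,
		.extra2		= &rnd_mmap_max,
	},
	{ }
};

static int __init randomize_mmap_sysctl_init(void)
{
	/* Shows up as /proc/sys/mm/randomize_mmap_allocation. */
	if (!register_sysctl("mm", randomize_mmap_table))
		return -ENOMEM;
	return 0;
}
late_initcall(randomize_mmap_sysctl_init);

A per-process opt-out could then be a prctl or ioctl that clears a flag checked
in the same place, but that part I leave for v2.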

>> I can’t understand what direction this conversation is going to. I was talking
>> about weak implementation in Linux kernel but got many comments about ASLR
>> should be implemented in user mode what is really weird to me.
>
> That's not what I said. I was saying that splitting things into
> regions based on the type of allocation works really well and allows
> for high entropy bases, but that the kernel can't really do that right
> now. It could split up code that starts as PROT_EXEC into a region but
> that's generally not how libraries are mapped in so it won't know
> until mprotect which is obviously too late. Unless it had some kind of
> type key passed from userspace, it can't really do that.

Yes, that's really true. I wrote about this earlier. This is the issue: the kernel
can't provide such an interface, which is why I try to get the maximum out of the
current mmap design. Maybe later we could split mmap into different actions for
the different types of memory it handles, but that will be a very long road, I think.

>> I think it is possible to add GUARD pages into my implementations, but initially
>> problem was about entropy of address choosing. I would like to resolve it step by
>> step.
>
> Starting with fairly aggressive fragmentation of the address space is
> going to be a really hard sell. The costs of a very spread out address
> space in terms of TLB misses, etc. are unclear. Starting with enforced
> gaps (1 page) and randomization for those wouldn't rule out having
> finer-grained randomization, like randomly switching between different
> regions. This needs to be cheap enough that people want to enable it,
> and the goals need to be clearly spelled out. The goal needs to be
> clearer than "more randomization == good" and then accepting a high
> performance cost for that.
>

I want to clarify. As far as I know, the TLB doesn't care about the distance
between pages, since it caches individual page translations. So in theory TLB
misses are not an issue here. I agree that I need to show the performance costs
here. I will, just give me some time please.

The enforced gaps, in my case:
+ addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
+ addr = low + (addr << PAGE_SHIFT);
but as you are saying, the entropy here should be reduced.

How about something like this:
+ addr = get_random_long() % min(((high - low) >> PAGE_SHIFT),
+                                 MAX_SECURE_GAP);
+ addr = high - (addr << PAGE_SHIFT);
where MAX_SECURE_GAP is configurable. Probably with sysctl.

How do you like it?
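
Roughly, as a kernel-side sketch (the function name here is illustrative;
MAX_SECURE_GAP counts pages and would come from the sysctl mentioned above):

#include <linux/kernel.h>	/* min() */
#include <linux/mm.h>		/* PAGE_SHIFT */
#include <linux/random.h>	/* get_random_long() */

static unsigned long choose_capped_gap_addr(unsigned long low,
					    unsigned long high,
					    unsigned long max_secure_gap)
{
	unsigned long span = (high - low) >> PAGE_SHIFT;
	unsigned long gap;

	if (!span)
		return high;

	/* Stay within max_secure_gap pages below the upper limit. */
	gap = get_random_long() % min(span, max_secure_gap);
	return high - (gap << PAGE_SHIFT);
}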

> I'm not dictating how things should be done, I don't have any say
> about that. I'm just trying to discuss it.

Sorry, and thanks for your involvement. I really appreciate it.

Thanks,
Ilya


2018-03-05 16:26:10

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Mon, Mar 05, 2018 at 04:09:31PM +0300, Ilya Smith wrote:
> > On 4 Mar 2018, at 23:56, Matthew Wilcox <[email protected]> wrote:
> > Thinking about this more ...
> >
> > - When you call munmap, if you pass in the same (addr, length) that were
> > used for mmap, then it should unmap the guard pages as well (that
> > wasn't part of the patch, so it would have to be added)
> > - If 'addr' is higher than the mapped address, and length at least
> > reaches the end of the mapping, then I would expect the guard pages to
> > "move down" and be after the end of the newly-shortened mapping.
> > - If 'addr' is higher than the mapped address, and the length doesn't
> > reach the end of the old mapping, we split the old mapping into two.
> > I would expect the guard pages to apply to both mappings, insofar as
> > they'll fit. For an example, suppose we have a five-page mapping with
> > two guard pages (MMMMMGG), and then we unmap the fourth page. Now we
> > have a three-page mapping with one guard page followed immediately
> > by a one-page mapping with two guard pages (MMMGMGG).
>
> I’m analysing that approach and see much more problems:
> - each time you call mmap like this, you still increase count of vmas as my
> patch did

Umm ... yes, each time you call mmap, you get a VMA. I'm not sure why
that's a problem with my patch. I was trying to solve the problem Daniel
pointed out, that mapping a guard region after each mmap cost twice as
many VMAs, and it solves that problem.

> - now feature vma_merge shouldn’t work at all, until MAP_FIXED is set or
> PROT_GUARD(0)

That's true.

> - the entropy you provide is like 16 bit, that is really not so hard to brute

It's 16 bits per mapping. I think that'll make enough attacks harder
to be worthwhile.

> - in your patch you don’t use vm_guard at address searching, I see many roots
> of bugs here

Don't need to. vm_end includes the guard pages.

> - if you unmap/remap one page inside region, field vma_guard will show head
> or tail pages for vma, not both; kernel don’t know how to handle it

There are no head pages. The guard pages are only placed after the real end.

> - user mode now choose entropy with PROT_GUARD macro, where did he gets it?
> User mode shouldn’t be responsible for entropy at all

I can't agree with that. The user has plenty of opportunities to get
randomness; from /dev/random is the easiest, but you could also do timing
attacks on your own cachelines, for example.
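
As a small illustration, userspace could grab its padding entropy itself; a
sketch (reading /dev/urandom here rather than /dev/random, purely for
illustration):

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Returns up to 16 bits of randomness to pass to mmap() as PROT_GUARD(n). */
static uint16_t random_guard_pages(void)
{
	uint16_t n = 0;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd >= 0) {
		if (read(fd, &n, sizeof(n)) != sizeof(n))
			n = 0;
		close(fd);
	}
	return n;
}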


2018-03-05 19:28:44

by Ilya Smith

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

> On 5 Mar 2018, at 19:23, Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Mar 05, 2018 at 04:09:31PM +0300, Ilya Smith wrote:
>>
>> I’m analysing that approach and see much more problems:
>> - each time you call mmap like this, you still increase count of vmas as my
>> patch did
>
> Umm ... yes, each time you call mmap, you get a VMA. I'm not sure why
> that's a problem with my patch. I was trying to solve the problem Daniel
> pointed out, that mapping a guard region after each mmap cost twice as
> many VMAs, and it solves that problem.
>
The issue was the VMA count, as Daniel mentioned: the more VMAs there are,
the more expensive the tree walk. I think this is fine.

>> - the entropy you provide is like 16 bit, that is really not so hard to brute
>
> It's 16 bits per mapping. I think that'll make enough attacks harder
> to be worthwhile.

Well yes, its ok, sorry. I just would like to have 32 bit entropy maximum some day :)

>> - in your patch you don’t use vm_guard at address searching, I see many roots
>> of bugs here
>
> Don't need to. vm_end includes the guard pages.
>
>> - if you unmap/remap one page inside region, field vma_guard will show head
>> or tail pages for vma, not both; kernel don’t know how to handle it
>
> There are no head pages. The guard pages are only placed after the real end.
>

Ok, we have MG where G = vm_guard, right? so when you do vm_split,
you may come to situation - m1g1m2G, how to handle it? I mean when M is
split with only one page inside this region. How to handle it?

>> - user mode now choose entropy with PROT_GUARD macro, where did he gets it?
>> User mode shouldn’t be responsible for entropy at all
>
> I can't agree with that. The user has plenty of opportunities to get
> randomness; from /dev/random is the easiest, but you could also do timing
> attacks on your own cachelines, for example.

I think the usual case to use randomization for any mmap or not use it at all
for whole process. So here I think would be nice to have some variable
changeable with sysctl (root only) and ioctl (for greedy processes).

Well, let me summary:
My approach chose random gap inside gap range with following strings:

+ addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
+ addr = low + (addr << PAGE_SHIFT);

Could be improved limiting maximum possible entropy in this shift.
To prevent situation when attacker may massage allocations and
predict chosen address, I randomly choose memory region. I’m still
like my idea, but not going to push it anymore, since you have yours now.

Your idea just provide random non-mappable and non-accessable offset
from best-fit region. This consumes memory (1GB gap if random value
is 0xffff). But it works and should work faster and should resolve the issue.

My point was that current implementation need to be changed and you
have your own approach for that. :)
Lets keep mine in the mind till better times (or worse?) ;)
Will you finish your approach and upstream it?

Best regards,
Ilya


2018-03-05 19:48:48

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

On Mon, Mar 05, 2018 at 10:27:32PM +0300, Ilya Smith wrote:
> > On 5 Mar 2018, at 19:23, Matthew Wilcox <[email protected]> wrote:
> > On Mon, Mar 05, 2018 at 04:09:31PM +0300, Ilya Smith wrote:
> >> I’m analysing that approach and see much more problems:
> >> - each time you call mmap like this, you still increase count of vmas as my
> >> patch did
> >
> > Umm ... yes, each time you call mmap, you get a VMA. I'm not sure why
> > that's a problem with my patch. I was trying to solve the problem Daniel
> > pointed out, that mapping a guard region after each mmap cost twice as
> > many VMAs, and it solves that problem.
> >
> The issue was the VMA count, as Daniel mentioned: the more VMAs there are,
> the more expensive the tree walk. I think this is fine.

The performance problem Daniel was mentioning with your patch was not
with the number of VMAs but with the scattering of addresses across the
page table tree.
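
As a back-of-the-envelope illustration of one part of that cost (assuming
x86-64 with 4 KiB pages, where a last-level page table covers 2 MiB of address
space; the numbers only show the shape of the problem, they are not a
measurement):

#include <stdio.h>

int main(void)
{
	unsigned long mappings = 1000;            /* one-page mappings        */
	unsigned long pte_page = 4096;            /* size of one PTE table    */
	unsigned long span_per_pte = 2UL << 20;   /* 2 MiB covered per table  */
	unsigned long mapped = mappings * 4096;

	/* Clustered: the mappings share a handful of PTE tables. */
	printf("clustered: ~%lu bytes of PTE tables\n",
	       ((mapped + span_per_pte - 1) / span_per_pte) * pte_page);

	/* Fully scattered: each mapping can land in its own 2 MiB slot. */
	printf("scattered: up to %lu bytes of PTE tables\n",
	       mappings * pte_page);
	return 0;
}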

> >> - the entropy you provide is like 16 bit, that is really not so hard to brute
> >
> > It's 16 bits per mapping. I think that'll make enough attacks harder
> > to be worthwhile.
>
> Well yes, its ok, sorry. I just would like to have 32 bit entropy maximum some day :)

We could put 32 bits of padding into the prot argument on 64-bit systems
(and obviously you need a 64-bit address space to use that many bits). The
thing is that you can't then put anything else into those pages (without
using MAP_FIXED).

> >> - if you unmap/remap one page inside region, field vma_guard will show head
> >> or tail pages for vma, not both; kernel don’t know how to handle it
> >
> > There are no head pages. The guard pages are only placed after the real end.
>
> Ok, we have MG where G = vm_guard, right? so when you do vm_split,
> you may come to situation - m1g1m2G, how to handle it? I mean when M is
> split with only one page inside this region. How to handle it?

I thought I covered that in my earlier email. Using one letter per page,
and a five-page mapping with two guard pages: MMMMMGG. Now unmap the
fourth page, and the VMA gets split into two. You get: MMMGMGG.

> > I can't agree with that. The user has plenty of opportunities to get
> > randomness; from /dev/random is the easiest, but you could also do timing
> > attacks on your own cachelines, for example.
>
> I think the usual case to use randomization for any mmap or not use it at all
> for whole process. So here I think would be nice to have some variable
> changeable with sysctl (root only) and ioctl (for greedy processes).

I think this functionality can just as well live inside libc as in
the kernel.

> Well, let me summary:
> My approach chose random gap inside gap range with following strings:
>
> + addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
> + addr = low + (addr << PAGE_SHIFT);
>
> Could be improved limiting maximum possible entropy in this shift.
> To prevent situation when attacker may massage allocations and
> predict chosen address, I randomly choose memory region. I’m still
> like my idea, but not going to push it anymore, since you have yours now.
>
> Your idea just provide random non-mappable and non-accessable offset
> from best-fit region. This consumes memory (1GB gap if random value
> is 0xffff). But it works and should work faster and should resolve the issue.

umm ... 64k * 4k is a 256MB gap, not 1GB. And it consumes address space,
not memory.

> My point was that current implementation need to be changed and you
> have your own approach for that. :)
> Lets keep mine in the mind till better times (or worse?) ;)
> Will you finish your approach and upstream it?

I'm just putting it out there for discussion. If people think this is
the right approach, then I'm happy to finish it off. If the consensus
is that we should randomly pick addresses instead, I'm happy if your
approach gets merged.

2018-03-05 20:21:46

by Ilya Smith

[permalink] [raw]
Subject: Re: [RFC PATCH] Randomization of address chosen by mmap.

> On 5 Mar 2018, at 22:47, Matthew Wilcox <[email protected]> wrote:
>>>> - the entropy you provide is like 16 bit, that is really not so hard to brute
>>>
>>> It's 16 bits per mapping. I think that'll make enough attacks harder
>>> to be worthwhile.
>>
>> Well yes, its ok, sorry. I just would like to have 32 bit entropy maximum some day :)
>
> We could put 32 bits of padding into the prot argument on 64-bit systems
> (and obviously you need a 64-bit address space to use that many bits). The
> thing is that you can't then put anything else into those pages (without
> using MAP_FIXED).
>

This one sounds good to me. In my approach it is still possible to map into that area, but OK.

>>>> - if you unmap/remap one page inside region, field vma_guard will show head
>>>> or tail pages for vma, not both; kernel don’t know how to handle it
>>>
>>> There are no head pages. The guard pages are only placed after the real end.
>>
>> Ok, we have MG where G = vm_guard, right? so when you do vm_split,
>> you may come to situation - m1g1m2G, how to handle it? I mean when M is
>> split with only one page inside this region. How to handle it?
>
> I thought I covered that in my earlier email. Using one letter per page,
> and a five-page mapping with two guard pages: MMMMMGG. Now unmap the
> fourth page, and the VMA gets split into two. You get: MMMGMGG.
>
I was just curious, it's not an issue for me. Now it's clear, thanks.

>>> I can't agree with that. The user has plenty of opportunities to get
>>> randomness; from /dev/random is the easiest, but you could also do timing
>>> attacks on your own cachelines, for example.
>>
>> I think the usual case to use randomization for any mmap or not use it at all
>> for whole process. So here I think would be nice to have some variable
>> changeable with sysctl (root only) and ioctl (for greedy processes).
>
> I think this functionality can just as well live inside libc as in
> the kernel.
>

Good news for them :)

>> Well, let me summary:
>> My approach chose random gap inside gap range with following strings:
>>
>> + addr = get_random_long() % ((high - low) >> PAGE_SHIFT);
>> + addr = low + (addr << PAGE_SHIFT);
>>
>> Could be improved limiting maximum possible entropy in this shift.
>> To prevent situation when attacker may massage allocations and
>> predict chosen address, I randomly choose memory region. I’m still
>> like my idea, but not going to push it anymore, since you have yours now.
>>
>> Your idea just provide random non-mappable and non-accessable offset
>> from best-fit region. This consumes memory (1GB gap if random value
>> is 0xffff). But it works and should work faster and should resolve the issue.
>
> umm ... 64k * 4k is a 256MB gap, not 1GB. And it consumes address space,
> not memory.
>

hmm, yes… I found 8 bits somewhere.. 256MB should be enough for everyone.

>> My point was that current implementation need to be changed and you
>> have your own approach for that. :)
>> Lets keep mine in the mind till better times (or worse?) ;)
>> Will you finish your approach and upstream it?
>
> I'm just putting it out there for discussion. If people think this is
> the right approach, then I'm happy to finish it off. If the consensus
> is that we should randomly pick addresses instead, I'm happy if your
> approach gets merged.

So now, is it time to call in more people? Sorry, I'm new here.

Thanks,
Ilya