This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.
The memory is managed in pools: when a memory pool is protected, all the
memory that is currently part of it becomes R/O.
A R/O pool can be expanded (by adding more protectable memory).
It can also be destroyed, to recover its memory, but it cannot be
turned back into normal R/W mode.
This is intentional: the feature is meant for data that either doesn't
need further modification after initialization, or changes very seldom.
The data might need to be released, for example as part of module
unloading; for this reason, the pool can be destroyed.
For those cases where the data never becomes completely stable, yet can
stay unmodified for very long periods, it is possible to allocate it
from a "rare write" pool, which allows modification of its data through
a helper function.
I did not want to overcomplicate this first version of rare write; it
might become necessary to add disabling/enabling of preemption, and I
would appreciate comments in general about the implementation, which is
based on transient remapping.
An example is provided, showing how to protect one of the internal
states of SELinux.
Changes since v22:
[http://www.openwall.com/lists/kernel-hardening/2018/04/13/3]
- refactored some helper functions in a separate local header
- expanded the documentation
- introduction of rare write support
- example with SELinux "initialized" field
Igor Stoppa (9):
struct page: add field for vm_struct
vmalloc: rename llist field in vmap_area
Protectable Memory
Documentation for Pmalloc
Pmalloc selftest
lkdtm: crash on overwriting protected pmalloc var
Pmalloc Rare Write: modify selected pools
Preliminary self test for pmalloc rare write
Protect SELinux initialized state with pmalloc
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pmalloc.rst | 189 ++++++++++++++++++++++++++
drivers/misc/lkdtm/core.c | 3 +
drivers/misc/lkdtm/lkdtm.h | 1 +
drivers/misc/lkdtm/perms.c | 25 ++++
include/linux/mm_types.h | 1 +
include/linux/pmalloc.h | 170 ++++++++++++++++++++++++
include/linux/test_pmalloc.h | 24 ++++
include/linux/vmalloc.h | 6 +-
init/main.c | 2 +
mm/Kconfig | 16 +++
mm/Makefile | 2 +
mm/pmalloc.c | 258 ++++++++++++++++++++++++++++++++++++
mm/pmalloc_helpers.h | 210 +++++++++++++++++++++++++++++
mm/test_pmalloc.c | 213 +++++++++++++++++++++++++++++
mm/usercopy.c | 9 ++
mm/vmalloc.c | 10 +-
security/selinux/hooks.c | 12 +-
security/selinux/include/security.h | 2 +-
security/selinux/ss/services.c | 51 ++++---
20 files changed, 1174 insertions(+), 31 deletions(-)
create mode 100644 Documentation/core-api/pmalloc.rst
create mode 100644 include/linux/pmalloc.h
create mode 100644 include/linux/test_pmalloc.h
create mode 100644 mm/pmalloc.c
create mode 100644 mm/pmalloc_helpers.h
create mode 100644 mm/test_pmalloc.c
--
2.14.1
While the vanilla version of pmalloc supports only the permanent
transition of a memory pool from writable to read-only, this patch
adds support for a separate class of data, which would still benefit
from write protection most of the time, but must remain modifiable.
Maybe very seldom, but it still cannot be permanently marked as
read-only.
The major changes are:
- an extra parameter, at pool creation, to request modifiable memory
- a pmalloc_rare_write() function, to alter the value of modifiable
  allocations
The implementation tries to hinder attacks by reducing the window
during which the memory is writable; the writable alias is also mapped
at a random address, which is harder to retrieve, even in case of
another core racing with the one performing the modification.
Signed-off-by: Igor Stoppa <[email protected]>
CC: Carlos Chinea Perez <[email protected]>
CC: Remi Denis Courmont <[email protected]>
---
Documentation/core-api/pmalloc.rst | 46 ++++++++++++++++----
drivers/misc/lkdtm/perms.c | 2 +-
include/linux/pmalloc.h | 24 ++++++++++-
include/linux/vmalloc.h | 3 +-
mm/pmalloc.c | 88 +++++++++++++++++++++++++++++++++++++-
mm/pmalloc_helpers.h | 66 +++++++++++++++++++++++++---
mm/test_pmalloc.c | 8 ++--
7 files changed, 214 insertions(+), 23 deletions(-)
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
index 27eb7b3eafc4..e0fa4a5462a9 100644
--- a/Documentation/core-api/pmalloc.rst
+++ b/Documentation/core-api/pmalloc.rst
@@ -10,8 +10,9 @@ Purpose
The pmalloc library is meant to provide read-only status to data that,
for some reason, could neither be declared as constant, nor could it take
-advantage of the qualifier __ro_after_init, but it is in spirit
-write-once/read-only.
+advantage of the qualifier __ro_after_init.
+But it is in spirit either fully write-once/read-only or at least
+write-seldom/mostly-read-only.
At some point it might get torn down, however that doesn't affect how it
is treated, while it's still relevant.
Pmalloc protects data from both accidental and malicious overwrites.
@@ -57,6 +58,11 @@ When to use pmalloc
- Pmalloc can be useful also when the amount of data to protect is not
known at compile time and the memory can only be allocated dynamically.
+- When it's not possible to fix a point in time after which the data
+ becomes immutable, but it's still fairly unlikely that it will change,
+ rare write becomes a less vulnerable alternative to leaving the data
+ located in freely rewritable memory.
+
- Finally, it can be useful also when it is desirable to control
dynamically (for example through the kernel command line) if some
specific data ought to be protected or not, without having to rebuild
@@ -70,11 +76,20 @@ When to use pmalloc
When *not* to use pmalloc
-------------------------
-Using pmalloc is not a good idea when optimizing TLB utilization is
-paramount: pmalloc relies on virtual memory areas and will therefore use
-more TLB entries. It still does a better job of it, compared to invoking
-vmalloc for each allocation, but it is undeniably less optimized wrt
-TLB use than using the physmap directly, through kmalloc or similar.
+Using pmalloc is not a good idea in some cases:
+
+- when optimizing TLB utilization is paramount:
+ pmalloc relies on virtual memory areas and will therefore use more
+  TLB entries. It still does a better job of it, compared to invoking
+  vmalloc for each allocation, but it is undeniably less optimized wrt
+  TLB use than using the physmap directly, through kmalloc or similar.
+
+- when rare-write is not-so-rare:
+ rare-write does not allow updates in-place, it rather expects to be
+ provided a version of how the data is supposed to be, and then it
+ performs the update accordingly, by modifying the original data.
+ Such procedure takes an amount of time that is proportional to the
+ number of pages affected.
Caveats
@@ -112,6 +127,15 @@ Caveats
But the allocation can take place during init, and its address is known
and constant.
+- The users of rare write must take care of ensuring the atomicity of
+  the action, with respect to the way they use the data being altered;
+  for example, take a lock before making a copy of the value to modify
+  (if it's relevant), then alter it, issue the call to rare write and
+  finally release the lock. Some special scenarios might be exempt from
+  the need for locking, but in general rare-write must be treated as an
+  operation that can incur races.
+
+
Utilization
-----------
@@ -122,7 +146,7 @@ Steps to perform during init:
#. create an "anchor", with the modifier __ro_after_init
-#. create a pool
+#. create a pool, choosing if it can be altered or not, after protection
:c:func:`pmalloc_create_pool`
@@ -147,7 +171,11 @@ init, as long as they strictly come after the previous sequence.
:c:func:`pmalloc_protect_pool`
-#. iterate over the last 2 points as needed
+#. [optional] modify the pool, if it was created as rewritable
+
+   :c:func:`pmalloc_rare_write`
+
+#. iterate over the last 3 points as needed
#. [optional] destroy the pool
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 3c81e59f9d9d..6dfab1fbc313 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -111,7 +111,7 @@ void lkdtm_WRITE_RO_PMALLOC(void)
struct pmalloc_pool *pool;
int *i;
- pool = pmalloc_create_pool();
+ pool = pmalloc_create_pool(PMALLOC_RO);
if (WARN(!pool, "Failed preparing pool for pmalloc test."))
return;
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
index eebaf1ebc6f3..0aab95074aa8 100644
--- a/include/linux/pmalloc.h
+++ b/include/linux/pmalloc.h
@@ -23,17 +23,33 @@
* be destroyed.
* Upon destruction of a certain pool, all the related memory is released,
* including its metadata.
+ *
+ * Depending on the type of protection that was chosen, the memory can be
+ * either completely read-only or it can support rare-writes.
+ *
+ * The rare-write mechanism is intended to add no read overhead, while
+ * still providing some form of protection while a selected area is
+ * modified.
+ * This incurs a penalty that partially depends on the specific
+ * architecture, but in general it is the price to pay for limiting the
+ * attack surface while the change takes place.
+ *
+ * For additional safety, it is not possible to have in the same pool both
+ * rare-write and unmodifiable memory.
*/
#define PMALLOC_REFILL_DEFAULT (0)
#define PMALLOC_ALIGN_DEFAULT ARCH_KMALLOC_MINALIGN
+#define PMALLOC_RO 0
+#define PMALLOC_RW 1
struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
+ bool rewritable,
unsigned short align_order);
/**
* pmalloc_create_pool() - create a protectable memory pool
+ * @rewritable: can the data be altered after protection
*
* Shorthand for pmalloc_create_custom_pool() with default argument:
* * refill is set to PMALLOC_REFILL_DEFAULT
@@ -43,9 +59,10 @@ struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
* * pointer to the new pool - success
* * NULL - error
*/
-static inline struct pmalloc_pool *pmalloc_create_pool(void)
+static inline struct pmalloc_pool *pmalloc_create_pool(bool rewritable)
{
return pmalloc_create_custom_pool(PMALLOC_REFILL_DEFAULT,
+ rewritable,
PMALLOC_ALIGN_DEFAULT);
}
@@ -142,7 +159,12 @@ static inline char *pstrdup(struct pmalloc_pool *pool, const char *s)
return buf;
}
+bool pmalloc_rare_write(struct pmalloc_pool *pool, const void *destination,
+ const void *source, size_t n_bytes);
+
void pmalloc_protect_pool(struct pmalloc_pool *pool);
+void pmalloc_make_pool_ro(struct pmalloc_pool *pool);
+
void pmalloc_destroy_pool(struct pmalloc_pool *pool);
#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 69c12f21200f..d0b747a78271 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -21,7 +21,8 @@ struct notifier_block; /* in notifier.h */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
#define VM_PMALLOC 0x00000100 /* pmalloc area - see docs */
-#define VM_PMALLOC_PROTECTED 0x00000200 /* protected area - see docs */
+#define VM_PMALLOC_REWRITABLE 0x00000200 /* pmalloc rewritable area */
+#define VM_PMALLOC_PROTECTED 0x00000400 /* pmalloc protected area */
/* bits [20..32] reserved for arch specific ioremap internals */
/*
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index ddaef41837f4..ca7f10b50b25 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -34,6 +34,7 @@ static DEFINE_MUTEX(pools_mutex);
* @refill: the minimum size to allocate when in need of more memory.
* It will be rounded up to a multiple of PAGE_SIZE
* The value of 0 gives the default amount of PAGE_SIZE.
+ * @rewritable: can the data be altered after protection
* @align_order: log2 of the alignment to use when allocating memory
* Negative values give ARCH_KMALLOC_MINALIGN
*
@@ -45,6 +46,7 @@ static DEFINE_MUTEX(pools_mutex);
* * NULL - error
*/
struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
+ bool rewritable,
unsigned short align_order)
{
struct pmalloc_pool *pool;
@@ -54,6 +56,7 @@ struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
return NULL;
pool->refill = refill ? PAGE_ALIGN(refill) : DEFAULT_REFILL_SIZE;
+ pool->rewritable = rewritable;
pool->align = 1UL << align_order;
mutex_init(&pool->mutex);
@@ -77,7 +80,7 @@ static int grow(struct pmalloc_pool *pool, size_t min_size)
return -ENOMEM;
area = find_vmap_area((unsigned long)addr);
- tag_area(area);
+ tag_area(area, pool->rewritable);
pool->offset = get_area_pages_size(area);
llist_add(&area->area_list, &pool->vm_areas);
return 0;
@@ -144,6 +147,88 @@ void pmalloc_protect_pool(struct pmalloc_pool *pool)
}
EXPORT_SYMBOL(pmalloc_protect_pool);
+static inline bool rare_write(const void *destination,
+ const void *source, size_t n_bytes)
+{
+ struct page *page;
+ void *base;
+ size_t size;
+ unsigned long offset;
+
+ while (n_bytes) {
+ page = vmalloc_to_page(destination);
+ base = vmap(&page, 1, VM_MAP, PAGE_KERNEL);
+ if (WARN(!base, "failed to remap rewritable page"))
+ return false;
+ offset = (unsigned long)destination & ~PAGE_MASK;
+ size = min(n_bytes, (size_t)PAGE_SIZE - offset);
+ memcpy(base + offset, source, size);
+ vunmap(base);
+ destination += size;
+ source += size;
+ n_bytes -= size;
+ }
+	return true;
+}
+
+/**
+ * pmalloc_rare_write() - alters the content of a rewritable pool
+ * @pool: the pool the memory to be modified belongs to
+ * @destination: where to write the new data
+ * @source: the location of the data to replicate into the pool
+ * @n_bytes: the size of the region to modify
+ *
+ * Return:
+ * * true - success
+ * * false - error
+ */
+bool pmalloc_rare_write(struct pmalloc_pool *pool, const void *destination,
+ const void *source, size_t n_bytes)
+{
+ bool retval = false;
+ struct vmap_area *area;
+
+ /*
+ * The following sanitation is meant to make life harder for
+ * attempts at using ROP/JOP to call this function against pools
+ * that are not supposed to be modifiable.
+ */
+ mutex_lock(&pool->mutex);
+ if (WARN(pool->rewritable != PMALLOC_RW,
+ "Attempting to modify non rewritable pool"))
+ goto out;
+ area = pool_get_area(pool, destination, n_bytes);
+ if (WARN(!area, "Destination range not in pool"))
+ goto out;
+ if (WARN(!is_area_rewritable(area),
+ "Attempting to modify non rewritable area"))
+ goto out;
+	retval = rare_write(destination, source, n_bytes);
+out:
+ mutex_unlock(&pool->mutex);
+ return retval;
+}
+EXPORT_SYMBOL(pmalloc_rare_write);
+
+/**
+ * pmalloc_make_pool_ro() - drops rare-write permission from a pool
+ * @pool: the pool associated with the memory to make read-only
+ *
+ * Drops the possibility to perform controlled writes from both the pool
+ * metadata and all the vmap_area structures associated with the pool.
+ */
+void pmalloc_make_pool_ro(struct pmalloc_pool *pool)
+{
+ struct vmap_area *area;
+
+ mutex_lock(&pool->mutex);
+ pool->rewritable = false;
+ llist_for_each_entry(area, pool->vm_areas.first, area_list)
+		make_area_ro(area);
+ mutex_unlock(&pool->mutex);
+}
+EXPORT_SYMBOL(pmalloc_make_pool_ro);
+
/**
* pmalloc_destroy_pool() - destroys a pool and all the associated memory
* @pool: the pool to destroy
@@ -171,4 +256,3 @@ void pmalloc_destroy_pool(struct pmalloc_pool *pool)
}
}
EXPORT_SYMBOL(pmalloc_destroy_pool);
-
diff --git a/mm/pmalloc_helpers.h b/mm/pmalloc_helpers.h
index 52d4d899e173..538e37564f8f 100644
--- a/mm/pmalloc_helpers.h
+++ b/mm/pmalloc_helpers.h
@@ -26,19 +26,28 @@ struct pmalloc_pool {
size_t refill;
size_t offset;
size_t align;
+ bool rewritable;
};
#define VM_PMALLOC_PROTECTED_MASK (VM_PMALLOC | VM_PMALLOC_PROTECTED)
-#define VM_PMALLOC_MASK (VM_PMALLOC | VM_PMALLOC_PROTECTED)
+#define VM_PMALLOC_REWRITABLE_MASK \
+ (VM_PMALLOC | VM_PMALLOC_REWRITABLE)
+#define VM_PMALLOC_PROTECTED_REWRITABLE_MASK \
+ (VM_PMALLOC | VM_PMALLOC_REWRITABLE | VM_PMALLOC_PROTECTED)
+#define VM_PMALLOC_MASK \
+ (VM_PMALLOC | VM_PMALLOC_REWRITABLE | VM_PMALLOC_PROTECTED)
static __always_inline unsigned long area_flags(struct vmap_area *area)
{
return area->vm->flags & VM_PMALLOC_MASK;
}
-static __always_inline void tag_area(struct vmap_area *area)
+static __always_inline void tag_area(struct vmap_area *area, bool rewritable)
{
- area->vm->flags |= VM_PMALLOC;
+ if (rewritable == PMALLOC_RW)
+ area->vm->flags |= VM_PMALLOC_REWRITABLE_MASK;
+ else
+ area->vm->flags |= VM_PMALLOC;
}
static __always_inline void untag_area(struct vmap_area *area)
@@ -52,10 +61,20 @@ static __always_inline struct vmap_area *current_area(struct pmalloc_pool *pool)
area_list);
}
+static __always_inline bool area_matches_mask(struct vmap_area *area,
+ unsigned long mask)
+{
+ return (area->vm->flags & mask) == mask;
+}
+
static __always_inline bool is_area_protected(struct vmap_area *area)
{
- return (area->vm->flags & VM_PMALLOC_PROTECTED_MASK) ==
- VM_PMALLOC_PROTECTED_MASK;
+ return area_matches_mask(area, VM_PMALLOC_PROTECTED_MASK);
+}
+
+static __always_inline bool is_area_rewritable(struct vmap_area *area)
+{
+ return area_matches_mask(area, VM_PMALLOC_REWRITABLE_MASK);
}
static __always_inline void protect_area(struct vmap_area *area)
@@ -66,6 +85,12 @@ static __always_inline void protect_area(struct vmap_area *area)
area->vm->flags |= VM_PMALLOC_PROTECTED_MASK;
}
+static __always_inline void make_area_ro(struct vmap_area *area)
+{
+ area->vm->flags &= ~VM_PMALLOC_REWRITABLE;
+ protect_area(area);
+}
+
static __always_inline void unprotect_area(struct vmap_area *area)
{
if (likely(is_area_protected(area)))
@@ -150,5 +175,36 @@ static inline void check_pmalloc_object(const void *ptr, unsigned long n,
}
}
+static __always_inline unsigned long get_area_pages_end(struct vmap_area *area)
+{
+ return area->va_start + get_area_pages_size(area);
+}
+
+static __always_inline bool area_contains_range(struct vmap_area *area,
+ const void *addr,
+ size_t n_bytes)
+{
+	unsigned long area_end = get_area_pages_end(area);
+	unsigned long range_start = (unsigned long)addr;
+	unsigned long range_end = range_start + n_bytes;
+
+ return (area->va_start <= range_start) &&
+ (range_start < area_end) &&
+ (area->va_start <= range_end) &&
+ (range_end <= area_end);
+}
+
+static __always_inline
+struct vmap_area *pool_get_area(struct pmalloc_pool *pool,
+ const void *addr, size_t n_bytes)
+{
+ struct vmap_area *area;
+
+ llist_for_each_entry(area, pool->vm_areas.first, area_list)
+ if (area_contains_range(area, addr, n_bytes))
+ return area;
+ return NULL;
+}
+
#endif
#endif
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
index 032e9741c5f1..c8835207a400 100644
--- a/mm/test_pmalloc.c
+++ b/mm/test_pmalloc.c
@@ -43,7 +43,7 @@ static bool create_and_destroy_pool(void)
pr_notice("Testing pool creation and destruction capability");
- pool = pmalloc_create_pool();
+ pool = pmalloc_create_pool(PMALLOC_RO);
if (WARN(!pool, "Cannot allocate memory for pmalloc selftest."))
return false;
pmalloc_destroy_pool(pool);
@@ -58,7 +58,7 @@ static bool test_alloc(void)
static void *p;
pr_notice("Testing allocation capability");
- pool = pmalloc_create_pool();
+ pool = pmalloc_create_pool(PMALLOC_RO);
if (WARN(!pool, "Unable to allocate memory for pmalloc selftest."))
return false;
p = pmalloc(pool, SIZE_1 - 1);
@@ -84,7 +84,7 @@ static bool test_is_pmalloc_object(void)
if (WARN(!vmalloc_p,
"Unable to allocate memory for pmalloc selftest."))
return false;
- pool = pmalloc_create_pool();
+ pool = pmalloc_create_pool(PMALLOC_RO);
if (WARN(!pool, "Unable to allocate memory for pmalloc selftest."))
return false;
pmalloc_p = pmalloc(pool, SIZE_1 - 1);
@@ -111,7 +111,7 @@ static void test_oovm(void)
unsigned int i;
pr_notice("Exhaust vmalloc memory with doubling allocations.");
- pool = pmalloc_create_pool();
+ pool = pmalloc_create_pool(PMALLOC_RO);
if (WARN(!pool, "Failed to create pool"))
return;
for (i = 1; i; i *= 2)
--
2.14.1
Detailed documentation about the protectable memory allocator.
Signed-off-by: Igor Stoppa <[email protected]>
---
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pmalloc.rst | 161 +++++++++++++++++++++++++++++++++++++
2 files changed, 162 insertions(+)
create mode 100644 Documentation/core-api/pmalloc.rst
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
genalloc
errseq
printk-formats
+ pmalloc
Interfaces for kernel debugging
===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..27eb7b3eafc4
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,161 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _pmalloc:
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide read-only status to data that,
+for some reason, could neither be declared as constant, nor could it take
+advantage of the qualifier __ro_after_init, but it is in spirit
+write-once/read-only.
+At some point it might get torn down, however that doesn't affect how it
+is treated, while it's still relevant.
+Pmalloc protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+The MMU available in the system can be used to write protect memory pages.
+Unfortunately this feature cannot be used as-it-is, to protect sensitive
+data, because this potentially read-only data is typically interleaved
+with other data, which must stay writeable.
+
+pmalloc introduces the concept of protectable memory pools.
+A pool contains a list of areas of virtually contiguous pages of
+memory. When memory is requested from a pool, the requests are satisfied
+by reserving adequate amounts of memory from the active area of memory in
+that pool. A request can cross page boundaries, therefore an area is the
+minimum granularity that pmalloc can protect.
+
+There might be special cases where an area contains only one page, but
+they are still addressed as areas.
+
+Areas are allocated on-the-fly, when the space available is insufficient
+for satisfying the latest request received.
+
+To facilitate the conversion of existing code to pmalloc pools, several
+helper functions are provided, mirroring their k/vmalloc counterparts.
+
+However, there is no pfree(), because the memory protected by a pool is
+released exclusively when the pool is destroyed.
+
+
+When to use pmalloc
+-------------------
+
+- Pmalloc memory is intended to complement __ro_after_init.
+ __ro_after_init requires that the initialization value is applied before
+ init is completed. If this is not possible, then pmalloc can be used.
+
+- Pmalloc can be useful also when the amount of data to protect is not
+ known at compile time and the memory can only be allocated dynamically.
+
+- Finally, it can be useful also when it is desirable to control
+  dynamically (for example through the kernel command line) if some
+ specific data ought to be protected or not, without having to rebuild
+ the kernel, for toggling a "const" qualifier.
+  This can be used, for example, by a Linux distro, to create a more
+ versatile binary kernel and allow its users to toggle between developer
+ (unprotected) or production (protected) modes by reconfiguring the
+ bootloader.
+
+
+When *not* to use pmalloc
+-------------------------
+
+Using pmalloc is not a good idea when optimizing TLB utilization is
+paramount: pmalloc relies on virtual memory areas and will therefore use
+more TLB entries. It still does a better job of it, compared to invoking
+vmalloc for each allocation, but it is undeniably less optimized wrt
+TLB use than using the physmap directly, through kmalloc or similar.
+
+
+Caveats
+-------
+
+- When a pool is protected, whatever memory would be still available in
+ the current vmap_area (from which allocations are performed) is
+ relinquished.
+
+- As already explained, freeing of memory is not supported. Pages will be
+ returned to the system upon destruction of the memory pool that they
+ belong to. For this reason, no pfree() function is provided
+
+- The address range available for vmalloc (and thus for pmalloc too) is
+ limited, on 32-bit systems. However it shouldn't be an issue, since not
+ much data is expected to be dynamically allocated and turned into
+ read-only.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+ during an initial transient, after which there should be no more need
+ to perform cross-processor synchronizations of page tables.
+ Loading of kernel modules is an exception to this, but it's not expected
+ to happen with such high frequency to become a problem.
+
+- While pmalloc memory can be protected, since it is allocated dynamically,
+ it is still subject to indirect attacks, where the memory itself is not
+ touched, but anything used as reference to the allocation can be altered.
+ In some cases the allocation from a pmalloc pool is referred to by another
+ allocation, from either the same or another pool, however at some point,
+ there will be a base reference which can be attacked, if it cannot be
+ protected.
+ This base reference, or "anchor" is suitable for protection using
+ __ro_after_init, since it only needs to store the *address* of the
+ pmalloc allocation that will be initialized and protected later on.
+ But the allocation can take place during init, and its address is known
+ and constant.
+
+
+Utilization
+-----------
+
+Typical sequence, when using pmalloc
+
+Steps to perform during init:
+
+#. create an "anchor", with the modifier __ro_after_init
+
+#. create a pool
+
+ :c:func:`pmalloc_create_pool`
+
+#. issue an allocation requests to the pool with either
+
+ :c:func:`pmalloc`
+
+ or one of its variants, like
+
+ :c:func:`pzalloc`
+
+ assigning its address to the anchor
+
+#. iterate the previous points as needed
+
+The following steps can be performed at any time, both during and after
+init, as long as they strictly come after the previous sequence.
+
+#. initialize with the desired value the memory obtained from the pool(s)
+
+#. write-protect the memory so far allocated
+
+  :c:func:`pmalloc_protect_pool`
+
+#. iterate over the last 2 points as needed
+
+#. [optional] destroy the pool
+
+ :c:func:`pmalloc_destroy_pool`
+
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
+.. kernel-doc:: mm/pmalloc.c
--
2.14.1
Verify that pmalloc read-only protection is in place: trying to
overwrite a protected variable will crash the kernel.
Signed-off-by: Igor Stoppa <[email protected]>
---
drivers/misc/lkdtm/core.c | 3 +++
drivers/misc/lkdtm/lkdtm.h | 1 +
drivers/misc/lkdtm/perms.c | 25 +++++++++++++++++++++++++
3 files changed, 29 insertions(+)
diff --git a/drivers/misc/lkdtm/core.c b/drivers/misc/lkdtm/core.c
index 2154d1bfd18b..c9fd42bda6ee 100644
--- a/drivers/misc/lkdtm/core.c
+++ b/drivers/misc/lkdtm/core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PROTECTABLE_MEMORY
+ CRASHTYPE(WRITE_RO_PMALLOC),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm/lkdtm.h b/drivers/misc/lkdtm/lkdtm.h
index 9e513dcfd809..dcda3ae76ceb 100644
--- a/drivers/misc/lkdtm/lkdtm.h
+++ b/drivers/misc/lkdtm/lkdtm.h
@@ -38,6 +38,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
void __init lkdtm_perms_init(void);
void lkdtm_WRITE_RO(void);
void lkdtm_WRITE_RO_AFTER_INIT(void);
+void lkdtm_WRITE_RO_PMALLOC(void);
void lkdtm_WRITE_KERN(void);
void lkdtm_EXEC_DATA(void);
void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm/perms.c b/drivers/misc/lkdtm/perms.c
index 53b85c9d16b8..3c81e59f9d9d 100644
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
#include <linux/vmalloc.h>
#include <linux/mman.h>
#include <linux/uaccess.h>
+#include <linux/pmalloc.h>
#include <asm/cacheflush.h>
/* Whether or not to fill the target memory area with do_nothing(). */
@@ -104,6 +105,30 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
}
+#ifdef CONFIG_PROTECTABLE_MEMORY
+void lkdtm_WRITE_RO_PMALLOC(void)
+{
+ struct pmalloc_pool *pool;
+ int *i;
+
+ pool = pmalloc_create_pool();
+ if (WARN(!pool, "Failed preparing pool for pmalloc test."))
+ return;
+
+ i = pmalloc(pool, sizeof(int));
+ if (WARN(!i, "Failed allocating memory for pmalloc test.")) {
+ pmalloc_destroy_pool(pool);
+ return;
+ }
+
+ *i = INT_MAX;
+ pmalloc_protect_pool(pool);
+
+ pr_info("attempting bad pmalloc write at %p\n", i);
+ *i = 0;
+}
+#endif
+
void lkdtm_WRITE_KERN(void)
{
size_t size;
--
2.14.1
Try to alter locked but modifiable pools.
The test needs some cleanup and expansion.
It is provided primarily as a reference.
Signed-off-by: Igor Stoppa <[email protected]>
---
mm/test_pmalloc.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 75 insertions(+)
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
index c8835207a400..e8e945e4a4a3 100644
--- a/mm/test_pmalloc.c
+++ b/mm/test_pmalloc.c
@@ -122,6 +122,80 @@ static void test_oovm(void)
pmalloc_destroy_pool(pool);
}
+#define REGION_SIZE (PAGE_SIZE / 4)
+#define REGION_NUMBERS 12
+static inline void fill_region(char *addr, char c)
+{
+ size_t i;
+
+ for (i = 0; i < REGION_SIZE - 1; i++)
+ addr[i] = c;
+ addr[i] = '\0';
+}
+
+static inline void init_regions(char *array)
+{
+ size_t i;
+
+ for (i = 0; i < REGION_NUMBERS; i++)
+ fill_region(array + REGION_SIZE * i, i + 'A');
+}
+
+static inline void show_regions(char *array)
+{
+ size_t i;
+
+ for (i = 0; i < REGION_NUMBERS; i++)
+ pr_info("%s", array + REGION_SIZE * i);
+}
+
+static inline void init_big_injection(char *big_injection)
+{
+ size_t i;
+
+ for (i = 0; i < PAGE_SIZE * 3; i++)
+ big_injection[i] = 'X';
+}
+
+/* Verify rewritable feature. */
+static int test_rare_write(void)
+{
+ struct pmalloc_pool *pool;
+ char *array;
+ char injection[] = "123456789";
+ unsigned short size = sizeof(injection);
+ char *big_injection;
+
+
+ pr_notice("Test pmalloc_rare_write()");
+ pool = pmalloc_create_pool(PMALLOC_RW);
+ array = pzalloc(pool, REGION_SIZE * REGION_NUMBERS);
+ init_regions(array);
+ pmalloc_protect_pool(pool);
+ pr_info("------------------------------------------------------");
+ pmalloc_rare_write(pool, array, injection, size);
+ pmalloc_rare_write(pool, array + REGION_SIZE, injection, size);
+ pmalloc_rare_write(pool,
+ array + 5 * REGION_SIZE / 2 - size / 2,
+ injection, size);
+ pmalloc_rare_write(pool, array + 3 * REGION_SIZE - size / 2,
+ injection, size);
+ show_regions(array);
+ pmalloc_destroy_pool(pool);
+ pr_info("------------------------------------------------------");
+ pool = pmalloc_create_pool(PMALLOC_RW);
+ array = pzalloc(pool, REGION_SIZE * REGION_NUMBERS);
+ init_regions(array);
+ pmalloc_protect_pool(pool);
+ big_injection = vmalloc(PAGE_SIZE * 3);
+ init_big_injection(big_injection);
+ pmalloc_rare_write(pool, array + REGION_SIZE / 2, big_injection,
+ PAGE_SIZE * 2);
+ show_regions(array);
+	vfree(big_injection);
+	pmalloc_destroy_pool(pool);
+ pr_info("------------------------------------------------------");
+ return 0;
+}
+
/**
* test_pmalloc() -main entry point for running the test cases
*/
@@ -135,4 +209,5 @@ void test_pmalloc(void)
test_is_pmalloc_object())))
return;
test_oovm();
+ test_rare_write();
}
--
2.14.1
When a page is used for virtual memory, it is often necessary to obtain
a handle to the corresponding vm_struct, which refers to the virtually
contiguous area generated when invoking vmalloc.
The struct page has a "mapping" field, which can be re-used to store a
pointer to the parent area.
This avoids more expensive searches later on.
Signed-off-by: Igor Stoppa <[email protected]>
Reviewed-by: Jay Freyensee <[email protected]>
Reviewed-by: Matthew Wilcox <[email protected]>
---
include/linux/mm_types.h | 1 +
mm/vmalloc.c | 2 ++
2 files changed, 3 insertions(+)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21612347d311..c74e2aa9a48b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -86,6 +86,7 @@ struct page {
void *s_mem; /* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+ struct vm_struct *area;
};
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ebff729cc956..61a1ca22b0f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1536,6 +1536,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
struct page *page = area->pages[i];
BUG_ON(!page);
+ page->area = NULL;
__free_pages(page, 0);
}
@@ -1705,6 +1706,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
area->nr_pages = i;
goto fail;
}
+ page->area = area;
area->pages[i] = page;
if (gfpflags_allow_blocking(gfp_mask|highmem_mask))
cond_resched();
--
2.14.1
SELinux is one of the primary targets when a system running it comes
under attack.
The reason is that, even if an attacker should manage to gain root,
SELinux will still prevent most of the actions the attacker would want
to perform.
Even in a fully locked down system, SELinux still presents a vulnerability
that is often exploited, because it is very simple to attack, once
kernel address space layout randomization has been defeated and the
attacker has gained the capability of writing to unprotected kernel data.
In various places, SELinux relies on an "initialized" internal state
variable, to decide if the policy is loaded and tests should be
performed. Needless to say, it is in the interest of the attacker to turn
it off and pretend that the policy DB is still uninitialized.
Even if recent patches move the "initialized" state inside a structure,
it is still vulnerable.
This patch seeks to protect it, using it as demo for the pmalloc API,
which is meant to provide additional protection to data which is likely
to not be changed very often, if ever (after a transient).
The patch is probably in need of rework, to make it fit better with the
new SELinux internal data structures, however it shows how to deny an
easy target to the attacker.
In case the kernel is compiled with JOP safeguards, then it becomes far
harder for the attacker to jump into the middle of the function which
calls pmalloc_rare_write, to alter the state.
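The update pattern applied below can be sketched as follows (hypothetical
wrapper, shown only to illustrate the rare-write call introduced earlier
in this series):

```c
#include <linux/pmalloc.h>

/*
 * Sketch: flip a write-protected boolean through the rare-write helper.
 * 'pool' must have been created with PMALLOC_RW, and 'flag' must have
 * been allocated from it before the pool was protected.
 */
static void set_protected_flag(struct pmalloc_pool *pool, bool *flag,
			       bool value)
{
	/* The destination stays R/O for everyone else; the helper performs
	 * the write through a transient alternate mapping.
	 */
	pmalloc_rare_write(pool, flag, &value, sizeof(value));
}
```

Direct assignments to the flag would fault, so an attacker needs to reach
this specific code path rather than any arbitrary write gadget.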
Signed-off-by: Igor Stoppa <[email protected]>
---
security/selinux/hooks.c | 12 ++++-----
security/selinux/include/security.h | 2 +-
security/selinux/ss/services.c | 51 +++++++++++++++++++++++--------------
3 files changed, 39 insertions(+), 26 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 4cafe6a19167..6049f80115bc 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -285,7 +285,7 @@ static int __inode_security_revalidate(struct inode *inode,
might_sleep_if(may_sleep);
- if (selinux_state.initialized &&
+ if (*ss_initialized_ptr &&
isec->initialized != LABEL_INITIALIZED) {
if (!may_sleep)
return -ECHILD;
@@ -612,7 +612,7 @@ static int selinux_get_mnt_opts(const struct super_block *sb,
if (!(sbsec->flags & SE_SBINITIALIZED))
return -EINVAL;
- if (!selinux_state.initialized)
+ if (!*ss_initialized_ptr)
return -EINVAL;
/* make sure we always check enough bits to cover the mask */
@@ -735,7 +735,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
mutex_lock(&sbsec->lock);
- if (!selinux_state.initialized) {
+ if (!*ss_initialized_ptr) {
if (!num_opts) {
/* Defer initialization until selinux_complete_init,
after the initial policy is loaded and the security
@@ -1022,7 +1022,7 @@ static int selinux_sb_clone_mnt_opts(const struct super_block *oldsb,
* if the parent was able to be mounted it clearly had no special lsm
* mount options. thus we can safely deal with this superblock later
*/
- if (!selinux_state.initialized)
+ if (!*ss_initialized_ptr)
return 0;
/*
@@ -3040,7 +3040,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
isec->initialized = LABEL_INITIALIZED;
}
- if (!selinux_state.initialized || !(sbsec->flags & SBLABEL_MNT))
+ if (!*ss_initialized_ptr || !(sbsec->flags & SBLABEL_MNT))
return -EOPNOTSUPP;
if (name)
@@ -7253,7 +7253,7 @@ static void selinux_nf_ip_exit(void)
#ifdef CONFIG_SECURITY_SELINUX_DISABLE
int selinux_disable(struct selinux_state *state)
{
- if (state->initialized) {
+ if (*ss_initialized_ptr) {
/* Not permitted after initial policy load. */
return -EINVAL;
}
diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
index 23e762d529fa..ec7debb143be 100644
--- a/security/selinux/include/security.h
+++ b/security/selinux/include/security.h
@@ -96,13 +96,13 @@ extern char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX];
struct selinux_avc;
struct selinux_ss;
+extern bool *ss_initialized_ptr;
struct selinux_state {
bool disabled;
#ifdef CONFIG_SECURITY_SELINUX_DEVELOP
bool enforcing;
#endif
bool checkreqprot;
- bool initialized;
bool policycap[__POLICYDB_CAPABILITY_MAX];
struct selinux_avc *avc;
struct selinux_ss *ss;
diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
index 8057e19dc15f..c09ca6f9b269 100644
--- a/security/selinux/ss/services.c
+++ b/security/selinux/ss/services.c
@@ -52,6 +52,7 @@
#include <linux/selinux.h>
#include <linux/flex_array.h>
#include <linux/vmalloc.h>
+#include <linux/pmalloc.h>
#include <net/netlabel.h>
#include "flask.h"
@@ -80,10 +81,20 @@ char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX] = {
"nnp_nosuid_transition"
};
+bool *ss_initialized_ptr __ro_after_init;
+static struct pmalloc_pool *selinux_pool;
static struct selinux_ss selinux_ss;
void selinux_ss_init(struct selinux_ss **ss)
{
+ selinux_pool = pmalloc_create_pool(PMALLOC_RW);
+ if (unlikely(!selinux_pool))
+ panic("SELinux: unable to create pmalloc pool.");
+ ss_initialized_ptr = pmalloc(selinux_pool, sizeof(bool));
+ if (unlikely(!ss_initialized_ptr))
+ panic("SELinux: unable to allocate from pmalloc pool.");
+ *ss_initialized_ptr = false;
+ pmalloc_protect_pool(selinux_pool);
rwlock_init(&selinux_ss.policy_rwlock);
mutex_init(&selinux_ss.status_lock);
*ss = &selinux_ss;
@@ -772,7 +783,7 @@ static int security_compute_validatetrans(struct selinux_state *state,
int rc = 0;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
return 0;
read_lock(&state->ss->policy_rwlock);
@@ -872,7 +883,7 @@ int security_bounded_transition(struct selinux_state *state,
int index;
int rc;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
return 0;
read_lock(&state->ss->policy_rwlock);
@@ -1032,7 +1043,7 @@ void security_compute_xperms_decision(struct selinux_state *state,
memset(xpermd->dontaudit->p, 0, sizeof(xpermd->dontaudit->p));
read_lock(&state->ss->policy_rwlock);
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
goto allow;
policydb = &state->ss->policydb;
@@ -1121,7 +1132,7 @@ void security_compute_av(struct selinux_state *state,
read_lock(&state->ss->policy_rwlock);
avd_init(state, avd);
xperms->len = 0;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
goto allow;
policydb = &state->ss->policydb;
@@ -1175,7 +1186,7 @@ void security_compute_av_user(struct selinux_state *state,
read_lock(&state->ss->policy_rwlock);
avd_init(state, avd);
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
goto allow;
policydb = &state->ss->policydb;
@@ -1294,7 +1305,7 @@ static int security_sid_to_context_core(struct selinux_state *state,
*scontext = NULL;
*scontext_len = 0;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
if (sid <= SECINITSID_NUM) {
char *scontextp;
@@ -1466,7 +1477,7 @@ static int security_context_to_sid_core(struct selinux_state *state,
if (!scontext2)
return -ENOMEM;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
int i;
for (i = 1; i < SECINITSID_NUM; i++) {
@@ -1648,7 +1659,7 @@ static int security_compute_sid(struct selinux_state *state,
int rc = 0;
bool sock;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
switch (orig_tclass) {
case SECCLASS_PROCESS: /* kernel value */
*out_sid = ssid;
@@ -2128,7 +2139,8 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
policydb = &state->ss->policydb;
sidtab = &state->ss->sidtab;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
+ bool dummy_initialized = true;
rc = policydb_read(policydb, fp);
if (rc)
goto out;
@@ -2148,7 +2160,8 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
}
security_load_policycaps(state);
- state->initialized = 1;
+ pmalloc_rare_write(selinux_pool, ss_initialized_ptr,
+ &dummy_initialized, sizeof(bool));
seqno = ++state->ss->latest_granting;
selinux_complete_init();
avc_ss_reset(state->avc, seqno);
@@ -2578,7 +2591,7 @@ int security_get_user_sids(struct selinux_state *state,
*sids = NULL;
*nel = 0;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
goto out;
read_lock(&state->ss->policy_rwlock);
@@ -2812,7 +2825,7 @@ int security_get_bools(struct selinux_state *state,
struct policydb *policydb;
int i, rc;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
*len = 0;
*names = NULL;
*values = NULL;
@@ -2987,7 +3000,7 @@ int security_sid_mls_copy(struct selinux_state *state,
int rc;
rc = 0;
- if (!state->initialized || !policydb->mls_enabled) {
+ if (!*ss_initialized_ptr || !policydb->mls_enabled) {
*new_sid = sid;
goto out;
}
@@ -3094,7 +3107,7 @@ int security_net_peersid_resolve(struct selinux_state *state,
/*
* We don't need to check initialized here since the only way both
* nlbl_sid and xfrm_sid are not equal to SECSID_NULL would be if the
- * security server was initialized and state->initialized was true.
+ * security server was initialized and *ss_initialized_ptr was true.
*/
if (!policydb->mls_enabled)
return 0;
@@ -3149,7 +3162,7 @@ int security_get_classes(struct selinux_state *state,
struct policydb *policydb = &state->ss->policydb;
int rc;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
*nclasses = 0;
*classes = NULL;
return 0;
@@ -3298,7 +3311,7 @@ int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule)
*rule = NULL;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
return -EOPNOTSUPP;
switch (field) {
@@ -3598,7 +3611,7 @@ int security_netlbl_secattr_to_sid(struct selinux_state *state,
struct context *ctx;
struct context ctx_new;
- if (!state->initialized) {
+ if (!*ss_initialized_ptr) {
*sid = SECSID_NULL;
return 0;
}
@@ -3665,7 +3678,7 @@ int security_netlbl_sid_to_secattr(struct selinux_state *state,
int rc;
struct context *ctx;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
return 0;
read_lock(&state->ss->policy_rwlock);
@@ -3704,7 +3717,7 @@ int security_read_policy(struct selinux_state *state,
int rc;
struct policy_file fp;
- if (!state->initialized)
+ if (!*ss_initialized_ptr)
return -EINVAL;
*len = security_policydb_len(state);
--
2.14.1
Add basic self-test functionality for pmalloc.
The test is introduced as early as possible, right after the main
dependency, genalloc, has passed its own self-test, so that it can help
diagnose failures in pmalloc users.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/test_pmalloc.h | 24 ++++++++
init/main.c | 2 +
mm/Kconfig | 10 ++++
mm/Makefile | 1 +
mm/test_pmalloc.c | 138 +++++++++++++++++++++++++++++++++++++++++++
5 files changed, 175 insertions(+)
create mode 100644 include/linux/test_pmalloc.h
create mode 100644 mm/test_pmalloc.c
diff --git a/include/linux/test_pmalloc.h b/include/linux/test_pmalloc.h
new file mode 100644
index 000000000000..c7e2e451c17c
--- /dev/null
+++ b/include/linux/test_pmalloc.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_pmalloc.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+
+#ifndef __LINUX_TEST_PMALLOC_H
+#define __LINUX_TEST_PMALLOC_H
+
+
+#ifdef CONFIG_TEST_PROTECTABLE_MEMORY
+
+void test_pmalloc(void);
+
+#else
+
+static inline void test_pmalloc(void) { }
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index b795aa341a3a..27f8479c4578 100644
--- a/init/main.c
+++ b/init/main.c
@@ -91,6 +91,7 @@
#include <linux/cache.h>
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
+#include <linux/test_pmalloc.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -679,6 +680,7 @@ asmlinkage __visible void __init start_kernel(void)
*/
mem_encrypt_init();
+ test_pmalloc();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/mm/Kconfig b/mm/Kconfig
index d7ef40eaa4e8..f98b4c0aebce 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -758,3 +758,13 @@ config PROTECTABLE_MEMORY
depends on MMU
depends on ARCH_HAS_SET_MEMORY
default y
+
+config TEST_PROTECTABLE_MEMORY
+ bool "Run self test for pmalloc memory allocator"
+ depends on MMU
+ depends on ARCH_HAS_SET_MEMORY
+ select PROTECTABLE_MEMORY
+ default n
+ help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 6a6668f99799..802cba37013b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_TEST_PROTECTABLE_MEMORY) += test_pmalloc.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
new file mode 100644
index 000000000000..032e9741c5f1
--- /dev/null
+++ b/mm/test_pmalloc.c
@@ -0,0 +1,138 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_pmalloc.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#include <linux/pmalloc.h>
+#include <linux/mm.h>
+#include <linux/test_pmalloc.h>
+#include <linux/bug.h>
+
+#include "pmalloc_helpers.h"
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+/* wrapper for is_pmalloc_object() with messages */
+static inline bool validate_alloc(bool expected, void *addr,
+ unsigned long size)
+{
+ bool test;
+
+ test = is_pmalloc_object(addr, size) > 0;
+ pr_notice("must be %s: %s",
+ expected ? "ok" : "no", test ? "ok" : "no");
+ return test == expected;
+}
+
+
+#define is_alloc_ok(variable, size) \
+ validate_alloc(true, variable, size)
+
+
+#define is_alloc_no(variable, size) \
+ validate_alloc(false, variable, size)
+
+/* tests the basic life-cycle of a pool */
+static bool create_and_destroy_pool(void)
+{
+ static struct pmalloc_pool *pool;
+
+ pr_notice("Testing pool creation and destruction capability");
+
+ pool = pmalloc_create_pool();
+ if (WARN(!pool, "Cannot allocate memory for pmalloc selftest."))
+ return false;
+ pmalloc_destroy_pool(pool);
+ return true;
+}
+
+
+/* verifies that it's possible to allocate from the pool */
+static bool test_alloc(void)
+{
+ static struct pmalloc_pool *pool;
+ static void *p;
+
+ pr_notice("Testing allocation capability");
+ pool = pmalloc_create_pool();
+ if (WARN(!pool, "Unable to allocate memory for pmalloc selftest."))
+ return false;
+ p = pmalloc(pool, SIZE_1 - 1);
+ pmalloc_protect_pool(pool);
+ pmalloc_destroy_pool(pool);
+ if (WARN(!p, "Failed to allocate memory from the pool"))
+ return false;
+ return true;
+}
+
+
+/* tests the identification of pmalloc ranges */
+static bool test_is_pmalloc_object(void)
+{
+ struct pmalloc_pool *pool;
+ void *pmalloc_p;
+ void *vmalloc_p;
+ bool retval = false;
+
+ pr_notice("Test correctness of is_pmalloc_object()");
+
+ vmalloc_p = vmalloc(SIZE_1);
+ if (WARN(!vmalloc_p,
+ "Unable to allocate memory for pmalloc selftest."))
+ return false;
+ pool = pmalloc_create_pool();
+ if (WARN(!pool, "Unable to allocate memory for pmalloc selftest."))
+ return false;
+ pmalloc_p = pmalloc(pool, SIZE_1 - 1);
+ if (WARN(!pmalloc_p, "Failed to allocate memory from the pool"))
+ goto error;
+ if (WARN_ON(unlikely(!is_alloc_ok(pmalloc_p, 10))) ||
+ WARN_ON(unlikely(!is_alloc_ok(pmalloc_p, SIZE_1))) ||
+ WARN_ON(unlikely(!is_alloc_ok(pmalloc_p, PAGE_SIZE))) ||
+ WARN_ON(unlikely(!is_alloc_no(pmalloc_p, SIZE_1 + 1))) ||
+ WARN_ON(unlikely(!is_alloc_no(vmalloc_p, 10))))
+ goto error;
+ retval = true;
+error:
+ pmalloc_protect_pool(pool);
+ pmalloc_destroy_pool(pool);
+ vfree(vmalloc_p);
+ return retval;
+}
+
+/* Test out of virtually contiguous memory */
+static void test_oovm(void)
+{
+ struct pmalloc_pool *pool;
+ unsigned int i;
+
+ pr_notice("Exhaust vmalloc memory with doubling allocations.");
+ pool = pmalloc_create_pool();
+ if (WARN(!pool, "Failed to create pool"))
+ return;
+ for (i = 1; i; i *= 2)
+ if (unlikely(!pzalloc(pool, i - 1)))
+ break;
+ pr_notice("vmalloc exhausted at allocation of %u bytes", i - 1);
+ pmalloc_protect_pool(pool);
+ pmalloc_destroy_pool(pool);
+}
+
+/**
+ * test_pmalloc() - main entry point for running the test cases
+ */
+void test_pmalloc(void)
+{
+
+ pr_notice("pmalloc-selftest");
+
+ if (unlikely(!(create_and_destroy_pool() &&
+ test_alloc() &&
+ test_is_pmalloc_object())))
+ return;
+ test_oovm();
+}
--
2.14.1
The vmap_area structure has a field of type struct llist_node, named
purge_list, which is used when performing a lazy purge of the area.
This field is left unused while the structure is actually in use.
This patch renames the field to a more generic "area_list", to allow for
utilization outside of the purging phase.
Since the purging happens after the vmap_area is dismissed, its use is
mutually exclusive with any use performed while the area is allocated.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/vmalloc.h | 2 +-
mm/vmalloc.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..2d07dfef3cfd 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -47,7 +47,7 @@ struct vmap_area {
unsigned long flags;
struct rb_node rb_node; /* address sorted rbtree */
struct list_head list; /* address sorted list */
- struct llist_node purge_list; /* "lazy purge" list */
+ struct llist_node area_list; /* generic list of areas */
struct vm_struct *vm;
struct rcu_head rcu_head;
};
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 61a1ca22b0f6..1bb2233bb262 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -682,7 +682,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
lockdep_assert_held(&vmap_purge_lock);
valist = llist_del_all(&vmap_purge_list);
- llist_for_each_entry(va, valist, purge_list) {
+ llist_for_each_entry(va, valist, area_list) {
if (va->va_start < start)
start = va->va_start;
if (va->va_end > end)
@@ -696,7 +696,7 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
flush_tlb_kernel_range(start, end);
spin_lock(&vmap_area_lock);
- llist_for_each_entry_safe(va, n_va, valist, purge_list) {
+ llist_for_each_entry_safe(va, n_va, valist, area_list) {
int nr = (va->va_end - va->va_start) >> PAGE_SHIFT;
__free_vmap_area(va);
@@ -743,7 +743,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
&vmap_lazy_nr);
/* After this point, we may free va at any time */
- llist_add(&va->purge_list, &vmap_purge_list);
+ llist_add(&va->area_list, &vmap_purge_list);
if (unlikely(nr_lazy > lazy_max_pages()))
try_purge_vmap_area_lazy();
--
2.14.1
The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.
However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.
Statically allocated variables can be segregated into a dedicated
section (that's how __ro_after_init works), but this does not sit very
well with dynamically allocated ones.
Dynamic allocation currently does not provide any means of grouping
variables into memory pages that contain exclusively data suitable
for conversion to read-only mode.
The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.
A module can instantiate a pool and then direct any allocation request
to the pool handle it has received.
A pool is organized as a list of areas of virtually contiguous memory.
Whenever the protection functionality is invoked on a pool, all the
areas it contains that are not yet read-only are write-protected.
The process of growing and protecting the pool can be iterated at will.
Each iteration will prevent further allocation from the memory area
currently active, turn it into read-only mode and then proceed to
secure whatever other area might still be unprotected.
Write-protecting part of a pool before completing all the allocations
can be wasteful; however, it guarantees the minimum window of
vulnerability, since the data can be allocated, initialized and
protected in a single sweep.
There are pros and cons, depending on the allocation patterns, the size
of the areas being allocated, the time intervals between initialization
and protection.
Destroying a pool is the only way to claim back the associated memory.
It is up to its user to avoid any further references to the memory that
was allocated, once the destruction is invoked.
An example where it is desirable to destroy a pool and claim back its
memory is when unloading a kernel module.
A module can have as many pools as needed.
Since pmalloc memory is obtained from vmalloc, an attacker who has
gained access to the physical mapping still has to identify where the
target of the attack is located in the virtually contiguous mapping.
Compared to plain vmalloc, pmalloc does not generate as much TLB
thrashing, since it can host multiple allocations in the same page,
where present.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/pmalloc.h | 148 ++++++++++++++++++++++++++++++++++++++++
include/linux/vmalloc.h | 3 +
mm/Kconfig | 6 ++
mm/Makefile | 1 +
mm/pmalloc.c | 174 ++++++++++++++++++++++++++++++++++++++++++++++++
mm/pmalloc_helpers.h | 154 ++++++++++++++++++++++++++++++++++++++++++
mm/usercopy.c | 9 +++
mm/vmalloc.c | 2 +-
8 files changed, 496 insertions(+), 1 deletion(-)
create mode 100644 include/linux/pmalloc.h
create mode 100644 mm/pmalloc.c
create mode 100644 mm/pmalloc_helpers.h
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..eebaf1ebc6f3
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,148 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017-18 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/string.h>
+#include <linux/slab.h>
+
+/*
+ * Library for dynamic allocation of pools of protectable memory.
+ * A pool is a single linked list of vmap_area structures.
+ * Whenever a pool is protected, all the areas it contains at that point
+ * are write protected.
+ * More areas can be added and protected, in the same way.
+ * Memory in a pool cannot be individually unprotected, but the pool can
+ * be destroyed.
+ * Upon destruction of a certain pool, all the related memory is released,
+ * including its metadata.
+ */
+
+
+#define PMALLOC_REFILL_DEFAULT (0)
+#define PMALLOC_ALIGN_DEFAULT ARCH_KMALLOC_MINALIGN
+
+struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
+ unsigned short align_order);
+
+/**
+ * pmalloc_create_pool() - create a protectable memory pool
+ *
+ * Shorthand for pmalloc_create_custom_pool() with default argument:
+ * * refill is set to PMALLOC_REFILL_DEFAULT
+ * * align_order is set to PMALLOC_ALIGN_DEFAULT
+ *
+ * Return:
+ * * pointer to the new pool - success
+ * * NULL - error
+ */
+static inline struct pmalloc_pool *pmalloc_create_pool(void)
+{
+ return pmalloc_create_custom_pool(PMALLOC_REFILL_DEFAULT,
+ PMALLOC_ALIGN_DEFAULT);
+}
+
+
+void *pmalloc(struct pmalloc_pool *pool, size_t size);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc()
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Executes pmalloc(), initializing the memory requested to 0, before
+ * returning its address.
+ *
+ * Return:
+ * * pointer to the memory requested - success
+ * * NULL - error
+ */
+static inline void *pzalloc(struct pmalloc_pool *pool, size_t size)
+{
+ void *ptr = pmalloc(pool, size);
+
+ if (likely(ptr))
+ memset(ptr, 0, size);
+ return ptr;
+}
+
+
+/**
+ * pmalloc_array() - array version of pmalloc()
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ *
+ * Executes pmalloc(), on an array.
+ *
+ * Return:
+ * * the pmalloc result - success
+ * * NULL - error
+ */
+
+static inline void *pmalloc_array(struct pmalloc_pool *pool, size_t n,
+ size_t size)
+{
+ if (unlikely(size != 0) && unlikely(n > SIZE_MAX / size))
+ return NULL;
+ return pmalloc(pool, n * size);
+}
+
+
+/**
+ * pcalloc() - array version of pzalloc()
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ *
+ * Executes pzalloc(), on an array.
+ *
+ * Return:
+ * * the pmalloc result - success
+ * * NULL - error
+ */
+static inline void *pcalloc(struct pmalloc_pool *pool, size_t n,
+ size_t size)
+{
+ if (unlikely(size != 0) && unlikely(n > SIZE_MAX / size))
+ return NULL;
+ return pzalloc(pool, n * size);
+}
+
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc()
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica - success
+ * * NULL - error
+ */
+static inline char *pstrdup(struct pmalloc_pool *pool, const char *s)
+{
+ size_t len;
+ char *buf;
+
+ len = strlen(s) + 1;
+ buf = pmalloc(pool, len);
+ if (likely(buf))
+ strncpy(buf, s, len);
+ return buf;
+}
+
+void pmalloc_protect_pool(struct pmalloc_pool *pool);
+
+void pmalloc_destroy_pool(struct pmalloc_pool *pool);
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 2d07dfef3cfd..69c12f21200f 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,8 @@ struct notifier_block; /* in notifier.h */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
+#define VM_PMALLOC 0x00000100 /* pmalloc area - see docs */
+#define VM_PMALLOC_PROTECTED 0x00000200 /* protected area - see docs */
/* bits [20..32] reserved for arch specific ioremap internals */
/*
@@ -133,6 +135,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size,
const void *caller);
extern struct vm_struct *remove_vm_area(const void *addr);
extern struct vm_struct *find_vm_area(const void *addr);
+extern struct vmap_area *find_vmap_area(unsigned long addr);
extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
struct page **pages);
diff --git a/mm/Kconfig b/mm/Kconfig
index d5004d82a1d6..d7ef40eaa4e8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -752,3 +752,9 @@ config GUP_BENCHMARK
performance of get_user_pages_fast().
See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+ bool
+ depends on MMU
+ depends on ARCH_HAS_SET_MEMORY
+ default y
diff --git a/mm/Makefile b/mm/Makefile
index b4e54a9ae9c5..6a6668f99799 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..ddaef41837f4
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,174 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/set_memory.h>
+#include <linux/bug.h>
+#include <linux/mutex.h>
+#include <linux/llist.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+#include "pmalloc_helpers.h"
+
+static LIST_HEAD(pools_list);
+static DEFINE_MUTEX(pools_mutex);
+
+#define MAX_ALIGN_ORDER (ilog2(sizeof(void *)))
+#define DEFAULT_REFILL_SIZE PAGE_SIZE
+
+/**
+ * pmalloc_create_custom_pool() - create a new protectable memory pool
+ * @refill: the minimum size to allocate when in need of more memory.
+ * It will be rounded up to a multiple of PAGE_SIZE
+ * The value of 0 gives the default amount of PAGE_SIZE.
+ * @align_order: log2 of the alignment to use when allocating memory
+ * Negative values give ARCH_KMALLOC_MINALIGN
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool - success
+ * * NULL - error
+ */
+struct pmalloc_pool *pmalloc_create_custom_pool(size_t refill,
+ unsigned short align_order)
+{
+ struct pmalloc_pool *pool;
+
+ pool = kzalloc(sizeof(struct pmalloc_pool), GFP_KERNEL);
+ if (WARN(!pool, "Could not allocate pool meta data."))
+ return NULL;
+
+ pool->refill = refill ? PAGE_ALIGN(refill) : DEFAULT_REFILL_SIZE;
+ pool->align = 1UL << align_order;
+ mutex_init(&pool->mutex);
+
+ mutex_lock(&pools_mutex);
+ list_add(&pool->pool_node, &pools_list);
+ mutex_unlock(&pools_mutex);
+ return pool;
+}
+EXPORT_SYMBOL(pmalloc_create_custom_pool);
+
+
+static int grow(struct pmalloc_pool *pool, size_t min_size)
+{
+ void *addr;
+ struct vmap_area *area;
+ unsigned long size;
+
+ size = (min_size > pool->refill) ? min_size : pool->refill;
+ addr = vmalloc(size);
+ if (WARN(!addr, "Failed to allocate %lu bytes", PAGE_ALIGN(size)))
+ return -ENOMEM;
+
+ area = find_vmap_area((unsigned long)addr);
+ tag_area(area);
+ pool->offset = get_area_pages_size(area);
+ llist_add(&area->area_list, &pool->vm_areas);
+ return 0;
+}
+
+static void *reserve_mem(struct pmalloc_pool *pool, size_t size)
+{
+ pool->offset = round_down(pool->offset - size, pool->align);
+ return (void *)(current_area(pool)->va_start + pool->offset);
+
+}
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Allocates memory from a pool.
+ * If needed, the pool will automatically allocate enough memory to
+ * either satisfy the request or meet the "refill" parameter received
+ * upon creation.
+ * New allocation can happen also if the current memory in the pool is
+ * already write protected.
+ *
+ * Return:
+ * * pointer to the memory requested - success
+ * * NULL - error
+ */
+void *pmalloc(struct pmalloc_pool *pool, size_t size)
+{
+ void *retval = NULL;
+
+ mutex_lock(&pool->mutex);
+ if (unlikely(space_needed(pool, size)) &&
+ unlikely(grow(pool, size)))
+ goto out;
+ retval = reserve_mem(pool, size);
+out:
+ mutex_unlock(&pool->mutex);
+ return retval;
+}
+EXPORT_SYMBOL(pmalloc);
+
+/**
+ * pmalloc_protect_pool() - write-protects the memory in the pool
+ * @pool: the pool associated to the memory to write-protect
+ *
+ * Write-protects all the memory areas currently assigned to the pool
+ * that are still unprotected.
+ * This does not prevent the allocation of additional memory, which can
+ * then be initialized and protected in turn.
+ * The catch is that protecting a pool makes unavailable whatever free
+ * memory it still contains; subsequent allocations will grab fresh
+ * pages.
+ */
+void pmalloc_protect_pool(struct pmalloc_pool *pool)
+{
+ struct vmap_area *area;
+
+ mutex_lock(&pool->mutex);
+ llist_for_each_entry(area, pool->vm_areas.first, area_list)
+ protect_area(area);
+ mutex_unlock(&pool->mutex);
+}
+EXPORT_SYMBOL(pmalloc_protect_pool);
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory associated with the pool will be freed, including the
+ * metadata used for the pool.
+ */
+void pmalloc_destroy_pool(struct pmalloc_pool *pool)
+{
+ struct vmap_area *area;
+ struct llist_node *cursor;
+ struct llist_node *tmp;
+
+ mutex_lock(&pools_mutex);
+ list_del(&pool->pool_node);
+ mutex_unlock(&pools_mutex);
+
+ cursor = pool->vm_areas.first;
+ kfree(pool);
+ while (cursor) { /* iteration over llist */
+ tmp = cursor;
+ cursor = cursor->next;
+ area = llist_entry(tmp, struct vmap_area, area_list);
+ destroy_area(area);
+ }
+}
+EXPORT_SYMBOL(pmalloc_destroy_pool);
+
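The destroy path cannot use llist_for_each_entry(), because each node is freed while the list is being walked; the cursor must be advanced before the current node is released. The same pattern over a toy singly linked list in plain C (names are illustrative):

```c
#include <stdlib.h>

struct toy_node {
	struct toy_node *next;
	int payload;
};

static int freed;	/* counts destroyed nodes, for demonstration */

/* Free every node: grab the successor first, then release the current
 * node, exactly as pmalloc_destroy_pool() walks its vmap areas. */
static void toy_destroy_all(struct toy_node *cursor)
{
	while (cursor) {
		struct toy_node *tmp = cursor;

		cursor = cursor->next;	/* advance before freeing */
		free(tmp);
		freed++;
	}
}
```

Note that the pool metadata itself (kfree(pool) above) can be released before the walk, because the list head was already copied into the cursor.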
diff --git a/mm/pmalloc_helpers.h b/mm/pmalloc_helpers.h
new file mode 100644
index 000000000000..52d4d899e173
--- /dev/null
+++ b/mm/pmalloc_helpers.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc_helpers.h: Protectable Memory Allocator internal header
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#ifndef _MM_VMALLOC_HELPERS_H
+#define _MM_VMALLOC_HELPERS_H
+
+#ifndef CONFIG_PROTECTABLE_MEMORY
+
+static inline void check_pmalloc_object(const void *ptr, unsigned long n,
+ bool to_user)
+{
+}
+
+#else
+
+#include <linux/set_memory.h>
+struct pmalloc_pool {
+ struct mutex mutex;
+ struct list_head pool_node;
+ struct llist_head vm_areas;
+ size_t refill;
+ size_t offset;
+ size_t align;
+};
+
+#define VM_PMALLOC_PROTECTED_MASK (VM_PMALLOC | VM_PMALLOC_PROTECTED)
+#define VM_PMALLOC_MASK (VM_PMALLOC | VM_PMALLOC_PROTECTED)
+
+static __always_inline unsigned long area_flags(struct vmap_area *area)
+{
+ return area->vm->flags & VM_PMALLOC_MASK;
+}
+
+static __always_inline void tag_area(struct vmap_area *area)
+{
+ area->vm->flags |= VM_PMALLOC;
+}
+
+static __always_inline void untag_area(struct vmap_area *area)
+{
+ area->vm->flags &= ~VM_PMALLOC_MASK;
+}
+
+static __always_inline struct vmap_area *current_area(struct pmalloc_pool *pool)
+{
+ return llist_entry(pool->vm_areas.first, struct vmap_area,
+ area_list);
+}
+
+static __always_inline bool is_area_protected(struct vmap_area *area)
+{
+ return (area->vm->flags & VM_PMALLOC_PROTECTED_MASK) ==
+ VM_PMALLOC_PROTECTED_MASK;
+}
+
+static __always_inline void protect_area(struct vmap_area *area)
+{
+ if (unlikely(is_area_protected(area)))
+ return;
+ set_memory_ro(area->va_start, area->vm->nr_pages);
+ area->vm->flags |= VM_PMALLOC_PROTECTED_MASK;
+}
+
+static __always_inline void unprotect_area(struct vmap_area *area)
+{
+ if (likely(is_area_protected(area)))
+ set_memory_rw(area->va_start, area->vm->nr_pages);
+ untag_area(area);
+}
+
+static __always_inline void destroy_area(struct vmap_area *area)
+{
+ WARN(!is_area_protected(area), "Destroying unprotected area.");
+ unprotect_area(area);
+ vfree((void *)area->va_start);
+}
+
+static __always_inline bool empty(struct pmalloc_pool *pool)
+{
+ return unlikely(llist_empty(&pool->vm_areas));
+}
+
+static __always_inline bool protected(struct pmalloc_pool *pool)
+{
+ return is_area_protected(current_area(pool));
+}
+
+static inline bool exhausted(struct pmalloc_pool *pool, size_t size)
+{
+ size_t space_before;
+ size_t space_after;
+
+ space_before = round_down(pool->offset, pool->align);
+ space_after = pool->offset - space_before;
+ return unlikely(space_after < size && space_before < size);
+}
+
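exhausted() asks whether the request fits neither in the aligned space below the current offset nor in the leftover above it. The same predicate, isolated in userspace C (round_down assumed to be the power-of-two mask, as in the kernel):

```c
#include <stdbool.h>
#include <stddef.h>

/* Power-of-two round_down, as used by the pool code. */
static size_t rd(size_t v, size_t a)
{
	return v & ~(a - 1);
}

/* True when a request of the given size can be satisfied neither from
 * the aligned space below offset nor from the remainder above it. */
static bool toy_exhausted(size_t offset, size_t align, size_t size)
{
	size_t space_before = rd(offset, align);
	size_t space_after = offset - space_before;

	return space_after < size && space_before < size;
}
```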
+static __always_inline bool space_needed(struct pmalloc_pool *pool, size_t size)
+{
+ return empty(pool) || protected(pool) || exhausted(pool, size);
+}
+
+static __always_inline size_t get_area_pages_size(struct vmap_area *area)
+{
+ return area->vm->nr_pages * PAGE_SIZE;
+}
+
+static inline int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+ struct vm_struct *area;
+ unsigned long start = (unsigned long)ptr;
+ unsigned long end = start + n;
+ unsigned long area_end;
+
+ if (likely(!is_vmalloc_addr(ptr)))
+ return false;
+
+ area = vmalloc_to_page(ptr)->area;
+ if (unlikely(!(area->flags & VM_PMALLOC)))
+ return false;
+
+ area_end = area->nr_pages * PAGE_SIZE + (unsigned long)area->addr;
+ return (start <= end) && (end <= area_end);
+}
+
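The containment test at the end of is_pmalloc_object() reduces to a bounds check: the candidate range [start, start + n) must lie entirely inside the area. Isolated below with hypothetical names; the lower bound is made explicit here, whereas the kernel code gets it implicitly from vmalloc_to_page() having resolved the pointer to that area:

```c
#include <stdbool.h>

/* True when [start, start + n) fits entirely inside an area of
 * area_size bytes beginning at area_start; the start <= end clause
 * also rejects a length that would wrap the address space. */
static bool toy_in_area(unsigned long start, unsigned long n,
			unsigned long area_start, unsigned long area_size)
{
	unsigned long end = start + n;
	unsigned long area_end = area_start + area_size;

	return start >= area_start && start <= end && end <= area_end;
}
```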
+void __noreturn usercopy_abort(const char *name, const char *detail,
+ bool to_user, unsigned long offset,
+ unsigned long len);
+
+static inline void check_pmalloc_object(const void *ptr, unsigned long n,
+ bool to_user)
+{
+ int retv;
+
+ retv = is_pmalloc_object(ptr, n);
+ if (unlikely(retv)) {
+ if (unlikely(!to_user))
+ usercopy_abort("pmalloc",
+ "trying to write to pmalloc object",
+ to_user, (const unsigned long)ptr, n);
+ if (retv < 0)
+ usercopy_abort("pmalloc",
+ "invalid pmalloc object",
+ to_user, (const unsigned long)ptr, n);
+ }
+}
+
+#endif
+#endif
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..6c47bd765033 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -20,8 +20,14 @@
#include <linux/sched/task.h>
#include <linux/sched/task_stack.h>
#include <linux/thread_info.h>
+#include <linux/init.h>
+#include <linux/debugfs.h>
+#include <linux/pmalloc.h>
+#include <linux/sched/clock.h>
#include <asm/sections.h>
+#include "pmalloc_helpers.h"
+
/*
* Checks if a given pointer and length is contained by the current
* stack frame (if possible).
@@ -277,5 +283,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
/* Check for object in kernel to avoid text exposure. */
check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+ /* Check if object is from a pmalloc chunk. */
+ check_pmalloc_object(ptr, n, to_user);
}
EXPORT_SYMBOL(__check_object_size);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1bb2233bb262..da9cc9cd8b52 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -759,7 +759,7 @@ static void free_unmap_vmap_area(struct vmap_area *va)
free_vmap_area_noflush(va);
}
-static struct vmap_area *find_vmap_area(unsigned long addr)
+struct vmap_area *find_vmap_area(unsigned long addr)
{
struct vmap_area *va;
--
2.14.1
Hi Igor,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on mmotm/master]
[also build test ERROR on v4.17-rc2]
[cannot apply to next-20180423]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/struct-page-add-field-for-vm_struct/20180424-065001
base: git://git.cmpxchg.org/linux-mmotm.git master
config: mips-jz4740 (attached as .config)
compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=mips
All errors (new ones prefixed by >>):
security/selinux/ss/services.o: In function `selinux_ss_init':
>> services.c:(.text+0x21f4): undefined reference to `pmalloc_create_custom_pool'
>> services.c:(.text+0x2218): undefined reference to `pmalloc'
>> services.c:(.text+0x223c): undefined reference to `pmalloc_protect_pool'
security/selinux/ss/services.o: In function `security_load_policy':
>> services.c:(.text+0x4ab8): undefined reference to `pmalloc_rare_write'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, Apr 23, 2018 at 04:54:56PM +0400, Igor Stoppa wrote:
> While the vanilla version of pmalloc provides support for permanently
> transitioning between writable and read-only of a memory pool, this
> patch seeks to support a separate class of data, which would still
> benefit from write protection, most of the time, but it still needs to
> be modifiable. Maybe very seldom, but still cannot be permanently marked
> as read-only.
This seems like a horrible idea that basically makes this feature useless.
I would say the right way to do this is to have:
struct modifiable_data {
struct immutable_data *d;
...
};
Then allocate a new pool, change d and destroy the old pool.
On 24/04/18 15:50, Matthew Wilcox wrote:
> On Mon, Apr 23, 2018 at 04:54:56PM +0400, Igor Stoppa wrote:
>> While the vanilla version of pmalloc provides support for permanently
>> transitioning between writable and read-only of a memory pool, this
>> patch seeks to support a separate class of data, which would still
>> benefit from write protection, most of the time, but it still needs to
>> be modifiable. Maybe very seldom, but still cannot be permanently marked
>> as read-only.
>
> This seems like a horrible idea that basically makes this feature useless.
> I would say the right way to do this is to have:
>
> struct modifiable_data {
> struct immutable_data *d;
> ...
> };
>
> Then allocate a new pool, change d and destroy the old pool.
I'm not sure I understand.
The pool, in the patchset, is a collection of related vm_areas.
What I would like to do is to modify some of the memory that has already
been handed out as a reference, in such a way that the reference is not
altered, nor does it require extensive rewrites, in place of the usual
assignment.
Are you saying that my idea is fundamentally broken?
If yes, how to break it? :-)
If not, I need more coffee, perhaps we can have a cup together later? :-)
--
igor
On 4/24/18 1:50 PM, Matthew Wilcox wrote:
> struct modifiable_data {
> struct immutable_data *d;
> ...
> };
>
> Then allocate a new pool, change d and destroy the old pool.
With the above, you have just shifted the target of the arbitrary write
from the immutable data itself to the pointer to the immutable data, so
got no security benefit.
The goal of the patch is to reduce the window when stuff is writeable,
so that an arbitrary write is likely to hit the time when data is read-only.
- Enrico
On 24/04/18 16:32, lazytyped wrote:
>
>
> On 4/24/18 1:50 PM, Matthew Wilcox wrote:
>> struct modifiable_data {
>> struct immutable_data *d;
>> ...
>> };
>>
>> Then allocate a new pool, change d and destroy the old pool.
>
> With the above, you have just shifted the target of the arbitrary write
> from the immutable data itself to the pointer to the immutable data, so
> got no security benefit.
>
> The goal of the patch is to reduce the window when stuff is writeable,
> so that an arbitrary write is likely to hit the time when data is read-only.
Indeed, that was my - poorly explained, I admit it - idea.
For example, that's the reason why I am remapping one page at a time in
a loop, instead of doing the whole array, to limit exposure and increase
randomness.
WRT the implementation, I'm sure there are bugs that need squashing.
But if I have overlooked some aspect in the overall design, I need
guidance, because i still do not see what I am missing :-(
--
igor
On 04/23/2018 08:54 AM, Igor Stoppa wrote:
> SELinux is one of the primary targets, when a system running it comes
> under attack.
>
> The reason is that, even if an attacker should manage to gain root,
> SELinux will still prevent most desirable actions.
>
> Even in a fully locked down system, SELinux still presents a vulnerability
> that is often exploited, because it is very simple to attack, once
> kernel address layout randomization has been defeated and the attacker
> has gained the capability of writing to unprotected kernel data.
>
> In various places, SELinux relies on an "initialized" internal state
> variable, to decide if the policy is loaded and tests should be
> performed. Needless to say, it's in the interest of the attacker to turn
> it off and pretend that the policyDB is still uninitialized.
>
> Even if recent patches move the "initialized" state inside a structure,
> it is still vulnerable.
>
> This patch seeks to protect it, using it as demo for the pmalloc API,
> which is meant to provide additional protection to data which is likely
> not to change very often, if ever (after an initial transient).
>
> The patch is probably in need of rework, to make it fit better with the
> new SELinux internal data structures, however it shows how to deny an
> easy target to the attacker.
I know this is just an example, but not sure why you wouldn't just protect the
entire selinux_state. Note btw that the selinux_state encapsulation is preparatory work
for selinux namespaces [1], at which point the structure is in fact dynamically allocated
and there can be multiple instances of it. That however is work-in-progress, highly experimental,
and might not ever make it upstream (if we can't resolve the various challenges it poses in a satisfactory
way).
[1] http://blog.namei.org/2018/01/22/lca-2018-kernel-miniconf-selinux-namespacing-slides/
>
> In case the kernel is compiled with JOP safeguards, then it becomes far
> harder for the attacker to jump into the middle of the function which
> calls pmalloc_rare_write, to alter the state.
>
> Signed-off-by: Igor Stoppa <[email protected]>
> ---
> security/selinux/hooks.c | 12 ++++-----
> security/selinux/include/security.h | 2 +-
> security/selinux/ss/services.c | 51 +++++++++++++++++++++++--------------
> 3 files changed, 39 insertions(+), 26 deletions(-)
>
> diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
> index 4cafe6a19167..6049f80115bc 100644
> --- a/security/selinux/hooks.c
> +++ b/security/selinux/hooks.c
> @@ -285,7 +285,7 @@ static int __inode_security_revalidate(struct inode *inode,
>
> might_sleep_if(may_sleep);
>
> - if (selinux_state.initialized &&
> + if (*ss_initialized_ptr &&
> isec->initialized != LABEL_INITIALIZED) {
> if (!may_sleep)
> return -ECHILD;
> @@ -612,7 +612,7 @@ static int selinux_get_mnt_opts(const struct super_block *sb,
> if (!(sbsec->flags & SE_SBINITIALIZED))
> return -EINVAL;
>
> - if (!selinux_state.initialized)
> + if (!*ss_initialized_ptr)
> return -EINVAL;
>
> /* make sure we always check enough bits to cover the mask */
> @@ -735,7 +735,7 @@ static int selinux_set_mnt_opts(struct super_block *sb,
>
> mutex_lock(&sbsec->lock);
>
> - if (!selinux_state.initialized) {
> + if (!*ss_initialized_ptr) {
> if (!num_opts) {
> /* Defer initialization until selinux_complete_init,
> after the initial policy is loaded and the security
> @@ -1022,7 +1022,7 @@ static int selinux_sb_clone_mnt_opts(const struct super_block *oldsb,
> * if the parent was able to be mounted it clearly had no special lsm
> * mount options. thus we can safely deal with this superblock later
> */
> - if (!selinux_state.initialized)
> + if (!*ss_initialized_ptr)
> return 0;
>
> /*
> @@ -3040,7 +3040,7 @@ static int selinux_inode_init_security(struct inode *inode, struct inode *dir,
> isec->initialized = LABEL_INITIALIZED;
> }
>
> - if (!selinux_state.initialized || !(sbsec->flags & SBLABEL_MNT))
> + if (!*ss_initialized_ptr || !(sbsec->flags & SBLABEL_MNT))
> return -EOPNOTSUPP;
>
> if (name)
> @@ -7253,7 +7253,7 @@ static void selinux_nf_ip_exit(void)
> #ifdef CONFIG_SECURITY_SELINUX_DISABLE
> int selinux_disable(struct selinux_state *state)
> {
> - if (state->initialized) {
> + if (*ss_initialized_ptr) {
> /* Not permitted after initial policy load. */
> return -EINVAL;
> }
> diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h
> index 23e762d529fa..ec7debb143be 100644
> --- a/security/selinux/include/security.h
> +++ b/security/selinux/include/security.h
> @@ -96,13 +96,13 @@ extern char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX];
> struct selinux_avc;
> struct selinux_ss;
>
> +extern bool *ss_initialized_ptr;
> struct selinux_state {
> bool disabled;
> #ifdef CONFIG_SECURITY_SELINUX_DEVELOP
> bool enforcing;
> #endif
> bool checkreqprot;
> - bool initialized;
> bool policycap[__POLICYDB_CAPABILITY_MAX];
> struct selinux_avc *avc;
> struct selinux_ss *ss;
> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> index 8057e19dc15f..c09ca6f9b269 100644
> --- a/security/selinux/ss/services.c
> +++ b/security/selinux/ss/services.c
> @@ -52,6 +52,7 @@
> #include <linux/selinux.h>
> #include <linux/flex_array.h>
> #include <linux/vmalloc.h>
> +#include <linux/pmalloc.h>
> #include <net/netlabel.h>
>
> #include "flask.h"
> @@ -80,10 +81,20 @@ char *selinux_policycap_names[__POLICYDB_CAPABILITY_MAX] = {
> "nnp_nosuid_transition"
> };
>
> +bool *ss_initialized_ptr __ro_after_init;
> +static struct pmalloc_pool *selinux_pool;
> static struct selinux_ss selinux_ss;
>
> void selinux_ss_init(struct selinux_ss **ss)
> {
> + selinux_pool = pmalloc_create_pool(PMALLOC_RW);
> + if (unlikely(!selinux_pool))
> + panic("SELinux: unable to create pmalloc pool.");
> + ss_initialized_ptr = pmalloc(selinux_pool, sizeof(bool));
> + if (unlikely(!ss_initialized_ptr))
> + panic("SElinux: unable to allocate from pmalloc pool.");
> + *ss_initialized_ptr = false;
> + pmalloc_protect_pool(selinux_pool);
> rwlock_init(&selinux_ss.policy_rwlock);
> mutex_init(&selinux_ss.status_lock);
> *ss = &selinux_ss;
> @@ -772,7 +783,7 @@ static int security_compute_validatetrans(struct selinux_state *state,
> int rc = 0;
>
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> return 0;
>
> read_lock(&state->ss->policy_rwlock);
> @@ -872,7 +883,7 @@ int security_bounded_transition(struct selinux_state *state,
> int index;
> int rc;
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> return 0;
>
> read_lock(&state->ss->policy_rwlock);
> @@ -1032,7 +1043,7 @@ void security_compute_xperms_decision(struct selinux_state *state,
> memset(xpermd->dontaudit->p, 0, sizeof(xpermd->dontaudit->p));
>
> read_lock(&state->ss->policy_rwlock);
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> goto allow;
>
> policydb = &state->ss->policydb;
> @@ -1121,7 +1132,7 @@ void security_compute_av(struct selinux_state *state,
> read_lock(&state->ss->policy_rwlock);
> avd_init(state, avd);
> xperms->len = 0;
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> goto allow;
>
> policydb = &state->ss->policydb;
> @@ -1175,7 +1186,7 @@ void security_compute_av_user(struct selinux_state *state,
>
> read_lock(&state->ss->policy_rwlock);
> avd_init(state, avd);
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> goto allow;
>
> policydb = &state->ss->policydb;
> @@ -1294,7 +1305,7 @@ static int security_sid_to_context_core(struct selinux_state *state,
> *scontext = NULL;
> *scontext_len = 0;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> if (sid <= SECINITSID_NUM) {
> char *scontextp;
>
> @@ -1466,7 +1477,7 @@ static int security_context_to_sid_core(struct selinux_state *state,
> if (!scontext2)
> return -ENOMEM;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> int i;
>
> for (i = 1; i < SECINITSID_NUM; i++) {
> @@ -1648,7 +1659,7 @@ static int security_compute_sid(struct selinux_state *state,
> int rc = 0;
> bool sock;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> switch (orig_tclass) {
> case SECCLASS_PROCESS: /* kernel value */
> *out_sid = ssid;
> @@ -2128,7 +2139,8 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
> policydb = &state->ss->policydb;
> sidtab = &state->ss->sidtab;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> + bool dummy_initialized = true;
> rc = policydb_read(policydb, fp);
> if (rc)
> goto out;
> @@ -2148,7 +2160,8 @@ int security_load_policy(struct selinux_state *state, void *data, size_t len)
> }
>
> security_load_policycaps(state);
> - state->initialized = 1;
> + pmalloc_rare_write(selinux_pool, ss_initialized_ptr,
> + &dummy_initialized, sizeof(bool));
> seqno = ++state->ss->latest_granting;
> selinux_complete_init();
> avc_ss_reset(state->avc, seqno);
> @@ -2578,7 +2591,7 @@ int security_get_user_sids(struct selinux_state *state,
> *sids = NULL;
> *nel = 0;
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> goto out;
>
> read_lock(&state->ss->policy_rwlock);
> @@ -2812,7 +2825,7 @@ int security_get_bools(struct selinux_state *state,
> struct policydb *policydb;
> int i, rc;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> *len = 0;
> *names = NULL;
> *values = NULL;
> @@ -2987,7 +3000,7 @@ int security_sid_mls_copy(struct selinux_state *state,
> int rc;
>
> rc = 0;
> - if (!state->initialized || !policydb->mls_enabled) {
> + if (!*ss_initialized_ptr || !policydb->mls_enabled) {
> *new_sid = sid;
> goto out;
> }
> @@ -3094,7 +3107,7 @@ int security_net_peersid_resolve(struct selinux_state *state,
> /*
> * We don't need to check initialized here since the only way both
> * nlbl_sid and xfrm_sid are not equal to SECSID_NULL would be if the
> - * security server was initialized and state->initialized was true.
> + * security server was initialized and *ss_initialized_ptr was true.
> */
> if (!policydb->mls_enabled)
> return 0;
> @@ -3149,7 +3162,7 @@ int security_get_classes(struct selinux_state *state,
> struct policydb *policydb = &state->ss->policydb;
> int rc;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> *nclasses = 0;
> *classes = NULL;
> return 0;
> @@ -3298,7 +3311,7 @@ int selinux_audit_rule_init(u32 field, u32 op, char *rulestr, void **vrule)
>
> *rule = NULL;
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> return -EOPNOTSUPP;
>
> switch (field) {
> @@ -3598,7 +3611,7 @@ int security_netlbl_secattr_to_sid(struct selinux_state *state,
> struct context *ctx;
> struct context ctx_new;
>
> - if (!state->initialized) {
> + if (!*ss_initialized_ptr) {
> *sid = SECSID_NULL;
> return 0;
> }
> @@ -3665,7 +3678,7 @@ int security_netlbl_sid_to_secattr(struct selinux_state *state,
> int rc;
> struct context *ctx;
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> return 0;
>
> read_lock(&state->ss->policy_rwlock);
> @@ -3704,7 +3717,7 @@ int security_read_policy(struct selinux_state *state,
> int rc;
> struct policy_file fp;
>
> - if (!state->initialized)
> + if (!*ss_initialized_ptr)
> return -EINVAL;
>
> *len = security_policydb_len(state);
>
On 24/04/18 16:49, Stephen Smalley wrote:
> On 04/23/2018 08:54 AM, Igor Stoppa wrote:
[...]
>> The patch is probably in need of rework, to make it fit better with the
>> new SELinux internal data structures, however it shows how to deny an
>> easy target to the attacker.
>
> I know this is just an example, but not sure why you wouldn't just protect the
> entire selinux_state.
Because I have much more to discuss about SELinux, which would involve
the whole state, the policyDB and the AVC
I will start a separate thread about that. This was merely as simple as
possible example of the use of the API.
I just wanted to have a feeling about how it would be received :-)
> Note btw that the selinux_state encapsulation is preparatory work
> for selinux namespaces [1], at which point the structure is in fact dynamically allocated
> and there can be multiple instances of it. That however is work-in-progress, highly experimental,
> and might not ever make it upstream (if we can't resolve the various challenges it poses in a satisfactory
> way).
Yes, I am aware of this and I would like to discuss also in the light of
the future directions.
I just didn't want to waste too much time on something that you might
want to change radically in a month :-)
I already was caught once by surprise when ss_initialized disappeared
just when I had a patch ready for it :-)
--
igor
On Tue, Apr 24, 2018 at 02:32:36PM +0200, lazytyped wrote:
> On 4/24/18 1:50 PM, Matthew Wilcox wrote:
> > struct modifiable_data {
> > struct immutable_data *d;
> > ...
> > };
> >
> > Then allocate a new pool, change d and destroy the old pool.
>
> With the above, you have just shifted the target of the arbitrary write
> from the immutable data itself to the pointer to the immutable data, so
> got no security benefit.
There's always a pointer to the immutable data. How do you currently
get to the selinux context? file->f_security. You can't make 'file'
immutable, so file->f_security is the target of the arbitrary write.
All you can do is make life harder, and reduce the size of the target.
> The goal of the patch is to reduce the window when stuff is writeable,
> so that an arbitrary write is likely to hit the time when data is read-only.
Yes, reducing the size of the target in time as well as bytes. This patch
gives attackers a great roadmap (maybe even gadget) to unprotecting
a pool.
On 4/24/18 4:44 PM, Matthew Wilcox wrote:
> On Tue, Apr 24, 2018 at 02:32:36PM +0200, lazytyped wrote:
>> On 4/24/18 1:50 PM, Matthew Wilcox wrote:
>>> struct modifiable_data {
>>> struct immutable_data *d;
>>> ...
>>> };
>>>
>>> Then allocate a new pool, change d and destroy the old pool.
>> With the above, you have just shifted the target of the arbitrary write
>> from the immutable data itself to the pointer to the immutable data, so
>> got no security benefit.
> There's always a pointer to the immutable data. How do you currently
> get to the selinux context? file->f_security. You can't make 'file'
> immutable, so file->f_security is the target of the arbitrary write.
> All you can do is make life harder, and reduce the size of the target.
So why does adding an extra pointer/indirection help here? It adds attack
surface.
>
>> The goal of the patch is to reduce the window when stuff is writeable,
>> so that an arbitrary write is likely to hit the time when data is read-only.
> Yes, reducing the size of the target in time as well as bytes. This patch
> gives attackers a great roadmap (maybe even gadget) to unprotecting
> a pool.
I don't think this is relevant to the threat model this patch addresses.
If the attacker can already execute code, it doesn't matter whether this
specific piece of code exists or not. In general, if an attacker got to
the point of using gadgets, you've lost.
On the contrary, it opens the road to design trusted paths that can
write to or access data that would generally be read-only or not
accessible (with, of course, all the complexity, limitations and
penalties of doing this purely in software on a page sized basis).
- Enrico
On 24/04/18 19:03, lazytyped wrote:
>
>
> On 4/24/18 4:44 PM, Matthew Wilcox wrote:
>> On Tue, Apr 24, 2018 at 02:32:36PM +0200, lazytyped wrote:
>>> On 4/24/18 1:50 PM, Matthew Wilcox wrote:
>>>> struct modifiable_data {
>>>> struct immutable_data *d;
>>>> ...
>>>> };
>>>>
>>>> Then allocate a new pool, change d and destroy the old pool.
>>> With the above, you have just shifted the target of the arbitrary write
>>> from the immutable data itself to the pointer to the immutable data, so
>>> got no security benefit.
>> There's always a pointer to the immutable data. How do you currently
>> get to the selinux context? file->f_security. You can't make 'file'
>> immutable, so file->f_security is the target of the arbitrary write.
>> All you can do is make life harder, and reduce the size of the target.
>
> So why adding an extra pointer/indirection helps here? It adds attacking
> surface.
>>
>>> The goal of the patch is to reduce the window when stuff is writeable,
>>> so that an arbitrary write is likely to hit the time when data is read-only.
>> Yes, reducing the size of the target in time as well as bytes. This patch
>> gives attackers a great roadmap (maybe even gadget) to unprotecting
>> a pool.
>
> I don't think this is relevant to the threat model this patch addresses.
> If the attacker can already execute code, it doesn't matter whether this
> specific piece of code exists or not. In general, if an attacker got to
> the point of using gadgets, you've lost.
Realistically, if the attacker can execute arbitrary code, through
gadgets, there is nothing preventing a direct attack on the physical
page, by remapping it, exactly like the patch does.
Or even changing the page table.
Wrt re-utilizing this specific rare_write() function, it would be
possible to mark it as __always_inline, so that it will be executed only
with the data and pool it is intended for.
Then, if one has access to a compiler plugin that does CFI, it becomes
harder to reuse the inlined function.
Inlining should not be too bad, in terms of size overhead.
OTOH, having the pointer always laying around at a specific address,
allows for easier scanning - and attack - of the data
The remapping to a temporary address should make it harder to figure out
where to write to.
Again, the whole assumption behind pmalloc is that the attacker can do
read and writes, maybe limited execution, in the form of function calls.
But if the attacker can execute arbitrary code, all bets are off and the
system is forfeited.
Really critical data should go into a TEE or similar isolated environment.
> On the contrary, it opens the road to design trusted paths that can
> write to or access data that would generally be read-only or not
> accessible (with, of course, all the complexity, limitations and
> penalties of doing this purely in software on a page sized basis).
I had considered the COW approach, where I would allocate a new page and
swap it atomically, but it is not supported on ARM.
--
igor
On 24/04/18 16:33, Igor Stoppa wrote:
>
>
> On 24/04/18 15:50, Matthew Wilcox wrote:
>> On Mon, Apr 23, 2018 at 04:54:56PM +0400, Igor Stoppa wrote:
>>> While the vanilla version of pmalloc provides support for permanently
>>> transitioning between writable and read-only of a memory pool, this
>>> patch seeks to support a separate class of data, which would still
>>> benefit from write protection, most of the time, but it still needs to
>>> be modifiable. Maybe very seldom, but still cannot be permanently marked
>>> as read-only.
>>
>> This seems like a horrible idea that basically makes this feature
>> useless.
>> I would say the right way to do this is to have:
>>
>> struct modifiable_data {
>> struct immutable_data *d;
>> ...
>> };
>>
>> Then allocate a new pool, change d and destroy the old pool.
>
> I'm not sure I understand.
A few cups of coffee later ...
This seems like a regression from my case.
My case (see the example with the initialized state) is:
static void *pointer_to_pmalloc_memory __ro_after_init;
then, during init:
pointer_to_pmalloc_memory = pmalloc(pool, size);
then init happens
*pointer_to_pmalloc_memory = some_value;
pmalloc_protect_pool(pool);
and to change the value:
support_variable = some_other_value;
pmalloc_rare_write(pool, pointer_to_pmalloc_memory,
&support_variable, size)
But in this case the pmalloc allocation would be assigned to a writable
variable.
This seems like a regression to me: at this point who cares anymore
about the pmalloc memory?
Just rewrite the pointer to point to somewhere else that is writable and
has the desired (from the attacker) value.
It doesn't even require gadgets. pmalloc becomes useless.
Do I still need more coffee?
--
igor
On 24/04/18 18:44, Matthew Wilcox wrote:
> On Tue, Apr 24, 2018 at 02:32:36PM +0200, lazytyped wrote:
>> On 4/24/18 1:50 PM, Matthew Wilcox wrote:
>>> struct modifiable_data {
>>> struct immutable_data *d;
>>> ...
>>> };
>>>
>>> Then allocate a new pool, change d and destroy the old pool.
>>
>> With the above, you have just shifted the target of the arbitrary write
>> from the immutable data itself to the pointer to the immutable data, so
>> got no security benefit.
>
> There's always a pointer to the immutable data. How do you currently
> get to the selinux context? file->f_security. You can't make 'file'
> immutable, so file->f_security is the target of the arbitrary write.
> All you can do is make life harder, and reduce the size of the target.
In the patch that shows how to secure the SELinux initialized state,
there is a static __ro_after_init handle (the 'file' in your example),
which is immutable once init has completed.
It is as immutable as any const data that is not optimized away.
That is what the code uses to refer to the pmalloc data.
Since the reference is static, I expect the code to access it through
an offset embedded in the code segment, which is read-only as well.
Where is the writable pointer in this scenario?
>> The goal of the patch is to reduce the window when stuff is writeable,
>> so that an arbitrary write is likely to hit the time when data is read-only.
>
> Yes, reducing the size of the target in time as well as bytes. This patch
> gives attackers a great roadmap (maybe even gadget) to unprotecting
> a pool.
Gadgets can be removed by inlining the function calls.
Dave Hansen suggested I could do COW and replace the old page with the
new one. I could implement that, if it is preferable; I think it would
be less efficient for small writes, but it would not leave the current
page mapped as writable, so there is certainly value in it.
--
igor
On 24/04/18 15:50, Matthew Wilcox wrote:
> On Mon, Apr 23, 2018 at 04:54:56PM +0400, Igor Stoppa wrote:
>> While the vanilla version of pmalloc provides support for permanently
>> transitioning between writable and read-only of a memory pool, this
>> patch seeks to support a separate class of data, which would still
>> benefit from write protection, most of the time, but it still needs to
>> be modifiable. Maybe very seldom, but still cannot be permanently marked
>> as read-only.
>
> This seems like a horrible idea that basically makes this feature useless.
> I would say the right way to do this is to have:
>
> struct modifiable_data {
> struct immutable_data *d;
> ...
> };
>
> Then allocate a new pool, change d and destroy the old pool.
At the end of the summit, we agreed that I would go through the physmap.
But I'm not sure what the correct way to access it is :-/
Starting from a vmalloc address, say:
int *i = vmalloc(sizeof(int));
I can get its linear counterpart:
int *j = page_to_virt(vmalloc_to_page(i));
and the physical address:
phys_addr_t k = virt_to_phys(j);
But how do I get to the physmap?
I did not find much about it, apart from papers that talk about specific
hardcoded addresses, but I would expect that any hardcoded constant is,
by now, hidden behind some macro.
What I have verified, so far, at least on qemu x86_64, is that
protecting "i" will also make "j" unwritable.
--
igor
On 05/03/2018 02:52 PM, Igor Stoppa wrote:
> At the end of the summit, we agreed that I would go through the physmap.
Do you mean the kernel linear map? That's just another name for the
virtual address that you get back from page_to_virt():
int *j = page_to_virt(vmalloc_to_page(i));
On 04/05/18 01:55, Dave Hansen wrote:
> On 05/03/2018 02:52 PM, Igor Stoppa wrote:
>> At the end of the summit, we agreed that I would go through the physmap.
>
> Do you mean the kernel linear map?
Apparently I did mean it. It was confusing, because I couldn't find a
single place stating it explicitly, like you just did.
> That's just another name for the
> virtual address that you get back from page_to_virt():
>
> int *j = page_to_virt(vmalloc_to_page(i));
>
One reason why I was not sure is that the linear mapping also gets
protected when I protect the corresponding page in the vmap_area.
If I do:

    int *i = vmalloc(sizeof(int));
    int *j = page_to_virt(vmalloc_to_page(i));
    *i = 1;
    set_memory_ro((unsigned long)i, 1);
    *j = 2;

I get an error, because *j has also become read-only.
I was expecting to have to do the protection of the linear mapping in a
second phase.
It turns out that - at least on x86_64 - it's already in place.
But it invalidates what we agreed, which was based on the assumption
that the linear mapping was writable all the time.
I see two options:
1) continue to go through the linear mapping, unprotecting it for the
time it takes to make the write.
2) use the code I already wrote, which creates an additional, temporary
mapping in R/W mode at a random address.
I'd prefer 2) because it is already designed to make life harder for
someone trying to attack the data in the page: even if one manages to
take over a core and busy loop on it, option 2) will use a random
temporary address, which is harder to figure out.
Option 1) reuses the linear mapping, so the attacker only needs to get
lucky: depending on the target, they could overwrite other data sharing
the page being unprotected, or overwrite the same data being updated,
right after the update has taken place.
Is there any objection if I continue with option 2?
--
igor