This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.
The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.
A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.
This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.
However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.
An example is provided, in the form of self-testing.
Changes since the v11 version:
[http://www.openwall.com/lists/kernel-hardening/2018/01/24/4]
- restricted access to sysfs entries created (444 -> 400)
- more explicit reference to documentation
- couple of typos
Igor Stoppa (6):
genalloc: track beginning of allocations
genalloc: selftest
struct page: add field for vm_struct
Protectable Memory
Documentation for Pmalloc
Pmalloc: self-test
Documentation/core-api/pmalloc.txt | 104 ++++++++
include/linux/genalloc-selftest.h | 30 +++
include/linux/genalloc.h | 5 +-
include/linux/mm_types.h | 1 +
include/linux/pmalloc.h | 216 ++++++++++++++++
include/linux/vmalloc.h | 1 +
init/main.c | 2 +
lib/Kconfig | 15 ++
lib/Makefile | 1 +
lib/genalloc-selftest.c | 402 +++++++++++++++++++++++++++++
lib/genalloc.c | 444 +++++++++++++++++++++----------
mm/Kconfig | 7 +
mm/Makefile | 2 +
mm/pmalloc-selftest.c | 65 +++++
mm/pmalloc-selftest.h | 30 +++
mm/pmalloc.c | 516 +++++++++++++++++++++++++++++++++++++
mm/usercopy.c | 25 +-
mm/vmalloc.c | 18 +-
18 files changed, 1744 insertions(+), 140 deletions(-)
create mode 100644 Documentation/core-api/pmalloc.txt
create mode 100644 include/linux/genalloc-selftest.h
create mode 100644 include/linux/pmalloc.h
create mode 100644 lib/genalloc-selftest.c
create mode 100644 mm/pmalloc-selftest.c
create mode 100644 mm/pmalloc-selftest.h
create mode 100644 mm/pmalloc.c
--
2.9.3
When a page is used for virtual memory, it is often necessary to obtian
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.
The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area. This will avoid more expensive searches.
As example, the function find_vm_area is reimplemented, to take advantage
of the newly introduced field.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/mm_types.h | 1 +
mm/vmalloc.c | 18 +++++++++++++-----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index cfd0ac4..2abd540 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -56,6 +56,7 @@ struct page {
void *s_mem; /* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+ struct vm_struct *area;
};
/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 6739420..44c5dfc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1466,13 +1466,16 @@ struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
*/
struct vm_struct *find_vm_area(const void *addr)
{
- struct vmap_area *va;
+ struct page *page;
- va = find_vmap_area((unsigned long)addr);
- if (va && va->flags & VM_VM_AREA)
- return va->vm;
+ if (unlikely(!is_vmalloc_addr(addr)))
+ return NULL;
- return NULL;
+ page = vmalloc_to_page(addr);
+ if (unlikely(!page))
+ return NULL;
+
+ return page->area;
}
/**
@@ -1536,6 +1539,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
struct page *page = area->pages[i];
BUG_ON(!page);
+ page->area = NULL;
__free_pages(page, 0);
}
@@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
const void *caller)
{
struct vm_struct *area;
+ unsigned int page_counter;
void *addr;
unsigned long real_size = size;
@@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
kmemleak_vmalloc(area, size, gfp_mask);
+ for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
+ area->pages[page_counter]->area = area;
+
return addr;
fail:
--
2.9.3
The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.
However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.
Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.
Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.
The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.
A module can request a pool and then refer any allocation request to the
pool handler it has received.
Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.
After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).
The latter case is mainly meant for releasing memory, when a module is
unloaded.
A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/genalloc.h | 3 +
include/linux/pmalloc.h | 216 ++++++++++++++++++++
include/linux/vmalloc.h | 1 +
lib/genalloc.c | 27 +++
mm/Makefile | 1 +
mm/pmalloc.c | 513 +++++++++++++++++++++++++++++++++++++++++++++++
mm/usercopy.c | 25 ++-
7 files changed, 782 insertions(+), 4 deletions(-)
create mode 100644 include/linux/pmalloc.h
create mode 100644 mm/pmalloc.c
diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 0377681..a486a26 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -121,6 +121,9 @@ extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
dma_addr_t *dma);
extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
+
+extern void gen_pool_flush_chunk(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk);
extern void gen_pool_for_each_chunk(struct gen_pool *,
void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
extern size_t gen_pool_avail(struct gen_pool *);
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 0000000..ad7d557
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,216 @@
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#ifndef _PMALLOC_H
+#define _PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+#include <linux/gfp.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+/**
+ * pmalloc_create_pool - create a new protectable memory pool -
+ * @name: the name of the pool, must be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ * from the pool
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Returns a pointer to the new pool upon success, otherwise a NULL.
+ */
+struct gen_pool *pmalloc_create_pool(const char *name,
+ int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+/**
+ * pmalloc_prealloc - tries to allocate a memory chunk of the requested size
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposite to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ *
+ * Returns true if the vmalloc call was successful, false otherwise.
+ */
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size);
+
+/**
+ * pmalloc - allocate protectable memory from a pool
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Returns the pointer to the memory requested upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc - zero-initialized version of pmalloc
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Returns the pointer to the zeroed memory requested, upon success,
+ * NULL otherwise (either no memory available or pool already read-only).
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+ return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+/**
+ * pmalloc_array - allocates an array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+ size_t size, gfp_t flags)
+{
+ if (unlikely(!(pool && n && size)))
+ return NULL;
+ return pmalloc(pool, n * size, flags);
+}
+
+/**
+ * pcalloc - allocates a 0-initialized array according to the parameters
+ * @pool: handler to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Returns either NULL or the pmalloc result.
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+ size_t size, gfp_t flags)
+{
+ return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+/**
+ * pstrdup - duplicate a string, using pmalloc as allocator
+ * @pool: handler to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Returns a pointer to the replica, NULL in case of recoverable error.
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (unlikely(pool == NULL || s == NULL))
+ return NULL;
+
+ len = strlen(s) + 1;
+ buf = pmalloc(pool, len, gfp);
+ if (likely(buf))
+ strncpy(buf, s, len);
+ return buf;
+}
+
+/**
+ * pmalloc_protect_pool - turn a read/write pool read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_protect_pool(struct gen_pool *pool);
+
+/**
+ * pfree - mark as unused memory that was previously in use
+ * @pool: handler to the pool to be used for memory allocation
+ * @addr: the beginning of the memory area to be freed
+ *
+ * The behavior of pfree is different, depending on the state of the
+ * protection.
+ * If the pool is not yet protected, the memory is marked as unused and
+ * will be availabel for further allocations.
+ * If the pool is already protected, the memory is marked as unused, but
+ * it will still be impossible to perform further allocation, because of
+ * the existing protection.
+ * The freed memory, in this case, will be truly released only when the
+ * pool is destroyed.
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+ gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+/**
+ * pmalloc_destroy_pool - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ *
+ * Returns 0 upon success, -EINVAL in abnormal cases.
+ */
+int pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c3..e8171b6 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block; /* in notifier.h */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
+#define VM_PMALLOC 0x00000100 /* pmalloc area - see pmalloc.txt */
/* bits [20..32] reserved for arch specific ioremap internals */
/*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index dde7830..62f69b3 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -519,6 +519,33 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
}
EXPORT_SYMBOL(gen_pool_free);
+
+/**
+ * gen_pool_flush_chunk - drops all the allocations from a specific chunk
+ * @pool: the generic memory pool
+ * @chunk: The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and have become unavailable for any other sort of
+ * operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk)
+{
+ size_t size;
+
+ if (unlikely(!(pool && chunk)))
+ return;
+
+ size = chunk->end_addr + 1 - chunk->start_addr;
+ memset(chunk->entries, 0,
+ DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
+ BITS_PER_BYTE));
+ atomic_set(&chunk->avail, size);
+}
+
+
/**
* gen_pool_for_each_chunk - call func for every chunk of generic memory pool
* @pool: the generic memory pool
diff --git a/mm/Makefile b/mm/Makefile
index e669f02..a6a47e1 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 0000000..a64ac49
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,513 @@
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+/**
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+ struct gen_pool *pool; /* Link back to the associated pool. */
+ bool protected; /* Status of the pool: RO or RW. */
+ struct kobj_attribute attr_protected; /* Sysfs attribute. */
+ struct kobj_attribute attr_avail; /* Sysfs attribute. */
+ struct kobj_attribute attr_size; /* Sysfs attribute. */
+ struct kobj_attribute attr_chunks; /* Sysfs attribute. */
+ struct kobject *pool_kobject;
+ struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_final_list);
+static LIST_HEAD(pmalloc_tmp_list);
+static struct list_head *pmalloc_list = &pmalloc_tmp_list;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_protected);
+ if (data->protected)
+ return sprintf(buf, "protected\n");
+ else
+ return sprintf(buf, "unprotected\n");
+}
+
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_avail);
+ return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
+}
+
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_size);
+ return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
+}
+
+static void pool_chunk_number(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk, void *data)
+{
+ unsigned long *counter = data;
+
+ (*counter)++;
+}
+
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+ unsigned long chunks_num = 0;
+
+ data = container_of(attr, struct pmalloc_data, attr_chunks);
+ gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+ return sprintf(buf, "%lu\n", chunks_num);
+}
+
+/**
+ * Exposes the pool and its attributes through sysfs.
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+ const struct attribute *attrs[] = {
+ &data->attr_protected.attr,
+ &data->attr_avail.attr,
+ &data->attr_size.attr,
+ &data->attr_chunks.attr,
+ NULL
+ };
+ struct kobject *kobj;
+
+ kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+ if (unlikely(!kobj))
+ return NULL;
+
+ if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+ kobject_put(kobj);
+ kobj = NULL;
+ }
+ return kobj;
+}
+
+/**
+ * Removes the pool and its attributes from sysfs.
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+ struct kobject *kobj)
+{
+ const struct attribute *attrs[] = {
+ &data->attr_protected.attr,
+ &data->attr_avail.attr,
+ &data->attr_size.attr,
+ &data->attr_chunks.attr,
+ NULL
+ };
+
+ sysfs_remove_files(kobj, attrs);
+ kobject_put(kobj);
+}
+
+/**
+ * Declares an attribute of the pool.
+ */
+
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+ sysfs_attr_init(&data->attr_##attr_name.attr); \
+ data->attr_##attr_name.attr.name = #attr_name; \
+ data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+ data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+ struct gen_pool *pool;
+ const char *pool_name;
+ struct pmalloc_data *data;
+
+ if (!name) {
+ WARN_ON(1);
+ return NULL;
+ }
+
+ if (min_alloc_order < 0)
+ min_alloc_order = ilog2(sizeof(unsigned long));
+
+ pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+ if (unlikely(!pool))
+ return NULL;
+
+ mutex_lock(&pmalloc_mutex);
+ list_for_each_entry(data, pmalloc_list, node)
+ if (!strcmp(name, data->pool->name))
+ goto same_name_err;
+
+ pool_name = kstrdup(name, GFP_KERNEL);
+ if (unlikely(!pool_name))
+ goto name_alloc_err;
+
+ data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+ if (unlikely(!data))
+ goto data_alloc_err;
+
+ data->protected = false;
+ data->pool = pool;
+ pmalloc_attr_init(data, protected);
+ pmalloc_attr_init(data, avail);
+ pmalloc_attr_init(data, size);
+ pmalloc_attr_init(data, chunks);
+ pool->data = data;
+ pool->name = pool_name;
+
+ list_add(&data->node, pmalloc_list);
+ if (pmalloc_list == &pmalloc_final_list)
+ data->pool_kobject = pmalloc_connect(data);
+ mutex_unlock(&pmalloc_mutex);
+ return pool;
+
+data_alloc_err:
+ kfree(pool_name);
+name_alloc_err:
+same_name_err:
+ mutex_unlock(&pmalloc_mutex);
+ gen_pool_destroy(pool);
+ return NULL;
+}
+
+static inline int check_alloc_params(struct gen_pool *pool, size_t req_size)
+{
+ struct pmalloc_data *data;
+ unsigned int order;
+
+ if (unlikely(!req_size || !pool))
+ return -1;
+
+ order = (unsigned int)pool->min_alloc_order;
+ data = pool->data;
+
+ if (data == NULL)
+ return -1;
+
+ if (unlikely(data->protected)) {
+ WARN_ON(1);
+ return -1;
+ }
+ return 0;
+}
+
+
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+ struct vm_struct *area;
+ struct page *page;
+
+ if (!is_vmalloc_addr(chunk))
+ return false;
+
+ page = vmalloc_to_page(chunk);
+ if (unlikely(!page))
+ return false;
+
+ area = page->area;
+ if (tag)
+ area->flags |= VM_PMALLOC;
+ else
+ area->flags &= ~VM_PMALLOC;
+ return true;
+}
+
+
+static inline bool tag_chunk(void *chunk)
+{
+ return chunk_tagging(chunk, true);
+}
+
+
+static inline bool untag_chunk(void *chunk)
+{
+ return chunk_tagging(chunk, false);
+}
+
+enum {
+ INVALID_PMALLOC_OBJECT = -1,
+ NOT_PMALLOC_OBJECT = 0,
+ VALID_PMALLOC_OBJECT = 1,
+};
+
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+ struct vm_struct *area;
+ struct page *page;
+ unsigned long area_start;
+ unsigned long area_end;
+ unsigned long object_start;
+ unsigned long object_end;
+
+
+ /* is_pmalloc_object gets called pretty late, so chances are high
+ * that the object is indeed of vmalloc type
+ */
+ if (unlikely(!is_vmalloc_addr(ptr)))
+ return NOT_PMALLOC_OBJECT;
+
+ page = vmalloc_to_page(ptr);
+ if (unlikely(!page))
+ return NOT_PMALLOC_OBJECT;
+
+ area = page->area;
+
+ if (likely(!(area->flags & VM_PMALLOC)))
+ return NOT_PMALLOC_OBJECT;
+
+ area_start = (unsigned long)area->addr;
+ area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+ object_start = (unsigned long)ptr;
+ object_end = object_start + n - 1;
+
+ if (likely((area_start <= object_start) &&
+ (object_end <= area_end)))
+ return VALID_PMALLOC_OBJECT;
+ else
+ return INVALID_PMALLOC_OBJECT;
+}
+
+
+bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
+{
+ void *chunk;
+ size_t chunk_size;
+ bool add_error;
+ unsigned int order;
+
+ if (check_alloc_params(pool, size))
+ return false;
+
+ order = (unsigned int)pool->min_alloc_order;
+
+ /* Expand pool */
+ chunk_size = roundup(size, PAGE_SIZE);
+ chunk = vmalloc(chunk_size);
+ if (unlikely(chunk == NULL))
+ return false;
+
+ /* Locking is already done inside gen_pool_add */
+ add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+ NUMA_NO_NODE);
+ if (unlikely(add_error != 0))
+ goto abort;
+
+ return true;
+abort:
+ vfree(chunk);
+ return false;
+
+}
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+ void *chunk;
+ size_t chunk_size;
+ bool add_error;
+ unsigned long retval;
+ unsigned int order;
+
+ if (check_alloc_params(pool, size))
+ return NULL;
+
+ order = (unsigned int)pool->min_alloc_order;
+
+retry_alloc_from_pool:
+ retval = gen_pool_alloc(pool, size);
+ if (retval)
+ goto return_allocation;
+
+ if (unlikely((gfp & __GFP_ATOMIC))) {
+ if (unlikely((gfp & __GFP_NOFAIL)))
+ goto retry_alloc_from_pool;
+ else
+ return NULL;
+ }
+
+ /* Expand pool */
+ chunk_size = roundup(size, PAGE_SIZE);
+ chunk = vmalloc(chunk_size);
+ if (unlikely(!chunk)) {
+ if (unlikely((gfp & __GFP_NOFAIL)))
+ goto retry_alloc_from_pool;
+ else
+ return NULL;
+ }
+ if (unlikely(!tag_chunk(chunk)))
+ goto free;
+
+ /* Locking is already done inside gen_pool_add */
+ add_error = gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+ NUMA_NO_NODE);
+ if (unlikely(add_error))
+ goto abort;
+
+ retval = gen_pool_alloc(pool, size);
+ if (retval) {
+return_allocation:
+ *(size_t *)retval = size;
+ if (gfp & __GFP_ZERO)
+ memset((void *)retval, 0, size);
+ return (void *)retval;
+ }
+ /* Here there is no test for __GFP_NO_FAIL because, in case of
+ * concurrent allocation, one thread might add a chunk to the
+ * pool and this memory could be allocated by another thread,
+ * before the first thread gets a chance to use it.
+ * As long as vmalloc succeeds, it's ok to retry.
+ */
+ goto retry_alloc_from_pool;
+abort:
+ untag_chunk(chunk);
+free:
+ vfree(chunk);
+ return NULL;
+}
+
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+
+ struct gen_pool_chunk *chunk,
+ void *data)
+{
+ const bool *flag = data;
+ size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+ unsigned long pages = chunk_size / PAGE_SIZE;
+
+ BUG_ON(chunk_size & (PAGE_SIZE - 1));
+
+ if (*flag)
+ set_memory_ro(chunk->start_addr, pages);
+ else
+ set_memory_rw(chunk->start_addr, pages);
+}
+
+static int pmalloc_pool_set_protection(struct gen_pool *pool, bool protection)
+{
+ struct pmalloc_data *data;
+ struct gen_pool_chunk *chunk;
+
+ if (unlikely(!pool))
+ return -EINVAL;
+
+ data = pool->data;
+
+ if (unlikely(!data))
+ return -EINVAL;
+
+ if (unlikely(data->protected == protection)) {
+ WARN_ON(1);
+ return 0;
+ }
+
+ data->protected = protection;
+ list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+ pmalloc_chunk_set_protection(pool, chunk, &protection);
+ return 0;
+}
+
+int pmalloc_protect_pool(struct gen_pool *pool)
+{
+ return pmalloc_pool_set_protection(pool, true);
+}
+
+
+static void pmalloc_chunk_free(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk, void *data)
+{
+ untag_chunk(chunk);
+ gen_pool_flush_chunk(pool, chunk);
+ vfree_atomic((void *)chunk->start_addr);
+}
+
+
+int pmalloc_destroy_pool(struct gen_pool *pool)
+{
+ struct pmalloc_data *data;
+
+ if (unlikely(pool == NULL))
+ return -EINVAL;
+
+ data = pool->data;
+
+ if (unlikely(data == NULL))
+ return -EINVAL;
+
+ mutex_lock(&pmalloc_mutex);
+ list_del(&data->node);
+ mutex_unlock(&pmalloc_mutex);
+
+ if (likely(data->pool_kobject))
+ pmalloc_disconnect(data, data->pool_kobject);
+
+ pmalloc_pool_set_protection(pool, false);
+ gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+ gen_pool_destroy(pool);
+ kfree(data);
+ return 0;
+}
+
+/**
+ * When the sysfs is ready to receive registrations, connect all the
+ * pools previously created. Also enable further pools to be connected
+ * right away.
+ */
+static int __init pmalloc_late_init(void)
+{
+ struct pmalloc_data *data, *n;
+
+ pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+
+ mutex_lock(&pmalloc_mutex);
+ pmalloc_list = &pmalloc_final_list;
+
+ if (likely(pmalloc_kobject != NULL)) {
+ list_for_each_entry_safe(data, n, &pmalloc_tmp_list, node) {
+ list_move(&data->node, &pmalloc_final_list);
+ pmalloc_connect(data);
+ }
+ }
+ mutex_unlock(&pmalloc_mutex);
+ return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index a9852b2..c3b1029 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -15,6 +15,7 @@
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/mm.h>
+#include <linux/pmalloc.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/sched/task.h>
@@ -222,6 +223,7 @@ static inline const char *check_heap_object(const void *ptr, unsigned long n,
void __check_object_size(const void *ptr, unsigned long n, bool to_user)
{
const char *err;
+ int retv;
/* Skip all tests if size is zero. */
if (!n)
@@ -229,12 +231,12 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
/* Check for invalid addresses. */
err = check_bogus_address(ptr, n);
- if (err)
+ if (unlikely(err))
goto report;
/* Check for bad heap object. */
err = check_heap_object(ptr, n, to_user);
- if (err)
+ if (unlikely(err))
goto report;
/* Check for bad stack object. */
@@ -257,8 +259,23 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)
/* Check for object in kernel to avoid text exposure. */
err = check_kernel_text_object(ptr, n);
- if (!err)
- return;
+ if (unlikely(err))
+ goto report;
+
+ /* Check if object is from a pmalloc chunk.
+ */
+ retv = is_pmalloc_object(ptr, n);
+ if (unlikely(retv)) {
+ if (unlikely(!to_user)) {
+ err = "<trying to write to pmalloc object>";
+ goto report;
+ }
+ if (retv < 0) {
+ err = "<invalid pmalloc object>";
+ goto report;
+ }
+ }
+ return;
report:
report_usercopy(ptr, n, to_user, err);
--
2.9.3
Introduce a set of macros for writing concise test cases for genalloc.
The test cases are meant to provide regression testing, when working on
new functionality for genalloc.
Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.
The execution of the self testing is controlled through a Kconfig option.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/genalloc-selftest.h | 30 +++
init/main.c | 2 +
lib/Kconfig | 15 ++
lib/Makefile | 1 +
lib/genalloc-selftest.c | 402 ++++++++++++++++++++++++++++++++++++++
5 files changed, 450 insertions(+)
create mode 100644 include/linux/genalloc-selftest.h
create mode 100644 lib/genalloc-selftest.c
diff --git a/include/linux/genalloc-selftest.h b/include/linux/genalloc-selftest.h
new file mode 100644
index 0000000..7af1901
--- /dev/null
+++ b/include/linux/genalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * genalloc-selftest.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __GENALLOC_SELFTEST_H__
+#define __GENALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_GENERIC_ALLOCATOR_SELFTEST
+
+#include <linux/genalloc.h>
+
+void genalloc_selftest(void);
+
+#else
+
+static inline void genalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index a8100b9..fb844aa 100644
--- a/init/main.c
+++ b/init/main.c
@@ -89,6 +89,7 @@
#include <linux/io.h>
#include <linux/cache.h>
#include <linux/rodata_test.h>
+#include <linux/genalloc-selftest.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -660,6 +661,7 @@ asmlinkage __visible void __init start_kernel(void)
*/
mem_encrypt_init();
+ genalloc_selftest();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index c5e84fb..430026d0 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
config GENERIC_ALLOCATOR
bool
+config GENERIC_ALLOCATOR_SELFTEST
+ bool "genalloc tester"
+ default n
+ select GENERIC_ALLOCATOR
+ help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+ bool "make the genalloc tester more verbose"
+ default n
+ select GENERIC_ALLOCATOR_SELFTEST
+ help
+ More information will be displayed during the self-testing.
+
#
# reed solomon support is select'ed if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index d11c48e..ba06e83 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
obj-$(CONFIG_CRC8) += crc8.o
obj-$(CONFIG_XXHASH) += xxhash.o
obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_GENERIC_ALLOCATOR_SELFTEST) += genalloc-selftest.o
obj-$(CONFIG_842_COMPRESS) += 842/
obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/genalloc-selftest.c b/lib/genalloc-selftest.c
new file mode 100644
index 0000000..007a0cf
--- /dev/null
+++ b/lib/genalloc-selftest.c
@@ -0,0 +1,402 @@
+/*
+ * genalloc-selftest.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <asm/set_memory.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+
+
+/* Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_GENERIC_ALLOCATOR_SELFTEST_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ char bitmap[BITMAP_SIZE_C * 2 + 1];
+ unsigned long i;
+ char *bm = bitmap;
+ char *entry;
+
+ if (unlikely(pool == NULL || pool->chunks.next == NULL))
+ return;
+
+ chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
+ entry = (void *)chunk->entries;
+ for (i = 1; i <= BITMAP_SIZE_C; i++)
+ bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+ *bm = '\0';
+ pr_notice("chunk: %p bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+ CMD_ALLOCATOR,
+ CMD_ALLOCATE,
+ CMD_FLUSH,
+ CMD_FREE,
+ CMD_NUMBER,
+ CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+ void *null;
+};
+
+struct test_allocator {
+ genpool_algo_t algo;
+ union {
+ struct genpool_data_align align;
+ struct genpool_data_fixed offset;
+ struct null_struct null;
+ } data;
+};
+
+struct test_action {
+ unsigned int location;
+ char pattern[BITMAP_SIZE_C];
+ unsigned int size;
+};
+
+
+struct test_command {
+ enum test_commands command;
+ union {
+ struct test_allocator allocator;
+ struct test_action action;
+ };
+};
+
+
+/* To pass an array literal as parameter to a macro, it must go through
+ * this one, first.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value) \
+ .parameter = { \
+ .parameter = value, \
+ } \
+
+#define SET_ALLOCATOR(alloc, parameter, value) \
+{ \
+ .command = CMD_ALLOCATOR, \
+ .allocator = { \
+ .algo = (alloc), \
+ .data = { \
+ SET_DATA(parameter, value), \
+ }, \
+ } \
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match) \
+{ \
+ .command = act, \
+ .action = { \
+ .size = (mem_size), \
+ .location = (mem_loc), \
+ .pattern = match, \
+ }, \
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match) \
+ ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match) \
+ ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM() \
+{ \
+ .command = CMD_FLUSH, \
+}
+
+#define END() \
+{ \
+ .command = CMD_END, \
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+ const char *reference)
+{
+ struct gen_pool_chunk *chunk;
+ char *bitmap;
+ unsigned int i;
+
+ chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
+ bitmap = (char *)chunk->entries;
+
+ for (i = 0; i < BITMAP_SIZE_C; i++)
+ if (bitmap[i] != reference[i])
+ return -1;
+ return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ gen_pool_set_algo(pool, cmd->allocator.algo,
+ (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ const struct test_action *action = &cmd->action;
+
+ locations[action->location] = gen_pool_alloc(pool, action->size);
+ BUG_ON(!locations[action->location]);
+ print_first_chunk_bitmap(pool);
+ BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ unsigned int i;
+
+ for (i = 0; i < ENTRIES; i++)
+ if (locations[i]) {
+ gen_pool_free(pool, locations[i], 0);
+ locations[i] = 0;
+ }
+}
+
+static void callback_free(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ const struct test_action *action = &cmd->action;
+
+ gen_pool_free(pool, locations[action->location], 0);
+ locations[action->location] = 0;
+ print_first_chunk_bitmap(pool);
+ BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+ const struct test_command *,
+ unsigned long *) = {
+ [CMD_ALLOCATOR] = callback_set_allocator,
+ [CMD_ALLOCATE] = callback_allocate,
+ [CMD_FREE] = callback_free,
+ [CMD_FLUSH] = callback_flush,
+};
+
+const struct test_command test_first_fit[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(3, 0, ARR({0x2b})),
+ ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+ ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+ FREE_MEM(2, 1, ARR({0x2b, 0xac, 0x0a})),
+ ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+ FREE_MEM(0, 2, ARR({0xeb})),
+ FREE_MEM(0, 0, ARR({0xc0})),
+ FREE_MEM(0, 1, ARR({0x00})),
+ END(),
+};
+
+/* To make the test work for both 32bit and 64bit ulong sizes,
+ * allocate (8 / 2 * 4 - 1) = 15 bytes bytes, then 16, then 2.
+ * The first allocation prepares for the crossing of the 32bit ulong
+ * threshold. The following crosses the 32bit threshold and prepares for
+ * crossing the 64bit thresholds. The last is large enough (2 bytes) to
+ * cross the 64bit threshold.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+const struct test_command test_ulong_span[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+ ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+ 0xaa, 0xaa, 0xaa, 0x2a})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+ 0xaa, 0xaa, 0xaa, 0xea,
+ 0x02})),
+ FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+ 0x00, 0x00, 0x00, 0xc0,
+ 0x02})),
+ FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0xc0,
+ 0x02})),
+ FREE_MEM(0, 2, ARR({0x00})),
+ END(),
+};
+
+/* Create progressively smaller allocations A B C D E.
+ * then free B and D.
+ * Then create new allocation that would fit in both of the gaps left by
+ * B and D. Verify that it uses the gap from B.
+ */
+const struct test_command test_first_fit_gaps[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+ ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0x0a})),
+ ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa})),
+ ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa, 0xab})),
+ ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa, 0xab, 0x0b})),
+ FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+ 0xb0, 0xaa, 0xab, 0x0b})),
+ FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+ 0xb0, 0xaa, 0x00, 0x0b})),
+ ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+ 0xb0, 0xaa, 0x00, 0x0b})),
+ FLUSH_MEM(),
+ END(),
+};
+
+/* Test first fit align */
+const struct test_command test_first_fit_align[] = {
+ SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+ FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+ FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+ ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* Test fixed alloc */
+const struct test_command test_fixed_data[] = {
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+ ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+ ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+ ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+ ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+ 0x00, 0x00, 0x00, 0xb0,
+ 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* Test first fit order align */
+const struct test_command test_first_fit_order_align[] = {
+ SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+ ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+ ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* 007 Test best fit */
+const struct test_command test_best_fit[] = {
+ SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+ ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+ FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+ FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+ ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+enum test_cases_indexes {
+ TEST_CASE_FIRST_FIT,
+ TEST_CASE_ULONG_SPAN,
+ TEST_CASE_FIRST_FIT_GAPS,
+ TEST_CASE_FIRST_FIT_ALIGN,
+ TEST_CASE_FIXED_DATA,
+ TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+ TEST_CASE_BEST_FIT,
+ TEST_CASES_NUM,
+};
+
+const struct test_command *test_cases[TEST_CASES_NUM] = {
+ [TEST_CASE_FIRST_FIT] = test_first_fit,
+ [TEST_CASE_ULONG_SPAN] = test_ulong_span,
+ [TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+ [TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+ [TEST_CASE_FIXED_DATA] = test_fixed_data,
+ [TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+ [TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void genalloc_selftest(void)
+{
+ static struct gen_pool *pool;
+ unsigned long locations[ENTRIES];
+ char chunk[CHUNK_SIZE];
+ int retval;
+ unsigned int i;
+ const struct test_command *cmd;
+
+ pool = gen_pool_create(ALLOC_ORDER, -1);
+ if (unlikely(!pool)) {
+ pr_err("genalloc-selftest: no memory for pool.");
+ return;
+ }
+
+ retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+ CHUNK_SIZE, -1);
+ if (unlikely(retval)) {
+ pr_err("genalloc-selftest: could not register chunk.");
+ goto destroy_pool;
+ }
+
+ memset(locations, 0, ENTRIES * sizeof(unsigned long));
+ for (i = 0; i < TEST_CASES_NUM; i++)
+ for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+ callbacks[cmd->command](pool, cmd, locations);
+ pr_notice("genalloc-selftest: executed successfully %d tests",
+ TEST_CASES_NUM);
+
+destroy_pool:
+ gen_pool_destroy(pool);
+}
--
2.9.3
Detailed documentation about the protectable memory allocator.
Signed-off-by: Igor Stoppa <[email protected]>
---
Documentation/core-api/pmalloc.txt | 104 +++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
create mode 100644 Documentation/core-api/pmalloc.txt
diff --git a/Documentation/core-api/pmalloc.txt b/Documentation/core-api/pmalloc.txt
new file mode 100644
index 0000000..934d356
--- /dev/null
+++ b/Documentation/core-api/pmalloc.txt
@@ -0,0 +1,104 @@
+============================
+Protectable memory allocator
+============================
+
+Introduction
+------------
+
+When trying to perform an attack toward a system, the attacker typically
+wants to alter the execution flow, in a way that allows actions which
+would otherwise be forbidden.
+
+In recent years there has been lots of effort in preventing the execution
+of arbitrary code, so the attacker is progressively pushed to look for
+alternatives.
+
+If code changes are either detected or even prevented, what is left is to
+alter kernel data.
+
+As countermeasure, constant data is collected in a section which is then
+marked as readonly.
+To expand on this, also statically allocated variables which are tagged
+as __ro_after_init will receive a similar treatment.
+The difference from constant data is that such variables can be still
+altered freely during the kernel init phase.
+
+However, such solution does not address those variables which could be
+treated essentially as read-only, but whose size is not known at compile
+time or cannot be fully initialized during the init phase.
+
+
+Design
+------
+
+pmalloc builds on top of genalloc, using the same concept of memory pools
+A pool is a handle to a group of chunks of memory of various sizes.
+When created, a pool is empty. It will be populated by allocating chunks
+of memory, either when the first memory allocation request is received, or
+when a pre-allocation is performed.
+
+Either way, one or more memory pages will be obtained from vmalloc and
+registered in the pool as chunk. Subsequent requests will be satisfied by
+either using any available free space from the current chunks, or by
+allocating more vmalloc pages, should the current free space not suffice.
+
+This is the key point of pmalloc: it groups data that must be protected
+into a set of pages. The protection is performed through the mmu, which
+is a prerequisite and has a minimum granularity of one page.
+
+If the relevant variables were not grouped, there would be a problem of
+allowing writes to other variables that might happen to share the same
+page, but require further alterations over time.
+
+A pool is a group of pages that are write protected at the same time.
+Ideally, they have some high level correlation (ex: they belong to the
+same module), which justifies write protecting them all together.
+
+To keep it to a minimum, locking is left to the user of the API, in
+those cases where it's not strictly needed.
+Ideally, no further locking is required, since each module can have own
+pool (or pools), which should, for example, avoid the need for cross
+module or cross thread synchronization about write protecting a pool.
+
+The overhead of creating an additional pool is minimal: a handful of bytes
+from kmalloc space for the metadata and then what is left unused from the
+page(s) registered as chunks.
+
+Compared to plain use of vmalloc, genalloc has the advantage of tightly
+packing the allocations, reducing the number of pages used and therefore
+the pressure on the TLB. The slight overhead in execution time of the
+allocation should be mostly irrelevant, because pmalloc memory is not
+meant to be allocated/freed in tight loops. Rather it ought to be taken
+in use, initialized and write protected. Possibly destroyed.
+
+Considering that not much data is supposed to be dynamically allocated
+and then marked as read-only, it shouldn't be an issue that the address
+range for pmalloc is limited, on 32-bit systems.
+
+Regarding SMP systems, the allocations are expected to happen mostly
+during an initial transient, after which there should be no more need to
+perform cross-processor synchronizations of page tables.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+1. create a pool
+2. [optional] pre-allocate some memory in the pool
+3. issue one or more allocation requests to the pool
+4. initialize the memory obtained
+ - iterate over points 3 & 4 as needed -
+5. write protect the pool
+6. use in read-only mode the handlers obtained through the allocations
+7. [optional] destroy the pool
+
+
+In a scenario where, for example due to some error, part or all of the
+allocations performed at point 3 must be reverted, it is possible to free
+them, as long as point 5 has not been executed, and the pool is still
+modifiable. Such freed memory can be re-used.
+Performing a free operation on a write-protected pool will, instead,
+simply release the corresponding memory from the accounting, but it will
+be still impossible to alter its content.
--
2.9.3
Add basic self-test functionality for pmalloc.
Signed-off-by: Igor Stoppa <[email protected]>
---
lib/genalloc.c | 2 +-
mm/Kconfig | 7 ++++++
mm/Makefile | 1 +
mm/pmalloc-selftest.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++
mm/pmalloc-selftest.h | 30 ++++++++++++++++++++++++
mm/pmalloc.c | 9 ++++---
6 files changed, 110 insertions(+), 4 deletions(-)
create mode 100644 mm/pmalloc-selftest.c
create mode 100644 mm/pmalloc-selftest.h
diff --git a/lib/genalloc.c b/lib/genalloc.c
index 62f69b3..7ba2ec9 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -542,7 +542,7 @@ void gen_pool_flush_chunk(struct gen_pool *pool,
memset(chunk->entries, 0,
DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
BITS_PER_BYTE));
- atomic_set(&chunk->avail, size);
+ atomic_long_set(&chunk->avail, size);
}
diff --git a/mm/Kconfig b/mm/Kconfig
index 03ff770..f0c960e 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -765,3 +765,10 @@ config GUP_BENCHMARK
performance of get_user_pages_fast().
See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY_SELFTEST
+ bool "Run self test for pmalloc memory allocator"
+ default n
+ help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index a6a47e1..1e76a9b 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
obj-$(CONFIG_ARCH_HAS_SET_MEMORY) += pmalloc.o
+obj-$(CONFIG_PROTECTABLE_MEMORY_SELFTEST) += pmalloc-selftest.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc-selftest.c b/mm/pmalloc-selftest.c
new file mode 100644
index 0000000..1c025f3
--- /dev/null
+++ b/mm/pmalloc-selftest.c
@@ -0,0 +1,65 @@
+/*
+ * pmalloc-selftest.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/pmalloc.h>
+#include <linux/mm.h>
+
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+#define validate_alloc(expected, variable, size) \
+ pr_notice("must be " expected ": %s", \
+ is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
+
+#define is_alloc_ok(variable, size) \
+ validate_alloc("ok", variable, size)
+
+#define is_alloc_no(variable, size) \
+ validate_alloc("no", variable, size)
+
+void pmalloc_selftest(void)
+{
+ struct gen_pool *pool_unprot;
+ struct gen_pool *pool_prot;
+ void *var_prot, *var_unprot, *var_vmall;
+
+ pr_notice("pmalloc self-test");
+ pool_unprot = pmalloc_create_pool("unprotected", 0);
+ pool_prot = pmalloc_create_pool("protected", 0);
+ BUG_ON(!(pool_unprot && pool_prot));
+
+ var_unprot = pmalloc(pool_unprot, SIZE_1 - 1, GFP_KERNEL);
+ var_prot = pmalloc(pool_prot, SIZE_1, GFP_KERNEL);
+ var_vmall = vmalloc(SIZE_2);
+ is_alloc_ok(var_unprot, 10);
+ is_alloc_ok(var_unprot, SIZE_1);
+ is_alloc_ok(var_unprot, PAGE_SIZE);
+ is_alloc_no(var_unprot, SIZE_1 + 1);
+ is_alloc_no(var_vmall, 10);
+
+
+ pfree(pool_unprot, var_unprot);
+ vfree(var_vmall);
+
+ pmalloc_protect_pool(pool_prot);
+
+ /* This will intentionally trigger a WARN because the pool being
+ * destroyed is not protected, which is unusual and should happen
+ * on error paths only, where probably other warnings are already
+ * displayed.
+ */
+ pmalloc_destroy_pool(pool_unprot);
+
+ /* This must not cause WARNings */
+ pmalloc_destroy_pool(pool_prot);
+}
diff --git a/mm/pmalloc-selftest.h b/mm/pmalloc-selftest.h
new file mode 100644
index 0000000..3673d23
--- /dev/null
+++ b/mm/pmalloc-selftest.h
@@ -0,0 +1,30 @@
+/*
+ * pmalloc-selftest.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+
+#ifndef __PMALLOC_SELFTEST_H__
+#define __PMALLOC_SELFTEST_H__
+
+
+#ifdef CONFIG_PROTECTABLE_MEMORY_SELFTEST
+
+#include <linux/pmalloc.h>
+
+void pmalloc_selftest(void);
+
+#else
+
+static inline void pmalloc_selftest(void){};
+
+#endif
+
+#endif
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
index a64ac49..73387d7 100644
--- a/mm/pmalloc.c
+++ b/mm/pmalloc.c
@@ -25,6 +25,8 @@
#include <asm/cacheflush.h>
#include <asm/page.h>
+#include "pmalloc-selftest.h"
+
/**
* pmalloc_data contains the data specific to a pmalloc pool,
* in a format compatible with the design of gen_alloc.
@@ -152,7 +154,7 @@ static void pmalloc_disconnect(struct pmalloc_data *data,
do { \
sysfs_attr_init(&data->attr_##attr_name.attr); \
data->attr_##attr_name.attr.name = #attr_name; \
- data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0444); \
+ data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
} while (0)
@@ -335,7 +337,7 @@ bool pmalloc_prealloc(struct gen_pool *pool, size_t size)
return true;
abort:
- vfree(chunk);
+ vfree_atomic(chunk);
return false;
}
@@ -401,7 +403,7 @@ void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
abort:
untag_chunk(chunk);
free:
- vfree(chunk);
+ vfree_atomic(chunk);
return NULL;
}
@@ -508,6 +510,7 @@ static int __init pmalloc_late_init(void)
}
}
mutex_unlock(&pmalloc_mutex);
+ pmalloc_selftest();
return 0;
}
late_initcall(pmalloc_late_init);
--
2.9.3
The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.
It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.
The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).
The user of the API must keep track of how much space was requested, if
it ever needs to be freed.
This can cause errors being undetected.
Ex:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
allocation.
The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.
--
This patch doubles the space reserved in the bitmap for each allocation.
By using 2 bits per allocation, it is possible to encode also the
information of where the allocation starts:
(msb to the left, lsb to the right, in the following "dictionary")
11: first allocation unit in the allocation
10: any subsequent allocation unit (if any) in the allocation
00: available allocation unit
01: invalid
Ex, with the same notation as above - MSb.......LSb:
...000010111100000010101011 <-- Read in this direction.
\__|\__|\|\____|\______|
| | | | \___ 4 used allocation units
| | | \___________ 3 empty allocation units
| | \_________________ 1 used allocation unit
| \___________________ 2 used allocation units
\_______________________ 2 empty allocation units
Because of the encoding, the previous lockless operations are still
possible. The only caveat is to change the parameter of the zero-finding
function which establishes the alignment at which to perform the test
for first zero.
The original value of the parameter is 0, meaning that an allocation can
start at any point in the bitmap, while the new value is 1, meaning that
allocations can start only at even places (bit 0, bit 2, etc.)
The number of zeroes to look for, must therefore be doubled.
When it's time to free the memory associated to an allocation request,
it's a matter of checking if the corresponding allocation unit is really
the beginning of an allocation (both bits are set to 1).
Looking for the ending can also be performed locklessly.
It's sufficient to identify the first mapped allocation unit
that is represented either as free (00) or busy (11).
Even if the allocation status should change in the meanwhile, it doesn't
matter, since it can only transition between free (00) and
first-allocated (11).
The parameter indicating to the *_free() function the size of the space
that should be freed is not currently removed, to facilitate the
transition, but it is verified, whenever it is not zero.
If it is set to zero, then the free function will autonomously decide the
size to be free, by scanning the bitmap.
About the implementation: the patch introduces the concept of "bitmap
entry", which has a 1:1 mapping with allocation units, while the code
being patched has a 1:1 mapping between allocation units and bits.
This means that, now, the bitmap can be extended (by following powers of
2), to track also other properties of the allocations, if ever needed.
Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/genalloc.h | 2 +-
lib/genalloc.c | 417 ++++++++++++++++++++++++++++++++---------------
2 files changed, 288 insertions(+), 131 deletions(-)
diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930..0377681 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr; /* physical starting address of memory chunk */
unsigned long start_addr; /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk (inclusive) */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
+ unsigned long entries[0]; /* bitmap for allocating memory chunk */
};
/*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc..dde7830 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -36,114 +36,221 @@
#include <linux/genalloc.h>
#include <linux/of_device.h>
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)
+
+/**
+ * get_bitmap_entry - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Returns the requested bitmap.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+ int entry_index)
+{
+ return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+ ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+ ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units - convert references to memory into orders of allocation
+ * @size: amount in bytes
+ * @order: pow of 2 represented by each entry in the bitmap
+ *
+ * Returns the number of units representing the size.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+ unsigned long order)
+{
+ return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size - dimension of a chunk of memory
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Returns the size of the chunk.
+ */
static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
{
return chunk->end_addr - chunk->start_addr + 1;
}
-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll - according to the mask, sets the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Returns 0 upon success, -EBUSY otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+ unsigned long mask, unsigned long value)
{
- unsigned long val, nval;
+ unsigned long nval;
+ unsigned long present;
+ unsigned long target;
nval = *addr;
do {
- val = nval;
- if (val & mask_to_set)
+ present = nval;
+ if (present & mask)
return -EBUSY;
+ target = present | value;
cpu_relax();
- } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+ } while ((nval = cmpxchg(addr, present, target)) != target);
return 0;
}
-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * cleart_bits_ll - according to the mask, clears the bits specified by
+ * value, at the address specified.
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to clear
+ *
+ * Returns 0 upon success, -EBUSY otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+ unsigned long mask, unsigned long value)
{
- unsigned long val, nval;
+ unsigned long nval;
+ unsigned long present;
+ unsigned long target;
nval = *addr;
+ present = nval;
+ if (unlikely((present & mask) ^ value))
+ return -EBUSY;
do {
- val = nval;
- if ((val & mask_to_clear) != mask_to_clear)
+ present = nval;
+ if (unlikely((present & mask) ^ value))
return -EBUSY;
+ target = present & ~mask;
cpu_relax();
- } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+ } while ((nval = cmpxchg(addr, present, target)) != target);
return 0;
}
-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_boundary - verify that an allocation effectively
+ * starts at the given address, then measure its length.
* @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
*
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Returns the length of an allocation, otherwise -EINVAL if the
+ * parameters do not refer to a correct allocation.
*/
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
+static int get_boundary(unsigned long *map, int start_entry, int nentries)
{
- unsigned long *p = map + BIT_WORD(start);
- const int size = start + nr;
- int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
- while (nr - bits_to_set >= 0) {
- if (set_bits_ll(p, mask_to_set))
- return nr;
- nr -= bits_to_set;
- bits_to_set = BITS_PER_LONG;
- mask_to_set = ~0UL;
- p++;
- }
- if (nr) {
- mask_to_set &= BITMAP_LAST_WORD_MASK(size);
- if (set_bits_ll(p, mask_to_set))
- return nr;
- }
+ int i;
+ unsigned long bitmap_entry;
- return 0;
+
+ if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+ return -EINVAL;
+ for (i = start_entry + 1; i < nentries; i++) {
+ bitmap_entry = get_bitmap_entry(map, i);
+ if (bitmap_entry == ENTRY_HEAD ||
+ bitmap_entry == ENTRY_UNUSED)
+ return i;
+ }
+ return nentries - start_entry;
}
+
+#define SET_BITS 1
+#define CLEAR_BITS 0
+
/*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll - set or clear the entries associated to an allocation
+ * @alteration: selection if the bits selected should be set or cleared
* @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
*
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
+ * If two users alter the same bit, one user will return remaining
+ * entries, otherwise return 0.
*/
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
+static int alter_bitmap_ll(bool alteration, unsigned long *map,
+ int start_entry, int nentries)
{
- unsigned long *p = map + BIT_WORD(start);
- const int size = start + nr;
- int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
- while (nr - bits_to_clear >= 0) {
- if (clear_bits_ll(p, mask_to_clear))
- return nr;
- nr -= bits_to_clear;
- bits_to_clear = BITS_PER_LONG;
- mask_to_clear = ~0UL;
- p++;
- }
- if (nr) {
- mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
- if (clear_bits_ll(p, mask_to_clear))
- return nr;
+ unsigned long start_bit;
+ unsigned long end_bit;
+ unsigned long mask;
+ unsigned long value;
+ int nbits;
+ int bits_to_write;
+ int index;
+ int (*action)(unsigned long *addr,
+ unsigned long mask, unsigned long value);
+
+ action = (alteration == SET_BITS) ? set_bits_ll : clear_bits_ll;
+
+ /* Prepare for writing the initial part of the allocation, from
+ * starting entry, to the end of the UL bitmap element which
+ * contains it. It might be larger than the actual allocation.
+ */
+ start_bit = ENTRIES_TO_BITS(start_entry);
+ end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+ nbits = ENTRIES_TO_BITS(nentries);
+ bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+ mask = BITMAP_FIRST_WORD_MASK(start_bit);
+ /* Mark the beginning of the allocation. */
+ value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+ index = BITS_DIV_LONGS(start_bit);
+
+ /* Writes entries to the bitmap, as long as the reminder is
+ * positive or zero.
+ * Might be skipped if the entries to write do not reach the end
+ * of a bitmap UL unit.
+ */
+ while (nbits >= bits_to_write) {
+ if (action(map + index, mask, value & mask))
+ return BITS_DIV_ENTRIES(nbits);
+ nbits -= bits_to_write;
+ bits_to_write = BITS_PER_LONG;
+ mask = ~0UL;
+ value = MASK;
+ index++;
}
+ /* Takes care of the ending part of the entries to mark. */
+ if (nbits > 0) {
+ mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+ bits_to_write = nbits;
+ if (action(map + index, mask, value & mask))
+ return BITS_DIV_ENTRIES(nbits);
+ }
return 0;
}
+
/**
* gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry represents
* @nid: node id of the node the pool structure should be allocated on, or -1
*
* Create a new special memory pool that can be used to manage special purpose
@@ -183,10 +290,12 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- BITS_TO_LONGS(nbits) * sizeof(long);
+ int nentries;
+ int nbytes;
+ nentries = size >> pool->min_alloc_order;
+ nbytes = sizeof(struct gen_pool_chunk) +
+ ENTRIES_DIV_LONGS(nentries) * sizeof(long);
chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
if (unlikely(chunk == NULL))
return -ENOMEM;
@@ -248,7 +357,7 @@ void gen_pool_destroy(struct gen_pool *pool)
list_del(&chunk->next_chunk);
end_bit = chunk_size(chunk) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
+ bit = find_next_bit(chunk->entries, end_bit, 0);
BUG_ON(bit < end_bit);
kfree(chunk);
@@ -292,7 +401,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
struct gen_pool_chunk *chunk;
unsigned long addr = 0;
int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit, remain;
+ int nentries, start_entry, end_entry, remain;
#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
BUG_ON(in_nmi());
@@ -301,29 +410,32 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
if (size == 0)
return 0;
- nbits = (size + (1UL << order) - 1) >> order;
+ nentries = mem_to_units(size, order);
rcu_read_lock();
list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (size > atomic_long_read(&chunk->avail))
continue;
- start_bit = 0;
- end_bit = chunk_size(chunk) >> order;
+ start_entry = 0;
+ end_entry = chunk_size(chunk) >> order;
retry:
- start_bit = algo(chunk->bits, end_bit, start_bit,
- nbits, data, pool);
- if (start_bit >= end_bit)
+ start_entry = algo(chunk->entries, end_entry, start_entry,
+ nentries, data, pool);
+ if (start_entry >= end_entry)
continue;
- remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+ remain = alter_bitmap_ll(SET_BITS, chunk->entries,
+ start_entry, nentries);
if (remain) {
- remain = bitmap_clear_ll(chunk->bits, start_bit,
- nbits - remain);
- BUG_ON(remain);
+ remain = alter_bitmap_ll(CLEAR_BITS,
+ chunk->entries,
+ start_entry,
+ nentries - remain);
goto retry;
}
- addr = chunk->start_addr + ((unsigned long)start_bit << order);
- size = nbits << order;
+ addr = chunk->start_addr +
+ ((unsigned long)start_entry << order);
+ size = nentries << order;
atomic_long_sub(size, &chunk->avail);
break;
}
@@ -365,7 +477,7 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
* gen_pool_free - free allocated special memory back to the pool
* @pool: pool to free to
* @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free or 0, for auto-detection
*
* Free previously allocated special memory back to the specified
* pool. Can not be used in NMI handler on architectures without
@@ -375,22 +487,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
struct gen_pool_chunk *chunk;
int order = pool->min_alloc_order;
- int start_bit, nbits, remain;
+ int start_entry, remaining_entries, nentries, remain;
+ int boundary;
#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
BUG_ON(in_nmi());
#endif
- nbits = (size + (1UL << order) - 1) >> order;
rcu_read_lock();
list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
BUG_ON(addr + size - 1 > chunk->end_addr);
- start_bit = (addr - chunk->start_addr) >> order;
- remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
+ start_entry = (addr - chunk->start_addr) >> order;
+ remaining_entries = (chunk->end_addr - addr) >> order;
+ boundary = get_boundary(chunk->entries, start_entry,
+ remaining_entries);
+ BUG_ON(boundary < 0);
+ nentries = boundary - start_entry;
+ BUG_ON(size &&
+ (nentries != mem_to_units(size, order)));
+ remain = alter_bitmap_ll(CLEAR_BITS, chunk->entries,
+ start_entry, nentries);
BUG_ON(remain);
- size = nbits << order;
- atomic_long_add(size, &chunk->avail);
+ atomic_long_add(nentries << order, &chunk->avail);
rcu_read_unlock();
return;
}
@@ -517,9 +636,9 @@ EXPORT_SYMBOL(gen_pool_set_algo);
* gen_pool_first_fit - find the first available region
* of memory matching the size requirement (no alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
*/
@@ -527,7 +646,15 @@ unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data,
struct gen_pool *pool)
{
- return bitmap_find_next_zero_area(map, size, start, nr, 0);
+ unsigned long align_mask;
+ unsigned long bit_index;
+
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit);
@@ -535,9 +662,9 @@ EXPORT_SYMBOL(gen_pool_first_fit);
* gen_pool_first_fit_align - find the first available region
* of memory matching the size requirement (alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: data for alignment
* @pool: pool to get order from
*/
@@ -547,21 +674,28 @@ unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
{
struct genpool_data_align *alignment;
unsigned long align_mask;
+ unsigned long bit_index;
int order;
alignment = data;
order = pool->min_alloc_order;
- align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
- return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+ align_mask = roundup_pow_of_two(
+ ENTRIES_TO_BITS(mem_to_units(alignment->align,
+ order))) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit_align);
/**
* gen_pool_fixed_alloc - reserve a specific region
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: data for alignment
* @pool: pool to get order from
*/
@@ -571,20 +705,23 @@ unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
{
struct genpool_data_fixed *fixed_data;
int order;
- unsigned long offset_bit;
- unsigned long start_bit;
+ unsigned long offset;
+ unsigned long align_mask;
+ unsigned long bit_index;
fixed_data = data;
order = pool->min_alloc_order;
- offset_bit = fixed_data->offset >> order;
if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
return size;
+ offset = fixed_data->offset >> order;
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start + offset),
+ ENTRIES_TO_BITS(nr), align_mask);
+ if (bit_index != ENTRIES_TO_BITS(offset))
+ return size;
- start_bit = bitmap_find_next_zero_area(map, size,
- start + offset_bit, nr, 0);
- if (start_bit != offset_bit)
- start_bit = size;
- return start_bit;
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_fixed_alloc);
@@ -593,9 +730,9 @@ EXPORT_SYMBOL(gen_pool_fixed_alloc);
* of memory matching the size requirement. The region will be aligned
* to the order of the size specified.
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
*/
@@ -603,9 +740,15 @@ unsigned long gen_pool_first_fit_order_align(unsigned long *map,
unsigned long size, unsigned long start,
unsigned int nr, void *data, struct gen_pool *pool)
{
- unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
- return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+ unsigned long align_mask;
+ unsigned long bit_index;
+
+ align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit_order_align);
@@ -613,9 +756,9 @@ EXPORT_SYMBOL(gen_pool_first_fit_order_align);
* gen_pool_best_fit - find the best fitting region of memory
* macthing the size requirement (no alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
*
@@ -626,27 +769,41 @@ unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
unsigned long start, unsigned int nr, void *data,
struct gen_pool *pool)
{
- unsigned long start_bit = size;
+ unsigned long start_bit = ENTRIES_TO_BITS(size);
unsigned long len = size + 1;
unsigned long index;
+ unsigned long align_mask;
+ unsigned long bit_index;
- index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ index = BITS_DIV_ENTRIES(bit_index);
while (index < size) {
- int next_bit = find_next_bit(map, size, index + nr);
- if ((next_bit - index) < len) {
- len = next_bit - index;
- start_bit = index;
+ int next_bit;
+
+ next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(index + nr));
+ if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+ len = BITS_DIV_ENTRIES(next_bit) - index;
+ start_bit = ENTRIES_TO_BITS(index);
if (len == nr)
- return start_bit;
+ return BITS_DIV_ENTRIES(start_bit);
}
- index = bitmap_find_next_zero_area(map, size,
- next_bit + 1, nr, 0);
+ bit_index =
+ bitmap_find_next_zero_area(map,
+ ENTRIES_TO_BITS(size),
+ next_bit + 1,
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ index = BITS_DIV_ENTRIES(bit_index);
}
- return start_bit;
+ return BITS_DIV_ENTRIES(start_bit);
}
-EXPORT_SYMBOL(gen_pool_best_fit);
static void devm_gen_pool_release(struct device *dev, void *res)
{
--
2.9.3
On Tue, 30 Jan 2018 17:14:45 +0200
Igor Stoppa <[email protected]> wrote:
> Detailed documentation about the protectable memory allocator.
>
> Signed-off-by: Igor Stoppa <[email protected]>
> ---
> Documentation/core-api/pmalloc.txt | 104 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 104 insertions(+)
> create mode 100644 Documentation/core-api/pmalloc.txt
Please don't put plain-text files into core-api - that's a directory full
of RST documents. Your document is 99.9% RST already, better to just
finish the job and tie it into the rest of the kernel docs.
> diff --git a/Documentation/core-api/pmalloc.txt b/Documentation/core-api/pmalloc.txt
> new file mode 100644
> index 0000000..934d356
> --- /dev/null
> +++ b/Documentation/core-api/pmalloc.txt
> @@ -0,0 +1,104 @@
We might as well put the SPDX tag here, it's a new file.
> +============================
> +Protectable memory allocator
> +============================
> +
> +Introduction
> +------------
> +
> +When trying to perform an attack toward a system, the attacker typically
> +wants to alter the execution flow, in a way that allows actions which
> +would otherwise be forbidden.
> +
> +In recent years there has been lots of effort in preventing the execution
> +of arbitrary code, so the attacker is progressively pushed to look for
> +alternatives.
> +
> +If code changes are either detected or even prevented, what is left is to
> +alter kernel data.
> +
> +As countermeasure, constant data is collected in a section which is then
> +marked as readonly.
> +To expand on this, also statically allocated variables which are tagged
> +as __ro_after_init will receive a similar treatment.
> +The difference from constant data is that such variables can be still
> +altered freely during the kernel init phase.
> +
> +However, such solution does not address those variables which could be
> +treated essentially as read-only, but whose size is not known at compile
> +time or cannot be fully initialized during the init phase.
This is all good information, but I'd suggest it belongs more in the 0/n
patch posting than here. The introduction of *this* document should say
what it actually covers.
> +
> +Design
> +------
> +
> +pmalloc builds on top of genalloc, using the same concept of memory pools
> +A pool is a handle to a group of chunks of memory of various sizes.
> +When created, a pool is empty. It will be populated by allocating chunks
> +of memory, either when the first memory allocation request is received, or
> +when a pre-allocation is performed.
> +
> +Either way, one or more memory pages will be obtained from vmalloc and
> +registered in the pool as chunk. Subsequent requests will be satisfied by
> +either using any available free space from the current chunks, or by
> +allocating more vmalloc pages, should the current free space not suffice.
> +
> +This is the key point of pmalloc: it groups data that must be protected
> +into a set of pages. The protection is performed through the mmu, which
> +is a prerequisite and has a minimum granularity of one page.
> +
> +If the relevant variables were not grouped, there would be a problem of
> +allowing writes to other variables that might happen to share the same
> +page, but require further alterations over time.
> +
> +A pool is a group of pages that are write protected at the same time.
> +Ideally, they have some high level correlation (ex: they belong to the
> +same module), which justifies write protecting them all together.
> +
> +To keep it to a minimum, locking is left to the user of the API, in
> +those cases where it's not strictly needed.
This seems like a relevant and important aspect of the API that shouldn't
be buried in the middle of a section talking about random things.
> +Ideally, no further locking is required, since each module can have own
> +pool (or pools), which should, for example, avoid the need for cross
> +module or cross thread synchronization about write protecting a pool.
> +
> +The overhead of creating an additional pool is minimal: a handful of bytes
> +from kmalloc space for the metadata and then what is left unused from the
> +page(s) registered as chunks.
> +
> +Compared to plain use of vmalloc, genalloc has the advantage of tightly
> +packing the allocations, reducing the number of pages used and therefore
> +the pressure on the TLB. The slight overhead in execution time of the
> +allocation should be mostly irrelevant, because pmalloc memory is not
> +meant to be allocated/freed in tight loops. Rather it ought to be taken
> +in use, initialized and write protected. Possibly destroyed.
> +
> +Considering that not much data is supposed to be dynamically allocated
> +and then marked as read-only, it shouldn't be an issue that the address
> +range for pmalloc is limited, on 32-bit systems.
> +
> +Regarding SMP systems, the allocations are expected to happen mostly
> +during an initial transient, after which there should be no more need to
> +perform cross-processor synchronizations of page tables.
> +
> +
> +Use
> +---
> +
> +The typical sequence, when using pmalloc, is:
> +
> +1. create a pool
> +2. [optional] pre-allocate some memory in the pool
> +3. issue one or more allocation requests to the pool
> +4. initialize the memory obtained
> + - iterate over points 3 & 4 as needed -
> +5. write protect the pool
> +6. use in read-only mode the handlers obtained through the allocations
> +7. [optional] destroy the pool
So one gets this far, but has no actual idea of how to do these things.
Which leads me to wonder: what is this document for? Who are you expecting
to read it?
You could improve things a lot by (once again) going to RST and using
directives to bring in the kerneldoc comments from the source (which, I
note, do exist). But I'd suggest rethinking this document and its
audience. Most of the people reading it are likely wanting to learn how to
*use* this API; I think it would be best to not leave them frustrated.
Thanks,
jon
On Tue, 30 Jan 2018, Igor Stoppa wrote:
> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>
> kmemleak_vmalloc(area, size, gfp_mask);
>
> + for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
> + area->pages[page_counter]->area = area;
> +
> return addr;
Well this introduces significant overhead for large sized allocation. Does
this not matter because the areas are small?
Would it not be better to use compound page allocations here?
page_head(whatever) gets you the head page where you can store all sorts
of information about the chunk of memory.
On 01/02/18 02:00, Christopher Lameter wrote:
> On Tue, 30 Jan 2018, Igor Stoppa wrote:
>
>> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>>
>> kmemleak_vmalloc(area, size, gfp_mask);
>>
>> + for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
>> + area->pages[page_counter]->area = area;
>> +
>> return addr;
>
> Well this introduces significant overhead for large sized allocation. Does
> this not matter because the areas are small?
Relatively significant?
I do not object to your comment, but in practice i see that:
- vmalloc is used relatively little
- allocations do not seem to be huge
- there seem to be way larger overheads in the handling of virtual pages
(see my proposal for the LFS/m summit, about collapsing struct
vm_struct and struct vmap_area)
> Would it not be better to use compound page allocations here?
> page_head(whatever) gets you the head page where you can store all sorts
> of information about the chunk of memory.
Can you please point me to this function/macro? I don't seem to be able
to find it, at least not in 4.15
During hardened user copy permission check, I need to confirm if the
memory range that would be exposed to userspace is a legitimate
sub-range of a pmalloc allocation.
So, I start with the pair (address, size) and I must end up to something
I can compare it against.
The idea here is to pass through struct_page and then the related
vm_struct/vmap_area, which already has the information about the
specific chunk of virtual memory.
I cannot comment on your proposal because I do not know where to find
the reference you made, or maybe I do not understand what you mean :-(
--
igor
On Thu, Feb 1, 2018 at 11:42 PM, Igor Stoppa <[email protected]> wrote:
> On 01/02/18 02:00, Christopher Lameter wrote:
>> Would it not be better to use compound page allocations here?
>> page_head(whatever) gets you the head page where you can store all sorts
>> of information about the chunk of memory.
>
> Can you please point me to this function/macro? I don't seem to be able
> to find it, at least not in 4.15
IIUC, he means PageHead(), which is also hard to grep for, since it is
a constructed name, via Page##uname in include/linux/page-flags.h:
__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
-Kees
--
Kees Cook
Pixel Security
Hi Igor,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on linus/master]
[also build test WARNING on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-randconfig-x071-201804 (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
mm/pmalloc.c: In function 'pmalloc_pool_show_avail':
>> mm/pmalloc.c:71:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
~~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~
%u
mm/pmalloc.c: In function 'pmalloc_pool_show_size':
mm/pmalloc.c:81:25: warning: format '%lu' expects argument of type 'long unsigned int', but argument 3 has type 'size_t {aka unsigned int}' [-Wformat=]
return sprintf(buf, "%lu\n", gen_pool_size(data->pool));
~~^ ~~~~~~~~~~~~~~~~~~~~~~~~~
%u
vim +71 mm/pmalloc.c
63
64 static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
65 struct kobj_attribute *attr,
66 char *buf)
67 {
68 struct pmalloc_data *data;
69
70 data = container_of(attr, struct pmalloc_data, attr_avail);
> 71 return sprintf(buf, "%lu\n", gen_pool_avail(data->pool));
72 }
73
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Igor,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: i386-tinyconfig (attached as .config)
compiler: gcc-7 (Debian 7.2.0-12) 7.2.1 20171025
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
mm/pmalloc.o: In function `pmalloc_pool_show_chunks':
>> pmalloc.c:(.text+0x50): undefined reference to `gen_pool_for_each_chunk'
mm/pmalloc.o: In function `pmalloc_pool_show_size':
>> pmalloc.c:(.text+0x6e): undefined reference to `gen_pool_size'
mm/pmalloc.o: In function `pmalloc_pool_show_avail':
>> pmalloc.c:(.text+0x8a): undefined reference to `gen_pool_avail'
mm/pmalloc.o: In function `pmalloc_chunk_free':
>> pmalloc.c:(.text+0x171): undefined reference to `gen_pool_flush_chunk'
mm/pmalloc.o: In function `pmalloc_create_pool':
>> pmalloc.c:(.text+0x19b): undefined reference to `gen_pool_create'
>> pmalloc.c:(.text+0x2bb): undefined reference to `gen_pool_destroy'
mm/pmalloc.o: In function `pmalloc_prealloc':
>> pmalloc.c:(.text+0x350): undefined reference to `gen_pool_add_virt'
mm/pmalloc.o: In function `pmalloc':
>> pmalloc.c:(.text+0x3a7): undefined reference to `gen_pool_alloc'
pmalloc.c:(.text+0x3f1): undefined reference to `gen_pool_add_virt'
pmalloc.c:(.text+0x401): undefined reference to `gen_pool_alloc'
mm/pmalloc.o: In function `pmalloc_destroy_pool':
pmalloc.c:(.text+0x4a1): undefined reference to `gen_pool_for_each_chunk'
pmalloc.c:(.text+0x4a8): undefined reference to `gen_pool_destroy'
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Igor,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on linus/master]
[also build test ERROR on v4.15]
[cannot apply to next-20180201]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Igor-Stoppa/mm-security-ro-protection-for-dynamic-data/20180202-123437
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 7.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa
All error/warnings (new ones prefixed by >>):
mm/pmalloc-selftest.c: In function 'pmalloc_selftest':
>> mm/pmalloc-selftest.c:43:14: error: implicit declaration of function 'vmalloc'; did you mean 'kvmalloc'? [-Werror=implicit-function-declaration]
var_vmall = vmalloc(SIZE_2);
^~~~~~~
kvmalloc
>> mm/pmalloc-selftest.c:43:12: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
var_vmall = vmalloc(SIZE_2);
^
>> mm/pmalloc-selftest.c:52:2: error: implicit declaration of function 'vfree'; did you mean 'kvfree'? [-Werror=implicit-function-declaration]
vfree(var_vmall);
^~~~~
kvfree
cc1: some warnings being treated as errors
vim +43 mm/pmalloc-selftest.c
19
20 #define validate_alloc(expected, variable, size) \
21 pr_notice("must be " expected ": %s", \
22 is_pmalloc_object(variable, size) > 0 ? "ok" : "no")
23
24 #define is_alloc_ok(variable, size) \
25 validate_alloc("ok", variable, size)
26
27 #define is_alloc_no(variable, size) \
28 validate_alloc("no", variable, size)
29
30 void pmalloc_selftest(void)
31 {
32 struct gen_pool *pool_unprot;
33 struct gen_pool *pool_prot;
34 void *var_prot, *var_unprot, *var_vmall;
35
36 pr_notice("pmalloc self-test");
37 pool_unprot = pmalloc_create_pool("unprotected", 0);
38 pool_prot = pmalloc_create_pool("protected", 0);
39 BUG_ON(!(pool_unprot && pool_prot));
40
41 var_unprot = pmalloc(pool_unprot, SIZE_1 - 1, GFP_KERNEL);
42 var_prot = pmalloc(pool_prot, SIZE_1, GFP_KERNEL);
> 43 var_vmall = vmalloc(SIZE_2);
44 is_alloc_ok(var_unprot, 10);
45 is_alloc_ok(var_unprot, SIZE_1);
46 is_alloc_ok(var_unprot, PAGE_SIZE);
47 is_alloc_no(var_unprot, SIZE_1 + 1);
48 is_alloc_no(var_vmall, 10);
49
50
51 pfree(pool_unprot, var_unprot);
> 52 vfree(var_vmall);
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Thanks for the review and apologies for the delay.
Replies inlined below.
On 30/01/18 19:08, Jonathan Corbet wrote:
> On Tue, 30 Jan 2018 17:14:45 +0200
> Igor Stoppa <[email protected]> wrote:
[...]
> Please don't put plain-text files into core-api - that's a directory full
ok
>> diff --git a/Documentation/core-api/pmalloc.txt b/Documentation/core-api/pmalloc.txt
>> new file mode 100644
>> index 0000000..934d356
>> --- /dev/null
>> +++ b/Documentation/core-api/pmalloc.txt
>> @@ -0,0 +1,104 @@
>
> We might as well put the SPDX tag here, it's a new file.
ok, this is all new stuff to me ... I suppose I should do it also for
all the other new files I create
But what is the license for the documentation? It's not code, so GPL
seems wrong. Creative commons?
I just noticed a patch for checkpatch.pl about SPDX and asked the same
question there.
https://lkml.org/lkml/2018/2/2/365
>> +============================
>> +Protectable memory allocator
>> +============================
>> +
>> +Introduction
>> +------------
[...]
> This is all good information, but I'd suggest it belongs more in the 0/n
> patch posting than here. The introduction of *this* document should say
> what it actually covers.
ok
>
>> +
>> +Design
>> +------
[...]
>> +To keep it to a minimum, locking is left to the user of the API, in
>> +those cases where it's not strictly needed.
>
> This seems like a relevant and important aspect of the API that shouldn't
> be buried in the middle of a section talking about random things.
I'll move it to the Use section.
[...]
>> +Use
>> +---
>> +
>> +The typical sequence, when using pmalloc, is:
>> +
>> +1. create a pool
>> +2. [optional] pre-allocate some memory in the pool
>> +3. issue one or more allocation requests to the pool
>> +4. initialize the memory obtained
>> + - iterate over points 3 & 4 as needed -
>> +5. write protect the pool
>> +6. use in read-only mode the handlers obtained through the allocations
>> +7. [optional] destroy the pool
>
> So one gets this far, but has no actual idea of how to do these things.
> Which leads me to wonder: what is this document for? Who are you expecting
> to read it?
I will add a reference to the selftest file.
In practice it can also work as example.
> You could improve things a lot by (once again) going to RST and using
> directives to bring in the kerneldoc comments from the source (which, I
> note, do exist). But I'd suggest rethinking this document and its
> audience. Most of the people reading it are likely wanting to learn how to
> *use* this API; I think it would be best to not leave them frustrated.
ok, the example route should be more explicative.
--
thanks again for the review, igor
On 01/02/18 23:11, Kees Cook wrote:
> IIUC, he means PageHead(), which is also hard to grep for, since it is
> a constructed name, via Page##uname in include/linux/page-flags.h:
>
> __PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY)
Thank you, I'll try to provide a meaningful reply soon, but I'll be AFK
during most of next 2 weeks, so it might be delayed :-(
--
igor
On Thu, 1 Feb 2018, Igor Stoppa wrote:
> > Would it not be better to use compound page allocations here?
> > page_head(whatever) gets you the head page where you can store all sorts
> > of information about the chunk of memory.
>
> Can you please point me to this function/macro? I don't seem to be able
> to find it, at least not in 4.15
Ok its compound_head(). See also the use in the SLAB and SLUB allocator.
> During hardened user copy permission check, I need to confirm if the
> memory range that would be exposed to userspace is a legitimate
> sub-range of a pmalloc allocation.
If you save the size in the head page struct then you could do that pretty
fast.
> I cannot comment on your proposal because I do not know where to find
> the reference you made, or maybe I do not understand what you mean :-(
compund pages are higher order pages that are handled as a single page by
the VM. See https://lwn.net/Articles/619514/
On 02/02/18 20:43, Christopher Lameter wrote:
> On Thu, 1 Feb 2018, Igor Stoppa wrote:
>
>>> Would it not be better to use compound page allocations here?
[...]
> Ok its compound_head(). See also the use in the SLAB and SLUB allocator.
>
>> During hardened user copy permission check, I need to confirm if the
>> memory range that would be exposed to userspace is a legitimate
>> sub-range of a pmalloc allocation.
>
> If you save the size in the head page struct then you could do that pretty
> fast.
Ok, now I get what you mean.
But it doesn't seem to fit the intended use case, for other reasons
(maybe the same, from 2 different POV):
- compound pages are aggregates of regular pages, in numbers that are
powers of 2, while the amount of pages to allocate is not known upfront.
One *could* give a hint to pmalloc about how many pages to allocate
every time there is a need to grow the pool.
Iow it would be the size of a chunk. But I'm afraid the granularity
would still be pretty low, so maybe it would be 2-4 times less.
- the property of the compound page will affect the property of all the
pages in the compound, so when one is write protected, it can generate a
lot of wasted memory, if there is too much slack (because of the order)
With vmalloc, I can allocate any number of pages, minimizing the waste.
Finally, there was a discussion about optimization:
http://www.openwall.com/lists/kernel-hardening/2017/08/07/2
The patch I sent does indeed take advantage of the new information, not
just for pmalloc use.
I have not measured if/where/what there is gain, but it does look like
the extra info can be exploited also elsewhere.
--
igor
On Sat, 3 Feb 2018, Igor Stoppa wrote:
> - the property of the compound page will affect the property of all the
> pages in the compound, so when one is write protected, it can generate a
> lot of wasted memory, if there is too much slack (because of the order)
> With vmalloc, I can allocate any number of pages, minimizing the waste.
I thought the intend here is to create a pool where the whole pool becomes
RO?
On Tue, Jan 30, 2018 at 05:14:43PM +0200, Igor Stoppa wrote:
> @@ -1744,6 +1748,7 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> const void *caller)
> {
> struct vm_struct *area;
> + unsigned int page_counter;
> void *addr;
> unsigned long real_size = size;
>
> @@ -1769,6 +1774,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>
> kmemleak_vmalloc(area, size, gfp_mask);
>
> + for (page_counter = 0; page_counter < area->nr_pages; page_counter++)
> + area->pages[page_counter]->area = area;
> +
> return addr;
>
LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called ``i``.
Calling it ``loop_counter`` is non-productive, if there is no chance of it
being mis-understood. Similarly, ``tmp`` can be just about any type of
variable that is used to hold a temporary value.
(Documentation/process/coding-style.rst)
On 05/02/18 17:33, Christopher Lameter wrote:
> On Sat, 3 Feb 2018, Igor Stoppa wrote:
>
>> - the property of the compound page will affect the property of all the
>> pages in the compound, so when one is write protected, it can generate a
>> lot of wasted memory, if there is too much slack (because of the order)
>> With vmalloc, I can allocate any number of pages, minimizing the waste.
>
> I thought the intend here is to create a pool where the whole pool becomes
> RO?
Yes, but why would I force the number of pages in the pool to be a power
of 2, when it can be any number?
If a need, say, 17 pages, I would have to allocate 32.
But it can be worse than that.
Since the size of the overall allocated memory is not known upfront, I
wold have a problem to decide how many pages to allocate, every time
there is need to grow the pool.
Or push the problem to the user of the API, who might be equally unaware.
Notice that there is already a function (prealloc) available to the user
of the API, if the size is known upfront.
So I do not really see how using compound pages would make memory
utilization better or even not worse.
--
igor
On 06/02/18 14:37, Matthew Wilcox wrote:
[...]
> LOCAL variable names should be short, and to the point.
[...]
> (Documentation/process/coding-style.rst)
ok, will do, thanks for the pointer!
--
igor
On Fri, Feb 02, 2018 at 05:56:29PM +0200, Igor Stoppa wrote:
> But what is the license for the documentation? It's not code, so GPL
> seems wrong. Creative commons?
I've done this as the first line of my new documentation files:
.. SPDX-License-Identifier: CC-BY-SA-4.0
I think this is the CC license that's closest in spirit to the GPL without
the unintended consequences of the GPL when used on documentation. The
GFDL seems to be out of favour these days.
On Fri, 9 Feb 2018 19:37:14 -0800
Matthew Wilcox <[email protected]> wrote:
> I've done this as the first line of my new documentation files:
>
> .. SPDX-License-Identifier: CC-BY-SA-4.0
>
> I think this is the CC license that's closest in spirit to the GPL without
> the unintended consequences of the GPL when used on documentation. The
> GFDL seems to be out of favour these days.
I think that's a great license. I still fear that it is not suitable for
kernel documentation, though, especially when we produce documents that
include significant text from the (GPL-licensed) kernel source. The
result is almost certainly not distributable, and I don't think that's a
good thing. The GPL is not perfect for documentation, but I don't think
that we have a better alternative for in-kernel docs.
jon
On Mon, Feb 12, 2018 at 08:28:49AM -0700, Jonathan Corbet wrote:
> On Fri, 9 Feb 2018 19:37:14 -0800
> Matthew Wilcox <[email protected]> wrote:
>
> > I've done this as the first line of my new documentation files:
> >
> > .. SPDX-License-Identifier: CC-BY-SA-4.0
> >
> > I think this is the CC license that's closest in spirit to the GPL without
> > the unintended consequences of the GPL when used on documentation. The
> > GFDL seems to be out of favour these days.
>
> I think that's a great license. I still fear that it is not suitable for
> kernel documentation, though, especially when we produce documents that
> include significant text from the (GPL-licensed) kernel source. The
> result is almost certainly not distributable, and I don't think that's a
> good thing. The GPL is not perfect for documentation, but I don't think
> that we have a better alternative for in-kernel docs.
That's a reasonable concern. I've read other reasonable concerns about
the unintended effects of using the GPL to produce a printed book (eg
can you print it in a proprietary font, do you have to provide an
electronic version of the text, and so on). I fear these wise words
still ring true:
But the real problem is that we as a community lack a copyleft license
that works well for both code and text. About the only thing that even
comes close to working is putting the documentation under the GPL as
well, but the GPL is a poor fit for text. Nonetheless, it may be the
best we have in cases where GPL-licensed code is to be incorporated
into documentation.
I dare suggest another possibility: that we create a further exception to
the license that the kernel is distributed under. Something along these
lines:
Documentation [1] extracted from files marked as GPL [2] may be
distributed under the terms of the CC-BY-SA-4.0 license.
[1] This includes text explicitly marked for extraction using the kernel-doc
tool. It may include short example code sequences. It does not include
code that would normally be expected to be compiled.
[2] GPL-2.0, GPL-2.0+, GPL-1.0+, LGPL-2.0, LGPL-2.0+, LGPL-2.1, LGPL-2.1+
We'd want to run it by a lawyer, of course, to have them check for
unintended consequences.