2018-03-13 21:48:24

by Igor Stoppa

[permalink] [raw]
Subject: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

This patch-set introduces the possibility of protecting memory that has
been allocated dynamically.

The memory is managed in pools: when a memory pool is turned into R/O,
all the memory that is part of it, will become R/O.

A R/O pool can be destroyed, to recover its memory, but it cannot be
turned back into R/W mode.

This is intentional. This feature is meant for data that doesn't need
further modifications after initialization.

However the data might need to be released, for example as part of module
unloading.
To do this, the memory must first be freed, then the pool can be destroyed.

An example is provided, in the form of self-testing.

Changes since v18:

[http://www.openwall.com/lists/kernel-hardening/2018/02/28/21]

* Code refactoring in pmalloc & genalloc:
- simplify the logic for handling pools before and after sysf init
- reduced section holding mutex on pmalloc list, when adding a pool
- reduced the steps in finding length of an existing allocation
- split various functions into smaller ones
* clarified in the comments the need for pfree()
* Various fixes to the documentation:
- remove kerneldoc duplicates
- added cross-reference lables
- miscellaneous typos
* improved error notifications: use WARNs with specific messages
* added missing tests for possible error conditions


Discussion topics that are unclear if they are closed and would need
comment from those who initiated them, if my answers are accepted or not:

* @Kees Cook proposed to have first self testing for genalloc, to
validate the following patch, adding tracing of allocations
My answer was that such tests would also need patching, therefore they
could not certify that the functionality is corect both before and
after the genalloc bitmap modification.

* @Kees Cook proposed to turn the self testing into modules.
My answer was that the functionality is intentionally tested very early
in the boot phase, to prevent unexplainable errors, should the feature
really fail.

* @Matthew Wilcox proposed to use a different mechanism for th genalloc
bitmap: 2 bitmaps, one for occupation and one for start.
And possibly use an rbtree for the starts.
My answer was that this solution is less optimized, because it scatters
the data of one allocation across multiple words/pages, plus is not
a transaction anymore. And the particular distribution of sizes of
allocation is likely to eat up much more memory than the bitmap.

Igor Stoppa (8):
genalloc: track beginning of allocations
Add label to genalloc.rst for cross reference
genalloc: selftest
struct page: add field for vm_struct
Protectable Memory
Pmalloc selftest
lkdtm: crash on overwriting protected pmalloc var
Documentation for Pmalloc

Documentation/core-api/genalloc.rst | 2 +
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pmalloc.rst | 111 ++++++
drivers/misc/lkdtm.h | 1 +
drivers/misc/lkdtm_core.c | 3 +
drivers/misc/lkdtm_perms.c | 28 ++
include/linux/genalloc.h | 116 +++---
include/linux/mm_types.h | 1 +
include/linux/pmalloc.h | 163 ++++++++
include/linux/test_genalloc.h | 26 ++
include/linux/test_pmalloc.h | 24 ++
include/linux/vmalloc.h | 1 +
init/main.c | 4 +
lib/Kconfig | 15 +
lib/Makefile | 1 +
lib/genalloc.c | 765 ++++++++++++++++++++++++++----------
lib/test_genalloc.c | 410 +++++++++++++++++++
mm/Kconfig | 17 +
mm/Makefile | 2 +
mm/pmalloc.c | 643 ++++++++++++++++++++++++++++++
mm/test_pmalloc.c | 238 +++++++++++
mm/usercopy.c | 33 ++
mm/vmalloc.c | 2 +
23 files changed, 2352 insertions(+), 255 deletions(-)
create mode 100644 Documentation/core-api/pmalloc.rst
create mode 100644 include/linux/pmalloc.h
create mode 100644 include/linux/test_genalloc.h
create mode 100644 include/linux/test_pmalloc.h
create mode 100644 lib/test_genalloc.c
create mode 100644 mm/pmalloc.c
create mode 100644 mm/test_pmalloc.c

--
2.14.1



2018-03-13 21:48:57

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 1/8] genalloc: track beginning of allocations

The genalloc library is only capable of tracking if a certain unit of
allocation is in use or not.

It is not capable of discerning where the memory associated to an
allocation request begins and where it ends.

The reason is that units of allocations are tracked by using a bitmap,
where each bit represents that the unit is either allocated (1) or
available (0).

The user of the API must keep track of how much space was requested, if
it ever needs to be freed.

This can cause errors being undetected.
Examples:
* Only a subset of the memory provided to an allocation request is freed
* The memory from a subsequent allocation is freed
* The memory being freed doesn't start at the beginning of an
allocation.

The bitmap is used because it allows to perform lockless read/write
access, where this is supported by hw through cmpxchg.
Similarly, it is possible to scan the bitmap for a sufficiently long
sequence of zeros, to identify zones available for allocation.

This patch doubles the space reserved in the bitmap for each allocation,
to track their beginning.

For details, see the documentation inside lib/genalloc.c

The primary effect of this patch is that code using the gen_alloc
library does not need anymore to keep track of the size of the
allocations it makes.

Prior to this patch, it was necessary to keep track of the size of the
allocation, so that it would be possible, later on, to know how much
space should be freed.

Now, users of the api can choose to etiher still specify explicitly the
size, or let the library determine it, by giving a value of 0.

However, even when the value is specified, the library still uses its on
understanding of the space associated with a certain allocation, to
confirm that they are consistent.

This verification also confirms that the patch works correctly.

Eventually, the extra parameter (and the corresponding verification)
could be dropped, in favor of a simplified API.

Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/genalloc.h | 112 +++----
lib/genalloc.c | 742 ++++++++++++++++++++++++++++++++++-------------
2 files changed, 599 insertions(+), 255 deletions(-)

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index 872f930f1b06..ff7229520656 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -32,7 +32,7 @@

#include <linux/types.h>
#include <linux/spinlock_types.h>
-#include <linux/atomic.h>
+#include <linux/slab.h>

struct device;
struct device_node;
@@ -76,7 +76,7 @@ struct gen_pool_chunk {
phys_addr_t phys_addr; /* physical starting address of memory chunk */
unsigned long start_addr; /* start address of memory chunk */
unsigned long end_addr; /* end address of memory chunk (inclusive) */
- unsigned long bits[0]; /* bitmap for allocating memory chunk */
+ unsigned long entries[0]; /* bitmap for allocating memory chunk */
};

/*
@@ -93,74 +93,82 @@ struct genpool_data_fixed {
unsigned long offset; /* The offset of the specific region */
};

-extern struct gen_pool *gen_pool_create(int, int);
-extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long);
-extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t,
- size_t, int);
-/**
- * gen_pool_add - add a new chunk of special memory to the pool
- * @pool: pool to add new memory chunk to
- * @addr: starting address of memory chunk to add to pool
- * @size: size in bytes of the memory chunk to add to pool
- * @nid: node id of the node the chunk structure and bitmap should be
- * allocated on, or -1
- *
- * Add a new chunk of special memory to the specified pool.
- *
- * Returns 0 on success or a -ve errno on failure.
- */
+struct gen_pool *gen_pool_create(int min_alloc_order, int nid);
+
+int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
+ phys_addr_t phys, size_t size, int nid);
+
+
static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr,
size_t size, int nid)
{
return gen_pool_add_virt(pool, addr, -1, size, nid);
}
-extern void gen_pool_destroy(struct gen_pool *);
-extern unsigned long gen_pool_alloc(struct gen_pool *, size_t);
-extern unsigned long gen_pool_alloc_algo(struct gen_pool *, size_t,
- genpool_algo_t algo, void *data);
-extern void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size,
- dma_addr_t *dma);
-extern void gen_pool_free(struct gen_pool *, unsigned long, size_t);
-extern void gen_pool_for_each_chunk(struct gen_pool *,
- void (*)(struct gen_pool *, struct gen_pool_chunk *, void *), void *);
-extern size_t gen_pool_avail(struct gen_pool *);
-extern size_t gen_pool_size(struct gen_pool *);

-extern void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo,
- void *data);
+phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr);

-extern unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool);
+void gen_pool_destroy(struct gen_pool *pool);

-extern unsigned long gen_pool_fixed_alloc(unsigned long *map,
- unsigned long size, unsigned long start, unsigned int nr,
- void *data, struct gen_pool *pool);
+unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size);

-extern unsigned long gen_pool_first_fit_align(unsigned long *map,
- unsigned long size, unsigned long start, unsigned int nr,
- void *data, struct gen_pool *pool);
+unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
+ genpool_algo_t algo, void *data);

+void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);

-extern unsigned long gen_pool_first_fit_order_align(unsigned long *map,
- unsigned long size, unsigned long start, unsigned int nr,
- void *data, struct gen_pool *pool);

-extern unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool);
+void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);


-extern struct gen_pool *devm_gen_pool_create(struct device *dev,
- int min_alloc_order, int nid, const char *name);
-extern struct gen_pool *gen_pool_get(struct device *dev, const char *name);
+void gen_pool_for_each_chunk(struct gen_pool *pool,
+ void (*func)(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk,
+ void *data),
+ void *data);

bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
- size_t size);
+ size_t size);
+
+size_t gen_pool_avail(struct gen_pool *pool);
+
+size_t gen_pool_size(struct gen_pool *pool);
+
+void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo, void *data);
+
+unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool);
+
+
+unsigned long gen_pool_first_fit_align(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr, void *data,
+ struct gen_pool *pool);
+
+unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool);
+
+
+unsigned long gen_pool_first_fit_order_align(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr, void *data,
+ struct gen_pool *pool);
+
+unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool);
+
+struct gen_pool *gen_pool_get(struct device *dev, const char *name);
+
+struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
+ int nid, const char *name);

#ifdef CONFIG_OF
-extern struct gen_pool *of_gen_pool_get(struct device_node *np,
- const char *propname, int index);
+struct gen_pool *of_gen_pool_get(struct device_node *np,
+ const char *propname, int index);
#else
static inline struct gen_pool *of_gen_pool_get(struct device_node *np,
const char *propname, int index)
diff --git a/lib/genalloc.c b/lib/genalloc.c
index ca06adc4f445..b5f5e1f9b6cf 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -1,3 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
/*
* Basic general purpose allocator for managing special purpose
* memory, for example, memory that is not managed by the regular
@@ -24,8 +25,72 @@
*
* Copyright 2005 (C) Jes Sorensen <[email protected]>
*
- * This source code is licensed under the GNU General Public License,
- * Version 2. See the file COPYING for more details.
+ *
+ * Encoding of the bitmap tracking the allocations
+ * -----------------------------------------------
+ *
+ * The bitmap is composed of units of allocations.
+ *
+ * Each unit of allocation is represented using 2 consecutive bits.
+ *
+ * This makes it possible to encode, for each unit of allocation,
+ * information about:
+ * - allocation status (busy/free)
+ * - beginning of a sequennce of allocation units (first / successive)
+ *
+ *
+ * Dictionary of allocation units (msb to the left, lsb to the right):
+ *
+ * 11: first allocation unit in the allocation
+ * 10: any subsequent allocation unit (if any) in the allocation
+ * 00: available allocation unit
+ * 01: invalid
+ *
+ * Example, using the same notation as above - MSb.......LSb:
+ *
+ * ...000010111100000010101011 <-- Read in this direction.
+ * \__|\__|\|\____|\______|
+ * | | | | \___ 4 used allocation units
+ * | | | \___________ 3 empty allocation units
+ * | | \_________________ 1 used allocation unit
+ * | \___________________ 2 used allocation units
+ * \_______________________ 2 empty allocation units
+ *
+ * The encoding allows for lockless operations, such as:
+ * - search for a sufficiently large range of allocation units
+ * - reservation of a selected range of allocation units
+ * - release of a specific allocation
+ *
+ * The alignment at which to perform the search for sequence of empty
+ * allocation units (marked as zeros in the bitmap) is 2^1.
+ *
+ * This means that an allocation can start only at even places
+ * (bit 0, bit 2, etc.) in the bitmap.
+ *
+ * Therefore, the number of zeroes to look for must be twice the number
+ * of desired allocation units.
+ *
+ * When it's time to free the memory associated to an allocation request,
+ * it's a matter of checking if the corresponding allocation unit is
+ * really the beginning of an allocation (both bits are set to 1).
+ *
+ * Looking for the ending can also be performed locklessly.
+ * It's sufficient to identify the first mapped allocation unit
+ * that is represented either as free (00) or busy (11).
+ * Even if the allocation status should change in the meanwhile, it
+ * doesn't matter, since it can only transition between free (00) and
+ * first-allocated (11).
+ *
+ * The parameter indicating to the *_free() function the size of the
+ * space that should be freed can be either set to 0, for automated
+ * assessment, or it can be specified explicitly.
+ *
+ * In case it is specified explicitly, the value is verified agaisnt what
+ * the library is tracking internally.
+ *
+ * If ever needed, the bitmap could be extended, assigning larger amounts
+ * of bits to each allocation unit (the increase must follow powers of 2),
+ * to track other properties of the allocations.
*/

#include <linux/slab.h>
@@ -35,119 +100,261 @@
#include <linux/interrupt.h>
#include <linux/genalloc.h>
#include <linux/of_device.h>
+#include <linux/bug.h>
+
+#define ENTRY_ORDER 1UL
+#define ENTRY_MASK ((1UL << ((ENTRY_ORDER) + 1UL)) - 1UL)
+#define ENTRY_HEAD ENTRY_MASK
+#define ENTRY_UNUSED 0UL
+#define BITS_PER_ENTRY (1U << ENTRY_ORDER)
+#define BITS_DIV_ENTRIES(x) ((x) >> ENTRY_ORDER)
+#define ENTRIES_TO_BITS(x) ((x) << ENTRY_ORDER)
+#define BITS_DIV_LONGS(x) ((x) / BITS_PER_LONG)
+#define ENTRIES_DIV_LONGS(x) (BITS_DIV_LONGS(ENTRIES_TO_BITS(x)))
+
+#define ENTRIES_PER_LONG BITS_DIV_ENTRIES(BITS_PER_LONG)
+
+/* Binary pattern of 1010...1010 that spans one unsigned long. */
+#define MASK (~0UL / 3 * 2)

+/**
+ * get_bitmap_entry() - extracts the specified entry from the bitmap
+ * @map: pointer to a bitmap
+ * @entry_index: the index of the desired entry in the bitmap
+ *
+ * Return: The requested bitmap entry.
+ */
+static inline unsigned long get_bitmap_entry(unsigned long *map,
+ int entry_index)
+{
+ return (map[ENTRIES_DIV_LONGS(entry_index)] >>
+ ENTRIES_TO_BITS(entry_index % ENTRIES_PER_LONG)) &
+ ENTRY_MASK;
+}
+
+
+/**
+ * mem_to_units() - convert references to memory into orders of allocation
+ * @size: amount in bytes
+ * @order: power of 2 represented by each entry in the bitmap
+ *
+ * Return: the number of units representing the size.
+ */
+static inline unsigned long mem_to_units(unsigned long size,
+ unsigned long order)
+{
+ return (size + (1UL << order) - 1) >> order;
+}
+
+/**
+ * chunk_size() - dimension of a chunk of memory, in bytes
+ * @chunk: pointer to the struct describing the chunk
+ *
+ * Return: The size of the chunk, in bytes.
+ */
static inline size_t chunk_size(const struct gen_pool_chunk *chunk)
{
return chunk->end_addr - chunk->start_addr + 1;
}

-static int set_bits_ll(unsigned long *addr, unsigned long mask_to_set)
+
+/**
+ * set_bits_ll() - based on value and mask, sets bits at address
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to store
+ *
+ * Return:
+ * * 0 - success
+ * * -EBUSY - otherwise
+ */
+static int set_bits_ll(unsigned long *addr,
+ unsigned long mask, unsigned long value)
{
- unsigned long val, nval;
+ unsigned long nval;
+ unsigned long present;
+ unsigned long target;

nval = *addr;
do {
- val = nval;
- if (val & mask_to_set)
+ present = nval;
+ if (present & mask)
return -EBUSY;
+ target = present | value;
cpu_relax();
- } while ((nval = cmpxchg(addr, val, val | mask_to_set)) != val);
-
+ } while ((nval = cmpxchg(addr, present, target)) != target);
return 0;
}

-static int clear_bits_ll(unsigned long *addr, unsigned long mask_to_clear)
+
+/**
+ * clear_bits_ll() - based on value and mask, clears bits at address
+ * @addr: where to write
+ * @mask: filter to apply for the bits to alter
+ * @value: actual configuration of bits to clear
+ *
+ * Return:
+ * * 0 - success
+ * * -EBUSY - otherwise
+ */
+static int clear_bits_ll(unsigned long *addr,
+ unsigned long mask, unsigned long value)
{
- unsigned long val, nval;
+ unsigned long nval;
+ unsigned long present;
+ unsigned long target;

nval = *addr;
+ present = nval;
+ if (unlikely((present & mask) ^ value))
+ return -EBUSY;
do {
- val = nval;
- if ((val & mask_to_clear) != mask_to_clear)
+ present = nval;
+ if (unlikely((present & mask) ^ value))
return -EBUSY;
+ target = present & ~mask;
cpu_relax();
- } while ((nval = cmpxchg(addr, val, val & ~mask_to_clear)) != val);
-
+ } while ((nval = cmpxchg(addr, present, target)) != target);
return 0;
}

-/*
- * bitmap_set_ll - set the specified number of bits at the specified position
+
+/**
+ * get_length() - length of the allocation beginning at start_entry index
* @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start_entry: the index of the first entry in the bitmap
+ * @chunk_entries: number of entries in the chunk
*
- * Set @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users set the same bit, one user will return remain bits, otherwise
- * return 0.
+ * Return:
+ * * length of an allocation - success
+ * * 0 - invalid parameters or bitmap
*/
-static int bitmap_set_ll(unsigned long *map, int start, int nr)
-{
- unsigned long *p = map + BIT_WORD(start);
- const int size = start + nr;
- int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start);
-
- while (nr - bits_to_set >= 0) {
- if (set_bits_ll(p, mask_to_set))
- return nr;
- nr -= bits_to_set;
- bits_to_set = BITS_PER_LONG;
- mask_to_set = ~0UL;
- p++;
- }
- if (nr) {
- mask_to_set &= BITMAP_LAST_WORD_MASK(size);
- if (set_bits_ll(p, mask_to_set))
- return nr;
- }
+static unsigned int get_length(unsigned long *map,
+ unsigned int start_entry,
+ unsigned int chunk_entries)
+{
+ int i;
+ unsigned long bitmap_entry;

- return 0;
+
+ if (unlikely(get_bitmap_entry(map, start_entry) != ENTRY_HEAD))
+ return 0;
+ for (i = start_entry + 1; i < chunk_entries; i++) {
+ bitmap_entry = get_bitmap_entry(map, i);
+ if (bitmap_entry == ENTRY_HEAD ||
+ bitmap_entry == ENTRY_UNUSED)
+ break;
+ }
+ return i - start_entry;
}

+
/*
- * bitmap_clear_ll - clear the specified number of bits at the specified position
+ * alter_bitmap_ll() - set/clear the entries associated with an allocation
+ * @alteration: indicates if the bits selected should be set or cleared
* @map: pointer to a bitmap
- * @start: a bit position in @map
- * @nr: number of bits to set
+ * @start: the index of the first entry in the bitmap
+ * @nentries: number of entries to alter
*
- * Clear @nr bits start from @start in @map lock-lessly. Several users
- * can set/clear the same bitmap simultaneously without lock. If two
- * users clear the same bit, one user will return remain bits,
- * otherwise return 0.
+ * The modification happens lock-lessly.
+ * Several users can write to the same map simultaneously, without lock.
+ * In case of mid-air conflict, when 2 or more writers try to alter the
+ * same word in the bitmap, only one will succeed and continue, the others
+ * will fail and receive as return value the amount of entries that were
+ * not written. Each failed writer is responsible to revert the changes
+ * it did to the bitmap.
+ * The lockless conflict resolution is implemented through cmpxchg.
+ * Success or failure is purely based on first come first served basis.
+ * The first writer that manages to gain write access to the target word
+ * of the bitmap wins. Whatever can affect the order and priority of execution
+ * of the writers can and will affect the result of the race.
+ *
+ * Return:
+ * * 0 - success
+ * * remaining entries - failure
*/
-static int bitmap_clear_ll(unsigned long *map, int start, int nr)
-{
- unsigned long *p = map + BIT_WORD(start);
- const int size = start + nr;
- int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG);
- unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start);
-
- while (nr - bits_to_clear >= 0) {
- if (clear_bits_ll(p, mask_to_clear))
- return nr;
- nr -= bits_to_clear;
- bits_to_clear = BITS_PER_LONG;
- mask_to_clear = ~0UL;
- p++;
- }
- if (nr) {
- mask_to_clear &= BITMAP_LAST_WORD_MASK(size);
- if (clear_bits_ll(p, mask_to_clear))
- return nr;
+static unsigned int alter_bitmap_ll(int (*action)(unsigned long *addr,
+ unsigned long mask,
+ unsigned long value),
+ unsigned long *map,
+ unsigned int start_entry,
+ unsigned int nentries)
+{
+ unsigned long start_bit;
+ unsigned long end_bit;
+ unsigned long mask;
+ unsigned long value;
+ unsigned int nbits;
+ unsigned int bits_to_write;
+ unsigned int index;
+
+ /*
+ * Prepare for writing the initial part of the allocation, from
+ * starting entry, to the end of the UL bitmap element which
+ * contains it. It might be larger than the actual allocation.
+ */
+ start_bit = ENTRIES_TO_BITS(start_entry);
+ end_bit = ENTRIES_TO_BITS(start_entry + nentries);
+ nbits = ENTRIES_TO_BITS(nentries);
+ bits_to_write = BITS_PER_LONG - start_bit % BITS_PER_LONG;
+ mask = BITMAP_FIRST_WORD_MASK(start_bit);
+ /* Mark the beginning of the allocation. */
+ value = MASK | (1UL << (start_bit % BITS_PER_LONG));
+ index = BITS_DIV_LONGS(start_bit);
+
+ /*
+ * Writes entries to the bitmap, as long as the reminder is
+ * positive or zero.
+ * Might be skipped if the entries to write do not reach the end
+ * of a bitmap UL unit.
+ */
+ while (nbits >= bits_to_write) {
+ if (action(map + index, mask, value & mask))
+ return BITS_DIV_ENTRIES(nbits);
+ nbits -= bits_to_write;
+ bits_to_write = BITS_PER_LONG;
+ mask = ~0UL;
+ value = MASK;
+ index++;
}

+ /* Takes care of the ending part of the entries to mark. */
+ if (nbits > 0) {
+ mask ^= BITMAP_FIRST_WORD_MASK((end_bit) % BITS_PER_LONG);
+ bits_to_write = nbits;
+ if (action(map + index, mask, value & mask))
+ return BITS_DIV_ENTRIES(nbits);
+ }
return 0;
}

+static inline unsigned int set_bitmap_ll(unsigned long *map,
+ unsigned int start_entry,
+ unsigned int nentries)
+{
+ return alter_bitmap_ll(set_bits_ll, map, start_entry, nentries);
+}
+
+static inline unsigned int clear_bitmap_ll(unsigned long *map,
+ unsigned int start_entry,
+ unsigned int nentries)
+{
+ return alter_bitmap_ll(clear_bits_ll, map, start_entry, nentries);
+}
+
/**
- * gen_pool_create - create a new special memory pool
- * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
- * @nid: node id of the node the pool structure should be allocated on, or -1
+ * gen_pool_create() - create a new special memory pool
+ * @min_alloc_order: log base 2 of number of bytes each bitmap entry
+ * represents
+ * @nid: node id of the node the pool structure should be allocated on,
+ * or -1
*
- * Create a new special memory pool that can be used to manage special purpose
- * memory not managed by the regular kmalloc/kfree interface.
+ * Create a new special memory pool that can be used to manage special
+ * purpose memory not managed by the regular kmalloc/kfree interface.
+ *
+ * Return:
+ * * pointer to the pool - success
+ * * NULL - otherwise
*/
struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
{
@@ -167,7 +374,7 @@ struct gen_pool *gen_pool_create(int min_alloc_order, int nid)
EXPORT_SYMBOL(gen_pool_create);

/**
- * gen_pool_add_virt - add a new chunk of special memory to the pool
+ * gen_pool_add_virt() - add a new chunk of special memory to the pool
* @pool: pool to add new memory chunk to
* @virt: virtual starting address of memory chunk to add to pool
* @phys: physical starting address of memory chunk to add to pool
@@ -177,16 +384,20 @@ EXPORT_SYMBOL(gen_pool_create);
*
* Add a new chunk of special memory to the specified pool.
*
- * Returns 0 on success or a -ve errno on failure.
+ * Return:
+ * * 0 - success
+ * * -ve errno - failure
*/
-int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys,
- size_t size, int nid)
+int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt,
+ phys_addr_t phys, size_t size, int nid)
{
struct gen_pool_chunk *chunk;
- int nbits = size >> pool->min_alloc_order;
- int nbytes = sizeof(struct gen_pool_chunk) +
- BITS_TO_LONGS(nbits) * sizeof(long);
+ unsigned int nentries;
+ unsigned int nbytes;

+ nentries = size >> pool->min_alloc_order;
+ nbytes = sizeof(struct gen_pool_chunk) +
+ ENTRIES_DIV_LONGS(nentries) * sizeof(long);
chunk = kzalloc_node(nbytes, GFP_KERNEL, nid);
if (unlikely(chunk == NULL))
return -ENOMEM;
@@ -205,11 +416,13 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy
EXPORT_SYMBOL(gen_pool_add_virt);

/**
- * gen_pool_virt_to_phys - return the physical address of memory
+ * gen_pool_virt_to_phys() - return the physical address of memory
* @pool: pool to allocate from
* @addr: starting address of memory
*
- * Returns the physical address on success, or -1 on error.
+ * Return:
+ * * the physical address - success
+ * * \-1 - error
*/
phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
{
@@ -230,7 +443,7 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
EXPORT_SYMBOL(gen_pool_virt_to_phys);

/**
- * gen_pool_destroy - destroy a special memory pool
+ * gen_pool_destroy() - destroy a special memory pool
* @pool: pool to destroy
*
* Destroy the specified special memory pool. Verifies that there are no
@@ -240,26 +453,33 @@ void gen_pool_destroy(struct gen_pool *pool)
{
struct list_head *_chunk, *_next_chunk;
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int bit, end_bit;
+ unsigned int order = pool->min_alloc_order;
+ unsigned long bit, end_bit;
+ bool empty = true;

list_for_each_safe(_chunk, _next_chunk, &pool->chunks) {
chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
list_del(&chunk->next_chunk);

end_bit = chunk_size(chunk) >> order;
- bit = find_next_bit(chunk->bits, end_bit, 0);
- BUG_ON(bit < end_bit);
-
+ bit = find_next_bit(chunk->entries, end_bit, 0);
+ if (WARN(bit < end_bit,
+ "Attempt to destroy non-empty pool %s",
+ pool->name)) {
+ empty = false;
+ continue;
+ }
kfree(chunk);
}
- kfree_const(pool->name);
- kfree(pool);
+ if (likely(empty)) {
+ kfree_const(pool->name);
+ kfree(pool);
+ }
}
EXPORT_SYMBOL(gen_pool_destroy);

/**
- * gen_pool_alloc - allocate special memory from the pool
+ * gen_pool_alloc() - get memory from the pool
* @pool: pool to allocate from
* @size: number of bytes to allocate from the pool
*
@@ -267,6 +487,10 @@ EXPORT_SYMBOL(gen_pool_destroy);
* Uses the pool allocation function (with first-fit algorithm by default).
* Can not be used in NMI handler on architectures without
* NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated - success
+ * * NULL - error
*/
unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
{
@@ -275,24 +499,31 @@ unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size)
EXPORT_SYMBOL(gen_pool_alloc);

/**
- * gen_pool_alloc_algo - allocate special memory from the pool
+ * gen_pool_alloc_algo() - get memory from pool with specified algorythm
* @pool: pool to allocate from
* @size: number of bytes to allocate from the pool
* @algo: algorithm passed from caller
* @data: data passed to algorithm
*
* Allocate the requested number of bytes from the specified pool.
- * Uses the pool allocation function (with first-fit algorithm by default).
+ * Uses the provided @algo function to find room for the allocation.
* Can not be used in NMI handler on architectures without
* NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated - success
+ * * NULL - error
*/
unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
genpool_algo_t algo, void *data)
{
struct gen_pool_chunk *chunk;
unsigned long addr = 0;
- int order = pool->min_alloc_order;
- int nbits, start_bit, end_bit, remain;
+ unsigned int start_entry;
+ unsigned int end_entry;
+ unsigned int nentries;
+ unsigned int remain;
+ unsigned int order = pool->min_alloc_order;

#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
BUG_ON(in_nmi());
@@ -301,29 +532,30 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
if (size == 0)
return 0;

- nbits = (size + (1UL << order) - 1) >> order;
+ nentries = mem_to_units(size, order);
rcu_read_lock();
list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (size > atomic_long_read(&chunk->avail))
continue;

- start_bit = 0;
- end_bit = chunk_size(chunk) >> order;
+ start_entry = 0;
+ end_entry = chunk_size(chunk) >> order;
retry:
- start_bit = algo(chunk->bits, end_bit, start_bit,
- nbits, data, pool);
- if (start_bit >= end_bit)
+ start_entry = algo(chunk->entries, end_entry, start_entry,
+ nentries, data, pool);
+ if (start_entry >= end_entry)
continue;
- remain = bitmap_set_ll(chunk->bits, start_bit, nbits);
+ remain = set_bitmap_ll(chunk->entries, start_entry, nentries);
if (remain) {
- remain = bitmap_clear_ll(chunk->bits, start_bit,
- nbits - remain);
- BUG_ON(remain);
+ remain = clear_bitmap_ll(chunk->entries,
+ start_entry,
+ nentries - remain);
goto retry;
}

- addr = chunk->start_addr + ((unsigned long)start_bit << order);
- size = nbits << order;
+ addr = chunk->start_addr +
+ ((unsigned long)start_entry << order);
+ size = nentries << order;
atomic_long_sub(size, &chunk->avail);
break;
}
@@ -333,7 +565,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size,
EXPORT_SYMBOL(gen_pool_alloc_algo);

/**
- * gen_pool_dma_alloc - allocate special memory from the pool for DMA usage
+ * gen_pool_dma_alloc() - allocate special memory from the pool for DMA usage
* @pool: pool to allocate from
* @size: number of bytes to allocate from the pool
* @dma: dma-view physical address return value. Use NULL if unneeded.
@@ -342,14 +574,15 @@ EXPORT_SYMBOL(gen_pool_alloc_algo);
* Uses the pool allocation function (with first-fit algorithm by default).
* Can not be used in NMI handler on architectures without
* NMI-safe cmpxchg implementation.
+ *
+ * Return:
+ * * address of the memory allocated - success
+ * * NULL - error
*/
void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
{
unsigned long vaddr;

- if (!pool)
- return NULL;
-
vaddr = gen_pool_alloc(pool, size);
if (!vaddr)
return NULL;
@@ -361,11 +594,46 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma)
}
EXPORT_SYMBOL(gen_pool_dma_alloc);

+static void chunk_free_allocation(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk,
+ unsigned long addr, size_t size)
+{
+ unsigned int length;
+ unsigned int start_entry;
+ unsigned int chunk_entries;
+ unsigned int unprocessed;
+ unsigned int order = pool->min_alloc_order;
+
+ if (WARN(addr + size - 1 > chunk->end_addr,
+ "Trying to free unallocated memory from pool %s",
+ pool->name))
+ return;
+
+ chunk_entries = (chunk->end_addr - chunk->start_addr + 1) >> order;
+ start_entry = (addr - chunk->start_addr) >> order;
+ length = get_length(chunk->entries, start_entry, chunk_entries);
+ if (WARN(length == 0,
+ "Corrupted pool %s", pool->name))
+ return;
+
+ if (WARN(size && (length != mem_to_units(size, order)),
+ "Size provided and size measured in pool %s differ",
+ pool->name))
+ return;
+
+ unprocessed = clear_bitmap_ll(chunk->entries, start_entry, length);
+ if (WARN(unprocessed, "bitmap collision freeing memory in pool %s",
+ pool->name))
+ return;
+
+ atomic_long_add(length << order, &chunk->avail);
+}
+
/**
- * gen_pool_free - free allocated special memory back to the pool
+ * gen_pool_free() - free allocated special memory back to the pool
* @pool: pool to free to
* @addr: starting address of memory to free back to pool
- * @size: size in bytes of memory to free
+ * @size: size in bytes of memory to free or 0, for auto-detection
*
* Free previously allocated special memory back to the specified
* pool. Can not be used in NMI handler on architectures without
@@ -374,34 +642,27 @@ EXPORT_SYMBOL(gen_pool_dma_alloc);
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
{
struct gen_pool_chunk *chunk;
- int order = pool->min_alloc_order;
- int start_bit, nbits, remain;

#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
BUG_ON(in_nmi());
#endif

- nbits = (size + (1UL << order) - 1) >> order;
rcu_read_lock();
list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
if (addr >= chunk->start_addr && addr <= chunk->end_addr) {
- BUG_ON(addr + size - 1 > chunk->end_addr);
- start_bit = (addr - chunk->start_addr) >> order;
- remain = bitmap_clear_ll(chunk->bits, start_bit, nbits);
- BUG_ON(remain);
- size = nbits << order;
- atomic_long_add(size, &chunk->avail);
+ chunk_free_allocation(pool, chunk, addr, size);
rcu_read_unlock();
return;
}
}
rcu_read_unlock();
- BUG();
+ WARN(true, "address not found in pool %s", pool->name);
}
EXPORT_SYMBOL(gen_pool_free);

+
/**
- * gen_pool_for_each_chunk - call func for every chunk of generic memory pool
+ * gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
* @pool: the generic memory pool
* @func: func to call
* @data: additional data used by @func
@@ -410,8 +671,8 @@ EXPORT_SYMBOL(gen_pool_free);
* called with rcu_read_lock held.
*/
void gen_pool_for_each_chunk(struct gen_pool *pool,
- void (*func)(struct gen_pool *pool, struct gen_pool_chunk *chunk, void *data),
- void *data)
+ void (*func)(struct gen_pool *pool, struct gen_pool_chunk *chunk,
+ void *data), void *data)
{
struct gen_pool_chunk *chunk;

@@ -423,16 +684,19 @@ void gen_pool_for_each_chunk(struct gen_pool *pool,
EXPORT_SYMBOL(gen_pool_for_each_chunk);

/**
- * addr_in_gen_pool - checks if an address falls within the range of a pool
+ * addr_in_gen_pool() - checks if an address falls within the range of a pool
* @pool: the generic memory pool
* @start: start address
* @size: size of the region
*
- * Check if the range of addresses falls within the specified pool. Returns
- * true if the entire range is contained in the pool and false otherwise.
+ * Check if the range of addresses falls within the specified pool.
+ *
+ * Return:
+ * * true - the entire range is contained in the pool
+ * * false - otherwise
*/
bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
- size_t size)
+ size_t size)
{
bool found = false;
unsigned long end = start + size - 1;
@@ -452,10 +716,10 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start,
}

/**
- * gen_pool_avail - get available free space of the pool
+ * gen_pool_avail() - get available free space of the pool
* @pool: pool to get available free space
*
- * Return available free space of the specified pool.
+ * Return: available free space of the specified pool.
*/
size_t gen_pool_avail(struct gen_pool *pool)
{
@@ -471,10 +735,10 @@ size_t gen_pool_avail(struct gen_pool *pool)
EXPORT_SYMBOL_GPL(gen_pool_avail);

/**
- * gen_pool_size - get size in bytes of memory managed by the pool
+ * gen_pool_size() - get size in bytes of memory managed by the pool
* @pool: pool to get size
*
- * Return size in bytes of memory managed by the pool.
+ * Return: size in bytes of memory managed by the pool.
*/
size_t gen_pool_size(struct gen_pool *pool)
{
@@ -490,7 +754,7 @@ size_t gen_pool_size(struct gen_pool *pool)
EXPORT_SYMBOL_GPL(gen_pool_size);

/**
- * gen_pool_set_algo - set the allocation algorithm
+ * gen_pool_set_algo() - set the allocation algorithm
* @pool: pool to change allocation algorithm
* @algo: custom algorithm function
* @data: additional data used by @algo
@@ -514,137 +778,200 @@ void gen_pool_set_algo(struct gen_pool *pool, genpool_algo_t algo, void *data)
EXPORT_SYMBOL(gen_pool_set_algo);

/**
- * gen_pool_first_fit - find the first available region
+ * gen_pool_first_fit() - find the first available region
* of memory matching the size requirement (no alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
+ *
+ * Return:
+ * * index of the memory allocated - sufficient space available
+ * * end of the range - insufficient space
*/
unsigned long gen_pool_first_fit(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool)
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool)
{
- return bitmap_find_next_zero_area(map, size, start, nr, 0);
+ unsigned long align_mask;
+ unsigned long bit_index;
+
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit);

/**
- * gen_pool_first_fit_align - find the first available region
+ * gen_pool_first_fit_align() - find the first available region
* of memory matching the size requirement (alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: data for alignment
* @pool: pool to get order from
+ *
+ * Return:
+ * * index of the memory allocated - sufficient space available
+ * * end of the range - insufficient space
*/
-unsigned long gen_pool_first_fit_align(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool)
+unsigned long gen_pool_first_fit_align(unsigned long *map,
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr, void *data,
+ struct gen_pool *pool)
{
struct genpool_data_align *alignment;
unsigned long align_mask;
- int order;
+ unsigned long bit_index;
+ unsigned int order;

alignment = data;
order = pool->min_alloc_order;
- align_mask = ((alignment->align + (1UL << order) - 1) >> order) - 1;
- return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+ align_mask = roundup_pow_of_two(
+ ENTRIES_TO_BITS(mem_to_units(alignment->align,
+ order))) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit_align);

/**
- * gen_pool_fixed_alloc - reserve a specific region
+ * gen_pool_fixed_alloc() - reserve a specific region
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: data for alignment
* @pool: pool to get order from
+ *
+ * Return:
+ * * index of the memory allocated - sufficient space available
+ * * end of the range - insufficient space
*/
unsigned long gen_pool_fixed_alloc(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool)
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool)
{
struct genpool_data_fixed *fixed_data;
- int order;
- unsigned long offset_bit;
- unsigned long start_bit;
+ unsigned int order;
+ unsigned long offset;
+ unsigned long align_mask;
+ unsigned long bit_index;

fixed_data = data;
order = pool->min_alloc_order;
- offset_bit = fixed_data->offset >> order;
if (WARN_ON(fixed_data->offset & ((1UL << order) - 1)))
return size;
+ offset = fixed_data->offset >> order;
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start + offset),
+ ENTRIES_TO_BITS(nr), align_mask);
+ if (bit_index != ENTRIES_TO_BITS(offset))
+ return size;

- start_bit = bitmap_find_next_zero_area(map, size,
- start + offset_bit, nr, 0);
- if (start_bit != offset_bit)
- start_bit = size;
- return start_bit;
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_fixed_alloc);

/**
- * gen_pool_first_fit_order_align - find the first available region
+ * gen_pool_first_fit_order_align() - find the first available region
* of memory matching the size requirement. The region will be aligned
* to the order of the size specified.
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
+ *
+ * Return:
+ * * index of the memory allocated - sufficient space available
+ * * end of the range - insufficient space
*/
unsigned long gen_pool_first_fit_order_align(unsigned long *map,
- unsigned long size, unsigned long start,
- unsigned int nr, void *data, struct gen_pool *pool)
+ unsigned long size,
+ unsigned long start,
+ unsigned int nr, void *data,
+ struct gen_pool *pool)
{
- unsigned long align_mask = roundup_pow_of_two(nr) - 1;
-
- return bitmap_find_next_zero_area(map, size, start, nr, align_mask);
+ unsigned long align_mask;
+ unsigned long bit_index;
+
+ align_mask = roundup_pow_of_two(ENTRIES_TO_BITS(nr)) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ return BITS_DIV_ENTRIES(bit_index);
}
EXPORT_SYMBOL(gen_pool_first_fit_order_align);

/**
- * gen_pool_best_fit - find the best fitting region of memory
- * macthing the size requirement (no alignment constraint)
+ * gen_pool_best_fit() - find the best fitting region of memory
+ * matching the size requirement (no alignment constraint)
* @map: The address to base the search on
- * @size: The bitmap size in bits
- * @start: The bitnumber to start searching at
- * @nr: The number of zeroed bits we're looking for
+ * @size: The number of allocation units in the bitmap
+ * @start: The allocation unit to start searching at
+ * @nr: The number of allocation units we're looking for
* @data: additional data - unused
* @pool: pool to find the fit region memory from
*
* Iterate over the bitmap to find the smallest free region
* which we can allocate the memory.
+ *
+ * Return:
+ * * index of the memory allocated - sufficient space available
+ * * end of the range - insufficient space
*/
unsigned long gen_pool_best_fit(unsigned long *map, unsigned long size,
- unsigned long start, unsigned int nr, void *data,
- struct gen_pool *pool)
+ unsigned long start, unsigned int nr,
+ void *data, struct gen_pool *pool)
{
- unsigned long start_bit = size;
+ unsigned long start_bit = ENTRIES_TO_BITS(size);
unsigned long len = size + 1;
unsigned long index;
+ unsigned long align_mask;
+ unsigned long bit_index;

- index = bitmap_find_next_zero_area(map, size, start, nr, 0);
+ align_mask = roundup_pow_of_two(BITS_PER_ENTRY) - 1;
+ bit_index = bitmap_find_next_zero_area(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(start),
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ index = BITS_DIV_ENTRIES(bit_index);

while (index < size) {
- int next_bit = find_next_bit(map, size, index + nr);
- if ((next_bit - index) < len) {
- len = next_bit - index;
- start_bit = index;
+ unsigned long next_bit;
+
+ next_bit = find_next_bit(map, ENTRIES_TO_BITS(size),
+ ENTRIES_TO_BITS(index + nr));
+ if ((BITS_DIV_ENTRIES(next_bit) - index) < len) {
+ len = BITS_DIV_ENTRIES(next_bit) - index;
+ start_bit = ENTRIES_TO_BITS(index);
if (len == nr)
- return start_bit;
+ return BITS_DIV_ENTRIES(start_bit);
}
- index = bitmap_find_next_zero_area(map, size,
- next_bit + 1, nr, 0);
+ bit_index =
+ bitmap_find_next_zero_area(map,
+ ENTRIES_TO_BITS(size),
+ next_bit + 1,
+ ENTRIES_TO_BITS(nr),
+ align_mask);
+ index = BITS_DIV_ENTRIES(bit_index);
}

- return start_bit;
+ return BITS_DIV_ENTRIES(start_bit);
}
EXPORT_SYMBOL(gen_pool_best_fit);

@@ -668,11 +995,14 @@ static int devm_gen_pool_match(struct device *dev, void *res, void *data)
}

/**
- * gen_pool_get - Obtain the gen_pool (if any) for a device
+ * gen_pool_get() - Obtain the gen_pool (if any) for a device
* @dev: device to retrieve the gen_pool from
- * @name: name of a gen_pool or NULL, identifies a particular gen_pool on device
+ * @name: name of a gen_pool or NULL, identifies a particular gen_pool
+ * on device
*
- * Returns the gen_pool for the device if one is present, or NULL.
+ * Return:
+ * * the gen_pool for the device - if it exists
+ * * NULL - no pool exists for the device
*/
struct gen_pool *gen_pool_get(struct device *dev, const char *name)
{
@@ -687,7 +1017,7 @@ struct gen_pool *gen_pool_get(struct device *dev, const char *name)
EXPORT_SYMBOL_GPL(gen_pool_get);

/**
- * devm_gen_pool_create - managed gen_pool_create
+ * devm_gen_pool_create() - managed gen_pool_create
* @dev: device that provides the gen_pool
* @min_alloc_order: log base 2 of number of bytes each bitmap bit represents
* @nid: node selector for allocated gen_pool, %NUMA_NO_NODE for all nodes
@@ -696,6 +1026,10 @@ EXPORT_SYMBOL_GPL(gen_pool_get);
* Create a new special memory pool that can be used to manage special purpose
* memory not managed by the regular kmalloc/kfree interface. The pool will be
* automatically destroyed by the device management code.
+ *
+ * Return:
+ * * address of the pool - success
+ * * NULL - error
*/
struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order,
int nid, const char *name)
@@ -738,17 +1072,19 @@ EXPORT_SYMBOL(devm_gen_pool_create);

#ifdef CONFIG_OF
/**
- * of_gen_pool_get - find a pool by phandle property
+ * of_gen_pool_get() - find a pool by phandle property
* @np: device node
* @propname: property name containing phandle(s)
* @index: index into the phandle array
*
- * Returns the pool that contains the chunk starting at the physical
- * address of the device tree node pointed at by the phandle property,
- * or NULL if not found.
+ * Return:
+ * * pool address - it contains the chunk starting at the physical
+ * address of the device tree node pointed at by
+ * the phandle property
+ * * NULL - otherwise
*/
struct gen_pool *of_gen_pool_get(struct device_node *np,
- const char *propname, int index)
+ const char *propname, int index)
{
struct platform_device *pdev;
struct device_node *np_pool, *parent;
--
2.14.1


2018-03-13 21:51:24

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 2/8] Add label to genalloc.rst for cross reference

Put a label at the beginning of the genalloc.rst, to allow other
documents to cross-reference it.

Signed-off-by: Igor Stoppa <[email protected]>
---
Documentation/core-api/genalloc.rst | 2 ++
1 file changed, 2 insertions(+)

diff --git a/Documentation/core-api/genalloc.rst b/Documentation/core-api/genalloc.rst
index 6b38a39fab24..39dba5bb7b05 100644
--- a/Documentation/core-api/genalloc.rst
+++ b/Documentation/core-api/genalloc.rst
@@ -1,3 +1,5 @@
+.. _genalloc:
+
The genalloc/genpool subsystem
==============================

--
2.14.1


2018-03-13 21:52:02

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 3/8] genalloc: selftest

Introduce a set of macros for writing concise test cases for genalloc.

The test cases are meant to provide regression testing, when working on
new functionality for genalloc.

Primarily they are meant to confirm that the various allocation strategy
will continue to work as expected.

The execution of the self testing is controlled through a Kconfig option.

The testing takes place in the very early stages of main.c, to ensure
that failures in genalloc are caught before they can cause unexplained
erratic behavior in any of genalloc users.

Therefore, it would not be advisable to implement it as module.

Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/test_genalloc.h | 26 +++
init/main.c | 2 +
lib/Kconfig | 15 ++
lib/Makefile | 1 +
lib/test_genalloc.c | 410 ++++++++++++++++++++++++++++++++++++++++++
5 files changed, 454 insertions(+)
create mode 100644 include/linux/test_genalloc.h
create mode 100644 lib/test_genalloc.c

diff --git a/include/linux/test_genalloc.h b/include/linux/test_genalloc.h
new file mode 100644
index 000000000000..cc45c6c859cf
--- /dev/null
+++ b/include/linux/test_genalloc.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_genalloc.h
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+
+#ifndef __LINUX_TEST_GENALLOC_H
+#define __LINUX_TEST_GENALLOC_H
+
+
+#ifdef CONFIG_TEST_GENERIC_ALLOCATOR
+
+#include <linux/genalloc.h>
+
+void test_genalloc(void);
+
+#else
+
+static inline void test_genalloc(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index 969eaf140ef0..2bf1312fd2fe 100644
--- a/init/main.c
+++ b/init/main.c
@@ -90,6 +90,7 @@
#include <linux/cache.h>
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
+#include <linux/test_genalloc.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -661,6 +662,7 @@ asmlinkage __visible void __init start_kernel(void)
*/
mem_encrypt_init();

+ test_genalloc();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/Kconfig b/lib/Kconfig
index e96089499371..361514324d64 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -287,6 +287,21 @@ config DECOMPRESS_LZ4
config GENERIC_ALLOCATOR
bool

+config TEST_GENERIC_ALLOCATOR
+ bool "genalloc tester"
+ default n
+ select GENERIC_ALLOCATOR
+ help
+ Enable automated testing of the generic allocator.
+ The testing is primarily for the tracking of allocated space.
+
+config TEST_GENERIC_ALLOCATOR_VERBOSE
+ bool "make the genalloc tester more verbose"
+ default n
+ select TEST_GENERIC_ALLOCATOR
+ help
+ More information will be displayed during the self-testing.
+
#
# reed solomon support is select'ed if needed
#
diff --git a/lib/Makefile b/lib/Makefile
index a90d4fcd748f..5b5ee8d8f6d6 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -108,6 +108,7 @@ obj-$(CONFIG_LIBCRC32C) += libcrc32c.o
obj-$(CONFIG_CRC8) += crc8.o
obj-$(CONFIG_XXHASH) += xxhash.o
obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_TEST_GENERIC_ALLOCATOR) += test_genalloc.o

obj-$(CONFIG_842_COMPRESS) += 842/
obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/test_genalloc.c b/lib/test_genalloc.c
new file mode 100644
index 000000000000..12a61c9e7558
--- /dev/null
+++ b/lib/test_genalloc.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_genalloc.c
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/vmalloc.h>
+#include <linux/string.h>
+#include <linux/debugfs.h>
+#include <linux/atomic.h>
+#include <linux/genalloc.h>
+
+#include <linux/test_genalloc.h>
+
+
+/*
+ * In case of failure of any of these tests, memory corruption is almost
+ * guarranteed; allowing the boot to continue means risking to corrupt
+ * also any filesystem/block device accessed write mode.
+ * Therefore, BUG_ON() is used, when testing.
+ */
+
+
+/*
+ * Keep the bitmap small, while including case of cross-ulong mapping.
+ * For simplicity, the test cases use only 1 chunk of memory.
+ */
+#define BITMAP_SIZE_C 16
+#define ALLOC_ORDER 0
+
+#define ULONG_SIZE (sizeof(unsigned long))
+#define BITMAP_SIZE_UL (BITMAP_SIZE_C / ULONG_SIZE)
+#define MIN_ALLOC_SIZE (1 << ALLOC_ORDER)
+#define ENTRIES (BITMAP_SIZE_C * 8)
+#define CHUNK_SIZE (MIN_ALLOC_SIZE * ENTRIES)
+
+#ifndef CONFIG_TEST_GENERIC_ALLOCATOR_VERBOSE
+
+static inline void print_first_chunk_bitmap(struct gen_pool *pool) {}
+
+#else
+
+static void print_first_chunk_bitmap(struct gen_pool *pool)
+{
+ struct gen_pool_chunk *chunk;
+ char bitmap[BITMAP_SIZE_C * 2 + 1];
+ unsigned long i;
+ char *bm = bitmap;
+ char *entry;
+
+ if (unlikely(pool == NULL || pool->chunks.next == NULL))
+ return;
+
+ chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
+ entry = (void *)chunk->entries;
+ for (i = 1; i <= BITMAP_SIZE_C; i++)
+ bm += snprintf(bm, 3, "%02hhx", entry[BITMAP_SIZE_C - i]);
+ *bm = '\0';
+ pr_notice("chunk: %p bitmap: 0x%s\n", chunk, bitmap);
+
+}
+
+#endif
+
+enum test_commands {
+ CMD_ALLOCATOR,
+ CMD_ALLOCATE,
+ CMD_FLUSH,
+ CMD_FREE,
+ CMD_NUMBER,
+ CMD_END = CMD_NUMBER,
+};
+
+struct null_struct {
+ void *null;
+};
+
+struct test_allocator {
+ genpool_algo_t algo;
+ union {
+ struct genpool_data_align align;
+ struct genpool_data_fixed offset;
+ struct null_struct null;
+ } data;
+};
+
+struct test_action {
+ unsigned int location;
+ char pattern[BITMAP_SIZE_C];
+ unsigned int size;
+};
+
+
+struct test_command {
+ enum test_commands command;
+ union {
+ struct test_allocator allocator;
+ struct test_action action;
+ };
+};
+
+
+/*
+ * To pass an array literal as parameter to a macro, it must go through
+ * this one, first.
+ */
+#define ARR(...) __VA_ARGS__
+
+#define SET_DATA(parameter, value) \
+ .parameter = { \
+ .parameter = value, \
+ } \
+
+#define SET_ALLOCATOR(alloc, parameter, value) \
+{ \
+ .command = CMD_ALLOCATOR, \
+ .allocator = { \
+ .algo = (alloc), \
+ .data = { \
+ SET_DATA(parameter, value), \
+ }, \
+ } \
+}
+
+#define ACTION_MEM(act, mem_size, mem_loc, match) \
+{ \
+ .command = act, \
+ .action = { \
+ .size = (mem_size), \
+ .location = (mem_loc), \
+ .pattern = match, \
+ }, \
+}
+
+#define ALLOCATE_MEM(mem_size, mem_loc, match) \
+ ACTION_MEM(CMD_ALLOCATE, mem_size, mem_loc, ARR(match))
+
+#define FREE_MEM(mem_size, mem_loc, match) \
+ ACTION_MEM(CMD_FREE, mem_size, mem_loc, ARR(match))
+
+#define FLUSH_MEM() \
+{ \
+ .command = CMD_FLUSH, \
+}
+
+#define END() \
+{ \
+ .command = CMD_END, \
+}
+
+static inline int compare_bitmaps(const struct gen_pool *pool,
+ const char *reference)
+{
+ struct gen_pool_chunk *chunk;
+ char *bitmap;
+ unsigned int i;
+
+ chunk = container_of(pool->chunks.next, struct gen_pool_chunk,
+ next_chunk);
+ bitmap = (char *)chunk->entries;
+
+ for (i = 0; i < BITMAP_SIZE_C; i++)
+ if (bitmap[i] != reference[i])
+ return -1;
+ return 0;
+}
+
+static void callback_set_allocator(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ gen_pool_set_algo(pool, cmd->allocator.algo,
+ (void *)&cmd->allocator.data);
+}
+
+static void callback_allocate(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ const struct test_action *action = &cmd->action;
+
+ locations[action->location] = gen_pool_alloc(pool, action->size);
+ BUG_ON(!locations[action->location]);
+ print_first_chunk_bitmap(pool);
+ BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void callback_flush(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ unsigned int i;
+
+ for (i = 0; i < ENTRIES; i++)
+ if (locations[i]) {
+ gen_pool_free(pool, locations[i], 0);
+ locations[i] = 0;
+ }
+}
+
+static void callback_free(struct gen_pool *pool,
+ const struct test_command *cmd,
+ unsigned long *locations)
+{
+ const struct test_action *action = &cmd->action;
+
+ gen_pool_free(pool, locations[action->location], 0);
+ locations[action->location] = 0;
+ print_first_chunk_bitmap(pool);
+ BUG_ON(compare_bitmaps(pool, action->pattern));
+}
+
+static void (* const callbacks[CMD_NUMBER])(struct gen_pool *,
+ const struct test_command *,
+ unsigned long *) = {
+ [CMD_ALLOCATOR] = callback_set_allocator,
+ [CMD_ALLOCATE] = callback_allocate,
+ [CMD_FREE] = callback_free,
+ [CMD_FLUSH] = callback_flush,
+};
+
+static const struct test_command test_first_fit[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(3, 0, ARR({0x2b})),
+ ALLOCATE_MEM(2, 1, ARR({0xeb, 0x02})),
+ ALLOCATE_MEM(5, 2, ARR({0xeb, 0xae, 0x0a})),
+ FREE_MEM(2, 1, ARR({0x2b, 0xac, 0x0a})),
+ ALLOCATE_MEM(1, 1, ARR({0xeb, 0xac, 0x0a})),
+ FREE_MEM(0, 2, ARR({0xeb})),
+ FREE_MEM(0, 0, ARR({0xc0})),
+ FREE_MEM(0, 1, ARR({0x00})),
+ END(),
+};
+
+/*
+ * To make the test work for both 32bit and 64bit ulong sizes,
+ * allocate (8 / 2 * 4 - 1) = 15 bytes bytes, then 16, then 2.
+ * The first allocation prepares for the crossing of the 32bit ulong
+ * threshold. The following crosses the 32bit threshold and prepares for
+ * crossing the 64bit thresholds. The last is large enough (2 bytes) to
+ * cross the 64bit threshold.
+ * Then free the allocations in the order: 2nd, 1st, 3rd.
+ */
+static const struct test_command test_ulong_span[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(15, 0, ARR({0xab, 0xaa, 0xaa, 0x2a})),
+ ALLOCATE_MEM(16, 1, ARR({0xab, 0xaa, 0xaa, 0xea,
+ 0xaa, 0xaa, 0xaa, 0x2a})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0xaa, 0xaa, 0xea,
+ 0xaa, 0xaa, 0xaa, 0xea,
+ 0x02})),
+ FREE_MEM(0, 1, ARR({0xab, 0xaa, 0xaa, 0x2a,
+ 0x00, 0x00, 0x00, 0xc0,
+ 0x02})),
+ FREE_MEM(0, 0, ARR({0x00, 0x00, 0x00, 0x00,
+ 0x00, 0x00, 0x00, 0xc0,
+ 0x02})),
+ FREE_MEM(0, 2, ARR({0x00})),
+ END(),
+};
+
+/*
+ * Create progressively smaller allocations A B C D E.
+ * then free B and D.
+ * Then create new allocation that would fit in both of the gaps left by
+ * B and D. Verify that it uses the gap from B.
+ */
+static const struct test_command test_first_fit_gaps[] = {
+ SET_ALLOCATOR(gen_pool_first_fit, null, NULL),
+ ALLOCATE_MEM(10, 0, ARR({0xab, 0xaa, 0x0a})),
+ ALLOCATE_MEM(8, 1, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0x0a})),
+ ALLOCATE_MEM(6, 2, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa})),
+ ALLOCATE_MEM(4, 3, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa, 0xab})),
+ ALLOCATE_MEM(2, 4, ARR({0xab, 0xaa, 0xba, 0xaa,
+ 0xba, 0xaa, 0xab, 0x0b})),
+ FREE_MEM(0, 1, ARR({0xab, 0xaa, 0x0a, 0x00,
+ 0xb0, 0xaa, 0xab, 0x0b})),
+ FREE_MEM(0, 3, ARR({0xab, 0xaa, 0x0a, 0x00,
+ 0xb0, 0xaa, 0x00, 0x0b})),
+ ALLOCATE_MEM(3, 3, ARR({0xab, 0xaa, 0xba, 0x02,
+ 0xb0, 0xaa, 0x00, 0x0b})),
+ FLUSH_MEM(),
+ END(),
+};
+
+/* Test first fit align */
+static const struct test_command test_first_fit_align[] = {
+ SET_ALLOCATOR(gen_pool_first_fit_align, align, 4),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0x02, 0x2b, 0x0b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0x02, 0x2b, 0x0b, 0x03})),
+ FREE_MEM(0, 0, ARR({0x00, 0x00, 0x2b, 0x0b, 0x03})),
+ FREE_MEM(0, 2, ARR({0x00, 0x00, 0x2b, 0x00, 0x03})),
+ ALLOCATE_MEM(2, 0, ARR({0x0b, 0x00, 0x2b, 0x00, 0x03})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* Test fixed alloc */
+static const struct test_command test_fixed_data[] = {
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 1),
+ ALLOCATE_MEM(5, 0, ARR({0xac, 0x0a})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 8),
+ ALLOCATE_MEM(3, 1, ARR({0xac, 0x0a, 0x2b})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 6),
+ ALLOCATE_MEM(2, 2, ARR({0xac, 0xba, 0x2b})),
+ SET_ALLOCATOR(gen_pool_fixed_alloc, offset, 30),
+ ALLOCATE_MEM(40, 3, ARR({0xac, 0xba, 0x2b, 0x00,
+ 0x00, 0x00, 0x00, 0xb0,
+ 0xaa, 0xaa, 0xaa, 0xaa,
+ 0xaa, 0xaa, 0xaa, 0xaa})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* Test first fit order align */
+static const struct test_command test_first_fit_order_align[] = {
+ SET_ALLOCATOR(gen_pool_first_fit_order_align, null, NULL),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0x02, 0x2b})),
+ ALLOCATE_MEM(2, 2, ARR({0xab, 0xb2, 0x2b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0xbe, 0x2b})),
+ ALLOCATE_MEM(1, 4, ARR({0xab, 0xbe, 0xeb})),
+ ALLOCATE_MEM(2, 5, ARR({0xab, 0xbe, 0xeb, 0x0b})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+/* 007 Test best fit */
+static const struct test_command test_best_fit[] = {
+ SET_ALLOCATOR(gen_pool_best_fit, null, NULL),
+ ALLOCATE_MEM(5, 0, ARR({0xab, 0x02})),
+ ALLOCATE_MEM(3, 1, ARR({0xab, 0xae})),
+ ALLOCATE_MEM(3, 2, ARR({0xab, 0xae, 0x2b})),
+ ALLOCATE_MEM(1, 3, ARR({0xab, 0xae, 0xeb})),
+ FREE_MEM(0, 0, ARR({0x00, 0xac, 0xeb})),
+ FREE_MEM(0, 2, ARR({0x00, 0xac, 0xc0})),
+ ALLOCATE_MEM(2, 0, ARR({0x00, 0xac, 0xcb})),
+ FLUSH_MEM(),
+ END(),
+};
+
+
+enum test_cases_indexes {
+ TEST_CASE_FIRST_FIT,
+ TEST_CASE_ULONG_SPAN,
+ TEST_CASE_FIRST_FIT_GAPS,
+ TEST_CASE_FIRST_FIT_ALIGN,
+ TEST_CASE_FIXED_DATA,
+ TEST_CASE_FIRST_FIT_ORDER_ALIGN,
+ TEST_CASE_BEST_FIT,
+ TEST_CASES_NUM,
+};
+
+static const struct test_command *test_cases[TEST_CASES_NUM] = {
+ [TEST_CASE_FIRST_FIT] = test_first_fit,
+ [TEST_CASE_ULONG_SPAN] = test_ulong_span,
+ [TEST_CASE_FIRST_FIT_GAPS] = test_first_fit_gaps,
+ [TEST_CASE_FIRST_FIT_ALIGN] = test_first_fit_align,
+ [TEST_CASE_FIXED_DATA] = test_fixed_data,
+ [TEST_CASE_FIRST_FIT_ORDER_ALIGN] = test_first_fit_order_align,
+ [TEST_CASE_BEST_FIT] = test_best_fit,
+};
+
+
+void test_genalloc(void)
+{
+ static struct gen_pool *pool;
+ unsigned long locations[ENTRIES];
+ char chunk[CHUNK_SIZE];
+ int retval;
+ unsigned int i;
+ const struct test_command *cmd;
+
+ pool = gen_pool_create(ALLOC_ORDER, -1);
+ if (unlikely(!pool)) {
+ pr_err("genalloc-selftest: no memory for pool.");
+ return;
+ }
+
+ retval = gen_pool_add_virt(pool, (unsigned long)chunk, 0,
+ CHUNK_SIZE, -1);
+ if (unlikely(retval)) {
+ pr_err("genalloc-selftest: could not register chunk.");
+ goto destroy_pool;
+ }
+
+ memset(locations, 0, ENTRIES * sizeof(unsigned long));
+ for (i = 0; i < TEST_CASES_NUM; i++)
+ for (cmd = test_cases[i]; cmd->command < CMD_END; cmd++)
+ callbacks[cmd->command](pool, cmd, locations);
+ pr_notice("genalloc-selftest: executed successfully %d tests",
+ TEST_CASES_NUM);
+
+destroy_pool:
+ gen_pool_destroy(pool);
+}
--
2.14.1


2018-03-13 21:52:11

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 4/8] struct page: add field for vm_struct

When a page is used for virtual memory, it is often necessary to obtain
a handler to the corresponding vm_struct, which refers to the virtually
continuous area generated when invoking vmalloc.

The struct page has a "mapping" field, which can be re-used, to store a
pointer to the parent area.

This will avoid more expensive searches, later on.

Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/mm_types.h | 1 +
mm/vmalloc.c | 2 ++
2 files changed, 3 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index fd1af6b9591d..c3a4825e10c0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -84,6 +84,7 @@ struct page {
void *s_mem; /* slab first object */
atomic_t compound_mapcount; /* first tail page */
/* page_deferred_list().next -- second tail page */
+ struct vm_struct *area;
};

/* Second double word */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ebff729cc956..61a1ca22b0f6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1536,6 +1536,7 @@ static void __vunmap(const void *addr, int deallocate_pages)
struct page *page = area->pages[i];

BUG_ON(!page);
+ page->area = NULL;
__free_pages(page, 0);
}

@@ -1705,6 +1706,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
area->nr_pages = i;
goto fail;
}
+ page->area = area;
area->pages[i] = page;
if (gfpflags_allow_blocking(gfp_mask|highmem_mask))
cond_resched();
--
2.14.1


2018-03-13 21:52:50

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 5/8] Protectable Memory

The MMU available in many systems running Linux can often provide R/O
protection to the memory pages it handles.

However, the MMU-based protection works efficiently only when said pages
contain exclusively data that will not need further modifications.

Statically allocated variables can be segregated into a dedicated
section, but this does not sit very well with dynamically allocated
ones.

Dynamic allocation does not provide, currently, any means for grouping
variables in memory pages that would contain exclusively data suitable
for conversion to read only access mode.

The allocator here provided (pmalloc - protectable memory allocator)
introduces the concept of pools of protectable memory.

A module can request a pool and then refer any allocation request to the
pool handler it has received.

Once all the chunks of memory associated to a specific pool are
initialized, the pool can be protected.

After this point, the pool can only be destroyed (it is up to the module
to avoid any further references to the memory from the pool, after
the destruction is invoked).

The latter case is mainly meant for releasing memory, when a module is
unloaded.

A module can have as many pools as needed, for example to support the
protection of data that is initialized in sufficiently distinct phases.

Since pmalloc memory is obtained from vmalloc, an attacker that has
gained access to the physical mapping, still has to identify where the
target of the attack is actually located.

At the same time, being also based on genalloc, pmalloc does not
generate as much trashing of the TLB as it would be caused by using
directly only vmalloc.

Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/genalloc.h | 4 +
include/linux/pmalloc.h | 163 ++++++++++++
include/linux/vmalloc.h | 1 +
lib/genalloc.c | 23 ++
mm/Kconfig | 7 +
mm/Makefile | 1 +
mm/pmalloc.c | 643 +++++++++++++++++++++++++++++++++++++++++++++++
mm/usercopy.c | 33 +++
8 files changed, 875 insertions(+)
create mode 100644 include/linux/pmalloc.h
create mode 100644 mm/pmalloc.c

diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h
index ff7229520656..9e98f3c991a8 100644
--- a/include/linux/genalloc.h
+++ b/include/linux/genalloc.h
@@ -120,6 +120,10 @@ void *gen_pool_dma_alloc(struct gen_pool *pool, size_t size, dma_addr_t *dma);
void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size);


+void gen_pool_flush_chunk(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk);
+
+
void gen_pool_for_each_chunk(struct gen_pool *pool,
void (*func)(struct gen_pool *pool,
struct gen_pool_chunk *chunk,
diff --git a/include/linux/pmalloc.h b/include/linux/pmalloc.h
new file mode 100644
index 000000000000..3c393069c9f1
--- /dev/null
+++ b/include/linux/pmalloc.h
@@ -0,0 +1,163 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * pmalloc.h: Header for Protectable Memory Allocator
+ *
+ * (C) Copyright 2017 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#ifndef _LINUX_PMALLOC_H
+#define _LINUX_PMALLOC_H
+
+
+#include <linux/genalloc.h>
+#include <linux/string.h>
+
+#define PMALLOC_DEFAULT_ALLOC_ORDER (-1)
+
+/*
+ * Library for dynamic allocation of pools of memory that can be,
+ * after initialization, marked as read-only.
+ *
+ * This is intended to complement __read_only_after_init, for those cases
+ * where either it is not possible to know the initialization value before
+ * init is completed, or the amount of data is variable and can be
+ * determined only at run-time.
+ *
+ * ***WARNING***
+ * The user of the API is expected to synchronize:
+ * 1) allocation,
+ * 2) writes to the allocated memory,
+ * 3) write protection of the pool,
+ * 4) freeing of the allocated memory, and
+ * 5) destruction of the pool.
+ *
+ * For a non-threaded scenario, this type of locking is not even required.
+ *
+ * Even if the library were to provide support for locking, point 2)
+ * would still depend on the user taking the lock.
+ */
+
+
+struct gen_pool *pmalloc_create_pool(const char *name,
+ int min_alloc_order);
+
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+
+bool pmalloc_expand_pool(struct gen_pool *pool, size_t size);
+
+
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp);
+
+
+/**
+ * pzalloc() - zero-initialized version of pmalloc
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Executes pmalloc, initializing the memory requested to 0,
+ * before returning the pointer to it.
+ *
+ * Return:
+ * * pointer to the memory requested - success
+ * * NULL - either no memory available or
+ * pool already read-only
+ */
+static inline void *pzalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+ return pmalloc(pool, size, gfp | __GFP_ZERO);
+}
+
+
+/**
+ * pmalloc_array() - allocates an array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested for each element
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result - success
+ * * NULL - error
+ */
+static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
+ size_t size, gfp_t flags)
+{
+ if (unlikely(!(pool && n && size)))
+ return NULL;
+ return pmalloc(pool, n * size, flags);
+}
+
+
+/**
+ * pcalloc() - allocates a 0-initialized array according to the parameters
+ * @pool: handle to the pool to be used for memory allocation
+ * @n: number of elements in the array
+ * @size: amount of memory (in bytes) requested
+ * @flags: flags for page allocation
+ *
+ * Executes pmalloc_array, if it has a chance to succeed.
+ *
+ * Return:
+ * * the pmalloc result - success
+ * * NULL - error
+ */
+static inline void *pcalloc(struct gen_pool *pool, size_t n,
+ size_t size, gfp_t flags)
+{
+ return pmalloc_array(pool, n, size, flags | __GFP_ZERO);
+}
+
+
+/**
+ * pstrdup() - duplicate a string, using pmalloc as allocator
+ * @pool: handle to the pool to be used for memory allocation
+ * @s: string to duplicate
+ * @gfp: flags for page allocation
+ *
+ * Generates a copy of the given string, allocating sufficient memory
+ * from the given pmalloc pool.
+ *
+ * Return:
+ * * pointer to the replica - success
+ * * NULL - error
+ */
+static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (unlikely(pool == NULL || s == NULL))
+ return NULL;
+
+ len = strlen(s) + 1;
+ buf = pmalloc(pool, len, gfp);
+ if (likely(buf))
+ strncpy(buf, s, len);
+ return buf;
+}
+
+
+void pmalloc_protect_pool(struct gen_pool *pool);
+
+
+/**
+ * pfree() - frees memory previously allocated from a pool
+ * @pool: handle to the pool used to allocate the memory to free
+ * @addr: the beginning of the location to free
+ *
+ */
+static inline void pfree(struct gen_pool *pool, const void *addr)
+{
+ gen_pool_free(pool, (unsigned long)addr, 0);
+}
+
+
+void pmalloc_destroy_pool(struct gen_pool *pool);
+
+#endif
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 1e5d8c392f15..116d280cca53 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -20,6 +20,7 @@ struct notifier_block; /* in notifier.h */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
#define VM_NO_GUARD 0x00000040 /* don't add guard page */
#define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */
+#define VM_PMALLOC 0x00000100 /* pmalloc area - see docs */
/* bits [20..32] reserved for arch specific ioremap internals */

/*
diff --git a/lib/genalloc.c b/lib/genalloc.c
index b5f5e1f9b6cf..f3a94bbf18f2 100644
--- a/lib/genalloc.c
+++ b/lib/genalloc.c
@@ -661,6 +661,29 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size)
EXPORT_SYMBOL(gen_pool_free);


+/**
+ * gen_pool_flush_chunk() - drops all the allocations from a specific chunk
+ * @pool: the generic memory pool
+ * @chunk: The chunk to wipe clear.
+ *
+ * This is meant to be called only while destroying a pool. It's up to the
+ * caller to avoid races, but really, at this point the pool should have
+ * already been retired and it should have become unavailable for any other
+ * sort of operation.
+ */
+void gen_pool_flush_chunk(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk)
+{
+ size_t size;
+
+ size = chunk->end_addr + 1 - chunk->start_addr;
+ memset(chunk->entries, 0,
+ DIV_ROUND_UP(size >> pool->min_alloc_order * BITS_PER_ENTRY,
+ BITS_PER_BYTE));
+ atomic_long_set(&chunk->avail, size);
+}
+
+
/**
* gen_pool_for_each_chunk() - call func for every chunk of generic memory pool
* @pool: the generic memory pool
diff --git a/mm/Kconfig b/mm/Kconfig
index c782e8fb7235..016d29b9400b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -760,3 +760,10 @@ config GUP_BENCHMARK
performance of get_user_pages_fast().

See tools/testing/selftests/vm/gup_benchmark.c
+
+config PROTECTABLE_MEMORY
+ bool
+ depends on MMU
+ depends on ARCH_HAS_SET_MEMORY
+ select GENERIC_ALLOCATOR
+ default y
diff --git a/mm/Makefile b/mm/Makefile
index e669f02c5a54..959fdbdac118 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_SPARSEMEM) += sparse.o
obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
+obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/pmalloc.c b/mm/pmalloc.c
new file mode 100644
index 000000000000..59f385922510
--- /dev/null
+++ b/mm/pmalloc.c
@@ -0,0 +1,643 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * pmalloc.c: Protectable Memory Allocator
+ *
+ * (C) Copyright 2017-2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/vmalloc.h>
+#include <linux/genalloc.h>
+#include <linux/kernel.h>
+#include <linux/log2.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+#include <linux/atomic.h>
+#include <linux/rculist.h>
+#include <linux/set_memory.h>
+#include <linux/bug.h>
+#include <asm/cacheflush.h>
+#include <asm/page.h>
+
+#include <linux/pmalloc.h>
+/*
+ * pmalloc_data contains the data specific to a pmalloc pool,
+ * in a format compatible with the design of gen_alloc.
+ * Some of the fields are used for exposing the corresponding parameter
+ * to userspace, through sysfs.
+ */
+struct pmalloc_data {
+ struct gen_pool *pool; /* Link back to the associated pool. */
+ bool protected; /* Status of the pool: RO or RW. */
+ struct kobj_attribute attr_protected; /* Sysfs attribute. */
+ struct kobj_attribute attr_avail; /* Sysfs attribute. */
+ struct kobj_attribute attr_size; /* Sysfs attribute. */
+ struct kobj_attribute attr_chunks; /* Sysfs attribute. */
+ struct kobject *pool_kobject;
+ struct list_head node; /* list of pools */
+};
+
+static LIST_HEAD(pmalloc_list);
+static bool sysfs_ready;
+static DEFINE_MUTEX(pmalloc_mutex);
+static struct kobject *pmalloc_kobject;
+
+
+/**
+ * pmalloc_pool_show_protected() - shows if a pool is write-protected
+ * @dev: the associated kobject
+ * @attr:A handle to the attribute object
+ * @buf: the buffer where to write the value
+ *
+ * Return:
+ * * the number of bytes written
+ */
+static ssize_t pmalloc_pool_show_protected(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_protected);
+ if (data->protected)
+ return sprintf(buf, "protected\n");
+ else
+ return sprintf(buf, "unprotected\n");
+}
+
+
+/**
+ * pmalloc_pool_show_avail() - shows cumulative available space in a pool
+ * @dev: the associated kobject
+ * @attr:A handle to the attribute object
+ * @buf: the buffer where to write the value
+ *
+ * The value shown is only indicative, because it doesn't take in account
+ * various factors, like allocation strategy, nor fragmentation, both
+ * across multiple chunks and even within the same chunk.
+ *
+ * Return:
+ * * the number of bytes written
+ */
+static ssize_t pmalloc_pool_show_avail(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_avail);
+ return sprintf(buf, "%lu\n",
+ (unsigned long)gen_pool_avail(data->pool));
+}
+
+
+/**
+ * pmalloc_pool_show_size() - shows cumulative size of a pool
+ * @dev: the associated kobject
+ * @attr: handle to the attribute object
+ * @buf: the buffer where to write the value
+ *
+ * Return:
+ * * the number of bytes written
+ */
+static ssize_t pmalloc_pool_show_size(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+
+ data = container_of(attr, struct pmalloc_data, attr_size);
+ return sprintf(buf, "%lu\n",
+ (unsigned long)gen_pool_size(data->pool));
+}
+
+/**
+ * pool_chunk_number() - callback to count the number of chunks in a pool
+ * @pool: handle to the pool
+ * @chunk: chunk for the current iteration
+ * @data: opaque data passed by the calling iterator
+ */
+static void pool_chunk_number(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk, void *data)
+{
+ unsigned long *counter = data;
+
+ (*counter)++;
+}
+
+/**
+ * pmalloc_pool_show_chunks() - callback exposing the number of chunks
+ * @dev: the associated kobject
+ * @attr: handle to the attribute object
+ * @buf: the buffer where to write the value
+ *
+ * Return:
+ * * number of bytes written
+ */
+static ssize_t pmalloc_pool_show_chunks(struct kobject *dev,
+ struct kobj_attribute *attr,
+ char *buf)
+{
+ struct pmalloc_data *data;
+ unsigned long chunks_num = 0;
+
+ data = container_of(attr, struct pmalloc_data, attr_chunks);
+ gen_pool_for_each_chunk(data->pool, pool_chunk_number, &chunks_num);
+ return sprintf(buf, "%lu\n", chunks_num);
+}
+
+
+/**
+ * pmalloc_connect() - Exposes the pool and its attributes through sysfs.
+ * @data: pointer to the data structure describing a pool
+ *
+ * Return:
+ * * pointer - to the kobject created
+ * * NULL - error
+ */
+static struct kobject *pmalloc_connect(struct pmalloc_data *data)
+{
+ const struct attribute *attrs[] = {
+ &data->attr_protected.attr,
+ &data->attr_avail.attr,
+ &data->attr_size.attr,
+ &data->attr_chunks.attr,
+ NULL
+ };
+ struct kobject *kobj;
+
+ kobj = kobject_create_and_add(data->pool->name, pmalloc_kobject);
+ if (unlikely(!kobj))
+ return NULL;
+
+ if (unlikely(sysfs_create_files(kobj, attrs) < 0)) {
+ kobject_put(kobj);
+ return NULL;
+ }
+ return kobj;
+}
+
+/**
+ * pmalloc_disconnect() - Removes the pool and its attributes from sysfs.
+ * @data: opaque data passed from the caller
+ * @kobj: the object to disconnect
+ */
+static void pmalloc_disconnect(struct pmalloc_data *data,
+ struct kobject *kobj)
+{
+ const struct attribute *attrs[] = {
+ &data->attr_protected.attr,
+ &data->attr_avail.attr,
+ &data->attr_size.attr,
+ &data->attr_chunks.attr,
+ NULL
+ };
+
+ sysfs_remove_files(kobj, attrs);
+ kobject_put(kobj);
+}
+
+/* Declares an attribute of the pool. */
+#define pmalloc_attr_init(data, attr_name) \
+do { \
+ sysfs_attr_init(&data->attr_##attr_name.attr); \
+ data->attr_##attr_name.attr.name = #attr_name; \
+ data->attr_##attr_name.attr.mode = VERIFY_OCTAL_PERMISSIONS(0400); \
+ data->attr_##attr_name.show = pmalloc_pool_show_##attr_name; \
+} while (0)
+
+
+/**
+ * init_pool() - allocates and initializes the data strutures for a pool
+ * @pool: handle to the pool to initialise.
+ * @name: the name for the new pool,
+ *
+ * Return:
+ * * true - success
+ * * false - failed allocations for meta-data
+ */
+static inline bool init_pool(struct gen_pool *pool, const char *name)
+{
+ const char *pool_name;
+ struct pmalloc_data *data;
+
+ pool_name = kstrdup(name, GFP_KERNEL);
+ if (WARN(!pool_name, "failed to allocate memory for pool name"))
+ return false;
+
+ data = kzalloc(sizeof(struct pmalloc_data), GFP_KERNEL);
+ if (WARN(!data, "failed to allocate memory for pool data")) {
+ kfree(pool_name);
+ return false;
+ }
+
+ data->protected = false;
+ data->pool = pool;
+ pmalloc_attr_init(data, protected);
+ pmalloc_attr_init(data, avail);
+ pmalloc_attr_init(data, size);
+ pmalloc_attr_init(data, chunks);
+ pool->data = data;
+ pool->name = pool_name;
+ return true;
+}
+
+
+/**
+ * pmalloc_create_pool() - create a new protectable memory pool
+ * @name: the name of the pool, enforced to be unique
+ * @min_alloc_order: log2 of the minimum allocation size obtainable
+ * from the pool; -1 will pick sizeof(unsigned long)
+ *
+ * Creates a new (empty) memory pool for allocation of protectable
+ * memory. Memory will be allocated upon request (through pmalloc).
+ *
+ * Return:
+ * * pointer to the new pool - success
+ * * NULL - error
+ */
+struct gen_pool *pmalloc_create_pool(const char *name, int min_alloc_order)
+{
+ struct gen_pool *pool;
+ struct pmalloc_data *data;
+
+ if (WARN(!name, "Refusing to create unnamed pool"))
+ return NULL;
+
+ if (min_alloc_order < 0)
+ min_alloc_order = ilog2(sizeof(unsigned long));
+
+ pool = gen_pool_create(min_alloc_order, NUMA_NO_NODE);
+ if (WARN(!pool, "Could not allocate memory for pool"))
+ return NULL;
+
+ if (WARN(!init_pool(pool, name),
+ "Failed to initialize pool %s.", name))
+ goto init_pool_err;
+
+ mutex_lock(&pmalloc_mutex);
+ list_for_each_entry(data, &pmalloc_list, node) {
+ if (!strcmp(name, data->pool->name)) {
+ mutex_unlock(&pmalloc_mutex);
+ goto same_name_err;
+ }
+ }
+
+ data = (struct pmalloc_data *)pool->data;
+ list_add(&data->node, &pmalloc_list);
+ if (sysfs_ready)
+ data->pool_kobject = pmalloc_connect(data);
+ mutex_unlock(&pmalloc_mutex);
+ return pool;
+
+same_name_err:
+ kfree(pool->data);
+init_pool_err:
+ gen_pool_destroy(pool);
+ return NULL;
+}
+
+#define CHUNK_TAG true
+#define CHUNK_UNTAG false
+/**
+ * chunk_tagging() - (un)tags the area corresponding to a chunk
+ * @chunk: vmalloc allocation, as multiple of memory pages
+ * @tag: selects whether to tag or untag the pages from the chunk
+ *
+ * Return:
+ * * true - success
+ * * false - failure
+ */
+static inline bool chunk_tagging(void *chunk, bool tag)
+{
+ struct vm_struct *area;
+ struct page *page;
+
+ if (!is_vmalloc_addr(chunk))
+ return false;
+
+ page = vmalloc_to_page(chunk);
+ if (unlikely(!page))
+ return false;
+
+ area = page->area;
+ if (tag == CHUNK_UNTAG)
+ area->flags &= ~VM_PMALLOC;
+ else
+ area->flags |= VM_PMALLOC;
+ return true;
+}
+
+
+enum {
+ INVALID_PMALLOC_OBJECT = -1,
+ NOT_PMALLOC_OBJECT = 0,
+ VALID_PMALLOC_OBJECT = 1,
+};
+
+
+/**
+ * is_pmalloc_object() - validates the existence of an alleged object
+ * @ptr: address of the object
+ * @n: size of the object, in bytes
+ *
+ * Return:
+ * * 0 - the object does not belong to pmalloc
+ * * 1 - the object belongs to pmalloc
+ * * \-1 - the object overlaps pmalloc memory incorrectly
+ */
+int is_pmalloc_object(const void *ptr, const unsigned long n)
+{
+ struct vm_struct *area;
+ struct page *page;
+ unsigned long area_start;
+ unsigned long area_end;
+ unsigned long object_start;
+ unsigned long object_end;
+
+
+ /*
+ * is_pmalloc_object gets called pretty late, so chances are high
+ * that the object is indeed of vmalloc type
+ */
+ if (unlikely(!is_vmalloc_addr(ptr)))
+ return NOT_PMALLOC_OBJECT;
+
+ page = vmalloc_to_page(ptr);
+ if (unlikely(!page))
+ return NOT_PMALLOC_OBJECT;
+
+ area = page->area;
+
+ if (likely(!(area->flags & VM_PMALLOC)))
+ return NOT_PMALLOC_OBJECT;
+
+ area_start = (unsigned long)area->addr;
+ area_end = area_start + area->nr_pages * PAGE_SIZE - 1;
+ object_start = (unsigned long)ptr;
+ object_end = object_start + n - 1;
+
+ if (likely((area_start <= object_start) &&
+ (object_end <= area_end)))
+ return VALID_PMALLOC_OBJECT;
+ else
+ return INVALID_PMALLOC_OBJECT;
+}
+
+
+/**
+ * pmalloc_expand_pool() - adds a memory chunk of the requested size
+ * @pool: handle for the pool
+ * @size: amount of memory (in bytes) requested
+ *
+ * Prepares a chunk of the requested size.
+ * This is intended to both minimize latency in later memory requests and
+ * avoid sleeping during allocation.
+ * Memory allocated with prealloc is stored in one single chunk, as
+ * opposed to what is allocated on-demand when pmalloc runs out of free
+ * space already existing in the pool and has to invoke vmalloc.
+ * One additional advantage of pre-allocating larger chunks of memory is
+ * that the total slack tends to be smaller.
+ * If used for avoiding sleep, the intended user must be protected from
+ * other, parasitic users, for example with a lock.
+ *
+ * Return:
+ * * true - allocation and registration were successful
+ * * false - some error occurred
+ */
+bool pmalloc_expand_pool(struct gen_pool *pool, size_t size)
+{
+ void *chunk;
+ size_t chunk_size;
+
+ chunk_size = roundup(size, PAGE_SIZE);
+ chunk = vmalloc(chunk_size);
+ if (WARN(chunk == NULL,
+ "Could not allocate %zu bytes from vmalloc", chunk_size))
+ return false;
+
+ if (WARN(!chunk_tagging(chunk, CHUNK_TAG),
+ "Failed to tag chunk as pmalloc memory"))
+ goto free;
+
+ /* Locking is already done inside gen_pool_add */
+ if (WARN(gen_pool_add(pool, (unsigned long)chunk, chunk_size,
+ NUMA_NO_NODE),
+ "Failed to add chunk to pool %s", pool->name)) {
+ chunk_tagging(chunk, CHUNK_UNTAG);
+free:
+ /*
+ * expand_pool might be called with a lock held, so use
+ * vfree_atomic, instaed of plain vfree.
+ */
+ vfree_atomic(chunk);
+ return false;
+ }
+
+ return true;
+
+}
+
+
+/**
+ * pmalloc() - allocate protectable memory from a pool
+ * @pool: handle to the pool to be used for memory allocation
+ * @size: amount of memory (in bytes) requested
+ * @gfp: flags for page allocation
+ *
+ * Allocates memory from an unprotected pool. If the pool doesn't have
+ * enough memory, and the request did not include GFP_ATOMIC, an attempt
+ * is made to add a new chunk of memory to the pool
+ * (a multiple of PAGE_SIZE), in order to fit the new request.
+ * Otherwise, NULL is returned.
+ *
+ * Return:
+ * * pointer to the memory requested - success
+ * * NULL - either no memory available or
+ * pool already read-only
+ */
+void *pmalloc(struct gen_pool *pool, size_t size, gfp_t gfp)
+{
+ unsigned long addr;
+ struct pmalloc_data *data = (struct pmalloc_data *)(pool->data);
+
+ if (WARN(data->protected, "pool %s already protected",
+ pool->name))
+ return NULL;
+
+ /*
+ * Even when everything goes fine, 2 or more allocations might
+ * happen in parallel, where one "steals" the memory added by
+ * another, but that's ok, just try to allocate some more.
+ * Eventually the "stealing" will subside.
+ */
+ while (true) {
+ /* Try to add enough memory to the pool. */
+ addr = gen_pool_alloc(pool, size);
+ if (likely(addr))
+ break; /* Success! Retry the allocation. */
+
+ /* There was no suitable memory available in the pool. */
+ if (likely(!(gfp & __GFP_ATOMIC))) {
+ /* Not in atomic context, expand the pool. */
+ if (likely(pmalloc_expand_pool(pool, size)) ||
+ unlikely(gfp & __GFP_NOFAIL))
+ /* Retry, either upon success or if mandated. */
+ continue;
+ /* Otherwise, give up. */
+ WARN(true, "Could not add %zu bytes to %s pool",
+ size, pool->name);
+ return NULL;
+ }
+
+ /* Atomic context: no chance to expand the pool. */
+ if (WARN(!(gfp & __GFP_NOFAIL),
+ "Could not get %zu bytes from %s and ATOMIC",
+ size, pool->name))
+ return NULL; /* Fail, if possible. */
+ /* Otherwise, retry.*/
+ }
+
+ if (unlikely(gfp & __GFP_ZERO))
+ memset((void *)addr, 0, size);
+ return (void *)addr;
+}
+
+
+/**
+ * pmalloc_chunk_set_protection() - (un)protects a pool
+ * @pool: handle to the pool to (un)protect
+ * @chunk: handle to the chunk to (un)protect
+ * @data: opaque data from the chunk iterator - it's a boolean
+ * * TRUE - protect the chunk
+ * * FALSE - unprotect the chunk
+ */
+static void pmalloc_chunk_set_protection(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk,
+ void *data)
+{
+ const bool *flag = data;
+ size_t chunk_size = chunk->end_addr + 1 - chunk->start_addr;
+ unsigned long pages = chunk_size / PAGE_SIZE;
+
+ if (WARN(chunk_size & (PAGE_SIZE - 1),
+ "Chunk size is not a multiple of PAGE_SIZE."))
+ return;
+
+ if (*flag)
+ set_memory_ro(chunk->start_addr, pages);
+ else
+ set_memory_rw(chunk->start_addr, pages);
+}
+
+
+/**
+ * pmalloc_pool_set_protection() - (un)protects a pool
+ * @pool: handle to the pool to (un)protect
+ * @protection:
+ * * TRUE - protect
+ * * FALSE - unprotect
+ */
+static void pmalloc_pool_set_protection(struct gen_pool *pool,
+ bool protection)
+{
+ struct pmalloc_data *data;
+ struct gen_pool_chunk *chunk;
+
+ data = pool->data;
+ if (WARN(data->protected == protection,
+ "The pool %s is already protected as requested",
+ pool->name))
+ return;
+ data->protected = protection;
+ list_for_each_entry(chunk, &(pool)->chunks, next_chunk)
+ pmalloc_chunk_set_protection(pool, chunk, &protection);
+}
+
+
+/**
+ * pmalloc_protect_pool() - turn a read/write pool into read-only
+ * @pool: the pool to protect
+ *
+ * Write-protects all the memory chunks assigned to the pool.
+ * This prevents any further allocation.
+ */
+void pmalloc_protect_pool(struct gen_pool *pool)
+{
+ pmalloc_pool_set_protection(pool, true);
+}
+
+
+/**
+ * pmalloc_chunk_free() - untags and frees the pages from a chunk
+ * @pool: handle to the pool containing the chunk
+ * @chunk: the chunk to free
+ * @data: opaque data passed by the iterator invoking this function
+ */
+static void pmalloc_chunk_free(struct gen_pool *pool,
+ struct gen_pool_chunk *chunk, void *data)
+{
+ chunk_tagging(chunk, CHUNK_UNTAG);
+ gen_pool_flush_chunk(pool, chunk);
+ vfree_atomic((void *)chunk->start_addr);
+}
+
+
+/**
+ * pmalloc_destroy_pool() - destroys a pool and all the associated memory
+ * @pool: the pool to destroy
+ *
+ * All the memory that was allocated through pmalloc in the pool will be freed.
+ */
+void pmalloc_destroy_pool(struct gen_pool *pool)
+{
+ struct pmalloc_data *data;
+
+ data = pool->data;
+
+ mutex_lock(&pmalloc_mutex);
+ list_del(&data->node);
+ mutex_unlock(&pmalloc_mutex);
+
+ if (likely(data->pool_kobject))
+ pmalloc_disconnect(data, data->pool_kobject);
+
+ pmalloc_pool_set_protection(pool, false);
+ gen_pool_for_each_chunk(pool, pmalloc_chunk_free, NULL);
+ gen_pool_destroy(pool);
+ kfree(data);
+}
+
+
+/**
+ * pmalloc_late_init() - registers to debug sysfs pools pretading it
+ *
+ * When the sysfs infrastructure is ready to receive registrations,
+ * connect all the pools previously created. Also enable further pools
+ * to be connected right away.
+ *
+ * Return:
+ * * 0 - success
+ * * \-1 - error
+ */
+static int __init pmalloc_late_init(void)
+{
+ struct pmalloc_data *data, *n;
+
+ pmalloc_kobject = kobject_create_and_add("pmalloc", kernel_kobj);
+ if (WARN(!pmalloc_kobject,
+ "Failed to create pmalloc root sysfs dir"))
+ return -1;
+
+ mutex_lock(&pmalloc_mutex);
+ sysfs_ready = true;
+ list_for_each_entry_safe(data, n, &pmalloc_list, node)
+ pmalloc_connect(data);
+ mutex_unlock(&pmalloc_mutex);
+
+ return 0;
+}
+late_initcall(pmalloc_late_init);
diff --git a/mm/usercopy.c b/mm/usercopy.c
index e9e9325f7638..946ce051e296 100644
--- a/mm/usercopy.c
+++ b/mm/usercopy.c
@@ -240,6 +240,36 @@ static inline void check_heap_object(const void *ptr, unsigned long n,
}
}

+#ifdef CONFIG_PROTECTABLE_MEMORY
+
+int is_pmalloc_object(const void *ptr, const unsigned long n);
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+ bool to_user)
+{
+ int retv;
+
+ retv = is_pmalloc_object(ptr, n);
+ if (unlikely(retv)) {
+ if (unlikely(!to_user))
+ usercopy_abort("pmalloc",
+ "trying to write to pmalloc object",
+ to_user, (const unsigned long)ptr, n);
+ if (retv < 0)
+ usercopy_abort("pmalloc",
+ "invalid pmalloc object",
+ to_user, (const unsigned long)ptr, n);
+ }
+}
+
+#else
+
+static void check_pmalloc_object(const void *ptr, unsigned long n,
+ bool to_user)
+{
+}
+#endif
+
/*
* Validates that the given object is:
* - not bogus address
@@ -277,5 +307,8 @@ void __check_object_size(const void *ptr, unsigned long n, bool to_user)

/* Check for object in kernel to avoid text exposure. */
check_kernel_text_object((const unsigned long)ptr, n, to_user);
+
+ /* Check if object is from a pmalloc chunk. */
+ check_pmalloc_object(ptr, n, to_user);
}
EXPORT_SYMBOL(__check_object_size);
--
2.14.1


2018-03-13 21:53:01

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 6/8] Pmalloc selftest

Add basic self-test functionality for pmalloc.

The testing is introduced as early as possible, right after the main
dependency, genalloc, has passed successfully, so that it can help
diagnosing failures in pmalloc users.

Signed-off-by: Igor Stoppa <[email protected]>
---
include/linux/test_pmalloc.h | 24 +++++
init/main.c | 2 +
mm/Kconfig | 10 ++
mm/Makefile | 1 +
mm/test_pmalloc.c | 238 +++++++++++++++++++++++++++++++++++++++++++
5 files changed, 275 insertions(+)
create mode 100644 include/linux/test_pmalloc.h
create mode 100644 mm/test_pmalloc.c

diff --git a/include/linux/test_pmalloc.h b/include/linux/test_pmalloc.h
new file mode 100644
index 000000000000..c7e2e451c17c
--- /dev/null
+++ b/include/linux/test_pmalloc.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * test_pmalloc.h
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+
+#ifndef __LINUX_TEST_PMALLOC_H
+#define __LINUX_TEST_PMALLOC_H
+
+
+#ifdef CONFIG_TEST_PROTECTABLE_MEMORY
+
+void test_pmalloc(void);
+
+#else
+
+static inline void test_pmalloc(void){};
+
+#endif
+
+#endif
diff --git a/init/main.c b/init/main.c
index 2bf1312fd2fe..ea44c940070a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -91,6 +91,7 @@
#include <linux/rodata_test.h>
#include <linux/jump_label.h>
#include <linux/test_genalloc.h>
+#include <linux/test_pmalloc.h>

#include <asm/io.h>
#include <asm/bugs.h>
@@ -663,6 +664,7 @@ asmlinkage __visible void __init start_kernel(void)
mem_encrypt_init();

test_genalloc();
+ test_pmalloc();
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/mm/Kconfig b/mm/Kconfig
index 016d29b9400b..47b0843b02d2 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -767,3 +767,13 @@ config PROTECTABLE_MEMORY
depends on ARCH_HAS_SET_MEMORY
select GENERIC_ALLOCATOR
default y
+
+config TEST_PROTECTABLE_MEMORY
+ bool "Run self test for pmalloc memory allocator"
+ depends on MMU
+ depends on ARCH_HAS_SET_MEMORY
+ select PROTECTABLE_MEMORY
+ default n
+ help
+ Tries to verify that pmalloc works correctly and that the memory
+ is effectively protected.
diff --git a/mm/Makefile b/mm/Makefile
index 959fdbdac118..1de4be5fd0bc 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -66,6 +66,7 @@ obj-$(CONFIG_SPARSEMEM_VMEMMAP) += sparse-vmemmap.o
obj-$(CONFIG_SLOB) += slob.o
obj-$(CONFIG_MMU_NOTIFIER) += mmu_notifier.o
obj-$(CONFIG_PROTECTABLE_MEMORY) += pmalloc.o
+obj-$(CONFIG_TEST_PROTECTABLE_MEMORY) += test_pmalloc.o
obj-$(CONFIG_KSM) += ksm.o
obj-$(CONFIG_PAGE_POISONING) += page_poison.o
obj-$(CONFIG_SLAB) += slab.o
diff --git a/mm/test_pmalloc.c b/mm/test_pmalloc.c
new file mode 100644
index 000000000000..598119ffb0ed
--- /dev/null
+++ b/mm/test_pmalloc.c
@@ -0,0 +1,238 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * test_pmalloc.c
+ *
+ * (C) Copyright 2018 Huawei Technologies Co. Ltd.
+ * Author: Igor Stoppa <[email protected]>
+ */
+
+#include <linux/pmalloc.h>
+#include <linux/mm.h>
+#include <linux/test_pmalloc.h>
+#include <linux/bug.h>
+
+#define SIZE_1 (PAGE_SIZE * 3)
+#define SIZE_2 1000
+
+static struct gen_pool *pool_unprot;
+static struct gen_pool *pool_prot;
+static struct gen_pool *pool_pre;
+
+static void *var_prot;
+static void *var_unprot;
+static void *var_vmall;
+
+/**
+ * validate_alloc() - wrapper for is_pmalloc_object with messages
+ * @expected: whether if the test is supposed to be ok or not
+ * @addr: base address of the range to test
+ * @size: length of he range to test
+ */
+static inline bool validate_alloc(bool expected, void *addr,
+ unsigned long size)
+{
+ bool test;
+
+ test = is_pmalloc_object(addr, size) > 0;
+ pr_notice("must be %s: %s",
+ expected ? "ok" : "no", test ? "ok" : "no");
+ return test == expected;
+}
+
+
+#define is_alloc_ok(variable, size) \
+ validate_alloc(true, variable, size)
+
+
+#define is_alloc_no(variable, size) \
+ validate_alloc(false, variable, size)
+
+/**
+ * create_pools() - tries to instantiate the pools needed for the test
+ *
+ * Creates the respective instances for each pool used in the test.
+ * In case of error, it rolls back whatever previous step passed
+ * successfully.
+ *
+ * Return:
+ * * true - success
+ * * false - something failed
+ */
+static bool create_pools(void)
+{
+ pr_notice("Testing pool creation capability");
+
+ pool_pre = pmalloc_create_pool("preallocated", 0);
+ if (unlikely(!pool_pre))
+ goto err_pre;
+
+ pool_unprot = pmalloc_create_pool("unprotected", 0);
+ if (unlikely(!pool_unprot))
+ goto err_unprot;
+
+ pool_prot = pmalloc_create_pool("protected", 0);
+ if (unlikely(!(pool_prot)))
+ goto err_prot;
+ return true;
+err_prot:
+ pmalloc_destroy_pool(pool_unprot);
+err_unprot:
+ pmalloc_destroy_pool(pool_pre);
+err_pre:
+ WARN(true, "Unable to allocate memory for pmalloc selftest.");
+ return false;
+}
+
+
+/**
+ * destroy_pools() - tears down the instances of the pools in use
+ *
+ * Mostly used on the path for error recovery, when something goes wrong,
+ * the pools allocated are dropped.
+ */
+static void destroy_pools(void)
+{
+ pmalloc_destroy_pool(pool_prot);
+ pmalloc_destroy_pool(pool_unprot);
+ pmalloc_destroy_pool(pool_pre);
+}
+
+
+/**
+ * test_alloc() - verifies that it's possible to allocate from the pools
+ *
+ * Each of the pools declared must be available for allocation, at this
+ * point. There is also a small allocation from generic vmallco memory.
+ */
+static bool test_alloc(void)
+{
+ pr_notice("Testing allocation capability");
+
+ var_vmall = vmalloc(SIZE_2);
+ if (unlikely(!var_vmall))
+ goto err_vmall;
+
+ var_unprot = pmalloc(pool_unprot, SIZE_1 - 1, GFP_KERNEL);
+ if (unlikely(!var_unprot))
+ goto err_unprot;
+
+ var_prot = pmalloc(pool_prot, SIZE_1, GFP_KERNEL);
+ if (unlikely(!var_prot))
+ goto err_prot;
+
+ return true;
+err_prot:
+ pfree(pool_unprot, var_unprot);
+err_unprot:
+ vfree(var_vmall);
+err_vmall:
+ WARN(true, "Unable to allocate memory for pmalloc selftest.");
+ return false;
+}
+
+
+/**
+ * test_is_pmalloc_object() - tests the identification of pmalloc ranges
+ *
+ * Positive and negative test of potential pmalloc objects.
+ *
+ * Return:
+ * * true - success
+ * * false - error
+ */
+static bool test_is_pmalloc_object(void)
+{
+ pr_notice("Test correctness of is_pmalloc_object()");
+ if (WARN_ON(unlikely(!is_alloc_ok(var_unprot, 10))) ||
+ WARN_ON(unlikely(!is_alloc_ok(var_unprot, SIZE_1))) ||
+ WARN_ON(unlikely(!is_alloc_ok(var_unprot, PAGE_SIZE))) ||
+ WARN_ON(unlikely(!is_alloc_no(var_unprot, SIZE_1 + 1))) ||
+ WARN_ON(unlikely(!is_alloc_no(var_vmall, 10))))
+ return false;
+ return true;
+}
+
+
+/**
+ * test_protected_allocation() - allocation from protected pool must fail
+ *
+ * Once the pool is protected, the pages associated with it become
+ * read-only and any further attempt to allocate data will be declined.
+ *
+ * Return:
+ * * true - success
+ * * false - error
+ */
+static bool test_protected_allocation(void)
+{
+ pmalloc_protect_pool(pool_prot);
+ /*
+ * This will intentionally trigger a WARN, because the pool being
+ * allocated from is already protected.
+ */
+ pr_notice("Test allocation from a protected pool. It will WARN.");
+ return !WARN(unlikely(pmalloc(pool_prot, 10, GFP_KERNEL)),
+ "no memory from a protected pool");
+}
+
+
+/**
+ * test_destroy_pool() - destroying an unprotected pool must WARN
+ *
+ * Attempting to destroy an unprotected pool will issue a warning, while
+ * destroying a protected pool is considered to be the normal behavior.
+ */
+static void test_destroy_pools(void)
+{
+ /*
+ * This will intentionally trigger a WARN because the pool being
+ * destroyed is not protected, which is unusual and should happen
+ * on error paths only, where probably other warnings are already
+ * displayed.
+ */
+ pr_notice("pmalloc-selftest: WARN in pmalloc_pool_set_protection.");
+ pmalloc_destroy_pool(pool_unprot);
+ pr_notice("pmalloc-selftest: point for expected WARN passed.");
+
+ /* This must not cause WARNings */
+ pr_notice("pmalloc-selftest: Expect no WARN below.");
+ pmalloc_destroy_pool(pool_prot);
+ pr_notice("pmalloc-selftest: passed point for unexpected WARN.");
+}
+
+
+/**
+ * test_pmalloc() - main entry point for running the test cases
+ *
+ * Performs various tests, each step subordinate to the successful
+ * execution of the previous.
+ */
+void test_pmalloc(void)
+{
+
+ pr_notice("pmalloc-selftest");
+
+ if (unlikely(!create_pools()))
+ return;
+
+ if (unlikely(!test_alloc()))
+ goto err_alloc;
+
+
+ if (unlikely(!test_is_pmalloc_object()))
+ goto err_is_object;
+
+ *(int *)var_prot = 0;
+ pfree(pool_unprot, var_unprot);
+ vfree(var_vmall);
+
+ if (unlikely(!test_protected_allocation()))
+ goto err_prot_all;
+
+ test_destroy_pools();
+ return;
+err_prot_all:
+err_is_object:
+err_alloc:
+ destroy_pools();
+}
--
2.14.1


2018-03-13 21:53:30

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 8/8] Documentation for Pmalloc

Detailed documentation about the protectable memory allocator.

Signed-off-by: Igor Stoppa <[email protected]>
---
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pmalloc.rst | 111 +++++++++++++++++++++++++++++++++++++
2 files changed, 112 insertions(+)
create mode 100644 Documentation/core-api/pmalloc.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index c670a8031786..8f5de42d6571 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -25,6 +25,7 @@ Core utilities
genalloc
errseq
printk-formats
+ pmalloc

Interfaces for kernel debugging
===============================
diff --git a/Documentation/core-api/pmalloc.rst b/Documentation/core-api/pmalloc.rst
new file mode 100644
index 000000000000..10e01187d049
--- /dev/null
+++ b/Documentation/core-api/pmalloc.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _pmalloc:
+
+Protectable memory allocator
+============================
+
+Purpose
+-------
+
+The pmalloc library is meant to provide read-only status to data that,
+for some reason, could neither be declared as constant, nor could it take
+advantage of the qualifier __ro_after_init, but is write-once and
+read-only in spirit.
+It protects data from both accidental and malicious overwrites.
+
+Example: A policy that is loaded from userspace.
+
+
+Concept
+-------
+
+pmalloc builds on top of :ref:`genalloc <genalloc>`, using the same
+concept of memory pools.
+
+The value added by pmalloc is that now the memory contained in a pool can
+become read-only, for the rest of the life of the pool.
+
+Different kernel drivers and threads can use different pools, for finer
+control of what becomes read_only and when.
+And for improved lockless concurrency.
+
+
+Caveats
+-------
+
+- To facilitate the conversion of existing code to pmalloc pools, several
+ helper functions are provided, mirroring their k/vmalloc counterparts.
+ In particular, pfree(), which is mostly meant for error paths, when one
+ or more previous allocations must be rolled back.
+
+- Memory freed while a pool is not yet protected will be reused.
+
+- Once a pool is protected, it's not possible to allocate any more memory
+ from it.
+
+- Memory "freed" from a protected pool indicates that such memory is not
+ in use anymore by the requester; however, it will not become available
+ for further use, until the pool is destroyed.
+
+- pmalloc does not provide locking support with respect to allocating vs
+ protecting an individual pool, for performance reasons.
+ It is recommended not to share the same pool between unrelated functions.
+ Should sharing be a necessity, the user of the shared pool is expected
+ to implement locking for that pool.
+
+- pmalloc uses genalloc to optimize the use of the space it allocates
+ through vmalloc. Some more TLB entries will be used, however less than
+ in the case of using vmalloc directly. The exact number depends on the
+ size of each allocation request and possible slack.
+
+- Considering that not much data is supposed to be dynamically allocated
+ and then marked as read-only, it shouldn't be an issue that the address
+ range for pmalloc is limited, on 32-bit systems.
+
+- Regarding SMP systems, the allocations are expected to happen mostly
+ during an initial transient, after which there should be no more need to
+ perform cross-processor synchronizations of page tables.
+
+
+Use
+---
+
+The typical sequence, when using pmalloc, is:
+
+#. create a pool
+
+ :c:func:`pmalloc_create_pool`
+
+#. [optional] pre-allocate some memory in the pool
+
+ :c:func:`pmalloc_prealloc`
+
+#. issue one or more allocation requests to the pool with locking as needed
+
+ :c:func:`pmalloc`
+
+ :c:func:`pzalloc`
+
+#. initialize the memory obtained with desired values
+
+#. [optional] iterate over points 3 & 4 as needed
+
+#. write-protect the pool
+
+ :c::func:`pmalloc_protect_pool`
+
+#. use in read-only mode the handles obtained through the allocations
+
+#. [optional] release all the memory allocated
+
+ :c:func:`pfree`
+
+#. [optional, but depends on point 8] destroy the pool
+ :c:func:`pmalloc_destroy_pool`
+
+API
+---
+
+.. kernel-doc:: include/linux/pmalloc.h
+.. kernel-doc:: mm/pmalloc.c
--
2.14.1


2018-03-13 21:54:18

by Igor Stoppa

[permalink] [raw]
Subject: [PATCH 7/8] lkdtm: crash on overwriting protected pmalloc var

Verify that pmalloc read-only protection is in place: trying to
overwrite a protected variable will crash the kernel.

Signed-off-by: Igor Stoppa <[email protected]>
---
drivers/misc/lkdtm.h | 1 +
drivers/misc/lkdtm_core.c | 3 +++
drivers/misc/lkdtm_perms.c | 28 ++++++++++++++++++++++++++++
3 files changed, 32 insertions(+)

diff --git a/drivers/misc/lkdtm.h b/drivers/misc/lkdtm.h
index 9e513dcfd809..dcda3ae76ceb 100644
--- a/drivers/misc/lkdtm.h
+++ b/drivers/misc/lkdtm.h
@@ -38,6 +38,7 @@ void lkdtm_READ_BUDDY_AFTER_FREE(void);
void __init lkdtm_perms_init(void);
void lkdtm_WRITE_RO(void);
void lkdtm_WRITE_RO_AFTER_INIT(void);
+void lkdtm_WRITE_RO_PMALLOC(void);
void lkdtm_WRITE_KERN(void);
void lkdtm_EXEC_DATA(void);
void lkdtm_EXEC_STACK(void);
diff --git a/drivers/misc/lkdtm_core.c b/drivers/misc/lkdtm_core.c
index 2154d1bfd18b..c9fd42bda6ee 100644
--- a/drivers/misc/lkdtm_core.c
+++ b/drivers/misc/lkdtm_core.c
@@ -155,6 +155,9 @@ static const struct crashtype crashtypes[] = {
CRASHTYPE(ACCESS_USERSPACE),
CRASHTYPE(WRITE_RO),
CRASHTYPE(WRITE_RO_AFTER_INIT),
+#ifdef CONFIG_PROTECTABLE_MEMORY
+ CRASHTYPE(WRITE_RO_PMALLOC),
+#endif
CRASHTYPE(WRITE_KERN),
CRASHTYPE(REFCOUNT_INC_OVERFLOW),
CRASHTYPE(REFCOUNT_ADD_OVERFLOW),
diff --git a/drivers/misc/lkdtm_perms.c b/drivers/misc/lkdtm_perms.c
index 53b85c9d16b8..0ac9023fd2b0 100644
--- a/drivers/misc/lkdtm_perms.c
+++ b/drivers/misc/lkdtm_perms.c
@@ -9,6 +9,7 @@
#include <linux/vmalloc.h>
#include <linux/mman.h>
#include <linux/uaccess.h>
+#include <linux/pmalloc.h>
#include <asm/cacheflush.h>

/* Whether or not to fill the target memory area with do_nothing(). */
@@ -104,6 +105,33 @@ void lkdtm_WRITE_RO_AFTER_INIT(void)
*ptr ^= 0xabcd1234;
}

+#ifdef CONFIG_PROTECTABLE_MEMORY
+void lkdtm_WRITE_RO_PMALLOC(void)
+{
+ struct gen_pool *pool;
+ int *i;
+
+ pool = pmalloc_create_pool("pool", 0);
+ if (unlikely(!pool)) {
+ pr_info("Failed preparing pool for pmalloc test.");
+ return;
+ }
+
+ i = (int *)pmalloc(pool, sizeof(int), GFP_KERNEL);
+ if (unlikely(!i)) {
+ pr_info("Failed allocating memory for pmalloc test.");
+ pmalloc_destroy_pool(pool);
+ return;
+ }
+
+ *i = INT_MAX;
+ pmalloc_protect_pool(pool);
+
+ pr_info("attempting bad pmalloc write at %p\n", i);
+ *i = 0;
+}
+#endif
+
void lkdtm_WRITE_KERN(void)
{
size_t size;
--
2.14.1


2018-03-13 22:02:31

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 4/8] struct page: add field for vm_struct

On Tue, Mar 13, 2018 at 11:45:50PM +0200, Igor Stoppa wrote:
> When a page is used for virtual memory, it is often necessary to obtain
> a handler to the corresponding vm_struct, which refers to the virtually
> continuous area generated when invoking vmalloc.
>
> The struct page has a "mapping" field, which can be re-used, to store a
> pointer to the parent area.
>
> This will avoid more expensive searches, later on.
>
> Signed-off-by: Igor Stoppa <[email protected]>

Reviewed-by: Matthew Wilcox <[email protected]>

Regardless of the fate of the rest of this patchset, this makes sense
and we should have this.

2018-03-14 11:25:47

by Igor Stoppa

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

On 13/03/18 23:45, Igor Stoppa wrote:

[...]

Some more thoughts about the open topics:

> Discussion topics that are unclear if they are closed and would need
> comment from those who initiated them, if my answers are accepted or not:
>
> * @Kees Cook proposed to have first self testing for genalloc, to
> validate the following patch, adding tracing of allocations
> My answer was that such tests would also need patching, therefore they
> could not certify that the functionality is corect both before and
> after the genalloc bitmap modification.

This is the only one where I still couldn't find a solution.
If Matthew Wilcox's proposal about implementing the the genalloc bitmap
would work, then this too could be done, but I think this alternate
bitmap proposed has problems. More on it below.

> * @Kees Cook proposed to turn the self testing into modules.
> My answer was that the functionality is intentionally tested very early
> in the boot phase, to prevent unexplainable errors, should the feature
> really fail.

This could be workable, if it's acceptable that the early testing is
performed only when the module is compiled in.
I do not expect the module-based testing to bring much value, but it
doesn't do harm. Is this acceptable?

> * @Matthew Wilcox proposed to use a different mechanism for the genalloc
> bitmap: 2 bitmaps, one for occupation and one for start.
> And possibly use an rbtree for the starts.
> My answer was that this solution is less optimized, because it scatters
> the data of one allocation across multiple words/pages, plus is not
> a transaction anymore. And the particular distribution of sizes of
> allocation is likely to eat up much more memory than the bitmap.

I think I can describe a scenario where the split bitmaps would not work
(based on my understanding of the proposal), but I would appreciate a
review. Here it is:

* One allocation (let's call it allocation A) is already present in both
bitmaps:
- its units of allocation are marked in the "space" bitmap
- its starting bit is marked in the "starts" bitmap

* Another allocation (let's call it allocation B) is undergoing:
- some of its units of allocation (starting from the beginning) are
marked in the "space" bitmap
- the starting bit is *not* yet marked in the "starts" bitmap

* B occupies the space immediately after A

* While B is being written, A is freed

* Having to determine the length of A, the "space" bitmap will be
searched, then the "starts" bitmap


The space initially allocated for B will be wrongly accounted for A,
because there is no empty gap in-between and the beginning of B is not
yet marked.

The implementation which interleaves "space" and "start" does not suffer
from this sort of races, because the alteration of the interleaved
bitmaps is atomic.

However, at the very least, some more explanation is needed in the
documentation/code, because this scenario is not exactly obvious.

Does this justification for the use of interleaved bitmaps (iow the
current implementation) make sense?

--
igor

2018-03-14 11:58:28

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:
> > * @Kees Cook proposed to turn the self testing into modules.
> > My answer was that the functionality is intentionally tested very early
> > in the boot phase, to prevent unexplainable errors, should the feature
> > really fail.
>
> This could be workable, if it's acceptable that the early testing is
> performed only when the module is compiled in.
> I do not expect the module-based testing to bring much value, but it
> doesn't do harm. Is this acceptable?

Something I've been doing recently is building tests in both userspace and
kernel space. Here's an example:
http://git.infradead.org/users/willy/linux-dax.git/commitdiff/717f2aa1d4040f65966bb9dab64035962576b0f9

Essentially, tools/ contains a reasonably good set of functions which
emulate kernel functions. So you write your test suite as a kernel module
and then build it in userspace as well.

> > * @Matthew Wilcox proposed to use a different mechanism for the genalloc
> > bitmap: 2 bitmaps, one for occupation and one for start.
> > And possibly use an rbtree for the starts.
> > My answer was that this solution is less optimized, because it scatters
> > the data of one allocation across multiple words/pages, plus is not
> > a transaction anymore. And the particular distribution of sizes of
> > allocation is likely to eat up much more memory than the bitmap.
>
> I think I can describe a scenario where the split bitmaps would not work
> (based on my understanding of the proposal), but I would appreciate a
> review. Here it is:

You misread my proposal. I did not suggest storing the 'start', but the
'end'.

> * One allocation (let's call it allocation A) is already present in both
> bitmaps:
> - its units of allocation are marked in the "space" bitmap
> - its starting bit is marked in the "starts" bitmap
>
> * Another allocation (let's call it allocation B) is undergoing:
> - some of its units of allocation (starting from the beginning) are
> marked in the "space" bitmap
> - the starting bit is *not* yet marked in the "starts" bitmap
>
> * B occupies the space immediately after A
>
> * While B is being written, A is freed
>
> * Having to determine the length of A, the "space" bitmap will be
> searched, then the "starts" bitmap
>
>
> The space initially allocated for B will be wrongly accounted for A,
> because there is no empty gap in-between and the beginning of B is not
> yet marked.
>
> The implementation which interleaves "space" and "start" does not suffer
> from this sort of races, because the alteration of the interleaved
> bitmaps is atomic.

This would be a bug in the allocator implementation. Obviously it has to
maintain the integrity of its own data structures.

> Does this justification for the use of interleaved bitmaps (iow the
> current implementation) make sense?

I think you're making a mistake by basing the pmalloc allocator on
genalloc. The page_frag allocator seems like a much better place to
start than genalloc. It has a significantly lower overhead and is
much more suited to the kind of probably-identical-lifespan that the
pmalloc API is going to persuade its users to have.

2018-03-14 12:18:18

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 5/8] Protectable Memory

On Tue, Mar 13, 2018 at 11:45:51PM +0200, Igor Stoppa wrote:
> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
> + size_t size, gfp_t flags)
> +{
> + if (unlikely(!(pool && n && size)))
> + return NULL;

Why not use the same formula as kvmalloc_array here? You've failed to
protect against integer overflow, which is the whole point of pmalloc_array.

if (size != 0 && n > SIZE_MAX / size)
return NULL;

> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
> +{
> + size_t len;
> + char *buf;
> +
> + if (unlikely(pool == NULL || s == NULL))
> + return NULL;

No, delete these checks. They'll mask real bugs.

> +static inline void pfree(struct gen_pool *pool, const void *addr)
> +{
> + gen_pool_free(pool, (unsigned long)addr, 0);
> +}

It's poor form to use a different subsystem's type here. It ties you
to genpool, so if somebody wants to replace it, you have to go through
all the users and change them. If you use your own type, it's a much
easier task.

struct pmalloc_pool {
struct gen_pool g;
}

then:

static inline void pfree(struct pmalloc_pool *pool, const void *addr)
{
gen_pool_free(&pool->g, (unsigned long)addr, 0);
}

Looking further down, you could (should) move the contents of pmalloc_data
into pmalloc_pool; that's one fewer object to keep track of.

> +struct pmalloc_data {
> + struct gen_pool *pool; /* Link back to the associated pool. */
> + bool protected; /* Status of the pool: RO or RW. */
> + struct kobj_attribute attr_protected; /* Sysfs attribute. */
> + struct kobj_attribute attr_avail; /* Sysfs attribute. */
> + struct kobj_attribute attr_size; /* Sysfs attribute. */
> + struct kobj_attribute attr_chunks; /* Sysfs attribute. */
> + struct kobject *pool_kobject;
> + struct list_head node; /* list of pools */
> +};

sysfs attributes aren't free, you know. I appreciate you want something
to help debug / analyse, but having one file for the whole subsystem or
at least one per pool would be a better idea.


2018-03-14 12:26:40

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH 6/8] Pmalloc selftest

On Tue, Mar 13, 2018 at 11:45:52PM +0200, Igor Stoppa wrote:
> Add basic self-test functionality for pmalloc.

Here're some additional tests for your test-suite:

for (i = 1; i; i *= 2)
pzalloc(pool, i - 1, GFP_KERNEL);


2018-03-14 12:57:47

by Igor Stoppa

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data



On 14/03/18 13:56, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 01:21:54PM +0200, Igor Stoppa wrote:

[...]

> You misread my proposal. I did not suggest storing the 'start', but the
> 'end'.

Ok, but doesn't that only change the race scenario?

Attempting to free one allocation, while it is in progress, so that all
the "space" bits are written, but the "end bit" is not yet written.
That will eat up also the following, complete allocation, if there is no
locking in place.

[...]

>> The implementation which interleaves "space" and "start" does not suffer
>> from this sort of races, because the alteration of the interleaved
>> bitmaps is atomic.
>
> This would be a bug in the allocator implementation. Obviously it has to
> maintain the integrity of its own data structures.

But I cannot imagine how to do it, with the split bitmaps, without a
lock :-/
And genalloc is supposed to be lockless.

>> Does this justification for the use of interleaved bitmaps (iow the
>> current implementation) make sense?
>
> I think you're making a mistake by basing the pmalloc allocator on
> genalloc.

It was recommended to me because it was a close match to the allocator
that I was writing from scratch and, when I looked at it, I could only
agree that it was very close.

But I have no particular reason for preferring it, if something better
is available. It was just never brought up before.
At least not that I noticed.

> The page_frag allocator seems like a much better place to
> start than genalloc. It has a significantly lower overhead and is
> much more suited to the kind of probably-identical-lifespan that the
> pmalloc API is going to persuade its users to have.


Could you please provide me a pointer?
I did a quick search on 4.16-rc5 and found the definition of page_frag
and sk_page_frag(). Is this what you are referring to?

--
igor

2018-03-14 13:05:39

by Igor Stoppa

[permalink] [raw]
Subject: Re: [PATCH 5/8] Protectable Memory



On 14/03/18 14:15, Matthew Wilcox wrote:
> On Tue, Mar 13, 2018 at 11:45:51PM +0200, Igor Stoppa wrote:
>> +static inline void *pmalloc_array(struct gen_pool *pool, size_t n,
>> + size_t size, gfp_t flags)
>> +{
>> + if (unlikely(!(pool && n && size)))
>> + return NULL;
>
> Why not use the same formula as kvmalloc_array here? You've failed to
> protect against integer overflow, which is the whole point of pmalloc_array.
>
> if (size != 0 && n > SIZE_MAX / size)
> return NULL;


oops :-(

>> +static inline char *pstrdup(struct gen_pool *pool, const char *s, gfp_t gfp)
>> +{
>> + size_t len;
>> + char *buf;
>> +
>> + if (unlikely(pool == NULL || s == NULL))
>> + return NULL;
>
> No, delete these checks. They'll mask real bugs.

I thought I got rid of all of them, but some have escaped me

>> +static inline void pfree(struct gen_pool *pool, const void *addr)
>> +{
>> + gen_pool_free(pool, (unsigned long)addr, 0);
>> +}
>
> It's poor form to use a different subsystem's type here. It ties you
> to genpool, so if somebody wants to replace it, you have to go through
> all the users and change them. If you use your own type, it's a much
> easier task.

I thought about it, but typedef came to my mind and knowing it's usually
frowned upon, I restrained myself.

> struct pmalloc_pool {
> struct gen_pool g;
> }

I didn't think this could be acceptable either. But if it is, then ok.

> then:
>
> static inline void pfree(struct pmalloc_pool *pool, const void *addr)
> {
> gen_pool_free(&pool->g, (unsigned long)addr, 0);
> }
>
> Looking further down, you could (should) move the contents of pmalloc_data
> into pmalloc_pool; that's one fewer object to keep track of.
>
>> +struct pmalloc_data {
>> + struct gen_pool *pool; /* Link back to the associated pool. */
>> + bool protected; /* Status of the pool: RO or RW. */
>> + struct kobj_attribute attr_protected; /* Sysfs attribute. */
>> + struct kobj_attribute attr_avail; /* Sysfs attribute. */
>> + struct kobj_attribute attr_size; /* Sysfs attribute. */
>> + struct kobj_attribute attr_chunks; /* Sysfs attribute. */
>> + struct kobject *pool_kobject;
>> + struct list_head node; /* list of pools */
>> +};
>
> sysfs attributes aren't free, you know. I appreciate you want something
> to help debug / analyse, but having one file for the whole subsystem or
> at least one per pool would be a better idea.

Which means that it should not be normal sysfs, but rather debugfs, if I
understand correctly, since in sysfs 1 value -> 1 file.

--
igor


2018-03-14 13:06:45

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

On Wed, Mar 14, 2018 at 02:55:10PM +0200, Igor Stoppa wrote:
> > The page_frag allocator seems like a much better place to
> > start than genalloc. It has a significantly lower overhead and is
> > much more suited to the kind of probably-identical-lifespan that the
> > pmalloc API is going to persuade its users to have.
>
> Could you please provide me a pointer?
> I did a quick search on 4.16-rc5 and found the definition of page_frag
> and sk_page_frag(). Is this what you are referring to?

It's a blink-and-you'll-miss-it allocator buried deep in mm/page_alloc.c:
void *page_frag_alloc(struct page_frag_cache *nc,
unsigned int fragsz, gfp_t gfp_mask)
void page_frag_free(void *addr)

I don't necessarily think you should use it as-is, but the principle it uses
seems like a better match to me than the rather complex genalloc. Just
allocate some pages and track the offset within those pages that is the
current allocation point. It's less than 100 lines of code!

2018-03-14 16:15:01

by Igor Stoppa

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data



On 14/03/18 15:04, Matthew Wilcox wrote:

> I don't necessarily think you should use it as-is,

I think I simply cannot use it as-is, because it seems to use linear
memory, while I need virtual. This reason alone would require a rewrite
of several parts.

> but the principle it uses
> seems like a better match to me than the rather complex genalloc.

It uses meta data in a different way than genalloc.
There is probably a tipping point where one implementation becomes more
space-efficient than the other.

Probably page_frag does well with relatively large allocations, while
genalloc seems to be better for small (few allocation units) allocations.

Also, in case of high variance in the size of the allocations, genalloc
requires the allocation unit to be small enough to fit the smallest
request (otherwise one must accept some slack), while page_frag doesn't
care if the allocation is small or large.

page_frag otoh, seems to not support the reuse of space that was freed,
since there is only

But could you please explain to what you are referring to, when you say
that page_frag has "significantly lower overhead" ?

Is it because it doesn't try to reclaim space that was freed, until the
whole page is empty?

I see different trade-offs, but I am probably either missing or
underestimating the main reason why you think this is better.

And probably I am missing the capability of judging what is acceptable
in certain cases.

Ex: if the pfree is called only on error paths, is it ok to not claim
back the memory released, if it's less than one page?

To be clear: I do not want to hold to genalloc just because I have
already implemented it. I can at least sketch a version with page_frag,
but I would like to understand why its trade-offs are better :-)

> Just allocate some pages and track the offset within those pages that

> is the current allocation point.


> It's less than 100 lines of code!

Strictly speaking it is true, but it all relies on other functions,
which must be rewritten, because they use linear address, while this
must work with virtual (vmalloc) addresses.

Also, I see that the code relies a lot on order of allocation.
I think we had similar discussion wrt compound pages.

It seems to me wasteful, if I have a request of, say, 5 pages, and I end
up allocating 8.

I do not recall anyone giving a justification like:
"yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
so it's a good trade-off"

Could it be that it's normal RAM is considered less precious than the
special memory genalloc is written for, so normal RAM is not really
proactively reused, while special memory is treated as a more valuable
resource that should not be wasted?


--
igor

2018-03-14 17:36:13

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:
> On 14/03/18 15:04, Matthew Wilcox wrote:
> > but the principle it uses
> > seems like a better match to me than the rather complex genalloc.
>
> It uses meta data in a different way than genalloc.
> There is probably a tipping point where one implementation becomes more
> space-efficient than the other.

Certainly there are always tradeoffs in writing a memory allocator.

> Probably page_frag does well with relatively large allocations, while
> genalloc seems to be better for small (few allocation units) allocations.

I don't understand why you would think that. If you allocate 4096 1-byte
elements, page_frag will just use up a page. Doing the same thing with
genalloc requires allocating two bits per byte (1kB of bitmap), plus
other overheads.

> Also, in case of high variance in the size of the allocations, genalloc
> requires the allocation unit to be small enough to fit the smallest
> request (otherwise one must accept some slack), while page_frag doesn't
> care if the allocation is small or large.

Right; internal versus external fragmentation. The bane of memory
allocators ;-)

> page_frag otoh, seems to not support the reuse of space that was freed,
> since there is only

To a certain extent it does. If you free everything on a page, and that
page is still in the page_frag_cache, it will get reused.

> But could you please explain to what you are referring to, when you say
> that page_frag has "significantly lower overhead" ?

Less CPU time taken per allocation, less metadata stored per object.

> Ex: if the pfree is called only on error paths, is it ok to not claim
> back the memory released, if it's less than one page?

Yes, I think that's a great example.

> To be clear: I do not want to hold to genalloc just because I have
> already implemented it. I can at least sketch a version with page_frag,
> but I would like to understand why its trade-offs are better :-)
>
> > Just allocate some pages and track the offset within those pages that
>
> > is the current allocation point.
>
>
> > It's less than 100 lines of code!
>
> Strictly speaking it is true, but it all relies on other functions,
> which must be rewritten, because they use linear address, while this
> must work with virtual (vmalloc) addresses.

No, that's basically the whole thing. I think an implementation of
pmalloc which used a page_frag-style allocator would be larger than
100 lines, but I don't think it would have to be significantly larger
than that.

> Also, I see that the code relies a lot on order of allocation.
> I think we had similar discussion wrt compound pages.
>
> It seems to me wasteful, if I have a request of, say, 5 pages, and I end
> up allocating 8.

Yes, but the other three pages are available for use by the pmalloc pool.
Now, at pmalloc_protect() time, you might well want to release the unused
pages by calling make_alloc_exact() and hand those three pages back to the
page allocator.

> I do not recall anyone giving a justification like:
> "yeah, it uses extra pages, but it's preferable, for reasons X, Y and Z,
> so it's a good trade-off"

Sometimes it is, sometimes it isn't.

> Could it be that it's normal RAM is considered less precious than the
> special memory genalloc is written for, so normal RAM is not really
> proactively reused, while special memory is treated as a more valuable
> resource that should not be wasted?

We're certainly at the point where normal RAM is pretty cheap. A 16GB
DIMM is $200, so that's $12.50 per gigabyte. We have more of a problem
with fragmentation than we do with squeezing every last byte out of
the system.

Of course, Linux still runs on tiny systems, and we don't want to
unnecessarily bloat the kernel. And cachelines are also a precious
resource; the fewer we touch, the faster the system runs. The bitmap
in genalloc can easily occupy several cachelines; the page_frag allocator
touches a single cacheline for most allocations.

2018-03-14 17:42:19

by Jay Freyensee

[permalink] [raw]
Subject: Re: [PATCH 5/8] Protectable Memory



>>> +struct pmalloc_data {
>>> + struct gen_pool *pool; /* Link back to the associated pool. */
>>> + bool protected; /* Status of the pool: RO or RW. */
>>> + struct kobj_attribute attr_protected; /* Sysfs attribute. */
>>> + struct kobj_attribute attr_avail; /* Sysfs attribute. */
>>> + struct kobj_attribute attr_size; /* Sysfs attribute. */
>>> + struct kobj_attribute attr_chunks; /* Sysfs attribute. */
>>> + struct kobject *pool_kobject;
>>> + struct list_head node; /* list of pools */
>>> +};
>> sysfs attributes aren't free, you know. I appreciate you want something
>> to help debug / analyse, but having one file for the whole subsystem or
>> at least one per pool would be a better idea.
> Which means that it should not be normal sysfs, but rather debugfs, if I
> understand correctly, since in sysfs 1 value -> 1 file.

Yes, that is a good idea, to use debugfs so you still have a means to
debug/analyze but can be also turned off for normal system execution. 
Sorry I didn't think about that earlier to save a revision, that's one
of my favorite things I like to use for diagnosis.

Jay


2018-03-14 17:45:29

by Jay Freyensee

[permalink] [raw]
Subject: Re: [PATCH 4/8] struct page: add field for vm_struct



On 3/13/18 3:00 PM, Matthew Wilcox wrote:
> On Tue, Mar 13, 2018 at 11:45:50PM +0200, Igor Stoppa wrote:
>> When a page is used for virtual memory, it is often necessary to obtain
>> a handler to the corresponding vm_struct, which refers to the virtually
>> continuous area generated when invoking vmalloc.
>>
>> The struct page has a "mapping" field, which can be re-used, to store a
>> pointer to the parent area.
>>
>> This will avoid more expensive searches, later on.
>>
>> Signed-off-by: Igor Stoppa <[email protected]>
> Reviewed-by: Matthew Wilcox <[email protected]>

Igor, do you mind sticking these tags on the files that have spent some
time reviewing a revision of your patchset (like the Reviewed-by: tags I
provided last revision?)

Thanks,
Jay



2018-03-15 09:41:48

by Igor Stoppa

[permalink] [raw]
Subject: Re: [PATCH 4/8] struct page: add field for vm_struct

On 14/03/18 19:43, J Freyensee wrote:
> On 3/13/18 3:00 PM, Matthew Wilcox wrote:

[...]

>>> Signed-off-by: Igor Stoppa <[email protected]>
>> Reviewed-by: Matthew Wilcox <[email protected]>
>
> Igor, do you mind sticking these tags on the files that have spent some
> time reviewing a revision of your patchset (like the Reviewed-by: tags I
> provided last revision?)

Apologies, that was not intentional, I forgot it.
I will do it, although most of the files will now change so much that I
am not sure what will survive, beside this patch, in the form that you
reviewed.

I suppose the Review-by tag drops, if the patch changes.

--
igor

2018-03-15 13:45:25

by Igor Stoppa

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data



On 14/03/2018 19:33, Matthew Wilcox wrote:
> On Wed, Mar 14, 2018 at 06:11:22PM +0200, Igor Stoppa wrote:

[...]

>> Probably page_frag does well with relatively large allocations, while
>> genalloc seems to be better for small (few allocation units) allocations.
>
> I don't understand why you would think that. If you allocate 4096 1-byte
> elements, page_frag will just use up a page. Doing the same thing with
> genalloc requires allocating two bits per byte (1kB of bitmap), plus
> other overheads.

I had misunderstood the amount of page_frag structures needed.

--
igor

2018-03-15 18:52:56

by Jay Freyensee

[permalink] [raw]
Subject: Re: [PATCH 4/8] struct page: add field for vm_struct



On 3/15/18 2:38 AM, Igor Stoppa wrote:
> On 14/03/18 19:43, J Freyensee wrote:
>> On 3/13/18 3:00 PM, Matthew Wilcox wrote:
> [...]
>
>>>> Signed-off-by: Igor Stoppa <[email protected]>
>>> Reviewed-by: Matthew Wilcox <[email protected]>
>> Igor, do you mind sticking these tags on the files that have spent some
>> time reviewing a revision of your patchset (like the Reviewed-by: tags I
>> provided last revision?)
> Apologies, that was not intentional, I forgot it.
> I will do it, although most of the files will now change so much that I
> am not sure what will survive, beside this patch, in the form that you
> reviewed.
>
> I suppose the Review-by tag drops, if the patch changes.

That's true, if so much of the patch changes it basically looks like
something different, the Reviewed-by: would drop.

Jay

>
> --
> igor


2018-03-20 01:42:40

by Igor Stoppa

[permalink] [raw]
Subject: Re: [RFC PATCH v19 0/8] mm: security: ro protection for dynamic data

On 14/03/18 19:33, Matthew Wilcox wrote:
> I think an implementation of
> pmalloc which used a page_frag-style allocator would be larger than
> 100 lines, but I don't think it would have to be significantly larger
> than that.

I have some doubt about what is the best way to implement it using
vmalloced memory.

1. Since I can allocate an arbitrary number of pages, I think allocating
a rounded up amount of memory, so that it's multiple of PAGE_SIZE should
be enough.

But maybe I could do better than that:
a) support pre-allocation of x pages

b) define, as pool parameter, the minimum number of pages to allocate
every time there is a refill

c) both a and b


----


2. the flavor of page_frag from page_alloc relies on page->_refcount,
however neither vmap_area, nor vm_struct seem to have anything like
that. (My reasoning is that I should do the accounting not on page
level, but based on the virtual area that I get when I allocate new
memory) What would be the best way to do refcounting for the area?

a) use the the page->_refcount from the first page that belongs to the area

b) add the _refcount to either vm_struct or vmap_area (I am not really
sure of why these two structures exist as separate entities, rather than
a single one - cache optimization?)


----

3. I will have to add a list of chunks (in genalloc lingo, or areas, if
we refer to the new implementation), because I will still need to
iterate over all the memory that belongs to a pool, for either write
protecting it or for destroying the pool. I have two options:

a) handle the chunks within the pmalloc pool

b) create an intermediate type of pool (vfrag_pool?) and then include it
in the pmalloc pool structure.

I'd lean toward option a, but I thought I might as well ask for advice
before I implement the less desirable option (whatever it might be).

--
thanks, igor

2018-03-25 01:34:11

by Igor Stoppa

[permalink] [raw]
Subject: Re: [PATCH 6/8] Pmalloc selftest



On 14/03/18 14:25, Matthew Wilcox wrote:
> On Tue, Mar 13, 2018 at 11:45:52PM +0200, Igor Stoppa wrote:
>> Add basic self-test functionality for pmalloc.
>
> Here're some additional tests for your test-suite:
>
> for (i = 1; i; i *= 2)
> pzalloc(pool, i - 1, GFP_KERNEL);
>

Ok, I have almost finished the rewrite.
I still have to address this comment.

When I run the test, eventually the system runs out of memory, it keeps
getting allocation errors from vmalloc, until i finally overflows and
becomes 0.

Am I supposed to do something about it?
If pmalloc receives a request that the vmalloc backend cannot satisfy, I
would prefer that vmalloc itself produces the warning and pmalloc
returns NULL.

This doesn't look like a test case that one can leave always enabled in
a build, but maybe I'm missing the point.

--
igor